PlatPhorm Evals · Evidence-Backed QA

Prove your tools work.

PlatPhorm Evals runs evidence-backed evaluations across every site, API, MCP tool, browser journey, UI, schema, workflow, and release path in the PlatPhormNews network. It turns discovery, tests, traces, screenshots, sandbox runs, and model grades into public-safe evidence, scorecards, and release gates that humans and agents can act on.


Source: 3 persisted, 44 built in

Capabilities Indexed: 1010 · active · Source: 219 persisted, 791 expected

Eval Runs Today · degraded · Source: database

Current readiness state

Ready for public read and dry-run

Registry, discovery, scorecard, Web Status, and public dry-run surfaces are source-labeled and available.


Latest real evidence · degraded

Run: 709872b4-f606-4d21-a74c-dd6642b9af43

Suite: 26f6658d-38d9-4602-82f1-bdad119b060a

Score: 74.0%

The latest run is degraded. Review what failed before treating the target as release-ready.

What Evals Does

Evaluation Mesh and Release-Control Mesh

Evals is the canonical platform for quality, regression testing, evidence, scorecards, and release control across PlatPhormNews. It covers tool validation, MCP/API evaluation, BrowserOps journey evaluation, Spec contract validation, Sandbox execution verification, AgentUI render validation, Claws orchestration evaluation, and LLM-as-judge grading.

Step 1: Discover targets
Step 2: Generate suites
Step 3: Run checks
Step 4: Capture evidence
Step 5: Score results
Step 6: Gate releases
Step 7: Publish reports
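The seven steps can be read as one loop per target. The sketch below is a minimal, hypothetical illustration under stated assumptions: checks are plain Python callables, evidence is a text note, the score is the percentage of passing checks, and the gate is a fixed 90% threshold. None of the names (`run_pipeline`, `CheckResult`, `Report`, `demo_suite`) come from the Evals API, and real scoring may weight checks or use model grades.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical check type: takes a target URL and returns pass/fail.
Check = Callable[[str], bool]


@dataclass
class CheckResult:
    name: str
    passed: bool
    evidence: str  # stand-in for a trace ID, screenshot, or log excerpt


@dataclass
class Report:
    target: str
    results: list[CheckResult] = field(default_factory=list)
    score: float = 0.0
    release_allowed: bool = False


def run_pipeline(
    targets: list[str],                        # Step 1: targets already discovered
    suite_for: Callable[[str], dict[str, Check]],
    min_score: float = 90.0,                   # assumed gate threshold, in percent
) -> list[Report]:
    reports = []
    for target in targets:
        suite = suite_for(target)              # Step 2: generate a suite per target
        report = Report(target=target)
        for name, check in suite.items():      # Step 3: run each check
            passed = check(target)
            # Step 4: capture evidence (a text note in this sketch)
            note = f"{name} on {target}: {'ok' if passed else 'failed'}"
            report.results.append(CheckResult(name, passed, note))
        # Step 5: score as the percentage of passing checks
        total = len(report.results)
        report.score = 100.0 * sum(r.passed for r in report.results) / total if total else 0.0
        # Step 6: gate the release on the score
        report.release_allowed = report.score >= min_score
        reports.append(report)
    return reports                             # Step 7: hand reports to a publisher


if __name__ == "__main__":
    def demo_suite(target: str) -> dict[str, Check]:
        return {
            "uses_https": lambda t: t.startswith("https://"),
            "has_host": lambda t: "://" in t and len(t.split("://", 1)[1]) > 0,
        }

    for r in run_pipeline(["https://example.com", "http://example.org"], demo_suite):
        print(r.target, f"{r.score:.1f}%", "allowed" if r.release_allowed else "blocked")
```

Run as a script, it prints `https://example.com 100.0% allowed` and `http://example.org 50.0% blocked`, since the HTTP target fails one of its two checks.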

What this service owns

- scorecards
- evidence grading
- release gates
- eval suites
- findings
- regression comparisons
- public-safe readiness signals
- confirmation URLs for Evals artifacts

What this service does not own

- BrowserOps screenshots
- Spec contract authoring
- MCP registry mutation
- Sandbox execution
- Docs publishing
- Sheets exports
- Trace storage
- AgentUI workflow orchestration

Who uses this?

- humans reviewing releases
- agents validating tool chains
- developers testing APIs
- operators checking service health
- CI jobs blocking regressions
- MCP clients validating tools
- BrowserOps validating UI
- Sandbox validating execution
- Spec validating contracts

What gets evaluated?

- APIs
- MCP tools
- OpenAPI schemas
- AgentUI forms
- BrowserOps journeys
- Sandbox commands
- Claws workflows
- discovery files
- policies
- traces
- RSS/sitemaps
- route health
- release readiness
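RSS/sitemaps is one of the target types listed above. As an illustration only, a sitemap check can be sketched with Python's standard `xml.etree.ElementTree`. The function name `check_sitemap` is hypothetical, the sketch accepts only a `<urlset>` file (not a sitemap index), and the HTTPS-only rule is an example policy. The namespace and the single-host rule come from the sitemaps.org protocol; the actual Evals checks are not described here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol (version 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap(xml_text: str, expected_host: str) -> list[str]:
    """Return findings for a <urlset> sitemap; an empty list means it passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != f"{SITEMAP_NS}urlset":
        # Sitemap index files (<sitemapindex>) are out of scope for this sketch.
        return [f"unexpected root element: {root.tag}"]

    findings = []
    locs = [(el.text or "").strip() for el in root.iter(f"{SITEMAP_NS}loc")]
    if not locs:
        findings.append("sitemap lists no URLs")
    for loc in locs:
        parsed = urlparse(loc)
        if parsed.scheme != "https":           # example policy, not a protocol rule
            findings.append(f"non-HTTPS URL: {loc}")
        if parsed.hostname != expected_host:   # protocol: one host per sitemap
            findings.append(f"URL outside {expected_host}: {loc}")
    return findings
```

Returning a list of findings rather than a boolean keeps each failure reportable as its own piece of evidence.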

Eval Suites

Persisted suites plus built-in Phase 2 templates; demo/test suites are excluded from active counts.


Agent Policy Validation (agent-policy-validation) · active · Source: built_in
Checks well-known agent, AI, trust, and robots policies.
Validates MCP schema renderability in AgentUI with BrowserOps evidence when available.

AgentUI Form Render Validation (agentui-form-render-validation) · degraded · Source: built_in
Scores AgentUI schema rendering and form behavior from real evidence.

AgentUI Phorm-to-Workflow Check (agentui-phorm-to-workflow-check) · degraded · Source: built_in
Validates AgentUI schema rendering and workflow handoff metadata.
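Agent Policy Validation checks robots policies among others. One slice of that, whether a site's robots.txt lets given crawlers fetch a URL, can be sketched with the standard library's `urllib.robotparser`. The function name `robots_allows` and the agent list are assumptions for illustration; the suite's real rules and agent list are not documented here.

```python
from urllib.robotparser import RobotFileParser

# Example crawler tokens to check; a real policy suite would configure its own list.
DEFAULT_AGENTS = ("GPTBot", "ClaudeBot", "*")


def robots_allows(robots_txt: str, url: str,
                  agents: tuple[str, ...] = DEFAULT_AGENTS) -> dict[str, bool]:
    """Map each user agent to whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and records a last-checked time,
    # so can_fetch() answers without any network request.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    print(robots_allows(sample, "https://example.com/news/"))
```

With the sample file, `GPTBot` is blocked by its own group while `ClaudeBot` and `*` fall through to the `*` group and are allowed.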

Recent Runs

Latest persisted evaluation run results


Run ID	Status	Score	Date
709872b4-f60	degraded	74.0%	5/25/2026, 6:32:53 PM
e654e0c7-614	passed	100.0%	5/23/2026, 12:50:54 AM
17fd51eb-ae3	degraded	84.0%	5/22/2026, 9:44:29 PM
8c1408b1-820	degraded	84.0%	5/22/2026, 9:44:13 PM
259bf792-2bd	degraded	51.0%	5/22/2026, 8:31:08 PM
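Run history like the table above can feed a release gate with a regression comparison. The following is a hypothetical sketch, assuming a gate that blocks when the newest run is not `passed`, scores below a chosen threshold (90% here), or scores lower than the run before it. The `Run` type, `release_gate` function, and threshold are illustrative assumptions, not the Evals release-gate rules.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Run:
    run_id: str
    status: str          # e.g. "passed" or "degraded", as in the table
    score: float         # percentage, 0-100
    finished_at: datetime


def release_gate(runs: list[Run], min_score: float = 90.0) -> tuple[bool, list[str]]:
    """Return (allowed, reasons). Assumed rules: the newest run must be
    'passed', meet min_score, and not score below the previous run."""
    if not runs:
        return False, ["no runs recorded"]
    ordered = sorted(runs, key=lambda r: r.finished_at, reverse=True)
    latest = ordered[0]
    reasons = []
    if latest.status != "passed":
        reasons.append(f"latest run {latest.run_id} is {latest.status}")
    if latest.score < min_score:
        reasons.append(f"score {latest.score:.1f}% is below {min_score:.1f}%")
    if len(ordered) > 1 and latest.score < ordered[1].score:
        previous = ordered[1]
        reasons.append(f"regressed from {previous.score:.1f}% ({previous.run_id})")
    return not reasons, reasons


if __name__ == "__main__":
    # First three rows of the Recent Runs table, times read as shown.
    rows = [
        Run("709872b4-f60", "degraded", 74.0, datetime(2026, 5, 25, 18, 32, 53)),
        Run("e654e0c7-614", "passed", 100.0, datetime(2026, 5, 23, 0, 50, 54)),
        Run("17fd51eb-ae3", "degraded", 84.0, datetime(2026, 5, 22, 21, 44, 29)),
    ]
    allowed, reasons = release_gate(rows)
    print("allowed" if allowed else "blocked", reasons)
```

Under these assumed rules the table's latest run is blocked for three reasons: it is degraded, 74.0% is below 90.0%, and it regressed from the 100.0% run before it.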