Prove your tools work.
Run evidence-backed evaluations across PlatPhormNews sites, APIs, MCP tools, browser journeys, schemas, workflows, and releases. Evals turns discovery, tests, traces, screenshots, sandbox runs, and model grades into scorecards and release decisions.
PlatPhorm Evals tests every site, API, MCP tool, workflow, UI, schema, and release path in the PlatPhormNews network, then gives humans and agents public-safe evidence, scorecards, and release gates.
Evaluation Mesh and Release-Control Mesh
Evals is the canonical platform for quality, regression, evidence, scorecards, and release control across PlatPhormNews. It covers tool validation, MCP/API evaluation, BrowserOps journey evaluation, Spec contract validation, Sandbox execution verification, AgentUI render validation, Claws orchestration evaluation, and LLM-as-judge grading.
Who uses this?
What gets evaluated?
Eval Suites
Persisted suites plus built-in Phase 2 templates; demo/test suites are excluded from active counts.
Checks well-known agent, AI, trust, and robots policies.
Validates MCP schema renderability in AgentUI with BrowserOps evidence when available.
Scores AgentUI schema rendering and form behavior from real evidence.
Validates AgentUI schema rendering and workflow handoff metadata.
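The active-count rule stated under Eval Suites (persisted suites and built-in templates count, demo/test suites do not) can be illustrated with a minimal sketch. The `Suite` type, the `kind` labels, and the suite names below are hypothetical and are not taken from the Evals data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suite:
    name: str
    kind: str  # hypothetical labels: "persisted", "template", "demo", "test"


def active_suites(suites):
    """Return the suites that count as active: everything except demo/test."""
    excluded = {"demo", "test"}
    return [s for s in suites if s.kind not in excluded]


# Hypothetical example data, for illustration only.
suites = [
    Suite("well-known-policies", "persisted"),
    Suite("agentui-render", "template"),
    Suite("sample-walkthrough", "demo"),
    Suite("internal-fixture", "test"),
]

print(len(active_suites(suites)))  # prints 2
```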
Recent Runs
Latest persisted evaluation run results
| Run ID | Status | Score | Date and time |
|---|---|---|---|
| e654e0c7-614 | passed | 100.0% | 5/23/2026, 12:50:54 AM |
| 17fd51eb-ae3 | degraded | 84.0% | 5/22/2026, 9:44:29 PM |
| 8c1408b1-820 | degraded | 84.0% | 5/22/2026, 9:44:13 PM |
| 259bf792-2bd | degraded | 51.0% | 5/22/2026, 8:31:08 PM |
| e69f7a4f-920 | passed | 100.0% | 5/22/2026, 8:30:45 PM |
Network Coverage
213 persisted services merged with fallback targets
Actionable integration status
Cards show persisted live status when synced. Static fallback targets are labeled pending sync and do not count as passing provider evidence.
Known target from the public fallback registry; run a protected registry sync to persist its live status.
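As a rough sketch of the coverage rule described above, the following merges persisted services with static fallback targets, labels fallback-only targets as pending sync, and counts only persisted results as passing evidence. The `Card` type, function names, and target names are assumptions made for illustration; the actual registry format is not shown in this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    target: str
    status: str      # live status, or "pending sync" for fallback-only targets
    persisted: bool  # True only when the status came from a synced registry


def merge_coverage(persisted: dict[str, str], fallback: list[str]) -> list[Card]:
    """Merge persisted live statuses with fallback targets (persisted wins)."""
    cards = [Card(t, s, True) for t, s in persisted.items()]
    cards += [Card(t, "pending sync", False) for t in fallback if t not in persisted]
    return cards


def passing_count(cards: list[Card]) -> int:
    """Count passing evidence; fallback targets never count as passing."""
    return sum(1 for c in cards if c.persisted and c.status == "passed")


# Hypothetical targets, for illustration only.
persisted = {"api.example": "passed", "mcp.example": "degraded"}
fallback = ["api.example", "docs.example"]

cards = merge_coverage(persisted, fallback)
print(len(cards), passing_count(cards))  # prints: 3 1
```

Keeping the `persisted` flag separate from `status` means a fallback card cannot be counted as passing by accident, whatever label it carries.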
Guided launch
Run a first public-safe discovery, OpenAPI, MCP, AgentUI, workflow, or CLI registry eval.
Evidence objects
Inspect public-safe artifacts, trace links, empty states, and redaction boundaries.
platphormctl
Use the CLI harness for repeatable discovery, MCP, policy, and dry-run validation.