PlatPhorm Evals · Evidence-Backed QA

Prove your tools work.

Run evidence-backed evaluations across PlatPhormNews sites, APIs, MCP tools, browser journeys, schemas, workflows, and releases. Evals turns discovery, tests, traces, screenshots, sandbox runs, and model grades into scorecards and release decisions.

PlatPhorm Evals tests every site, API, MCP tool, workflow, UI, schema, and release path in the PlatPhormNews network, then gives humans and agents public-safe evidence, scorecards, and release gates.

What Evals Does

Evaluation Mesh and Release-Control Mesh

Evals is the canonical PlatPhormNews platform for quality, regression testing, evidence, scorecards, release control, and tool validation. It covers MCP/API evaluation, BrowserOps journey evaluation, Spec contract validation, Sandbox execution verification, AgentUI render validation, Claws orchestration evaluation, and LLM-as-judge grading.

Step 1: Discover targets
Step 2: Generate suites
Step 3: Run checks
Step 4: Capture evidence
Step 5: Score results
Step 6: Gate releases
Step 7: Publish reports
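The seven steps can be read as one pipeline. Below is a minimal Python sketch of that flow, not the Evals implementation: `CheckResult`, `run_pipeline`, the `generate_suite` and `run_check` callables, and the 0.9 release threshold are all hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CheckResult:
    """Outcome of one check plus the evidence captured while running it (hypothetical shape)."""
    target: str
    check: str
    passed: bool
    evidence: dict = field(default_factory=dict)


def run_pipeline(
    targets: list[str],                            # Step 1: targets already discovered
    generate_suite: Callable[[str], list[str]],    # Step 2: target -> check names
    run_check: Callable[[str, str], CheckResult],  # Steps 3-4: run a check, capture evidence
    threshold: float = 0.9,                        # hypothetical release threshold
) -> dict:
    results: list[CheckResult] = []
    for target in targets:
        for check in generate_suite(target):
            results.append(run_check(target, check))

    # Step 5: score as the fraction of passing checks; no results means no evidence.
    score = sum(r.passed for r in results) / len(results) if results else 0.0

    # Step 6: gate the release on the score.
    release = score >= threshold

    # Step 7: return the report; a real system would persist and publish it.
    return {"score": score, "release": release, "results": results}


# Usage with two fake targets, one of which fails its schema check.
def demo_suite(target: str) -> list[str]:
    return ["health", "schema"]


def demo_check(target: str, check: str) -> CheckResult:
    passed = not (target == "api-b" and check == "schema")
    return CheckResult(target, check, passed, evidence={"log": f"{target}:{check}"})


report = run_pipeline(["api-a", "api-b"], demo_suite, demo_check)
print(report["score"], report["release"])  # 0.75 False
```

Returning a score of 0.0 when nothing ran is a deliberate choice in this sketch: an empty suite produces no evidence, so it cannot open the release gate.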

Who uses this?

humans reviewing releases
agents validating tool chains
developers testing APIs
operators checking service health
CI jobs blocking regressions
MCP clients validating tools
BrowserOps validating UI
Sandbox validating execution
Spec validating contracts

What gets evaluated?

APIs
MCP tools
OpenAPI schemas
AgentUI forms
BrowserOps journeys
Sandbox commands
Claws workflows
discovery files
policies
traces
RSS/sitemaps
route health
release readiness

Recent Runs

Latest persisted evaluation run results

Run ID · Status · Score · Date
e654e0c7-614 · passed · 100.0% · 5/23/2026, 12:50:54 AM
17fd51eb-ae3 · degraded · 84.0% · 5/22/2026, 9:44:29 PM
8c1408b1-820 · degraded · 84.0% · 5/22/2026, 9:44:13 PM
259bf792-2bd · degraded · 51.0% · 5/22/2026, 8:31:08 PM
e69f7a4f-920 · passed · 100.0% · 5/22/2026, 8:30:45 PM
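A CI job that blocks regressions could gate on these run records. The Python sketch below parses the five rows shown in the Recent Runs table and allows a release only when the most recent run passed. The gate policy, the 90.0 minimum score, and the names `Run`, `parse_run`, and `release_allowed` are hypothetical; the page does not state how Evals actually gates releases.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

# Matches the page's date strings, e.g. "5/23/2026, 12:50:54 AM".
DATE_FORMAT = "%m/%d/%Y, %I:%M:%S %p"


@dataclass
class Run:
    run_id: str          # IDs are shown truncated on the page; kept as displayed
    status: str          # "passed" or "degraded" in the table
    score: float         # percentage, 0-100
    timestamp: datetime  # the table's Date column


def parse_run(run_id: str, status: str, score: str, date: str) -> Run:
    return Run(run_id, status, float(score.rstrip("%")), datetime.strptime(date, DATE_FORMAT))


def release_allowed(runs: list[Run], min_score: float = 90.0) -> bool:
    """Hypothetical gate: only the most recent run decides, and it must pass."""
    if not runs:
        return False  # no evidence, no release
    latest = max(runs, key=lambda r: r.timestamp)
    return latest.status == "passed" and latest.score >= min_score


rows = [
    ("e654e0c7-614", "passed", "100.0%", "5/23/2026, 12:50:54 AM"),
    ("17fd51eb-ae3", "degraded", "84.0%", "5/22/2026, 9:44:29 PM"),
    ("8c1408b1-820", "degraded", "84.0%", "5/22/2026, 9:44:13 PM"),
    ("259bf792-2bd", "degraded", "51.0%", "5/22/2026, 8:31:08 PM"),
    ("e69f7a4f-920", "passed", "100.0%", "5/22/2026, 8:30:45 PM"),
]
runs = [parse_run(*row) for row in rows]
print(release_allowed(runs))      # True: the latest run passed at 100.0%
print(release_allowed(runs[1:]))  # False: the latest remaining run is degraded
```

Gating on the latest run rather than an average means an earlier degraded run does not block a release once a newer run passes; a stricter policy could require every run in a window to pass.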

Network Coverage

213 persisted services merged with fallback targets

Avg coverage: degraded
Services: 217
Capabilities: 843
Source: merged

Network Integrations

Actionable integration status

Cards show persisted live status when synced. Static fallback targets are labeled pending sync and do not count as passing provider evidence.
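The rule above, that static fallback targets are labeled pending sync and never count as passing evidence, can be sketched as a registry merge. In this minimal Python sketch, `Target`, `merge_registry`, `counts_as_passing_evidence`, the service names, and the choice of `"indexed"` as the only passing status are hypothetical; only the `pending_sync` label and the status names come from this page.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Target:
    name: str
    status: str                  # e.g. "indexed", "unavailable", "pending_sync"
    last_checked: Optional[str]  # None until a live sync has recorded a check
    persisted: bool              # True only for entries from the synced registry


def merge_registry(persisted: list[Target], fallback: list[Target]) -> list[Target]:
    # Persisted entries always win over fallback entries with the same name.
    merged = {t.name: t for t in persisted}
    for t in fallback:
        if t.name not in merged:
            # Fallback-only targets are labeled pending sync and carry no live status.
            merged[t.name] = replace(t, status="pending_sync", last_checked=None, persisted=False)
    return list(merged.values())


def counts_as_passing_evidence(
    t: Target, passing_statuses: frozenset[str] = frozenset({"indexed"})
) -> bool:
    # Hypothetical rule: only persisted entries with a passing status count.
    return t.persisted and t.status in passing_statuses


persisted = [Target("svc-a", "indexed", "5/22/2026", True)]
fallback = [Target("svc-a", "indexed", None, False), Target("svc-b", "indexed", None, False)]
merged = {t.name: t for t in merge_registry(persisted, fallback)}
print(merged["svc-b"].status, counts_as_passing_evidence(merged["svc-b"]))  # pending_sync False
```

Even though the fallback entry for `svc-b` claims `indexed`, the merge overwrites its status, so nothing from the static registry can be mistaken for live provider evidence.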

Each card links to its trace.

unavailable · Capabilities: 21 · Recent evals: 9 · Last checked: 5/22/2026
pending_sync · Capabilities: 20 · Recent evals: 0 · Last checked: pending sync

Known target from the public fallback registry; run protected registry sync to persist live status.

indexed · Capabilities: 20 · Recent evals: 1 · Last checked: 5/22/2026
Capabilities: 185 · Recent evals: 0 · Last checked: 5/22/2026
Capabilities: 21 · Recent evals: 3 · Last checked: 5/22/2026
Capabilities: 27 · Recent evals: 0 · Last checked: 5/22/2026
Capabilities: 20 · Recent evals: 0 · Last checked: 5/22/2026
Capabilities: 20 · Recent evals: 5 · Last checked: 5/22/2026
Capabilities: 20 · Recent evals: 7 · Last checked: 5/22/2026
Capabilities: 20 · Recent evals: 1 · Last checked: 5/22/2026
Capabilities: 20 · Recent evals: 13 · Last checked: 5/22/2026
Capabilities: 20 · Recent evals: 0 · Last checked: 5/22/2026

Guided launch

Run a first public-safe eval against discovery, OpenAPI, MCP, AgentUI, workflow, or CLI registry targets.

Evidence objects

Inspect public-safe artifacts, trace links, empty states, and redaction boundaries.

platphormctl

Use the CLI harness for repeatable discovery, MCP, policy, and dry-run validation.
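The Evidence objects card mentions public-safe artifacts and redaction boundaries. One way such a boundary could work is to redact an evidence payload before it is published. This Python sketch is an assumption, not the Evals redaction logic: the `redact` function, the `SECRET_KEYS` set, the bearer-token pattern, and the sample evidence are all hypothetical.

```python
from __future__ import annotations

import re
from typing import Any

# Hypothetical list of keys whose values never leave the redaction boundary.
SECRET_KEYS = {"authorization", "api_key", "cookie", "password", "token"}
# Hypothetical pattern for bearer tokens embedded in free-text logs.
BEARER_TOKEN = re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*")


def redact(value: Any) -> Any:
    """Return a redacted copy of an evidence payload; the input is never mutated.

    Dict keys are assumed to be strings.
    """
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key.lower() in SECRET_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, str):
        return BEARER_TOKEN.sub("Bearer [REDACTED]", value)
    return value


evidence = {
    "url": "https://example.com/api/health",
    "headers": {"Authorization": "Bearer abc.def", "Accept": "application/json"},
    "log": ["GET 200", "retry with Bearer xyz123"],
}
public = redact(evidence)
print(public["headers"]["Authorization"])  # [REDACTED]
print(public["log"][1])                    # retry with Bearer [REDACTED]
```

Building a new object instead of editing in place keeps the raw evidence available behind the boundary while only the redacted copy is exposed.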