# PlatPhorm Evals

> Registry-driven evaluation mesh for the PlatPhorm ecosystem.

## Product Role
- Evals is the evaluation brain for PlatPhormNews.
- Evals produces evidence-backed scorecards and release decisions.
- It defines, discovers, runs, grades, compares, publishes, and gates evaluations across the network.
- Integrations such as MCP, Claws, Spec, Sandbox, BrowserOps, AgentUI, Trace, Docs, Sheets, Catalog, and Monitor are evaluated by Evals; they are evaluation targets, not part of Evals' product identity.

## Live Registry Counts
- Services tracked: 217
- Active suites: 36
- Capabilities indexed: 948
- Eval runs today: 0
- Count source: merged
- Database persistence: AWS_POSTGRES_* is primary; DATABASE_URL is only a lower-priority migration compatibility fallback.
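
The precedence rule above can be sketched as a small resolver. This is a minimal illustration, not the service's actual configuration code: only the `AWS_POSTGRES_` prefix and `DATABASE_URL` come from this document, and the function name and return labels are hypothetical.

```python
from typing import Mapping, Optional

# Prefix named in this document; the individual variable names are not listed here.
AWS_PREFIX = "AWS_POSTGRES_"


def resolve_database_source(env: Mapping[str, str]) -> Optional[str]:
    """Return which configuration source would win, or None if neither is set."""
    # Any non-empty AWS_POSTGRES_* variable marks the primary source as present.
    has_aws = any(key.startswith(AWS_PREFIX) and value for key, value in env.items())
    if has_aws:
        return "aws_postgres"
    # DATABASE_URL is consulted only when no AWS_POSTGRES_* value exists.
    if env.get("DATABASE_URL"):
        return "database_url"
    return None
```

Passing `os.environ` checks the current process; passing a plain dict makes the rule easy to test.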

## Public API
- Health: GET https://evals.platphormnews.com/api/health
- Versioned health: GET https://evals.platphormnews.com/api/v1/health
- API docs: GET https://evals.platphormnews.com/api/docs
- OpenAPI YAML: GET https://evals.platphormnews.com/openapi.yaml
- OpenAPI JSON: GET https://evals.platphormnews.com/openapi.json
- AsyncAPI YAML: GET https://evals.platphormnews.com/asyncapi.yaml
- Web Status manifest: GET https://evals.platphormnews.com/.well-known/web.json
- Web Status: GET https://evals.platphormnews.com/api/web/status
- Web Status scorecard: GET https://evals.platphormnews.com/api/web/scorecard
- Public Web Status fingerprints: GET https://evals.platphormnews.com/api/web/fingerprints
- Legacy Web Status aliases: GET https://evals.platphormnews.com/.well-known/web4.json and GET /api/web4/status
- Provenance lookup: GET https://evals.platphormnews.com/api/provenance/lookup
- MCP descriptor: GET https://evals.platphormnews.com/api/mcp
- Registry status: GET /api/v1/registry/status
- Targets: GET /api/v1/targets
- Capabilities: GET /api/v1/capabilities
- Suites: GET /api/v1/suites
- Runs: GET /api/v1/runs
- Scorecards: GET /api/v1/scorecards
- Confirmable scorecard artifacts: GET /api/evals/scorecards and GET /api/evals/scorecards/{id}
- Scorecard confirmation: GET /api/evals/scorecards/{id}/confirm
- Release gates: GET /api/v1/release-gates
- Integrations: GET /api/v1/integrations/status
- Agent policy: GET /api/v1/agent-policy
- Canonical status: GET /api/evals/status
- Database status: GET /api/evals/database-status
- Suite registry: GET /api/evals/suite-registry
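
A minimal sketch of reading one of the public GET endpoints above with the Python standard library. It assumes the relative paths are served from the same host as the absolute URLs and that responses are JSON; the response fields are not documented here, so the helper returns whatever is decoded. The helper names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://evals.platphormnews.com"


def endpoint_url(path: str) -> str:
    """Resolve a relative API path from the list above against the public host."""
    return urljoin(BASE_URL, path)


def get_json(path: str, timeout: float = 10.0):
    """GET a public endpoint and decode the body as JSON (assumed content type)."""
    request = urllib.request.Request(
        endpoint_url(path), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `get_json("/api/v1/health")` performs the versioned health check and returns the decoded body.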

## Public Evaluation Endpoints
- Discovery eval: POST /api/v1/evaluate/discovery
- OpenAPI eval: POST /api/v1/evaluate/openapi
- MCP eval: POST /api/v1/evaluate/mcp
- CLI eval: POST /api/v1/evaluate/cli
- Evals dry-run: POST /api/evals/dry-run
- Scorecard artifact execution: POST /api/evals/score returns scorecardUrl, scorecardApiUrl, and confirmationUrl
- Send handoff preview: POST /api/evals/send-handoff with dryRun=true
- AgentUI, Claws, workflow, and regression evals return honest degraded states unless a real provider run is configured.
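
The POST endpoints above could be prepared as follows. This sketch only builds the requests and does not send them. Request bodies are not specified in this document, so two things are assumptions: that `dryRun` travels in a JSON body, and that a field named `target` (hypothetical) identifies the site under evaluation.

```python
import json
import urllib.request

BASE_URL = "https://evals.platphormnews.com"


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST for one of the endpoints above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


# Assumption: dryRun is a JSON body field; the source only says "dryRun=true".
handoff_preview = build_post("/api/evals/send-handoff", {"dryRun": True})

# Assumption: "target" is a hypothetical field name, not a documented parameter.
discovery_eval = build_post(
    "/api/v1/evaluate/discovery", {"target": "https://evals.platphormnews.com"}
)
```

Either request can then be sent with `urllib.request.urlopen(handoff_preview)`; check the OpenAPI document for the real request schema before relying on these field names.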

## Protected Actions
- Registry sync: POST /api/v1/registry/sync
- Registry sync canonical: POST /api/evals/registry/sync
- Suite create/update/delete: POST/PATCH/DELETE /api/v1/suites
- Builder suite creation and optional run: POST /api/v1/builder/evals
- Expensive BrowserOps, Sandbox, LLM judge, release gate, and report export actions require PLATPHORM_API_KEY.
- Accepted auth: Authorization: Bearer $PLATPHORM_API_KEY or X-PlatPhorm-API-Key: $PLATPHORM_API_KEY.
- Dry-run validates suites, targets, expected steps, evidence, and release-gate policy without persisting fake results.
- Protected runs require PLATPHORM_API_KEY, and their results may be persisted only when AWS Postgres writes succeed.
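
The two accepted header forms can be produced by a small helper. The header names come from the list above; the function name and its `use_bearer` flag are hypothetical.

```python
from typing import Dict


def auth_headers(api_key: str, use_bearer: bool = True) -> Dict[str, str]:
    """Return one of the two accepted header forms for protected actions."""
    if not api_key:
        raise ValueError("PLATPHORM_API_KEY is required for protected actions")
    if use_bearer:
        # Form 1: Authorization: Bearer $PLATPHORM_API_KEY
        return {"Authorization": f"Bearer {api_key}"}
    # Form 2: X-PlatPhorm-API-Key: $PLATPHORM_API_KEY
    return {"X-PlatPhorm-API-Key": api_key}
```

A typical call reads the key from the environment, for example `auth_headers(os.environ["PLATPHORM_API_KEY"])`, so the key never appears in source code.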

## AsyncAPI Events
- eval.registry.synced
- eval.run.queued
- eval.run.started
- eval.run.step.started
- eval.run.step.completed
- eval.evidence.created
- eval.scorecard.created
- eval.finding.created
- eval.release_gate.decided
- eval.handoff.created
- eval.handoff.sent
- eval.handoff.completed
- eval.run.completed
- eval.run.failed
- eval.run.degraded
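
One way a consumer might route these events is by splitting each name into a resource and an action. This is a sketch based only on the event names above; payload shapes are defined in the AsyncAPI document and are not assumed here, and the handler registry is hypothetical.

```python
from typing import Callable, Dict, Tuple

# Event names copied from the list above.
KNOWN_EVENTS = frozenset({
    "eval.registry.synced", "eval.run.queued", "eval.run.started",
    "eval.run.step.started", "eval.run.step.completed",
    "eval.evidence.created", "eval.scorecard.created", "eval.finding.created",
    "eval.release_gate.decided", "eval.handoff.created", "eval.handoff.sent",
    "eval.handoff.completed", "eval.run.completed", "eval.run.failed",
    "eval.run.degraded",
})


def split_event(name: str) -> Tuple[str, str]:
    """Split 'eval.<resource...>.<action>' into (resource, action)."""
    parts = name.split(".")
    if len(parts) < 3 or parts[0] != "eval":
        raise ValueError(f"unexpected event name: {name}")
    return ".".join(parts[1:-1]), parts[-1]


def dispatch(name: str, handlers: Dict[str, Callable[[str], str]]) -> str:
    """Route an event to the handler registered for its resource, if any."""
    if name not in KNOWN_EVENTS:
        raise ValueError(f"unknown event: {name}")
    resource, action = split_event(name)
    handler = handlers.get(resource)
    return handler(action) if handler else "ignored"
```

Keying handlers by resource keeps `eval.run.*` and `eval.run.step.*` separate, since their resources resolve to `run` and `run.step`.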

## Suite Integration Matrix
- mcp: Evals scores tool registry, schemas, protected/public boundaries, resources, prompts, gateway policies, count drift, and capability graph.
- agentui: Evals scores schema rendering, forms, workflow state, approvals, artifacts, delegations, and accessibility.
- spec: Evals consumes Spec reports as contract evidence and scores release readiness from those reports.
- sandbox: Evals scores Sandbox execution packets without claiming command execution itself.
- browserops: Evals scores BrowserOps reports without becoming the browser runner.
- trace: Evals scores trace context acceptance, propagation, timeline completeness, and redaction.
- webhooks: Evals scores event delivery and replay evidence from WebhookLab.
- monitor: Evals scores health and alert posture without replacing Monitor.
- docs: Evals scores docs publication and can hand public reports to Docs.
- sheets: Evals scores tabular exports, schemas, and matrix evidence from Sheets.
- claws: Evals scores Claws plan quality, remediation evidence, and toolchain validation.
- json: Evals scores JSON validation service behavior.
- xml: Evals scores XML/RSS/sitemap validation behavior.
- markdown: Evals scores Markdown report validation behavior.
- fingerprint: Evals scores fingerprint privacy class and contract-anchor eligibility.
- platphormctl: Evals scores CLI discovery, dry-run output, and reporting contracts.
- root: Evals scores root graph, public manifest, and canonical discovery alignment.

## Public MCP Tools
- get_eval_info
- get_dashboard
- get_registry_status
- list_targets
- get_target
- list_capabilities
- get_capability
- list_suites
- get_suite
- list_runs
- get_run
- get_run_results
- get_run_evidence
- get_scorecard
- list_templates
- get_template
- list_benchmarks
- get_integration_status
- evaluate_discovery
- evaluate_openapi
- evaluate_mcp
- evaluate_agent_policy
- get_agent_policy
- list_agent_platforms
- get_agent_platform
- get_cli_examples
- get_health
- get_evals_status
- get_evals_database_status
- list_eval_suites
- get_eval_suite
- list_eval_cases
- get_eval_case
- list_public_eval_runs
- get_public_eval_run
- get_eval_scorecard
- list_eval_findings
- get_eval_finding
- list_eval_benchmarks
- get_eval_benchmark
- get_evals_registry
- get_evals_integration_status
- get_evals_web4_manifest
- get_evals_web4_status
- get_evals_scorecard
- list_evals_fingerprints
- lookup_evals_provenance
- verify_evals_provenance
- dry_run_eval_suite
- get_route_compliance
- get_discovery_compliance
- compare_eval_runs
- detect_regressions
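
Public tools are invoked over MCP's JSON-RPC 2.0 framing, where `tools/list` enumerates tools and `tools/call` runs one. The sketch below only builds the messages. It assumes `get_health` takes no arguments, which this document does not confirm, and it does not assume which URL accepts the POST.

```python
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # simple monotonically increasing request ids


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def call_tool(name: str, arguments: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build an MCP tools/call request for a named tool."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments or {}})


list_tools = jsonrpc_request("tools/list")
# Assumption: get_health needs no arguments.
health_call = call_tool("get_health")
```

Serialize either message with `json.dumps` and send it to the MCP endpoint advertised in `/.well-known/mcp.json`.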

## Protected MCP Tools
- sync_registry
- sync_eval_registry
- sync_network_registry
- import_targets_from_mcp
- import_targets_from_spec
- create_eval_suite
- update_eval_suite
- create_eval_case
- update_eval_case
- create_suite
- update_suite
- delete_suite
- generate_eval_cases
- create_dataset
- create_grader
- run_eval_suite
- publish_eval_scorecard
- approve_release_gate
- reject_release_gate
- resolve_eval_finding
- send_findings_to_claws
- publish_eval_report_to_docs
- export_eval_results_to_sheets
- send_eval_trace_update
- send_eval_handoff
- rebuild_evals_fingerprints
- create_evals_provenance
- sign_evals_provenance
- run_model_grade
- run_suite
- rerun_eval
- cancel_eval
- evaluate_browserops
- evaluate_sandbox
- evaluate_claws
- evaluate_agentui
- evaluate_workflow
- evaluate_llm_judge
- evaluate_cli
- run_release_gate
- generate_scorecard
- gate_release
- publish_scorecard
- create_docs_report
- create_sheet_report
- create_deck_summary
- update_agent_policy

## Templates
- required-route-compliance: Evaluate all required public discovery and platform routes.
- sitemap-no-dead-link-compliance: Validate sitemap URLs and ensure no protected mutation route is listed.
- rss-feed-validity: Validate RSS and Atom public-safe feed output.
- llms-readability: Check llms.txt, llms-full.txt, and llms-index.json readability and counts.
- openapi-validity: Validate OpenAPI schema, auth, examples, and public/protected route clarity.
- mcp-initialize-list-call: Validate public MCP JSON-RPC methods and protected rejection.
- protected-action-rejection: Confirm protected operations require PLATPHORM_API_KEY.
- agentui-form-render: Render MCP tool schemas and inspect required fields.
- browserops-page-load: Run page-load and no-stuck-loading checks through BrowserOps.
- sandbox-command-execution: Run a bounded command and validate real command output.
- claws-tool-orchestration: Validate public Claws tool registry and orchestration metadata.
- docs-publish: Publish a public-safe evaluation methodology or report to Docs.
- sheets-export: Export run matrices and scorecards to Sheets.
- opencontent-ingest: Validate OpenContent public submit and docs export workflow.
- dictionary-search-submit: Check dictionary search and protected submission boundary.
- emoji-score-proposal: Check emoji scoring and protected proposal flow.
- reader-translation: Check public reader translation degraded states and protected actions.
- echo-content-monitor: Check content monitor public status and alert delivery boundaries.
- calendar-kanban-task-flow: Evaluate task handoff across calendar and kanban surfaces.
- platphorm-cli-harness-run: Validate CLI-generated evidence against Evals scorecard requirements.
- x-vercel-ja4-digest-redaction: Ensure fingerprint-adjacent request metadata is hashed or redacted before public display.
- browserops-link-check: Traverse public links through BrowserOps when a real provider run is configured.
- browserops-run-api-protected-post-check: Confirm BrowserOps run triggers reject unauthenticated protected POSTs.
- api-catalog-contract-check: Validate API Catalog OpenAPI entries, auth naming, and response envelope shape.
- mcp-gateway-truth-model-check: Compare MCP Hub registry state with public JSON-RPC introspection.
- agentui-phorm-to-workflow-check: Validate AgentUI Phorm rendering, submission state, and workflow handoff metadata.
- msi-static-check: Validate MSI static checks and JSON/XML export boundaries when that target is present.
- desa-script-health: Validate DESA script health, bounded execution notes, and remediation-card handoff.
- json-xml-markdown-report: Validate generated artifacts across JSON, XML, and Markdown validators.
- podcasts-feed-ingest: Validate podcast feed ingestion, Reader summary handoff, and Docs export boundaries.
- reader-summary-docs-export: Validate Reader summaries and Docs export as public-safe evidence.
- claws-remediation-card: Validate Claws orchestration metadata and Kanban remediation-card creation boundary.
- release-gate-decision: Build a release decision from required suites, blockers, evidence links, and trace links.

## Benchmarks
- platform-contract-baseline: Required public route, discovery, auth, and policy checks for every trusted site.
- mcp-tool-quality: JSON-RPC correctness, schema validity, public-safe introspection, and protected action rejection.
- release-control: Scorecard and gate criteria for release promotion decisions.
- browserops-journey-evidence: Page-load, link, form, screenshot, and accessibility evidence when BrowserOps provider runs are available.
- tool-workflow-chain: Cross-service workflow evidence from MCP through Spec, Sandbox, Evals, Docs, Sheets, or Decks.

## platphormctl Examples
- platphormctl site inspect evals: Inspect Evals public route, policy, discovery, and health surfaces.
- platphormctl mcp validate evals: Validate Evals MCP JSON-RPC introspection and tool schema metadata.
- platphormctl policy inspect evals: Inspect Evals agent, AI, trust, security, and robots policies.
- platphormctl evals list: List public-safe Evals suites, templates, gates, and recent run summaries.
- platphormctl evals run-site mcp: Run a site-level evaluation plan for the MCP Hub target.
- platphormctl evals run-mcp mcp: Run public-safe MCP introspection checks for MCP Hub.
- platphormctl evals grade-tool mcp get_health: Grade the MCP get_health tool using deterministic output checks where possible.
- platphormctl harness run discovery-full --trace: Run the full discovery harness with trace propagation.
- platphormctl harness run developer-validation --target https://evals.platphormnews.com --dry-run: Preview developer validation without protected execution.
- platphormctl harness run spec-evals-browserops-loop --dry-run: Preview the Spec to Evals to BrowserOps loop without claiming provider evidence.
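
The commands above can be driven from a script. This sketch assumes nothing about `platphormctl` output: it splits a command string from the list into arguments and runs it only if the binary is on `PATH`. The helper names are hypothetical.

```python
import shlex
import shutil
import subprocess
from typing import List, Optional

# Command copied verbatim from the examples above.
DRY_RUN_COMMAND = (
    "platphormctl harness run developer-validation "
    "--target https://evals.platphormnews.com --dry-run"
)


def to_argv(command: str) -> List[str]:
    """Split a shell-style command string into an argument list."""
    return shlex.split(command)


def run_if_installed(command: str) -> Optional[subprocess.CompletedProcess]:
    """Run the command when its executable is on PATH; otherwise return None."""
    argv = to_argv(command)
    if shutil.which(argv[0]) is None:
        return None
    # No shell is involved, so arguments are passed exactly as split.
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

Calling `run_if_installed(DRY_RUN_COMMAND)` returns the completed process, whose `returncode` and `stdout` can then be inspected.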

## Public Pages In Sitemap
- /
- /dashboard
- /launch
- /registry
- /targets
- /capabilities
- /suites
- /runs
- /evidence
- /scorecards
- /release-gates
- /findings
- /reports
- /benchmarks
- /templates
- /builder/evals/new
- /integrations
- /methodology
- /clients/cli
- /clients/platphormctl
- /docs
- /faq
- /privacy
- /terms
- /security
- /data
- /agents
- /trust
- /web4
- /web/status
- /web4/status
- /web4/evidence
- /web4/fingerprints
- /web4/provenance
- /api/docs
- /openapi.yaml
- /openapi.json
- /asyncapi.yaml
- /llms.txt
- /llms-full.txt
- /llms-index.json
- /robots.txt
- /.well-known/agents.json
- /.well-known/agent-policy.json
- /.well-known/ai-policy.json
- /.well-known/trust.json
- /.well-known/web.json
- /.well-known/web4.json
- /.well-known/provenance.json
- /.well-known/security.txt
- /.well-known/mcp.json

## Evidence Rules
- Deterministic checks take precedence over model-assisted grades.
- Model-assisted grades must be labeled with prompt, version, and model when configured.
- BrowserOps screenshots, Sandbox output, Claws workflow results, and release approvals are never fabricated.
- Raw x-vercel-ja4-digest is fingerprint-adjacent metadata and is redacted or hashed before public display.
- The following are never eligible for public provenance or contract anchors: visitor, browser, device, behavioral, and session data; raw request headers, raw JA4, and raw x-vercel-ja4-digest values; private logs and private artifacts; protected run payloads; provider tokens; and private model-grade prompts.
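
Two of the rules above lend themselves to a short sketch: deterministic results taking precedence over model-assisted grades, and hashing a raw `x-vercel-ja4-digest` value before display. Plain SHA-256 and the `sha256:` label are assumptions made for illustration; this document does not state which hashing scheme Evals uses, and a real deployment might prefer a keyed hash.

```python
import hashlib
from typing import Optional


def final_grade(deterministic: Optional[bool], model_assisted: Optional[bool]) -> Optional[bool]:
    """Prefer the deterministic result; use the model grade only when it is absent."""
    return deterministic if deterministic is not None else model_assisted


def hash_for_display(raw_value: str) -> str:
    """Replace a raw fingerprint-adjacent value with a SHA-256 digest."""
    digest = hashlib.sha256(raw_value.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

With this shape, a failing deterministic check cannot be overridden by a passing model-assisted grade, and the raw digest value never reaches a public page.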

## Network Integrations
- platphormnews.com/api/network/graph
- mcp.platphormnews.com
- claws.platphormnews.com
- spec.platphormnews.com
- trace.platphormnews.com
- docs.platphormnews.com
- browserops.platphormnews.com
- agentui.platphormnews.com
- sandbox.platphormnews.com
- webhooklab.platphormnews.com

## Standards
- Manifest: https://evals.platphormnews.com/manifest.webmanifest
- MCP well-known: https://evals.platphormnews.com/.well-known/mcp.json
- Agent well-known: https://evals.platphormnews.com/.well-known/agents.json
- Agent policy: https://evals.platphormnews.com/.well-known/agent-policy.json
- AI policy: https://evals.platphormnews.com/.well-known/ai-policy.json
- Security: https://evals.platphormnews.com/.well-known/security.txt