# PlatPhorm Evals

> Registry-driven evaluation mesh for the PlatPhorm ecosystem.

## Product Role

- Evals is the evaluation brain for PlatPhormNews.
- Evals produces evidence-backed scorecards and release decisions.
- It defines, discovers, runs, grades, compares, publishes, and gates evaluations across the network.
- Integrations such as MCP, Claws, Spec, Sandbox, BrowserOps, AgentUI, Trace, Docs, Sheets, Catalog, and Monitor are evaluated by Evals; they are not Evals' product identity.

## Live Registry Counts

- Services tracked: 217
- Active suites: 36
- Capabilities indexed: 948
- Eval runs today: 0
- Count source: merged
- Database persistence: AWS_POSTGRES_* is primary; DATABASE_URL is only a lower-priority migration compatibility fallback.

## Public API

- Health: GET https://evals.platphormnews.com/api/health
- Versioned health: GET https://evals.platphormnews.com/api/v1/health
- API docs: GET https://evals.platphormnews.com/api/docs
- OpenAPI YAML: GET https://evals.platphormnews.com/openapi.yaml
- OpenAPI JSON: GET https://evals.platphormnews.com/openapi.json
- AsyncAPI YAML: GET https://evals.platphormnews.com/asyncapi.yaml
- Web Status manifest: GET https://evals.platphormnews.com/.well-known/web.json
- Web Status: GET https://evals.platphormnews.com/api/web/status
- Web Status scorecard: GET https://evals.platphormnews.com/api/web/scorecard
- Public Web Status fingerprints: GET https://evals.platphormnews.com/api/web/fingerprints
- Legacy Web Status aliases: GET https://evals.platphormnews.com/.well-known/web4.json and /api/web4/status
- Provenance lookup: GET https://evals.platphormnews.com/api/provenance/lookup
- MCP descriptor: GET https://evals.platphormnews.com/api/mcp
- Registry status: GET /api/v1/registry/status
- Targets: GET /api/v1/targets
- Capabilities: GET /api/v1/capabilities
- Suites: GET /api/v1/suites
- Runs: GET /api/v1/runs
- Scorecards: GET /api/v1/scorecards
- Confirmable scorecard artifacts: GET /api/evals/scorecards and GET /api/evals/scorecards/{id}
- Scorecard confirmation: GET /api/evals/scorecards/{id}/confirm
- Release gates: GET /api/v1/release-gates
- Integrations: GET /api/v1/integrations/status
- Agent policy: GET /api/v1/agent-policy
- Canonical status: GET /api/evals/status
- Database status: GET /api/evals/database-status
- Suite registry: GET /api/evals/suite-registry

## Public Evaluation Endpoints

- Discovery eval: POST /api/v1/evaluate/discovery
- OpenAPI eval: POST /api/v1/evaluate/openapi
- MCP eval: POST /api/v1/evaluate/mcp
- CLI eval: POST /api/v1/evaluate/cli
- Evals dry-run: POST /api/evals/dry-run
- Scorecard artifact execution: POST /api/evals/score returns scorecardUrl, scorecardApiUrl, and confirmationUrl
- Send handoff preview: POST /api/evals/send-handoff with dryRun=true
- AgentUI, Claws, workflow, and regression evals return honest degraded states unless a real provider run is configured.

## Protected Actions

- Registry sync: POST /api/v1/registry/sync
- Registry sync canonical: POST /api/evals/registry/sync
- Suite create/update/delete: POST/PATCH/DELETE /api/v1/suites
- Builder suite creation and optional run: POST /api/v1/builder/evals
- Expensive BrowserOps, Sandbox, LLM judge, release gate, and report export actions require PLATPHORM_API_KEY.
- Accepted auth: Authorization: Bearer $PLATPHORM_API_KEY or X-PlatPhorm-API-Key: $PLATPHORM_API_KEY.
- Dry-run validates suites, targets, expected steps, evidence, and release-gate policy without persisting fake results.
- Protected runs require PLATPHORM_API_KEY and may persist only when AWS Postgres writes succeed.

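
The two accepted auth header forms listed under Protected Actions can be sketched as follows. This is a minimal illustration, not the official client: the `build_request` helper is hypothetical, the empty JSON body is a placeholder (the real request schemas live in the OpenAPI document), and resolving the relative `/api/...` paths against the `evals.platphormnews.com` host is an assumption. The requests are built but never sent.

```python
import json
import os
import urllib.request

# Host taken from the Public API list; resolving the relative /api/... paths
# against it is an assumption of this sketch.
BASE_URL = "https://evals.platphormnews.com"


def build_request(method, path, api_key=None, body=None, custom_header=False):
    """Build (without sending) a request for an Evals route (hypothetical helper)."""
    headers = {"Accept": "application/json"}
    if api_key:
        if custom_header:
            # Second accepted form: X-PlatPhorm-API-Key: $PLATPHORM_API_KEY
            headers["X-PlatPhorm-API-Key"] = api_key
        else:
            # First accepted form: Authorization: Bearer $PLATPHORM_API_KEY
            headers["Authorization"] = f"Bearer {api_key}"
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


api_key = os.environ.get("PLATPHORM_API_KEY")

# Public route: no key needed.
health = build_request("GET", "/api/health")
# Protected route: placeholder empty body, key read from the environment.
sync = build_request("POST", "/api/v1/registry/sync", api_key=api_key, body={})

print(health.get_method(), health.full_url)
print(sync.get_method(), sync.full_url, "authenticated:", sync.has_header("Authorization"))
```

Passing the prepared object to `urllib.request.urlopen` would send it; keeping construction separate from sending makes it easy to inspect headers before touching a protected route.
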
## AsyncAPI Events

- eval.registry.synced
- eval.run.queued
- eval.run.started
- eval.run.step.started
- eval.run.step.completed
- eval.evidence.created
- eval.scorecard.created
- eval.finding.created
- eval.release_gate.decided
- eval.handoff.created
- eval.handoff.sent
- eval.handoff.completed
- eval.run.completed
- eval.run.failed
- eval.run.degraded

## Suite Integration Matrix

- mcp: Evals scores tool registry, schemas, protected/public boundaries, resources, prompts, gateway policies, count drift, and capability graph.
- agentui: Evals scores schema rendering, forms, workflow state, approvals, artifacts, delegations, and accessibility.
- spec: Evals consumes Spec reports as contract evidence and scores release readiness from those reports.
- sandbox: Evals scores Sandbox execution packets without claiming command execution itself.
- browserops: Evals scores BrowserOps reports without becoming the browser runner.
- trace: Evals scores trace context acceptance, propagation, timeline completeness, and redaction.
- webhooks: Evals scores event delivery and replay evidence from WebhookLab.
- monitor: Evals scores health and alert posture without replacing Monitor.
- docs: Evals scores docs publication and can hand public reports to Docs.
- sheets: Evals scores tabular exports, schemas, and matrix evidence from Sheets.
- claws: Evals scores Claws plan quality, remediation evidence, and toolchain validation.
- json: Evals scores JSON validation service behavior.
- xml: Evals scores XML/RSS/sitemap validation behavior.
- markdown: Evals scores Markdown report validation behavior.
- fingerprint: Evals scores fingerprint privacy class and contract-anchor eligibility.
- platphormctl: Evals scores CLI discovery, dry-run output, and reporting contracts.
- root: Evals scores root graph, public manifest, and canonical discovery alignment.

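
A consumer of the AsyncAPI event names listed above might route them by family. The sketch below is illustrative only: the `classify_event` helper is hypothetical, and treating `eval.run.completed`, `eval.run.failed`, and `eval.run.degraded` as run-ending events is an assumption drawn from their names, not from the AsyncAPI document. Payload shapes are not modeled.

```python
# Event names copied from the AsyncAPI Events list.
EVENT_NAMES = (
    "eval.registry.synced",
    "eval.run.queued",
    "eval.run.started",
    "eval.run.step.started",
    "eval.run.step.completed",
    "eval.evidence.created",
    "eval.scorecard.created",
    "eval.finding.created",
    "eval.release_gate.decided",
    "eval.handoff.created",
    "eval.handoff.sent",
    "eval.handoff.completed",
    "eval.run.completed",
    "eval.run.failed",
    "eval.run.degraded",
)

# Assumption: these three names mark the end of a run.
TERMINAL_RUN_EVENTS = {"eval.run.completed", "eval.run.failed", "eval.run.degraded"}


def classify_event(name):
    """Return a coarse routing key for an event name (hypothetical helper)."""
    if name not in EVENT_NAMES:
        return "unknown"
    if name in TERMINAL_RUN_EVENTS:
        return "run-terminal"
    # Second dotted segment is the family: eval.<family>.<...>
    return name.split(".")[1]


for event in ("eval.run.step.started", "eval.run.degraded", "eval.handoff.sent"):
    print(event, "->", classify_event(event))
```
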
## Public MCP Tools

- get_eval_info
- get_dashboard
- get_registry_status
- list_targets
- get_target
- list_capabilities
- get_capability
- list_suites
- get_suite
- list_runs
- get_run
- get_run_results
- get_run_evidence
- get_scorecard
- list_templates
- get_template
- list_benchmarks
- get_integration_status
- evaluate_discovery
- evaluate_openapi
- evaluate_mcp
- evaluate_agent_policy
- get_agent_policy
- list_agent_platforms
- get_agent_platform
- get_cli_examples
- get_health
- get_evals_status
- get_evals_database_status
- list_eval_suites
- get_eval_suite
- list_eval_cases
- get_eval_case
- list_public_eval_runs
- get_public_eval_run
- get_eval_scorecard
- list_eval_findings
- get_eval_finding
- list_eval_benchmarks
- get_eval_benchmark
- get_evals_registry
- get_evals_integration_status
- get_evals_web4_manifest
- get_evals_web4_status
- get_evals_scorecard
- list_evals_fingerprints
- lookup_evals_provenance
- verify_evals_provenance
- dry_run_eval_suite
- get_route_compliance
- get_discovery_compliance
- compare_eval_runs
- detect_regressions

## Protected MCP Tools

- sync_registry
- sync_eval_registry
- sync_network_registry
- import_targets_from_mcp
- import_targets_from_spec
- create_eval_suite
- update_eval_suite
- create_eval_case
- update_eval_case
- create_suite
- update_suite
- delete_suite
- generate_eval_cases
- create_dataset
- create_grader
- run_eval_suite
- publish_eval_scorecard
- approve_release_gate
- reject_release_gate
- resolve_eval_finding
- send_findings_to_claws
- publish_eval_report_to_docs
- export_eval_results_to_sheets
- send_eval_trace_update
- send_eval_handoff
- rebuild_evals_fingerprints
- create_evals_provenance
- sign_evals_provenance
- run_model_grade
- run_suite
- rerun_eval
- cancel_eval
- evaluate_browserops
- evaluate_sandbox
- evaluate_claws
- evaluate_agentui
- evaluate_workflow
- evaluate_llm_judge
- evaluate_cli
- run_release_gate
- generate_scorecard
- gate_release
- publish_scorecard
- create_docs_report
- create_sheet_report
- create_deck_summary
- update_agent_policy

## Templates

- required-route-compliance: Evaluate all required public discovery and platform routes.
- sitemap-no-dead-link-compliance: Validate sitemap URLs and ensure no protected mutation route is listed.
- rss-feed-validity: Validate RSS and Atom public-safe feed output.
- llms-readability: Check llms.txt, llms-full.txt, and llms-index.json readability and counts.
- openapi-validity: Validate OpenAPI schema, auth, examples, and public/protected route clarity.
- mcp-initialize-list-call: Validate public MCP JSON-RPC methods and protected rejection.
- protected-action-rejection: Confirm protected operations require PLATPHORM_API_KEY.
- agentui-form-render: Render MCP tool schemas and inspect required fields.
- browserops-page-load: Run page-load and no-stuck-loading checks through BrowserOps.
- sandbox-command-execution: Run a bounded command and validate real command output.
- claws-tool-orchestration: Validate public Claws tool registry and orchestration metadata.
- docs-publish: Publish a public-safe evaluation methodology or report to Docs.
- sheets-export: Export run matrices and scorecards to Sheets.
- opencontent-ingest: Validate OpenContent public submit and docs export workflow.
- dictionary-search-submit: Check dictionary search and protected submission boundary.
- emoji-score-proposal: Check emoji scoring and protected proposal flow.
- reader-translation: Check public reader translation degraded states and protected actions.
- echo-content-monitor: Check content monitor public status and alert delivery boundaries.
- calendar-kanban-task-flow: Evaluate task handoff across calendar and kanban surfaces.
- platphorm-cli-harness-run: Validate CLI-generated evidence against Evals scorecard requirements.
- x-vercel-ja4-digest-redaction: Ensure fingerprint-adjacent request metadata is hashed or redacted before public display.
- browserops-link-check: Traverse public links through BrowserOps when a real provider run is configured.
- browserops-run-api-protected-post-check: Confirm BrowserOps run triggers reject unauthenticated protected POSTs.
- api-catalog-contract-check: Validate API Catalog OpenAPI entries, auth naming, and response envelope shape.
- mcp-gateway-truth-model-check: Compare MCP Hub registry state with public JSON-RPC introspection.
- agentui-phorm-to-workflow-check: Validate AgentUI Phorm rendering, submission state, and workflow handoff metadata.
- msi-static-check: Validate MSI static checks and JSON/XML export boundaries when that target is present.
- desa-script-health: Validate DESA script health, bounded execution notes, and remediation-card handoff.
- json-xml-markdown-report: Validate generated artifacts across JSON, XML, and Markdown validators.
- podcasts-feed-ingest: Validate podcast feed ingestion, Reader summary handoff, and Docs export boundaries.
- reader-summary-docs-export: Validate Reader summaries and Docs export as public-safe evidence.
- claws-remediation-card: Validate Claws orchestration metadata and Kanban remediation-card creation boundary.
- release-gate-decision: Build a release decision from required suites, blockers, evidence links, and trace links.

## Benchmarks

- platform-contract-baseline: Required public route, discovery, auth, and policy checks for every trusted site.
- mcp-tool-quality: JSON-RPC correctness, schema validity, public-safe introspection, and protected action rejection.
- release-control: Scorecard and gate criteria for release promotion decisions.
- browserops-journey-evidence: Page-load, link, form, screenshot, and accessibility evidence when BrowserOps provider runs are available.
- tool-workflow-chain: Cross-service workflow evidence from MCP through Spec, Sandbox, Evals, Docs, Sheets, or Decks.

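
One part of the sitemap-no-dead-link-compliance template, "ensure no protected mutation route is listed", can be approximated with a simple set comparison. The sketch below assumes the sitemap has already been parsed into a list of paths; the `find_protected_in_sitemap` helper is hypothetical, and the protected set contains only the POST-only routes named under Protected Actions. It does not check for dead links.

```python
# POST-only protected routes named under Protected Actions.
PROTECTED_MUTATION_ROUTES = {
    "/api/v1/registry/sync",
    "/api/evals/registry/sync",
    "/api/v1/builder/evals",
}


def normalize(path):
    """Drop trailing slashes so '/x/' and '/x' compare equal; keep '/' as is."""
    return path.rstrip("/") or "/"


def find_protected_in_sitemap(sitemap_paths):
    """Return the protected mutation routes that appear in the sitemap (hypothetical helper)."""
    listed = {normalize(p) for p in sitemap_paths}
    return sorted(listed & PROTECTED_MUTATION_ROUTES)


# A subset of the public pages this document lists for the sitemap.
sample = ["/", "/dashboard", "/suites", "/builder/evals/new", "/api/docs", "/openapi.yaml"]
print(find_protected_in_sitemap(sample))                                # clean sitemap
print(find_protected_in_sitemap(sample + ["/api/v1/registry/sync/"]))   # one violation
```

Note that `/builder/evals/new` is a public page and is distinct from the protected `POST /api/v1/builder/evals` route, so exact path matching is used rather than substring matching.
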
## platphormctl Examples

- platphormctl site inspect evals: Inspect Evals public route, policy, discovery, and health surfaces.
- platphormctl mcp validate evals: Validate Evals MCP JSON-RPC introspection and tool schema metadata.
- platphormctl policy inspect evals: Inspect Evals agent, AI, trust, security, and robots policies.
- platphormctl evals list: List public-safe Evals suites, templates, gates, and recent run summaries.
- platphormctl evals run-site mcp: Run a site-level evaluation plan for the MCP Hub target.
- platphormctl evals run-mcp mcp: Run public-safe MCP introspection checks for MCP Hub.
- platphormctl evals grade-tool mcp get_health: Grade the MCP get_health tool using deterministic output checks where possible.
- platphormctl harness run discovery-full --trace: Run the full discovery harness with trace propagation.
- platphormctl harness run developer-validation --target https://evals.platphormnews.com --dry-run: Preview developer validation without protected execution. (dry-run)
- platphormctl harness run spec-evals-browserops-loop --dry-run: Preview the Spec to Evals to BrowserOps loop without claiming provider evidence. (dry-run)

## Public Pages In Sitemap

- /
- /dashboard
- /launch
- /registry
- /targets
- /capabilities
- /suites
- /runs
- /evidence
- /scorecards
- /release-gates
- /findings
- /reports
- /benchmarks
- /templates
- /builder/evals/new
- /integrations
- /methodology
- /clients/cli
- /clients/platphormctl
- /docs
- /faq
- /privacy
- /terms
- /security
- /data
- /agents
- /trust
- /web4
- /web/status
- /web4/status
- /web4/evidence
- /web4/fingerprints
- /web4/provenance
- /api/docs
- /openapi.yaml
- /openapi.json
- /asyncapi.yaml
- /llms.txt
- /llms-full.txt
- /llms-index.json
- /robots.txt
- /.well-known/agents.json
- /.well-known/agent-policy.json
- /.well-known/ai-policy.json
- /.well-known/trust.json
- /.well-known/web.json
- /.well-known/web4.json
- /.well-known/provenance.json
- /.well-known/security.txt
- /.well-known/mcp.json

## Evidence Rules

- Deterministic checks take precedence over model-assisted grades.
- Model-assisted grades must be labeled with prompt, version, and model when configured.
- BrowserOps screenshots, Sandbox output, Claws workflow results, and release approvals are never fabricated.
- Raw x-vercel-ja4-digest is fingerprint-adjacent metadata and is redacted or hashed before public display.
- Visitor, browser, device, behavioral, raw request-header, raw JA4, raw x-vercel-ja4-digest, session, private logs, private artifacts, protected run payloads, provider tokens, and private model-grade prompts are never public provenance or contract-anchor eligible.

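
The rule that raw x-vercel-ja4-digest values are "redacted or hashed before public display" could look like the sketch below. The choice of SHA-256, the optional salt, the `sha256:` prefix, and the `redact_ja4_digest` name are all assumptions made for illustration; the document does not say which hashing scheme Evals uses.

```python
import hashlib


def redact_ja4_digest(raw_value, salt=""):
    """Replace a raw digest header value with a one-way hash (hypothetical helper).

    Assumption: SHA-256 over salt + value. A non-empty secret salt makes it
    harder to reverse the hash by looking up known raw values.
    """
    digest = hashlib.sha256((salt + raw_value).encode("utf-8")).hexdigest()
    return "sha256:" + digest


# Made-up sample value; never log or publish the raw form.
raw = "example-raw-digest-value"
print(redact_ja4_digest(raw, salt="per-deployment-secret"))
```
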
## Network Integrations

- platphormnews.com/api/network/graph
- mcp.platphormnews.com
- claws.platphormnews.com
- spec.platphormnews.com
- trace.platphormnews.com
- docs.platphormnews.com
- browserops.platphormnews.com
- agentui.platphormnews.com
- sandbox.platphormnews.com
- webhooklab.platphormnews.com

## Standards

- Manifest: https://evals.platphormnews.com/manifest.webmanifest
- MCP well-known: https://evals.platphormnews.com/.well-known/mcp.json
- Agent well-known: https://evals.platphormnews.com/.well-known/agents.json
- Agent policy: https://evals.platphormnews.com/.well-known/agent-policy.json
- AI policy: https://evals.platphormnews.com/.well-known/ai-policy.json
- Security: https://evals.platphormnews.com/.well-known/security.txt