Enterprise vendor methodology — v1.1

How enterprise vendors are scored.

Enterprise AI platforms cannot be scored on the indie agent rubric: we have no telemetry access, and the platforms are closed-source. Instead, we score them via a Public-Evidence Proxy methodology, a structured assessment of signals that are publicly verifiable.

Comparability disclaimer

Incumbent scores are NOT directly comparable to indie agent scores.

Indie agent scores are derived from live production API telemetry or hands-on functional evaluation. Incumbent scores are derived from public-evidence proxies, i.e. information the vendor has chosen to publish. A vendor that scores 72 under this addendum and an indie agent that scores 72 under the standard methodology share that number in name only; the evidence bases are structurally different.

The four proxy pillars

Each pillar maps to one of the four dimensions used in the standard methodology but substitutes public-evidence signals for live telemetry. Sub-components are scored mechanically against the rubrics below; the full scorer code lives at lib/incumbent/canonical-scorer.ts.
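The scorer itself isn't reproduced on this page, but a minimal sketch of the data shape it evaluates might look like the following. The type and field names are illustrative assumptions, not the actual exports of canonical-scorer.ts:

```ts
// Illustrative only: the real types live in lib/incumbent/canonical-scorer.ts
// and may differ. This sketch just mirrors the rubric structure described below.

type SubComponent = {
  id: string;           // e.g. "sla-999"
  points: number;       // points awarded when the evidence is verified
  excludes?: string[];  // ids of mutually exclusive sub-components
  evidenceUrl?: string; // public URL the score was derived from
};

type Pillar = {
  name: "reliability" | "latency" | "cost" | "security";
  cap: number;          // 80 / 65 / 80 / 100 per the rubrics below
  subComponents: SubComponent[];
};
```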

Reliability

cap 80 pts

What the proxy measures: published commitments and public incident infrastructure. Not the same as live-telemetry reliability — a 99.9% SLA is a contractual obligation, not a verified record. The 80-point cap reflects this gap.

| Evidence | Notes | Points |
| --- | --- | --- |
| Published SLA ≥ 99.9% uptime | Documented on the vendor website. Mutually exclusive with lower SLA tiers. | 30 |
| Published SLA 99.5%–99.9% | | 20 |
| Published SLA (any uptime, any tier) | | 10 |
| Public status page with ≥ 90 days historical data | Mutually exclusive with the no-history variant below. | 25 |
| Public status page (current state only, no history) | | 15 |
| Documented incident history (≤ 2 P1 in trailing 12 mo) | Mutually exclusive with the higher-incident variant below. | 20 |
| Documented incident history (3–5 P1 in trailing 12 mo) | | 10 |
| Public changelog updated ≥ quarterly | | 10 |
| Published reliability case study | | 5 |
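Mutual exclusivity means only the higher-valued variant in an exclusive group can count toward the raw score. Here is a hedged sketch of how a pillar raw could be computed, reusing the illustrative SubComponent type from the sketch above (this is not the actual canonical-scorer.ts logic):

```ts
// Sketch: count each verified sub-component once, drop the lower-valued
// member of any mutually exclusive pair, then apply the pillar cap.
function pillarRaw(verified: SubComponent[], cap: number): number {
  // Sort descending so the higher-valued variant wins exclusivity conflicts.
  const sorted = [...verified].sort((a, b) => b.points - a.points);
  const kept: SubComponent[] = [];
  for (const sc of sorted) {
    const conflicts = kept.some(
      (k) => k.excludes?.includes(sc.id) || sc.excludes?.includes(k.id)
    );
    if (!conflicts) kept.push(sc);
  }
  const sum = kept.reduce((acc, sc) => acc + sc.points, 0);
  return Math.min(sum, cap); // Reliability caps at 80
}
```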

Latency

cap 65 pts

What the proxy measures: published performance commitments and independent benchmarks. The 65-point cap reflects that a latency figure without production measurement is, at best, a marketing claim. Mandatory floor: if no public latency evidence exists at all, the pillar scores 0 and is rendered as Insufficient Evidence (sketched in code after the table below).

| Evidence | Points |
| --- | --- |
| Vendor publishes response-time benchmarks or SLA | 30 |
| Independent third-party benchmark (analyst, conference, academic) | 20 |
| Streaming output documented in dev docs | 15 |
| Edge deployment / CDN / multi-region documented | 15 |
| Response-time SLA in MSA / contract appendix | 20 |
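The mandatory floor is the one pillar-specific rule worth spelling out: zero evidence is not merely a zero score, it flips the pillar into the Insufficient Evidence state (see the treatment section below). A sketch, reusing pillarRaw and SubComponent from the reliability section, with illustrative names:

```ts
// Sketch: distinguish "scored, possibly 0" from "no public evidence at all".
type PillarResult =
  | { kind: "scored"; raw: number }
  | { kind: "insufficient-evidence" };

function latencyResult(verified: SubComponent[]): PillarResult {
  if (verified.length === 0) return { kind: "insufficient-evidence" };
  return { kind: "scored", raw: pillarRaw(verified, 65) }; // Latency caps at 65
}
```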

Cost efficiency

cap 80 pts

What the proxy measures: pricing transparency and ability to estimate unit economics from public data. The 80-point cap reflects that public pricing pages don't always reveal volume discounts or enterprise-tier custom contracts.

| Evidence | Notes | Points |
| --- | --- | --- |
| Pricing publicly listed on vendor website | Any tier visible without sales contact. | 30 |
| Per-action / per-resolution / per-credit pricing | Unit-based pricing model. Mutually exclusive with per-seat-only pricing below. | 20 |
| Pricing detail sufficient to estimate 100-resolution cost | | 25 |
| Free tier or self-serve trial (no sales contact required) | | 10 |
| ROI calculator or third-party cost case study | | 10 |
| Per-seat-only pricing (published but no unit economics) | Mutually exclusive with per-action pricing above. | 5 |
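For concreteness, "sufficient to estimate 100-resolution cost" means the public page exposes enough unit pricing for arithmetic like the following. The price is hypothetical, not any vendor's actual rate:

```ts
// Hypothetical published unit price, for illustration only.
const pricePerResolutionUsd = 1.5;
const resolutions = 100;

// If this arithmetic is possible from public data alone, the 25-point
// sub-component is satisfied.
const estimatedCostUsd = pricePerResolutionUsd * resolutions; // 150
console.log(`Estimated 100-resolution cost: $${estimatedCostUsd}`);
```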

Security posture

cap 100 pts

What the proxy measures: published compliance certifications and vulnerability disclosure practices. There is no haircut below the sum of sub-components: compliance certifications are externally audited, not self-reported, so the cap equals the full 100 points. SOC 2 + ISO 27001 + GDPR + HIPAA + trust center + bug bounty + standalone pentest = 100 pts, honestly earned.

| Evidence | Notes | Points |
| --- | --- | --- |
| SOC 2 Type II current (within last 12 months) | | 30 |
| ISO 27001 current (within last 24 months) | | 20 |
| GDPR via published Data Processing Agreement | | 10 |
| HIPAA via available Business Associate Agreement | | 10 |
| Public trust center / security white paper | | 10 |
| Bug bounty / VDP (HackerOne, Bugcrowd, self-hosted) | CNA (CVE Numbering Authority) status alone does not qualify. | 10 |
| Standalone third-party pentest attestation (named auditor) | Strict reading: a pentest embedded in SOC 2 audit scope does NOT qualify; only standalone third-party attestations count. | 10 |
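The strict reading in the last row is mechanical and could be encoded as a predicate along these lines. The field names are illustrative assumptions, not the scorer's actual schema:

```ts
// Sketch: only a standalone, named-auditor pentest attestation earns the
// pentest sub-component; a pentest that exists only inside SOC 2 audit
// scope does not. (CNA status alone likewise never earns the bug bounty /
// VDP row above.)
type PentestEvidence = {
  standalone: boolean;   // false when the pentest is SOC 2 embedded scope only
  auditorNamed: boolean; // the attestation names its third-party auditor
};

const pentestQualifies = (e: PentestEvidence): boolean =>
  e.standalone && e.auditorNamed;
```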

Composite score formula

composite = ROUND((R + L + C + S) / 325 * 79)

where:
  R ≤ 80   (Reliability raw)
  L ≤ 65   (Latency raw)
  C ≤ 80   (Cost efficiency raw)
  S ≤ 100  (Security posture raw)

Sum-of-maxes = 325. Composite ceiling = 79 (never 100).

The 79 ceiling reflects that proxy scores are capped below indie scores by design: a public-evidence proxy is structurally less meaningful than live telemetry, and the methodology is honest about that gap.
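In code, the composite is a single rounded projection. A sketch consistent with the formula above; the authoritative implementation lives in lib/incumbent/canonical-scorer.ts and may differ in detail:

```ts
// Raw pillar scores are assumed to be pre-capped (R ≤ 80, L ≤ 65, C ≤ 80, S ≤ 100).
const SUM_OF_MAXES = 325; // 80 + 65 + 80 + 100
const CEILING = 79;       // proxy composites never reach 100 by design

function composite(r: number, l: number, c: number, s: number): number {
  return Math.round(((r + l + c + s) / SUM_OF_MAXES) * CEILING);
}

// Even a perfect-evidence vendor tops out at 79:
console.log(composite(80, 65, 80, 100)); // 79
```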

Insufficient Evidence treatment

When a pillar has no public evidence to assess (e.g., no public pricing page, no published latency benchmark), it’s rendered as “Insufficient Evidence” on the profile page rather than as a 0. The composite is then computed only over the pillars with evidence. “No data” is not the same as “no problem” — we surface the gap honestly.
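This page doesn't spell out the renormalization, but one consistent reading (an assumption, not confirmed scorer behavior) is to drop absent pillars from both the numerator and the denominator while keeping the 79 ceiling:

```ts
// ASSUMED renormalization: shrink both sides of the fraction to the
// pillars that have evidence; null marks Insufficient Evidence.
type ScoredPillar = { raw: number; max: number } | null;

function compositeWithGaps(pillars: ScoredPillar[]): number | null {
  const present = pillars.filter(
    (p): p is { raw: number; max: number } => p !== null
  );
  if (present.length === 0) return null; // nothing scorable
  const raw = present.reduce((acc, p) => acc + p.raw, 0);
  const max = present.reduce((acc, p) => acc + p.max, 0);
  return Math.round((raw / max) * 79);
}

// Latency missing entirely: composite over the remaining 260 points.
console.log(
  compositeWithGaps([{ raw: 60, max: 80 }, null, { raw: 70, max: 80 }, { raw: 90, max: 100 }])
); // Math.round((220 / 260) * 79) = 67
```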

Vendor process

Before publishing an incumbent profile, BenchLytix sends a methodology disclosure email to the vendor’s security@ or legal@ address. The email includes the score, the per-pillar evidence URLs, and a 14-day window for vendor response before the page becomes indexable.

Vendor corrections (added certification, updated SLA, new pricing transparency) are accepted and re-scored on a quarterly cadence or on material evidence change.

Known biases

  • Marketing-budget bias. Vendors with larger marketing teams publish more case studies, ROI calculators, and trust-center content, raising their proxy score independent of actual product quality. Mitigation: per-pillar caps limit how far publication volume can inflate a score, but the bias is real and acknowledged.
  • Evidence currency risk. Citations in the evidence files may go stale (vendor removes SOC 2 page, certification expires). Mitigation: quarterly re-validation per the runbook.
  • No functional assessment. Proxy scoring measures “Is the vendor publicly accountable?” not “Does the agent work well?” A high proxy score with low buyer satisfaction is possible.

Reproducibility

Every per-pillar score on every incumbent profile page links back to the public vendor URL it was derived from. The structured evidence is also exposed as JSON-LD on each profile (@graph with Product + Dataset) so AI crawlers can cite the methodology + sources alongside the score.
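The exact markup isn't reproduced here, but a minimal illustration of a Product + Dataset @graph payload (property choices are assumptions, not the actual profile markup) looks like:

```ts
// Illustrative JSON-LD payload only; the real profile markup may differ.
const profileJsonLd = {
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Product",
      name: "Example Vendor Agent", // hypothetical vendor
    },
    {
      "@type": "Dataset",
      name: "Public-Evidence Proxy score: Example Vendor",
      description:
        "Per-pillar proxy scores linked to the public vendor URLs they were derived from.",
    },
  ],
};

// Embedded on the profile page as:
// <script type="application/ld+json">{JSON.stringify(profileJsonLd)}</script>
```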

For the indie agent rubric (different methodology, different ceiling), see the scoring methodology page.