Methodology & Data Sources

Last updated: June 2026

What this page is for. CareCost Explorer turns the raw price-transparency files that insurers are legally required to publish into figures a non-technical professional can actually read. This page explains, in plain language, what that underlying data is, where it comes from, how we compute the numbers you see, how often it refreshes, and — just as important — what it cannot tell you.

1. What the data is

The federal Transparency in Coverage Rule, in effect since July 2022, requires nearly every health insurer and group health plan to publish the rates they have negotiated with in-network providers. These are released as Machine-Readable Files (MRFs) — enormous JSON files, often hundreds of gigabytes each, that list a negotiated dollar amount for a billing code at a specific provider. For the first time, the prices that were historically hidden inside private payer–provider contracts are a matter of public record.

CareCost Explorer ingests these files at scale. Our current corpus covers more than 36 million negotiated rates spanning 18 payers across 56 states and territories. The data is the insurers’ own published figures — we do not invent, model, or estimate prices. Our work is to collect, normalize, and make the numbers queryable.

2. Where it comes from

Every figure traces back to a payer-published MRF. Insurers post a table-of-contents index that points to the underlying in-network rate files; we crawl those indexes, download the referenced files, and parse them. Because each payer structures and names its files differently, ingestion is payer-specific: we map each insurer’s format onto a common schema of payer, provider, billing code, negotiated rate, rate type, and geography.
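As a rough illustration of the crawl-and-normalize step described above, the sketch below pulls in-network file locations out of one table-of-contents index and flattens one in-network entry into the common schema. It is a minimal sketch, not our production pipeline. The JSON key names (reporting_structure, in_network_files, location, negotiated_rates, provider_groups, npi, negotiated_prices, negotiated_type, negotiated_rate) reflect our reading of the published federal schema and should be treated as assumptions, since individual payers deviate from it. NormalizedRate, in_network_file_urls, and flatten_item are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass(frozen=True)
class NormalizedRate:
    """One row of the common schema (hypothetical field names)."""
    payer: str
    provider: str            # here an NPI; an assumption for this sketch
    billing_code: str
    negotiated_rate: float
    rate_type: str
    geography: Optional[str] = None  # not in the rate file; attached later


def in_network_file_urls(index: dict[str, Any]) -> list[str]:
    """Collect distinct in-network file locations from one parsed index."""
    seen: set[str] = set()
    urls: list[str] = []
    for structure in index.get("reporting_structure", []):
        for entry in structure.get("in_network_files", []):
            location = entry.get("location")
            # The same file is often listed under many plans; keep it once.
            if location and location not in seen:
                seen.add(location)
                urls.append(location)
    return urls


def flatten_item(payer: str, item: dict[str, Any]) -> Iterator[NormalizedRate]:
    """Yield one normalized row per (provider, price) pair in one entry."""
    for rate_group in item.get("negotiated_rates", []):
        npis = [
            str(npi)
            for group in rate_group.get("provider_groups", [])
            for npi in group.get("npi", [])
        ]
        for price in rate_group.get("negotiated_prices", []):
            for npi in npis:
                yield NormalizedRate(
                    payer=payer,
                    provider=npi,
                    billing_code=str(item["billing_code"]),
                    negotiated_rate=float(price["negotiated_rate"]),
                    rate_type=str(price.get("negotiated_type", "unknown")),
                )
```

Real files are far too large to load into memory as a single dictionary, so a working pipeline would stream them; the functions above only show the shape of the mapping once a piece of the file has been parsed.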

3. How the figures are computed

A single billing code at a single payer can have thousands of negotiated rates — one per contracted provider, sometimes more once contract variants are counted. Showing a raw list is useless to most users, so we summarize each group of rates into a small set of statistics.

3.1 Central tendency and spread

For each (payer, code, geography) grouping we report the median as the headline figure. It is far more robust to outliers than the mean, and negotiated-rate distributions are routinely skewed by a handful of extreme contract values. To show the spread, we also compute the 10th and 90th percentiles (p10–p90), which describe the range a typical contract falls within while trimming the most extreme tails on either end.
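To make the summary statistics concrete, here is a minimal sketch of the grouping and the median, p10, and p90 calculation using only Python's standard library. The function names and the row layout are hypothetical, and the percentile interpolation rule (the "inclusive" method) is an assumption made for this example; this page does not specify which rule the production figures use.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize(rates: list[float]) -> dict[str, float]:
    """Return n, p10, median, and p90 for one group of negotiated rates."""
    ordered = sorted(rates)
    # n=10 yields nine cut points: the first is p10 and the last is p90.
    deciles = quantiles(ordered, n=10, method="inclusive")
    return {
        "n": len(ordered),
        "p10": deciles[0],
        "median": median(ordered),
        "p90": deciles[-1],
    }


def summarize_by_group(rows):
    """Group (payer, code, geography, rate) rows and summarize each group."""
    groups = defaultdict(list)
    for payer, code, geography, rate in rows:
        groups[(payer, code, geography)].append(rate)
    # Percentiles need at least two values, so single-rate groups are skipped.
    return {
        key: summarize(values)
        for key, values in groups.items()
        if len(values) >= 2
    }


# Ten ordinary contracts plus one extreme value.
example = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 5000]
stats = summarize(example)
```

For the example list, the median is 150 and the p10–p90 range is 110 to 190, while the mean is roughly 586. The single extreme contract drags the mean far from anything a typical provider was paid and leaves the median and the percentile band untouched.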

3.2 Credibility filtering

A median computed from three contracts is not trustworthy. Before a summary is published we apply credibility filters:

4. Update cadence

Insurers are required to refresh their MRFs monthly. We re-crawl payer indexes on a rolling basis and re-ingest files as new versions are posted, so the corpus is continually refreshed rather than frozen at a single snapshot. Coverage of any individual payer depends on that payer publishing complete, parseable files for the period in question.

5. Limitations — read this

The honest framing matters more than the headline number. These data have real limits, and using them well means understanding them.

6. Questions and corrections

If you spot a figure that looks wrong, or want to understand how a specific number was derived, email hello@carecostexplorer.com. For details on access for journalists and academics, see the Researchers & Media page.
