Methodology & Data Sources
Last updated: June 2026
What this page is for. CareCost Explorer turns the raw price-transparency files that insurers are legally required to publish into figures a non-technical professional can actually read. This page explains, in plain language, what that underlying data is, where it comes from, how we compute the numbers you see, how often it refreshes, and — just as important — what it cannot tell you.
1. What the data is
The federal Transparency in Coverage Rule, in effect since July 2022, requires nearly every health insurer and group health plan to publish the rates they have negotiated with in-network providers. These are released as Machine-Readable Files (MRFs) — enormous JSON files, often hundreds of gigabytes each, that list a negotiated dollar amount for a billing code at a specific provider. For the first time, the prices that were historically hidden inside private payer–provider contracts are a matter of public record.
CareCost Explorer ingests these files at scale. Our current corpus covers more than 36 million negotiated rates spanning 18 payers across 56 states and territories. The data is the insurers’ own published figures — we do not invent, model, or estimate prices. Our work is to collect, normalize, and make the numbers queryable.
2. Where it comes from
Every figure traces back to a payer-published MRF. Insurers post a table-of-contents index that points to the underlying in-network rate files; we crawl those indexes, download the referenced files, and parse them. Because each payer structures and names its files differently, ingestion is payer-specific: we map each insurer’s format onto a common schema of payer, provider, billing code, negotiated rate, rate type, and geography.
- Negotiated rates: payer-published Transparency-in-Coverage in-network MRFs (45 CFR Part 147).
- Billing-code descriptions: CMS HCPCS Level II and CPT code reference sets, used to label what each code represents.
- Geography: provider location and service-area metadata from the files themselves, normalized to state level.
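To make the idea of a common schema concrete, here is a minimal Python sketch of flattening a nested rate file into one record per negotiated price. It is illustrative only, not our production pipeline: the `RateRecord` class, the `flatten` function, and the sample file's field names and nesting are assumptions made for this example, and the sample values are invented. Real files vary by payer and are far too large to load in one call, so a real ingestion step would stream them instead.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RateRecord:
    """One row of the common schema (field names are illustrative)."""
    payer: str
    provider: str
    billing_code: str
    negotiated_rate: float
    rate_type: str
    state: str


# Hand-written sample, NOT real payer data. The nesting is a simplified,
# assumed layout; real files differ by payer and are far larger.
SAMPLE_FILE = json.loads("""
{
  "payer": "Example Payer",
  "in_network": [
    {
      "billing_code": "99213",
      "negotiated_rates": [
        {"provider": "1234567890", "state": "TX",
         "prices": [{"rate": 82.50, "type": "negotiated"},
                    {"rate": 91.00, "type": "negotiated"}]},
        {"provider": "9876543210", "state": "OK",
         "prices": [{"rate": 77.25, "type": "negotiated"}]}
      ]
    }
  ]
}
""")


def flatten(doc: dict) -> list[RateRecord]:
    """Walk the nested file and emit one flat record per negotiated price."""
    records = []
    for item in doc["in_network"]:
        for contract in item["negotiated_rates"]:
            for price in contract["prices"]:
                records.append(RateRecord(
                    payer=doc["payer"],
                    provider=contract["provider"],
                    billing_code=item["billing_code"],
                    negotiated_rate=float(price["rate"]),
                    rate_type=price["type"],
                    state=contract["state"],
                ))
    return records


records = flatten(SAMPLE_FILE)
for record in records:
    print(record)
```

Once every payer's file is reduced to flat records like these, rates from different insurers can be grouped and compared on the same fields.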
3. How the figures are computed
A single billing code at a single payer can have thousands of negotiated rates — one per contracted provider, sometimes more once contract variants are counted. Showing a raw list is useless to most users, so we summarize each group of rates into a small set of statistics.
3.1 Central tendency and spread
For each (payer, code, geography) grouping we report the median as the headline figure. It is far more robust to outliers than the mean, and negotiated-rate distributions are routinely skewed by a handful of extreme contract values. To show the spread, we also compute the 10th and 90th percentiles (p10–p90), which bound the middle 80% of rates in the grouping and trim the most extreme tails on either end.
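The short Python sketch below shows why the median is the headline figure. It uses invented rates and the standard library's `statistics` module. The `summarize` function and the choice of the `"inclusive"` interpolation method are assumptions for illustration; percentile conventions differ slightly between tools, and this is not necessarily the exact method behind the published figures.

```python
from statistics import mean, median, quantiles


def summarize(rates):
    """Return the median and p10/p90 for one (payer, code, geography) group."""
    # quantiles(n=10) returns the nine decile cut points: the first is p10
    # and the last is p90. method="inclusive" is an assumed choice here.
    deciles = quantiles(rates, n=10, method="inclusive")
    return {"median": median(rates), "p10": deciles[0], "p90": deciles[-1]}


# Invented rates: ten ordinary contracts and one extreme value.
rates = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 5000]
summary = summarize(rates)
print(summary)
print("mean:", round(mean(rates), 2))
```

In this example the single extreme contract pulls the mean to about 586, higher than every ordinary rate in the group, while the median (150) and the p10 to p90 band (110 to 190) stay where most contracts actually sit.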
3.2 Credibility filtering
A median computed from three contracts is not trustworthy. Before a summary is published we apply credibility filters:
- Minimum rate count. A grouping must clear a minimum number of distinct negotiated rates before we surface a median and percentiles for it. Thin groupings are flagged or suppressed rather than shown as if they were reliable.
- De-duplication. Payers frequently repeat the same rate across many file entries (the same contract appears under multiple provider references). We collapse exact duplicates so a single contract does not get counted dozens of times and distort the distribution.
- Outlier handling. Implausible values (zero-dollar placeholders, obvious encoding artifacts) are excluded, and the percentile range itself limits the influence of legitimate but extreme contracts.
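The three filters above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the `credible_median` function, the `(provider_id, rate)` row shape, and the `MIN_RATE_COUNT` value of 5 are all hypothetical, and the real thresholds and artifact checks are not specified on this page.

```python
from statistics import median

MIN_RATE_COUNT = 5  # hypothetical threshold, chosen only for this example


def credible_median(rows, min_count=MIN_RATE_COUNT):
    """rows: iterable of (provider_id, rate) pairs for one grouping.

    Returns the median, or None when the grouping is too thin to publish.
    """
    # De-duplication: a set collapses exact repeats of the same pair.
    unique_rows = set(rows)
    # Outlier handling (simplified): drop zero-dollar and negative placeholders.
    rates = [rate for _, rate in unique_rows if rate > 0]
    # Minimum rate count: suppress thin groupings instead of publishing them.
    if len(rates) < min_count:
        return None
    return median(rates)


# One contract repeated ten times plus a zero placeholder: suppressed.
thin = [("A", 100.0)] * 10 + [("B", 0.0)]
# Seven distinct contracts, one of them listed twice: published.
deep = [(f"P{i}", 100.0 + i) for i in range(7)] + [("P0", 100.0)]

print(credible_median(thin))
print(credible_median(deep))
```

The first grouping looks large in the raw file but contains only one usable contract after de-duplication, so no median is returned for it. The second clears the threshold and yields a median.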
4. Update cadence
Insurers are required to refresh their MRFs monthly. We re-crawl payer indexes on a rolling basis and re-ingest files as new versions are posted, so the corpus is continually refreshed rather than frozen at a single snapshot. Coverage of any individual payer depends on that payer publishing complete, parseable files for the period in question.
5. Limitations — read this
The honest framing matters more than the headline number. These data have real limits, and using them well means understanding them.
- These are contract rates, not a bill. A negotiated rate is the price for a single billing code at the facility or contract level. A real episode of care bundles many codes (the procedure, anesthesia, facility fees, imaging, follow-up) plus your specific plan’s deductible and coinsurance. CareCost Explorer is not a patient out-of-pocket estimator, and a single rate should never be read as the total cost of care.
- Coverage varies by payer and region. Some payers publish clean, complete files; others publish partial or malformed ones. A code with deep data in one state may be thin or absent in another. Where coverage is thin we say so rather than papering over the gap.
- Source errors propagate. If an insurer publishes an incorrect rate, that error flows into our data. We filter obvious artifacts, but we cannot independently verify the accuracy of every payer-reported figure.
- Rate type matters. Negotiated amounts can be expressed as fixed dollar amounts, percentages of billed charges, or fee-schedule references. We focus on rates we can normalize to a comparable dollar basis; non-dollar arrangements may be excluded from a summary.
- A median is a summary, not a quote. No individual provider necessarily charges the median. The percentile range is there precisely because prices vary widely. Treat the figures as orientation, then verify specifics with the payer or provider.
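The rate-type limitation in the list above is easiest to see with a small example. The Python sketch below separates dollar-denominated rates from arrangements that cannot be compared on a dollar basis. The rate-type labels, the `COMPARABLE_TYPES` set, and the `split_by_comparability` function are hypothetical names invented for this illustration, and the values are made up.

```python
# Hypothetical rate-type labels, for illustration only.
COMPARABLE_TYPES = {"dollar"}


def split_by_comparability(rows):
    """rows: iterable of (rate_type, value) pairs.

    Returns the dollar values that can be summarized together and a count
    of the rows left out because they are not on a dollar basis.
    """
    kept, excluded = [], 0
    for rate_type, value in rows:
        if rate_type in COMPARABLE_TYPES:
            kept.append(value)
        else:
            excluded += 1
    return kept, excluded


rows = [
    ("dollar", 120.0),
    ("percentage", 45.0),             # 45% of billed charges, not $45
    ("dollar", 135.0),
    ("fee_schedule_reference", 1.1),  # a multiplier, not a price
]
kept, excluded = split_by_comparability(rows)
print(kept, excluded)
```

Mixing the percentage or multiplier values into the same list as the dollar amounts would produce a meaningless median, which is why such rows may be left out of a summary.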
6. Questions and corrections
If you spot a figure that looks wrong, or want to understand how a specific number was derived, email hello@carecostexplorer.com. For details on access for journalists and academics, see the Researchers & Media page.