The Rule
What MRFs are and why they exist
On November 12, 2020, the Departments of Health and Human Services, Labor, and the Treasury published the Transparency in Coverage final rule (85 FR 72158). The rule requires nearly every commercial health insurer and self-insured group health plan in the United States to publish the rates it has negotiated with every in-network provider for every covered service. Those disclosures are machine-readable files — structured JSON documents posted to a public URL, formatted to a schema maintained by CMS at github.com/CMSgov/price-transparency-guide.
The effective date for the in-network rate and allowed-amount files was July 1, 2022. The rule requires payers to refresh their published files at least monthly. Non-compliance triggers civil monetary penalties under the Public Health Service Act.
The primary audience for MRFs is not individual consumers — the files are far too large and technically complex for direct consumer use. The intended downstream users are data aggregators, health technology companies, plan sponsors, and researchers who can parse and normalize the data before surfacing it in consumer-facing tools. A KFF review of early MRF compliance found that by mid-2022 one aggregator had already pulled 700,000 unique files totaling roughly half a petabyte — and expected the full corpus to reach one to three petabytes. That scale makes MRFs a data-infrastructure problem, not a consumer-browsing problem.
File Types
The three file types
The TiC rule defines two required data files and one structural index that ties them together. Every payer must publish all three.
In-network rates file
The primary data file. Lists every covered item and service with its negotiated rate for each in-network provider, expressed at the billing-code level (CPT, HCPCS, MS-DRG, NDC, and others). This is what researchers, analysts, and data companies want. A single file for a national payer routinely expands to 100–400 GB uncompressed; the largest exceed 1 TB. Files are gzip-compressed for distribution, reducing download size to 5–20 GB, but they still have to be parsed at their full uncompressed size.
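Parsing at full size does not have to mean storing at full size. A minimal sketch, standard-library Python only, that streams a .json.gz file through gzip.GzipFile in 1 MiB chunks and counts the decompressed bytes, so memory stays bounded by the chunk size. The function name and chunk size are illustrative choices; a real pipeline would hand each chunk to an incremental JSON parser instead of only counting it.

```python
import gzip
import io
from typing import BinaryIO


def uncompressed_size(gz_stream: BinaryIO, chunk_size: int = 1 << 20) -> int:
    """Count the decompressed bytes of a gzip stream without buffering it whole.

    gzip.GzipFile decompresses lazily, so at most chunk_size decompressed
    bytes are held in memory at a time.
    """
    total = 0
    with gzip.GzipFile(fileobj=gz_stream) as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:  # b"" signals end of stream
                break
            # A real pipeline would feed `chunk` to a streaming JSON parser here.
            total += len(chunk)
    return total


# Usage with an in-memory stand-in for a downloaded .json.gz file.
payload = b'{"in_network": []}' * 100_000
compressed = gzip.compress(payload)
print(len(compressed), uncompressed_size(io.BytesIO(compressed)))
```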
Out-of-network allowed-amounts file
Reports what the plan actually paid to out-of-network providers, expressed as billed charge and allowed amount, for a 90-day reporting period that begins 180 days before the file's publication date. Useful for benchmarking out-of-network exposure and calculating reference-based pricing baselines. These files do not contain negotiated rates — they record historical payment amounts — and should not be substituted for in-network rate data.
Table-of-contents / index file
A JSON index that maps plan identifiers (EIN or HIOS ID) to the URLs of the in-network and allowed-amount files for each plan. Large payers split rates across dozens or hundreds of separate files; the table-of-contents is the only reliable entry point. UnitedHealthcare publishes more than 45,000 index entries. Every crawl of a payer's MRF data must start here — not from bookmarked file URLs, which change with each monthly refresh.
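Starting from the index can be sketched as a hypothetical helper that walks a table-of-contents document and maps each plan ID to its in-network file URLs. The field names (reporting_structure, reporting_plans, in_network_files, location, plan_id) are assumed to follow the CMS table-of-contents schema and should be checked against the version the payer declares; the example.com URLs are placeholders.

```python
import json
from typing import Dict, List


def in_network_urls_by_plan(toc: dict) -> Dict[str, List[str]]:
    """Map each plan_id in a table-of-contents document to its in-network file URLs.

    Field names are assumed to match the CMS table-of-contents schema.
    """
    urls_by_plan: Dict[str, List[str]] = {}
    for structure in toc.get("reporting_structure", []):
        urls = [f["location"] for f in structure.get("in_network_files", [])]
        for plan in structure.get("reporting_plans", []):
            bucket = urls_by_plan.setdefault(plan["plan_id"], [])
            # The same plan can appear in several reporting structures; skip repeats.
            for url in urls:
                if url not in bucket:
                    bucket.append(url)
    return urls_by_plan


toc = json.loads("""
{
  "reporting_entity_name": "Example Payer",
  "reporting_entity_type": "health insurance issuer",
  "reporting_structure": [
    {
      "reporting_plans": [
        {"plan_name": "Example PPO", "plan_id_type": "EIN", "plan_id": "123456789", "plan_market_type": "group"}
      ],
      "in_network_files": [
        {"description": "in-network rates, part 1", "location": "https://example.com/mrf/part1.json.gz"},
        {"description": "in-network rates, part 2", "location": "https://example.com/mrf/part2.json.gz"}
      ]
    }
  ]
}
""")
print(in_network_urls_by_plan(toc))
# {'123456789': ['https://example.com/mrf/part1.json.gz', 'https://example.com/mrf/part2.json.gz']}
```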
File Size
Why the files are enormous
Three structural forces drive MRF file size to the 50 GB–1 TB range. First, the disclosure requirement itself is a Cartesian product: every billing code, at every contracted provider, under every plan variant. A payer with 10,000 in-network physicians, 500 employer plan variants, and 10 commonly billed codes for a single specialty produces 50 million rows before accounting for modifiers or place-of-service splits. Multiply that across hundreds of specialties and the arithmetic explains the scale.
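The Cartesian-product arithmetic, written out as a tiny Python function. The splits parameter is a hypothetical multiplier standing in for the modifier and place-of-service splits the paragraph sets aside; leaving it at 1 reproduces the 50 million figure.

```python
def rate_rows(providers: int, plan_variants: int, codes: int, splits: int = 1) -> int:
    """Rows implied by one rate per (provider, plan variant, billing code).

    `splits` is a hypothetical multiplier for modifier or place-of-service
    splits; 1 reproduces the figure in the text.
    """
    return providers * plan_variants * codes * splits


print(rate_rows(10_000, 500, 10))            # 50000000
print(rate_rows(10_000, 500, 10, splits=2))  # 100000000
```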
Second, ghost rates — rates published for services a provider will never actually perform — inflate files further. Payers derive rates from contract templates that cover far more procedure codes than any given provider bills. A dermatologist may appear with a published rate for knee replacement. An audiologist may carry rates for cardiac catheterization. Gigasheet’s analysis found that up to 40% of rates in some MRF files fall into this category. These entries are not malicious — they are structural artifacts of how payer contracting works — but they must be filtered before any analysis is trustworthy.
A third driver is denormalized JSON repetition. Rather than referencing shared provider objects by ID, many payers repeat full provider blocks for every rate record. A 200-character provider object repeated across 10 million rate rows adds 2 GB of redundant encoding before the first actual rate value appears.
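The remedy for this repetition is to store each distinct provider block once and refer to it by ID. A minimal sketch of that interning step, assuming flattened rate records that carry an inline "provider" object; both "provider" and "provider_ref" are hypothetical field names. The CMS in-network schema's own provider_references mechanism, where rates point at shared provider groups by provider_group_id, works on the same principle.

```python
import json
from typing import Dict, List, Tuple


def intern_providers(rates: List[dict]) -> Tuple[Dict[int, dict], List[dict], int]:
    """Replace each inline "provider" block with an integer "provider_ref".

    Returns the provider lookup table, the rewritten rate records, and the
    bytes of compact provider JSON that no longer need to be repeated.
    """
    table: Dict[int, dict] = {}
    ref_for: Dict[str, int] = {}
    rewritten: List[dict] = []
    saved = 0
    for rate in rates:
        # Canonical encoding so identical provider blocks compare equal.
        key = json.dumps(rate["provider"], sort_keys=True, separators=(",", ":"))
        if key in ref_for:
            saved += len(key.encode("utf-8"))  # this copy was pure repetition
        else:
            ref_for[key] = len(ref_for) + 1
            table[ref_for[key]] = rate["provider"]
        slim = {k: v for k, v in rate.items() if k != "provider"}
        slim["provider_ref"] = ref_for[key]
        rewritten.append(slim)
    return table, rewritten, saved


provider = {"npi": [1234567890], "tin": {"type": "ein", "value": "123456789"}}
rates = [
    {"billing_code": code, "negotiated_rate": 100.0, "provider": provider}
    for code in ("99213", "99214", "99215")
]
table, slim_rates, saved = intern_providers(rates)
print(len(table), saved)
```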
Schema Updates
What changed in 2025–26
The CMSgov price-transparency-guide repository follows semantic versioning. In October 2025 the schema reached version 2.0.0 — the first major-version increment since the rule took effect in July 2022. A major version means CMS introduced at least one breaking change: fields removed, renamed, or with altered semantics. Parsers written for v1.x files need updating before they can reliably process 2026-dated MRFs. The VERSION.md file in the CMS repository is the authoritative changelog.
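One defensive habit that follows from semantic versioning: read the file's declared schema version before streaming the large arrays, and refuse to parse major versions the code was not written for. A sketch, assuming the top-level "version" field carried by CMS-schema files; supported_majors describes a hypothetical parser, not anything CMS publishes.

```python
from typing import AbstractSet, Tuple

Version = Tuple[int, int, int]


def parse_semver(raw: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH', ignoring any '-prerelease' suffix; missing parts are 0."""
    core = raw.strip().split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def require_supported_version(header: dict,
                              supported_majors: AbstractSet[int] = frozenset({1})) -> Version:
    """Fail fast when the declared schema major version is one this parser was not written for.

    `header` holds the file's top-level scalar fields, read before the large arrays.
    `supported_majors` is an assumption about your own parser, not a CMS value.
    """
    raw = header.get("version")
    if not raw:
        raise ValueError("file declares no top-level 'version'")
    version = parse_semver(str(raw))
    if version[0] not in supported_majors:
        raise ValueError(f"schema version {raw} has a major version this parser does not support")
    return version


print(require_supported_version({"version": "1.3.1"}))  # (1, 3, 1)
try:
    require_supported_version({"version": "2.0.0"})
except ValueError as err:
    print(err)  # schema version 2.0.0 has a major version this parser does not support
```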
On enforcement: CMS has increased scrutiny of non-compliant payers since 2024. Civil monetary penalty authority under the Public Health Service Act allows for per-day fines for plans that fail to publish conformant MRFs. Several large insurers received CMS enforcement correspondence in 2024–25, and the agency has signaled continued attention to both completeness and schema conformance.
On ghost-rate exclusions: CMS and several policy groups have put forward proposals that would require payers to suppress rates for provider-code combinations where no claim has been filed in the prior 12 months. If finalized, this would materially reduce file sizes — a single major payer file could shrink by as much as 40%, based on Gigasheet's ghost-rate estimates. The CMSgov repository's issue tracker and the Federal Register are the authoritative sources to monitor for finalized changes.
Data Quality
Ghost rates and data quality
Raw MRF data distorts real-world rates in two ways. The first is ghost rates, described above: a rate record linking a provider to a billing code the provider has never billed. The second is the negotiated_type problem: not all rate values are dollar amounts. A value with negotiated_type = "percentage" means the number is a percentage of billed charges — 40.5 means 40.5%, not $40.50. Mixing these types into a distribution without filtering produces figures that are numerically nonsensical.
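Filtering by negotiated_type before aggregating can be sketched as follows. The values in DOLLAR_TYPES ("negotiated", "derived", "fee schedule") and the excluded "percentage" and "per diem" follow the CMS schema's enumeration as understood at the time of writing; treat the grouping as an assumption to confirm against the schema version you parse. The three sample prices are invented for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Optional

# Assumption: these negotiated_type values denote per-service dollar amounts.
# "percentage" is a percent of billed charges and "per diem" is a daily rate,
# so neither belongs in a per-service dollar distribution.
DOLLAR_TYPES = {"negotiated", "derived", "fee schedule"}


def split_by_type(prices: Iterable[dict]) -> Dict[str, List[float]]:
    """Group negotiated_rate values by their negotiated_type."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for price in prices:
        groups[price["negotiated_type"]].append(float(price["negotiated_rate"]))
    return dict(groups)


def dollar_median(prices: Iterable[dict]) -> Optional[float]:
    """Median over dollar-denominated rates only; None if there are none."""
    groups = split_by_type(prices)
    dollars = [v for t, values in groups.items() if t in DOLLAR_TYPES for v in values]
    return median(dollars) if dollars else None


prices = [
    {"negotiated_type": "negotiated", "negotiated_rate": 120.00},
    {"negotiated_type": "percentage", "negotiated_rate": 40.5},  # 40.5% of billed charges
    {"negotiated_type": "fee schedule", "negotiated_rate": 95.00},
]
print(median(p["negotiated_rate"] for p in prices))  # 95.0 (percentage mixed in)
print(dollar_median(prices))                          # 107.5
```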
De-ghosting is the process of removing rates for provider-code pairs where the provider has never (or rarely) billed that code in practice. The cleanest signal for this is claims data — match MRF rate records against actual claim history to flag impossible pairings. Without claims data, a practical proxy is specialty-code coherence: filter out rates where the provider’s taxonomy code is incompatible with the billing code’s clinical domain. Neither approach eliminates all noise, but specialty filtering catches the most egregious entries.
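A specialty-coherence filter reduces to a lookup from provider specialty to plausible billing codes. The sketch below assumes each rate record has already been joined to the provider's NUCC taxonomy code (for example, from NPPES via the NPI); the PLAUSIBLE_CPT table and its CPT ranges are hypothetical and approximate, kept tiny so the logic is visible. Codes with no matching rule are kept rather than dropped, so an incomplete table under-filters instead of silently erasing real rates.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical, deliberately tiny coherence table: NUCC taxonomy prefix ->
# approximate CPT ranges a provider of that specialty plausibly bills.
# A usable table would be curated per specialty and reviewed clinically.
PLAUSIBLE_CPT: Dict[str, List[Tuple[int, int]]] = {
    "207N": [(10000, 19999), (99202, 99499)],  # dermatology: integumentary procedures, E/M visits
    "207X": [(20000, 29999), (99202, 99499)],  # orthopaedic surgery: musculoskeletal, E/M visits
}


def is_plausible(taxonomy: str, billing_code: str) -> bool:
    """True unless a rule exists for this specialty and the code falls outside it."""
    ranges = PLAUSIBLE_CPT.get(taxonomy[:4])
    if ranges is None or not billing_code.isdigit():
        return True  # no rule, or not a numeric CPT code: keep rather than guess
    code = int(billing_code)
    return any(low <= code <= high for low, high in ranges)


def deghost(rates: Iterable[dict]) -> Iterator[dict]:
    """Yield only rate records whose provider specialty fits the billing code."""
    for rate in rates:
        if is_plausible(rate["taxonomy"], rate["billing_code"]):
            yield rate


rates = [
    {"npi": 1111111111, "taxonomy": "207N00000X", "billing_code": "27447"},  # dermatologist, knee replacement
    {"npi": 1111111111, "taxonomy": "207N00000X", "billing_code": "11102"},  # dermatologist, skin biopsy
    {"npi": 2222222222, "taxonomy": "207X00000X", "billing_code": "27447"},  # orthopaedist, knee replacement
]
print([r["billing_code"] for r in deghost(rates)])  # ['11102', '27447']
```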
CareCost Explorer ships with de-ghosting already applied — specialty-procedure coherence filtering on 36M+ rates across 18 payers.
Discovery
Where every payer publishes their files
The rule does not specify a central registry. Each payer posts its MRFs independently — usually to a dedicated compliance page buried several clicks from the insurer’s homepage. Common patterns include dedicated subdomains (transparency-in-coverage.uhc.com, transparency.emblemhealth.com), third-party platforms like HealthSparq and Zelis, and developer portals with REST-style file listings.
Below are the authoritative index locations for 20 major US payers, verified as of June 12, 2026. Each row lists the host, the portal type, the refresh cadence, and the verification status.
| Payer | MRF Location | Type | Cadence | Verified |
|---|---|---|---|---|
| UnitedHealthcare (UnitedHealth Group) | transparency-in-coverage.uhc.com | portal | monthly | ✓ 2026-06-12 |
| Aetna (CVS Health) | health1.aetna.com | portal | monthly | ✓ 2026-06-12 |
| Cigna (The Cigna Group) | www.cigna.com | portal | monthly | ✓ 2026-06-12 |
| Elevance Health (Anthem) | www.anthem.com | portal | monthly | URL authoritative; anthem.com/machine-readable-file/search/ timed out on direct fetch; page existence confirmed via multiple independent sources and search results as of 2026-06-12 |
| Humana (Humana Inc.) | developers.humana.com | portal | monthly | URL authoritative; developers.humana.com/syntheticdata/healthplan-price-transparency returns 502 Bad Gateway on direct fetch; confirmed via search results and GitHub CMS discussions as the authoritative TiC landing URL |
| Kaiser Permanente (Kaiser Foundation Health Plan) | healthy.kaiserpermanente.org | portal | monthly | URL authoritative; healthy.kaiserpermanente.org/front-door/machine-readable hits a redirect loop on direct fetch; URL confirmed live via multiple independent sources |
| Centene (Centene Corporation) | www.centene.com | portal | monthly | ✓ 2026-06-12 |
| Molina Healthcare (Molina Healthcare, Inc.) | www.molinahealthcare.com | portal | monthly | URL authoritative; molinahealthcare.com/members/common/mrf.aspx returns a bot-detection block on automated fetch; URL confirmed via multiple sources as the authoritative TiC landing page |
| Oscar Health (Oscar Health, Inc.) | www.hioscar.com | portal | monthly | URL authoritative; hioscar.com/transparency-in-coverage-files/oscar renders minimal content on automated fetch due to its SPA architecture; URL confirmed authoritative via CMS and third-party MRF trackers |
| Highmark (Highmark Health) | mrfdata.hmhs.com | portal | monthly | ✓ 2026-06-12 |
| HCSC, Health Care Service Corporation (BCBS Illinois / Texas / Oklahoma / New Mexico / Montana) | www.hcsc.com | portal | monthly | ✓ 2026-06-12 |
| Premera Blue Cross (independent nonprofit) | www.premera.com | portal | monthly | URL authoritative; premera.com/visitor/transparency returns minimal content on automated fetch; migration from sapphiremrfhub.com confirmed via Premera employer communications dated November 2025 |
| Florida Blue (GuideWell Mutual Holding Corporation) | www.floridablue.com | portal | monthly | ✓ 2026-06-12 |
| CareFirst BlueCross BlueShield (CareFirst, Inc.) | individual.carefirst.com | portal | monthly | ✓ 2026-06-12 |
| Blue Cross Blue Shield of Massachusetts (independent nonprofit) | transparency-in-coverage.bluecrossma.com | portal | monthly | URL authoritative; automated fetch returns only the text "MRF" because the page appears to be fully JavaScript-rendered; URL confirmed authoritative via multiple sources including Payerset and direct BCBSMA references |
| Blue Cross Blue Shield of Michigan (independent nonprofit) | www.bcbsm.com | portal | monthly | ✓ 2026-06-12 |
| BlueCross BlueShield of Tennessee (independent nonprofit) | www.bcbst.com | portal | monthly | URL authoritative; bcbst.com/tcr returns HTTP 403 on automated fetch; URL confirmed authoritative via BCBST official communications, the Tennessee state health department, and multiple employer compliance documents |
| Blue Cross NC (Blue Cross and Blue Shield of North Carolina; independent nonprofit) | www.bluecrossnc.com | portal | monthly | ✓ 2026-06-12 |
| Regence BlueCross BlueShield (a division of Cambia Health Solutions; BCBSA licensee) | www.regence.com | portal | monthly | ✓ 2026-06-12 |
| EmblemHealth (Group Health Inc. / HIP Health Plan of New York) | transparency.emblemhealth.com | portal | monthly | ✓ 2026-06-12 |
All 20 of these are already parsed and queryable in Explorer.
Three gotchas worth knowing before you build a pipeline:
- UnitedHealthcare — expiring signed URLs. Files are served via short-lived AWS signed URLs that must be resolved at fetch time. You cannot cache the URL, only the content. Pipelines that cache URLs will silently start receiving 403 errors within hours.
- Molina Healthcare & Oscar Health — SPA portals. Both portals are JavaScript single-page applications. The actual TOC JSON URLs are never present in the initial HTML; they are fetched via internal XHR after page load. Standard crawlers and curl miss the files entirely. You need a headless browser with network request interception.
- HealthSparq (Sapphire) platform — unreliable timestamps. HealthSparq hosts files for multiple payers and allows retroactive edits to the ‘created’ date. Date-based freshness checks are unreliable; verify actual content, not metadata timestamps.
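For the UnitedHealthcare case, a pipeline can at least detect a stale signed URL before using it and go back to the table-of-contents index for a fresh one. A sketch that reads the expiry from the URL's query string, assuming either AWS SigV4 query signing (X-Amz-Date plus X-Amz-Expires) or an epoch-seconds Expires parameter; which scheme a given payer uses is an assumption to verify by inspecting real URLs, and the bucket URL below is a placeholder.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url: str) -> Optional[datetime]:
    """Best-effort expiry time (UTC) of a signed URL, or None if none is detectable.

    Handles AWS SigV4 query signing (X-Amz-Date + X-Amz-Expires) and an
    epoch-seconds Expires parameter (e.g. CloudFront canned-policy URLs).
    """
    params = parse_qs(urlparse(url).query)
    if "X-Amz-Date" in params and "X-Amz-Expires" in params:
        signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        lifetime = timedelta(seconds=int(params["X-Amz-Expires"][0]))
        return signed_at.replace(tzinfo=timezone.utc) + lifetime
    if "Expires" in params:
        return datetime.fromtimestamp(int(params["Expires"][0]), tz=timezone.utc)
    return None


def needs_refresh(url: str, now: Optional[datetime] = None,
                  margin: timedelta = timedelta(minutes=5)) -> bool:
    """True if the URL expires within `margin`; re-resolve it from the index before fetching."""
    expiry = signed_url_expiry(url)
    if expiry is None:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now + margin >= expiry


url = ("https://example-bucket.s3.amazonaws.com/mrf/part1.json.gz"
       "?X-Amz-Date=20260612T120000Z&X-Amz-Expires=3600&X-Amz-Signature=abc123")
print(signed_url_expiry(url))  # 2026-06-12 13:00:00+00:00
print(needs_refresh(url, now=datetime(2026, 6, 12, 12, 58, tzinfo=timezone.utc)))  # True
```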
Cost Framing
Build vs. buy
3–6 engineering weeks is a realistic first-pipeline estimate for a single payer at scale: index discovery, schema reverse-engineering, provider-reference resolution, deduplication, output normalization, and basic monitoring. That is before de-ghosting, which adds its own complexity if you lack claims data for cross-reference.
Storage is a separate cost. 18 payers at an average of 150 GB uncompressed per in-network file set is roughly 2.7 TB per monthly snapshot. Six months of history for trend analysis means 16+ TB of raw source data before you build any queryable layer on top. Cloud object storage at current rates runs $0.02–$0.03/GB/month; 16 TB is $320–$480 per month in storage alone, plus egress when you read it.
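The storage arithmetic as a small function, using decimal units (1 TB = 1,000 GB) and excluding egress. Note that 18 × 150 GB × 6 months is 16.2 TB; the text rounds to 16 TB, which is why it quotes $320 to $480 rather than $324 to $486.

```python
def storage_estimate(payers: int, gb_per_snapshot: float, months: int,
                     usd_per_gb_month: float) -> tuple:
    """Raw snapshot storage (decimal GB) and its monthly bill, excluding egress."""
    total_gb = payers * gb_per_snapshot * months
    return total_gb, total_gb * usd_per_gb_month


gb, low = storage_estimate(18, 150, 6, 0.02)
_, high = storage_estimate(18, 150, 6, 0.03)
print(f"{gb / 1000:.1f} TB, ${low:,.0f}-${high:,.0f} per month")  # 16.2 TB, $324-$486 per month
```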
The monthly refresh treadmill is the steepest ongoing cost. Every month, each payer republishes its files — usually at new URLs. The table-of-contents index must be re-crawled, the new file locations identified, and the full parse-validate-de-ghost cycle re-run. Payers change field conventions, file structure, and schema compliance between monthly drops. A pipeline that worked in February may throw exceptions on the March file. Monitoring format drift across 18+ payers at thousands of plan files each is ongoing infrastructure — not a one-time project.
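Format drift can be caught cheaply by comparing the set of key paths in a sample of records from consecutive monthly files. A minimal sketch; the February and March records are invented to show a payer switching from provider_references to inline provider_groups, and in practice the comparison would run on the first few thousand records streamed from each file rather than on the whole file.

```python
from typing import Any, Dict, List, Set


def key_paths(obj: Any, prefix: str = "") -> Set[str]:
    """Dotted key paths in a JSON value; list elements collapse to '[]'."""
    paths: Set[str] = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(obj, list):
        for item in obj:
            paths |= key_paths(item, prefix + "[]")
    return paths


def drift(previous: Any, current: Any) -> Dict[str, List[str]]:
    """Key paths added or removed between two months' sample records."""
    before, after = key_paths(previous), key_paths(current)
    return {"added": sorted(after - before), "removed": sorted(before - after)}


feb = {"in_network": [{"billing_code": "99213",
                       "negotiated_rates": [{"provider_references": [1]}]}]}
mar = {"in_network": [{"billing_code": "99213",
                       "negotiated_rates": [{"provider_groups": [{"npi": [1234567890]}]}]}]}
print(drift(feb, mar)["removed"])  # ['in_network[].negotiated_rates[].provider_references']
```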
| Approach | Upfront effort | Ongoing effort | Fits when |
|---|---|---|---|
| Build from scratch | 3–6 eng-weeks per payer | Monthly re-run per payer; schema drift monitoring | You need full data ownership and have the engineering capacity |
| Open-source partial parsers | Days to integrate; significant gap-filling | Ongoing maintenance as schema evolves | Single-payer research with a technical team |
| Pre-parsed datasets (files/Parquet) | Hours to load; normalized already | Negotiate refresh terms; verify de-ghosting methodology | Analytics or research work where you receive a data drop |
| Query layer / API (rent, don't build) | Hours to integrate | Subscription cost; no infra to maintain | You need answers, not raw files; time-to-value is the constraint |
For teams that need rates — not infrastructure — CareCost Explorer already has this pipeline running across 18 payers. The same rates, already streamed, de-ghosted, provider-resolved, and normalized, are queryable via API or dashboard. That is not the right fit for every organization, but it is worth pricing before committing engineering time to a build that replicates existing infrastructure.
Found a factual error? Email hello@carecostexplorer.com. All regulatory claims trace to the TiC final rule and CMSgov/price-transparency-guide.
Frequently Asked Questions
What is a machine-readable file?
A machine-readable file (MRF) is a structured JSON document that a health insurer or group health plan must publish under the federal Transparency in Coverage rule (45 CFR Part 147). The in-network rates file lists negotiated rates between the insurer and its in-network providers, and a companion allowed-amounts file reports historical out-of-network payments; both follow a schema maintained by CMS at github.com/CMSgov/price-transparency-guide. Files are published to a public URL and refreshed at least monthly.
Are payer MRF files free?
Yes. The Transparency in Coverage rule mandates that all payers publish machine-readable files at no cost. The economic reality: the burden is entirely on you to download and process them. The engineering cost of accessing, storing, and normalizing 50+ TB of monthly MRF updates is substantial — which is why data companies and aggregators charge for parsed MRF data.
Are MRFs the same as hospital price transparency files?
No — these are two separate federal requirements. Hospital price transparency (45 CFR Part 180) applies to hospitals and requires a machine-readable chargemaster plus a consumer-facing shoppable services file. The Transparency in Coverage rule (45 CFR Part 147) applies to health insurers and requires MRFs listing negotiated rates with every in-network provider. Hospitals publish their own files; insurers publish theirs. The file formats, schemas, and enforcement agencies differ.
How often do payers update MRFs?
Monthly, per the federal mandate. In practice, the posting date inside a file's last_updated_on field is not always trustworthy — some payers republish old content under new filenames without updating rates. Verify freshness with the file's HTTP Last-Modified header or by hashing the content against the prior month's version.
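Hashing for freshness works best on the decompressed content rather than on the .gz bytes: the gzip header records a modification time, so identical JSON recompressed on a different day produces different compressed bytes. A sketch using only the standard library, streaming so the file never has to fit in memory; the sample body and timestamps are invented.

```python
import gzip
import hashlib
import io
from typing import BinaryIO


def content_digest(gz_stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of the decompressed content, computed chunk by chunk."""
    digest = hashlib.sha256()
    with gzip.GzipFile(fileobj=gz_stream) as gz:
        for chunk in iter(lambda: gz.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Same JSON, recompressed at different times: the gzip bytes differ, the digest does not.
body = b'{"last_updated_on": "2026-06-01", "in_network": []}'
march = gzip.compress(body, mtime=1_772_323_200)
april = gzip.compress(body, mtime=1_775_001_600)
print(march == april)  # False
print(content_digest(io.BytesIO(march)) == content_digest(io.BytesIO(april)))  # True
```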
Can consumers use MRFs directly?
In theory yes; in practice no. A single in-network file from a major payer runs 50 GB to 1 TB uncompressed. The files require a streaming JSON parser, provider-reference resolution, and de-ghosting before any useful comparison is possible. The rule's consumer interface is a separate requirement: each plan must offer its members an internet-based self-service price comparison tool. MRFs are the data layer, designed for software systems and data aggregators.