Machine-Readable Files: The Complete Guide (2026)

What the Transparency in Coverage mandate requires, where the 20+ payer index files live, why the files reach 1 TB, and what it honestly takes to turn them into usable data.

Reviewed
Schema: TiC v2.0.0
Sources: CMS · CMSgov GitHub

The Rule

What MRFs are and why they exist

On November 12, 2020, the Departments of Health and Human Services, Labor, and the Treasury published the Transparency in Coverage final rule (85 FR 72158). The rule requires nearly every commercial health insurer and self-insured group health plan in the United States to publish the rates it has negotiated with every in-network provider for every covered service. Those disclosures are machine-readable files: structured JSON documents posted to a public URL, formatted to a schema maintained by CMS at github.com/CMSgov/price-transparency-guide.

The effective date for the in-network rate and allowed-amount files was July 1, 2022. The rule requires payers to refresh their published files at least monthly. Non-compliance triggers civil monetary penalties under the Public Health Service Act.

The primary audience for MRFs is not individual consumers — the files are far too large and technically complex for direct consumer use. The intended downstream users are data aggregators, health technology companies, plan sponsors, and researchers who can parse and normalize the data before surfacing it in consumer-facing tools. A KFF review of early MRF compliance found that by mid-2022 one aggregator had already pulled 700,000 unique files totaling roughly half a petabyte — and expected the full corpus to reach one to three petabytes. That scale makes MRFs a data-infrastructure problem, not a consumer-browsing problem.

File Types

The three file types

The TiC rule defines two required data files and one structural index that ties them together. Every payer must publish all three.

In-network rates file

The primary data file. Lists every covered item and service with its negotiated rate for each in-network provider, expressed at the billing-code level (CPT, HCPCS, MS-DRG, NDC, and others). This is what researchers, analysts, and data companies want. A single file for a national payer routinely expands to 100–400 GB uncompressed; the largest exceed 1 TB. Files are gzip-compressed for distribution, reducing download size to 5–20 GB, but parse at full size.
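Because the files parse at full size, it helps to stream the gzip rather than decompress it to disk or memory first. A minimal sketch using only the Python standard library; the function name and the in-memory demo payload are illustrative, not part of any payer's tooling:

```python
import gzip
import io

def uncompressed_size(fileobj, chunk_size: int = 1 << 20) -> int:
    """Count decompressed bytes by streaming, holding at most one chunk in memory."""
    total = 0
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total

# Self-contained demo: compress a repetitive JSON-like payload in memory.
payload = b'{"negotiated_rate": 100.0}' * 10_000
compressed = gzip.compress(payload)
size = uncompressed_size(io.BytesIO(compressed))
print(len(compressed), size)
```

In a real pipeline the same loop would read from an open file or an HTTP response body, and each chunk would be handed to an incremental JSON parser instead of just being counted.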

Out-of-network allowed-amounts file

Reports what the plan actually paid to out-of-network providers during a 90-day period that begins 180 days before the file's publication date, expressed as billed charge and allowed amount. Useful for benchmarking out-of-network exposure and calculating reference-based pricing baselines. These files do not contain negotiated rates. They record historical payment amounts and should not be substituted for in-network rate data.

Table-of-contents / index file

A JSON index that maps plan identifiers (EIN or HIOS ID) to the URLs of the in-network and allowed-amount files for each plan. Large payers split rates across dozens or hundreds of separate files; the table-of-contents is the only reliable entry point. UnitedHealthcare publishes more than 45,000 index entries. Every crawl of a payer's MRF data must start here — not from bookmarked file URLs, which change with each monthly refresh.
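Walking the index can be sketched as below. The field names (reporting_structure, reporting_plans, in_network_files, location) follow the v1.x table-of-contents schema as published in the CMSgov repository, so treat them as an assumption to check against the schema version you target. The sample document, plan ID, and URLs are placeholders.

```python
from typing import Iterator

def iter_in_network_urls(toc: dict) -> Iterator[tuple[str, str]]:
    """Yield (plan_id, file_url) pairs from a parsed table-of-contents document."""
    for structure in toc.get("reporting_structure", []):
        plan_ids = [p.get("plan_id", "") for p in structure.get("reporting_plans", [])]
        for file_ref in structure.get("in_network_files", []):
            for plan_id in plan_ids:
                yield plan_id, file_ref["location"]

# Tiny hand-written sample; every value here is a placeholder.
sample_toc = {
    "reporting_entity_name": "Example Payer",
    "reporting_structure": [
        {
            "reporting_plans": [
                {"plan_name": "Plan A", "plan_id_type": "EIN", "plan_id": "00-0000000"},
            ],
            "in_network_files": [
                {"description": "rates part 1", "location": "https://example.com/rates-1.json.gz"},
                {"description": "rates part 2", "location": "https://example.com/rates-2.json.gz"},
            ],
        }
    ],
}

pairs = list(iter_in_network_urls(sample_toc))
for plan_id, url in pairs:
    print(plan_id, url)
```

Resolving URLs from the index on every run, rather than caching them, is what keeps a crawler working when file locations change at the monthly refresh.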

File Size

Why the files are enormous

Three structural forces drive MRF file size to the 50 GB–1 TB range. First, the disclosure requirement itself is a Cartesian product: every billing code, at every contracted provider, under every plan variant. A payer with 10,000 in-network physicians, 500 employer plan variants, and 10 commonly billed codes for a single specialty produces 50 million rows before accounting for modifiers or place-of-service splits. Multiply that across hundreds of specialties and the arithmetic explains the scale.
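The row count in that example is a straight product, worked here with the figures from the paragraph above (variable names are illustrative):

```python
providers = 10_000          # in-network physicians
plan_variants = 500         # employer plan variants
codes_per_specialty = 10    # commonly billed codes for one specialty

rows = providers * plan_variants * codes_per_specialty
print(f"{rows:,} rows")     # 50,000,000 rows
```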

Second, ghost rates — rates published for services a provider will never actually perform — inflate files further. Payers derive rates from contract templates that cover far more procedure codes than any given provider bills. A dermatologist may appear with a published rate for knee replacement. An audiologist may carry rates for cardiac catheterization. Gigasheet’s analysis found that up to 40% of rates in some MRF files fall into this category. These entries are not malicious — they are structural artifacts of how payer contracting works — but they must be filtered before any analysis is trustworthy.

A third driver is denormalized JSON repetition. Rather than referencing shared provider objects by ID, many payers repeat full provider blocks for every rate record. A 200-character provider object repeated across 10 million rate rows adds 2 GB of redundant encoding before the first actual rate value appears.
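The repetition cost can be seen by moving repeated provider blocks into a lookup table and comparing encoded sizes. This is a sketch on synthetic records; the key names (provider, provider_ref, rate) are hypothetical and do not reproduce the CMS schema.

```python
import json

def normalize_providers(records: list[dict]) -> dict:
    """Move repeated provider blocks into a lookup table referenced by integer ID."""
    provider_ids: dict[str, int] = {}
    providers: list[dict] = []
    rates: list[dict] = []
    for rec in records:
        key = json.dumps(rec["provider"], sort_keys=True)  # canonical form for dedup
        if key not in provider_ids:
            provider_ids[key] = len(providers)
            providers.append(rec["provider"])
        rates.append({"provider_ref": provider_ids[key], "rate": rec["rate"]})
    return {"providers": providers, "rates": rates}

# Synthetic records: one provider block repeated for every rate row.
provider = {"npi": "0000000000", "tin": "00-0000000", "name": "Example Clinic"}
denormalized = [{"provider": provider, "rate": 100.0 + i} for i in range(1_000)]
normalized = normalize_providers(denormalized)

before = len(json.dumps(denormalized))
after = len(json.dumps(normalized))
print(before, after)
```

The normalized form stores the provider block once, so its encoded size grows with the number of rates rather than with rates times provider-block length.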

Schema Updates

What changed in 2025–26

The CMSgov price-transparency-guide repository follows semantic versioning. In October 2025 the schema reached version 2.0.0 — the first major-version increment since the rule took effect in July 2022. A major version means CMS introduced at least one breaking change: fields removed, renamed, or with altered semantics. Parsers written for v1.x files need updating before they can reliably process 2026-dated MRFs. The VERSION.md file in the CMS repository is the authoritative changelog.
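A parser can refuse files whose major version it was not written for instead of failing partway through. A minimal sketch, assuming each file carries a top-level version string such as "2.0.0"; the SUPPORTED_MAJORS set and the function names are hypothetical.

```python
SUPPORTED_MAJORS = {1}  # hypothetical: this parser was written against v1.x

def major_version(version: str) -> int:
    """Extract the major component from a semantic version like '2.0.0' or 'v1.3.1'."""
    return int(version.strip().lstrip("vV").split(".")[0])

def can_parse(file_header: dict) -> bool:
    """True when the file's declared schema major version is one we support."""
    return major_version(file_header["version"]) in SUPPORTED_MAJORS

print(can_parse({"version": "1.3.1"}))  # True
print(can_parse({"version": "2.0.0"}))  # False
```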

On enforcement: CMS has increased scrutiny of non-compliant payers since 2024. Civil monetary penalty authority under the Public Health Service Act allows for per-day fines for plans that fail to publish conformant MRFs. Several large insurers received CMS enforcement correspondence in 2024–25, and the agency has signaled continued attention to both completeness and schema conformance.

On ghost-rate exclusions: CMS and several policy groups have proposed rules that would require payers to suppress rates for provider-code combinations where no claim has been filed in the prior 12 months. If finalized, this would materially reduce file sizes: a single major payer file could shrink by 40% or more based on Gigasheet’s ghost-rate estimates. The CMSgov repository’s issue tracker and the Federal Register are the authoritative sources to monitor for finalized changes.

Data Quality

Ghost rates and data quality

Raw MRF data over-counts real-world rates in two ways. The first is ghost rates, described above: a rate record linking a provider to a billing code the provider has never billed. The second is the negotiated_type problem: not all rate values are dollar amounts. A value with negotiated_type = "percentage" means the number is a percentage of billed charges: 40.5 means 40.5%, not $40.50. Mixing these types into a distribution without filtering produces figures that are numerically nonsensical.
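One way to keep the types apart is to group on negotiated_type before computing any statistic. The sketch below uses hand-made records; the rate values are illustrative and the helper name is hypothetical.

```python
from collections import defaultdict
from statistics import median

def median_by_type(prices: list[dict]) -> dict[str, float]:
    """Compute a separate median for each negotiated_type value."""
    groups: dict[str, list[float]] = defaultdict(list)
    for p in prices:
        groups[p["negotiated_type"]].append(p["negotiated_rate"])
    return {t: median(v) for t, v in groups.items()}

# Illustrative records: three dollar rates and one percentage of billed charges.
sample = [
    {"negotiated_type": "negotiated", "negotiated_rate": 1200.0},
    {"negotiated_type": "negotiated", "negotiated_rate": 1500.0},
    {"negotiated_type": "negotiated", "negotiated_rate": 1800.0},
    {"negotiated_type": "percentage", "negotiated_rate": 40.5},
]

by_type = median_by_type(sample)
naive = median(p["negotiated_rate"] for p in sample)
print(by_type)  # {'negotiated': 1500.0, 'percentage': 40.5}
print(naive)    # 1350.0, a number that mixes dollars with a percentage
```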

De-ghosting is the process of removing rates for provider-code pairs where the provider has never (or rarely) billed that code in practice. The cleanest signal for this is claims data — match MRF rate records against actual claim history to flag impossible pairings. Without claims data, a practical proxy is specialty-code coherence: filter out rates where the provider’s taxonomy code is incompatible with the billing code’s clinical domain. Neither approach eliminates all noise, but specialty filtering catches the most egregious entries.
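Specialty-code coherence filtering can be sketched as a lookup from provider taxonomy to plausible code ranges. Everything in the table below is an assumption for illustration: the two taxonomy codes are given as recalled from the NUCC code set and should be verified, and the CPT ranges are deliberately coarse, not a clinically complete mapping.

```python
# Hypothetical coherence table: taxonomy code -> inclusive numeric CPT ranges.
PLAUSIBLE_CPT_RANGES = {
    "207X00000X": [(20000, 29999)],  # assumed: orthopaedic surgery -> musculoskeletal
    "207N00000X": [(10000, 19999)],  # assumed: dermatology -> integumentary
}

def is_plausible(taxonomy: str, cpt: str) -> bool:
    """Keep a rate unless a known rule says the specialty cannot bill the code."""
    ranges = PLAUSIBLE_CPT_RANGES.get(taxonomy)
    if ranges is None or not cpt.isdigit():
        return True  # no rule, or a non-numeric code: keep rather than guess
    return any(lo <= int(cpt) <= hi for lo, hi in ranges)

rates = [
    {"taxonomy": "207X00000X", "cpt": "27447", "rate": 1500.0},  # plausible pairing
    {"taxonomy": "207N00000X", "cpt": "27447", "rate": 300.0},   # ghost-rate candidate
]
kept = [r for r in rates if is_plausible(r["taxonomy"], r["cpt"])]
print(kept)
```

Defaulting to "keep" when no rule exists errs on the side of retaining data; a stricter pipeline might route those records to a review bucket instead.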

Naive analyses fail silently. A median negotiated rate for CPT 27447 (total knee replacement) calculated against raw MRF data will include dermatologists, audiologists, and psychiatrists who carry the code through contract templating. The median will be lower than the true procedure rate because those ghost entries cluster near the template floor rate, not the clinical rate. De-ghosting is not a cleanup step — it is a prerequisite for any number that should reflect what patients and payers actually transact.
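The skew is easy to reproduce on synthetic numbers. The values below are invented purely to illustrate the mechanism and are not drawn from any real MRF.

```python
from statistics import median

# Synthetic, illustrative values only.
clinical = [1400.0, 1500.0, 1600.0]       # providers who plausibly perform the procedure
ghost = [300.0, 300.0, 310.0, 320.0]      # template-floor entries from unrelated specialties

raw_median = median(clinical + ghost)
clean_median = median(clinical)
print(raw_median, clean_median)  # 320.0 1500.0
```

No error is raised in either case, which is why the failure is silent: the raw figure looks like a valid rate.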

CareCost Explorer ships with de-ghosting already applied: specialty-procedure coherence filtering on 36M+ rates across 18 payers.

Discovery

Where every payer publishes their files

The rule does not specify a central registry. Each payer posts its MRFs independently — usually to a dedicated compliance page buried several clicks from the insurer’s homepage. Common patterns include dedicated subdomains (transparency-in-coverage.uhc.com, transparency.emblemhealth.com), third-party platforms like HealthSparq and Zelis, and developer portals with REST-style file listings.

Below are the authoritative index locations for 20 major US payers, verified as of June 12, 2026. Each row includes the URL, the portal type, and monthly cadence.

Payer machine-readable file index locations and verification status.

| Payer | Parent organization | MRF location | Type | Cadence | Verified |
| --- | --- | --- | --- | --- | --- |
| UnitedHealthcare | UnitedHealth Group | transparency-in-coverage.uhc.com | portal | monthly | ✓ 2026-06-12 |
| Aetna | CVS Health | health1.aetna.com | portal | monthly | ✓ 2026-06-12 |
| Cigna | The Cigna Group | www.cigna.com | portal | monthly | ✓ 2026-06-12 |
| Elevance Health (Anthem) | Elevance Health | www.anthem.com | portal | monthly | URL authoritative; anthem.com/machine-readable-file/search/ timed out on direct fetch; page existence confirmed via multiple independent sources and search results as of 2026-06-12 |
| Humana | Humana Inc. | developers.humana.com | portal | monthly | URL authoritative; developers.humana.com/syntheticdata/healthplan-price-transparency returns 502 Bad Gateway on direct fetch; confirmed via search results and GitHub CMS discussions as the authoritative TiC landing URL |
| Kaiser Permanente | Kaiser Foundation Health Plan | healthy.kaiserpermanente.org | portal | monthly | URL authoritative; healthy.kaiserpermanente.org/front-door/machine-readable hit redirect loop on direct fetch; URL confirmed live via multiple independent sources |
| Centene | Centene Corporation | www.centene.com | portal | monthly | ✓ 2026-06-12 |
| Molina Healthcare | Molina Healthcare, Inc. | www.molinahealthcare.com | portal | monthly | URL authoritative; molinahealthcare.com/members/common/mrf.aspx returns bot-detection block on automated fetch; URL confirmed via multiple sources as authoritative TiC landing page |
| Oscar Health | Oscar Health, Inc. | www.hioscar.com | portal | monthly | URL authoritative; hioscar.com/transparency-in-coverage-files/oscar renders minimal content on WebFetch due to SPA architecture; URL confirmed authoritative via CMS and third-party MRF trackers |
| Highmark | Highmark Health | mrfdata.hmhs.com | portal | monthly | ✓ 2026-06-12 |
| HCSC (BCBS Illinois / Texas / Oklahoma / New Mexico / Montana) | Health Care Service Corporation (HCSC) | www.hcsc.com | portal | monthly | ✓ 2026-06-12 |
| Premera Blue Cross | Premera Blue Cross (independent nonprofit) | www.premera.com | portal | monthly | URL authoritative; premera.com/visitor/transparency returns minimal content on WebFetch; migration from sapphiremrfhub.com confirmed via Premera employer communications dated November 2025 |
| Florida Blue (GuideWell) | GuideWell Mutual Holding Corporation | www.floridablue.com | portal | monthly | ✓ 2026-06-12 |
| CareFirst BlueCross BlueShield | CareFirst, Inc. | individual.carefirst.com | portal | monthly | ✓ 2026-06-12 |
| Blue Cross Blue Shield of Massachusetts | Blue Cross Blue Shield of Massachusetts (independent nonprofit) | transparency-in-coverage.bluecrossma.com | portal | monthly | URL authoritative; WebFetch returns only 'MRF' text (page appears to be fully JS-rendered); URL confirmed authoritative via multiple sources including Payerset and direct BCBSMA references |
| Blue Cross Blue Shield of Michigan | Blue Cross Blue Shield of Michigan (independent nonprofit) | www.bcbsm.com | portal | monthly | ✓ 2026-06-12 |
| BlueCross BlueShield of Tennessee | BlueCross BlueShield of Tennessee (independent nonprofit) | www.bcbst.com | portal | monthly | URL authoritative; bcbst.com/tcr returns HTTP 403 on automated fetch; URL confirmed authoritative via BCBST official communications, TN state health dept, and multiple employer compliance documents |
| Blue Cross NC | Blue Cross and Blue Shield of North Carolina (independent nonprofit) | www.bluecrossnc.com | portal | monthly | ✓ 2026-06-12 |
| Regence BlueCross BlueShield | Regence (a division of Cambia Health Solutions; BCBSA licensee) | www.regence.com | portal | monthly | ✓ 2026-06-12 |
| EmblemHealth | EmblemHealth (Group Health Inc. / HIP Health Plan of New York) | transparency.emblemhealth.com | portal | monthly | ✓ 2026-06-12 |

All 20 of these are already parsed and queryable in Explorer.

Three gotchas worth knowing before you build a pipeline:

Cost Framing

Build vs. buy

3–6 engineering weeks is a realistic first-pipeline estimate for a single payer at scale: index discovery, schema reverse-engineering, provider-reference resolution, deduplication, output normalization, and basic monitoring. That is before de-ghosting, which adds its own complexity if you lack claims data for cross-reference.

Storage is a separate cost. 18 payers at an average of 150 GB uncompressed per in-network file set is roughly 2.7 TB per monthly snapshot. Six months of history for trend analysis means 16+ TB of raw source data before you build any queryable layer on top. Cloud object storage at current rates runs $0.02–$0.03/GB/month; 16 TB is $320–$480 per month in storage alone, plus egress when you read it.
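The storage figures follow from a few multiplications, worked here with the paragraph's own inputs (decimal units, 1 TB = 1,000 GB; variable names are illustrative):

```python
payers = 18
gb_per_payer = 150                 # average uncompressed in-network file set
months_of_history = 6
price_low, price_high = 0.02, 0.03  # USD per GB per month, from the text above

snapshot_gb = payers * gb_per_payer            # 2,700 GB, i.e. 2.7 TB
history_gb = snapshot_gb * months_of_history   # 16,200 GB, i.e. 16.2 TB

cost_low = round(history_gb * price_low, 2)
cost_high = round(history_gb * price_high, 2)
print(snapshot_gb, history_gb, cost_low, cost_high)
```

Using the unrounded 16.2 TB gives roughly $324–$486 per month, consistent with the $320–$480 quoted for a rounded 16 TB.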

The monthly refresh treadmill is the steepest ongoing cost. Every month, each payer republishes its files — usually at new URLs. The table-of-contents index must be re-crawled, new file locations located, and the full parse-validate-de-ghost cycle re-run. Payers change field conventions, file structure, and schema compliance between monthly drops. A pipeline that worked in February may throw exceptions on the March file. Monitoring format drift across 18+ payers at thousands of plan files each is ongoing infrastructure — not a one-time project.
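A cheap first line of drift monitoring is to compare the field names seen in a new file against a baseline captured from the previous month. A minimal sketch; the baseline field set and the renamed field in the sample record are hypothetical.

```python
def field_drift(baseline: set[str], record: dict) -> dict[str, set[str]]:
    """Report fields that disappeared or appeared relative to a baseline field set."""
    seen = set(record)
    return {"missing": baseline - seen, "unexpected": seen - baseline}

# Hypothetical baseline captured from last month's rate records.
february_fields = {"negotiated_type", "negotiated_rate", "expiration_date", "billing_class"}

# Hypothetical record from this month, where a field was renamed.
march_record = {
    "negotiated_type": "negotiated",
    "rate": 1500.0,
    "expiration_date": "9999-12-31",
    "billing_class": "professional",
}

drift = field_drift(february_fields, march_record)
print(drift)
```

Running a check like this on a sample of records before the full parse turns a mid-run exception into an early, explicit alert.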

MRF data access approaches: effort and fit

| Approach | Upfront effort | Ongoing effort | Fits when |
| --- | --- | --- | --- |
| Build from scratch | 3–6 eng-weeks per payer | Monthly re-run per payer; schema drift monitoring | You need full data ownership and have the engineering capacity |
| Open-source partial parsers | Days to integrate; significant gap-filling | Ongoing maintenance as schema evolves | Single-payer research with a technical team |
| Pre-parsed datasets (files/Parquet) | Hours to load; normalized already | Negotiate refresh terms; verify de-ghosting methodology | Analytics or research work where you receive a data drop |
| Query layer / API (rent, don't build) | Hours to integrate | Subscription cost; no infra to maintain | You need answers, not raw files; time-to-value is the constraint |

For teams that need rates rather than infrastructure, CareCost Explorer already has this pipeline running across 18 payers. The same rates, already streamed, de-ghosted, provider-resolved, and normalized, are queryable via API or dashboard. That is not the right fit for every organization, but it is worth pricing before committing engineering time to a build that replicates existing infrastructure.

Found a factual error? Email hello@carecostexplorer.com. All regulatory claims trace to the TiC final rule and CMSgov/price-transparency-guide.

Frequently Asked Questions

What is a machine-readable file?

A machine-readable file (MRF) is a structured JSON document that a health insurer must publish under the federal Transparency in Coverage rule (45 CFR Part 147). Each file lists negotiated rates between the insurer and its in-network providers, formatted to a schema maintained by CMS at github.com/CMSgov/price-transparency-guide. Files are published to a public URL and refreshed at least monthly.

Are payer MRF files free?

Yes. The Transparency in Coverage rule mandates that all payers publish machine-readable files at no cost. The economic reality: the burden is entirely on you to download and process them. The engineering cost of accessing, storing, and normalizing 50+ TB of monthly MRF updates is substantial — which is why data companies and aggregators charge for parsed MRF data.

Are MRFs the same as hospital price transparency files?

No — these are two separate federal requirements. Hospital price transparency (45 CFR Part 180) applies to hospitals and requires a machine-readable chargemaster plus a consumer-facing shoppable services file. The Transparency in Coverage rule (45 CFR Part 147) applies to health insurers and requires MRFs listing negotiated rates with every in-network provider. Hospitals publish their own files; insurers publish theirs. The file formats, schemas, and enforcement agencies differ.

How often do payers update MRFs?

Monthly, per the federal mandate. In practice, the posting date inside a file's last_updated_on field is not always trustworthy — some payers republish old content under new filenames without updating rates. Verify freshness with the file's HTTP Last-Modified header or by hashing the content against the prior month's version.
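The hashing check can be sketched with the standard library. The function names and the in-memory sample payloads are illustrative; a real check would stream the downloaded files.

```python
import hashlib
import io

def content_sha256(fileobj, chunk_size: int = 1 << 20) -> str:
    """Hash a file-like object in chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def is_unchanged(previous, current) -> bool:
    """True when two file-like objects have byte-identical content."""
    return content_sha256(previous) == content_sha256(current)

# Illustrative payloads standing in for two monthly files.
february = b'{"last_updated_on": "2026-02-01", "in_network": []}'
march_republished = b'{"last_updated_on": "2026-02-01", "in_network": []}'
march_updated = b'{"last_updated_on": "2026-03-01", "in_network": [{"rate": 1}]}'

print(is_unchanged(io.BytesIO(february), io.BytesIO(march_republished)))  # True
print(is_unchanged(io.BytesIO(february), io.BytesIO(march_updated)))      # False
```

Two caveats: hash the decompressed stream, since gzip headers can differ for identical content, and note that a whole-file hash only catches byte-identical republishing. A file whose date field changed but whose rates did not will still hash differently, so comparing hashes of the rate records alone is the stricter test.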

Can consumers use MRFs directly?

In theory yes; in practice no. A single in-network file from a major payer runs 50 GB to 1 TB uncompressed. The files require a streaming JSON parser, provider-reference resolution, and de-ghosting before any useful comparison is possible. The federal consumer-facing cost estimator tool was the intended consumer interface; MRFs are the data layer behind it, designed for software systems and data aggregators.

Everything in this guide, already done.

36M+ rates from 18 payers' MRFs — parsed, de-ghosted, normalized, queryable.