The Centers for Medicare and Medicaid Services (CMS) is increasingly focused on identifying and preventing “fraud, waste and abuse” within vital healthcare programs like Medicaid. This effort, which spans various coverage types and provider services, received a boost in November 2025 with a letter from CMS to states outlining opportunities for collaborative action. The agency’s commitment to program integrity dates back to 2010, with the establishment of the Center for Program Integrity (CPI) aiming to shift from a reactive “pay and chase” approach to one driven by data analytics. On February 14, 2026, CMS released a new dataset intended to aid in this effort, offering provider-level spending information that could potentially highlight unusual billing patterns.
The newly released data, available on the HHS Open Data platform, represents a significant step toward transparency in Medicaid spending. However, understanding what the data *don’t* show is just as crucial as understanding what they do. While the dataset offers a granular look at certain aspects of Medicaid payments, its limitations require careful interpretation to avoid drawing inaccurate conclusions about provider behavior or program efficiency. The core of the dataset focuses on outpatient services, but a substantial portion of Medicaid spending remains outside its scope.
What’s Included in the Medicaid Spending Data?
The dataset comprises seven key data points for each record: the National Provider Identifier (NPI) of both the billing and servicing provider, the Healthcare Common Procedure Coding System (HCPCS) code for the service provided, the month and year of service, the number of beneficiaries seen, the number of procedures delivered (claim count), and the total amount paid. Because each record reports beneficiary and claim counts, it summarizes a group of claims rather than describing a single one. This information covers both fee-for-service claims and payments made by Medicaid managed care organizations between 2018 and 2024.
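To make the shape of the data concrete, here is a minimal Python sketch of one record and a simple roll-up by billing provider. The field names are hypothetical, since the actual column headers of the published file are not given here, and the sample rows are invented purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpendingRow:
    """One record of the provider-level file. Field names are assumptions."""
    billing_npi: str
    servicing_npi: str
    hcpcs_code: str
    service_month: str   # year and month of service, e.g. "2023-01"
    beneficiaries: int
    claims: int
    total_paid: float


def paid_by_billing_provider(rows):
    """Sum total_paid for each billing NPI."""
    totals = defaultdict(float)
    for row in rows:
        totals[row.billing_npi] += row.total_paid
    return dict(totals)


# Invented rows, not real data. NPIs and amounts are placeholders.
sample_rows = [
    SpendingRow("1111111111", "2222222222", "T1019", "2023-01", 40, 900, 18000.0),
    SpendingRow("1111111111", "2222222222", "T1019", "2023-02", 42, 950, 19000.0),
    SpendingRow("3333333333", "3333333333", "90834", "2023-01", 25, 60, 4800.0),
]

totals = paid_by_billing_provider(sample_rows)
```

A total like this is only a starting point: it says how much a billing entity was paid, not whether that amount is unusual.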
Significant Omissions and Their Implications
Despite its detail, the dataset deliberately excludes crucial information. Notably, it contains no records related to institutional spending, such as hospital care, which represents 37% of total Medicaid expenditures and remains the single largest spending category. Similarly, prescription drug costs are absent. These omissions alone mean the dataset provides an incomplete picture of overall Medicaid spending. Beyond these broad exclusions, several other critical data points are missing, hindering a comprehensive assessment of service volume and spending.
One key omission is enrollment data. The number of individuals eligible for Medicaid fluctuates based on state policies, economic conditions, and demographic shifts. Without accounting for enrollment numbers, comparing service use across time or geographic locations becomes problematic. Similarly, variations in benefits and coverage offered by different states, and changes to those benefits over time, aren’t reflected in the data. Payment rates, which are influenced by local costs of living and state-level decisions, are also not included. Crucially, the dataset lacks diagnostic information (the underlying medical conditions for which procedures are performed) and details about the place of service (in-person vs. telehealth) or other modifiers that provide context to the services delivered.
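The enrollment point can be shown with a little arithmetic. The sketch below assumes enrollment counts have been obtained from a separate source, because the dataset itself does not carry them, and every figure is invented for illustration.

```python
def paid_per_enrollee(total_paid, enrollment):
    """Return spending per enrolled person, or None when enrollment is unknown."""
    if not enrollment:
        return None
    return total_paid / enrollment


# Invented figures: total spending rises 20%, while enrollment rises 25%.
spending = {"2019": 1_000_000.0, "2022": 1_200_000.0}
enrollment = {"2019": 10_000, "2022": 12_500}  # would have to come from outside the dataset

per_enrollee = {
    year: paid_per_enrollee(spending[year], enrollment.get(year))
    for year in spending
}
# per_enrollee["2019"] is 100.0 and per_enrollee["2022"] is 96.0
```

In this made-up case the raw total suggests growth, while spending per enrollee actually fell. Without the enrollment denominator, the two situations cannot be told apart.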
Potential for Misinterpretation and the Importance of Context
CMS acknowledges the potential for data analytics to identify problematic patterns, but warns against relying on this dataset in isolation. The agency highlights several shortcomings that could lead to inaccurate conclusions. For example, the procedures coded aren’t always directly comparable. CMS points to “personal care” as an example, noting that the single procedure code encompasses services ranging from 15 minutes to a full day, while psychotherapy is broken down into codes based on visit length (30, 45, or 60 minutes). This makes direct comparisons misleading.
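One cautious way to respect this limitation is to compare a provider's payment per claim only against other providers billing the same code, rather than across codes. The sketch below is an illustration under that assumption, not a method CMS describes; the provider labels, codes, and amounts are invented. As the personal care example shows, even a within-code comparison can mislead when a single code covers very different amounts of service.

```python
from collections import defaultdict
from statistics import median

# Invented rows: (billing provider, procedure code, claim count, total paid).
# Each (provider, code) pair is assumed to appear only once.
rows = [
    ("A", "CODE_X", 100, 2000.0),
    ("B", "CODE_X", 100, 2200.0),
    ("C", "CODE_X", 100, 6000.0),
    ("D", "CODE_Y", 100, 9000.0),
    ("E", "CODE_Y", 100, 9500.0),
]


def ratio_to_code_median(rows):
    """Paid-per-claim for each provider, divided by the median for the same code."""
    per_claim = {
        (provider, code): paid / claims
        for provider, code, claims, paid in rows
        if claims
    }
    by_code = defaultdict(list)
    for (_, code), value in per_claim.items():
        by_code[code].append(value)
    medians = {code: median(values) for code, values in by_code.items()}
    return {key: value / medians[key[1]] for key, value in per_claim.items()}


ratios = ratio_to_code_median(rows)
```

Here provider D is paid more per claim than provider C in raw terms (90 versus 60), yet D sits close to the median for its own code while C is well above the median for its code. A cross-code ranking would point at the wrong provider.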
The nature of the “providers” themselves also complicates analysis. According to CMS, 10 of the 20 largest “providers” identified in their example are actually state or local government agencies responsible for both administering and delivering Medicaid benefits, rather than traditional healthcare providers. States employ diverse approaches to benefit delivery, with health departments often directly providing services, particularly for behavioral health and individuals with developmental disabilities.
Perhaps most concerning is the limited information available regarding the data’s quality and methodology. The underlying data originates from the Transformed Medicaid Statistical Information System (T-MSIS), a rich source but one known to contain inconsistencies. CMS maintains a data quality atlas to highlight potential issues, but it remains unclear how these issues were addressed when creating the public dataset. The atlas reports that in 2024, data for six states were unusable and another 16 were of “high concern” regarding total spending information. It’s unknown whether this problematic data was included in the released dataset or if a different version of T-MSIS was used.
Finally, the dataset lacks the crucial context of the COVID-19 pandemic, which began in 2020. The pandemic led to significant shifts in Medicaid enrollment, driven by the continuous enrollment period, and increased awareness of unmet needs in behavioral health and long-term care. Changes in state policies regarding coverage, eligibility, and provider payment rates during this period further complicate interpretation of spending trends between 2018 and 2024.
The release of this Medicaid provider spending data represents a positive step toward greater transparency. However, a nuanced understanding of its limitations is essential to avoid misinterpretations and ensure that data-driven insights contribute to effective program integrity efforts. Future data releases should prioritize addressing these limitations, including incorporating institutional spending, prescription drug costs, and more detailed information about enrollment, benefits, and data quality. CMS is expected to provide further guidance on data interpretation and best practices in the coming months.
