top of page

Healthcare Data Analytics and Data Mining

Project 01 - Healthcare Insurance Market Data Analytics

Insurance Enrollment Dataset from CMS

Introduction:

In this paper, we present an analysis of data from the insurance enrollment market in nine states in the US. Based on the data for Medicare Advantage Plans from the CMS website databases we analyze and discuss how the insurance market is organized and focused and which insurer has the largest market share in each state to assist the US Congress through our consulting company “Brand Health Analytica Ltd” to decide on a quasi private single payer model for insurance in each state and select the best insurance provider for partnership. We have been assigned the task of making a set of decisions for the states of Arizona, Georgia, Kentucky, Nevada, Rhode Island, Texas, Vermont, West Virginia and Wisconsin.

​

Methods: 

Data files for this project were downloaded from the CMS website. We used Python to analyze the data and Microsoft Excel and Tableau to generate tables and graphs for visualization. We excluded sections with missing data for enrolled beneficiaries. We used a second set of data to identify which larger insurance companies are offering contracts for smaller ones. Our analysis was focused on identifying potential candidates for a quasi private single payer partnership for each state, what benefit packages were being offered by the major insurance companies in those states and finally, which insurance companies are most mindful of opioid crisis and what steps they are taking to mitigate the human and economic toll of the crisis in their controlled states.  

​

Question 1

Based on our analysis of the “CPSC_Enrollment_Info_2022_01.csv” database, we found that among the states of Arizona, Georgia, Kentucky, Nevada, Rhode Island, Texas, Vermont, West Virginia, and Wisconsin:

​

  1. The company holding the largest market share is United Health Care in Vermont, which is 70.91%

​​

  • The Herfindahl-Hirschman Index (HHI) is a commonly accepted measure of market concentration. It is calculated by squaring the market share of each firm competing in a market and then summing the resulting numbers. It can range from close to zero to 10,000. We found that the four states with highest Herfindahl-Hirschman Index(HHI) are Vermont, West Virginia, Rhode Island, and Arizona. The companies with the largest market share in each of these four states are listed in Table(1). 

​

​

​

​

​

​

​

​

​

​

​

​​Table (1).  Major insurance companies with the largest market share in each of the 4 states ranked on the basis of Herfindahl-Hirschman Index (HHI).

​

​

​

​

​

 

 

 

 

 

 

 

 

 

 

 

​

Figure (1): Representation of market shares of major insurance companies across the states of Arizona, Georgia, Kentucky, Nevada, Rhode Island, Texas, Vermont, West Virginia, and Wisconsin

​

​

​

​

​

​

​

 

 

 

 

 

 

 

 

 

​

​

Figure(2). Calculated Herfindahl-Hirschman Index (HHI) for the states of Arizona, Georgia, Kentucky, Nevada, Rhode Island, Texas, Vermont, West Virginia, and Wisconsin

​

​

Question 2

In order to determine the extent of benefit for dental care offered by the insurance programs, we investigated the coverage of dental services offered by the insurance companies with the highest market shares in the states assigned to us. We obtained the percentages of enrollees receiving comprehensive dental benefits and preventive dental benefits from the total number of enrollees for the top 5 highest market share companies in each state as shown in tables(2) and (3) 

​

​

​

​

​

​

 

 

 

 

 

 

 

 

 

 

​

​

​

​

​

​

​

​

​

​

​

​

​

​

​

Table (2). Percentage enrollees with comprehensive dental benefits and percentage market shares of major insurance companies in Arizona, Georgia, Kentucky, Nevada, Rhode Island, Texas, Vermont, West Virginia, and Wisconsin

​

​

​

​

​

 

 

 

 

 

 

 

 

 

 

 

 

​

Figure(3): Percentage of enrollees with comprehensive dental benefits in insurance companies with major market shares in the states of AZ, GA, KY, NV, RI,TX, VT,WI and WV.

​

​

​

​

​

​

​

​

​

​

 

 

 

 

 

 

 

 

 

 

Figure(4): Percentage of enrollees with preventive dental benefits in insurance companies with major market shares in the states of AZ, GA, KY, NV, RI,TX, VT,WI and WV.

 

Question 3

To see which insurance companies are the more proactive in making sure their members are not on an unsafe dose of prescription opioids for a prolonged time period, we used data from “MA HEDIS Public Use Files 2021”, the latest update of the HEDIS measure set for the Medical Advantage insurance plans.

 

Firstly, we filtered out all the contracts labeled with “HDO” from the “hedis_measures” tab and stored CMS Contract Numbers and associated rates in one table called “Measures”. Secondly, we selected the “CMS Contract Number” column, “Plan Name” column and “General-0050” (12/2020 Enrollment as reported by the Medicare Advantage Prescription Drug (MARx) system) column of tab “general” to form the table “Company”. At last, we used the database from Question 1 to get the market share of each company in each State, stored in table “Share”.

 

To build our database, we connect the above 3 tables using Contract number. The schema of our database is shown by the EER Diagram below.

​

​

​

​

​

​

​

​

​

​

​

​

​

1.png
2.png
3.png
4.png
5.png
6.png
7.png

Project 02 - Healthcare Provider data analytics

​

NPPES Provider Database

Introduction

This paper presents a brief analysis of the National Plan and Provider Enumeration System (NPPES) database which contains information on healthcare providers in the US. The dataset helps provide an insight into the supply side of the US healthcare market. This database is obtained from the Centers for Medicare and Medicaid Services CMS which is a federal agency that oversees major healthcare programs in the US. The CMS provides health coverage to more than 100 million people through Medicare, Medicaid, the Children’s Health Insurance Program, and the Health Insurance Marketplace. It seeks to strengthen and modernize the US Nation’s health care system, to enable it to provide access to high quality care and improved health.

 

Methods

Files were downloaded from the CMS data warehouse and inputted into Python for coding and analysis. We used “pandas” to import data and retrieve useful information and used “scipy” to conduct statistical tests. Our analysis focused on specific geographic areas assigned to us which includes 9 states namely Arizona, Georgia, Kentucky, Nevada, Rhode Island, Texas, Vermont, West Virginia, Wisconsin.

 

Results

Question 1

This section shows the list of doctors for each group member and the state in which said doctor was first licensed. Group members are ordered alphabetically by last name and we used the provider's NPI to show the state in which they were licensed.  

  1. The healthcare provider of member a was first licensed in MA.

  2. The healthcare provider of member b was first licensed in MA.

  3. The healthcare provider of member c was first licensed in MA.

  4. The healthcare provider of member d Chitra was first licensed in MI.

  5. The healthcare provider of member e was first licensed in MA.

 

Question 2

In this section, we ran a statistical test to explore gender differences in practicing as a “Sole Proprietor”. Data presented here includes only states that were assigned to group two (Arizona, Georgia, Kentucky, Nevada, Rhode Island, Texas, Vermont, West Virginia, and Wisconsin). We considered only values with Male/Female (M/F) from the gender variable and values with Y/N for Sole Proprietor to run a Fisher’s Exact Test for a 2x2 table to test for the gender preference. We excluded the null values during the data cleaning process. Numbers of male and female being a sole proprietor or not are calculated before conducting Fisher’s Exact Test. After the test, the result showed a p-value of 0.6369 which is greater than 0.05. This is not statistically significant; therefore we don't reject the Null Hypothesis. In other words, we don’t have sufficient evidence to say that there is a significant association between gender and preference for sole proprietorship. The data showed that 32% female, as well as 32% of male had sole proprietorship. 

 

​

​

​

​

​

​

​

​

 

Figure 1. Relationship between sole proprietorship and gender among healthcare providers.

 

Question 3

In the section, we conducted a statistical analysis to test the hypothesis that male doctors are more likely than their female peers to choose the practices that are associated with higher risk for a higher reward. We did a 2x2 crosstab (2 practice categories X 2 genders) followed by a Fisher’s Exact Test to generate a p value to derive a conclusion. 

​

For this analysis, only data from states assigned to group 2 was considered. We cross-tabulated providers designated as male or female by low and high risk-reward categories. Low risk/reward category included “Obstetrics & Gynecology” and “Pediatrics” and the high risk/reward category included “Surgery” and “Orthopedic Surgery”. We found the Health Care Provider Taxonomy Code Set for all four categories.

​

​

​

​

​

​

​

​

Table 1. Providers with high risk and low risk practices performed based on gender.

 

 

As shown in the table 1, among the 4 mentioned practice categories, male doctors performed 53.1% of high-risk practices, whereas the ratio for female doctors is only 12.4%. To test if there is a statistically significant difference between male and female doctors in choosing practices with different risk levels, we conducted the Fisher Exact test for the following Hypothesis:

H0: Males are less or equally likely to do high risk practices, compared to their female peers.

Ha: Males are more likely to do high risk practices, compared to their female peers.

Based on the analysis, we concluded that in the designated states of Arizona, Georgia, Kentucky, Nevada, Rhode Island, Texas, Vermont, West Virginia and Wisconsin, there was an observed imbalance that is statistically significant (p = 0.00 < 0.001); males prefer to choose the high-risk practices while females prefer to do low-risk practice.​

​

Question 4

We aim to find the density of Magnetic Resonance Imaging (MRI) centers per state. Thus, after filtering it to obtain healthcare facilities in the national database, we found the number of MRI centers setting the code “Healthcare Provider Taxonomy Code_1” variable equal to 261QM1200X. We found MRI centers using the code “Healthcare Provider Taxonomy Code_1” variable in the national database after filtering it to obtain healthcare facilities. 

In order to calculate the MRI density (per 1,000,000 population) for all the states, we looked up data for US population by states on the United States Census Bureau. We used US Census Bureau population estimates as of July 1, 2021 for the population statistics. (https://data.census.gov/cedsci/table?tid=PEPPOP2021.NST_EST2021_POP)

We calculated the MRI density (per 1,000,000 population) for all the states. We found the MRI density to be highest in the states of Texas, Wyoming and Florida, with Florida having the highest density of MRIs per 1,000,000 population compared to all other states. Table 2 shows the ranked MRI density across states. We visualized MRI density across states in the US by graphing a heatmap on Tableau. (Figure.2)

​

​

​

​

 

 

 

 

 

 

​

​

​

​

​

​

​

​

​

 

Table 1: MRI density for each state in the United States

​

The high MRI density noted in Florida could be due to the fact that it is one among the ten most populated states in the US. Also, Florida has the second highest number of Americans of age 65 or older. The presence of a larger number of aged individuals in the state could predispose them to increasing investment in medical equipment that can be utilized to enhance diagnostic and treatment modalities in the clinical setting (Killduff, 2021).

​

In addition, Florida has the second highest number of medical device manufacturing companies in the country (https://business.orlando.org/l/medical-technology/) which could explain why the density of MRI centers and facilities is high. Fast growing medical device companies have set up a market in Orlando, Florida because of its recognized medical device training expertise and expansive workforce, making the area one of the top cities for medical device companies.

 

​

​

​

​

​

 

 

 

 

 

 

 

 

 

Figure 2: graphic visualization of MRI density per 1,000,000 population in different states. The darker states have higher MRI densities.

 

 

References

Killduff, L. (2021, December 22). Which U.S. states have the oldest populations? PRB. Retrieved January 30, 2022, from https://www.prb.org/resources/which-us-states-are-the-oldest/

Prince, C. J., Keller, M., Braun, J., & Roche, A. (2021, September 30). Medical Technology: Orlando Economic Development. Orlando Economic partnership. Retrieved January 30, 2022, from https://business.orlando.org/l/medical-technology/

United States Census Bureau. (n.d.). NST_EST2021_POP Annual Estimates of the Resident Population for the United States, Regions, States, District of Columbia, and Puerto Rico: April 1, 2020 to July 1, 2021. Retrieved January 30, 2022, from https://data.census.gov/cedsci/table?tid=PEPPOP2021.NST_EST2021_POP

hw1-p1.png
hw1-p2.png
hw1-p3.png
hw1-p4.png

Project 03 - Healthcare Insurance Claims analytics

Question 1: Patient Vignettes

We made use of Inpatient, Emergency Department and Revenue dataset from All Payer Claim Data APCD datasets.  Firstly, we plan to do some initial analysis of some insurance claim records in order to get a better sense of insurance claims data. Thus, we tried to tell patient stories using UNIQ Identifiers to position individuals who had insurance claims record. The UNIQ Identifiers we chose were: 254, 507033, 40436, 859382, 1585831, 200760, 3692, 690326. We plan to tell patients' stories of admissions one by one and just show 3 patients stories here.

 

254 - This is a woman aged at least 75. Her doctor requested a bed to be reserved for her and she was transferred from another hospital and checked in at the admission office of Mount Ascutney Hospital And Health Center. She spent 46 days at the hospital, with a total charge of $67375.25. She was diagnosed to be in a weak health condition with multiple infections throughout the body, anemia, Gastro-esophageal reflux disease without esophagitis and various complications that needed specified aftercare. At the hospital, her three principal procedures were transfusion of red blood cells, insertion of an infusion device into superior vena cava and inspection of nasal She was discharged from the hospital with instruction to get home health care afterwards. As a senior patient, she would get her payment for health services covered by the government with Medicare.

 

507033 -This is a woman aged between 25-29, who was then 40 weeks pregnant. She had elective surgery and successfully delivered a baby at Northwestern Medical Center. After all those procedures, she left for home within 1 day. The total charge was $3233.29 including the fees for surgeries, medicines, a laboratory clinical diagnostic, a semi-private room, and etc. Her insurance company, Blue Cross, will pay for it.

 

40436 - This is a woman aged between 70-74. She was transferred to the University of Vermont Medical Center from another hospital and the situation is urgent. She was suffering from some kind of familial heart disease and other diseases of the circulatory system and this might explain why she was diagnosed with many cardiovascular diseases. Aspirin is a recommended medicine in the treatment of cardiovascular disease. Long term use of aspirin may be the reason for her asthma. At the same time she was a heavy smoker, which may exacerbate the asthma. Her nicotine dependence might be a result of major depression. Besides, she had diabetes, which explained her hyperlipidemia because diabetics have a higher chance of getting the disease.

 

Luckily, after all the procedures, she was getting well and returned home within one day. The total charge for her was $70275.41, which includes surgeries, medicines, laboratory clinical diagnostic, various kinds of image services and radiological and cardiological diagnostics, a semi-private room and an observation room, and etc. Since the lady is beyond 70 years old, MEDICARE will take care of all the costs.

​

​

Question 2: Service and Cost Profile of Major Insurances from 2016 Inpatient

​

DRGs (Diagnostic Related Groups) is used to set prices for inpatient hospitalizations. DRGs are further classified in Major Diagnosis Groups or MDCs. For this question, we combined all hospital admissions of the two major commercial insurances of “BLUE CROSS” + “COMMERCIAL INSURANCE” and used the  used the MDCs to analyze and present a graphical view of the “inpatient services portfolio” for each insurance. For medicare, the largest share in percentages (21%) of all costs is represented by the MDC musculoskeletal, while heart and circulatory is the second. For medicaid, neonatal and musculoskeletal both represent the largest share with 13% each. For commercial payer, musculoskeletal represents 22% of the slides while heart and circulatory MDC is second representing 13% of all MDCs. 

​

​

​

​

​

​

​

​

​

​

​

​

 

Figure 1: Graphical view of the inpatient services cost portfolio for Medicare. This Pie chart represents the MDC’s cost share in percentages

​

​

​

​

​

​

​

​

​

​

​

​

Figure 2: Graphical view of the inpatient services cost portfolio for Medicaid. This Pie chart represents the MDC’s cost share in percentages

​

​

​

​

​

​

​

​

​

​

​

​

Figure 3: Graphical view of the inpatient services cost portfolio for Commercial Payers. This Pie chart represents the MDC’s cost share in percentages

2.png
1.png
3.png

03

Upcoming

04

Upcoming

bottom of page