Data Science in the Health Care Industry

It’s no secret that the U.S. healthcare industry is expensive. Health care expenses represented 19.7% of the GDP in 2020. Some say big data is the answer. Accelerating costs are forcing payers and healthcare providers to shift from a fee-for-service approach (the more treatments, the better) to one that favors patient outcomes (rewarding providers for targeted treatments that actually work). Additionally, physicians review data from a range of sources before making treatment decisions.

Big Data in Healthcare

Big data has the potential to help physicians make better decisions about personalized treatments and preventive care.

Doctors can merge and analyze data sets from clinical trials, electronic medical records, online patient networks, and genomics research. Doctors also now have the ability to map entire DNA sequences to assess health. One of the goals is to create a personalized treatment plan based on individual biology. Instead of treating your patient with a drug that works some of the time, you can employ data science to custom-tailor a regimen for a specific person.

Researchers aim to achieve personalized care. As a first step, the FDA has begun to issue medicine labels that specify different dosages for patients with particular genetic variants.

However, prevention is always better than a cure and it also helps patients and hospitals save money. Mount Sinai used predictive analytics to reduce readmission rates. They combined data on disease, past hospital visits and other factors to determine a patient’s risk of readmission. These high-risk patients would then receive regular communication from hospital staff to help them avoid getting sick again.
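
As a rough illustration of how disease data and past hospital visits might be combined into a readmission risk, here is a minimal Python sketch. It is not Mount Sinai's actual model: the `Patient` fields, the weights, the intercept and the 0.5 threshold are all hypothetical values chosen for the example, and a real system would learn its weights from historical admission data.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    chronic_conditions: int  # number of diagnosed chronic diseases
    visits_last_year: int    # hospital visits in the past 12 months
    age: int


# Hypothetical weights for illustration only; a real model would
# estimate these from historical admission records.
WEIGHTS = {"chronic_conditions": 0.6, "visits_last_year": 0.4, "age_over_65": 0.8}
INTERCEPT = -3.0


def readmission_risk(p: Patient) -> float:
    """Combine the factors into a score between 0 and 1 with a logistic function."""
    z = (
        INTERCEPT
        + WEIGHTS["chronic_conditions"] * p.chronic_conditions
        + WEIGHTS["visits_last_year"] * p.visits_last_year
        + WEIGHTS["age_over_65"] * (1 if p.age > 65 else 0)
    )
    return 1.0 / (1.0 + math.exp(-z))


def flag_high_risk(patients, threshold=0.5):
    """Return the IDs of patients whose score meets the (assumed) threshold."""
    return [p.patient_id for p in patients if readmission_risk(p) >= threshold]


patients = [
    Patient("A", chronic_conditions=0, visits_last_year=1, age=40),
    Patient("B", chronic_conditions=3, visits_last_year=4, age=72),
]
high_risk = flag_high_risk(patients)  # candidates for follow-up calls
```

In this toy data set only patient B crosses the threshold, so only B would be placed on the list for regular follow-up from hospital staff.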

Doctors can do a lot, but they can’t follow a patient around every minute of the day. Wearable body sensors – sensors tracking everything from heart rate to testosterone to body water – can. Sensors are just one way in which medical technology is moving beyond the hospital bed. Home-use medical monitoring devices and mobile applications are cropping up daily. A scanner to diagnose melanomas? A personal EKG heart monitor? No problem. These gadgets are designed to help the patient, naturally, but they’re also busy harvesting data.

For example, Propeller’s GPS-enabled tracker records inhaler usage by asthmatics. This information is collected, analyzed and merged with data on asthma catalysts from the CDC (e.g., high pollen counts in New England) to help doctors learn how best to prevent attacks. The Pittsburgh Health Data Alliance brings together three schools of specialty in healthcare, technology, and data to mine data and provide reports and analysis to researchers, practitioners and the public.
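
Merging inhaler usage with environmental data, as in the asthma example, could look something like the Python sketch below. The event tuples, the pollen table and the function name are made up for illustration; they do not reflect Propeller's or the CDC's real data formats.

```python
from collections import defaultdict
from datetime import date

# Hypothetical inhaler events: (patient_id, date, region)
inhaler_events = [
    ("p1", date(2022, 4, 1), "New England"),
    ("p2", date(2022, 4, 1), "New England"),
    ("p1", date(2022, 4, 2), "New England"),
    ("p3", date(2022, 4, 2), "Southwest"),
]

# Hypothetical environmental data keyed by (date, region)
pollen_levels = {
    (date(2022, 4, 1), "New England"): "high",
    (date(2022, 4, 2), "New England"): "low",
    (date(2022, 4, 2), "Southwest"): "low",
}


def usage_by_pollen_level(events, pollen):
    """Join each inhaler event to that day's regional pollen level and count uses."""
    counts = defaultdict(int)
    for _patient_id, day, region in events:
        level = pollen.get((day, region), "unknown")  # no reading for that day/region
        counts[level] += 1
    return dict(counts)


usage = usage_by_pollen_level(inhaler_events, pollen_levels)
```

With enough real data, a tally like this could hint at which environmental conditions coincide with heavier inhaler use, which is the kind of pattern doctors would then investigate.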

Big data also has the potential to contribute to a fully digital and unprecedentedly comprehensive electronic health record (EHR). This one precious file would contain every piece of information about a patient’s health, would always be up to date, and could be shared across any network. A patient’s file could contain structured data from every one of the patient’s health care providers (e.g., lab results, demographic information, prescription histories, etc.), unstructured professional data (e.g., notes from clinicians, physicians, PCPs and nurses), unstructured personal data (e.g., notes from in-home caregivers, family members, patients and social workers), and saved images from x-rays and MRI scans.
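
To make the shape of such a record more concrete, here is one possible way to model it in Python. This is only a sketch: the class and field names are assumptions made for this example and do not follow any real EHR standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuredEntry:
    provider: str
    kind: str   # e.g. "lab_result", "demographic", "prescription"
    value: str


@dataclass
class Note:
    author_role: str  # e.g. "nurse", "physician", "family member"
    text: str


@dataclass
class ImageRef:
    modality: str  # e.g. "x-ray", "MRI"
    uri: str       # hypothetical pointer to where the image is stored


@dataclass
class HealthRecord:
    patient_id: str
    structured: List[StructuredEntry] = field(default_factory=list)
    professional_notes: List[Note] = field(default_factory=list)  # unstructured, clinical
    personal_notes: List[Note] = field(default_factory=list)      # unstructured, personal
    images: List[ImageRef] = field(default_factory=list)

    def summary(self) -> dict:
        """Count how many items of each kind the record holds."""
        return {
            "structured": len(self.structured),
            "professional_notes": len(self.professional_notes),
            "personal_notes": len(self.personal_notes),
            "images": len(self.images),
        }


record = HealthRecord(patient_id="example-001")
record.structured.append(StructuredEntry("Clinic A", "lab_result", "HbA1c 5.6%"))
record.professional_notes.append(Note("nurse", "Patient reports mild fatigue."))
record.images.append(ImageRef("x-ray", "images/example-001/chest.png"))
counts = record.summary()
```

Keeping the four kinds of content in separate lists mirrors the distinction drawn above between structured data, professional notes, personal notes and images.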

Data Scientist Demand in Healthcare

Many industries are facing big data analytics skills gaps. In the healthcare industry, most data is unstructured and difficult to analyze. Organizations are looking for analysts and data scientists who have mid-level to senior-level experience. According to a study in the Journal of the American Medical Informatics Association, data scientists are sought to help with patient outcomes, quality measurement, and reporting of finances. Most positions require a bachelor’s degree, though some require a master’s degree. With the right talent, hospitals and healthcare organizations are able to organize their data and extract actionable insights.

History of Data Analysis and Health Care

In 1952, during the Korean War, a physicist and dentist named Robert Ledley was offered a job at the National Bureau of Standards (now known as the National Institute of Standards and Technology). There he encountered the Standards Eastern Automatic Computer (SEAC). Ledley realized that SEAC could perform complex calculations that no human could hope to tackle. He saw that physics, mathematics and computers might be combined to solve biomedical problems. In 1959, Ledley teamed up with a radiologist to publish an article that covered symbolic logic, probability and value theory, and educated physicians on the potential of databases and electronic diagnosis. The drive to computerize medicine had begun in earnest.

The 1960s saw the introduction of MEDLARS/MEDLINE, a computerized bibliographic database compiled by the National Library of Medicine, along with research and experimentation with programming languages. One of these languages was the Massachusetts General Hospital Utility Multi-Programming System (MUMPS). Developed by Neil Pappalardo, Curtis Marble and Robert Greenes from 1966 to 1967, MUMPS powered the creation and integration of medical databases. By the early 1970s, it was the most commonly used programming language for clinical applications.

The 1960s were, you might say, the decade of peace, love and data:

  • Early 1960s: Morris Collen, a physician with Kaiser Permanente’s Division of Research, developed a system to automate the 10-year-old multiphasic health screening exam and a prototype electronic health record.
  • 1965: Work commences on Systematized Nomenclature of Pathology (SNOP), an effort to systematize the language of pathology for use in computer systems. In 1974, this was extended to include all medical terms – the famous Systematized Nomenclature of Medicine (SNOMED).
  • 1965: Congress amends the Social Security Act to create Medicare and Medicaid. This puts pressure on medical providers to provide documentation of care. Interest in health informatics receives a significant boost.

By 1968, Dr. Lawrence Weed was working on the Problem Oriented Medical Information System (PROMIS). Though it did not gain wide acceptance, PROMIS was a strong attempt to establish an integrated system covering all aspects of health care, including patient treatment, as well as the Problem-Oriented Medical Record (POMR). And, of course, there was the infant Internet. By the end of the 1970s, the idea of online data communications technology had spread beyond large teaching medical centers. Physicians were beginning to receive instant access to computerized databases.

It was not until the latter half of the 1980s and the early 1990s that the focus of healthcare technology began to shift towards clinical integration and improving the quality of patient care. Thanks to the Internet, networked technologies, large-scale databases and the development of relational database software, data was suddenly everywhere. Then came the new millennium and the “oughts.” Devices, gadgets and PDAs became ubiquitous in clinical settings. Storage capabilities continued to increase. The flow of available information surged to flood force, pushed by pharmaceutical research data, clinical data, activity and cost data, patient behavior data, and biological data.

In 2009, Congress passed the Health Information Technology for Economic and Clinical Health Act (HITECH). In doing so, the government indicated it was willing to spend billions to promote and expand the adoption of health-information technology and create a nationwide network of electronic health records (EHRs).

Data Risks and Regulations

There are plenty of hurdles to creating a data-driven health care industry. Some are technical, some emotional. Health care providers have had decades to accumulate paper records, inefficiencies and entrenched routines.

Let’s say data scientists achieve their dream of a digital EHR. Who will have access to it? Who will own the data? How will it be protected? In an effort to improve risk management in healthcare, HIPAA explicitly states that covered entities must:

“Protect individuals’ health records and other identifiable health information by requiring appropriate safeguards to protect privacy, and setting limits and conditions on the uses and disclosures that may be made of such information without patient authorization.”

But the record of U.S. medical providers isn’t exactly encouraging. In 2021, healthcare security breaches occurred at a rate of more than one per day. EHRs also raise questions about relevance. How much information is too much? After all, your financial records, social media photos, sexual history, location and your weekly liquor store bill are all relevant to your overall health. Should these be included in your digital file?

Last updated: April 27, 2022