It’s no secret that the U.S. healthcare industry is overpriced. Health care expenses represented 17.8% of the GDP in 2015. By 2025, this figure is estimated to rise to nearly 20%. Some say big data is the answer. Although the healthcare industry has been notoriously slow to harness its power (for more details, see our profiles of Biotechnology and Pharmaceuticals), there is hope. Accelerating costs are forcing payers and healthcare providers to shift from a fee-for-service approach (the more treatments, the better) to one that favors patient outcomes (rewarding providers for targeted treatments that actually work). Additionally, physicians are continuing to move towards evidence-based medicine, reviewing data from a huge range of sources before making treatment decisions. And that puts data scientists in a prime position.
Big Data in Healthcare
Big data has the potential to help physicians make better decisions across the board – from personalized treatments to preventive care.
Now, doctors can merge and analyze data sets from clinical trials, electronic medical records, online patient networks, and genomics research. Doctors also now have the ability to map entire DNA sequences and measure tens of thousands of blood components to assess health. One of the top goals is to create a personalized treatment plan based on individual biology. Instead of treating your patient with a drug that works 80% of the time (e.g., the breast cancer drug, Tamoxifen), you can employ data science to custom-tailor a regimen just for her. Michael Walker points out on Data Science Central,
Researchers aim to achieve ultra-personalized care. As a first step, the FDA has begun issuing medicine labels that specify different dosages for patients with particular genetic variants.
However, prevention is better than a cure, and it also saves patients and hospitals money. Mount Sinai used predictive analytics to reduce readmission rates. They combined data on disease, past hospital visits and other factors to determine a patient’s risk of readmission. Patients identified as high-risk then received regular communication from hospital staff to help them avoid getting sick again.
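To make the idea concrete, here is a minimal sketch of how a readmission risk score might be computed and used to flag patients for follow-up. It is not Mount Sinai’s model. The feature names, the weights, the intercept and the 0.5 threshold are all hypothetical values chosen for illustration; a real system would fit them to historical admission data.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    chronic_conditions: int    # number of diagnosed chronic diseases
    admissions_last_year: int  # past hospital visits in the prior 12 months
    age: int


# Hypothetical weights for illustration only; a real model would be
# fitted to historical admission data.
INTERCEPT = -4.0
WEIGHTS = {"chronic_conditions": 0.6, "admissions_last_year": 0.8, "age": 0.02}


def readmission_risk(p: Patient) -> float:
    """Return a score between 0 and 1 using the logistic function."""
    z = (INTERCEPT
         + WEIGHTS["chronic_conditions"] * p.chronic_conditions
         + WEIGHTS["admissions_last_year"] * p.admissions_last_year
         + WEIGHTS["age"] * p.age)
    return 1.0 / (1.0 + math.exp(-z))


def flag_high_risk(patients, threshold=0.5):
    """Return the IDs of patients whose score meets or exceeds the threshold."""
    return [p.patient_id for p in patients if readmission_risk(p) >= threshold]


# Two made-up patients: one with no history, one with several risk factors.
patients = [
    Patient("A", chronic_conditions=0, admissions_last_year=0, age=30),
    Patient("B", chronic_conditions=3, admissions_last_year=4, age=70),
]
high_risk = flag_high_risk(patients)  # candidates for regular staff outreach
```

With these assumed weights, only patient B crosses the threshold. In practice the threshold would be tuned to how many follow-up calls hospital staff can realistically make.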
Doctors can do a lot, but they can’t follow a patient around every minute of the day. Wearable body sensors – sensors tracking everything from heart rate to testosterone to body water – can. Sensors are just one way in which medical technology is moving beyond the hospital bed. Home-use medical monitoring devices and mobile applications are cropping up daily. A scanner to diagnose melanomas? A personal EKG heart monitor? No problem. These gadgets are designed to help the patient, naturally, but they’re also busy harvesting data.
For example, Propeller’s GPS-enabled tracker records inhaler usage by asthmatics. This information is collected, analyzed and merged with data on asthma catalysts from the CDC (e.g., high pollen counts in New England) to help doctors learn how best to prevent attacks. The Pittsburgh Health Data Alliance brings together three schools of specialty in healthcare, technology, and data to mine data and provide reports and analysis to researchers, practitioners and the public.
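The merge step described above can be sketched as a simple join on region and date. This is an illustrative example, not Propeller’s or the CDC’s actual pipeline: the record layouts, the sample events and the "high"/"low" pollen labels are all made up for the purpose of the sketch.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inhaler events: (patient_id, region, date)
inhaler_events = [
    ("p1", "New England", "2024-05-01"),
    ("p1", "New England", "2024-05-01"),
    ("p2", "New England", "2024-05-01"),
    ("p1", "New England", "2024-05-02"),
]

# Hypothetical environmental readings keyed by (region, date)
pollen_level = {
    ("New England", "2024-05-01"): "high",
    ("New England", "2024-05-02"): "low",
}


def uses_per_day_by_pollen(events, pollen):
    """Average inhaler uses per (region, day), grouped by pollen level."""
    # Step 1: count inhaler uses for each region and day.
    daily_counts = defaultdict(int)
    for _patient, region, date in events:
        daily_counts[(region, date)] += 1

    # Step 2: join each daily count with that day's pollen reading.
    by_level = defaultdict(list)
    for key, count in daily_counts.items():
        level = pollen.get(key)
        if level is not None:  # skip days with no environmental reading
            by_level[level].append(count)

    # Step 3: average the daily counts within each pollen level.
    return {level: mean(counts) for level, counts in by_level.items()}


summary = uses_per_day_by_pollen(inhaler_events, pollen_level)
```

A gap between the "high" and "low" averages would only hint at a trigger. A real analysis would need far more data and would have to control for other factors.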
Another use of data science in the past few years has been in tracking (and finding ways to halt or prevent) diseases. At a 24-Hour Data Science Code-a-Thon hosted by Kaiser Permanente in 2013, teams used Hadoop technologies to map incidences of respiratory conditions (e.g., asthma flare-ups occurred in areas with higher ozone levels).
Big data also has the potential to contribute to a fully digital and unprecedentedly comprehensive electronic health record (EHR). This one precious file would contain every piece of information about a patient’s health, would always be up to date, and could be shared across any network. A patient’s file could contain structured data from every one of the patient’s health care providers (e.g., lab results, demographic information, prescription histories, etc.), unstructured professional data (e.g., notes from clinicians, physicians, PCPs and nurses), unstructured personal data (e.g., notes from in-home caregivers, family members, patients and social workers), and saved images from x-rays and MRI scans.
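As a rough illustration of what such a combined file might look like, the sketch below groups the four kinds of data listed above into one record and folds in data from a second provider. The class name, field names and sample values are hypothetical; real EHR systems follow formal interoperability standards rather than an ad hoc structure like this one.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthRecord:
    """Hypothetical container for one patient's combined record."""
    patient_id: str
    # Structured data from health care providers
    demographics: Dict[str, str] = field(default_factory=dict)
    lab_results: List[Dict[str, str]] = field(default_factory=list)
    prescriptions: List[str] = field(default_factory=list)
    # Unstructured professional data (clinician and nurse notes)
    clinical_notes: List[str] = field(default_factory=list)
    # Unstructured personal data (caregiver, family and patient notes)
    personal_notes: List[str] = field(default_factory=list)
    # References to stored images such as x-rays and MRI scans
    image_refs: List[str] = field(default_factory=list)

    def merge(self, other: "HealthRecord") -> None:
        """Fold another provider's record for the same patient into this one."""
        if other.patient_id != self.patient_id:
            raise ValueError("records belong to different patients")
        self.demographics.update(other.demographics)
        self.lab_results.extend(other.lab_results)
        self.prescriptions.extend(other.prescriptions)
        self.clinical_notes.extend(other.clinical_notes)
        self.personal_notes.extend(other.personal_notes)
        self.image_refs.extend(other.image_refs)


# Made-up records from two providers for the same patient.
clinic = HealthRecord("patient-001",
                      demographics={"birth_year": "1980"},
                      clinical_notes=["Reports mild wheezing."])
hospital = HealthRecord("patient-001",
                        lab_results=[{"test": "CBC", "status": "normal"}],
                        image_refs=["xray-chest-01"])
clinic.merge(hospital)  # clinic now holds data from both providers
```

Refusing to merge records with different patient IDs is a deliberate choice here: a record that mixes two patients is worse than an incomplete one.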
Data Scientist Demand in Healthcare
Many industries face a skills gap in big data analytics, and healthcare is no exception: most of its data is unstructured and difficult to analyze. A 2017 IBM study points to the rise of clinical data review as a required skill for data scientists as evidence of growing demand for data-driven approaches to clinical care. In 2016, job postings specifying clinical data analysis grew by 54%. Demand for clinical analysts is expected to increase by 14%, and demand for Clinical Data Systems Specialists by 24%, by 2020. However, organizations are looking for analysts and data scientists who have mid-level to senior-level experience. IBM’s study reports that 71-78% of job listings for healthcare analysts require at least three years of experience. These listings request experience with healthcare informatics, analytics, and data management, such as data architecture and data governance. Most positions require a bachelor’s degree and some require a master’s degree. With the right talent, hospitals and healthcare organizations are able to organize their data and extract actionable insights.
History of Data Analysis and Health Care
In the hazy days of 1952, not long after the outbreak of the Korean War, a fresh-faced physicist/dentist named Robert Ledley was offered a job at the National Bureau of Standards. There he encountered the Standards Eastern Automatic Computer (SEAC). It was love at first sight. Ledley realized that SEAC could carry out complex calculations that no human could hope to tackle. He saw that physics, mathematics and computers might be combined to solve biomedical problems. In 1959, Ledley teamed up with a radiologist to publish an article that covered symbolic logic, probability and value theory, and educated physicians on the potential of databases and electronic diagnosis. The drive to computerize medicine had begun in earnest.
The 1960s saw the introduction of MEDLARS/MEDLINE, a computerized bibliographic database compiled by the National Library of Medicine, along with research and experimentation with programming languages. One of these languages was the Massachusetts General Hospital Utility Multi-Programming System (MUMPS). Developed by Neil Pappalardo, Curtis Marble and Robert Greenes from 1966 to 1967, MUMPS powered the creation and integration of medical databases. By the early 1970s, it was the most commonly used programming language for clinical applications.
It was, you might say, the decade of peace, love and data:
- Early 1960s: Morris Collen, a physician with Kaiser Permanente’s Division of Research, develops a system to automate the 10-year-old multiphasic health screening exam and a prototype electronic health record.
- 1965: Work commences on Systematized Nomenclature of Pathology (SNOP), an effort to systematize the language of pathology for use in computer systems. In 1974, this was extended to include all medical terms – the famous Systematized Nomenclature of Medicine (SNOMED).
- 1965: Congress amends the Social Security Act to create Medicare and Medicaid. This puts pressure on medical providers to provide documentation of care. Interest in health informatics receives a significant boost.
Soon after, the healthcare industry began using computers to provide statistical reports to the government, create patient care applications, centralize medical records and, most importantly, organize billing. However, healthcare providers that did take the plunge ran into problems with machine speed and processing capabilities. Even worse, there was little integration between systems. Nevertheless, physicians forged onward.
By 1968, Dr. Lawrence Weed was working on the PRoblem Oriented Medical Information System (PROMIS). Though it did not gain wide acceptance, PROMIS was a strong attempt to establish an integrated system covering all aspects of health care, including patient treatment, as well as the Problem-Oriented Medical Record (POMR). And, of course, there was the infant Internet. By the end of the 1970s, the idea of online data communications technology had spread beyond large teaching medical centers. Physicians were beginning to receive instant access to computerized databases.
It was not until the latter half of the 1980s and the early 1990s that the focus of healthcare technology began to shift towards clinical integration and improving the quality of patient care. Thanks to the Internet, networked technologies, large-scale databases and the development of relational database software, data was suddenly everywhere. Then came the new millennium and the “aughts.” Devices, gadgets and PDAs became ubiquitous in clinical settings. Storage capabilities continued to increase. The flow of available information surged to flood force, pushed by pharmaceutical research data, clinical data, activity and cost data, patient behavior data, and biological data.
In 2009, Congress passed the Health Information Technology for Economic and Clinical Health Act (HITECH). In doing so, the government indicated it was willing to spend billions to promote and expand the adoption of health-information technology and create a nationwide network of electronic health records (EHRs).
Data Risks and Regulations
There are plenty of hurdles to creating a data-driven health care industry. Some are technical, some emotional. Health care providers have had decades to accumulate paper records, inefficiencies and entrenched routines. A remedy will not be quick.
And some say it shouldn’t. At least, not without a hard look at patient privacy, data ownership and the overall direction of U.S. health care.
Let’s say data scientists achieve their dream of a digital EHR. Who will have access to it? Who will own the data? How will it be protected? To strengthen risk management in healthcare, HIPAA explicitly states that covered entities must:
“Protect individuals’ health records and other identifiable health information by requiring appropriate safeguards to protect privacy, and setting limits and conditions on the uses and disclosures that may be made of such information without patient authorization.”
But the record of U.S. medical providers isn’t exactly encouraging. Privacy Rights Clearinghouse indicates that from 2005 to January 2018, the industry experienced over 7,000 health data security breaches (e.g., unintended disclosure, hacking, etc.), resulting in compromised data. EHRs also raise questions about relevancy. How much information is too much? After all, your financial records, social media photos, sexual history, location and your weekly liquor store bill are all relevant to your overall health. Should these be included in your digital file?