Opportunities in Health Care Data Science
The Promise of Big Data
It’s no secret that the U.S. healthcare industry is an overpriced, inefficient mess. In 2012, the authors of How Data Science Is Transforming Health Care: Solving the Wanamaker Dilemma reported that the U.S. was spending over $2.6 trillion on health care each year; $600 billion of that was spent on treatments that either do not help or actually cause harm.
Health care expenses represented 17.8% of GDP in 2015. By 2025, this figure is estimated to rise to nearly 20%. Some say big data is the answer. Although the healthcare industry has been notoriously slow to harness its power (for more details, see our profiles of Biotechnology and Pharmaceuticals), there is hope:
- Accelerating costs are forcing payers and healthcare providers to shift from a fee-for-service approach (the more treatments, the better) to one that favors patient outcomes (rewarding providers for targeted treatments that actually work).
- Physicians are continuing to move towards evidence-based medicine, reviewing data from a huge range of sources before making treatment decisions.
That puts data scientists in a prime position. Used wisely, big data has the potential to help physicians make better decisions across the board – from personalized treatments to preventive care.
Oh, and it has one other major benefit of interest to the healthcare industry… it slashes costs.
Imagine you’re a doctor treating a patient with cancer. In the past, you might have based your treatment plan on the results of double-blind studies. These studies may have been rigorous, but they may have failed to take patient differences into account.
Big data changes the game. Now you can merge and analyze data sets from:
- Clinical trials
- Direct observations of other physicians
- Electronic medical records
- Online patient networks
- Genomics research (see below)
- and more…
One of the top goals is to create a personalized treatment plan based on individual biology. Instead of treating your patient with a drug that works 80% of the time (e.g., the breast cancer drug tamoxifen), you can employ data science to custom-tailor a regimen just for her.
Inexpensive DNA sequencing and next-generation genomic technologies are changing the way health care providers do business. As Michael Walker points out, we now have the ability to map entire DNA sequences and measure tens of thousands of blood components to assess health:
“Next-generation genomic technologies allow data scientists to drastically increase the amount of genomic data collected on large study populations. When combined with new informatics approaches that integrate many kinds of data with genomic data in disease research, we will better understand the genetic bases of drug response and disease.”
Researchers aim to achieve ultra-personalized care. As a first step, the FDA has begun to issue medicine labels that specify different dosages for patients with particular genetic variants.
Predictive Analytics and Preventive Measures
Prevention is always better than cure. For the healthcare industry, it also happens to save a lot of money.
Take, for example, the partnership between Mount Sinai Medical Center and former Facebook guru Jeff Hammerbacher. Mount Sinai’s problem was how to reduce readmission rates. Hammerbacher’s solution was predictive analytics:
- In a pilot study, Hammerbacher and his team combined data on disease, past hospital visits and other factors to determine a patient’s risk of readmission.
- These high-risk patients would then receive regular communication from hospital staff to help them avoid getting sick again.
Sinai isn’t alone. In 2008, Texas Health partnered with Healthways to merge and analyze clinical and insurance claims information. Their goal was the same – identify high-risk patients and offer them customized interventions.
Meanwhile, in 2013, data scientists at Methodist Health System were looking at accountable-care organization claims from 14,000 Medicare beneficiaries and 6,000 employees. Their aim? You guessed it. Predict which patients will need high-cost care in the future.
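As a rough illustration of the idea behind these projects, the sketch below scores readmission risk with a simple logistic function over a few patient features. The feature names, weights and 0.5 threshold are invented for illustration; they are not taken from Mount Sinai, Texas Health or Methodist Health System, and a real model would be fitted to historical records.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    prior_admissions: int    # hospital stays in the past 12 months (assumed feature)
    chronic_conditions: int  # number of diagnosed chronic diseases (assumed feature)
    age: int


# Hypothetical weights; a real model would learn these from past records.
WEIGHTS = {"prior_admissions": 0.6, "chronic_conditions": 0.4, "age_over_65": 0.8}
INTERCEPT = -3.0


def readmission_risk(p: Patient) -> float:
    """Return a score between 0 and 1 from a logistic model."""
    z = (
        INTERCEPT
        + WEIGHTS["prior_admissions"] * p.prior_admissions
        + WEIGHTS["chronic_conditions"] * p.chronic_conditions
        + WEIGHTS["age_over_65"] * (1 if p.age > 65 else 0)
    )
    return 1.0 / (1.0 + math.exp(-z))


def high_risk(patients, threshold=0.5):
    """Return IDs of patients at or above the threshold, highest risk first."""
    scored = [(readmission_risk(p), p.patient_id) for p in patients]
    return [pid for risk, pid in sorted(scored, reverse=True) if risk >= threshold]


patients = [
    Patient("A", prior_admissions=0, chronic_conditions=1, age=40),
    Patient("B", prior_admissions=3, chronic_conditions=2, age=72),
    Patient("C", prior_admissions=1, chronic_conditions=0, age=80),
]
flagged = high_risk(patients)
print(flagged)
```

Patients on the flagged list are the ones who would receive the follow-up calls or customized interventions described above.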
Patient Monitoring and Home Devices
Doctors can do a lot, but they can’t follow a patient around every minute of the day. Wearable body sensors – sensors tracking everything from heart rate to testosterone to body water – can.
Sensors are just one way in which medical technology is moving beyond the hospital bed. Home-use, medical monitoring devices and mobile applications are cropping up daily. A scanner to diagnose melanomas? A personal EKG heart monitor? No problem.
These gadgets are designed to help the patient, naturally, but they’re also busy harvesting data.
For example, Propeller’s GPS-enabled tracker records inhaler usage by asthmatics. This information is collected, analyzed and merged with data on asthma catalysts from the CDC (e.g., high pollen counts in New England) to help doctors learn how best to prevent attacks. The Pittsburgh Health Data Alliance brings together three schools of specialty in healthcare, technology, and data to mine data and provide reports and analysis to researchers, practitioners and the public.
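The merge described for the inhaler data can be pictured as a join on place and date. The records, field names and pollen figures in this sketch are made up for illustration and do not reflect Propeller’s or the CDC’s actual data formats.

```python
from collections import defaultdict

# Hypothetical inhaler events: (region, date, puffs recorded by the tracker)
inhaler_events = [
    ("Boston", "2013-05-01", 2),
    ("Boston", "2013-05-01", 3),
    ("Boston", "2013-05-02", 1),
    ("Hartford", "2013-05-01", 4),
]

# Hypothetical environmental readings keyed by (region, date)
pollen_index = {
    ("Boston", "2013-05-01"): 9.5,
    ("Boston", "2013-05-02"): 3.0,
    ("Hartford", "2013-05-01"): 8.0,
}


def merge_usage_with_pollen(events, pollen):
    """Total puffs per (region, date), paired with that day's pollen index."""
    totals = defaultdict(int)
    for region, date, puffs in events:
        totals[(region, date)] += puffs
    # Inner join: keep only days that also have an environmental reading.
    return {key: (totals[key], pollen[key]) for key in totals if key in pollen}


merged = merge_usage_with_pollen(inhaler_events, pollen_index)
for (region, date), (puffs, index) in sorted(merged.items()):
    print(region, date, puffs, index)
```

Once usage and environmental readings sit side by side, an analyst can look for days where both spike together.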
It’s a “patient heal thyself” world, now. Developments like personal genetic testing (e.g., 23andMe.com), online patient networks and behavioral apps like Be Well are allowing individuals to take control of their own health.
This is getting data scientists very excited. In Big Data and the Consumerization of Healthcare, the author envisions expanding the Be Well app to include long-term analysis of behavioral patterns. Big data could help individuals create a “life report” that connects ongoing changes to current conditions and gives them new perspectives on their well-being.
There’s another benefit to empowering patients. Their insights can be mined for data. In the same big data and consumerization article, the author notes:
“A community such as PatientsLikeMe groups over 150,000 patients who share their symptoms, concerns, experiences with treatment and healing stories about over 1,000 conditions.”
That’s a lot of information about symptoms, treatments and side effects that hospitals, pharmaceutical companies and researchers are interested in hearing about.
Disease Modeling and Mapping
One of the flashiest uses of data science in the past few years has been in tracking (and finding ways to halt or prevent) diseases.
- At a 24-Hour Data Science Code-a-Thon hosted by Kaiser Permanente in 2013, teams used Hadoop technologies to map incidences of respiratory conditions (e.g., asthma flare-ups occurred in areas with higher ozone levels for extended periods during the summer).
- While developing the open-source modeling application Spatio-Temporal Epidemiological Modeler (STEM), as reported in 2013, researchers discovered links between changes in local climate and temperature and the spread of outbreaks of dengue and malaria.
- Mount Sinai Medical Center researchers have found genetic markers that link the risks of type 2 diabetes and Alzheimer’s.
In these cases, a picture is worth a thousand words. Check out the Mount Sinai diabetes maps in this article from Fast Co.Exist.
The Ultimate EHR
One of the biggest dreams of all is a fully digital and unprecedentedly comprehensive electronic health record (EHR). You may also see it referred to as an electronic medical record (EMR). This one precious file would contain every piece of information about a patient’s health, would always be up to date, and could be shared across any network.
A patient’s file could contain:
- Structured data from every one of the patient’s health care providers (e.g., lab results, demographic information, prescription histories, etc.)
- Unstructured professional data (e.g., notes from clinicians, physicians, PCPs and nurses)
- Unstructured personal data (e.g., notes from in-home caregivers, family members, patients and social workers)
- Saved images (e.g., X-rays and MRI scans)
- Genomic data
According to the 2016 Report to Congress on Health IT Progress, in 2015, 96% of hospitals used certified EHR technology.
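To make the shape of such a file concrete, here is a minimal sketch of a record type with one field per category listed above. The class name, field names and sample values are hypothetical; real EHR systems follow much richer schemas and interoperability standards.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HealthRecord:
    """Illustrative container mirroring the categories listed above."""
    patient_id: str
    # Structured data: lab results, demographics, prescription histories
    structured: Dict[str, str] = field(default_factory=dict)
    # Unstructured professional data: notes from clinicians and nurses
    professional_notes: List[str] = field(default_factory=list)
    # Unstructured personal data: notes from caregivers, family, patients
    personal_notes: List[str] = field(default_factory=list)
    # Pointers to saved images such as X-rays and MRI scans
    image_refs: List[str] = field(default_factory=list)
    # Pointers to genomic data files
    genomic_refs: List[str] = field(default_factory=list)

    def summary(self) -> Dict[str, int]:
        """Count the items held in each category."""
        return {
            "structured": len(self.structured),
            "professional_notes": len(self.professional_notes),
            "personal_notes": len(self.personal_notes),
            "images": len(self.image_refs),
            "genomic": len(self.genomic_refs),
        }


record = HealthRecord("patient-001")
record.structured["blood_type"] = "O+"
record.professional_notes.append("Follow-up in 6 weeks.")
record.image_refs.append("xray-2016-03-01")
print(record.summary())
```

Images and genomic data are stored as references rather than inline, on the assumption that such large files would live in separate storage.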
Data Risks and Regulations
The Challenges Ahead
There are plenty of hurdles to creating a data-driven health care industry. Some are technical, some emotional. Health care providers have had decades to accumulate paper records, inefficiencies and entrenched routines. A remedy will not be quick.
And some say it shouldn’t. At least, not without a hard look at patient privacy, data ownership and the overall direction of U.S. health care.
Let’s say data scientists achieve their dream of a digital EHR. Who will have access to it? Who will own the data? How will it be protected? The HIPAA Act explicitly states that covered entities must:
“Protect individuals’ health records and other identifiable health information by requiring appropriate safeguards to protect privacy, and setting limits and conditions on the uses and disclosures that may be made of such information without patient authorization.”
But the record of U.S. medical providers isn’t exactly encouraging. Privacy Rights Clearinghouse indicates that from 2005 to January 2018, the healthcare industry experienced over 7,000 security breaches (i.e., unintended disclosure, hacking, etc.), resulting in compromised data.
How Much Data is Too Much?
EHRs also raise questions about relevance. How much information is too much? After all, your financial records, social media photos, sexual history, location and your weekly liquor store bill are all relevant to your overall health. Should these be included in your digital file?
Now add the constant stream of data from body sensors and in-home devices. A competent hacker will know where you are, where you’ve been and even whether or not you’re likely to experience a heart attack in the next 24 hours.
History of Data Analysis and Health Care
In the hazy days of 1950, soon after the outbreak of the Korean War, a fresh-faced physicist/dentist named Robert Ledley was given an offer he couldn’t refuse:
“The army called me down to New York… and the colonel said to me, ‘Well, if you volunteer to be in the army, then you’ll become a lieutenant, an officer. But if you don’t volunteer, you’ll be drafted anyway, and sent to boot camp.’ So I volunteered.”
After a stint at Walter Reed General Hospital, Ledley was offered a job at the National Bureau of Standards in 1952. There he encountered the Standards Eastern Automatic Computer (SEAC). It was love at first sight.
Ledley realized that SEAC could perform complex equations that no human could hope to tackle. He saw that physics, mathematics and computers might be combined to solve biomedical problems.
In 1959, Ledley and a radiologist named Lee B. Lusted teamed up to publish “Reasoning Foundations of Medical Diagnosis.” As a primer in operations research techniques, the article covered symbolic logic, probability and value theory, and educated physicians on the potential of databases and electronic diagnosis. The drive to computerize medicine had begun in earnest.
Ledley wasn’t the only researcher interested in the potential of computer science. The concept of health informatics – the study of resources and methods for managing health information – had been kicking around in other countries for years.
But for a post-war, cash-rich U.S. government, this field was especially intriguing. Ledley’s work helped pave the way for change.
The 1960s saw the introduction of MEDLARS/MEDLINE, a computerized bibliographic database compiled by the National Library of Medicine, along with research and experimentation with programming languages.
One of these was the Massachusetts General Hospital Utility Multi-Programming System (MUMPS). Developed by Neil Pappalardo, Curtis Marble and Robert Greenes from 1966 to 1967, MUMPS powered the creation and integration of medical databases. By the early 1970s, it was the most commonly used programming language for clinical applications.
It was, you might say, the decade of peace, love and data:
- Early 1960s: Morris Collen, a physician with Kaiser Permanente’s Division of Research, develops a system to automate the 10-year-old multiphasic health screening exam and a prototype electronic health record.
- 1965: Work commences on Systematized Nomenclature of Pathology (SNOP), an effort to systematize the language of pathology for use in computer systems. In 1974, this was extended to include all medical terms – the famous Systematized Nomenclature of Medicine (SNOMED).
- 1965: Congress amends the Social Security Act to create Medicare and Medicaid. This puts pressure on medical providers to provide documentation of care. Interest in health informatics receives a significant boost.
The healthcare industry began using computers to provide statistical reports to the government, create patient care applications, centralize medical records and, most importantly, organize billing.
The Promise of PROMIS
Despite all this good will, computerized medicine in the 1970s remained a hodgepodge. Computer manufacturers did not always understand the hospital market, and hospitals did not always understand computers.
Healthcare providers that did take the plunge ran into problems with machine speed and processing capabilities. Even worse, there was little integration between systems.
Nevertheless, physicians forged onward:
- By 1968, Dr. Lawrence Weed was working on the PRoblem Oriented Medical Information System (PROMIS). Though it did not gain wide acceptance, PROMIS was a strong attempt to establish an integrated system covering all aspects of health care, including patient treatment, as well as the Problem-Oriented Medical Record (POMR).
- In a similar project, Dr. Homer Warner and his colleagues at Intermountain Healthcare designed the Health Evaluation Through Logical Processing (HELP) system during the late 1960s and early 1970s. HELP provided one of the nation’s first versions of an electronic medical record.
And, of course, there was the infant Internet. By the end of the decade, the idea of online data communications technology had spread beyond large teaching medical centers. Physicians were beginning to receive instant access to computerized databases.
The arrival of affordable, increasingly powerful technology in the late 1970s and early 1980s accelerated developments. Large, multi-application vendors stepped up to meet industry demand. Organizations started developing protocols for health care information and data collection.
But it was not until the latter half of the 1980s and the early 1990s that the focus of healthcare technology began to shift towards clinical integration and improving the quality of patient care.
That’s because everyone – finally – caught up to each other. Thanks to the Internet, networked technologies, large-scale databases and the development of relational database software, data was suddenly everywhere.
Even politicians sat up and took notice. In 1996, Edward Kennedy and Nancy Kassebaum pushed the Health Insurance Portability and Accountability Act (HIPAA), aka the Kennedy–Kassebaum Act, through Congress. Designed to encourage the use of electronic data interchange in the U.S. healthcare system, HIPAA mandated the establishment of national standards for electronic health care transactions. It also included provisions for patient privacy.
The U.S. goes HITECH
Then came the new millennium and the “oughts.” And here’s where things got complicated. As the authors of the McKinsey report, The Big Data Revolution in U.S. Health Care, note:
- Payors and providers began to digitize patient records.
- Pharmaceutical companies continued to transfer years’ worth of research and development data into medical databases.
- The federal government and similar stakeholders started allowing public access to a treasure trove of health-care knowledge, including clinical trial data and information on patients covered under public insurance.
Devices, gadgets and PDAs became ubiquitous in clinical settings. Storage capabilities continued to increase. The flow of available information surged to flood force, pushed by:
- Pharmaceutical research data (e.g., clinical trial results)
- Clinical data (e.g., patient records)
- Activity and cost data (e.g., estimated procedure costs)
- Patient behavior data (e.g., health purchases history)
- Biological data (e.g., genomics)
In 2004, President George W. Bush responded by establishing the Office of the National Coordinator for Health Information Technology in order to encourage technological development and electronic information flow in the field.
This role took on new meaning when Congress passed the Health Information Technology for Economic and Clinical Health Act (HITECH) in 2009. In doing so, the government indicated it was willing to spend billions to promote and expand the adoption of health-information technology and create a nationwide network of electronic health records (EHRs).