Opportunities in Health Care Data Science
The Promise of Big Data
It’s no secret that the U.S. health care industry is an overpriced, inefficient mess. In 2012, the authors of How Data Science Is Transforming Health Care: Solving the Wanamaker Dilemma reported that the U.S. was spending over $2.6 trillion on health care each year, of which $600 billion went to treatments that either do not help or actually cause harm.
Even more depressingly, health care expenses represented 17.6% of GDP in 2013, and $600 billion of that spending was consumed by waste and fraud. By 2020, the share of GDP is estimated to rise to nearly 20%. The country ranks 37th among developed economies in life expectancy and other measures of health.
Some say big data is the answer. Although the health care industry has been notoriously slow to harness its power (for more details, see our profiles of Biotechnology and Pharmaceuticals), there is hope:
- Accelerating costs are forcing payors and health-care providers to shift from a fee-for-service approach (the more treatments, the better) to one that favors patient outcomes (rewarding providers for targeted treatments that actually work).
- Physicians are continuing to move towards evidence-based medicine, reviewing data from a huge range of sources before making treatment decisions.
- In 2012, the NIH devoted approximately $15 million in award funding for eight projects to research uses of big data.
That puts data scientists in a prime position. Used wisely, big data has the potential to help physicians make better decisions across the board – from personalized treatments to preventive care.
Oh, and it has one other major benefit of interest to the health care industry… it slashes costs.
Imagine you’re a doctor treating a patient with cancer. In the past, you might have based your treatment plan on the results of double-blind studies. These studies may have been rigorous, but they may have failed to take patient differences into account.
Big data changes the game. Now you can merge and analyze data sets from:
- Clinical trials
- Direct observations of other physicians
- Electronic medical records
- Online patient networks
- Genomics research (see below)
- and more…
One of the top goals is to create a personalized treatment plan based on individual biology. Instead of treating your patient with a drug that works 80% of the time (e.g., the breast cancer drug tamoxifen), you can employ data science to custom-tailor a regimen just for her.
Inexpensive DNA sequencing and next-generation genomic technologies are changing the way health care providers do business. As Michael Walker points out, we now have the ability to map entire DNA sequences and measure tens of thousands of blood components to assess health:
“Next-generation genomic technologies allow data scientists to drastically increase the amount of genomic data collected on large study populations. When combined with new informatics approaches that integrate many kinds of data with genomic data in disease research, we will better understand the genetic bases of drug response and disease.”
Researchers aim to achieve ultra-personalized care. As a first step, the FDA has begun to issue medicine labels that specify different dosages for patients with particular genetic variants.
Predictive Analytics and Preventive Measures
Prevention is always better than cure. For the health-care industry, it also happens to save a lot of money. (The Centers for Medicare & Medicaid Services, for instance, can penalize hospitals that exceed average rates of readmission – indicating that they could be doing more to prevent medical problems.)
Take, for example, the partnership between Mount Sinai Medical Center and former Facebook guru Jeff Hammerbacher. Mount Sinai’s problem was how to reduce readmission rates. Hammerbacher’s solution was predictive analytics:
- In a pilot study, Hammerbacher and his team combined data on disease, past hospital visits and other factors to determine a patient’s risk of readmission.
- These high-risk patients would then receive regular communication from hospital staff to help them avoid getting sick again.
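As a rough illustration of the idea, and not a description of Mount Sinai's actual model, the sketch below scores readmission risk with a simple logistic function. The `Patient` fields, the weights, the intercept and the 0.5 threshold are all hypothetical. A real system would fit its weights to historical admission data.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    chronic_conditions: int  # number of diagnosed chronic diseases
    visits_last_year: int    # hospital visits in the prior 12 months
    age: int


# Hypothetical weights for illustration only; a real model would be
# fitted to historical readmission data.
WEIGHTS = {"chronic_conditions": 0.6, "visits_last_year": 0.4, "age": 0.02}
INTERCEPT = -4.0


def readmission_risk(p: Patient) -> float:
    """Combine the factors into a score between 0 and 1."""
    z = (INTERCEPT
         + WEIGHTS["chronic_conditions"] * p.chronic_conditions
         + WEIGHTS["visits_last_year"] * p.visits_last_year
         + WEIGHTS["age"] * p.age)
    return 1.0 / (1.0 + math.exp(-z))


def flag_high_risk(patients, threshold=0.5):
    """Return the IDs of patients whose score meets the threshold."""
    return [p.patient_id for p in patients if readmission_risk(p) >= threshold]


patients = [
    Patient("A", chronic_conditions=4, visits_last_year=5, age=70),
    Patient("B", chronic_conditions=0, visits_last_year=0, age=30),
]
high_risk_ids = flag_high_risk(patients)
```

The patients returned by `flag_high_risk` would be the ones slated for regular follow-up from hospital staff.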
Sinai isn’t alone. In 2008, Texas Health partnered with Healthways to merge and analyze clinical and insurance claims information. Their goal was the same – identify high-risk patients and offer them customized interventions.
Meanwhile, in 2013, data scientists at Methodist Health System were analyzing accountable-care organization claims from 14,000 Medicare beneficiaries and 6,000 employees. Their aim? You guessed it. Predict which patients will need high-cost care in the future.
Patient Monitoring and Home Devices
Doctors can do a lot, but they can’t follow a patient around every minute of the day. Wearable body sensors – sensors tracking everything from heart rate to testosterone to body water – can.
Sensors are just one way in which medical technology is moving beyond the hospital bed. Home-use medical monitoring devices and mobile applications are cropping up daily. A scanner to diagnose melanomas? A personal ECG heart monitor? No problem.
These gadgets are designed to help the patient, naturally, but they’re also busy harvesting data.
- Asthmapolis’s GPS-enabled tracker, already available by 2011, records inhaler usage by asthmatics. This information is collated, analyzed and merged with data on asthma catalysts from the CDC (e.g., high pollen counts in New England) to help doctors learn how best to prevent attacks.
- With Ginger.io’s mobile application, out in 2012, patients consent to have data about their calls, texts, location and movements monitored. These are combined with data on behavioral health from the NIH and other sources to pinpoint potential problems. Too many late-night phone calls, for instance, might signal a higher risk of anxiety attack.
- To improve patient drug compliance, Eliza, a Boston-based company, monitors which types of reminders work on which types of people. Smarter targeting means more compliance.
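To make the "collate and merge" step concrete, here is a minimal sketch of joining device events with environmental readings, loosely modeled on the inhaler example. The records, the region names and the `HIGH_POLLEN` cutoff are invented for illustration, and nothing here reflects how any of these companies actually store or process data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical inhaler events: (patient_id, region, day of use)
inhaler_events = [
    ("p1", "New England", date(2011, 5, 1)),
    ("p2", "New England", date(2011, 5, 1)),
    ("p1", "New England", date(2011, 5, 2)),
    ("p3", "Southwest", date(2011, 5, 1)),
]

# Hypothetical environmental readings keyed by (region, day)
pollen_counts = {
    ("New England", date(2011, 5, 1)): 9.5,
    ("New England", date(2011, 5, 2)): 3.0,
    ("Southwest", date(2011, 5, 1)): 2.0,
}

HIGH_POLLEN = 7.0  # assumed cutoff, not a clinical standard


def uses_by_pollen_level(events, pollen, cutoff=HIGH_POLLEN):
    """Count inhaler uses on high-pollen versus low-pollen days."""
    totals = defaultdict(int)
    for _patient, region, day in events:
        reading = pollen.get((region, day))
        if reading is None:
            continue  # skip events with no matching environmental reading
        level = "high" if reading >= cutoff else "low"
        totals[level] += 1
    return dict(totals)


usage = uses_by_pollen_level(inhaler_events, pollen_counts)
```

With real data, a noticeably higher count on high-pollen days would be the kind of signal that helps doctors anticipate attacks.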
It’s a “patient heal thyself” world, now. Developments like personal genetic testing (e.g., 23andMe.com), online patient networks and behavioral apps like Be Well are allowing individuals to take control of their own health.
This is getting data scientists very excited. In Big Data and the Consumerization of Healthcare, the author envisions expanding the Be Well app to include long-term analysis of behavioral patterns. Big data could help individuals create a “life report” that connects ongoing changes to current conditions and gives them new perspectives on their well-being.
There’s another benefit to empowering patients. Their insights can be mined for data. In the same Data Science Series article, the author notes:
“A community such as PatientsLikeMe groups over 150,000 patients who share their symptoms, concerns, experiences with treatment and healing stories about over 1,000 conditions.”
That’s a lot of information about symptoms, treatments and side effects that hospitals, pharmaceutical companies and researchers are interested in hearing about.
Disease Modeling and Mapping
One of the flashiest uses of data science in the past few years has been in tracking (and finding ways to halt or prevent) diseases.
- At a 24-Hour Data Science Code-a-Thon hosted by Kaiser Permanente in 2013, teams used Hadoop technologies to map incidences of respiratory conditions, finding, for example, that asthma flare-ups occurred in areas with higher ozone levels for extended periods during the summer.
- While developing the open-source modeling application Spatio-Temporal Epidemiological Modeler (STEM), as reported in 2013, researchers discovered links between changes in local climate and temperature and the spread of outbreaks of dengue and malaria.
- By mapping hot spots for Type 2 diabetes, Mount Sinai Medical Center is hoping to improve treatments. After identifying the hot spots, researchers focus on which genetic factors are involved in those environments. They can then create better guidelines for physicians and more tailored treatments.
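One simple way to think about "hot spots" is to compare each region's case rate against the overall rate. The sketch below does that with made-up numbers. The region names, the counts and the 1.5x ratio are assumptions, and the projects above may define hot spots quite differently.

```python
def find_hot_spots(cases, population, ratio=1.5):
    """Return regions whose case rate is at least `ratio` times the overall rate."""
    overall_rate = sum(cases.values()) / sum(population[r] for r in cases)
    hot = []
    for region in sorted(cases):
        rate = cases[region] / population[region]
        if rate >= ratio * overall_rate:
            hot.append(region)
    return hot


# Hypothetical case counts and populations per region
cases = {"north": 30, "south": 10, "east": 10}
population = {"north": 1000, "south": 1000, "east": 1000}

hot_spots = find_hot_spots(cases, population)
```

Once regions are flagged this way, researchers can look more closely at the genetic and environmental factors at work in them.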
In these cases, a picture is worth a thousand words. Check out the Mount Sinai diabetes maps in this article from Fast Co.Exist.
The Ultimate EHR
One of the biggest dreams of all is a fully digital and unprecedentedly comprehensive electronic health record (EHR). You may also see it referred to as an electronic medical record (EMR). This one precious file would contain every piece of information about a patient’s health, would always be up to date, and could be shared across any network.
Are we there yet? Not by a long shot. But that doesn’t stop data scientists from fantasizing about a file that contains:
- Structured data from every one of the patient’s health care providers (e.g., lab results, demographic information, prescription histories, etc.)
- Unstructured professional data (e.g., notes from clinicians, physicians, PCPs and nurses)
- Unstructured personal data (e.g., notes from in-home caregivers, family members, patients and social workers)
- Saved images (e.g., X-rays and MRI scans)
- Genomic data
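For readers who think in code, the wish list above maps naturally onto a single record type. The sketch below is purely illustrative: the class and field names are hypothetical and do not follow any real EHR standard or vendor schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LabResult:
    test: str
    value: float
    unit: str


@dataclass
class HealthRecord:
    patient_id: str
    lab_results: List[LabResult] = field(default_factory=list)    # structured data
    professional_notes: List[str] = field(default_factory=list)   # clinicians, nurses
    personal_notes: List[str] = field(default_factory=list)       # caregivers, family
    image_paths: List[str] = field(default_factory=list)          # X-rays, MRI scans
    genomic_variants: List[str] = field(default_factory=list)     # genomic data

    def summary(self) -> Dict[str, int]:
        """Count how many items the record holds in each category."""
        return {
            "lab_results": len(self.lab_results),
            "professional_notes": len(self.professional_notes),
            "personal_notes": len(self.personal_notes),
            "images": len(self.image_paths),
            "genomic_variants": len(self.genomic_variants),
        }


record = HealthRecord(patient_id="patient-001")
record.lab_results.append(LabResult("HbA1c", 6.1, "%"))
record.professional_notes.append("Follow up in three months.")
record.image_paths.append("scans/chest_xray.png")
```

Even this toy version hints at the hard part: structured results, free-text notes, images and genomic data all live in very different formats and systems.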
The U.S. Department of Health and Human Services is already overseeing a plan to ensure the widespread adoption of EMRs, but success would mean complete centralization with the government near the center.
One record to rule them all? Stranger things have happened.
Data Risks and Regulations
The Challenges Ahead
There are plenty of hurdles to creating a data-driven health care industry. Some are technical, some emotional. Health care providers have had decades to accumulate paper records, inefficiencies and entrenched routines. A remedy will not be quick.
And some say it shouldn’t. At least, not without a hard look at patient privacy, data ownership and the overall direction of U.S. health care.
Let’s say data scientists achieve their dream of a digital EHR. Who will have access to it? Who will own the data? How will it be protected? HIPAA explicitly states that covered entities must:
“Protect individuals’ health records and other identifiable health information by requiring appropriate safeguards to protect privacy, and setting limits and conditions on the uses and disclosures that may be made of such information without patient authorization.”
But the record of U.S. medical providers isn’t exactly encouraging. Privacy Rights Clearinghouse indicates that from 2005 to December 2013, the health care industry experienced 1,066 security breaches (e.g., unintended disclosure, hacking) that resulted in compromised data.
How Much Data is Too Much?
EHRs also cry out for questions about relevancy. How much information is too much? After all, your financial records, your social media photos, your sexual history, your location and your weekly liquor store bill are all relevant to your overall health. Should these be included in your digital file?
Now add the constant stream of data from body sensors and in-home devices. A competent hacker will know where you are, where you’ve been and even whether or not you’re likely to experience a heart attack in the next 24 hours.
Muddling it Out
The big-data dream assumes that health care providers are organized, efficient institutions with dedicated technology teams on hand. Reality check: they’re not.
Instead, they are oversized, secretive, often competitive, institutions saddled with a variety of data-related problems:
- Information locked in inaccessible organizational “silos”
- Staff unwilling or unable to change their practices
- Poor communication from data scientists
- Missing, flawed or misinterpreted data sets
- Conflicting results
The list goes on.
In the end, too, we have to remember that data science is not a panacea for the health care industry’s ills. We’d need a much bigger pill than that.
History of Data Analysis and Health Care
“I’ve been asked a lot for my view on American health care. Well, ‘it would be a good idea,’ to quote Gandhi.” – Paul Farmer
In the hazy days of 1950, soon after the outbreak of the Korean War, a fresh-faced physicist/dentist named Robert Ledley was given an offer he couldn’t refuse:
“The army called me down to New York… and the colonel said to me, ‘Well, if you volunteer to be in the army, then you’ll become a lieutenant, an officer. But if you don’t volunteer, you’ll be drafted anyway, and sent to boot camp.’ So I volunteered.”
After a stint at Walter Reed General Hospital, Ledley was offered a job at the National Bureau of Standards in 1952. There he encountered the Standards Eastern Automatic Computer (SEAC). It was love at first sight.
Ledley realized that SEAC could perform complex equations that no human could hope to tackle. He saw that physics, mathematics and computers might be combined to solve biomedical problems.
In 1959, Ledley and a radiologist named Lee B. Lusted teamed up to publish “Reasoning Foundations of Medical Diagnosis.” As a primer in operations research techniques, the article covered symbolic logic, probability and value theory, and educated physicians on the potential of databases and electronic diagnosis. The drive to computerize medicine had begun in earnest.
Ledley wasn’t the only researcher interested in the potential of computer science. The concept of health informatics – the study of resources and methods for managing health information – had been kicking around in other countries for years.
But for a post-war, cash-rich U.S. government, this field was especially intriguing. Ledley’s work helped pave the way for change. Between 1960 and 1964, the NIH spent over $40 million establishing dozens of technology-led biomedical research centers.
The 1960s also saw the introduction of MEDLARS/MEDLINE, a computerized bibliographic database compiled by the National Library of Medicine, along with research and experimentation with programming languages.
One of these was MUMPS (Massachusetts General Hospital Utility Multi-Programming System). Developed by Neil Pappalardo, Curtis Marble and Robert Greenes between 1966 and 1967, MUMPS powered the creation and integration of medical databases. By the early 1970s, it was the most commonly used programming language for clinical applications.
It was, you might say, the decade of peace, love and data:
- Early 1960s: Morris Collen, a physician with Kaiser Permanente’s Division of Research, develops a system to automate the 10-year-old multiphasic health screening exam and a prototype electronic health record.
- 1965: Work commences on Systematized Nomenclature of Pathology (SNOP), an effort to systematize the language of pathology for use in computer systems. In 1974, this was extended to include all medical terms – the famous Systematized Nomenclature of Medicine (SNOMED).
- 1965: Congress amends the Social Security Act to create Medicare and Medicaid. This puts pressure on medical providers to provide documentation of care. Interest in health informatics receives a significant boost.
The health care industry began using computers to provide statistical reports to the government, create patient care applications, centralize medical records and, most importantly, organize billing.
The Promise of PROMIS
Despite all this good will, computerized medicine in the 1970s remained a hodgepodge. Computer manufacturers did not always understand the hospital market, and hospitals did not always understand computers.
Healthcare providers that did take the plunge ran into problems with machine speed and processing capabilities. Even worse, there was little integration between systems.
Nevertheless, physicians forged onward:
- By 1968, Dr. Lawrence Weed was working on the PRoblem Oriented Medical Information System (PROMIS). Though it did not gain wide acceptance, PROMIS was a strong attempt to establish an integrated system covering all aspects of health care, including patient treatment, as well as the Problem-Oriented Medical Record (POMR).
- In a similar project, Dr. Homer Warner and his colleagues at Intermountain Healthcare designed the HELP system (Health Evaluation Through Logical Processing) during the late 1960s and early 1970s. HELP provided one of the nation’s first versions of an electronic medical record.
And, of course, there was the infant Internet. By the end of the decade, the idea of online data communications technology had spread beyond large teaching medical centers. Physicians were beginning to receive instant access to computerized databases.
The arrival of affordable, increasingly powerful technology in the late 1970s and early 1980s accelerated developments. Large, multi-application vendors stepped up to meet industry demand. Organizations started developing protocols for health care information and data collection.
But it was not until the latter half of the 1980s and the early 1990s that the focus of health care technology began to shift towards clinical integration and improving the quality of patient care.
That’s because everyone – finally – caught up to each other. Thanks to the Internet, networked technologies, large-scale databases and the development of relational database software, data was suddenly everywhere.
Even politicians sat up and took notice. In 1996, Edward Kennedy and Nancy Kassebaum pushed the Health Insurance Portability and Accountability Act (HIPAA), aka the Kennedy–Kassebaum Act, through Congress. Designed to encourage the use of electronic data interchange in the U.S. health care system, HIPAA mandated the establishment of national standards for electronic health care transactions. It also included provisions for patient privacy.
The U.S. goes HITECH
Then came the new millennium and the “oughts.” And here’s where things got complicated. As the authors of the McKinsey report, The Big Data Revolution in U.S. Health Care, note:
- Payors and providers began to digitize patient records.
- Pharmaceutical companies continued to transfer years’ worth of research and development data into medical databases.
- The federal government and similar stakeholders started allowing public access to a treasure trove of health-care knowledge, including clinical trial data and information on patients covered under public insurance.
Devices, gadgets and PDAs became ubiquitous in clinical settings. Storage capabilities continued to increase. The flow of available information surged to flood force, pushed by:
- Pharmaceutical research data (e.g., clinical trial results)
- Clinical data (e.g., patient records)
- Activity and cost data (e.g., estimated procedure costs)
- Patient behavior data (e.g., health purchases history)
- Biological data (e.g., genomics)
In 2004, George W. Bush responded by establishing the Office of the National Coordinator for Health Information Technology in order to encourage technological development and electronic information flow in the field.
This role took on new meaning when Congress passed the Health Information Technology for Economic and Clinical Health Act (HITECH) in 2009. In doing so, the government indicated it was willing to spend billions to promote and expand the adoption of health-information technology and create a nationwide network of electronic health records (EHRs).