The Promise of Big Data

For every 5,000 compounds starting in the laboratory, five are tested in humans and one makes it to market.
Moreover, it takes approximately 10 years and an average of $2-3 billion to develop each new drug. That adds up to a vast amount of molecular and clinical data stored in proprietary networks, just ripe for analytics.
There has been a push in recent years to increase collaboration – both internally and with the outside world. To gain a competitive edge, increase their expertise and enlarge their ever-growing databanks, pharmas are now working with:
External Partners: These might include Contract Research Organizations (CROs) or data management companies. For example, in 2013, GlaxoSmithKline announced a partnership with SAS to provide a globally accessible private cloud where the pharmaceutical industry can securely collaborate around anonymous clinical trial information.
Academic Collaborators: To get a first look at compounds being developed outside of the company, Eli Lilly created the Phenotypic Drug Discovery Initiative. External researchers submit their compounds for screening and Lilly uses its proprietary tools and data to identify whether any of them have the potential to become drugs.
Customers and Health Professionals: Thanks to the explosive growth of social media, pharmas can personally reach out to their customers and physicians. They’re also conducting sentiment analysis of online physician communities, electronic medical records, and consumer-generated media to flag potential safety issues. These data can then be used to shape strategy throughout the pipeline progression.
Insurance Companies: By creating proprietary data networks where payors and providers can share, analyze and respond to outcomes and claims data, pharmas are able to enlarge their databanks far beyond clinical trials.
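The sentiment analysis mentioned above often starts with something as simple as a lexicon-based scorer run over forum posts. A minimal sketch follows; the word lists and example posts are hypothetical illustrations, not any company's actual lexicon.

```python
# Minimal lexicon-based sentiment scorer for physician-forum posts.
# The word lists below are invented for illustration.
POSITIVE = {"effective", "tolerated", "improved", "safe"}
NEGATIVE = {"rash", "nausea", "recall", "dangerous", "withdrawn"}

def sentiment_score(post: str) -> int:
    """Return (#positive words - #negative words) for one post."""
    words = post.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def flag_safety_signals(posts, threshold=-1):
    """Surface posts whose score falls at or below the threshold."""
    return [p for p in posts if sentiment_score(p) <= threshold]
```

Production systems use far richer language models, but the workflow is the same: score the chatter, then route strongly negative clusters to safety teams for review.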
The power to forecast the future has applications in drug discovery and in avoiding negative outcomes.
In terms of drug discovery, pharmas spend vast sums screening compounds to test in preclinical trials. To speed up the process, drug companies are using predictive models to search gargantuan virtual databases of molecular and clinical data. Analysts zoom in on likely drug candidates using criteria based on chemical structure, diseases/targets, and other characteristics.
For example, Numerate, which works with companies like Boehringer Ingelheim and Merck, designs its predictive models with specific drug targets and treatment goals in mind.
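At its simplest, this kind of virtual screening is a similarity search: rank candidates by how closely their structural fingerprints match known active compounds. Here is a toy sketch under that assumption; the fingerprints (modeled as sets of feature bits) and compound names are invented.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two structural fingerprints."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0

def screen(candidates: dict, known_actives: list, cutoff: float = 0.5):
    """Rank candidate compounds by their best similarity to any known active."""
    scored = {name: max(tanimoto(fp, act) for act in known_actives)
              for name, fp in candidates.items()}
    return sorted(((n, s) for n, s in scored.items() if s >= cutoff),
                  key=lambda pair: -pair[1])
```

Real pipelines compute fingerprints from actual chemical structures and layer on models of targets and toxicity, but similarity ranking of this sort is the workhorse that narrows millions of virtual compounds down to a testable shortlist.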
When it comes to avoiding negative outcomes, predictive modeling can also be used to short-circuit potential disasters, such as patient deaths from undetected risk factors.
Predictive analytics can also be used to optimize clinical trials through the selection of optimal patients through genetic clustering, and to improve marketing efforts.
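The genetic clustering mentioned above can be illustrated with a bare-bones k-means pass over patient marker profiles. This is a toy sketch: the marker vectors are invented, and real cohort selection works on far higher-dimensional genomic data.

```python
# Toy k-means clustering of patients by genetic-marker values, as might be
# used to assemble genetically similar trial cohorts. Data is invented.
def kmeans(points, k, iters=20):
    centroids = points[:k]  # naive initialization: first k patients
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each patient to the nearest centroid (squared distance)
            nearest = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # recompute each centroid as the mean of its cluster
        centroids = [tuple(sum(vals) / len(vals) for vals in zip(*cl))
                     if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return clusters
```

Each cluster then becomes a candidate subpopulation whose members can be expected to respond to a therapy in similar ways.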
In recent years, pharmaceutical companies and institutions have sponsored crowdsourced contests to predict patient and clinical outcomes, sales patterns, molecule activity, and anything else involving big data.
Data scientists can help to reduce the costs of clinical trials by enabling drug companies to implement:
Data-Based Patient Selection: Pharmas use multiple data sources – including social media and public health databases – and more targeted criteria (e.g., genetic information) to identify which populations would work best in trials.
Real-Time Monitoring: Companies now monitor real-time data from trials to identify safety or operational risks and nip problems in the bud.
Drug Safety Assurance: Data scientists can even tap into side-effect data to predict whether a compound will provoke an adverse reaction before it ever reaches trial. Working with the University of California, San Francisco, researchers at Novartis have built computer models to do just that.
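One standard way to mine side-effect reports for signals like these is disproportionality analysis. Below is a minimal sketch of the proportional reporting ratio (PRR), a common pharmacovigilance statistic; the report counts in the example are invented.

```python
def prr(drug_event: int, drug_total: int,
        other_event: int, other_total: int) -> float:
    """Proportional reporting ratio: how much more often an adverse event
    is reported for the drug of interest than for all other drugs.

    drug_event / drug_total   -> event rate among reports for this drug
    other_event / other_total -> event rate among reports for other drugs
    """
    rate_drug = drug_event / drug_total
    rate_other = other_event / other_total
    return rate_drug / rate_other
```

A PRR well above 1 (a common rule of thumb is >2, with enough case reports) suggests a safety signal worth investigating; a value near 1 means the event is reported no more often than background.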
Targeted Marketing and Sales
Once upon a time, pharmaceutical companies would send their reps on lengthy doctors’ visits and invest in expensive, broad-scale product promotion.
Today, some of that money is instead going into monitoring doctors’ therapeutic preferences, geographic trends, peak prescription rates – anything with direct relevance to the sales cycle. This data then feeds into:
Predictive Analytics: Drug companies are employing predictive methods to determine which consumers and physicians are most likely to use a drug, and to create more targeted on-the-ground marketing efforts.
Sophisticated Sales: Pharmas are providing drug reps with mobile devices and real-time analytics on their prospects. Reps can then tailor their agenda to suit the physician. Afterward, the sales team can analyze the results to determine whether the approach was effective.
Better Patient Follow-Ups
With the development of miniature biosensors, sophisticated at-home devices, smart pills and bottles, smartphones and health apps, monitoring a patient’s health has never been easier. Pharmaceutical companies are increasingly interested in how the real-time data from these tools can be used to support R&D, analyze efficacy and increase drug sales.
In addition to knowing how their drugs are being used, companies also typically want to hear how customers view their products. Opinions about new drugs are often generated through patient/physician and patient/patient experiences in a way that creates messy, unstructured data sets.
However, if properly organized and analyzed, this data can be a rich trove of information on:
Patterns in drug-drug interactions
What drives patients to stop taking medications
Which patients will not stick to their prescriptions
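Adherence questions like the last two are often approached with pharmacy refill data. A widely used metric is the medication possession ratio (MPR); here is a minimal sketch, with invented refill records and the conventional 0.8 adherence cutoff.

```python
def mpr(days_supplied: list, period_days: int) -> float:
    """Medication possession ratio: fraction of the observation period
    covered by dispensed supply, capped at 1.0."""
    return min(sum(days_supplied) / period_days, 1.0)

def likely_nonadherent(patients: dict, period_days: int = 180,
                       cutoff: float = 0.8):
    """Flag patients whose MPR falls below the conventional 0.8 cutoff.

    `patients` maps a patient ID to a list of days-of-supply per refill.
    """
    return [pid for pid, fills in patients.items()
            if mpr(fills, period_days) < cutoff]
```

Patients flagged this way become candidates for follow-up calls, reminder apps, or the smart pill bottles mentioned above.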
Data Risks and Regulations
The Challenges Ahead
By the time it reached the 21st century, the pharmaceutical industry had amassed huge quantities of structured and semi-structured data in separate, mutually inaccessible “silos”.
Unfortunately, it’s having trouble bringing those silos together. However analysts slice the numbers, the ultimate goal is to turn the data into information that can play a strategic business role. The prerequisite for that is making the data sources communicate with one another in a meaningful way.
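Making data sources communicate usually starts with record linkage on a shared key. Here is a toy sketch joining clinical and claims records; the field names and IDs are invented, and real-world linkage must also cope with mismatched and missing identifiers.

```python
def link_records(clinical: list, claims: list, key: str = "patient_id"):
    """Inner-join two data silos on a shared identifier.

    Each silo is a list of dicts; records sharing the key are merged,
    with claims fields overwriting duplicates from the clinical side.
    """
    claims_by_id = {row[key]: row for row in claims}
    linked = []
    for row in clinical:
        match = claims_by_id.get(row[key])
        if match:
            linked.append({**row, **match})
    return linked
```

The hard part in practice is not the join itself but agreeing on the key: different silos often identify the same patient in incompatible ways, which is exactly the interoperability problem the industry is wrestling with.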
What’s more, drug companies are coping with a flood of unstructured information – including social sentiment – that’s coming at them from outside sources. Integrating, manipulating, organizing, and interpreting this data to support some coherent course of action is causing more than a few headaches.
Working with the Feds
Data integration raises the issue of patient privacy. The more databases are shared among institutions, CROs, partners, software companies, etc., the more pharmas run the risk of exposing sensitive patient information to the eyes of those who shouldn’t be seeing it.
That means drug companies need to reduce the risk of running afoul of federal laws and regulations. For example, HIPAA and its younger sibling, the 2009 HITECH Act, clearly state that covered entities must:
“Protect individuals’ health records and other identifiable health information by requiring appropriate safeguards to protect privacy, and setting limits and conditions on the uses and disclosures that may be made of such information without patient authorization.”
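A first step toward those safeguards is pseudonymizing direct identifiers before data is shared. Below is a minimal sketch using a salted hash; the field names are illustrative, and real HIPAA de-identification involves much more (for example, the Safe Harbor method's full list of 18 identifier categories).

```python
import hashlib

# Illustrative direct identifiers; a real list would follow HIPAA guidance.
IDENTIFIERS = ("name", "ssn", "email")

def pseudonymize(record: dict, salt: str) -> dict:
    """Replace direct identifiers with a salted hash; keep clinical fields.

    The same salt yields the same token for the same value, so records
    can still be linked across datasets without exposing identities.
    """
    out = {}
    for field, value in record.items():
        if field in IDENTIFIERS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            out[field] = digest[:12]  # truncated, deterministic token
        else:
            out[field] = value
    return out
```

The salt must itself be kept secret; with it, an insider could re-derive tokens from known identities, which is why regulators treat hashed identifiers as pseudonymized rather than fully anonymized.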
Data scientists should also be aware of the FDA’s Sentinel Initiative. This is a legally mandated electronic-surveillance system linking and analyzing health care data on millions of patients from multiple sources. Its purpose is to collect data on safety issues and enable regulators to take quick action.
Lilly, Heinrich Emanuel Merck, Charles Pfizer, Friedrich Bayer, Edward Robinson Squibb – before they were brand names, they were pioneers of the pharmaceutical industry. They were also some of the first men to use data-driven methods to cut costs and improve products.
The Fog Of War
Fast-forward to the 20th century. Witness the rise of air travel, fortune cookies, Einstein’s Theory of Relativity, jazz – and the megalithic manufacturers we know today.
By the time the troops came back from the war, pharma was becoming big business. The second half of the 20th century saw the development of many pharmaceutical breakthroughs – ibuprofen, the contraceptive pill, Valium – as well as the launch of the war on cancer. Advances in genetics – including automated protein and DNA sequencing – and psychiatric treatments opened new markets.
This was also the age when data began to make its mark.
Take Electronic Data Capture (EDC). In the years leading up to the 1970s, pharmaceutical companies had been receiving clinical research data on paper forms. This often resulted in data entry errors and delays.
To circumvent the problem, the Institute for Biological Research and Development (IBRD) formed an alliance with Abbott Pharmaceuticals. Each clinical investigator would have access to a computer and be able to enter clinical data directly into the IBRD mainframe. After cleaning up the data, IBRD supplied reports directly to Abbott.
Software companies blossomed, eager to provide drug companies with useful tools. Examples of what their software could do included:
Predictive Analytics: Based on statistical models, Dr. Kurt Enslein’s TOPKAT software could predict the toxicity of a molecule from its structural components.
3D Molecular Modeling: Graphics software gave chemists the ability to view molecular structures in 3D and create virtual models.
Data Analysis: Statisticians used data management programs like SAS to analyze clinical data; computational chemists could now get, in minutes or hours, statistical analyses that previously took weeks or months.
Due to the massive influx of data, Lilly was the first pharmaceutical company to purchase a supercomputer (the Cray-2), and many of its competitors soon followed.
The volume of information only increased with the advent of the World Wide Web and the new millennium:
Collaboration between internal departments and external research institutions became the norm.
Pharmaceutical companies started to market directly to consumers.
Demand for alternative medicines and nutritional supplements skyrocketed.
Software programs grew increasingly complex and sophisticated.
Discoveries in genetics and the sequencing of the genome generated new drugs and sources of revenue.
Effective data mining became critical to drug development.