Data Science in the Pharmaceutical Industry

June 5, 2020

Opportunities in Pharmaceutical Data Science

The Promise of Big Data
For every 5,000 compounds starting in the laboratory, five are tested in humans and one makes it to market.

Moreover, developing each new drug takes approximately 10 years and costs an average of $2-3 billion. That adds up to a vast amount of molecular and clinical data stored in proprietary networks, just ripe for analytics.
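To make the attrition figures above concrete, here is a minimal Python sketch that turns them into stage-by-stage success rates. The only inputs are the three numbers quoted in the text; the variable and function names are illustrative.

```python
# Figures quoted above: 5,000 compounds start in the lab,
# 5 are tested in humans, and 1 reaches the market.
compounds_in_lab = 5_000
tested_in_humans = 5
reach_market = 1


def rate(successes: int, attempts: int) -> float:
    """Return successes / attempts as a fraction."""
    return successes / attempts


lab_to_human = rate(tested_in_humans, compounds_in_lab)
human_to_market = rate(reach_market, tested_in_humans)
lab_to_market = rate(reach_market, compounds_in_lab)

print(f"Lab to human testing:    {lab_to_human:.2%}")     # 0.10%
print(f"Human testing to market: {human_to_market:.0%}")  # 20%
print(f"Lab to market:           {lab_to_market:.2%}")    # 0.02%
```

In other words, roughly one compound in a thousand reaches human testing, and about one in five of those goes on to market.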

Data Risks and Regulations

The Challenges Ahead

By the time it reached the 21st century, the pharmaceutical industry had amassed vast quantities of structured and semi-structured data in separate, mutually inaccessible “silos”.

Unfortunately, it’s having trouble bringing those silos together. However analysts slice the numbers, the ultimate goal is to turn the data into information that can play a strategic business role. The prerequisite for that is making the data sources communicate with one another in a meaningful way.
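As a rough illustration of what getting two silos to "communicate" can mean in practice, the sketch below joins records from two separate sources on a shared key. The record layouts, field names, and values are all hypothetical and exist only for this example; real integration work also has to deal with mismatched identifiers, formats, and access controls.

```python
# Hypothetical records from two separate "silos"; every value is made up.
lab_silo = [
    {"compound_id": "C-001", "assay_result": "active"},
    {"compound_id": "C-002", "assay_result": "inactive"},
]
clinical_silo = [
    {"compound_id": "C-001", "trial_phase": "Phase I"},
    {"compound_id": "C-003", "trial_phase": "Phase II"},
]


def merge_silos(left, right, key):
    """Outer-join two lists of records on a shared key."""
    merged = {}
    for record in left + right:
        # Create an entry the first time a key is seen, then add its fields.
        merged.setdefault(record[key], {key: record[key]}).update(record)
    return [merged[k] for k in sorted(merged)]


combined = merge_silos(lab_silo, clinical_silo, "compound_id")
for row in combined:
    print(row)
```

Only the compound present in both sources ends up with both an assay result and a trial phase. The other rows show the gaps that a real integration effort would need to explain or fill.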

What’s more, drug companies are coping with a flood of unstructured information – including social sentiment – that’s coming at them from outside sources. Integrating, manipulating, organizing, and interpreting this data to support some coherent course of action is causing more than a few headaches.

Working with the Fed

Data integration raises the issue of patient privacy. The more databases are shared among institutions, CROs, partners, software companies, etc., the more pharmas run the risk of exposing sensitive patient information to the eyes of those who shouldn’t be seeing it.

That means drug companies need to reduce their risk of running afoul of federal laws and regulations. For example, HIPAA and its younger sibling, the 2009 HITECH Act, set out rules that:

“Protect individuals’ health records and other identifiable health information by requiring appropriate safeguards to protect privacy, and setting limits and conditions on the uses and disclosures that may be made of such information without patient authorization.”
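One safeguard often discussed in this context is replacing direct identifiers before a dataset leaves the organization. The sketch below shows the general idea using a keyed hash from Python's standard library. The field names, the sample record, and the key are hypothetical, and this step alone should not be read as meeting HIPAA's de-identification requirements.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be stored securely,
# outside the source code, and never shared with data recipients.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonymize(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace a patient identifier with a keyed SHA-256 digest."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifiers(record: dict) -> dict:
    """Return a copy with the name dropped and the ID pseudonymized."""
    shared = {k: v for k, v in record.items() if k not in ("name", "patient_id")}
    shared["patient_token"] = pseudonymize(record["patient_id"])
    return shared


# Made-up example record.
record = {"patient_id": "P-12345", "name": "Jane Doe", "dose_mg": 50}
print(strip_identifiers(record))
```

Because the same identifier always maps to the same token, records for one patient can still be linked across shared datasets without exposing the name or the original ID.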

Data scientists should also be aware of the FDA’s Sentinel Initiative. This is a legally mandated electronic-surveillance system linking and analyzing health care data on millions of patients from multiple sources. Its purpose is to collect data on safety issues and enable regulators to take quick action.

History of Data Analysis & Pharma

By the late 19th century, Colonel Eli Lilly, founder of Eli Lilly and Company, had:

Become an independent drug manufacturer
Automated the creation of pills and capsules
Hired a permanent R&D staff
Instituted a raft of quality assurance measures

By the end of his life, he was a millionaire.

Lilly, Heinrich Emanuel Merck, Charles Pfizer, Friedrich Bayer, Edward Robinson Squibb – before they were brand names, they were pioneers of the pharmaceutical industry. They were also some of the first men to use data-driven methods to cut costs and improve products.

The Fog Of War

Fast-forward to the 20th century. Witness the rise of air travel, fortune cookies, Einstein’s Theory of Relativity, jazz – and the giant manufacturers we know today.

The shift from small independent companies to conglomerates got started in the 1940s. With so many lives on the line, World War II spurred intense collaboration between governments and pharmaceutical companies. One mind-boggling joint effort between the government, Merck, Pfizer and Squibb (among others) resulted in the mass production of penicillin.

The Golden Age of Development

By the time the troops came back from the war, pharma was becoming big business. The second half of the 20th century saw many pharmaceutical breakthroughs, including ibuprofen, the contraceptive pill, and Valium, as well as the launch of the war on cancer. Advances in genetics – including automated protein and DNA sequencing – and psychiatric treatments opened new markets.

This was also the age when data began to make its mark.

Take Electronic Data Capture (EDC). In the years leading up to the 1970s, pharmaceutical companies had been receiving clinical research data on paper forms. This often resulted in data entry errors and delays.

To circumvent the problem, the Institute for Biological Research and Development (IBRD) formed an alliance with Abbott Pharmaceuticals:
Each clinical investigator would have access to a computer and be able to enter clinical data directly into the IBRD mainframe.
After cleaning up the data, IBRD supplied reports directly to Abbott.

The 1970s also saw the introduction of the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB), as well as genetics-focused resources like the Staden Package for DNA sequences.

The Golden Age Goes Platinum

With the arrival of Professor Norman Allinger’s Journal of Computational Chemistry in 1980, the first decade of computer-assisted drug development began. As Sean Ekins discusses in his book, Computer Applications in Pharmaceutical Research and Development, scientists were now empowered to use computational chemistry programs on personal computers.

Software companies blossomed, eager to provide drug companies with useful tools. Examples of what those tools could do included:

Predictive Analytics: Based on statistical models, Dr. Kurt Enslein’s TOPKAT software could predict the toxicity of a molecule from its structural components.
3D Molecular Modeling: Graphics software gave chemists the ability to view molecular structures in 3D and create virtual models.
Data Analysis: Statisticians used data management programs like SAS to analyze clinical data; computational chemists could now get, in minutes or hours, statistical analyses that previously took weeks or months.

Faced with a massive influx of data, Lilly became the first pharmaceutical company to purchase a supercomputer (the Cray-2), and many of its competitors soon followed.

1990s-2000s

The volume of information only increased with the advent of the World Wide Web and the new millennium:

Collaboration between internal departments and external research institutions became the norm.
Pharmaceutical companies started to market directly to consumers.
Demand for alternative medicines and nutritional supplements skyrocketed.
Software programs grew increasingly complex and sophisticated.
Discoveries in genetics and the sequencing of the genome generated new drugs and sources of revenue.

Effective data mining became critical to drug development.

Last updated: June 2020
