Data Science in Biotechnology

Opportunities in Biotech Data Science

The Promise of Big Data

Ready for an improbable truth? The NIH’s National Human Genome Research Institute has calculated that:

  • Generating an entire human genome sequence would have cost approximately $20 million in 2006.
  • Thanks to the development of next-gen sequencers, the cost is now about $1,000 per genome.

What’s more, this work is being done in mere hours.

But genomics is not the only area where big data is making an impact. As Ryan McBride illustrates in 10 Reasons Why Biotech Needs Big Data, data mining is generating a host of lucrative business opportunities for the industry.

Genomics

Take genomics. Numbers-wise, each human genome contains 20,000-25,000 genes and about 3 billion base pairs. Stored at one byte per base, that’s around 3 gigabytes of data.
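
The 3-gigabyte figure is easy to check with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes one byte per base for plain text and 2 bits per base for a packed encoding of A, C, G and T, and it ignores the quality scores and metadata that real sequencing files carry. The function name is hypothetical.

```python
# Back-of-the-envelope storage estimate for one human genome.
BASE_PAIRS = 3_000_000_000  # approximate figure quoted in the text


def genome_size_gb(base_pairs: int, bits_per_base: int) -> float:
    """Return storage size in gigabytes (1 GB = 10**9 bytes)."""
    return base_pairs * bits_per_base / 8 / 1e9


text_gb = genome_size_gb(BASE_PAIRS, 8)    # one ASCII character per base
packed_gb = genome_size_gb(BASE_PAIRS, 2)  # A, C, G, T packed into 2 bits each

print(f"Plain text:   {text_gb:.2f} GB")
print(f"2-bit packed: {packed_gb:.2f} GB")
```

Under these assumptions the plain-text estimate matches the figure above, and a packed encoding needs a quarter of the space.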

In addition to sequencing, massive amounts of information on structure/function annotations, disease correlations, population variations – the list goes on – are being entered into databanks. Software companies are furiously developing tools and products to analyze this treasure trove.

For example, RainDance Technologies provides researchers, universities and private companies with genomic tools used for ultra-sensitive detection of cancer and inherited diseases.

Using Google frameworks as a starting point, the folks at NextBio have created a platform that allows biotechnologists to search life-science information, share data, and collaborate with other researchers.

  • In 2012, NextBio and Intel announced a partnership aimed at optimizing and stabilizing the Hadoop stack and advancing the use of big data technologies in genomics.

The Human Microbiome

Though genomics currently hogs the spotlight, there are plenty of other biotechnology fields wrestling with big data.

In fact, when it comes to human microbes – the bacteria, fungi and viruses that live on or inside us – we’re talking about astronomical amounts of data. Scientists with the NIH’s Human Microbiome Project have counted more than 10,000 microbial species in the human body, which together carry roughly 100 times more genes than the human genome.

To determine which microbes are most important to our well-being, researchers at the Harvard School of Public Health used unique computational methods to identify around 350 of the most important organisms in human microbial communities.

With the help of DNA sequencing, they sorted through 3.5 terabytes of genomic data and pinpointed genetic “name tags” – sequences specific to those key bacteria. They could then identify where and how often these markers occurred throughout a healthy population.
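
The idea of a genetic “name tag” can be illustrated with a toy example. This is a minimal sketch, not the researchers’ actual method: the marker sequences, organism names, sample reads and function names are all invented for illustration, and matching is done by exact substring search, where a real pipeline would use alignment tools that tolerate sequencing errors.

```python
from collections import Counter

# Hypothetical marker sequences ("name tags"), one per organism.
MARKERS = {
    "organism_A": "ACGTAC",
    "organism_B": "TTGACA",
}

# Hypothetical sequencing reads grouped by sample.
SAMPLES = {
    "sample_1": ["GGACGTACTT", "CCTTGACAGG", "AAAAAAAAAA"],
    "sample_2": ["ACGTACACGTAC", "GGGGGGGGGG"],
}


def count_marker_hits(reads, markers):
    """Count the reads that contain each marker as an exact substring."""
    hits = Counter()
    for read in reads:
        for organism, marker in markers.items():
            if marker in read:
                hits[organism] += 1
    return hits


def prevalence(samples, markers):
    """Fraction of samples in which each organism's marker appears at least once."""
    present = Counter()
    for reads in samples.values():
        for organism in count_marker_hits(reads, markers):
            present[organism] += 1
    return {organism: present[organism] / len(samples) for organism in markers}


print(prevalence(SAMPLES, MARKERS))
```

Here `count_marker_hits` answers how often a marker occurs within one sample, and `prevalence` answers how widely it occurs across the population of samples.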

This gave them the opportunity to catalog over 100 opportunistic pathogens and understand where in the microbiome these organisms occur normally.

As in genomics, plenty of start-ups – Libra Biosciences, Vedanta Biosciences, Seres Therapeutics – are looking to capitalize on new discoveries.

Crowdsourcing

In 2011, players of an online game called Foldit took three weeks to produce an accurate 3D model of the M-PMV retroviral protease enzyme. The structure of the enzyme – which plays an important role in the spread of an AIDS-like virus in rhesus monkeys – had eluded researchers for fifteen years.

In January 2012, gamers had another stunning success – the first crowdsourced redesign of a protein. By adding 13 amino acids to an enzyme that catalyzes Diels-Alder reactions, Foldit players increased its activity more than 18-fold.

In a world of social networking sites, online communities and publicly funded projects, crowdsourcing has become an integral part of people’s lives. Forward-thinking scientists have begun to use this collective wisdom to advance their research and development goals.

They’re also partnering with private companies to access information. 23andMe made its name by offering a personal genome test kit. Customers provide a saliva sample, and the company supplies an online analysis of inherited traits, genealogy, and possible congenital risk factors.

The company’s ever-growing bank of digital patient data, including one of the largest databases on genes involved in Parkinson’s disease, has put it in a pivotal position of power. In recent years, 23andMe has:

  • Provided testing for and partnered with universities and institutions involved in disease research
  • Received their first genetic patent for “Polymorphisms Associated With Parkinson’s Disease”
  • Acquired CureTogether, an online, patient-led platform that provides health tools and surveys and allows users to track their health

Synthesizing Diverse Data

Perhaps the biggest data challenge for biotechnologists is synthesis. How can scientists integrate large quantities and diverse sets of data – genomic, proteomic, phenotypic, clinical, semantic, social, etc. – into a coherent whole?

Many teams are busy providing answers:

  • Cambridge Semantics has developed semantic web technologies that help pharmaceutical companies sort and select which businesses to acquire and which drug compounds to license.
  • Data scientists at the Broad Institute of MIT and Harvard have developed the Integrative Genomics Viewer (IGV), open source software that allows for the interactive exploration of large, integrated genomic datasets.
  • GNS Healthcare is using its causal machine learning platform, REFS, to analyze diverse sets of data and create predictive models and biomarker signatures.

With data sets multiplying by the minute, data scientists aren’t suffering for lack of raw materials.
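
At its simplest, the synthesis problem described above comes down to linking records from different sources through a shared key. The sketch below is an assumption-laden toy, not how any of the products named above work: the patient IDs, field names, values and the `integrate` helper are all hypothetical, and real integration also has to reconcile vocabularies, units and privacy constraints.

```python
# Hypothetical records from two sources, keyed by a shared patient ID.
genomic = {
    "P001": {"variant": "VAR_1", "genotype": "A/G"},
    "P002": {"variant": "VAR_1", "genotype": "G/G"},
}
clinical = {
    "P001": {"age": 54, "diagnosis": "type 2 diabetes"},
    "P003": {"age": 61, "diagnosis": "hypertension"},
}


def integrate(*sources):
    """Merge per-patient dicts from several sources into one record per ID.

    Patients missing from a source simply lack that source's fields.
    """
    merged = {}
    for source in sources:
        for patient_id, fields in source.items():
            merged.setdefault(patient_id, {}).update(fields)
    return merged


combined = integrate(genomic, clinical)

# Only patients present in every source have a complete profile.
complete = {pid for pid in combined if pid in genomic and pid in clinical}

print(combined)
print(complete)
```

Even in this toy, only one of the three patients ends up with a complete profile, which hints at why incomplete and mismatched data is such a common complaint.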

Data Risks and Regulations

Choose Your Data Wisely

What’s that phrase again? Every rose has its thorn? Well, in the field of biotechnology, every discovery has a caveat.

As AstraZeneca R&D Information Vice President John Reynders warns in Big Data Has Arrived in Biotech. Now What?, hypothesis-generation and predictive analytics are a little easier when you’re just trying to guess what books someone may prefer. Genomic data, on the other hand, is far more complex and extensive.

The volume, velocity and variety of data (3Vs) are creating similar headaches. When faced with an ever-growing mountain of information, it can take a great deal of human skill to understand what questions you need to ask and how best to find the answers.

In more prosaic terms, as Warp Drive Bio CEO Alexis Borisy notes on FierceBiotech: “Our clinical, phenomic data sucks.”

These aren’t insurmountable problems, but they’re big ones. As the 3Vs accelerate, biotech companies will likely have to be careful that they keep their minds open and their hubris in check.

I’d Like To Keep That Private

Unlike Europe, the U.S. lacks an overarching data protection law. It does, however, have a great deal of federal and state legislation that affects companies who handle personal data. These laws and regulations can vary according to the industries involved.

Biotechnology companies that partner with health care providers, for example, may run into the Health Insurance Portability and Accountability Act (HIPAA), enacted in 1996. The HIPAA Privacy Rule:

“…requires appropriate safeguards to protect the privacy of personal health information, and sets limits and conditions on the uses and disclosures that may be made of such information without patient authorization.”

Going one step further, the 2009 HITECH Act makes the HIPAA privacy provisions applicable to business associates.

Companies who intend to store personal data must also be aware of the rigid laws in place to protect U.S. consumers. The FTC has full power to bring enforcement actions to ensure that companies are living up to the promises in their privacy statements.

History of Data Analysis and Biotech

“Big Data is only going to be as good as the questions that are being asked of it. It’s the human element in the loop that’s able to interrogate that data.” – John Reynders, Big Data has arrived in biotech. Now what?

On the last day of February 1953, a wild-haired Yank and Brit barreled into the Eagle Pub in Cambridge. It was lunchtime, the pub filled with noise and the tantalizing aroma of meat and three veg. “We have found the secret of life!” the Brit announced to startled patrons.

That, at any rate, is the myth surrounding James Watson and Francis Crick’s discovery of the structure of DNA. Though it ignores the contributions of Rosalind Franklin and others, it does hint at one undeniable fact – the incredible leap forward that biotechnology (and data analysis) took in the 20th century.

Breaking Barriers

In 1919, Károly Ereky, a Hungarian agricultural engineer, first used the word “biotechnology” in his book, Biotechnology of Meat, Fat and Milk Production in an Agricultural Large-Scale Farm. For Ereky, biotechnology was the means to upgrade raw materials biologically to reveal socially useful products.

Large amounts of scientific data were, of course, integral to the development of the field. As travel and telecommunications made the world smaller, the barriers to sharing information shrank too.

War accelerated the process. With a little help from Alexander Fleming and Clodomiro (Clorito) Picado Twight, a coordinated effort was mounted to mass-produce the wonder drug called penicillin. By 1943, scientists had discovered that a moldy cantaloupe in Peoria contained the best strain for production. By 1944, 2.3 million doses were available for the invasion of Normandy.

The Rise of Genetics

Then along came genetics. In 1958, DNA was first made in a test tube. In 1981, scientists at Ohio University transferred genes from other animals into mice to create the first transgenic animals. A year later, the FDA approved the first biotech drug (human insulin) produced in genetically modified bacteria.

These discoveries were aided and abetted by advances in technology. In the mid-1970s, automated protein and DNA sequencing became a reality. A decade down the track, scientists could remotely access huge quantities of data stored in central computer repositories.

Many biotechnologists were eager to share their findings amongst colleagues. In 1977, Rodger Staden and his group at Cambridge developed the data-packed Staden Package for DNA sequences, initially available to academics, then eventually open source.

Over in the United States, the NIH was involved in sponsoring PROPHET, “a national computing resource tailored to meet the data management and analysis needs of life scientists.” PROPHET’s main attraction was “a broad spectrum of integrated, graphics-oriented information-handling tools.”

1980s-1990s

But it was in the years that Madonna reigned supreme that biotechnology and data analytics really hit their stride. Academic scientists, the NIH, the EMBL and large research funding centers poured their time – and their money – into new bioinformatic databases and software.

Highlights of this period include:

  • 1986: Amos Bairoch, a young Swiss bioinformatician, begins to develop an annotated protein sequence databank known as Swiss-Prot. The full-blown version is launched to great acclaim in 1991.
  • 1986: Interferon becomes the first anti-cancer drug produced through biotech.
  • Late 1980s: Genofit and IntelliGenetics commercialize PC/GENE, a software package created by Amos Bairoch for the analysis of protein and nucleotide sequences.
  • 1991: Bairoch creates PROSITE, a database of protein sequence and structure correlations. He complements this with ENZYME, a nomenclature database on enzymes, and SeqAnalRef, a reference database focused on sequence analysis.
  • 1993: SWISS-2DPAGE, a proteomics-oriented database, is established. It contains data on two-dimensional polyacrylamide gel electrophoresis (2-D PAGE) maps of proteins from a range of healthy and diseased tissues.
  • 1993: The Swiss Institute of Bioinformatics (SIB) introduces ExPASy, an integrative bioinformatics portal that draws on a wide variety of scientific resources, databases and software tools.
  • 1996: Dolly the sheep becomes the first animal cloned from an adult cell.

A New Century

This explosion of data contributed to a slew of firsts for the biotechnology sector in the 21st century. Industries seized upon the discoveries, pumping funds into the development of new drugs, bio-engineered farming and alternative energy.

Big-bang events in this period include:

  • 2000: The Human Genome Project and Celera Genomics create a draft of the human genome sequence. Their work appears in both Science and Nature.
  • 2001: Gleevec® (imatinib), a drug for patients with chronic myeloid leukemia, is the first gene-targeted drug to receive FDA approval.
  • 2002: Rice becomes the first crop to have its genome decoded.
  • 2003: The Human Genome Project completes sequencing of the human genome.
  • 2004: The U.N. Food and Agriculture Organization endorses biotech crops, stating that biotechnology can “contribute to meeting the challenges” faced by poor farmers and developing countries.
  • 2005: The Energy Policy Act authorizes multiple incentives for bioethanol development.
  • 2006: The FDA approves Gardasil®, the first vaccine developed against human papillomavirus (HPV) and the first preventive cancer vaccine.
  • 2007: The FDA approves the H5N1 vaccine, the first vaccine for avian flu.
  • 2008: The NIH launches the Human Microbiome Project (HMP), a five-year project aimed at identifying and characterizing the microorganisms found in association with healthy and diseased humans.
  • 2009: Global biotech crop acreage reaches 330 million acres.

MastersInDataScience.org is owned and operated by 2U, Inc.
© 2U, Inc. 2019
