Thanks to the development of next-generation sequencers, sequencing a genome now costs about $1,000.
What’s more, this work is being done in mere hours.
In the field of genomics, data science is being used to analyze variation in genetics and perform clinical and phenotype analyses. The National Human Genome Research Institute has a strategic plan for data science in order to modernize how data is used at the institute.
The Human Microbiome
There are plenty of other biotechnology fields wrestling with big data.
In fact, when it comes to human microbes – the bacteria, fungi and viruses that live on or inside us – we’re talking about astronomical amounts of data.
In 1919, Károly Ereky, a Hungarian agricultural engineer, first used the word “biotechnology” in his book, Biotechnology of Meat, Fat and Milk Production in an Agricultural Large-Scale Farm. For Ereky, biotechnology was the means to upgrade raw materials biologically to reveal socially useful products.
Large amounts of scientific data were, of course, integral to the development of the field. As travel and telecommunications made the world smaller, the barriers to sharing information shrank too.
War accelerated the process. With a little help from Alexander Fleming and Clodomiro (Clorito) Picado Twight, a coordinated effort was mounted to mass-produce the wonder drug called penicillin. By 1943, scientists had discovered that a moldy cantaloupe in Peoria contained the best strain for production. By 1944, 2.3 million doses were available for the invasion of Normandy.
These discoveries were aided and abetted by advances in technology. In the mid-1970s, automated protein and DNA sequencing became a reality. A decade down the track, scientists could remotely access huge quantities of data stored in central computer repositories.
Many biotechnologists were eager to share their findings amongst colleagues. In 1977, Rodger Staden and his group at Cambridge developed the data-packed Staden Package for DNA sequences, initially available to academics, then eventually open source.
Over in the United States, the NIH was involved in sponsoring PROPHET, “a national computing resource tailored to meet the data management and analysis needs of life scientists.” PROPHET’s main attraction was “a broad spectrum of integrated, graphics-oriented information-handling tools.”
But it was in the 1980s and early 1990s, the years when Madonna reigned supreme, that biotechnology and data analytics really hit their stride. Academic scientists, the NIH, the EMBL and large research funding centers poured their time – and their money – into new bioinformatic databases and software.
Highlights of this period include:
1986: Amos Bairoch, a young Swiss bioinformatician, begins to develop an annotated protein sequence databank known as Swiss-Prot. The full-blown version is launched to great acclaim in 1991.
1986: Interferon becomes the first anti-cancer drug produced through biotech.
1991: Bairoch creates PROSITE, a database of protein sequence and structure correlations. He complements this with ENZYME, a nomenclature database on enzymes, and SeqAnalRef, a reference database focused on sequence analysis.
This explosion of data contributed to a slew of firsts for the biotechnology sector in the 21st century. Industries seized upon the discoveries, pumping funds into the development of new drugs, bio-engineered farming and alternative energy.
Big-bang events in this period include:
2000: The Human Genome Project and Celera Genomics create a draft of the human genome sequence. Their work appears in both Science and Nature.
2001: Gleevec® (imatinib), a drug for patients with chronic myeloid leukemia, is the first gene-targeted drug to receive FDA approval.
2002: Rice becomes the first crop to have its genome decoded.
In 2011, players of an online game called Foldit took three weeks to produce an accurate 3D model of the M-PMV retroviral protease enzyme. The structure of the enzyme – which plays an important role in the spread of an AIDS-like virus in rhesus monkeys – had eluded researchers for fifteen years.
In January 2012, gamers had another stunning success – the first crowdsourced redesign of a protein. By adding 13 amino acids to an enzyme that catalyzes Diels-Alder reactions, Foldit players increased its activity more than 18 times.
In a world of social networking sites, online communities and publicly funded projects, crowdsourcing has become an integral part of people’s lives. Forward-thinking scientists have begun to use this collective wisdom to advance their research and development goals.
They’re also partnering with private companies to access information. 23andMe made its name by offering a personal genome test kit. Customers provide a saliva sample, and the company supplies an online analysis of inherited traits, genealogy, and possible congenital risk factors.
Their ever-growing bank of digital patient data, including one of the largest databases on genes involved in Parkinson’s disease, has put them in a pivotal position of power. In past years, they have:
One data challenge for biotechnologists is synthesis. How can scientists integrate large quantities of diverse data – genomic, proteomic, phenotypic, clinical, semantic, social, etc. – into a coherent whole?
Many teams are busy providing answers:
Cambridge Semantics has developed semantic web technologies that help pharmaceutical companies sort and select which businesses to acquire and which drug compounds to license.
Data scientists at the Broad Institute of MIT and Harvard have developed the Integrative Genomics Viewer (IGV), open source software that allows for the interactive exploration of large, integrated genomic datasets.
With data sets multiplying by the minute, data scientists aren’t suffering for lack of raw materials.
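To make the synthesis challenge concrete, here is a minimal Python sketch of the simplest form of integration: joining genomic and clinical records that share a sample identifier. Everything in it is an assumption made for illustration. The field names (sample_id, variant, genotype, diagnosis), the integrate helper and the toy records are all hypothetical, and the sketch does not represent how IGV, Cambridge Semantics or any other tool mentioned above works.

```python
# Toy records invented for illustration only; this is not real patient data.
genomic = [
    {"sample_id": "S1", "variant": "rs0001", "genotype": "A/G"},
    {"sample_id": "S2", "variant": "rs0001", "genotype": "G/G"},
    {"sample_id": "S3", "variant": "rs0001", "genotype": "A/A"},
]
clinical = [
    {"sample_id": "S1", "diagnosis": "case"},
    {"sample_id": "S2", "diagnosis": "control"},
    {"sample_id": "S4", "diagnosis": "case"},
]


def integrate(genomic_rows, clinical_rows, key="sample_id"):
    """Inner-join two lists of dicts on a shared key (hypothetical helper)."""
    # Index the clinical records by sample ID for constant-time lookup.
    clinical_by_id = {row[key]: row for row in clinical_rows}
    merged = []
    for g_row in genomic_rows:
        c_row = clinical_by_id.get(g_row[key])
        if c_row is not None:
            # Keep only samples present in both sources.
            merged.append({**g_row, **c_row})
    return merged


integrated = integrate(genomic, clinical)
for row in integrated:
    print(row)
```

Only samples S1 and S2 appear in both sources, so S3 and S4 drop out of the result. Real integration work is dominated by exactly this kind of mismatch, along with inconsistent identifiers, units and vocabularies, and that is much of what makes synthesis hard at scale.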
Data Risks and Regulations
Choose Your Data Wisely
What’s that phrase again? Every rose has its thorn? Well, in the field of biotechnology, every discovery has a caveat.
As AstraZeneca R&D Information Vice President John Reynders warns in “Big Data Has Arrived in Biotech. Now What?”, hypothesis-generation and predictive analytics are a little easier when you’re just trying to guess what books someone may prefer. Genomic data, on the other hand, is far more complex and extensive.
The volume, velocity and variety of data (3Vs) are creating similar headaches. When faced with an ever-growing mountain of information, it can take a great deal of human skill to understand what questions you need to ask and how best to find the answers.
These aren’t insurmountable problems, but they’re big ones. As the 3Vs accelerate, biotech companies will likely have to be careful that they keep their minds open and their hubris in check.
I’d Like To Keep That Private
Unlike Europe, the U.S. lacks an overarching data protection law. It does, however, have a great deal of federal and state legislation that affects companies that handle personal data. These laws and regulations can vary according to the industries involved.
Biotechnology companies that partner with health care providers, for example, may run into the Health Insurance Portability and Accountability Act (HIPAA). The law was enacted in 1996, and the HIPAA Privacy Rule issued under it:
“…requires appropriate safeguards to protect the privacy of personal health information, and sets limits and conditions on the uses and disclosures that may be made of such information without patient authorization.”
Going one step further, the 2009 HITECH Act makes the HIPAA privacy provisions applicable to business associates.
Companies that intend to store personal data must also be aware of the strict laws in place to protect U.S. consumers. The FTC has the authority to bring enforcement actions to ensure that companies are living up to the promises in their privacy statements.