One of the more famous petri dishes for data science was Facebook. In 2005, Jeffrey Hammerbacher put together a team of gurus to mine huge hoards of the site’s social network data, crunch the numbers for insights and use their discoveries to improve the service and create targeted advertising. (Two years later, Hammerbacher decamped again to form Cloudera.)
Over at LinkedIn, D.J. Patil and his colleagues were hard at work on a similar approach. Their work produced many tools users are now familiar with – recommendation technology such as “Groups you may like”, as well as features such as Career Explorer and Jobs Recommendation.
Consumers ‘R’ Us
Business intelligence and the retail industry have a long history together (AC Nielsen began asking impertinent questions in 1923), and the partnership has only deepened with the Internet.
Data scientists working in retail can combine data sets to create:
Personalized recommendations based on weather, seasonal trends, traffic reports, past purchase history, your dog’s favorite chew toys…
Smarter sentiment analysis
Product insights gleaned from RFID and sensor data
Detailed market basket and video analysis
Real-time pricing and inventory management
In recent years Google, SAS and IBM have gone on buying streaks, snapping up smaller companies with useful analytics technologies.
In the same year, IBM rolled out a number of analytics-backed retail apps and services that incorporated components from Tealeaf Technology, a customer experience management specialist.
Humans, We Have a Problem
Though many data scientists may use the Web as a data source, they’re not limited to it. In fact, most successful business intelligence and data analytics companies are pulling data from as many sources as they can find.
Take Splunk. Its specialty is harvesting big data from machines. These include Web servers, mobile devices – even something as prosaic as an air conditioning unit.
As soon as a machine creates a piece of data, Splunk seizes it and stores it in a cloud-based database.
The aim is to trace a machine’s repetitive patterns, find anomalies and diagnose problems.
Once it’s found an issue, its programs can create immediate alerts (as well as less urgent graphs and reports) for the client.
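The idea of learning a machine’s normal pattern and flagging departures from it can be sketched in a few lines of Python. This is not Splunk’s implementation. It is a minimal illustration that assumes a hypothetical list of temperature readings from an air conditioning unit, and it flags any reading that lies more than a chosen number of standard deviations from the mean (the function name and threshold are made up for the example).

```python
from statistics import mean, stdev


def find_anomalies(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        # All readings are identical, so nothing stands out.
        return []
    return [
        (i, x)
        for i, x in enumerate(readings)
        if abs(x - mu) / sigma > threshold
    ]


# Hypothetical readings in degrees Celsius; one of them is clearly off.
readings = [21.0, 21.2, 20.9, 21.1, 21.0, 35.0, 21.1, 20.8, 21.0, 21.2]
anomalies = find_anomalies(readings)
```

A real monitoring system would work on streaming data and account for daily or seasonal cycles, but the principle of comparing each new reading against an expected range is the same.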
Saving the World
But maybe the most positive trend appearing in Web-based data science has nothing to do with profit or gain. As Mayer-Schönberger and Cukier relate in their book, Big Data: A Revolution That Will Transform How We Live, Work, and Think, big data might just help save the world.
Google’s engineers went at the task of tracking flu outbreaks with no preconceptions. They simply designed a system that would look for correlations between the recorded spread of flu and the frequency of specific search queries. To test their flu predictions, they processed 450 million mathematical models.
And they found the link. A combination of 45 search terms, used together in a mathematical model, could tell them – in real-time – how flu epidemics were spreading.
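The core step, scoring each search query by how closely its frequency tracks recorded flu cases, can be sketched as follows. This is a toy illustration rather than the model described in the book: the query names and weekly counts are invented, and it ranks single queries by Pearson correlation instead of testing combinations of terms.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_queries(flu_cases, query_counts):
    """Sort queries from most to least correlated with the flu series."""
    scores = {q: pearson(counts, flu_cases) for q, counts in query_counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical weekly data, invented for this example.
flu_cases = [10, 20, 40, 80, 60, 30]
query_counts = {
    "cough medicine": [12, 22, 41, 79, 63, 28],
    "fever remedy": [5, 9, 22, 38, 31, 14],
    "football scores": [50, 48, 52, 49, 51, 50],
}
ranking = rank_queries(flu_cases, query_counts)
```

In this made-up data the two health-related queries rise and fall with the flu series, while the unrelated query stays flat and lands at the bottom of the ranking.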
This real-time monitoring was far superior to any government report to date. When the H1N1 virus hit in 2009, public health officials were right on top of its spread.
Data Risks and Regulations
You’re Only as Good as Your Data
With all its potential, big data isn’t the answer to everything.
What’s more, the volume, velocity and variety of big data are only going to get bigger. Mobile use is exploding. Developing countries are coming online. The Internet of Things is spawning a whole new information universe.
Internet companies, which are already dealing with astronomical numbers, will have to be ready to handle the load. Data scientists will have to be primed to know where to look.
Private! Keep Out!
In a global economy – and with increasing demands from the government to access citizens’ information – the Internet industry is facing a set of complicated questions:
Who owns the rights to personal data? Are there exceptions to the rule?
Now that the cloud is here, what safeguards are needed to protect private information?
As data collaboration increases, how much can Internet companies share with commercial partners, business intelligence vendors and non-profits?
What does privacy mean in the 21st century?
Unfortunately, as with many things regarding data science, there are no easy answers.
History of Data Science and the Internet
“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers.” – Tim Berners-Lee, inventor of the World Wide Web
In early October 1957, headlines read:
“RED ‘MOON’ OVER LONDON!”
“RUSSIA WINS THE RACE INTO OUTER SPACE!”
“SPACE AGE IS HERE!”
Sputnik, the first artificial earth satellite, was in the sky.
A few months later the U.S. Department of Defense hastily issued directive 5105.15 – the establishment of the Advanced Research Projects Agency (ARPA). Packed with the country’s best and brightest, this agency spent the sixties tackling a communications challenge. The solution to the challenge was known as ARPANET and would transform the world forever.
Do You See the L?
In 1962, J.C.R. Licklider (aka “computing’s Johnny Appleseed”) became Director of the Information Processing Techniques Office (IPTO) within ARPA (renamed the Defense Advanced Research Projects Agency, or DARPA, in 1972). His job? Find a way to unite the department’s main computers at Cheyenne Mountain with the Pentagon and Strategic Air Command (SAC) headquarters in a wide-area network.
His idea was clear and simple:
“A network of such [computers], connected to one another by wide-band communication lines”, from Man-Computer Symbiosis (1960).
This dream came one step closer to reality when Paul Baran, Donald Davies and others came up with the concept of packet-switching. By bundling data into arbitrary packets and routing these “digital envelopes” onward, computer engineers could save precious bandwidth on connection lines.
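The packet idea is easy to see in miniature. The sketch below is a simplified illustration, not a real network protocol: it splits a message into numbered chunks, shuffles them to mimic packets taking different routes, and uses the sequence numbers to put the message back together. The function names and the packet size are assumptions made for the example.

```python
import random


def to_packets(message, size):
    """Split a message into (sequence_number, chunk) packets."""
    return [
        (seq, message[start:start + size])
        for seq, start in enumerate(range(0, len(message), size))
    ]


def reassemble(packets):
    """Rebuild the message by ordering packets on their sequence number."""
    return "".join(chunk for _, chunk in sorted(packets))


message = "packets travel independently and are reordered on arrival"
packets = to_packets(message, size=8)
random.shuffle(packets)  # simulate packets arriving out of order
received = reassemble(packets)
```

Real packets also carry addressing and error-checking information, but the sequence number alone is enough to show why the order of arrival does not matter.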
By the late 1960s, Licklider had moved on to other projects and Robert Taylor had become head of IPTO. Working with smart people like MIT’s Larry Roberts and Leonard Kleinrock, he threw even more resources into the project.
On October 29, 1969, Kleinrock was at UCLA, on the phone with colleagues at the Stanford Research Institute (SRI). Their computers had been linked; their systems were running:
“We typed the L, and we asked on the phone, ‘Do you see the L?’
‘Yes, we see the L,’ came the response.
We typed the O, and we asked, ‘Do you see the O?’
1976: Robert Metcalfe and his colleagues launch Ethernet, a family of technologies for local area networks (LANs)
By the 1980s, email, newsgroups and LANs were in common use at universities and research groups. In 1985, the National Science Foundation Network (NSFnet) linked five national supercomputer centers, eventually replacing ARPANET as the de facto educational network.
The Roaring 90s
Throughout the 1980s and early 1990s, a number of groups were busy developing ways to organize this data and accommodate projected growth. In 1989, Tim Berners-Lee proposed a simple but far-reaching idea: Why not use the Internet as a platform to create a global hypertext system available to all? A few years later, the World Wide Web appeared as a publicly available service.
From the beginning, it was important that the Web be readily accessible by ordinary users. Browsers like Mosaic (1993) and Netscape (1994) helped them make sense of it. Yahoo! (1994) and AltaVista (1995) made it easier to search. E-commerce companies such as Book Stacks Unlimited (1992) and Amazon (1995) created new businesses.
By the end of the century, the Internet was well on its way to becoming the primary avenue of information flowing through two-way telecom networks.
Then along came Google. The word is a deliberate misspelling of the word “googol” – the name for a very large number: one followed by 100 zeros. As this name suggests, the company was thinking big from the beginning.
Throughout the late 1990s and early 2000s, Google:
Improved on existing search algorithms with PageRank (named for co-founder Larry Page), which tallies hyperlinked references to each Web page as a measure of popularity, in addition to parsing the text on the page
Began selling advertisements associated with search keywords
Developed MapReduce, a programming model for processing large data sets (later implemented by others in the open source software Apache Hadoop)
Launched Gmail, Google Translate, Google Maps, Google News, Google Books, and many more experiments
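The link-tallying idea behind PageRank, mentioned in the list above, can be sketched as a short iterative calculation. This is a simplified textbook version, not Google’s production system: the four-page site is hypothetical, the damping factor of 0.85 is the commonly cited value, and the text-parsing side of search is left out entirely.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page popularity; links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on in equal shares to its targets.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical site: every link target must also appear as a key.
links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(links)
```

Here “home” ends up with the highest score because every other page links to it, while “orphan”, which nothing links to, keeps only the baseline score.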
“The reason Google’s translation system works well is not that it has a smarter algorithm. It works well because its creators, like Banko and Brill at Microsoft, fed in more data – and not just of high quality. Google was able to use a dataset tens of thousands of times larger than IBM’s Candide because it accepted messiness.”