Data scientists are big data wranglers. They take an enormous mass of messy data points (unstructured and structured) and use their formidable skills in math, statistics and programming to clean, massage and organize them. Then they apply all their analytic powers – industry knowledge, contextual understanding, skepticism of existing assumptions – to uncover hidden solutions to business challenges.
Data Scientist Responsibilities
“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”
On any given day, a data scientist may be required to:
- Conduct undirected research and frame open-ended industry questions
- Extract huge volumes of data from multiple internal and external sources
- Employ sophisticated analytics programs, machine learning and statistical methods to prepare data for use in predictive and prescriptive modeling
- Thoroughly clean and prune data to discard irrelevant information
- Explore and examine data from a variety of angles to determine hidden weaknesses, trends and/or opportunities
- Devise data-driven solutions to the most pressing challenges
- Invent new algorithms to solve problems and build new tools to automate work
- Communicate predictions and findings to management and IT departments through effective data visualizations and reports
- Recommend cost-effective changes to existing procedures and strategies
Every company will have a different take on job tasks. Some treat their data scientists as glorified data analysts or combine their duties with those of data engineers; others need top-level analytics experts skilled in advanced machine learning and data visualization.
As data scientists achieve new levels of experience or change jobs, their responsibilities invariably change. For example, a person working alone in a mid-size company may spend a good portion of the day in data cleaning and munging. A high-level employee in a business that offers data-based services may be asked to structure big data projects or create new products.
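To give a feel for what "data cleaning and munging" means in practice, here is a minimal sketch in plain Python. The records, the field names and the `clean_records` helper are all hypothetical, made up for illustration; real projects typically lean on a library such as pandas and on rules specific to the data set.

```python
def clean_records(records, required_fields):
    """Trim whitespace, drop incomplete rows, and remove exact duplicates."""
    cleaned = []
    seen = set()
    for record in records:
        # Normalize: strip surrounding whitespace from string values.
        row = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Discard rows missing any required field (None or empty string).
        if any(row.get(field) in (None, "") for field in required_fields):
            continue
        # Skip exact duplicates, keeping the first occurrence.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


# Hypothetical raw data with typical problems.
raw = [
    {"customer": " Alice ", "city": "Boston", "spend": 120.0},
    {"customer": "Alice", "city": "Boston", "spend": 120.0},   # duplicate once trimmed
    {"customer": "Bob", "city": "", "spend": 75.5},            # missing city
    {"customer": "Carol", "city": "Denver", "spend": None},    # missing spend
    {"customer": "Dan", "city": "Austin", "spend": 42.0},
]

tidy = clean_records(raw, required_fields=["customer", "city", "spend"])
print(tidy)
```

Only the rows for Alice (once) and Dan survive: the duplicate and the two incomplete rows are discarded.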
An Interview with a Real Data Scientist
We caught up with Lisa Qian, Data Scientist at Airbnb, to find out what it’s like to work as a data scientist. Read on to learn about the impact data science has on Airbnb’s success, the programming languages they use on the job, and what students need to know in order to succeed.
The best data scientists are also great at communicating, both to other data scientists and non-technical people. In order to be effective at Airbnb, our analyses have to be both technically rigorous and presented in a clear and actionable way to other members of the company.
Data Scientist Salaries
“Data scientist” is the hottest job title in the IT field – with starting salaries to match. It should come as no surprise that Silicon Valley is the new Jerusalem. According to a 2014 Burtch Works study, 36% of data scientists work on the West Coast. Entry-level professionals in that area earn a median base salary of $100,000 – 22% more than their Northeast peers.
Data Scientist
Average Salary (2015): $118,709 per year
Median Salary (2015): $93,991 per year
Total Pay Range: $63,524 – $138,123
Senior Data Scientist
Median Salary (2015): $124,273 per year
Total Pay Range: $89,801 – $179,445
Data Scientist Qualifications
What Kind of Degree Will I Need?
Broadly speaking, you have 3 education options if you’re considering a career as a data scientist:
- Degrees and graduate certificates provide structure, internships, networking and recognized academic qualifications for your résumé. They will also cost you significant time and money.
- MOOCs and self-guided learning courses are free/cheap, short and targeted. They allow you to complete projects on your own time – but they require you to structure your own academic path.
- Bootcamps are intense and faster to complete than traditional degrees. They may be taught by practicing data scientists, but they won’t give you degree initials after your name.
Academic qualifications may be more important than you imagine. As Burtch Works notes, “it’s incredibly rare for someone without an advanced quantitative degree to have the technical skills necessary to be a data scientist.”
In its data science salary report, Burtch Works determined that 88% of data scientists have at least a master’s degree and 46% have a PhD. The majority of these degrees are in rigorous quantitative, technical or scientific subjects, including math and statistics (32%), computer science (19%) and engineering (16%).
With that being said, companies are desperate for candidates with real-world skills. Your technical know-how may trump preferred degree requirements.
Note: Check out our list of 23 Great Schools with Master’s Programs in Data Science.
What Kind of Skills Will I Need?
- Math (e.g. linear algebra, calculus and probability)
- Statistics (e.g. hypothesis testing and summary statistics)
- Machine learning tools and techniques (e.g. k-nearest neighbors, random forests, ensemble methods, etc.)
- Software engineering skills (e.g. distributed computing, algorithms and data structures)
- Data mining
- Data cleaning and munging
- Data visualization (e.g. ggplot and d3.js) and reporting techniques
- Unstructured data techniques
- R and/or SAS languages
- SQL databases and database querying languages
- Python (most common), C/C++, Java, Perl
- Big data platforms like Hadoop, Hive & Pig
- Cloud tools like Amazon S3
This list is always subject to change. As Anmol Rajpurohit suggests, “generic programming skills are a lot more important than being the expert of any particular programming language.”
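To make one entry from the technical list concrete, here is a minimal sketch of k-nearest neighbors classification using only Python's standard library. The data points, the labels ("churn" and "stay") and the `knn_predict` function are hypothetical illustrations, not a production approach; in practice a library such as scikit-learn would usually be used.

```python
import math
from collections import Counter


def knn_predict(training_points, training_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(training_points, training_labels)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, well-separated customer data: (logins per week, purchases per month).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]

low_activity = knn_predict(points, labels, query=(2.0, 2.0))
high_activity = knn_predict(points, labels, query=(8.0, 9.0))
print(low_activity, high_activity)
```

The idea is simply that a new point takes the label most common among its closest neighbours; the choice of `k` and of the distance measure are the main design decisions.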
- Analytic Problem-Solving: Approaching high-level challenges with a clear eye on what is important; employing the right approach/methods to make the maximum use of time and human resources.
- Effective Communication: Detailing your techniques and discoveries to technical and non-technical audiences in a language they can understand.
- Intellectual Curiosity: Exploring new territories and finding creative and unusual ways to solve problems.
- Industry Knowledge: Understanding the way your chosen industry functions and how data are collected, analyzed and utilized.
Note: You can view a handy trajectory on How to Become a Data Scientist in an infographic from DataCamp. Also, KDnuggets.com is a great source of information on big data, machine learning, and data science topics.
What About Certifications?
To avoid wasting time on poor-quality certifications, ask your mentors for advice, check job listing requirements and consult articles like Tom’s IT Pro “Best Of” certification lists. Here are a few that focus on useful skills:
The Certified Analytics Professional (CAP) credential was created in 2013 by the Institute for Operations Research and the Management Sciences (INFORMS) and is targeted towards data scientists. During the certification exam, candidates must demonstrate their command of the end-to-end analytics process. This includes the framing of business and analytics problems, data and methodology, model building, deployment and life cycle management.
Eligibility requirements:
- 5+ years of analytics work-related experience for BA/BS holder in a related area
- 3+ years of analytics work-related experience for MA/MS (or higher) holder in a related area
- 7+ years of analytics work-related experience for BA/BS (or higher) holder in an unrelated area
- Verification of soft skills/provision of business value by employer
- Agreement to adhere to Code of Ethics
The CCP:DS (Cloudera Certified Professional: Data Scientist) is an elite-level certification for data scientists who can demonstrate advanced skills in working with big data. Candidates sit 3 exams – Descriptive and Inferential Statistics, Unsupervised Machine Learning and Supervised Machine Learning – and must then prove their skills by designing and developing a production-ready data science solution under real-world conditions.
Related Cloudera certifications include:
The EMC Data Science Associate (EMCDSA) certification tests your ability to apply common techniques and tools required for big data analytics. Candidates are judged on their technical expertise (e.g. employing open source tools such as R, Hadoop and Postgres) and their business acumen (e.g. telling a compelling story with the data to drive business action).
Once you’ve passed the EMCDSA, you can consider the Advanced Analytics Specialty. This specialty develops skills in areas such as Hadoop (and Pig, Hive, HBase), Social Network Analysis, Natural Language Processing, data visualization methods and more.
This certification is designed for SAS Enterprise Miner users who perform predictive analytics. Candidates must have a deep, practical understanding of the predictive modeling functionality available in SAS Enterprise Miner 7 before they can take the performance-based exam. The exam covers data preparation, predictive models, model assessment, and scoring and implementation.
Related SAS certifications include:
- Statistical Business Analyst Using SAS 9: Regression and Modeling
- Business Intelligence Content Developer for SAS 9
Jobs Similar to Data Scientist
Some data scientists get their start working as entry-level data analysts, extracting structured data from MySQL databases or CRM systems, developing basic visualizations or analyzing A/B test results. These jobs aren’t usually that challenging.
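As an illustration of what analyzing A/B test results can involve, here is a minimal sketch of a two-proportion z-test in Python, one common way to check whether two conversion rates differ by more than chance would explain. The visitor and conversion counts are hypothetical, and the `two_proportion_z_test` function is an illustrative helper rather than a standard library routine.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(
        pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b)
    )
    z = (rate_b - rate_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: variant A converts 200 of 5,000 visitors,
# variant B converts 260 of 5,000.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the gap between the two variants is unlikely to be noise alone. The test assumes large, independent samples, so it is a starting point rather than a full analysis.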
However, once you have your technical skills in order, you have plenty of options. If you’d like to push beyond your analytical role, you could think about building/engineering/architecture jobs such as:
Data Scientist Job Outlook
In an oft-cited 2011 big data study, McKinsey reported that by 2018 the U.S. could face a shortage of 140,000 to 190,000 “people with deep analytic skills” and 1.5 million “managers and analysts with the know-how to use the analysis of big data to make effective decisions.”
The ensuing panic has led to high demand for data scientists. Companies of every size and industry – from Google, LinkedIn and Amazon to the humble retail store – are looking for experts to help them wrestle big data into submission. Starting salaries are astronomical.
The bubble is bound to burst, of course. In a 2014 Mashable article, Roy Lowrance, the managing director of New York University’s Center for Data Science program, is quoted as saying “anything that gets hot like this can only cool off.” But even as demand for data engineers surges, job postings for big data experts are expected to remain high.
There are also some indications that the roles of data scientists and business analysts are beginning to merge. In certain companies, “new look” data scientists may find themselves responsible for financial planning, ROI assessment, budgets and a host of other duties related to the management of an organization.