Data scientists are big data wranglers. They take an enormous mass of messy data points (unstructured and structured) and use their formidable skills in math, statistics and programming to clean, manage and organize them. Then they apply all their analytic powers – industry knowledge, contextual understanding, skepticism of existing assumptions – to uncover hidden solutions to business challenges.
Data Scientist Responsibilities
“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.” – Josh Wills, What is the difference between a data scientist and a statistician?
On any given day, a data scientist’s responsibilities may include:
- Conduct undirected research and frame open-ended industry questions
- Extract huge volumes of data from multiple internal and external sources
- Employ sophisticated analytics programs, machine learning and statistical methods to prepare data for use in predictive and prescriptive modeling
- Thoroughly clean and prune data to discard irrelevant information
- Explore and examine data from a variety of angles to determine hidden weaknesses, trends and/or opportunities
- Devise data-driven solutions to the most pressing challenges
- Invent new algorithms to solve problems and build new tools to automate work
- Communicate predictions and findings to management and IT departments through effective data visualizations and reports
- Recommend cost-effective changes to existing procedures and strategies
Every company will have a different take on job tasks. Some treat their data scientists as data analysts or combine their duties with data engineers; others need top-level analytics experts skilled in intense machine learning and data visualizations.
As data scientists achieve new levels of experience or change jobs, their responsibilities invariably change. For example, a person working alone in a mid-size company may spend a good portion of the day in data cleaning and munging. A high-level employee in a business that offers data-based services may be asked to structure big data projects or create new products.
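To make “data cleaning and munging” a little more concrete, here is a minimal Python sketch of the kind of tidying that can fill a working day. It is only an illustration under stated assumptions: the field names (name, amount), the sample rows and the cleaning rules are all hypothetical, and real projects usually lean on dedicated libraries rather than hand-written loops.

```python
def clean_records(rows):
    """Normalize names, parse amounts, and drop invalid or duplicate rows.

    `rows` is a list of dicts with hypothetical keys "name" and "amount".
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Trim stray whitespace and standardize capitalization.
        name = (row.get("name") or "").strip().title()
        # Remove thousands separators so "1,200.50" parses as a number.
        raw_amount = (row.get("amount") or "").strip().replace(",", "")
        if not name or not raw_amount:
            continue  # discard rows with missing values
        try:
            amount = float(raw_amount)
        except ValueError:
            continue  # discard rows whose amount is not numeric (e.g. "n/a")
        key = (name, amount)
        if key in seen:
            continue  # discard exact duplicates after normalization
        seen.add(key)
        cleaned.append({"name": name, "amount": amount})
    return cleaned


# Hypothetical messy input, invented for this example.
raw = [
    {"name": "  alice smith ", "amount": "1,200.50"},
    {"name": "Alice Smith", "amount": "1200.50"},
    {"name": "bob jones", "amount": "n/a"},
    {"name": "", "amount": "75"},
    {"name": "Carol Diaz", "amount": " 300 "},
]
cleaned = clean_records(raw)
```

Of the five raw rows, only two survive: the two Alice Smith entries collapse into one once they are normalized, and the rows with a non-numeric amount or a missing name are discarded.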
How to Become a Data Scientist
1. Pursue an undergraduate degree, graduate degree or certificate in data science or a closely related field.
Broadly speaking, you have 3 education options if you’re considering a career as a data scientist:
- Degrees and graduate certificates provide structure, internships, networking and recognized academic qualifications for your résumé. They will also cost you significant time and money.
- MOOCs and self-guided learning courses are free/cheap, short and targeted. They allow you to complete projects on your own time – but they require you to structure your own academic path.
- Bootcamps are intense and faster to complete than traditional degrees. They may be taught by practicing data scientists, but they won’t give you degree initials after your name.
Academic qualifications may be more important than you imagine. As Burtch Works notes, data scientists typically have a graduate or advanced degree in a quantitative discipline.
As of May 2017, with the release of The Burtch Works Study – Education, 90% of interviewed data scientists reported holding an advanced degree: 49% hold a master’s and 41% hold a PhD.
Note: Check out our list of 23 Great Schools with Master’s Programs in Data Science.
2. Improve and fine-tune your skills in statistics, data mining and data analysis.
Technical Skills for Data Scientists
- Math (e.g. linear algebra, calculus and probability)
- Statistics (e.g. hypothesis testing and summary statistics)
- Machine learning tools and techniques (e.g. k-nearest neighbors, random forests, ensemble methods, etc.)
- Software engineering skills (e.g. distributed computing, algorithms and data structures)
- Data mining
- Data cleaning and munging
- Data visualization (e.g. ggplot and d3.js) and reporting techniques
- Unstructured data techniques
- R and/or SAS languages
- SQL databases and database querying languages
- Python (most common), C/C++, Java, Perl
- Big data platforms like Hadoop, Hive & Pig
- Cloud tools like Amazon S3
This list is always subject to change. As Anmol Rajpurohit suggests, “generic programming skills are a lot more important than being the expert of any particular programming language.”
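Two items from the list above, summary statistics and k-nearest neighbors, are small enough to sketch in a few lines of plain Python. This is a simplified illustration, not a production approach: the toy data set (weekly hours of product use and support tickets, labeled “churn” or “stay”) is entirely hypothetical, and the function names summarize and knn_predict are made up for this example.

```python
import math
import statistics
from collections import Counter


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (hours of use per week, support tickets) -> label
train = [
    ((1.0, 5.0), "churn"),
    ((2.0, 4.0), "churn"),
    ((1.5, 6.0), "churn"),
    ((8.0, 1.0), "stay"),
    ((9.0, 0.0), "stay"),
    ((7.5, 2.0), "stay"),
]

stats = summarize([features[0] for features, _ in train])
prediction = knn_predict(train, (8.5, 1.0), k=3)
```

Here the query point sits close to the three “stay” customers, so the majority vote returns “stay”. In practice you would more likely reach for an established library and scale the features before measuring distance, but the underlying idea is the same.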
Business Skills for Data Scientists
- Analytic Problem-Solving: Approaching high-level challenges with a clear eye on what is important; employing the right approach/methods to make the maximum use of time and human resources.
- Effective Communication: Detailing your techniques and discoveries to technical and non-technical audiences in a language they can understand.
- Intellectual Curiosity: Exploring new territories and finding creative and unusual ways to solve problems.
- Industry Knowledge: Understanding the way your chosen industry functions and how data are collected, analyzed and utilized.
Note: You can view a handy trajectory on How to Become a Data Scientist in an infographic from Datacamp. Also, KDnuggets.com is a great source of information on big data, machine learning, and data science topics.
3. Review additional data scientist certifications and post-graduate learning.
To avoid wasting time on poor quality certifications, ask your mentors for advice, check job listing requirements and consult articles like Tom’s IT Pro “Best Of” certification lists. Here are a few that focus on useful skills:
The Certified Analytics Professional (CAP) credential was created in 2013 by the Institute for Operations Research and the Management Sciences (INFORMS) and is targeted towards data scientists. During the certification exam, candidates must demonstrate their expertise in the end-to-end analytics process. This includes the framing of business and analytics problems, data and methodology, model building, deployment and life cycle management.
- 5+ years of analytics work-related experience for BA/BS holder in a related area
- 3+ years of analytics work-related experience for MA/MS (or higher) holder in a related area
- 7+ years of analytics work-related experience for BA/BS (or higher) holder in an unrelated area
- Verification of soft skills/provision of business value by employer
- Agreement to adhere to Code of Ethics
The EMCDS certification training will teach you how to apply the common techniques and tools required for big data analytics. Candidates are judged on their technical expertise (e.g. employing open source tools such as “R”, Hadoop and Postgres) and their business acumen (e.g. telling a compelling story with the data to drive business action). There are two levels of certification: associate and specialist.
Once you’ve passed the EMCDS associate level exam, you can consider the Advanced Analytics Specialty. The certification training works on developing new skills in areas such as Hadoop (and Pig, Hive, HBase), Social Network Analysis, Natural Language Processing, data visualization methods and more.
This certification is designed for SAS Enterprise Miner users who perform predictive analytics. Candidates must have a deep, practical understanding of the functionalities for predictive modeling available in SAS Enterprise Miner 14. This exam includes topics such as data preparation, predictive models, model assessment, and scoring and implementation.
Related SAS certifications include:
- Statistical Business Analyst Using SAS 9: Regression and Modeling
- Business Intelligence Content Developer for SAS 9
An Interview with a Real Data Scientist
We caught up with Lisa Qian, Data Scientist at Airbnb, to find out what it’s like to work as a data scientist. Read on to learn about the impact data science has on Airbnb’s success, the programming languages they use on the job, and what students need to know in order to succeed.
A: Successful data scientists have a strong technical background, but the best data scientists also have great intuition about data. Rather than throwing every feature possible into a black box machine learning model and seeing what comes out, one should first think about if the data makes sense. Are the features meaningful, and do they reflect what you think they should mean? Given the way your data is distributed, which model should you be using? What does it mean if a value is missing, and what should you do with it? The answers to these questions differ depending on the problem you are solving, the way the data was logged, etc., and the best data scientists look for and adapt to these different scenarios.
The best data scientists are also great at communicating, both to other data scientists and non-technical people. In order to be effective at Airbnb, our analyses have to be both technically rigorous and presented in a clear and actionable way to other members of the company.
Data Scientist Salary for 2018: How much does a data scientist make?
In Glassdoor’s 50 Best Jobs in America, as of January 2018, data scientist is ranked number one! According to The Burtch Works Study – Region (2017), 40% of data scientists work on the West Coast. Entry-level professionals in that area earn a median base salary of $102,500 – about 13% more than their Northeast peers.
Average Salary: $120,931 per year
Median Salary: $90,993 per year
Total Pay Range: $61,927 – $124,757
Senior Data Scientist
Median Salary: $125,851 per year
Total Pay Range: $87,485 – $163,132
Jobs Similar to Data Scientist
Some data scientists get their start working as low-level Data Analysts, extracting structured data from MySQL databases or CRM systems, developing basic visualizations or analyzing A/B test results. These jobs aren’t usually that challenging.
Once you have your technical skills in order, you have plenty of options. If you’d like to push beyond your analytical role, you could consider jobs in building, engineering or architecture.
Data Scientist Jobs
Companies of every size and industry – from Google, LinkedIn and Amazon to the humble retail store – are looking for experts to help them wrestle big data into submission. In a 2014 Mashable article, Roy Lowrance, the managing director of New York University’s Center for Data Science program, is quoted as saying “anything that gets hot like this can only cool off.” But even as demand for data engineers surges, job postings for big data experts are expected to remain high.
There are also some indications that the roles of data scientists and business analysts are beginning to merge. In certain companies, “new look” data scientists may find themselves responsible for financial planning, ROI assessment, budgets and a host of other duties related to the management of an organization.
Professional Organizations for Data Scientists
- Data Science Association
- International Institute for Analytics (IIA)
- International Machine Learning Society (IMLS)
- Institute for Operations Research and the Management Sciences (INFORMS)
- Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD)