Data scientists tend to be well paid, but they earn those healthy paychecks. Success as a data scientist is likely to require mastery of both hard and soft skills. You may be required to execute a complex database query, but also to interface comfortably with data users and producers throughout your organization. Here’s a rundown of the primary areas in which a would-be data scientist should aspire to excel:
Data-Driven Problem Solving
A data scientist needs to know how to approach a problem productively. This means identifying a situation’s salient features, figuring out how to frame a question that will yield the desired answer, deciding what approximations make sense, and consulting the right co-workers at the appropriate junctures of the analytic process. All of this comes in addition to knowing which data science methods to apply to the problem at hand.
Data scientists use a variety of programming languages and software packages to flexibly and efficiently extract, clean, analyze, and visualize data. An aspiring data scientist will want to be familiar with at least these five:
- R was once confined almost exclusively to academia, but social networking services, financial institutions, and media outlets now use this programming language and software environment for statistical analysis, data visualization, and predictive modeling.
- Python, unlike R, was not designed for data analysis. Now that data analytics and data processing libraries have been developed for Python, however, the likes of Bank of America and Facebook are using Python for data science. The high-level programming language is powerful, fast, friendly, open and easy to learn.
- SQL, or Structured Query Language, is a special-purpose programming language for managing data held in relational database management systems. Some of what you can do with SQL—data insertion, queries, updating and deleting, schema creation and modification, and data access control—you can also accomplish with R, Python, or even Excel, but writing your own SQL code could be more efficient and yield reproducible scripts.
- Seattle-based software company Tableau offers a suite of products that complement data science standbys such as R and Python. Tableau may not be the best tool for cleaning or reshaping data, and its relational model doesn’t allow for procedural computations or offline algorithms, but it is great for data exploration and interactive analysis.
- Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop offers computing power, flexibility, fault tolerance and scalability.
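To make the SQL bullet more concrete, here is a minimal sketch using Python's built-in `sqlite3` module, which runs SQL against an in-memory database with no server to install. It walks through schema creation, insertion, updating, deleting, schema modification, and a query. The `orders` table, its columns, and the values are hypothetical and chosen only for illustration. SQLite has no user accounts, so data access control (the `GRANT`/`REVOKE` statements found in server databases) is not shown.

```python
import sqlite3


def run_demo():
    # In-memory SQLite database; the "orders" table is a hypothetical example.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Schema creation
    cur.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )

    # Data insertion, using ? placeholders instead of string formatting
    cur.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("alice", 120.0), ("bob", 75.5), ("alice", 30.0), ("carol", 200.0)],
    )

    # Updating: raise bob's order by 10%
    cur.execute("UPDATE orders SET amount = amount * 1.1 WHERE customer = ?", ("bob",))

    # Deleting: drop small orders (removes alice's 30.0 order)
    cur.execute("DELETE FROM orders WHERE amount < ?", (50.0,))

    # Schema modification: add a column with a default value
    cur.execute("ALTER TABLE orders ADD COLUMN region TEXT DEFAULT 'unknown'")

    # Query: total spend per customer, largest first
    rows = cur.execute(
        "SELECT customer, ROUND(SUM(amount), 2) AS total "
        "FROM orders GROUP BY customer ORDER BY total DESC"
    ).fetchall()

    conn.commit()
    conn.close()
    return rows


if __name__ == "__main__":
    for customer, total in run_demo():
        print(customer, total)
```

Because every step is a plain SQL statement, the same script can be rerun to reproduce the result, which is the reproducibility advantage the SQL bullet mentions.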
Software runs all the necessary statistical tests these days, but a data scientist still needs the statistical sensibility to know which test to run when, and how to interpret the results. A solid understanding of multivariable calculus and linear algebra, which form the basis of many data analysis techniques, is likely to allow a data scientist to build in-house implementations of analysis routines as needed.
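As one illustration of how linear algebra underpins an everyday analysis routine, the sketch below fits a multiple linear regression from scratch by solving the normal equations, (XᵀX)b = Xᵀy, with Gaussian elimination. It uses only the Python standard library; the function names `solve` and `least_squares` are hypothetical. This is a teaching sketch, not production code: forming XᵀX can be numerically unstable, and in practice a library routine such as NumPy's `numpy.linalg.lstsq` (which relies on a more robust factorization) would usually be preferred.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy [a | b] so the caller's data is untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest |entry| in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def least_squares(features: List[List[float]], y: List[float]) -> List[float]:
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving (X^T X) b = X^T y."""
    # Prepend a column of ones so b0 acts as the intercept.
    X = [[1.0] + [float(v) for v in row] for row in features]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


if __name__ == "__main__":
    # Synthetic, noise-free data generated from y = 1 + 2*x1 + 3*x2.
    feats = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
    ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in feats]
    print(least_squares(feats, ys))
```

On the noise-free synthetic data in the usage example, the fitted coefficients should come out very close to 1, 2, and 3. Interpreting them on real, noisy data is where the statistical sensibility described above comes in.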
Pictures often communicate more effectively than either numbers or words, so it behooves a data scientist to be able to present data in a visually compelling way. This requires not only mastering data visualization tools but also understanding the principles of effective data visualization.
Data scientists must be able to report technical findings such that they are comprehensible to non-technical colleagues, whether corner-office executives or associates in the marketing department. Make your data-driven story not just comprehensible but compelling.
The skills required of a data scientist can be sliced and diced in different ways. Mitchell Sanders’s Data Science Central blog post concludes with an assortment of breakdowns, and perusing these may help you wrap your head around what it takes to make it as a data scientist. It is also important to remember, as Dave Holtz points out on the Udacity blog, that the “data scientist” job title encompasses a variety of positions, which may demand vastly different skills from applicants. Holtz’s post identifies four types of data scientist jobs and breaks down which skills are most vital for each.