Data scientists are well paid, sure, but they earn those healthy paychecks. Success as a data scientist requires mastery of an array of both hard and soft skills. You must be able to execute a complex database query, but also interface comfortably with data users and producers throughout your organization. Here’s a rundown of the primary areas in which a would-be data scientist should aspire to excel:
Data-Driven Problem Solving
A data scientist needs to know how to productively approach a problem. This means identifying a situation’s salient features, figuring out how to frame a question that will yield the desired answer, deciding what approximations make sense, and consulting the right co-workers at the appropriate junctures of the analytic process. All of that comes in addition to knowing which data science methods to apply to the problem at hand.
Data scientists use a variety of programming languages and software packages to flexibly and efficiently extract, clean, analyze, and visualize data. An aspiring data scientist will want to be familiar with at least these five:
- R was once confined almost exclusively to academia, but social networking services, financial institutions, and media outlets now use this programming language and software environment for statistical analysis, data visualization, and predictive modeling.
- Python, unlike R, was not designed for data analysis. Now that data analytics and data processing libraries have been developed for Python, however, the likes of Los Alamos National Laboratory, Bank of America, and Facebook are using Python for data science. The high-level programming language is powerful, fast, friendly, open, and easy to learn.
- SQL, or Structured Query Language, is a special-purpose programming language for managing data held in relational database management systems. Some of what you can do with SQL—data insertion, queries, updating and deleting, schema creation and modification, and data access control—you can also accomplish with R, Python, or even Excel, but writing your own SQL code is more efficient and yields easily reproducible scripts.
- Tableau, a suite of products from the Seattle-based software company of the same name, complements data science standbys such as R and Python. Tableau is not the best tool for cleaning or reshaping data, and its relational model doesn’t allow for procedural computations or offline algorithms, but it is great for data exploration and interactive analysis.
- Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop offers computing power, flexibility, fault tolerance, low cost, and scalability.
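To make the SQL operations listed above more concrete, here is a minimal sketch that runs schema creation, insertion, updating, deleting, and an aggregate query from Python using the standard-library `sqlite3` module. The `sales` table, its columns, the sample rows, and the `run_demo` function are all hypothetical, invented purely for illustration; a production database would use its own schema and likely a different database engine.

```python
import sqlite3


def run_demo():
    """Run a few basic SQL statements against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")  # nothing is written to disk
    cur = conn.cursor()

    # Schema creation (hypothetical table for illustration only)
    cur.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, "
        "region TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )

    # Data insertion, using placeholders rather than string formatting
    cur.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("east", 120.0), ("west", 80.0), ("east", 30.0), ("north", 55.0)],
    )

    # Updating and deleting
    cur.execute("UPDATE sales SET amount = amount + 10 WHERE region = ?", ("west",))
    cur.execute("DELETE FROM sales WHERE region = ?", ("north",))
    conn.commit()

    # Query: total amount per region, sorted by region name
    cur.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for region, total in run_demo():
        print(region, total)
```

Because every step lives in a script, the same analysis can be rerun exactly, which is the reproducibility advantage mentioned in the SQL entry.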
Software runs all the necessary statistical tests these days, but a data scientist still has to have the statistical sensibility to know which test to run, when to run it, and how to interpret the results. A solid understanding of multivariable calculus and linear algebra, which form the basis of many data analysis techniques, allows a data scientist to build in-house implementations of analysis routines as needed.
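As one illustration of what an in-house implementation of an analysis routine might look like, the sketch below fits a straight line to data by ordinary least squares using only plain Python. The function name `fit_line` and the sample data are assumptions made for this example; in practice most teams would reach for an established statistics library, and this sketch skips diagnostics such as residual checks.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Uses the closed-form solution:
        slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
        intercept = mean_y - slope * mean_x
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("need at least two points to fit a line")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations in x; zero means all x values are identical
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    # Sum of cross-deviations between x and y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data that lies exactly on the line y = 2x + 1
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(slope, intercept)
```

Writing the routine by hand makes its assumptions visible: the fit is only meaningful when a linear relationship is plausible, which is the kind of judgment the software cannot supply.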
Pictures often communicate more effectively than either numbers or words, so it behooves a data scientist to be able to present data in a visually compelling way. This requires you to not only master data visualization tools but also familiarize yourself with the principles of visualizing data effectively.
Data scientists must be able to report technical findings such that they are comprehensible to non-technical colleagues, whether corner-office execs or associates in the marketing department. Make your data-driven story not just comprehensible but compelling, and you just might compel your boss to give you a raise.
The skills required of a data scientist can be sliced and diced in different ways. Mitchell Sanders’s Data Science Central blog post concludes with an assortment of breakdowns, and perusing these may help you wrap your head around what it takes to make it as a data scientist. It is also important to remember, as Dave Holtz points out on the Udacity blog, that the “data scientist” job title encompasses a variety of positions, which may demand vastly different skills from applicants. Holtz’s post identifies four types of data scientist jobs and breaks down which skills are most vital for each.