Assembling a strong set of data skills is useful, but there may be other skills to consider:
Hard (i.e., technical)
Human (i.e., interpersonal)
Companies typically look for both.
Mathematics (Other Than Statistics)
Basic undergraduate courses typically cover calculus and linear algebra. Once you have those “basics” under your belt, try digging into matrix computation, diffusion geometry, and similar topics in applied mathematics.
“Understanding correlation, multivariate regression and all aspects of massaging data together to look at it from different angles for use in predictive and prescriptive modeling is the backbone knowledge that’s really step one of revealing intelligence.”
Which means interviewers are going to be looking for core competencies in statistical tools such as:
“If you were to leave today and ask: ‘What specific skills should I learn?’ Python.”
A solid understanding of SQL-based systems is useful. Try learning the fundamentals of database design and management – get a handle on primary and foreign keys, indexing, querying, normalization, constraints and other basic features.
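To make those terms concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customers, orders, amount and so on) are made up for illustration; a real schema would follow your organization's data.

```python
import sqlite3

# In-memory database; every table and column name here is hypothetical.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,           -- primary key
    email       TEXT NOT NULL UNIQUE           -- constraints
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customers(customer_id),  -- foreign key
    amount      REAL NOT NULL CHECK (amount >= 0)
);
-- Index to speed up lookups of orders by customer.
CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

conn.execute("INSERT INTO customers VALUES (1, 'ada@example.com')")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 20.0), (1, 35.5)],
)

# Query: join the two tables and aggregate orders per customer.
total_by_customer = conn.execute("""
    SELECT c.email, COUNT(*), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.email
""").fetchall()

# An order that points at a customer who does not exist breaks the
# foreign key, so the database refuses it.
try:
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Keeping customers and orders in separate tables, linked by a key, is a small instance of normalization: each customer's email is stored once rather than repeated on every order.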
NewSQL helps create scalable, horizontally distributed systems like Cloudera Impala or VoltDB. The approach aims to combine the power of NoSQL for big, messy data with the rigorous and reliable structure of traditional relational databases.
“Knowing the difference between a fact table that is put together well and one that is faulty with semi-structured unconstrained keys makes all the difference in how easily you can trust and massage the data you’re trying to capture.”
For those getting into the game, Roe recommends starting with data modeling tools, techniques and methodologies like:
Agile Data Modeling
UML class diagrams
Predictive modeling means using historical data to estimate outcomes you have not yet observed. Harris goes so far as to class predictive modeling as one of a data scientist’s four core competencies (along with SQL, statistics and programming).
“If you don’t have at least a grounding in these skills, you’re probably not getting through the door, in part because they form a common language that lets people from different backgrounds talk to each other.”
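For a feel of what predictive modeling involves at its simplest, the sketch below fits a straight line to a handful of training points and checks its predictions against held-out data. The numbers are invented, the helper names (fit_line, predict) are hypothetical, and ordinary least squares is just one basic technique chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Invented example data: the model only ever sees the training set.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
test_x = [6.0, 7.0]
test_y = [12.0, 13.8]

model = fit_line(train_x, train_y)

# Judge the model on data it was not fitted to.
errors = [abs(predict(model, x) - y) for x, y in zip(test_x, test_y)]
mean_abs_error = sum(errors) / len(errors)
```

Scoring on held-out data rather than on the training set is the habit that matters here: a model that only reproduces what it has already seen is not predicting anything.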
Want to make back some of your education costs?
Test your skills against the best on Kaggle, a crowdsourced platform for data predictions. Companies and organizations regularly award prizes for the best solutions to their predictive-modeling needs.
Machine learning is variously defined as the:
Ability of a machine to improve its own performance through artificial intelligence
Use of computers to develop and improve algorithms
Science of getting computers to act without being explicitly programmed
Machine learning, formerly the province of science fiction, is now making a regular appearance in lists of data science job requirements.
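The third definition above can be illustrated with a toy example. The sketch below trains a perceptron, one of the oldest learning algorithms, to reproduce the logical OR function from four labeled examples. Nothing in the code states the OR rule; the weights are adjusted from mistakes. The function names and the learning-rate and epoch settings are arbitrary choices for illustration.

```python
def train_perceptron(samples, epochs=20, learning_rate=0.1):
    """Learn two weights and a bias from ((x1, x2), target) pairs."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            activation = weights[0] * x1 + weights[1] * x2 + bias
            output = 1 if activation > 0 else 0
            # The error is zero when the guess is right, so the
            # weights only move after a wrong prediction.
            error = target - output
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias


def classify(weights, bias, x1, x2):
    """Predict 0 or 1 for a new input using the learned parameters."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0


# Labeled examples of logical OR; the rule itself is never coded.
or_samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_perceptron(or_samples)
predictions = [classify(weights, bias, x1, x2) for (x1, x2), _ in or_samples]
```

Real machine learning work involves far larger data sets and more capable models, but the loop is the same in spirit: predict, measure the error, adjust.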
You can search, scrub and mine data to your heart’s content, but in the end, it all comes down to showcasing your findings in a way that business users will understand.
This can be achieved with visualization tools such as:
Google Visualization API
It’s the all-important end step. Always keep in mind: the clearer your findings, the easier the decision, the quicker the outcome, and the higher the praise for all your hours of hard work.
Domain expertise usually means having a deep and abiding interest in your field (e.g., medicine, government, retail, manufacturing, etc.) and a thorough understanding of your organization’s data.
How can you cultivate those two desirables?
Become familiar with the systems.
Explore the products.
Learn how the data is collected and how it’s being used.
Get to know the people who are involved in each step of that collection and use.
“I’ve never heard anyone discuss a data science profile without talking about understanding the business. Again, it’s critical to have the person running the analysis fully understand – and be interested in – why this question is being asked, what the business person would do given the results, and why they would make that decision.”
Creativity and Curiosity
Data scientists look at incomplete and messy data, inadequate analytics, faulty methods and models, and seemingly insoluble business problems, and they say:
“I got this!”
Creative data scientists are curious. They aren’t afraid of playing around in unstructured environments, of proceeding by trial and error, of following the white rabbit down the hole.
Creative data scientists experiment. They blend CRM transaction records with traffic reports; they entangle multiple systems and data sets; they hack across a dizzying array of incompatible data sources.
“Individuals may be judged not because of what they’ve done, or what they will do in the future, but because inferences or correlations drawn by algorithms suggest they may behave in ways that make them poor credit or insurance risks, unsuitable candidates for employment or admission to schools or other institutions, or unlikely to carry out certain functions.”
These risks may be difficult for your company to ignore.
The Elevator Speech
In the end, it may come down to a few simple must-haves.
Harris explains what Chris Pouliot, Director of Algorithms and Analytics at Netflix, is really looking for in candidates:
“An advanced degree in a quantitative field; hands-on experience hacking data (ideally using Hive, Pig, SQL or Python); good exploratory analysis skills; the ability to work with engineering teams; and the ability to generate and create algorithms and models rather than relying on out-of-the-box ones.”