Using SQL for Data Science

Why SQL?

How is SQL (pronounced see–kwell) relevant to you as an aspiring data scientist? And what is SQL anyway?

SQL, or Structured Query Language, is a special-purpose programming language for managing data held in relational database management systems. Almost all structured data is stored in such databases, so, if you want to play with data, chances are you’ll want to know some SQL.

Some of what you can do with SQL—data insertion, queries, updating and deleting, schema creation and modification, and data access control—you can also accomplish with R, Python, or even Excel, but writing your own SQL code is more efficient and can yield more easily reproducible scripts. Learning SQL can also give you a leg up on folks transitioning, often with limited database experience, into the data science industry from academia.

SPONSORED SCHOOLS

Rutgers University

info

Rutgers Data Science Bootcamp

Gain skills needed to analyze data and deliver value to organizations. Complete projects using real data sets from the worlds of finance, healthcare, government, social welfare, and more.

Southern Methodist University

info

SMU Data Science Boot Camp

Develop concrete, in-demand data skills and learn how to help drive business decisions and solve challenges that companies are facing. No programming experience required.

Northwestern University

info

Northwestern Data Science and Visualization Boot Camp

Northwestern Data Science and Visualization Bootcamp teaches practical and technical skills in 24 intensive weeks. Students apply their knowledge to hands-on projects that translate directly into work in the field.

University of Southern California

info

USC Viterbi Data Analytics Boot Camp

Expand your skill set and grow as a data analyst. This program covers the specialized skills to be successful in the field of data in 24 weeks.

info SPONSORED

Six Handy SQL Features for Data Scientists

Here, adapted from a list on the Yhat blog, are “some awesome things you can do with SQL.”

  • Generate queries from a query: Basic string concatenation makes it easy to generate en masse queries that use data in a database to fetch data found in another system.
  • Handle dates: “Fantastic date functions” exist to meet all your formatting and type conversion needs.
  • Text mining: Yhat recommends going as far as you can with SQL’s built-in string functions before turning to a scripting language.
  • Find the median: Since there’s no built-in aggregate function for median, Yhat provides the code.
  • Load data into your database with the \COPY command.
  • Generate sequences: Use the generate_series function to create ranges of dates and times and to handle time series and funnels.

Interested in a different career? Check out our other bootcamp guides below:

Learn SQL Online

Resources for learning SQL abound, ranging from text-based to interactive and providing entry points for learners with varying levels of programming and database experience. Here’s a sampling of the online offerings:

  • Intro to SQL: Querying and managing data from Khan Academy covers SQL basics, more advanced SQL queries, relational queries in SQL, and modifying databases with SQL.
  • SQL School from Mode Analytics is an interactive tutorial geared toward aspiring analysts with no SQL experience.
  • SQLCourse.com allows you to practice what you learn in its tutorial using an online SQL interpreter.
  • SQL Tutorial from Tutorials Point gives beginners a quick start with SQL.
  • Learn SQL the Hard Way is written for people with no prior database, programming, or SQL knowledge, and promises to teach you “80% of SQL you probably need to use it effectively.”

Happy querying!

Last updated: June 2020