If you’re interested in data science, the R programming language is worth learning. R can be used for statistical analysis, data visualization and more. There are a number of ways to start learning R. Keep reading to learn more about R in data science, R vs. Python, real-world applications of R, the best add-on packages for R and more.
What Is R in Data Science?
The R Foundation, a nonprofit focused on supporting the continued development of R through the R Project, describes R as “a language and environment for statistical computing and graphics.” But, if you’re familiar with R for data science, you probably know it’s a lot more than that.
R was created in the 1990s by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand. The language was modeled on the S language, developed at Bell Laboratories by John Chambers and colleagues. Today, R is open source; it’s available as free software that is compatible with many systems and platforms.
Here are some important things to know about R in data science:
- R is open-source software. Because it’s open source, R is free to use and adapt. R’s open interfaces allow it to integrate with other applications and systems. Open-source software can reach a high standard of quality, since many people use it and iterate on it.
- R is a programming language. As a programming language, R provides objects, operators and functions that allow users to explore, model and visualize data.
- R is used for data analysis. In data science, R is used to handle, store and analyze data, and to build statistical models.
- R is an environment for statistical analysis. R has various statistical and graphical capabilities. The R Foundation notes that it can be used for classification, clustering, statistical tests and linear and nonlinear modeling.
- R is a community. R Project contributors include individuals who have suggested improvements, noted bugs and created add-on packages. While there are more than 20 official contributors, the R community extends to those using the open-source software on their own.
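The points above about objects, operators, functions, statistical tests and linear modeling can be illustrated with a short sketch in base R. It assumes only the mtcars dataset that ships with R; the variable names (cars, kpl, fit and so on) are illustrative choices, not part of any official example.

```r
# Objects: store a built-in dataset and one of its columns
cars <- mtcars                  # data frame that ships with R
mpg <- cars$mpg                 # numeric vector of fuel economy values

# Operators and functions: arithmetic on a vector, then a summary statistic
kpl <- mpg * 0.425144           # convert miles per gallon to km per litre
avg_kpl <- mean(kpl)

# Statistical test: compare fuel economy of automatic (am = 0)
# and manual (am = 1) cars with a two-sample t-test
test_result <- t.test(mpg ~ am, data = cars)

# Linear model: fuel economy as a function of weight
fit <- lm(mpg ~ wt, data = cars)

print(avg_kpl)
print(test_result$p.value)
print(coef(fit))
```

Calling summary(fit) on the fitted model prints a fuller report, including standard errors and R-squared.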
R vs. Python
Python and R are both open-source programming languages that have been around for decades. When comparing R vs. Python, some feel that Python is the more general-purpose language. Python is often taught in introductory programming courses and is the primary language for many machine learning workflows, RStudio reports. R is typically used in statistical computing, and RStudio notes that it is often taught in statistics and data science courses. It adds that many machine learning interfaces are written in Python, while many statistical methods are written in R.
In terms of R vs. Python environments, the R environment is ideal for data manipulation and graphing. Some Python applications include web development, numeric computing and software development. Additionally, while R has numerous packages, Python has many libraries devoted to data science.
Whether R or Python is the better choice may come down to what you’re using it for. Being knowledgeable in both languages can be beneficial in data science. In fact, RStudio notes that many data science teams are “bilingual,” using both R and Python.
How Is R Used in Data Science?
R for data science focuses on the language’s statistical and graphical uses. When you learn R for data science, you’ll learn how to use the language to perform statistical analyses and develop data visualizations. R’s built-in functions also make it easy to import, clean and analyze data.
R can be used with an integrated development environment (IDE). According to computer software company GitHub, the purpose of an IDE is to make writing and working with software packages easier. RStudio is an IDE for R that makes graphics more accessible and includes a syntax-highlighting editor that helps with code execution. This may be helpful as you begin to learn R for data science.
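As a rough illustration of the import, clean and analyze workflow mentioned in this section, here is a minimal sketch in base R. The sales figures and column names are made up for the example and inlined as text so the snippet is self-contained; in practice the data would usually come from a file or database.

```r
# Hypothetical raw data, inlined so the example needs no external file
raw_csv <- "region,sales
north,120
south,NA
east,95
west,143
north,110"

# Import: read the CSV text into a data frame
sales <- read.csv(text = raw_csv, stringsAsFactors = FALSE)

# Clean: drop rows where the sales value is missing
sales_clean <- sales[!is.na(sales$sales), ]

# Analyze: mean sales per region
by_region <- aggregate(sales ~ region, data = sales_clean, FUN = mean)

print(by_region)
```

The same steps can be run line by line in the RStudio console, which makes it easy to inspect each intermediate data frame.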
Data Science Projects That Use R
R for data science is used in industries such as banking, telecommunications and media. Below are examples of real-life projects that use R for analysis and data visualization.
- T-Mobile: The international communications company uses R to classify customer service texts so it can properly direct customers to an agent, Revolutions reports. T-Mobile even shared an open-source version of its messaging classification application programming interface on GitHub.
- Twitter: R can be used to perform text analysis of tweets. Text analytics and scraping of Twitter data are possible through the twitteR package.
- Google Analytics: R can be combined with Google Analytics data to complete statistical analysis and create clear data visualizations, according to Google Developers. Installing the RGoogleAnalytics package will enable these insights.
- The Financial Times: The Financial Times embraced R to create a data visualization in its article, “Is Russia-Saudi Arabia the worst World Cup game ever?,” Revolutions reports. The visualization mapped every World Cup match since 1998 and was created using R and the ggplot2 package.
- BBC: Similarly, Revolutions explains how the BBC uses data visualization in R to create graphics for its publications. The BBC developed an R package and R cookbook to standardize its process for creating data visualization graphics. Its cookbook is based on the bbplot package. The BBC offers a six-week training for its data journalists to learn this process.
The Best Add-On Packages for R
There are many packages you may consider installing to help use R. Below are some R packages for data science, based on the list of recommended packages from RStudio.
- DBI supports basic communication between R and database management systems.
- RMySQL, RSQLite and other database drivers assist with loading and reading data from a database.
- stringr includes user-friendly tools that work with character strings and regular expressions.
- dplyr offers functions for summarizing, joining and rearranging datasets.
- lubridate facilitates working with dates and times across various periods.
- ggplot2 is well known for making it easy to produce visually appealing plots and graphics.
- rgl enables three-dimensional, interactive visualizations with R in which you can rotate and zoom in on parts of a visualization.
- randomForest is a machine learning package that implements random forest methods for classification and regression; it can also be used in unsupervised learning.
- caret is helpful for training classification and regression models.
- shiny is an R package for data science that helps you create web apps.
- xtable takes an R object, such as a data frame, and returns the HTML or LaTeX code you need to paste a formatted version of it into a final document.
- ggmap is one of multiple R packages for data science that help with spatial data; it lets you download maps from Google Maps and use them as backgrounds in ggplot2 plots.
- xts includes tools for working with time series datasets.
- XML assists in working with XML documents.
- httr assists in working with HTTP connections.
- devtools helps you create your own R package.
Want to learn about more R packages for data science? Browse the complete list of recommended packages from RStudio.
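To give a feel for two of the packages listed above, the following sketch uses dplyr to summarize a dataset and ggplot2 to build a plot. It assumes both packages are already installed (for example with install.packages(c("dplyr", "ggplot2"))) and uses the built-in mtcars dataset; the object names mpg_by_cyl and p are illustrative.

```r
library(dplyr)
library(ggplot2)

# dplyr: mean fuel economy and number of cars per cylinder count,
# sorted from most to least efficient
mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n()) %>%
  arrange(desc(mean_mpg))

print(mpg_by_cyl)

# ggplot2: scatter plot of weight against fuel economy,
# colored by cylinder count
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", color = "Cylinders")

# In an interactive session, print(p) renders the plot
```

Other packages in the list follow the same pattern: load them with library() and call their functions on your data.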
Interested in a Career Shift? Check Out Online Bootcamps
- Data Science Bootcamp Guide: Use this guide if you aspire to become a data scientist or are looking to learn programming languages like Python or R for data science.
- Data Analytics Bootcamp Guide: Learn more about data analytics bootcamps if you’re interested in helping companies manage and gain insights from data.
- Coding Bootcamp Guide: Look into coding bootcamps if you want to gain web development skills and coding language knowledge.
- FinTech Bootcamp Guide: Discover bootcamps that focus on financial technology, blockchain and cryptocurrencies.
Online Courses in R Programming
Below are some online R courses to consider. These courses focus on fundamental R concepts to help you learn the basics of this programming language.
- Learn R from Codecademy: This course begins by teaching the fundamentals of R. It consists of 10 lessons covering topics such as data frames, data cleaning, aggregates, variance and standard deviation. Codecademy’s course may take about 20 hours to complete. There are no prerequisites.
- R Programming Fundamentals from Pluralsight: This online R course may help teach you about R variables, data structures, functions, packages and more. It also includes demonstrations and opportunities for hands-on practice. This course may take about seven hours to complete.
- Data Analysis with R from Udacity: This course begins by discussing exploratory data analysis (EDA). Lessons build upon EDA knowledge and focus on R basics, quantifying and visualizing variables, and predictive modeling. The self-paced course may take approximately two months to complete.
- Introduction to R and Visualization from Data Society: This online R course teaches you about data science and how companies use it, how to use R, and how to create visualizations with R. It includes two hours and 40 minutes of instruction and around 25 hours of practice.
Last updated: November 2020