Getting Started With Kaggle Competitions

May 13, 2020 Nicole Bennett

Kaggle is best known for its competitions—prizes up to $100,000 draw some of the brightest machine learning minds to the site. But there’s an archive of challenges for participants of all levels. Where should beginners get started and what do they need to know before making their first entry?

Kaggle is a machine learning and data science community site created in 2010 by founder and CEO Anthony Goldbloom. The site offers a variety of data science tools, including open datasets, full courses, notebook capabilities and discussion boards. In 2017, Kaggle reached 1 million registered users and was acquired by Google, according to VentureBeat.

Goldbloom said the goal for Kaggle was to create a robust set of tools for data scientists. “We want you to be able to access great code/analysis that you can fork, data that you can analyze and join to and discussion that you can learn from,” he explained in a 2017 interview.

What Are Kaggle Competitions?

Kaggle competitions are online machine learning challenges for data science enthusiasts to learn new skills, practice old ones and sometimes win prizes. Every competition includes a dataset, evaluation metrics and rules for all participants.

Every competitor is part of a “team,” which can range from one person up to a maximum size set by each competition’s rules.

There are four primary types of Kaggle competitions:

  • Getting Started: recommended for machine learning beginners or first-time Kaggle users.
  • Playground: centered on fun; calls for a slightly more advanced skillset than Getting Started.
  • Featured: tend to use commercially relevant problems and have large prizes.
  • Research: experimental problems that don’t typically include a prize offering.

What are you playing for? Kaggle competitions offer a few different prizes and outcomes at the end, depending on the competition type. The Getting Started competitions and some of the Playground or Research competitions offer knowledge or kudos. Simply put, there is no prize for completing these challenges other than building your skillset.

More tangible outcomes include prizes or swag, which vary by the competition and the sponsor. For example, the DonorsChoose.org Application Screening competition on Kaggle offered prizes that included Google Pixelbook laptops, Google Pixel 2 mobile phones and gift cards to the authors of the most upvoted kernels.

Challenges can also offer money. The Passenger Screening Algorithm Challenge, sponsored by the Department of Homeland Security, offered a top prize of $500,000 and $1.5 million in total across eight cash prizes. Competitions with large prizes are some of the most competitive, drawing thousands of team entries.

Some Kaggle competitions function as recruiting tools, where the prize is a job interview with the sponsoring company. Allstate, Facebook and Walmart have all used Kaggle to recruit for data science positions in the past.

To get started, you need to create a free Kaggle account. You can then pick a competition, agree to the rules and start cleaning the dataset. For first-timers feeling overwhelmed, Kaggle provides a library full of resources and forums to make it easier. You can also read interviews with past competition winners about their strategies and best practices.

5 Pieces of Advice from Kaggle Competition Pros

Pick a competition that excites you, even if you aren’t an expert in the applied industry.

Dmitry Gordeev and Philipp Singer (The Zoo) won the NFL Big Data Bowl (2020), even though they admit they knew almost nothing about American football before entering. “Don’t worry about having domain knowledge to attempt a specific problem,” they advised. “The main thing we learned in this competition is that you don’t necessarily need domain knowledge or industry [knowledge] to successfully tackle the data science challenge.”

Explore a variety of sources when you get started, like the challenge forum, past competitions that are similar, popular kernels and outside research papers.

Shubin Dai (bestfitting), No. 1 on the Kaggle leaderboard in May 2018, keeps all his initial findings in one space. “Within the first week of a competition launch, I create a solution document, which I follow and update as the competition continues on,” he said. “I must first try to get an understanding of the data and the challenge at hand, then research similar Kaggle competitions and all related papers.”

Craft a unique approach using what you’ve learned from your collection of sources.

Nicole Finnie, silver medalist in the 2018 Data Science Bowl, said it can help you stand out in the competition. “Once I had a good feel for the theory, then it just took lots of time and work to implement,” she explained. “When you use a popular kernel, make sure to try to implement ideas and concepts from different research papers; that will be more likely to set you apart from other Kagglers.”

Divide the work into manageable pieces.

A Kaggler who competed in Data Science for Good: City of Los Angeles explained how breaking up the work helps you learn and apply the theory. “Always try to break the data science problem into smaller chunks and try to solve it in an iterative process,” he said. “The most important data science skills are applied and practical data science skills.”

Practice writing robust kernels and exploratory data analysis (EDA) to get a better understanding of the data.

Martin Henze, the first Kaggle Kernels Grandmaster, considers EDA and data visualization to be a pillar of his success. “Plotting a data set from many different angles, and with many different styles and tools, helps me immensely in discovering patterns and correlations,” he said. “More importantly: A Kernel is a perfect lab book in which to document your approach and results — and therefore a great foundation for a successful competition contribution. In my view, learning how to plan, execute, and document your work is one of the most fundamental building blocks for the success of any data-related project.”
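
As a minimal sketch of the first-pass EDA described above, the snippet below summarizes an outcome by category using only the Python standard library. The rows and the column names (`group`, `outcome`) are hypothetical and invented for illustration. A real kernel would load the competition's data and would typically use a plotting library instead of the text bars shown here.

```python
from collections import defaultdict

# Hypothetical records standing in for a competition's training data.
ROWS = [
    {"group": "A", "outcome": 1},
    {"group": "A", "outcome": 0},
    {"group": "A", "outcome": 1},
    {"group": "B", "outcome": 0},
    {"group": "B", "outcome": 0},
    {"group": "B", "outcome": 1},
    {"group": "C", "outcome": 1},
]


def rate_by_group(rows, key, target):
    """Return {category: (count, mean of target)} for one categorical column."""
    totals = defaultdict(lambda: [0, 0])  # category -> [count, sum of target]
    for row in rows:
        totals[row[key]][0] += 1
        totals[row[key]][1] += row[target]
    return {cat: (n, s / n) for cat, (n, s) in totals.items()}


def text_bar(fraction, width=20):
    """Render a fraction in [0, 1] as a crude text bar, a stand-in for a plot."""
    return "#" * round(fraction * width)


summary = rate_by_group(ROWS, "group", "outcome")
for cat in sorted(summary):
    n, rate = summary[cat]
    print(f"{cat}  n={n}  rate={rate:.2f}  {text_bar(rate)}")
```

Printing the group size next to each rate is a deliberate choice: a striking rate backed by a single row, like group C here, is a pattern to verify rather than trust.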

7 Kaggle Competitions to Get You Started

For the first competition: Titanic: Machine Learning from Disaster.

Hailed as “the best, first challenge for you to dive into ML competitions and familiarize yourself with how the Kaggle platform works,” the Titanic competition asks participants to predict which passengers survived the shipwreck.

Resources to get started:

  • Alexis Cook’s Titanic Tutorial
  • Sklearn 27 Best Tips & Tricks
  • Applied Machine Learning and Data Analytics
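
To make the workflow concrete, here is a rough sketch of a one-rule baseline and the submission file it would produce, written with the Python standard library only. The rows are made up, and the column names (`PassengerId`, `Sex`, `Survived`) are assumptions about the competition's files that you should check against its data page. The rule itself (predict survival for women) is a common starting point, not a recommended final model.

```python
import csv
import io

# Hypothetical rows in the shape of the Titanic files (values are made up).
TRAIN = [
    {"PassengerId": 1, "Sex": "male", "Survived": 0},
    {"PassengerId": 2, "Sex": "female", "Survived": 1},
    {"PassengerId": 3, "Sex": "female", "Survived": 1},
    {"PassengerId": 4, "Sex": "male", "Survived": 1},
    {"PassengerId": 5, "Sex": "female", "Survived": 0},
]
TEST = [
    {"PassengerId": 6, "Sex": "female"},
    {"PassengerId": 7, "Sex": "male"},
]


def predict(row):
    """One-rule baseline: predict survival (1) for women, 0 otherwise."""
    return 1 if row["Sex"] == "female" else 0


def accuracy(rows):
    """Fraction of labeled rows where the baseline matches the true label."""
    hits = sum(predict(r) == r["Survived"] for r in rows)
    return hits / len(rows)


def make_submission(rows):
    """Build CSV text with one prediction per test passenger."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for row in rows:
        writer.writerow([row["PassengerId"], predict(row)])
    return buffer.getvalue()


train_accuracy = accuracy(TRAIN)
submission = make_submission(TEST)
print(f"training accuracy: {train_accuracy:.2f}")
print(submission)
```

A baseline like this gives you a score to beat and confirms that your submission file has the expected shape before you invest in a more elaborate model.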

For someone looking to expand on a statistics background: House Prices: Advanced Regression Techniques.

Predict the final price of homes in Ames, Iowa, based on 79 different variables, such as the roof style or land slope. Kaggle recommends this competition for students with some machine learning background who want to practice their skills before entering a featured competition.

Resources to get started:

  • House Prices EDA
  • Fun With Real Estate Data
  • Regularized Linear Models
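
For a feel of what a regression entry involves, the sketch below fits a one-variable least-squares line to the logarithm of the sale price and compares it with a constant baseline. The five homes are hypothetical, and the single feature (living area) stands in for the 79 variables in the real data. Scoring on the log of the price is an assumption about how entries are evaluated, so confirm it on the competition's evaluation page.

```python
import math

# Hypothetical (living area in square feet, sale price in dollars) pairs.
HOMES = [
    (850, 105_000),
    (1_200, 140_000),
    (1_500, 172_000),
    (1_900, 215_000),
    (2_400, 290_000),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def log_rmse(actual, predicted):
    """Root mean squared error between the logs of two price lists."""
    squared = [(math.log(p) - math.log(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


areas = [area for area, _ in HOMES]
prices = [price for _, price in HOMES]

# Fit on log(price) so that errors are relative rather than absolute.
intercept, slope = fit_line(areas, [math.log(p) for p in prices])
fitted = [math.exp(intercept + slope * a) for a in areas]

# Baseline for comparison: always predict the geometric mean price.
geo_mean = math.exp(sum(math.log(p) for p in prices) / len(prices))
baseline = [geo_mean] * len(prices)

print(f"linear model log-RMSE: {log_rmse(prices, fitted):.4f}")
print(f"constant baseline log-RMSE: {log_rmse(prices, baseline):.4f}")
```

Working in log space means that missing a $100,000 home by $10,000 costs about as much as missing a $300,000 home by $30,000, which is usually the behavior you want for prices. Both scores here are computed on the training rows, so they say nothing yet about how the model generalizes.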

For someone interested in natural language processing (NLP): Real or Not? NLP with Disaster Tweets.

Practice NLP in this competition that asks participants to predict whether tweets are describing actual disasters or not. Start out with 10,000 real tweets to build your model.

Resources to get started:

  • NLP with Disaster Tweets – EDA, Cleaning and BERT
  • NLP Getting Started Tutorial
  • A Real Disaster – Leaked Label
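
The sketch below shows the basic shape of a text classification entry: tokenize the tweets, learn something from the labeled examples, predict, and score. The tweets are invented, and the word-voting classifier is a deliberately crude stand-in for the models used in the notebooks above, not a standard algorithm. F1 is used for scoring on the assumption that it matches the competition's metric, which the evaluation page can confirm.

```python
import re
from collections import Counter

# Hypothetical labeled tweets (1 = describes a real disaster, 0 = does not).
TRAIN = [
    ("Forest fire spreading near the highway, evacuations ordered", 1),
    ("Flood waters rising downtown, roads closed", 1),
    ("Earthquake damage reported across the city", 1),
    ("This new album is fire", 0),
    ("My inbox is a flood of spam today", 0),
    ("That concert was an absolute blast", 0),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_votes(examples):
    """Score each word: +1 per disaster tweet containing it, -1 otherwise."""
    votes = Counter()
    for text, label in examples:
        for word in set(tokenize(text)):
            votes[word] += 1 if label == 1 else -1
    return votes


def predict(votes, text):
    """Predict 1 when the summed word votes are positive, else 0."""
    return 1 if sum(votes[w] for w in tokenize(text)) > 0 else 0


def f1_score(actual, predicted):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
    fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
    fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


votes = train_votes(TRAIN)
held_out = [
    ("Evacuations ordered as wildfire spreading", 1),
    ("Spam calls all day today", 0),
]
predictions = [predict(votes, text) for text, _ in held_out]
print(predictions, f1_score([label for _, label in held_out], predictions))
```

Words such as "fire" and "flood" appear in both classes in the toy data and end up with a vote of zero. That ambiguity between literal and figurative language is exactly what makes this problem a useful NLP exercise.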

For someone interested in image recognition: Facial Keypoints Detection.

Detect and locate facial keypoints like the eyes, nose and mouth on a variety of images to be used for analyzing facial expressions and biometrics.

Resources to get started:

  • Getting Started With R
  • Using convolutional neural nets to detect facial keypoints tutorial
  • Facial Keypoint Detection notebook

For health data: Pima Indians Diabetes Database.

Predict whether women of Pima Indian heritage have diabetes based on variables such as blood pressure and skin thickness.

Resources to get started:

  • Step by Step Diabetes Classification-KNN-detailed
  • A Complete ML Pipeline Tutorial (ACU ~ 86%)

For economic data: Santander Customer Transaction Prediction.

Predict which bank customers will make specific transactions in the future based on anonymized variables.

Resources to get started:

  • Santander EDA and Prediction
  • Santander ML Explainability
  • 200 Magical Models – Santander – [0.920]

For urban planning data: Bike Sharing Demand.

Predict hourly bike rental demand in Washington, D.C., by combining historical usage patterns with weather data. The dataset includes hourly data for two years’ worth of bike rentals.

Resources to get started:

  • Bike Sharing Demand [ RMSLE:: 0.3194]
  • Comprehensive EDA with XGBoost (Top 10 percentile)
  • EDA of the Bike Sharing
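
Because the data is hourly, one simple first step is to forecast each hour from the average of past rentals at that hour. The sketch below does that and scores the result with RMSLE, the metric named in one of the notebook titles above. The records and the timestamp format are hypothetical, and the hour-of-day average is only an illustrative baseline.

```python
import math
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly records: (timestamp, number of rentals).
HISTORY = [
    ("2011-01-01 08:00:00", 120),
    ("2011-01-01 14:00:00", 60),
    ("2011-01-01 23:00:00", 10),
    ("2011-01-02 08:00:00", 140),
    ("2011-01-02 14:00:00", 80),
    ("2011-01-02 23:00:00", 0),
]
HOLDOUT = [
    ("2011-01-03 08:00:00", 150),
    ("2011-01-03 14:00:00", 65),
    ("2011-01-03 23:00:00", 4),
]


def hour_of(timestamp):
    """Extract the hour of day (0-23) from a timestamp string."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").hour


def hourly_means(records):
    """Average rental count for each hour of day seen in the records."""
    by_hour = defaultdict(list)
    for timestamp, count in records:
        by_hour[hour_of(timestamp)].append(count)
    return {hour: sum(counts) / len(counts) for hour, counts in by_hour.items()}


def rmsle(actual, predicted):
    """Root mean squared logarithmic error; log(1 + x) keeps zero counts valid."""
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared) / len(squared))


means = hourly_means(HISTORY)
overall_mean = sum(count for _, count in HISTORY) / len(HISTORY)

actual = [count for _, count in HOLDOUT]
# Every holdout hour also appears in HISTORY, so a direct lookup is safe here.
hourly_forecast = [means[hour_of(ts)] for ts, _ in HOLDOUT]
flat_forecast = [overall_mean] * len(HOLDOUT)

hourly_score = rmsle(actual, hourly_forecast)
flat_score = rmsle(actual, flat_forecast)
print(f"hour-of-day forecast RMSLE: {hourly_score:.4f}")
print(f"flat forecast RMSLE: {flat_score:.4f}")
```

RMSLE penalizes relative error, so predicting 68 rentals for a quiet hour with 4 hurts far more than missing a busy hour by the same absolute amount. That is why even a crude hour-of-day feature improves the score so much over a flat guess.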

Filed Under: Resources

MastersInDataScience.org is owned and operated by 2U, Inc.
© 2U, Inc. 2022
