Kaggle is best known for its competitions, where prizes of up to $100,000 draw some of the brightest machine learning minds to the site. But it also hosts an archive of challenges for participants of all levels. Where should beginners start, and what do they need to know before making their first entry?
Kaggle competitions are online machine learning challenges for data science enthusiasts to learn new skills, practice old ones and sometimes win prizes. Every competition includes a dataset, evaluation metrics and rules for all participants.
Every competitor is part of a “team,” which can range from a single person up to a maximum size set by each competition’s rules.
There are four primary types of Kaggle competitions:
Getting Started: recommended for machine learning beginners or first-time Kaggle users.
Playground: centered on fun; calls for a slightly more advanced skillset than Getting Started.
Featured: tend to use commercially relevant problems and have large prizes.
Research: experimental problems that don’t typically include a prize offering.
What are you playing for?
Kaggle competitions offer a few different prizes and outcomes, depending on the competition type. The Getting Started competitions and some Playground and Research competitions offer only knowledge or kudos. Simply put, there is no prize for completing these challenges other than building your skillset.
More tangible outcomes include prizes or swag, which vary by the competition and the sponsor. For example, the DonorsChoose.org Application Screening competition on Kaggle offered prizes that included Google Pixelbook laptops, Google Pixel 2 mobile phones and gift cards to the authors of the most upvoted kernels.
Some Kaggle competitions double as recruiting tools, and the prize is a job interview with the sponsoring company. Allstate, Facebook and Walmart have all used Kaggle to recruit for data science positions in the past.
To get started, create a free Kaggle account. Then pick a competition, accept its rules and start exploring and cleaning the dataset. If you’re a first-timer feeling overwhelmed, Kaggle provides a library of learning resources and discussion forums to help. You can also read interviews in which past competition winners share their strategies and best practices.
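Cleaning usually starts with missing values. Below is a minimal sketch, assuming the competition data is a CSV file with a header row (Kaggle’s Titanic train.csv, for example, has gaps in its Age column). The helper `fill_missing_with_median` is hypothetical, not part of any Kaggle tool, and median imputation is only one of several reasonable strategies.

```python
import csv
import io
import statistics


def fill_missing_with_median(rows, column):
    """Replace empty values in `column` with the median of the non-empty ones.

    `rows` is a list of dicts as produced by csv.DictReader (all values are
    strings). Returns a new list and leaves the input rows untouched.
    """
    present = [float(r[column]) for r in rows if r[column].strip() != ""]
    median = statistics.median(present)
    cleaned = []
    for r in rows:
        r = dict(r)  # copy so the caller's rows are not mutated
        if r[column].strip() == "":
            r[column] = str(median)
        cleaned.append(r)
    return cleaned


# Hypothetical three-row sample in the shape of a Kaggle CSV file.
sample = io.StringIO("PassengerId,Age\n1,22\n2,\n3,38\n")
rows = list(csv.DictReader(sample))
print([r["Age"] for r in fill_missing_with_median(rows, "Age")])  # ['22', '30.0', '38']
```

Returning a copy instead of editing rows in place makes it easy to compare several imputation strategies side by side on the same raw data.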
5 Pieces of Advice from Kaggle Competition Pros
Pick a competition that excites you, even if you aren’t an expert in the applied industry.
Dmitry Gordeev and Philipp Singer (The Zoo) won the NFL Big Data Bowl (2020), even though they admit they knew almost nothing about American football before entering. “Don’t worry about having domain knowledge to attempt a specific problem,” they advised. “The main thing we learned in this competition is that you don’t necessarily need domain knowledge or industry [knowledge] to successfully tackle the data science challenge.”
Explore a variety of sources when you get started, like the challenge forum, past competitions that are similar, popular kernels and outside research papers.
Shubin Dai (bestfitting), No. 1 on the Kaggle leaderboard in May 2018, keeps all his initial findings in one space. “Within the first week of a competition launch, I create a solution document, which I follow and update as the competition continues on,” he said. “I must first try to get an understanding of the data and the challenge at hand, then research similar Kaggle competitions and all related papers.”
Craft a unique approach using what you’ve learned from your collection of sources.
Nicole Finnie, silver medalist in the 2018 Data Science Bowl, said a unique approach can help you stand out in the competition. “Once I had a good feel for the theory, then it just took lots of time and work to implement,” she explained. “When you use a popular kernel, make sure to try to implement ideas and concepts from different research papers; that will be more likely to set you apart from other Kagglers.”
Divide the work into manageable pieces.
Data Science for Good: City of Los Angeles, explained how it helps you learn and apply the theory. “Always try to break the data science problem into smaller chunks and try to solve it in an iterative process,” he said. “The most important data science skills are applied and practical data science skills.”
Practice writing robust kernels and exploratory data analysis (EDA) to get a better understanding of the data.
Martin Henze, the first Kaggle Kernels Grandmaster, considers EDA and data visualization to be a pillar of his success. “Plotting a data set from many different angles, and with many different styles and tools, helps me immensely in discovering patterns and correlations,” he said. “More importantly: A Kernel is a perfect lab book in which to document your approach and results — and therefore a great foundation for a successful competition contribution. In my view, learning how to plan, execute, and document your work is one of the most fundamental building blocks for the success of any data-related project.”
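Looking at data “from many different angles” can begin with something as simple as counting missing values per column and comparing a target rate across groups. The sketch below uses only the Python standard library; the column names (`Sex`, `Age`, `Survived`) and the toy rows are hypothetical, chosen to mirror the Titanic data. In practice you would load the real competition CSV and likely reach for pandas and a plotting library.

```python
from collections import defaultdict


def missing_counts(rows):
    """Count empty-string values per column in a list of csv.DictReader rows."""
    counts = {col: 0 for col in rows[0]}
    for r in rows:
        for col, value in r.items():
            if value.strip() == "":
                counts[col] += 1
    return counts


def rate_by_group(rows, group_col, target_col):
    """Mean of a 0/1 target column for each value of a categorical column."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in rows:
        key = r[group_col]
        totals[key] += 1
        positives[key] += int(r[target_col])
    return {k: positives[k] / totals[k] for k in totals}


# Hypothetical toy rows, not real competition data.
rows = [
    {"Sex": "female", "Age": "38", "Survived": "1"},
    {"Sex": "male", "Age": "", "Survived": "0"},
    {"Sex": "male", "Age": "54", "Survived": "1"},
    {"Sex": "female", "Age": "", "Survived": "1"},
]
print(missing_counts(rows))                    # {'Sex': 0, 'Age': 2, 'Survived': 0}
print(rate_by_group(rows, "Sex", "Survived"))  # {'female': 1.0, 'male': 0.5}
```

Summaries like these are exactly the kind of findings worth recording in a kernel as you go, so the reasoning behind later modeling choices stays documented.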
Kaggle Competitions for Beginners
Hailed as “the best, first challenge for you to dive into ML competitions and familiarize yourself with how the Kaggle platform works,” the Titanic competition asks participants to predict which passengers survived the ship’s sinking.
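A common first submission is a rule-of-thumb baseline. Kaggle’s own example file for the Titanic competition (gender_submission.csv) predicts that every female passenger survived and every male passenger did not. Here is a minimal sketch that produces a file in that shape. It assumes test.csv has `PassengerId` and `Sex` columns and that the submission expects a `PassengerId,Survived` header. The function names are hypothetical.

```python
import csv
import io


def gender_baseline(test_rows):
    """Predict Survived = 1 for female passengers and 0 for everyone else."""
    return [
        {"PassengerId": r["PassengerId"], "Survived": 1 if r["Sex"] == "female" else 0}
        for r in test_rows
    ]


def write_submission(predictions, out):
    """Write predictions as CSV with a PassengerId,Survived header."""
    writer = csv.DictWriter(out, fieldnames=["PassengerId", "Survived"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(predictions)


# Hypothetical rows in the shape of test.csv; in practice, open the real file.
test_rows = list(csv.DictReader(io.StringIO("PassengerId,Sex\n1,male\n2,female\n")))
buf = io.StringIO()
write_submission(gender_baseline(test_rows), buf)
print(buf.getvalue())
```

Submitting a baseline like this first confirms that your file format is accepted. It also gives you a leaderboard score that any real model should beat.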
In House Prices: Advanced Regression Techniques, participants predict the final sale price of homes in Ames, Iowa, based on 79 variables, such as roof style or land slope. Kaggle recommends this competition for students with some machine learning background who want to practice their skills before entering a Featured competition.
In Bike Sharing Demand, participants combine historical usage patterns with weather data to forecast bike rental demand for the Capital Bikeshare program in Washington, D.C. The dataset includes hourly rental data spanning two years.
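Both of these competitions score submissions with log-based error metrics. The Ames house-price competition uses the root-mean-squared error between the logarithms of predicted and observed sale prices. The bike-demand competition uses the root mean squared logarithmic error (RMSLE), which adds 1 before taking logs so that zero counts are allowed. The sketch below is an illustration, not Kaggle’s scoring code. The function names are hypothetical, and both assume equal-length, non-empty lists.

```python
import math


def rmse_log(predicted, actual):
    """RMSE between log(predicted) and log(actual); all values must be positive."""
    squared = [(math.log(p) - math.log(a)) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(actual))


def rmsle(predicted, actual):
    """Root mean squared logarithmic error; log1p(x) = log(1 + x) tolerates zeros."""
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(actual))


# Being 10% too high costs about the same on a cheap house as on an expensive one.
print(rmse_log([110_000], [100_000]))  # roughly 0.0953, i.e. log(1.1)
print(rmse_log([550_000], [500_000]))  # roughly 0.0953 as well
```

Because the metrics work on logs, relative errors count rather than absolute ones. Many competitors therefore train on log-transformed targets so the model optimizes something close to the leaderboard metric.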