Dr. Jennifer Lewis Priestley is a Professor of Statistics and Data Science at Kennesaw State University, whose Master of Science in Applied Statistics program was recently honored with a 2014 Data+ Editors’ Choice Award from ComputerWorld.com. We spoke with Dr. Priestley about the country’s current (and growing) analytical talent gap and the rise of data science programs working to fill the void. In the interview, Dr. Priestley discussed the cause of the talent gap, what students should look for when exploring data science degree programs, and what schools can do to best prepare data science students to solve real-world problems.
To learn more about Kennesaw State’s statistics and analytics programs, visit the Department of Statistics and Analytical Sciences.
McKinsey estimates that there will be a talent gap of nearly 200,000 people for analytical talent by 2018. What caused the current analytics talent gap, and what are universities doing to meet the growing demand?
Why are we seeing this gap in analytical talent? I think what’s causing this gap is that, from an employment perspective, we’re really seeing the equivalent of a run on the bank. We’ve never seen anything like this before. We will have companies across multiple domains and disciplines – like finance, healthcare, retail – come in making offers to the same students. It sounds bizarre because you look at the cross section of companies and you think, “They’re all so different,” but the reality is their needs are exactly the same.
That’s why we’re seeing this explosion in demand for analytical talent: whether you’re in epidemiology, retail, or web engine optimization, 80% – 90% of those analytical skills are exactly the same. I think companies are recognizing that. We have kind of this ubiquity of data and it’s cheap and easy to capture, but it’s difficult to make sense of. That phenomenon is happening at the same time across every sector of the economy. Everybody is chasing the same talent, and that kind of gets back to the point that we have kind of the employment equivalent of the run on the bank.
What are universities doing to meet the demand? I want to say that I clearly can’t represent every university in the country. I think some universities are doing a better job than others. Let me just tell you what we’re doing at Kennesaw State and then maybe I can talk a little bit more broadly in terms of what I think the general education community should be doing, if not actually doing.
At Kennesaw State, if you go back to 2006, it’s hard to believe, but we actually didn’t have any analytics courses at Kennesaw State. Our administration challenged us to say, “Okay, we see an opportunity for Kennesaw State to get involved.” At that point, we didn’t talk about data science and applied analytics and big data, but they did kind of recognize that there was an opportunity to really get more deeply involved in applied statistics. They said, “You guys have all done applied statistics professionally. We want you to really get into that. Here’s a blank piece of paper. Go build the program.” That was kind of the challenge.
We took a long, hard look at what other universities were engaged in. It became very clear to us very quickly that there are a lot of universities that have very good programs in theoretical statistics, but what’s really important is training people to sit in the center of data, to be able to extract, transport, load and then clean and analyze using this big portfolio of statistical techniques, and communicate the results of that analysis to somebody who doesn’t have the same level of analytical training, to be able to communicate that information to a decision maker, and improve the decision making process.
Nobody was doing this. Nobody was training people to walk into an organization and to be able to add value day one. We said, “Yep, that’s the gap. That is the sweet spot. That’s where we need to play.” As early as 2006, we recognized that to build that curriculum where students really would sit in the middle of the data, from 360 degrees, we were going to have to integrate Mathematics and Computer Science and Applied Statistics, but also we were going to have to do that within different applications there. Again, we weren’t trying to develop students to be theoretical statisticians, we wanted to train students to be able to live and breathe and sleep data, but in some kind of relevant application space.
More importantly, we also recognized that we needed to bring those data sets into the classroom in order to teach these concepts. We look to bringing in health care, finance, engineering data so that as we’re helping the students build these skills, they had to do it with a data set that was in some kind of applied space. As students move through the program, they get exposure to health care, finance, engineering, and other data. We make the point that you need to build some kind of specialization, but ultimately an important concept is that whether you’re dealing with health care or finance or marketing data, 90% of the skills that you need to be effective are the exact same.
Going back to your question of how Kennesaw has identified this gap, that gap is only getting bigger. I think that universities need to look at, “What are the demands of the marketplace?” “What do the job ads look like?” We brought in people from Equifax, Home Depot, Delta Airlines, Coca-Cola, AT&T, we said, “Bring your job ads with you. Tell us who you’re recruiting. Let’s make sure our curriculum and the skills that we’re teaching in the classroom ultimately are aligned with the people that you guys are recruiting.”
Again, I can’t represent every university in the country, but just consistently, logically, strategically, that process of partnering with the primary employers in your region to make sure that the skills and the curriculum that you’re teaching are aligned with their employment needs, that just seems to make so much sense to me.
Analytics talent can emerge from students of numerous academic disciplines. How are universities identifying promising analytics students, and what can they do to nurture these talents?
The first thing I would say is that in the 21st century, we as academics, we as an education community, I think we do our customers, the students, an immense disservice if we are not requiring them to learn, A, basic programming, B, some comfort and familiarity with data, and C, just some core mathematical skills. I don’t care if you’re studying European Literature or Art History; I would argue that if you’re going to be a productive member of society in the 21st century, you need to have a working knowledge of how to talk to a computer.
When they make me the benevolent dictator of education in this country, I would require every student to take Programming. That could be programming as simple as C++ or Java. It forces you to think in a very disciplined, methodical, logical way that is actually very close along the lines to the way computers think. It just creates an increased level of comfort with technology if nothing else. That’s kind of the first thing.
The second thing, going back to this point of being comfortable with data: data is ubiquitous. Again, I don’t care what discipline you’re ultimately engaged in, you’re going to have to know data and have some level of comfort with data, even if it’s just a recognition that you are a generator of data. We are agents of data every day, whether we know it or not. We create these massive digital footprints.
You kind of have the latent and the manifest. There are obviously manifest skills, very tangible skills of learning how to work with data that ultimately contribute to you being more valuable in the marketplace, and then there are latent skills of being cognizant of what your settings are in your cellphone and what kind of data you’re generating as a result of those settings. Just taking classes makes you very aware of the fact that you’re a data agent, which I think is just kind of a basic part of being a citizen in the 21st century.
All this is underpinned by a foundation in core mathematics. I have been appalled and stunned by the number of Master’s of Science in Analytics programs or Master’s of Arts in Analytics programs that have been popping up around the country that really are just not worth the paper that they’re listed on. You look at it and they don’t require the GRE, any calculus, anything above college algebra, any programming. I’m sorry, how are you going to teach somebody how to be a master’s level professional in analytics, advanced analytics, if you’re not even requiring basic math?
Sadly, I think there are a lot of universities out there that are kind of capitalizing on the buzzword of data science. Yesterday, it was an MBA program and today, it’s an “Analytics” program. Yesterday, it was an Operations Research program and today, it’s a “Data Science” program.
Speaking of buzzwords, Data Science has been praised as the “sexiest job of the 21st century,” but it’s not for everyone. What types of students tend to thrive in analytics and related programs on their way to a career in data science?
That’s a really cool question. I think the answer is different than what you might expect. At Kennesaw State, we have this amazing Master’s of Science in Applied Statistics program that has been explosive. The trajectory of the national recognition and growth has just been phenomenal. We were just recognized as the most innovative big data program in the country by ComputerWorld.com, and just had a student recognized as having the winning poster for the SAS Analytics Conference. Every day, we get these great points of recognition nationally, and if you look at the profile of our students, you might say they’re predominately mathematics or computer science students. The truth is actually completely different. The majority of students that come into our program have some combination of either of those, or a business background, so they’re coming out of Economics, Finance, or they already have an MBA, but they weren’t able to get the types of jobs that they were hoping for.
Number two, they’re coming out of the social sciences, so they kind of come out of psychology, sociology, political science, that sort of thing. We also tend to see engineers and people who have some previous work experience in the health care sector. They realized that the health care sector is now just characterized by big data. You can’t be a health care professional and not have a working knowledge of data, and that’s only getting bigger. That tends to be kind of the cross section that’s kind of an interesting mosaic of students that come to our program.
Working with big data involves proficiency in numerous disciplines. How can universities ensure that their programs prepare students for jobs with complex real-world challenges after graduation?
That’s a great question. This is something I get very passionate about, so buckle your seat belt. In most university programs, the data sets are common. Any of the data that students work with tends to have like one hundred observations and three variables, and it’s in an Excel spreadsheet and everything has perfect correlation, and everything is clean, and there’s nothing missing. The students get really good at working with data sets that could basically sit on the end of your pinkie.
I think that we are committing academic fraud if that’s the only data that students see. It would almost be better if they didn’t see anything than if they saw that because that’s not real. We’re creating such a misaligned expectation in terms of what it means to work with data. What needs to happen is that the data that’s brought into the classroom needs to be representative of the data that they’re going to experience after they graduate. That means it needs to be messy, complex, hairy, and difficult. And, it needs to come in a lot of different forms where they have to figure out how to do an ETL process.
An example: they’ve got one data set that’s sitting in an Oracle database, and they’ve got one that’s a transport file, and they’ve got one that’s sitting in some other type of database, and they’ve got to be able to extract all three of those, and put them together, and merge them. “Oh my gosh, I’ve got missing match keys. What do I do if I don’t have a match key on this particular file and then I bring it in and 2% of my data is completely missing? I’ve got to go through some kind of imputation process and depending on what imputation strategy I use, that’s going to have downstream implications, ultimately to my final answer.”
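The merge-and-impute scenario described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow used in the Kennesaw State course: the three small dictionaries are hypothetical stand-ins for the three sources, and the field names (`limit`, `balance`, `score`), the values, and the helper functions `merge_sources` and `impute` are all invented for the example.

```python
from statistics import mean, median

# Hypothetical stand-ins for three sources that share a match key (the dict key).
accounts = {1: {"limit": 500}, 2: {"limit": 1500}, 3: {"limit": 700}, 4: {"limit": 9000}}
balances = {1: {"balance": 100.0}, 2: {"balance": 300.0}, 4: {"balance": 2000.0}}  # key 3 has no match
scores = {1: {"score": 610}, 2: {"score": 640}, 3: {"score": 655}, 4: {"score": 700}}


def merge_sources(*sources):
    """Outer-join dict-of-dict sources on their keys; absent fields become None."""
    fields = sorted({f for src in sources for row in src.values() for f in row})
    merged = {}
    for key in sorted(set().union(*sources)):
        row = dict.fromkeys(fields)  # every field starts out missing
        for src in sources:
            row.update(src.get(key, {}))
        merged[key] = row
    return merged


def impute(rows, field, strategy):
    """Return a copy of rows with missing values of `field` filled by `strategy`."""
    observed = [r[field] for r in rows.values() if r[field] is not None]
    fill = strategy(observed)
    return {k: {**r, field: fill if r[field] is None else r[field]} for k, r in rows.items()}


merged = merge_sources(accounts, balances, scores)
unmatched = [k for k, r in merged.items() if r["balance"] is None]

# The same downstream statistic, computed under two imputation strategies.
avg_mean = mean(r["balance"] for r in impute(merged, "balance", mean).values())
avg_median = mean(r["balance"] for r in impute(merged, "balance", median).values())

print("keys with no balance record:", unmatched)
print("average balance, mean imputation:", avg_mean)
print("average balance, median imputation:", avg_median)
```

With these made-up numbers, filling the one missing balance with the mean gives an average balance of 800.0, while filling it with the median gives 675.0. The only thing that changed is the imputation strategy, which is the downstream effect the passage above describes.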
Anyway, these complexities and these frustrations, and these difficulties of working with messy, ugly, difficult data, I think is critical. It’s foundational to the learning process, to the education process of becoming an analytics professional. Again, as academics, there’s a ton of data out there. There’s a ton of publicly available data that you can go out and get, but there’s nothing more valuable than having an executive from a company come in with their data.
For example, one of the reasons ComputerWorld.com recognized us a little earlier in the month was a class that we built with real-world data that came from a subprime lender, and the Chief Risk Officer from that company gave us the data sets. He brought that data set and it had 17 million observations, so 17 million card members spread across 3 different files, with about 400 variables. So you have a data set that’s about 17 million by 400.
He came into the classroom and he talked about the data and he said, “This is what my guys do in my shop. This is the process that they have to go through. Let me tell you some of the frustrations that they have. This is the frustration that you’re going to have to go through.” That is nirvana from the standpoint of teaching an analytics course, to have a senior executive from a local company come in with their data and talk about the challenges and the issues and the opportunities with their data, and then setting the students loose on it. That is what needs to happen in every classroom in the country.
Many universities are launching programs in big data and data science in order to meet the growing demand. With more options available, what should students look for when exploring programs related to data science?
The first thing I would say is I’d stay away from online programs. Here’s my thinking there. An online course is a great idea for somebody who’s trying to tweak their skill set or if they’re trying to learn a new software package. I think if you’ve been working as a journalist and now, you want to become a data scientist and you have to completely retool your skill set and start from scratch, you’re going to miss so much richness and texture if you are not sitting in a classroom and actually interacting with the other students, and hearing about their experiences. An important part of the education process is hearing how other people are processing the information. If you miss that richness, you also miss the opportunity to have that direct experience with the professor.
And importantly, if you are retooling your skill set and you’re starting from scratch, you have to have an internship, or a co-op, or a project, or something that is in a real world context. There’s just nothing that substitutes throwing a student into a company and, “Here’s your desk, here’s your laptop, you’ve got to pull in a billion records and you got to run the descriptive statistics on it and I’m going to ultimately need a model by next week. Here’s my number, call me if you need me.”
I wrote an article where I said, “Data science does not belong in the business school.” It was very controversial. I’m actually thankful that nobody slit my tires. Here’s where I was coming out on that. If you’re going to become truly an analytical professional and you’re going to really represent yourself as a data scientist, then you really do need to have some working skill sets that sit at that intersection of mathematics and statistics and computer science. My point in that article was not that business students don’t need to learn those skills. In fact, I think it’s the exact opposite, I think they do, but I think that if the university houses the analytics program in the business school, then typically they don’t have access to the same level of courses, again, mathematics, statistics and computer science.
I’m always intrigued by NC State for example – huge fan of Michael Rappa. I think NC State has got it exactly right. What NC State did with their Institute for Advanced Analytics is that the program actually sits outside of the traditional university college structure. That program is not housed in a business school, the math department, or the computer science department. It actually sits outside of that structure. Then they can pull from all of those different places, all of those different disciplines.
One issue in educating future analytics professionals is the availability of data for students to work with. What can universities do to provide relevant datasets that allow students to hone their skills?
Yeah, I can’t emphasize enough the mutual benefit of these public-private partnerships. I have yet to talk to a company in the metropolitan Atlanta area who says, “No, I won’t give you a data set. I don’t want to help you.” Everybody is so willing and forthcoming to give us data to bring into the classroom for teaching purposes.
I would just say that universities need to have this very permeable membrane with their local economy, with companies and corporations that are in their local footprint. You have kind of this win, win. I partner with the Home Depot or I partner with Equifax, and they bring me data – real world data from their organization. They come in and they talk about their data. They also get an opportunity to kind of meet the students and learn more about what the students are doing and get a much deeper sense of what their skill sets are.
The students then work with that data. It’s almost like going through a training program in some ways, by working with that data, because now they understand Home Depot’s data, now they understand Equifax’s data. Then the executives oftentimes will come back at the end of the semester and they’ll look for students for a project. They’ll look at the student’s final model, the student says, “Hey, this is what I found,” and it gives them an opportunity to train candidates, so it’s just a win, win. There’s no downside to a university partnering with a corporation this way. There’s zero downside.
What do you see for the future of analytics careers, and how can universities “keep up” in order to meet evolving needs?
I think the gap in analytical talent is going to get worse before it gets better. I don’t think the demand for analytical talent is going to go away anytime soon. You can see that kind of in the education marketplace. Every university in the country now, it seems, is popping up with an analytics program and some of them are not good, to be perfectly honest. I don’t think it’s going to go away, I think it’s only going to get bigger.
How can universities keep up? We only have so many seats in the classroom. We sort of need to push the skills back a little bit in terms of the learning process. I made the point earlier that in the 21st century, I think, everybody coming out of a university should know basic programming. They should know basic mathematics. They should have some familiarity with data.
Increasingly, I would argue, that that doesn’t need to happen at the university level – that could be happening at the K-12 level. At the K-12 level, as a country, we should be doing a better job integrating what I consider to be now foundational skills, elementary skills. Learn how to code. Learn how to take that little turtle on the screen and have him learn how to crawl over the rocks, and find the food. Learn how to take what’s in your brain, and make the computer do what you want it to do. Again, that doesn’t have to happen at the freshman level, but should be happening at the fifth, sixth, seventh grade level. By the time they get to the university, we can get them to a much higher level.