Opportunities in Insurance Data Science
The Promise of Big Data
Even the insurance industry, the grand old dame of data analysis, has been taken aback by the amount of data currently deluging the digital domain. Where it was once difficult to gather data about potential risks, today’s insurers have an embarrassment of riches.
As Matt Josefowicz noted at an insurance leadership forum, the traditional underwriting process “was designed for a world of information scarcity and is trying to adapt now to information super-abundance.”
On any given day, insurance data scientists may gather data from:
- Telematics devices
- Smartphones
- Social media
- CCTV footage
- Electoral rolls
- Credit reports
- Website analytics
- Government statistics
- Satellite data
What’s more, the advent of cloud computing makes it relatively easy for companies to aggregate and store it all.
These sources tell insurers far more than did historical data from policy administration systems, claims management applications and billing systems, and the mortality reports of yesteryear. Through a judicious analysis of big data, insurers have now been empowered to improve their pricing accuracy, create customized products and services, forge stronger customer relationships and facilitate more effective loss prevention.
That’s good news for budding insurance data scientists. As big data continues its exponential growth, insurers are going to need help in deciding how to put it to good use.
Personalized Risk Pricing
Once upon a time, insurance agents were like local doctors – they knew individuals and communities inside-out. That meant they were aware of the risks in selling a policy to the town drunk.
To match that level of knowledge in the age of decentralization and the Internet, the insurance industry is turning to big data. Insurance data scientists are now combining analytical applications – e.g., behavioral models based on customer profile data – with a continuous stream of real-time data – e.g., satellite data, weather reports, vehicle sensors – to create detailed and personalized assessments of risk.
This has an impact on the company’s bottom line. As Sarah Adams points out:
“Premiums can be better correlated to risks, something particularly pertinent now given the impending arrival of Solvency II. If risk-based capital can be calculated more accurately, this influences the minimum amount of capital that needs to be held.”
What’s more, Josefowicz notes, “there is more opportunity to pick risks you want and spend less time throwing out risks you don’t want.”
Picture a world in which wireless “telematics” devices transmit real-time driving data back to an insurance company. Now picture a bunch of auto insurers drooling over their desks.
Telematics-based insurance products have been around since 1998, when Progressive first launched them. But technology has come a long way in the intervening years. Telematics devices currently include embedded navigation systems (e.g., GM’s OnStar), on-board diagnostics (e.g., Progressive’s Snapshot) and smartphones.
These can be used to create personalized plans. In a SAS white paper, Telematics: How Big Data is Transforming the Auto Insurance Industry, the authors highlight two of these options:
- PAYD: Pay-As-You-Drive
- PHYD: Pay-How-You-Drive
PAYD is pretty straightforward. It charges customers based on the number of miles or kilometers driven. Hollard Insurance, a South African insurer, has six mileage options.
But PAYD does not take into account driving habits. PHYD plans use telematics to monitor a wide variety of factors – speed, acceleration, cornering, braking, lane changing, fuel consumption – as well as geo-location, date and time. If an accident occurs, the insurance company has the ability to recreate the situation.
Auto insurers can then provide customers with driving scores, ideas for improvement and individual pricing. The SAS Institute predicts that by 2020, over 25 percent of U.S. auto insurance premium revenue will be generated via telematics.
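To make the difference between PAYD and PHYD concrete, here is a minimal sketch in Python. Every rate, weight and field name in it is a hypothetical assumption invented for illustration; none of them comes from the SAS white paper or from any insurer's actual plan.

```python
from dataclasses import dataclass


@dataclass
class TripSummary:
    """Aggregated telematics data for one billing period (hypothetical fields)."""
    miles: float
    hard_brakes: int          # count of hard-braking events
    speeding_minutes: float   # minutes spent above the speed limit
    night_miles: float        # miles driven late at night


def payd_premium(trip: TripSummary, base_fee: float = 20.0,
                 rate_per_mile: float = 0.05) -> float:
    """Pay-As-You-Drive: the price depends only on distance driven."""
    return round(base_fee + rate_per_mile * trip.miles, 2)


def driving_score(trip: TripSummary) -> float:
    """Pay-How-You-Drive: score from 0 (risky) to 100 (safe).

    Penalties are counted per 100 miles so that distance alone is not
    punished. All weights are illustrative assumptions.
    """
    if trip.miles <= 0:
        return 100.0
    per_100 = 100.0 / trip.miles
    penalty = (
        4.0 * trip.hard_brakes * per_100
        + 1.5 * trip.speeding_minutes * per_100
        + 0.2 * trip.night_miles * per_100
    )
    return max(0.0, 100.0 - penalty)


def phyd_premium(trip: TripSummary) -> float:
    """Scale the PAYD price by a factor from 0.7 (score 100) to 1.3 (score 0)."""
    factor = 1.3 - 0.6 * driving_score(trip) / 100.0
    return round(payd_premium(trip) * factor, 2)


careful = TripSummary(miles=500, hard_brakes=1, speeding_minutes=2, night_miles=10)
risky = TripSummary(miles=500, hard_brakes=30, speeding_minutes=60, night_miles=200)

for name, trip in [("careful", careful), ("risky", risky)]:
    print(name, payd_premium(trip), round(driving_score(trip), 1), phyd_premium(trip))
```

Both drivers cover the same distance, so a PAYD plan charges them the same amount; only the PHYD calculation separates them.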
Following the lead of auto insurers, property insurance companies are assessing how they can use telematics to create usage-based home insurance. Potential data sources include:
- Moisture sensors that detect flooding or leaks
- Utility and appliance usage records
- Security cameras
- Sensors that track occupancy
Combine this with information from outside sources (e.g., local crime reports and traffic) and you can arrive at a multi-faceted, comprehensive assessment of one person’s property claim risk.
Going a step further, these sources can be used to protect a customer. For example, with predictive analytics, insurers can calculate the likelihood of an event such as theft or a hurricane and take steps to avoid pain and suffering – as well as, of course, big claims.
Life and Health Insurance
We live in a monitored world. Life and health insurance companies know this more than anybody. To create profiles of customer health and develop individual “well-being” scores, insurers are now casting the information net very wide indeed. They can collect:
- Transactional data – e.g., where and what (junk food?) customers buy
- Body sensors – i.e., devices that monitor consumption or alert the wearer to early signs of illness
- Exterior monitors – e.g., data from workout machines
- Social media – e.g., tweets about one’s personal health or state of mind
Health insurers are particularly interested in what hospital data sets have to tell them.
For more details on big-data applications in this area, see our related profile of the Health Care Industry.
360-Degree Customer Profiles
Like every other industry, insurance aims to improve customer satisfaction, and it is employing big data to accomplish that. The more an insurer knows about its customers’ quirks, the theory goes, the easier it is to keep them happy – and paying premiums.
Companies are combining all their direct customer connections – e.g., email, call center, adjuster reports, etc. – with indirect sources – e.g., social media, blog comments, website and clickstream data – to create a 360-degree profile of each individual.
At MetLife, they call it “The Wall”:
- Working with MongoDB, MetLife has created a customer service application that gives employees a consolidated view of each customer.
- The Wall uses data from 70+ existing systems and includes relevant points like policy details and transactions across lines of business.
With a 360-degree profile in hand, insurers have the means to refine their approach to sales, marketing and existing customer service.
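The idea of consolidating direct and indirect sources into one view per customer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names, the record fields (`customer_id`, `date`, `note`) and the `build_profiles` helper are hypothetical, and the sketch says nothing about how MetLife's application is actually built.

```python
from collections import defaultdict


def build_profiles(*sources):
    """Merge (source_name, records) pairs into one timeline per customer.

    Each record is a dict with at least 'customer_id' and an ISO-format
    'date' string, so that sorting by the string sorts chronologically.
    """
    profiles = defaultdict(lambda: {"events": []})
    for source_name, records in sources:
        for record in records:
            profile = profiles[record["customer_id"]]
            profile["events"].append({"source": source_name, **record})
    for profile in profiles.values():
        profile["events"].sort(key=lambda event: event["date"])
    return dict(profiles)


# Toy records standing in for separate systems.
call_center = [{"customer_id": "c1", "date": "2024-03-02", "note": "claim status query"}]
email = [
    {"customer_id": "c1", "date": "2024-01-15", "note": "policy renewal reminder"},
    {"customer_id": "c2", "date": "2024-02-01", "note": "address change"},
]
clickstream = [{"customer_id": "c1", "date": "2024-02-20", "note": "viewed home cover page"}]

profiles = build_profiles(
    ("call_center", call_center),
    ("email", email),
    ("clickstream", clickstream),
)
for event in profiles["c1"]["events"]:
    print(event["date"], event["source"], event["note"])
```

In practice the hard part is matching the same person across systems that do not share an identifier; the sketch assumes that step has already been done.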
Call Center Optimization
A call center is a seething cauldron of data. For insurance data scientists, it’s also a golden opportunity. These folks are investigating ways to:
- Combine claims data with telecom data from CDRs to analyze call center activities and refine training guidelines.
- Analyze raw telecom data, model temporal call patterns, and create a plan for staffing optimization.
- Use sentiment analysis – e.g., speech analytics on call center conversations or Natural Language Processing (NLP) and text analytics on social media – to improve customer service.
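As a rough illustration of the second item, the sketch below turns a log of call timestamps into an hourly arrival profile and then into a staffing estimate. It uses a simple workload calculation rather than a queueing model such as Erlang C, and the handle time, occupancy target and toy call log are all assumptions made for the example.

```python
import math
from collections import Counter
from datetime import datetime


def calls_per_hour(timestamps, days_observed):
    """Average number of calls arriving in each hour of the day (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return {hour: counts.get(hour, 0) / days_observed for hour in range(24)}


def agents_needed(hourly_calls, avg_handle_minutes=6.0, target_occupancy=0.8):
    """Workload-based estimate: agent-hours of work per hour, padded so
    that agents are busy only `target_occupancy` of the time."""
    workload = hourly_calls * avg_handle_minutes / 60.0
    return math.ceil(workload / target_occupancy)


# Toy call log standing in for CDR extracts covering two days.
log = (
    [datetime(2024, 1, 1, 9, m) for m in range(0, 60, 2)]      # 30 calls, day 1, 09:xx
    + [datetime(2024, 1, 2, 9, m) for m in range(0, 60, 3)]    # 20 calls, day 2, 09:xx
    + [datetime(2024, 1, 1, 14, m) for m in range(0, 60, 6)]   # 10 calls, day 1, 14:xx
)

profile = calls_per_hour(log, days_observed=2)
plan = {hour: agents_needed(rate) for hour, rate in profile.items() if rate > 0}
print(plan)
```

A real staffing plan would also account for day-of-week effects, abandonment and service-level targets, which this sketch ignores.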
Call-center employees are also in an ideal situation to sell customers additional products. One use of a 360-degree profile is to give that friendly voice on the phone the means to offer you the most relevant product for your particular needs.
Fraud Detection
Fraud costs the insurance industry millions, if not billions, each year. In response, insurers are marshaling their data resources and creating a multi-channel approach to fraud detection. They’re taking a very close look at both traditional structured data (such as claims and policy data), and textual data (such as adjuster notes, police reports and social media).
- Text analytics
- Predictive analytics
- Behavioral analytics
- Pattern, graph and link analysis techniques
Armed with these techniques, not to mention a host of other handy tools, data scientists are cracking down on suspicious claims.
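Link analysis, the last technique in the list above, can be illustrated with a small Python sketch that groups claims sharing a phone number, address or bank account, whether directly or through a chain of other claims. The field names and toy claims are hypothetical, and the union-find approach is one common way to do this, not a description of any insurer's system.

```python
from collections import defaultdict


def linked_claim_groups(claims, keys=("phone", "address", "bank_account")):
    """Return sets of claim IDs connected by at least one shared value."""
    parent = {claim["id"]: claim["id"] for claim in claims}

    def find(x):
        # Follow parent pointers to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # (field, value) -> first claim ID carrying that value
    for claim in claims:
        for key in keys:
            value = claim.get(key)
            if not value:
                continue
            token = (key, value)
            if token in first_seen:
                union(claim["id"], first_seen[token])
            else:
                first_seen[token] = claim["id"]

    groups = defaultdict(set)
    for claim_id in parent:
        groups[find(claim_id)].add(claim_id)
    return [group for group in groups.values() if len(group) > 1]


claims = [
    {"id": "C1", "phone": "555-0101"},
    {"id": "C2", "phone": "555-0101", "address": "12 Elm St"},
    {"id": "C3", "address": "12 Elm St"},
    {"id": "C4", "phone": "555-0199"},
]
suspicious = linked_claim_groups(claims)
print(suspicious)
```

A shared address or phone number is not evidence of fraud on its own; a cluster like this would only be a prompt for a human investigator to take a closer look.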
Data Risks and Regulations
The Challenges Ahead
Insurance companies still have a few hurdles to clear before they can become fully data-driven. Some of those hurdles are already apparent to the industry. They include:
- Lack of rich transactional data (e.g., credit card transactions)
- Inconsistent data, the result of siloed data capture and management
- Lack of ready cash to invest in IT
Midsize and small insurance companies, in particular, are finding it costly to incorporate big-data analytics in their financial and risk management strategy.
Smaller companies have another problem. They lack the customer base of the big guns. This gives them a more limited view of both their clients and the market as a whole.
Big companies have their own issues. Many are currently dealing with creaky IT infrastructures that are not equipped to handle the volume, velocity or variety of data streaming through their doors. Oliver Ralph summarizes a survey by broker Willis Towers Watson in which 74% of respondents believed that the insurance industry had failed to show leadership in technological innovation.
Big data can be used to solve many problems, but only if you have employees who are trained to ask the right questions.
And many insurance companies don’t. Still, the insurance industry is replete with statistical ability, so it’s only a matter of time before the supply of analytics skills catches up to the demand.
But perhaps the most complicated issue centers on a customer’s right to privacy. The finance industry in general is subject to a host of federal and state regulations that were enacted to protect consumer privacy and avoid discriminatory practices. These have been joined by a series of stringent rules on data collection – all of which an insurance legal department must be aware of.
Just as importantly, insurance companies need to think about how they treat customer information. It’s all very well to imagine a world run by telematics, but many consumers are rightly afraid of ceding their personal data to a private company. Even the lure of more affordable premiums may not be enough to change their minds.
Insurance data scientists also have to be very careful they’re not mistakenly assuming the role of Big Brother – whether benevolent or not. Despite the hype, not even big data can tell you everything about a person.
As an example, I’ll leave you with the cautionary tale of Quebec’s Nathalie Blanchard.
In 2009, Blanchard went on disability leave due to a case of severe depression. One day, she went to the bank and discovered her health insurance benefits had been terminated.
The reason? Her insurance company, while trawling for data, had captured smiling photos on her Facebook page and decided she wasn’t depressed enough to be disabled.
History of Data Analysis and Insurance
“Most problems have either many answers or no answer. Only a few problems have a single answer.” – Edmund C. Berkeley, Right Answers – A Short Guide for Obtaining Them (September 1969)
Insurance has always been a numbers game. What are the odds of a ship sinking? Of the head of the household dying prematurely? Of a wooden house burning down? Since the third millennium B.C., humans have been trying to protect themselves from the risks of living.
Keeping track of those risks means knowing the numbers – the data. Increasingly sophisticated techniques were added over time to better calculate the odds. Three and a half centuries ago, “knowing the numbers” was maturing into the mathematics of risk – actuarial science – one of the foundations of modern data analysis.
The Birth of Actuarial Science
In the late 17th century, demand for long-term insurance (e.g., burial, life and annuities) was becoming hard to ignore.
Insurance companies were happy to offer citizens these products, but they were faced with a variety of statistical conundrums in understanding their data:
- What was the likelihood of an insurance-holder dying within a certain time frame?
- How should insurers price their products?
- What percentage of premiums should they set aside to pay for future benefits (e.g., annuities)?
- How much could they afford to invest elsewhere? What would the rate of interest be?
Graunt’s Table and Halley’s Annuities
Fortunately, mathematics had reached a point where it was ready to provide the answers. In 1662, John Graunt, a London haberdasher, conducted a study of mortality rolls in the city.
In his analysis, he found predictable patterns of longevity and death rates in groups of people of the same age. This gave him the means to calculate the probabilities of survival. His work formed the nucleus of the first “life table.”
Thirty years later, in 1693, Edmond Halley took a break from calculating the orbits of comets and descending to the bottom of the Thames in a diving bell to publish an article on life annuities.
Using extremely accurate demographic data from Breslau, a city in Silesia, Halley produced a life table of the population, organized by age and survival. From this, he was able to calculate the premium amount that any man or woman, at each year of age, should pay in order to purchase a life annuity. From this time on, actuarial data multiplied.
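The kind of calculation Halley made possible can be sketched as follows: the value of a life annuity is the sum, over each future year, of the probability of surviving to that year times the discounted payment. The Python below is a minimal illustration, not a reconstruction of Halley's method; the life table is made up of toy numbers (not his Breslau figures), and the interest rate is simply a parameter.

```python
def annuity_value(survivors, age, interest=0.06):
    """Expected present value of 1 unit paid at the end of each year survived.

    survivors[x] is the number of people in the life table still alive at
    age x. The chance that someone aged `age` is alive t years later is
    survivors[age + t] / survivors[age].
    """
    alive_now = survivors[age]
    if alive_now <= 0:
        raise ValueError("no survivors at this age")
    discount = 1.0 / (1.0 + interest)
    value = 0.0
    for years_ahead, alive_then in enumerate(survivors[age + 1:], start=1):
        prob_survive = alive_then / alive_now
        value += prob_survive * discount ** years_ahead
    return value


# Toy life table indexed by age (hypothetical numbers for illustration).
toy_table = [1000, 900, 700, 400, 100, 0]

for age in range(4):
    print(age, round(annuity_value(toy_table, age), 3))
```

The same table answers both of the insurer's questions: the survival ratios give the likelihood of death within a time frame, and the discounted sum gives a fair purchase price for the annuity.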
The Father of the Computer and His Descendants
Over the next few centuries, to accompany the data, actuarial science grew both in popularity and in the complexity of its calculations. It’s no surprise that Charles Babbage, father of the computer, found time to dabble in it.
During the 1820s, he created actuarial tables from Equitable Society mortality data and published a handy guide to the life insurance industry titled A Comparative View of the Various Institutions for the Assurance of Lives.
But it was with the adoption of punch-card tabulating machines and, subsequently, early computer technology that the insurance industry began the march towards data dominance.
During the late 1930s, Edmund Berkeley of the Prudential Insurance Company began to investigate the potential of shifting work to calculating machines, and, later, computers.
The Post-War Push
Regarded by his colleagues as equal parts nut and genius, Berkeley was a pioneer in computing and data processing. In 1947, he prodded Prudential to purchase one of the first UNIVAC computers from the Eckert-Mauchly Computer Corporation.
Computers were arriving at just the right time. Boys were coming back from the front and setting up house. The birth rate was booming. Due to wage controls, employer-sponsored health insurance plans surged in popularity.
As Joanne Yates points out, in the years between 1948 and 1953:
- The number of insurance policies in force rose over 24%
- Total employment in the life insurance industry grew almost 14%
Large insurance firms moved fairly quickly. In The Digital Hand, Volume II: How Computers Changed the Work of American Financial, Telecommunications, Media, and Entertainment Industries, James Cortada notes that by the end of 1955, there were over 20 mainframe systems installed in the industry.
The next big shift came in the late 1960s and 1970s. More powerful machines and better software were coming into play. Online systems allowed workers to share information freely and conduct inquiries in real time. Investment in technology increased steadily.
By the 1980s, the insurance industry was on top of IT trends.
The Industry Goes Ballistic
The arrival of the Internet in the 1990s spurred insurance data science to grow even faster.
- Individuals were able to bypass intermediaries and shop for coverage on their own terms.
- Company and consumer websites sprang up to satisfy demand.
- Banks seized the opportunity to expand into the industry.
As a consequence, the amount of customer data being gathered and exchanged exploded.
At the same time, the costs of data processing and storage were dropping rapidly. In lieu of the mass modeling of the past, insurers were gaining the capabilities (and the technical tools) to calculate risk on an individual level. The era of big data was just around the corner.