Data Science in Insurance

January 29, 2014

Opportunities in Insurance Data Science

The Promise of Big Data
Where it was once difficult to gather data about potential risks, today’s insurers have lots of data to work with.

As Matt Josefowicz noted at an insurance leadership forum, the traditional underwriting process “was designed for a world of information scarcity and is trying to adapt now to information super-abundance.”

On any given day, insurance data scientists may gather data from:

Telematics devices
Smart phones
Social media
CCTV footage
Electoral rolls
Credit reports
Website analytics
Government statistics
Satellite data

What’s more, the advent of cloud computing helps companies to aggregate and store it all.

These sources tell insurers far more than did historical data from policy administration systems, claims management applications and billing systems, and the mortality reports of yesteryear. Through a judicious analysis of big data, insurers improve their pricing accuracy, create customized products and services, forge stronger customer relationships and facilitate more effective loss prevention.

Data Risks and Regulations

The Challenges Ahead

Insurance companies still have a few hurdles to cross before they can become fully data-driven. Some of those hurdles are already apparent to the industry. They include:

Siloed data that is difficult to synthesize
Unstructured data
Outdated fraud detection technology that cannot keep pace with today’s level and type of fraud

Elderly Infrastructures

Big companies have their own issues. Some deal with creaky IT infrastructures that are not equipped to handle the volume, velocity or variety of data that are streaming through their doors.

Skill Shortage

Big data can be used to solve many problems, but only if you have employees who are trained to ask the right questions.

And many insurance companies don’t. Fortunately, the insurance industry is replete with statistical ability, so it’s only a matter of time before the supply of analytics skills catches up to the demand.

Customer Privacy

But perhaps the most complicated issue centers on a customer’s right to privacy. The finance industry in general is subject to a host of federal and state regulations that were enacted to protect consumer privacy and avoid discriminatory practices. These have been joined by a series of stringent rules on data collection – all of which an insurance legal department must be aware of.

Just as importantly, insurance companies may need to think about how they treat customer information. It’s all very well to imagine a world run by telematics, but many consumers are rightly afraid of ceding their personal data to a private company. Even the lure of more affordable premiums may not be enough to change their minds.

Insurance data scientists also have to be very careful they’re not mistakenly assuming the role of Big Brother – whether benevolent or not. Despite the hype, not even big data can tell you everything about a person.

As an example, I’ll leave you with the cautionary tale of Quebec’s Nathalie Blanchard.

In 2009, Blanchard went on disability leave due to a case of severe depression. One day, she went to the bank and discovered her health insurance benefits had been terminated.

The reason? Her insurance company, while trawling for data, had captured smiling photos on her Facebook page and decided she wasn’t depressed enough to be disabled.

History of Data Analysis and Insurance

“Most problems have either many answers or no answer. Only a few problems have a single answer.”

– Edmund C. Berkeley, Right Answers – A Short Guide for Obtaining Them (September 1969)

Insurance has always been a numbers game. What are the odds of a ship sinking? Of the head of the household dying prematurely? Of a wooden house burning down? Since the third millennium B.C., humans have been trying to protect themselves from the risks of living.

Keeping track of risks means knowing the numbers – the data. Increasingly sophisticated techniques were added over time to better calculate the odds. Three and a half centuries ago, “knowing the numbers” was maturing into the mathematics of risk – actuarial science – one of the foundations of modern data analysis.

The Birth of Actuarial Science

In the late 17th century, demand for long-term insurance (e.g., burial, life and annuities) was becoming hard to ignore.

Insurance companies were happy to offer citizens these products, but they were faced with a variety of statistical conundrums in understanding their data:

What was the likelihood of an insurance-holder dying within a certain time frame?
How should insurers price their products?
What percentage of premiums should they set aside to pay for future benefits (e.g., annuities)?
How much could they afford to invest elsewhere? What would the rate of interest be?

Graunt’s Table and Halley’s Annuities

Fortunately, mathematics had reached a point where it was ready to provide the answers. In 1662, John Graunt, a London haberdasher, conducted a study of mortality rolls in the city.

In his analysis, he found predictable patterns of longevity and death rates in groups of people of the same age. This gave him the means to calculate the probabilities of survival. His work formed the nucleus of the first “life table.”
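To make the idea of a life table concrete, here is a minimal Python sketch. The survivor counts are invented for illustration and are not Graunt’s figures, and the function names are hypothetical. It shows how counts of survivors by age turn directly into probabilities of survival.

```python
# Hypothetical life table: number of people still alive at each age,
# out of an initial group of 100. Illustrative numbers only, not Graunt's data.
survivors = {0: 100, 10: 60, 20: 50, 30: 40, 40: 30, 50: 20, 60: 10}


def survival_probability(table, from_age, to_age):
    """Probability that someone alive at from_age is still alive at to_age."""
    if to_age < from_age:
        raise ValueError("to_age must not be earlier than from_age")
    return table[to_age] / table[from_age]


def death_probability(table, from_age, to_age):
    """Probability that someone alive at from_age dies before reaching to_age."""
    return 1.0 - survival_probability(table, from_age, to_age)


# Chance that a 20-year-old in this made-up table reaches age 40.
print(survival_probability(survivors, 20, 40))
```

The same ratio, applied to every pair of ages, is all that is needed to answer the first question insurers faced: how likely a policyholder is to die within a given time frame.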

Thirty years later, in 1693, Edmond Halley took a break from calculating the orbits of comets and descending to the bottom of the Thames in a diving bell to publish an article on life annuities.

Using accurate demographic data from Breslau, a city in Silesia, Halley produced a life table of the population, organized by age and survival. From this, he was able to calculate the premium amount that any man or woman, at each year of age, should pay in order to purchase a life annuity. From this time on, actuarial data multiplied.
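The kind of calculation a life table makes possible can be sketched in a few lines of Python. The idea is that an annuity paying 1 per year is worth the sum, over each future year, of the probability of surviving to that year multiplied by a discount factor for interest. The table values, the interest rate and the function name below are assumptions chosen for illustration, not Halley’s published figures or method.

```python
def life_annuity_value(survivors, age, rate):
    """Expected present value of 1 paid at the end of each year survived.

    survivors: list where survivors[x] is the number alive at age x
    age:       current age of the annuitant
    rate:      annual interest rate used for discounting (e.g. 0.06)
    """
    alive_now = survivors[age]
    if alive_now <= 0:
        raise ValueError("no survivors at the starting age")
    value = 0.0
    for t in range(1, len(survivors) - age):
        p_survive = survivors[age + t] / alive_now  # chance of being alive in t years
        discount = (1.0 + rate) ** -t               # value today of 1 paid in t years
        value += p_survive * discount
    return value


# Hypothetical table covering ages 0 to 5 (illustrative only).
toy_table = [100, 80, 60, 40, 20, 0]

# Price of the annuity for a newborn and for a 2-year-old at an assumed 6% interest.
print(life_annuity_value(toy_table, 0, 0.06))
print(life_annuity_value(toy_table, 2, 0.06))
```

Raising the interest rate lowers the price, because each future payment is discounted more heavily. The starting age matters too, since each age has its own chances of surviving to collect the later payments.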

The Father of the Computer and His Descendants

Over the next few centuries, as the data accumulated, actuarial science grew both in popularity and in the complexity of its calculations. It’s no surprise that Charles Babbage, father of the computer, found time to dabble in it.

During the 1820s, he created actuarial tables from Equitable Society mortality data and published a handy guide to the life insurance industry titled A Comparative View of the Various Institutions for the Assurance of Lives.

But it was with the adoption of punch-card tabulating machines and, subsequently, early computer technology that the insurance industry began the march towards data dominance.

During the late 1930s, Edmund Berkeley of the Prudential Insurance Company began to investigate the potential of shifting work to calculating machines, and, later, computers.

The Post-War Push

Regarded by his colleagues as equal parts nut and genius, Berkeley was a pioneer in computing and data processing. In 1947, he prodded Prudential to purchase one of the first UNIVAC computers from the Eckert-Mauchly Computer Corporation.

Computers were arriving at an interesting time: World War II had recently ended and the Baby Boomer generation was just beginning.

As JoAnne Yates points out, in the years between 1948 and 1953:

The number of insurance policies in force rose over 24%
Total employment in the life insurance industry grew almost 14%

Large insurance firms moved fairly quickly. In The Digital Hand, Volume II: How Computers Changed the Work of American Financial, Telecommunications, Media, and Entertainment Industries, James Cortada notes that by the end of 1955, there were over 20 mainframe systems installed in the industry.

Data Consolidation

The next big shift came in the late 1960s and 1970s. More powerful machines and better software were coming into play. Online systems allowed workers to share information freely and conduct inquiries in real time. Investment in technology increased steadily.

By the 1980s, the insurance industry was on top of IT trends.

The Industry Goes Ballistic

The arrival of the Internet in the 1990s gave insurance data science a further boost:

Individuals were able to bypass intermediaries and shop for coverage on their own terms.
Company and consumer websites sprang up to satisfy demand.
Banks seized the opportunity to expand into the industry.

As a consequence, the amount of customer data being gathered and exchanged exploded.

At the same time, the costs of data processing and storage were dropping rapidly. In lieu of the mass modeling of the past, insurers were gaining the capabilities (and the technical tools) to calculate risk on an individual level. The era of big data was just around the corner.

Last updated: June 2020
