Data Science in Finance

The Promise of Big Data

Social media activity, mobile interactions, server logs, real-time market feeds, customer service records, transaction details, information from existing databases – there’s a lot of data to explore.

To make sense of these giant data sets, companies employ data scientists for answers. These professionals are:

  • Capturing and analyzing new sources of data, building predictive models and running live simulations of market events
  • Using technologies such as Hadoop, NoSQL and Storm to tap into non-traditional data sets (e.g., geolocation, sentiment data) and integrate them with more traditional numbers (e.g., trade data)
  • Finding and storing increasingly diverse data in its raw form for future analysis

They’ve been aided in this quest by the development of cloud-based data storage and sophisticated analytics tools.

Sentiment Analysis

Sentiment analysis (aka opinion mining) applies natural-language processing, text analysis and computational linguistics to source material to discover what folks really think.

Businesses like Think Big Analytics and MarketPsych Data are using it to:

  • Build algorithms around market sentiment data (e.g., Twitter feeds) that can short the market when disasters (e.g., storms, terrorist attacks) occur
  • Track trends, monitor the launch of new products, respond to issues and improve overall brand perception
  • Analyze unstructured voice recordings from call centers and recommend ways to reduce customer churn, up-sell and cross-sell products and detect fraud

Some data companies are even acting as intermediaries, collecting and selling sentiment indicators to retail investors.
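As a rough illustration of the idea, the sketch below scores short posts against a tiny word list. The lexicon, the sample posts and the function name `sentiment_score` are all hypothetical; production systems rely on much larger, domain-tuned lexicons or trained language models.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only).
POSITIVE = {"gain", "gains", "beat", "strong", "upgrade", "growth", "profit"}
NEGATIVE = {"loss", "losses", "miss", "weak", "downgrade", "fraud", "crash"}


def sentiment_score(text: str) -> float:
    """Return (positive - negative) / matched words, in [-1, 1]; 0.0 if nothing matches."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Made-up example posts.
posts = [
    "Strong quarter, profit beat estimates",
    "Analyst downgrade after weak guidance and heavy losses",
    "Shareholder meeting scheduled for Tuesday",
]
scores = [sentiment_score(p) for p in posts]
```

A score near 1 suggests mostly positive wording, a score near -1 mostly negative wording, and 0 means the lexicon found nothing to go on.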

Real-Time Analytics

In days of yore, financial institutions were hampered by the lag time between data collection and data analysis. Real-time analytics short-circuits this problem and provides the industry with new ways to:

  • Fight Financial Fraud: Banks and credit card companies routinely analyze account balances, spending patterns, credit history, employment details, location and a load of other data points to determine whether transactions are above board. If suspicious activity is detected, they can immediately suspend the account and alert the owner.
  • Improve Credit Ratings: A continuous feed of online data allows real-time credit ratings. This provides lenders with a more accurate picture of a customer’s assets, business operations and transaction history.
  • Provide More Accurate Pricing: Progressive Insurance already tailors its policies to account for a customer’s changing financial situation. In the Internet of Things, data from automobile sensors will also help insurance companies issue their policyholders warnings about accidents, traffic jams and weather conditions. That makes for safer drivers and fewer payouts.
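The fraud check in the first bullet can be sketched, under heavy simplification, as an outlier test on transaction amounts. The function name `is_suspicious`, the threshold of three standard deviations and the sample amounts are assumptions for illustration; real systems combine many more signals, such as location and credit history.

```python
from statistics import mean, stdev


def is_suspicious(history: list[float], amount: float, threshold: float = 3.0) -> bool:
    """Flag an amount far above the account's usual spending.

    Returns True when `amount` lies more than `threshold` sample standard
    deviations above the mean of `history`.
    """
    if len(history) < 2:
        return False  # too little history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount > mu  # every past amount was identical
    return (amount - mu) / sigma > threshold


# Made-up past card transactions for one account.
past_amounts = [42.0, 18.5, 60.0, 35.25, 27.8]
```

With this history, `is_suspicious(past_amounts, 950.0)` flags the transaction, while a 55.00 purchase passes.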

The Billion Prices Project is an example of this phenomenon in action. Frustrated with the lag time on the U.S. Bureau of Labor Statistics’s consumer price index (CPI), MIT’s Alberto Cavallo and Roberto Rigobon turned to information from the web.

Every day, their software collected half a million prices of products sold in the U.S. and analyzed the results. In 2008, just after Lehman Brothers filed for bankruptcy, their tool was able to detect a deflationary swing in prices far earlier than the official CPI report did.

Today, banks and other major financial institutions use PriceStats – the project’s commercial spinoff – to analyze inflation trends around the world.
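To give a feel for how scraped prices can become an inflation signal, here is a minimal sketch that computes an unweighted geometric mean of price changes between two days. This is a simplified stand-in, not the project's published methodology; the function name `daily_price_index` and the sample prices are assumptions.

```python
import math


def daily_price_index(base: dict[str, float], today: dict[str, float]) -> float:
    """Return 100 * geometric mean of today/base price ratios.

    Only products observed on both days are compared, so items that
    appear or disappear do not distort the result.
    """
    common = [product for product in base if product in today]
    if not common:
        raise ValueError("no products observed on both days")
    log_sum = sum(math.log(today[p] / base[p]) for p in common)
    return 100.0 * math.exp(log_sum / len(common))


# Made-up scraped prices for two days.
base_prices = {"milk": 2.00, "bread": 4.00, "eggs": 3.00}
todays_prices = {"milk": 1.80, "bread": 3.60, "eggs": 2.70, "tea": 5.00}
index = daily_price_index(base_prices, todays_prices)
```

An index below 100 indicates that the tracked prices fell relative to the base day, which is the kind of deflationary swing described above.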

Customer Segmentation

Banks purchase data from retailers and service providers in an effort to create a 360-degree view of their customers.

This kind of customer segmentation allows them to:

  • Offer customized products and services
  • Improve existing profitable relationships and avoid customer churn
  • Create better marketing campaigns and more attractive product offerings
  • Tailor product development to specific customer segments

Predictive Analytics

By combining segmentation with predictive analytics, companies can also cut down on risk. For example, some major banks use predictive models to decide whether certain customers are likely to pay off their credit cards.

Similar strides have been made in forecasting market behavior. Once upon a time (e.g., 2009), high-frequency trading (HFT) – the speedy exchange of securities – was hugely lucrative. With competition came a drop in profits and the need for a new strategy.

HFT traders adapted by employing strategic sequential trading, using big data analytics to identify specific market participants and anticipate their future actions. In a field of breakneck speed, this gives HFT traders an unmistakable advantage.

Predictive analytics can also be used to issue early warnings on the market. Machine learning algorithms are being used to predict stock market crashes based simply on pricing information.

By studying search volume data provided by Google Trends, researchers have also been able to identify online precursors for stock market moves. Their results suggest that increases in search volume for financially relevant search terms usually precede big losses in financial markets.
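One simple way to turn that observation into a signal is to compare each week's search volume with its recent average. The sketch below is an assumption-laden illustration: the three-week window, the labels and the sample numbers are hypothetical and are not taken from the study.

```python
def search_volume_signals(volume: list[float], window: int = 3) -> list[str]:
    """Label each week from index `window` onward.

    A week is labeled "warning" when its search volume exceeds the mean
    of the previous `window` weeks, otherwise "calm".
    """
    signals = []
    for t in range(window, len(volume)):
        trailing_mean = sum(volume[t - window:t]) / window
        signals.append("warning" if volume[t] > trailing_mean else "calm")
    return signals


# Made-up weekly search volumes for a finance-related term.
weekly_volume = [10, 12, 11, 20, 9, 8, 30]
signals = search_volume_signals(weekly_volume)
```

Here the fourth and seventh weeks stand out against their trailing averages, so they would be the ones to examine more closely.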

Data Risks and Regulations

The Importance of Data Humility

One pitfall for data scientists is overestimating what the data can tell them. It is important to combine several data analytics tools, rather than lean on a single one, to get the best possible picture of a situation.

In fact, the market has already discovered that trusting faulty algorithms can lead to disastrous results.

And then, of course, there’s a little thing called privacy…

Fair Credit Reporting Act (FCRA)

Financial institutions are subject to some of the U.S. government’s privacy laws and regulations, including consumer rights.

During the 1960s, the Retail Credit Company made a move to computerize its records. In response to consumer concern about the availability of information, the U.S. Congress held a series of hearings in 1970.

The result was the Fair Credit Reporting Act (FCRA), which set forth legal standards governing the collection, use, and communication of credit and other information about consumers. This includes information about a consumer’s credit worthiness, credit standing, credit capacity, character, general reputation, personal characteristics, or mode of living, that is to be used for these purposes.

The act applies to financial institutions and any business or individual who uses a consumer report for a business purpose.

In addition to the FCRA, the Gramm-Leach-Bliley Act (GLBA) contains restrictions for disclosure of nonpublic personal information to nonaffiliated third parties. All financial institutions are required to provide consumers with a notice and opt-out opportunity.

Equal Credit Opportunity Act (ECOA)

At the moment, financial institutions are at liberty to use predictive and behavioral analytics. Provided, of course, that they’re not breaking the law.

In 1974, the government passed the Equal Credit Opportunity Act (ECOA). The ECOA makes it unlawful for any creditor – including banks, retailers, bankcard companies, finance companies and credit unions – to discriminate against any applicant with respect to any aspect of a credit transaction:

  1. On the basis of race, color, religion, national origin, sex or marital status, or age (provided the applicant has the capacity to contract)
  2. Because all or part of the applicant’s income derives from any public assistance program
  3. Because the applicant has in good faith exercised any right under the Consumer Credit Protection Act

Predictive models that discriminate against applicants, even unintentionally, risk running afoul of the law.
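A basic first check on a model's decisions is to compare approval rates across applicant groups. The sketch below is a monitoring heuristic only, not a legal test defined by the ECOA; the function names, group labels and sample decisions are hypothetical.

```python
def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs. Returns {group: approval rate}."""
    counts = {}
    for group, approved in decisions:
        total, yes = counts.get(group, (0, 0))
        counts[group] = (total + 1, yes + int(approved))
    return {group: yes / total for group, (total, yes) in counts.items()}


def approval_rate_ratio(rates):
    """Ratio of the lowest to the highest group approval rate (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was approved, so the groups are treated alike
    return min(rates.values()) / highest


# Made-up model decisions for two applicant groups.
sample = [("group_x", True)] * 8 + [("group_x", False)] * 2 \
       + [("group_y", True)] * 4 + [("group_y", False)] * 6
rates = approval_rates(sample)
ratio = approval_rate_ratio(rates)
```

A ratio well below 1.0, as in this sample, is a prompt to investigate which inputs drive the gap before the model is used on real applicants.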

Keeping Data Safe

In addition to using consumer data ethically, financial institutions are legally bound to store and protect it from theft.

Examples of federal rules and industry standards for data safety include:

  • Bank Secrecy Act (BSA): This requires all U.S. financial institutions to keep records of cash purchases of negotiable instruments, to file reports of cash transactions exceeding $10,000 (daily aggregate amount), and to report suspicious activity that might signify money laundering, tax evasion or other criminal activity.
  • Fair and Accurate Credit Transactions Act (FACTA): FACTA contains several provisions that require financial institutions, creditors, and other businesses that rely on consumer reports to detect and resolve fraud by identity theft.
  • FACTA Disposal Rule: The child of FACTA also states that any business or individual who uses a consumer report for a business purpose must properly dispose of the information in the consumer reports and records to protect against “unauthorized access to or use of the information.”
  • Payment Application Data Security Standard (PA-DSS): This data security standard applies to software vendors and others who develop applications that store, process, or transmit cardholder data as part of authorization or settlement, where these payment applications are sold, distributed or licensed to third parties.
  • Payment Card Industry Data Security Standard (PCI DSS): PCI DSS provides a baseline of technical and operational requirements designed to protect cardholder data, along with requirements for compliance reporting and business certification for processors of cardholder data.
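The daily aggregate rule in the Bank Secrecy Act bullet lends itself to a short sketch: sum each customer's cash transactions per day and list the totals that exceed $10,000. The record layout, function name and sample data are assumptions for illustration, and actual reporting obligations involve more conditions than this.

```python
from collections import defaultdict
from datetime import date

REPORTING_THRESHOLD = 10_000  # dollars, daily aggregate, as described in the BSA bullet


def customers_to_report(transactions):
    """transactions: iterable of (customer_id, day, cash_amount).

    Returns a sorted list of (customer_id, day, total) for every customer
    whose cash transactions on a single day add up to more than the threshold.
    """
    totals = defaultdict(float)
    for customer_id, day, amount in transactions:
        totals[(customer_id, day)] += amount
    return sorted(
        (customer_id, day, total)
        for (customer_id, day), total in totals.items()
        if total > REPORTING_THRESHOLD
    )


# Made-up cash transactions.
cash_transactions = [
    ("A", date(2021, 1, 4), 6000.0),
    ("A", date(2021, 1, 4), 4500.0),   # same day: 10,500 in aggregate
    ("B", date(2021, 1, 4), 9999.0),
    ("A", date(2021, 1, 5), 10000.0),  # exactly 10,000 does not exceed the threshold
]
to_report = customers_to_report(cash_transactions)
```

Aggregating before comparing matters: neither of customer A's January 4 transactions crosses the threshold alone, but together they do.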

As storage moves to the cloud and data access approaches the speed of light, financial institutions must be careful to keep their sensitive information very safe indeed.

History of Data Analysis and Finance

“Money management has been a profession involving a lot of fakery — people saying they can beat the market and they really can’t.” – Robert Shiller, Robert Shiller: A Skeptic and Nobel Winner

On the morning of March 22, 1899, in a rented office on the fifth floor of the Gould Building in Atlanta, a brand new company opened for business. Seated at their desks were two brothers: Cator and Guy Woolford. Printed on the door, in fresh black ink, was the sign, “Retail Credit Company.”

Now known as Equifax, the Woolfords’ venture marked a turning point in the history of finance. Data intelligence, the Woolfords realized, could be profitable.

The Rise of Credit Reports

It began as the “Merchant’s Guide” – a $15 hard-covered book containing a list of customers and information on their credit worthiness. This enabled merchants and retailers to decide who should be entrusted with personal charge accounts.

In 1901, they were saved from disaster by a request from a cashier from the Home Life of New York company. Could the Woolfords please supply information on three local applicants for life insurance?

To provide accurate credit and insurance reports, Equifax began to:

  • Collect data on the health, habits and morals of U.S. citizens
  • Examine employment records and investigate financial decisions
  • Accrue statistics on childhood, marriage, education and politics

Nor was it alone in this endeavor. In 1969, TransUnion acquired 3.6 million card files stored in 400 cabinets – the valuable assets of the Credit Bureau of Cook County (CBCC).

By the early 1970s, it had replaced this manual mess with automated tape-to-disc transfer.

By the 1980s, it was part of one of the largest conglomerates in the country.

A Revolutionary Concept: Credit Scores


Under the balmy skies of San Rafael in 1956, two alumni of the Stanford Research Institute (SRI) were setting up shop in a studio apartment on Lincoln Avenue.

Bill Fair was an engineer; Earl Isaac was a mathematician. Both were aware of the power of computers through their research for the Defense Department. Both were enthralled with the potential of applying data analytics to solve business problems.

In 1958, Fair, Isaac and Company (FICO) sent a letter to fifty of the largest U.S. credit grantors offering to demonstrate a new tool: credit scoring.

With this simple score in hand, major lenders could instantly determine an applicant’s credit risk.

Just one company responded to their letter.

Making It Work

All of this information needed to be organized and put to good use:

  • 1960s: Conrad Hilton installed an IBM computer system for Carte Blanche that performed a daily check on the state of accounts and sent reminders to delinquent cardholders.
  • 1972: Isaac’s software for the Automated Strategic Applications Processing (ASAP) system debuted at Wells Fargo. Built on analytics models, this was the first automated loan application-processing system in the country.
  • 1975: Fair Isaac developed the first behavior scoring system to predict the credit risk of existing customers.

To Market, To Market

Economists were excited by the potential of applying large-scale data analytics to the financial market.

Take 1973. Not many noticed when the Journal of Political Economy published a paper by Fischer Black and Myron Scholes entitled “The Pricing of Options and Corporate Liabilities.” Nor did many care to read their descriptions of stochastic partial differential equations.

Yet the creation of the Black-Scholes Model (as it would come to be known) was a key event in data science. Thanks to Black and Scholes, along with the subsequent work of Robert Merton, this model allowed traders to estimate the optimal price for stock options over time. It sliced risk off the buying and selling of underlying assets, prompted a boom in options trading and netted Merton and Scholes a Nobel Prize in Economics.
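For readers who want to see the model at work, the sketch below evaluates the standard Black-Scholes formula for a European call option on a stock that pays no dividends. The function and parameter names are our own, and the inputs are made-up values.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot: float, strike: float, rate: float,
                       volatility: float, years: float) -> float:
    """Price of a European call under Black-Scholes assumptions.

    spot: current stock price; strike: exercise price;
    rate: continuously compounded risk-free rate (annual);
    volatility: annualized standard deviation of returns;
    years: time to expiry in years (must be positive).
    """
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / (volatility * sqrt_t)
    d2 = d1 - volatility * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# Made-up inputs: stock at 100, strike 100, 5% rate, 20% volatility, one year to expiry.
price = black_scholes_call(100.0, 100.0, 0.05, 0.20, 1.0)
```

With these inputs the call is worth roughly 10.45. Raising the volatility input raises the price, which reflects the greater chance of a large move above the strike.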

Harvard Meets Yale

During the 1980s, a Harvard graduate and Vietnam vet named Karl Case was absorbed in an economics project. To study the ebb and flow of home pricing trends, Case had accrued several years of data on Boston house sales and was developing a rudimentary index to compare repeat sales of the same homes.

In 1985, Case met Robert Shiller, a Yale economist interested in behavioral aspects of economic bubbles. Working together, Case and Shiller added housing data from other cities and refined Case’s work into the Case-Shiller index – a tool that could track the relative changes in the price of real estate over time.
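The core of a repeat-sales comparison can be sketched in a few lines: take homes that sold twice and average their price changes. This is a deliberately simplified two-period version, not the regression-based method behind the published index; the function name and the sample sales are assumptions.

```python
import math


def repeat_sales_change(pairs: list[tuple[float, float]]) -> float:
    """pairs: (earlier_price, later_price) for the same home.

    Returns the geometric mean of later/earlier price ratios, so 1.10
    means prices rose about 10% between the two sale dates.
    """
    if not pairs:
        raise ValueError("at least one repeat sale is required")
    log_ratios = [math.log(later / earlier) for earlier, later in pairs]
    return math.exp(sum(log_ratios) / len(log_ratios))


# Made-up repeat sales of three homes.
sales = [(100_000.0, 110_000.0), (200_000.0, 220_000.0), (300_000.0, 330_000.0)]
change = repeat_sales_change(sales)
```

Comparing each home only with itself keeps differences in size, location and quality from being mistaken for market-wide price changes.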

In 1989, they produced the first empirical paper on housing bubbles. Analysis of big data, they demonstrated, could be used for the greater good. Shiller went on to predict the stock market bubble of 2000 and to issue early warnings about the Great Recession. In 2013, he too won the Nobel Prize in Economics.

The World Goes Online

Then things got really fast.

When the world came online in the late 20th century, a new economy sprang up overnight. The exchange of financial information increased exponentially. E-commerce companies grew like weeds. Investors heard the siren call of Silicon Valley. In 1999, there were 457 IPOs, most of which were technology-related.

  • 1995: Security First Network Bank, the first Internet bank in the world, was born.
  • 1998: PayPal launched its service for transferring payments through the Internet.
  • 2000: The bubble reached its limit on March 10. The NASDAQ peaked at 5132.52 in intraday trading and closed at 5048.62.

The Internet also changed how the financial industry conducted business. In the first decade of the 21st century:

  • Investors from every corner of the planet could watch the leaps and plunges of the market unfold in real-time.
  • Thanks to the widespread availability of market data, financial education tools and expert commentary, all users had the ability to educate themselves about the industry.
  • Bank accounts, brokerages, investment management, insurance, credit cards, securities, futures – all these and more made a steady migration to online settings.
  • Social media began to supply companies with an unfiltered view of consumer opinion.
  • With the arrival of mobile devices, finance took to the streets, providing 24/7 access for every participant.

Last updated: January 2021