Data Science in the Government

November 4, 2013

Opportunities in Government Data Science

The Promise of Big Data

Mobile devices and smart sensors, cloud-based storage, social media and Internet traffic – all these developments and more are creating new opportunities in big data analytics.

Government data scientists are needed to:

Prevent waste, fraud and abuse
Combat cyber-attacks and safeguard sensitive information
Use business intelligence to make better financial decisions
Improve defense systems and protect soldiers on the ground

To do this, they sort through a mix of information management, storage and security systems. They collect, process and analyze a large amount of data from a variety of sources.

History of Data Science and Government

“I think the government is awakening to the idea that data science can provide models that have great utility for a variety of missions.” – Robert Hummel, at the 3rd Annual Government Big Data Forum

Governments have forever been poking their noses into the lives of their citizens:

In A.D. 2, China’s Han Dynasty proudly recorded a population of 59.6 million.
After overrunning the Anglo-Saxons, William the Conqueror commissioned a comprehensive look at his new territory in the Domesday Book (A.D. 1086).

But until the 19th century, collecting and recording government data was manual and extremely labor-intensive. It would take a minor revolution to shake things up.

The Father of Automatic Computation

In the late 1880s, Herman Hollerith submitted his Ph.D. thesis, An Electric Tabulating System, at New York’s Columbia University.

Hollerith’s invention – an electromechanical tabulating machine that employed electrical circuits to count and sort punch cards – was used to complete the 1890 U.S. census. A task that had been predicted to take more than ten years took less than three. The U.S. government saved $5 million.

Hollerith capitalized on his success by forming the Tabulating Machine Company, one of four companies that merged and eventually became International Business Machines Corporation (IBM).

During the Great Depression, the U.S. government contracted IBM to keep employment records on 26 million working Americans and 3 million employers.

Victory or Death

When World War II arrived, the Western powers threw their might behind data intelligence projects:

1942: Cryptography requires big data calculations. To help break Nazi spy codes, engineers at Britain’s famous Bletchley Park invented a series of mass data-processing machines. The first programmable electronic computer – Colossus – was able to read paper tape at 5,000 characters per second.
1943: Work on the top-secret Project PX began at the University of Pennsylvania. The Electronic Numerical Integrator And Computer (ENIAC) – a thirty-ton behemoth that covered 1,800 square feet – was originally created to compute artillery ballistic tables.
1945-1946: Mathematicians working at Los Alamos used ENIAC to run calculations for the hydrogen bomb.

War work had significant post-war applications. By the end of March 1950, ENIAC was able to generate the first models to forecast the weather.

The Russians Are Coming!

During the Cold War, the government desperately wanted to know what the Soviets were up to. And to do that, it needed data.

Formed in 1952, the National Security Agency (NSA) was a successor to the Armed Forces Security Agency. Its mission was to monitor, collect, decode and analyze foreign intelligence and counterintelligence data.

Giant supercomputers like the IBM 7950 (Harvest), a customized version of IBM’s Stretch, were installed to handle the flood.

The Privacy Act of 1974

SAS, one of the world’s largest advanced analytics companies, began its life as a university research project for the U.S. Department of Agriculture. Its purpose was to analyze crop data and make recommendations to increase output.

But researchers were tired of rounding up information from scattered sources. In 1965 and 1966, the federal government looked at the possibility of a national data center that could centrally store information collected by various statistical agencies.

The outcry from citizens was loud and long. There were shouts of Orwell’s 1984. Congress stepped in, holding a series of sessions to discuss the effect of computerized databases on individuals’ privacy rights. Bills were presented. On New Year’s Eve, President Ford signed the Privacy Act of 1974 into law.

One Second vs. 30,000 Years

As the days of free love gave way to the wrath of punk, the government continued to gobble up data for research and defense. It was helped along in its work by the development of a new channel of communication.

The Internet, which had its roots in a Defense Advanced Research Projects Agency (DARPA) project called ARPANET, quickly became entrenched in universities and government research institutions.

In 1985, the U.S. National Science Foundation established the National Science Foundation Network (NSFNET), a hub that connected five supercomputing centers to the National Center for Atmospheric Research.

Then Tim Berners-Lee came up with the idea of setting information free in a World Wide Web. This democratic concept quickly took hold in the collective imagination. By the mid-1990s, there were hundreds, thousands, millions of new data streams crisscrossing the globe.

Which, of course, the government wanted to track:

“We are developing a supercomputer that will do more calculating in a second than a person with a hand-held calculator can do in 30,000 years,” President Clinton boasted in 1996.

The Day the Towers Fell

In the wake of 9/11, the Defense Department – which had already been experimenting with large-scale data mining and analysis – stepped up its efforts.

In 2002, DARPA started to develop the Total Information Awareness System. This project would use biometrics, language processing, predictive modeling and database technologies to analyze government data sets. It would examine everything from communications to medical and travel records to identify suspicious individuals. Though the project was shuttered in 2003, many aspects of it migrated to other agencies.

The government was also quickly realizing that counterterrorism agencies needed – as a preschool teacher might put it politely – to learn how to share. In 2004, the 9/11 Commission suggested a “network-based information sharing system that transcends traditional government boundaries.”

Data for All

Agencies were aided in these efforts by an escalating interest in the potential of big data applications – from e-commerce to search engines to science. The lines between commerce, research and government became increasingly blurred.

2004: The CIA’s not-for-profit venture arm, In-Q-Tel, backed a new company called Palantir. Building on technology developed at PayPal to detect fraudulent activity, Palantir provided the Pentagon and CIA with sophisticated terrorism analysis software.
2009: As part of his Open Government Initiative, President Obama launched data.gov, increasing public and corporate access to thousands of data sets generated by the federal government.
2012: U.S. Secretary of State Hillary Clinton announced Data2X, a public-private partnership to collect statistics on the economic, political and social status of women and girls around the world.

Taking the Initiative

The government signaled how seriously it took big data in March 2012, when it announced the $200 million Big Data Research and Development Initiative. This wake-up call detailed the need for each agency to have a big data strategy and to improve its analytic tools and techniques.

The Department of Defense began looking at autonomous systems that could learn from experience, maneuver and make decisions on their own.
DARPA started the XDATA program, which aims to develop computational techniques and software tools for processing and visualizing imperfect and incomplete data in order to achieve greater battlefield awareness for many types of personnel, whether in planning or on missions.
The Department of Energy established the Scalable Data Management, Analysis, and Visualization (SDAV) Institute, an effort to unite the expertise of six national laboratories and seven universities on the department’s supercomputers.

Show Me the Money

Though $200 million may seem like small change in government circles, it can yield big results. According to McKinsey, the government could gain as much as a trillion dollars with the use of data analytics.

Open Source

But you don’t have to be IBM to get a glimpse of government data. As we’ve seen, Obama’s administration has been particularly interested in encouraging data research. As part of the first U.S. National Action Plan (September 2011), the government made over 390,000 agency data sets available for public consumption.

Healthy Data, Healthy Body

Breakthroughs in the delivery of care create large amounts of data. Recent health care reforms and ongoing changes to regulations may have further increased the workload. Thanks to HealthData.gov, there is a plethora of data available to the public for use in research, policy making, and business.

Through a Prism Darkly

One particularly controversial 21st-century federal data initiative concerned the National Security Agency:

In June 2013, the Guardian and the Washington Post broke a story on a top-secret program called Prism. According to an NSA PowerPoint presentation supplied by whistleblower Edward Snowden, the NSA had direct access to data in the systems of Google, Apple, Microsoft, Skype and other giant Internet companies.

Examples of these data include:

Email
Video and voice chat
File transfers
Voice-over-IP
Videos
Photos
Social networking details

And more.

Prism allowed for extensive, in-depth monitoring of live communications and stored information, including data from overseas. Privacy advocates described it as a big step toward a police state.

In response, the U.S. government denied the charge that Prism could be used on domestic targets without a warrant. Officials also noted that the program received independent oversight from all three branches of the federal government.

Revelations about other surveillance and big data security projects – Boundless Informant, Bullrun, the British black-ops surveillance program Tempora – continued throughout the summer and fall of 2013.

Data Risks and Regulations

The End of Privacy?

Which brings us back to President Ford’s 1974 Privacy Act. For although the law asserts that agencies must follow certain principles – “fair information practices” – when gathering and handling personal data, it also allows law enforcement agencies to exempt themselves from the Act.

What’s more, the 2001 USA PATRIOT Act significantly increased the government’s powers of surveillance and investigation. Sweeping amendments were made to the:

Wiretap Statute
Electronic Communications Privacy Act
Computer Fraud and Abuse Act
Foreign Intelligence Surveillance Act
Family Educational Rights and Privacy Act
Pen Register and Trap and Trace Statute
Money Laundering Act
Immigration and Nationality Act
Money Laundering Control Act
Bank Secrecy Act
Right to Financial Privacy Act
Fair Credit Reporting Act

These amendments included changes to voice mail communications, secret searches, surveillance orders, search warrants and a host of other law enforcement tools.

Amid the debate about the NSA, the federal government is also responsible for a body of legislation governing the use of personal data.

Many of these laws are explicitly concerned with protecting the privacy of U.S. citizens and preventing businesses from misusing personal information. Industries including retail and manufacturing are already testing the limits on the uses of individual data for predictive and behavioral analytics.

As data gets bigger and boundaries grow blurrier, arguments in Congress may become much louder.

Last updated: June 2020
