Opportunities in Government Data Science
The Promise of Big Data
Mobile devices and smart sensors, cloud-based storage, social media and Internet traffic – all these developments and more are creating new opportunities in big data analytics.
Government data scientists are needed to:
- Prevent waste, fraud and abuse
- Combat cyber-attacks and safeguard sensitive information
- Use business intelligence to make better financial decisions
- Improve defense systems and protect soldiers on the ground
To do this, they sort through a mix of information management, storage and security systems. They collect, process and analyze a large amount of data from a variety of sources.
History of Data Science and Government
“I think the government is awakening to the idea that data science can provide models that have great utility for a variety of missions.” – Robert Hummel, at the 3rd Annual Government Big Data Forum
Governments have forever been poking their noses into the lives of their citizens:
- In A.D. 2, China’s Han Dynasty proudly recorded a population of 59.6 million.
- After overrunning the Anglo-Saxons, William the Conqueror commissioned a comprehensive look at his new territory in the Domesday Book (A.D. 1086).
But until the 19th century, collecting and recording government data was manual and extremely labor-intensive. It would take a minor revolution to shake things up.
The Father of Automatic Computation
In the late 1880s, Herman Hollerith submitted his Ph.D. thesis, An Electric Tabulating System, at New York’s Columbia University.
Hollerith’s invention – an electromechanical tabulating machine that employed electrical circuits to count and sort punch cards – was used to complete the 1890 U.S. census. A task that had been predicted to take more than ten years took less than three. The U.S. government saved $5 million.
Hollerith capitalized on his success by forming the Tabulating Machine Company, one of four companies that merged and eventually became International Business Machines Corporation (IBM).
During the Great Depression, the U.S. government contracted IBM to keep employment records on 26 million working Americans and 3 million employers.
Victory or Death
When World War II arrived, the Western powers threw their might behind data intelligence projects:
- 1942: Cryptography requires big data calculations. To help break Nazi spy codes, engineers at Britain’s famous Bletchley Park invented a series of mass data-processing machines. The first programmable electronic computer – Colossus – was able to read paper tape at 5,000 characters per second.
- 1943: Work on the top-secret Project PX began at the University of Pennsylvania. The Electronic Numerical Integrator And Computer (ENIAC) – a thirty-ton behemoth that covered 1,800 square feet – was originally created to compute artillery ballistic tables.
- 1945-1946: Mathematicians working at Los Alamos used ENIAC to run calculations for the hydrogen bomb.
War work had significant post-war applications. By the end of March 1950, ENIAC was able to generate the first models to forecast the weather.
The Russians Are Coming!
During the Cold War, the government desperately wanted to know what the Soviets were up to. And to do that, it needed data.
Formed in 1952, the National Security Agency (NSA) was a successor to the Armed Forces Security Agency. Its mission was to monitor, collect, decode and analyze foreign intelligence and counterintelligence data.
Giant supercomputers like the IBM 7950 (Harvest), a customized version of IBM’s Stretch, were installed to handle the flood.
The Privacy Act of 1974
SAS, one of the world’s largest advanced analytics companies, began its life as a university research project for the U.S. Department of Agriculture. Its purpose was to analyze crop data and make recommendations to increase output.
But researchers were tired of rounding up information from scattered sources. In 1965 and 1966, the federal government looked into the possibility of a national data center that could centrally store information collected by various statistical agencies.
The outcry from citizens was loud and long. There were shouts of Orwell’s 1984. Congress stepped in, holding a series of sessions to discuss the effect of computerized databases on individuals’ privacy rights. Bills were presented. On New Year’s Eve, President Ford signed the Privacy Act of 1974 into law.
One Second vs. 30,000 Years
As the days of free love gave way to the wrath of punk, the government continued to gobble up data for research and defense. It was helped along in its work by the development of a new channel of communication.
The Internet, which had its roots in a Defense Advanced Research Projects Agency (DARPA) project called ARPANET, quickly became entrenched in universities and government research institutions.
In 1985, the U.S. National Science Foundation established the National Science Foundation Network (NSFNET), a hub that connected five supercomputing centers to the National Center for Atmospheric Research.
Then Tim Berners-Lee came up with the idea of setting information free in a World Wide Web. This democratic concept quickly took hold in the collective imagination. By the mid-1990s, there were hundreds, thousands, millions of new data streams crisscrossing the globe.
Which, of course, the government wanted to track.
The Day the Towers Fell
In the wake of 9/11, the Defense Department – which had already been experimenting with large-scale data mining and analysis – stepped up its efforts.
In 2002, DARPA started to develop the Total Information Awareness System. This project would use biometrics, language processing, predictive modeling and database technologies to analyze government data sets. It would examine everything from communications to medical and travel records to identify suspicious individuals. Though the project was shuttered in 2003, many aspects of it migrated to other agencies.
The government was also quickly realizing that counterterrorism agencies needed – as a preschool teacher might put it politely – to learn how to share. In 2004, the 9/11 Commission suggested a “network-based information sharing system that transcends traditional government boundaries.”
Data for All
Agencies were aided in these efforts by an escalating interest in the potential of big data applications – from e-commerce to search engines to science. The lines between commerce, research and government became increasingly blurred.
- 2004: The CIA’s not-for-profit venture arm, In-Q-Tel, helped fund a new company called Palantir. Building on technology developed at PayPal to detect fraudulent activity, Palantir provided the Pentagon and CIA with sophisticated terrorism analysis software.
- 2009: As part of his Open Government Initiative, President Obama launched data.gov, increasing public and corporate access to thousands of data sets generated by the federal government.
- 2012: U.S. Secretary of State Hillary Clinton announced Data2X, a public-private partnership to collect statistics on the economic, political and social status of women and girls around the world.
Taking the Initiative
The government made its commitment to big data clear in March 2012 when it announced the $200 million Big Data Research and Development Initiative. This wake-up call detailed the need for each agency to have a big data strategy and to improve its analytic tools and techniques.
- The Department of Defense began looking at autonomous systems that could learn from experience, maneuver and make decisions on their own.
- DARPA started the XDATA program, which aims to develop computational techniques and software tools for processing and visualizing imperfect and incomplete data in order to achieve greater battlefield awareness for many types of personnel, whether in planning or on missions.
- The Department of Energy established the Scalable Data Management, Analysis, and Visualization (SDAV) Institute, an effort to unite the expertise of six national laboratories and seven universities on the department’s supercomputers.
Show Me the Money
Though $200 million may seem like small change in government circles, it can yield big results. According to McKinsey, the government could gain as much as a trillion dollars with the use of data analytics.
But you don’t have to be IBM to get a glimpse of government data. As we’ve seen, Obama’s administration has been particularly interested in encouraging data research. As part of the first U.S. National Action Plan (September 2011), the government made over 390,000 agency data sets available for public consumption.
Healthy Data, Healthy Body
Breakthroughs in the delivery of care create large amounts of data. Recent health care reforms and ongoing changes to regulations may have further increased the workload. Thanks to HealthData.gov, there is a plethora of data available to the public for use in research, policy making, and business.
Through a Prism Darkly
One particularly controversial 21st century federal data initiative concerned the National Security Agency:
In June 2013, the Guardian and the Washington Post broke a story on a top-secret program called Prism. According to information contained in an NSA PowerPoint presentation supplied by whistleblower Edward Snowden, the NSA had direct access to data in the systems of Google, Apple, Microsoft, Skype and other giant Internet companies.
Examples of these data include:
- Video and voice chat
- File transfers
- Social networking details
Prism allowed for extensive, in-depth monitoring of live communications and stored information, including data from overseas. Privacy advocates described it as a big step toward a police state.
In response, the U.S. government denied the charge that Prism could be used on domestic targets without a warrant. Officials also noted that the program received independent oversight from all three branches of the federal government.
Revelations about other surveillance and big data security projects – Boundless Informant, Bullrun, the British black-ops surveillance program Tempora – continued throughout the summer and fall of 2013.
Data Risks and Regulations
The End of Privacy?
Which brings us back to President Ford’s 1974 Privacy Act. For although the law asserts that agencies must follow certain principles – “fair information practices” – when gathering and handling personal data, it also allows law enforcement agencies to exempt themselves from the Act.
What’s more, the 2001 USA PATRIOT Act significantly increased the government’s powers of surveillance and investigation. Sweeping amendments were made to the:
- Wiretap Statute
- Electronic Communications Privacy Act
- Computer Fraud and Abuse Act
- Foreign Intelligence Surveillance Act
- Family Educational Rights and Privacy Act
- Pen Register and Trap and Trace Statute
- Money Laundering Act
- Immigration and Nationality Act
- Money Laundering Control Act
- Bank Secrecy Act
- Right to Financial Privacy Act
- Fair Credit Reporting Act
These amendments included changes to voice mail communications, secret searches, surveillance orders, search warrants and a host of other law enforcement tools.
Amidst the debate about the NSA, the federal government is also responsible for legislation such as the:
- Health Insurance Portability and Accountability Act (HIPAA)
- Equal Credit Opportunity Act (ECOA)
- Telecommunications Act of 1996
Many of these laws are explicitly concerned with protecting the privacy of U.S. citizens and preventing businesses from misusing personal information. Industries including retail and manufacturing are already testing the limits on the uses of individual data for predictive and behavioral analytics.
As data gets bigger and boundaries grow blurrier, arguments in Congress may become much louder.
Last updated: June 2020