In March, HealthMap.org, a disease surveillance website that aggregates reports from social media, news websites, and public health officials, shared a link from a French-language news website in Guinea reporting eight deaths from a mystery illness that appeared similar to Lassa fever. Nine days later, the World Health Organization reported the start of the Ebola outbreak in Guinea.
HealthMap.org's successful identification of early Ebola cases is just one example of ways in which researchers are capitalizing on the availability of big data in the life sciences to protect public health and safety. However, such efforts can also require grappling with research obstacles and security risks, which are examined in "National and Transnational Security Implications of Big Data in the Life Sciences," a new report by AAAS, the FBI, and the UN Interregional Crime and Justice Research Institute (UNICRI).
For example, many researchers are hindered by the absence of shared terminology across datasets, which prevents them from performing data analysis. The challenge may seem simple to overcome in a small dataset, where, for example, data analytic technologies cannot easily compare data from healthcare providers who classify individuals as male/female, man/woman, or M/F. Yet terminology differences in data collection become increasingly difficult to resolve when researchers are working with a large amount of data, even with advanced computing techniques.
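The small-dataset case can be sketched in a few lines of Python. This is only an illustration of the terminology problem, not a method described in the report: the `CANONICAL_SEX` mapping, the function names, and the sample records are all hypothetical, and the sketch assumes each record stores its label under a `sex` key.

```python
# Hypothetical mapping from provider-specific labels to one shared vocabulary.
# The labels come from the example above; the mapping itself is an assumption.
CANONICAL_SEX = {
    "male": "male", "man": "male", "m": "male",
    "female": "female", "woman": "female", "f": "female",
}


def normalize_sex(label):
    """Return the shared term for a label, or None if it is not recognized."""
    return CANONICAL_SEX.get(label.strip().lower())


def harmonize(records):
    """Split records into harmonized rows and rows that need manual review."""
    harmonized, unresolved = [], []
    for record in records:
        term = normalize_sex(record["sex"])
        if term is None:
            # Unknown labels are set aside rather than guessed at.
            unresolved.append(record)
        else:
            harmonized.append({**record, "sex": term})
    return harmonized, unresolved


# Made-up records standing in for data from different providers.
records = [
    {"id": 1, "sex": "M"},
    {"id": 2, "sex": "woman"},
    {"id": 3, "sex": "Female "},
    {"id": 4, "sex": "2"},  # a coding scheme the mapping does not cover
]
harmonized, unresolved = harmonize(records)
```

A hand-built table like this works for three spellings of two terms. It also hints at why the problem grows with scale: every new provider, field, and coding scheme adds entries that someone has to identify and map, and unrecognized values such as the `"2"` above pile up for manual review.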
These and other challenges are preventing researchers from utilizing big data's full potential to address national and transnational security issues, including infectious disease surveillance, drug development, societal verification of arms control treaties, emergency preparation and response, agriculture and food security, and more, according to the report.
The AAAS Center for Science, Technology and Security Policy (CSTSP), the FBI Weapons of Mass Destruction Directorate (WMDD) Biological Countermeasures Unit (BCU), and UNICRI engaged in a year-long study on the national security implications of the use of big data in the life sciences. Specifically, AAAS, the FBI, and UNICRI aimed to identify the current state of big data and analytics, the benefits and risks of big data in the life sciences to national security, and needed solutions for addressing exploitation of system vulnerabilities or intentional use for harmful or criminal purposes.
The report noted that scientists in some fields are making headway in addressing the challenges that keep big data from being used to its full potential. For example, researchers agreed to use standard terminology for genomic data during the early phases of the Human Genome Project. In August 2014, the National Institutes of Health (NIH) issued the Genomic Data Sharing (GDS) policy, which includes the need for standardized language and applies to all NIH-funded research that generates large-scale genomic data, at all funding levels, as well as to the use of those data for subsequent research. This example may be instructive for other fields and demonstrates that scientists and government agencies can work together to ensure they are speaking the same language.
The report also analyzes three hypothetical scenarios in which big data from the life sciences could be exploited in harmful ways. In the first, a country wants to target a specific subpopulation within its borders by designing (not necessarily creating) a virus that preferentially infects a group among its citizens. In the second, a group outside the U.S. wants to prevent government and international health officials from discovering its covert scientific activities to modify a biological agent it acquired from a local laboratory. And, in the third, a non-state actor uses new advances in computational analyses of multiple types of scientific data to determine which genetic segments are found only in dangerous strains of a specific pathogen.
The analysis reveals that "gaps remain in law and technology development, which currently results in a greater reliance on institutional and individual responsibility to prevent theft, manipulation, and exploitation of Big Data in the life sciences," the report concludes. The authors recommend that scientific and security communities work together to identify risks and vulnerabilities, and subsequently develop measures to detect, deter, prevent, and respond to any security concerns associated with Big Data and its various applications in the life sciences.
The AAAS, FBI, and UNICRI report was produced by a working group composed of leaders in technology, security, law enforcement, computer and data sciences, and the life sciences representing government agencies, intergovernmental organizations, the private sector, and academia. The group was assisted by experts in cyber security, chemical and biological defense, amateur biology, and philosophy.
"The hope is to provide scientists and security experts with the tools and a process to evaluate the risks and benefits of emerging and enabling technologies, as well as solutions to minimize the risk and maximize benefit proactively," said Kavita Berger, AAAS CSTSP associate director.
The report created risk-benefit assessment frameworks and offered options for U.S. government action to capitalize on the benefits of big data in the life sciences while mitigating risks. It also offered technical, legal, institutional, and individual solutions to several of the related security risks.
"As more information (both biological and non-biological) is collected, computational analysis tools advance, and standards for data collection and sharing are developed, the potential for Big Data technologies to enhance both national security and adversary capacities is likely to increase," the report said. "Now is the time to develop tools through which scientists, policymakers, and security experts can evaluate the risks and benefits concurrently, and consider solutions to prevent or mitigate risks identified."
The private sector, academia, and governments all play important roles in supporting and conducting research and developing new data technologies, the report said. "Thoughtful consideration of the possible risks (from system vulnerabilities and intentional misuse) and benefits, qualitative assessment of the risks and benefits, and identification of existing and needed solutions are extremely important to ensure that Big Data in the life sciences is developed and applied for maximum benefit."