Friday, May 9, 2014

White House Big Data and Privacy Report: Wake Up Call for Geospatial Community?

On May 1, the White House released a report: "Big Data and Privacy: A Technological Perspective". The report was prepared by the President's Council of Advisors on Science and Technology (PCAST), a group of leading scientists and engineers who make policy recommendations to the President on important issues. The President had asked PCAST to prepare a report on the privacy implications of Big Data.

The Report

The report provides a detailed analysis of the privacy risks associated with Big Data. Many of these risks have been well documented. However, this report also goes into detail on the risks associated with what it refers to as "born analog" data. Born-analog data is defined in the report as arising “from the characteristics of the physical world.” The data becomes accessible electronically when it “impinges upon a ‘sensor’, an engineered device that observes physical effects and converts them to digital form”. As this sensor is frequently associated with a physical location, much of this data is geospatial and has been collected, processed, visualized, analyzed, stored and distributed for years by the geospatial community without any privacy considerations.

Undoubtedly, some will question the need for such measures, as they represent a change from the way geospatial datasets have been handled in the past. However, it is important for the geospatial community to recognize that privacy concerns have evolved, due in part to the rapid technological advancements that the community helped create. Other sectors that collect and use data, such as finance, medicine and education, are already required to take these steps. As geospatial technology moves into the mainstream, and the number and variety of commercial uses grow, geospatial companies can expect to become subject to similar requirements. The alternative could be much worse.

However, that is beginning to change. According to the report, the privacy concerns associated with born-analog datasets are that they "likely contain more information than the minimum necessary for their immediate purpose." Data minimization – collecting the minimum required to perform the task at hand – is one of the tenets of privacy protection around the world. While the report acknowledges that there are a number of technological and business reasons for this to occur, the authors suggest that there are inherent privacy risks with such an approach. For example, “[a] consequence is that born-analog data will often contain information that was not originally expected. Unexpected information could in many cases lead to unanticipated beneficial products and services, but it could also give opportunities for unanticipated misuse.” (p. 23) Whether a use constitutes an unanticipated benefit or an unanticipated misuse often depends upon your point of view.

Many in the geospatial community have believed they are immune from the privacy discussions because the technology they use is not capable of “identifying” a specific individual. For example, satellite and most aerial images are not of sufficient quality to identify an individual’s face or read a license plate. However, privacy risks have evolved. For example, the report cites the increased power of data fusion in connection with born-analog data and states that the risks are not simply in “identifying” an individual but also in developing correlations and creating profiles.

“Data fusion occurs when data from different sources are brought into contact and new facts emerge (See section 3.2.2). Individually, each data source may have a specific limited purpose. Their combination, however, may uncover new meanings. In particular, data fusion can result in the identification of individual people, the creation of profiles of an individual and the tracking of an individual’s activities. More broadly, data analytics discovers patterns and correlations in large corpuses of data, using increasingly powerful statistical algorithms. If those data include personal data, the inferences flowing from data analytics may then be mapped back to inferences, both certain and uncertain about individuals” (p. x)

The report then goes on to describe various types of technologies that create born-analog data that contains “personal information”. The geospatial community relies on many of these for its products and services, including (i) video from . . . overhead drones; (ii) imaging infrared video; and (iii) synthetic aperture radar (SAR). (p. 22) The report also identifies privacy risks associated with LiDAR, acknowledging that while LiDAR is important to governments, industry and a broad range of academic disciplines, “[s]cene extraction is an example of inadvertent capture of personal information and can be used for data fusion to reveal personal information.” (p. 27) In addition, the report cites the privacy risks associated with “precise geolocation in imagery from satellites and drones”. (p. 28)

The report makes several recommendations to the President. The most relevant to the geospatial community are:
- Policy attention should focus more on the actual uses of big data and less on its collection and analysis;
- Policies and regulation, at all levels of government, should not embed particular technological solutions, but rather should be stated in terms of intended outcomes; and,
- The United States should take the lead both in the international arena and at home by adopting policies that stimulate the use of practical privacy-protecting technologies that exist today. It can exhibit leadership both by its convening power (for instance, by promoting the creation and adoption of standards) and also by its own procurement practices (such as its own use of privacy-preserving cloud services).

What Does the Report Mean for the Geospatial Community?

It is unlikely that the White House report will result in any laws being passed in this session of Congress that will specifically address privacy risks associated with born-analog data. However, the report has reframed the discussion on privacy in a way that will have a direct impact on the geospatial community. For example, suppliers of geospatial data products and services to the federal government soon may be required to certify that they are taking proper steps to protect any personal information acquired from born-analog data. The geospatial community also should expect that regulators, such as the Federal Trade Commission and, with respect to UAVs, the Federal Aviation Administration, will begin citing the findings of this report in future discussions on policies and regulations. Lawyers will also likely cite the report to influence court decisions on matters regarding privacy concerns associated with geospatial data.

As a result, organizations that collect, use, store and/or distribute geospatial data should consider taking a number of steps. These include:
- Conducting an inventory of their born-analog data to identify potential privacy risks;
- Developing privacy policies (external) and privacy statements (internal) with respect to born-analog datasets that do (or could) contain personal information;
- Incorporating explicit language requiring compliance with privacy laws and regulations in their vendor and customer agreements; and
- Training employees who work with born-analog data on privacy and internal procedures.

In addition, if geospatial datasets are deemed by law to contain “personal information”, there may be additional obligations imposed upon geospatial organizations. For example, they may be required to implement specific information security measures, such as encryption, when the data is transferred or stored. Geospatial organizations may also become subject to state data breach laws, which detail specific steps to be taken if networks are hacked, or certain data is lost or stolen.
