CDT files brief urging privacy safeguards for Google Books
CDT has filed an amicus brief in the Google Books copyright lawsuit, asking the judge to approve the proposed settlement in the case, but also to ensure that reader privacy is protected as Google implements the expanded services envisioned in the settlement. The settlement would dramatically transform Google Books from an index and finding aid into a resource for users to browse, preview, and purchase full-text access to the millions of books Google has scanned from libraries around the world. Such increased access to knowledge will certainly be of great benefit to scholarship and education, but CDT and several other information privacy advocates have identified serious potential threats to reader privacy and intellectual freedom, values that libraries have taken seriously and strongly protected for many years.
As a general matter, CDT supports approval of the settlement because the array of services described will greatly benefit the public by enabling unprecedented digital access to millions of books. Google will gain the right to offer expanded previews of books returned in search results ("Previews"); to sell online access to books in their entirety ("Consumer Purchases"); to sell subscriptions to the entire database of books for institutions ("Institutional Subscriptions"); and to offer free access to the entire database via terminals in public libraries ("Public Access Service"). In exchange, Google will provide ongoing compensation to rightsholders, in addition to a one-time payment for books already scanned. These payments to copyright holders will be coordinated and distributed by a newly established "Books Rights Registry." These valuable new services, and the accompanying compensation to rightsholders, would not have been possible even if Google had won a fair-use victory in the lawsuit.
However, these services also give rise to serious privacy risks. Managing and differentiating among the services will require extensive data collection on Google's part, including the collection of sensitive personal information about which books people are reading. In addition, Google will need to share some information about usage of the services with the Registry so that it can administer payments to rightsholders. Yet the settlement agreement does little to describe what specific information will be collected, how it will be secured, and what limits will be placed on how Google may use and share it. Because readers' privacy rights were not a central issue in the copyright dispute being settled, it is understandable that these issues were not considered in the agreement. Nevertheless, they need to be addressed.
CDT "friend of the court" brief
CDT report on Google Book Service (July 2009)
As both a legal and a policy matter, readers have long enjoyed a high level of anonymity and privacy with respect to their reading habits. The First Amendment arguably protects the right to receive information anonymously. Accordingly, the American Library Association Code of Ethics has a longstanding commitment to intellectual freedom and patron privacy, stating, "We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired, or transmitted." Forty-eight states have reinforced individuals' right to privacy with respect to what they read by enacting statutes that either expressly protect these records or exempt them from public disclosure rules. With respect to book purchases, courts have likewise generally been reluctant to compel bookstore owners to reveal information about their customers' purchases.
In this context, the settlement represents a sea change with respect to the treatment of material that has historically been highly protected, and presents a unique opportunity to ensure that strong protections remain in place as 'the library' moves online. Google is in many ways taking on the role of the public library as a gateway to information, only on a much larger and more comprehensive scale. By hosting the scans of the books and closely managing user access, Google will have the capability to collect data about individual users' searches, preview pages visited, books purchased, and even time spent reading particular pages. Whereas in the offline world such data collection is either impossible or widely distributed among libraries and bookstores, Google will hold a massive centralized repository of books and of information about how people access and read books online.
To comply with the terms of the settlement, Google will collect detailed user information to authenticate users, to differentiate among the services offered, to calculate payments to rightsholders, and to prevent fraud and unauthorized access to the scanned books. (For a fuller discussion of the data collection the settlement necessitates, see CDT's July report.) Additionally, Google will have to share data with the Registry. Some collection and sharing is of course necessary to effectuate the settlement, but while the agreement does state that Google cannot be forced to disclose "confidential or personally identifiable information except as compelled by law or valid legal process" in the case of a security breach, it does not place any limitations on voluntary disclosure by Google.
More generally, the agreement also does not address Google's collection, use, retention, and sharing of user data outside a few narrow contexts. The fact that Google offers dozens of information services and maintains rich profiles on users' use of Google's services only compounds the privacy risk. In the absence of binding limits on what Google can do with the data it collects about readers through GBS, Google would remain free to combine that data with other data that Google collects, adding a rich and personal dimension to user profiles and making them more attractive for a variety of uses, from marketing to litigation.
Google's increasingly comprehensive stores of user data will likely be a tempting information source for government surveillance as well. Law enforcement has already shown considerable interest in search engine data and other kinds of records showing Internet usage. As Google's data collection capability grows, it is imperative that its thresholds for disclosure to the government be adequate to ensure user privacy and due process.
With such potential for data collection and surveillance, formal privacy safeguards are necessary to ensure that readers maintain the privacy they have traditionally enjoyed, thus preserving the right to read anonymously. To this end, CDT has recommended, both in our report and in our brief to the court, specific protections based on Fair Information Practices that Google should abide by as it implements the services outlined in the Settlement. If adopted, the recommendations would ensure that Google limit its collection and retention of data to that which is necessary to effectuate the terms of the settlement; guarantee that Google share only aggregate and non-identifiable data with the Registry; set a high standard for disclosure of user data to third parties, including law enforcement; and provide users adequate notice and control over what information is collected and stored.
The Google Books settlement represents a sweeping change to the publishing industry and the role of libraries, and as such deserves the scrutiny it has received in the ten months since it was announced. We strongly believe that the settlement should be approved, and that privacy must be a part of that approval.
But Google will not be alone in the market for electronic books, nor is the company alone in providing electronic access to information more generally. Viewed in this broader context, Google Book Search is just the latest example of how innovative online approaches to communication and information dissemination can strain and reshape existing privacy laws and norms.
Because of the enormous scale and great potential value of the Google Books project - and since it will be operating with the blessing of a court - CDT feels it is crucial that longstanding privacy norms be carried over into the new services. The settlement presents a unique opportunity to ensure that the important public interest in reader privacy is built in from the start. The safeguards we recommend, however, should not be unique to Google, and we would recommend similar practices to any provider of similar online information services.
Consequently, CDT continues to work towards and support general legislation establishing technology-neutral baseline consumer privacy protections. One of the key aspects of our argument for privacy safeguards in Google Books is the ongoing enforcement oversight of the judge presiding over the case. At the same time, however, the legislative nature of many of the issues implicated in this settlement - copyright licensing, orphan works, consumer privacy, antitrust - suggests that the right place to address these questions is in Congress. Rather than working on a one-off basis as new and potentially disruptive projects emerge, well-written consumer-focused baseline privacy legislation would ensure that privacy is protected no matter where users go online or with whom they communicate. With Google Books in mind as the most recent and highest-profile example of the need to protect privacy online, CDT will continue to advocate for such legislation.