Library of Congress' useless Twitter archive is almost complete

The U.S. Library of Congress’ slow compilation of Twitter’s public tweet archive is, according to the organization, moving along well: the first objectives laid out for the archiving project have finally been achieved. There’s just one problem, however: how do you actually share all of those tweets now that you have them?

In a blog post on the Library of Congress’ site, Gayle Osterberg, the Library’s Director of Communications, wrote that work began in earnest as soon as the Library and Twitter signed their April 2010 agreement to archive every public tweet dating back to the service’s launch in 2006. “The Library’s first objectives were to acquire and preserve the 2006-10 archive,” Osterberg wrote, “to establish a secure, sustainable process for receiving and preserving a daily, ongoing stream of tweets through the present day; and to create a structure for organizing the entire archive by date.”

As of this month, she went on to explain, those initial goals have been met. “We now have an archive of approximately 170 billion tweets and growing,” she stated in the blog post, adding that “the volume of tweets the Library receives each day has grown from 140 million beginning in February 2011 to nearly half a billion tweets each day as of October 2012.” With that part of the project dealt with, it’s time to turn to a potentially more problematic task, one Osterberg politely describes as “addressing the significant technology challenges in making the archive accessible to researchers in a comprehensive, useful way.” In other words, the Library now has to turn the collection into an archive that serves a real purpose, rather than a permanent record that is, for all intents and purposes, unavailable to anyone outside the Library itself.

In a five-page report updating progress on the project, the Library notes that it has already received more than 400 requests for access to the archive but has yet to approve any of them. The reason is that, at present, even a single search of the fixed 2006-2010 archive (the batch Twitter handed over before it began supplying the ongoing daily feed) can take up to a full day, something the Library describes as “an inadequate situation in which to begin offering access to researchers.”

“It is clear that technology to allow for scholarship access to large data sets is not nearly as advanced as the technology for creating and distributing that data,” the report continues, pointing out that “even the private sector has not yet implemented cost-effective commercial solutions because of the complexity and resource requirements of such a task.” While it searches for a workable long-term solution, the Library promises that it will “develop a basic level of access that can be implemented” for the archive. It also plans to consult with outside experts to try to build something permanent that can handle the interest in the archive.

Hopefully no one’s waiting with bated breath to check out what pop culture ephemera we were all talking about seven years ago, because there’s clearly a long way to go.
