Are social media e-discovery's next nightmare?

In many ways, the emergence of social media is a replay of the e-discovery challenges posed by email in the 1990s. This time, though, we have that precedent to learn from.

People who have been involved in the challenges of e-discovery for a while remember when email arrived on the scene nearly two decades ago. It changed the way people collaborate and left companies with mounds of digital information that was costly and time-consuming to sort through when litigation struck.

The arrival of social media is in many ways a repeat of those challenges. As was true of email, social media comes with new metadata and formats. But because of the similarities, there is an opportunity to avoid the mistakes made with email. One thing is clear: Companies that dive into social media without the right policies and solutions to govern usage will encounter information governance and e-discovery nightmares down the road.

With email, companies could plead ignorance about the e-discovery issues that arose. The digital revolution was new and case law and civil procedure rules were still in flux. With email as a precedent, however, companies cannot hide behind ignorance in the case of social media. Instead, they can get ahead of social media by putting in place governance policies, processes and tools to ensure that the email history lesson informs these new methods of collaboration.

Social media have seen widespread adoption. To avoid the mistakes made in the email era, companies must figure out how best to collect and preserve social-media content in the event it is needed for e-discovery. Today, this practice is extremely immature.

Recently, the eDJ Group conducted a survey on "The Cloud and eDiscovery" that looked at the experiences e-discovery professionals have had with collecting and preserving information from cloud-based sources such as Amazon, Rackspace and social-media publishers. At most, only 15% of the respondents indicated that they have had to collect from a popular social-media service. But that figure will surely rise.

Technological methods for collection and preservation

When it comes to the collection and preservation of social-media content, companies have several choices of technological methods, each with distinct pros and cons.

Web crawling

A Web crawler is a computer program that periodically browses the Web (in this case, a social-media URL). The crawler creates a copy of the page to be stored for processing into a preservation repository. Companies can set up Web crawlers to capture content from social-media sites at various intervals. Most of these systems store social-media content as static Web pages. However, Web crawling does not necessarily create a forensic capture of a Web page in its full context and therefore may not be sufficient in certain types of cases.
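As a rough illustration of the capture step described above, the following Python sketch fetches a page once and wraps the raw bytes in a record with a timestamp and a content hash. The names Snapshot, make_snapshot, crawl_once and changed are hypothetical, not part of any product mentioned here, and the sketch stores only the static page, so it has the same limits on forensic context noted above.

```python
import hashlib
import urllib.request
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """One point-in-time copy of a page (hypothetical record layout)."""
    url: str
    captured_at: str  # ISO 8601 UTC timestamp of the capture
    sha256: str       # digest of the raw bytes, for later integrity checks
    body: bytes       # the page exactly as the server returned it


def make_snapshot(url: str, body: bytes, now: Optional[datetime] = None) -> Snapshot:
    """Wrap already-fetched bytes in a record with a timestamp and hash."""
    now = now or datetime.now(timezone.utc)
    return Snapshot(url, now.isoformat(), hashlib.sha256(body).hexdigest(), body)


def crawl_once(url: str, timeout: float = 10.0) -> Snapshot:
    """Fetch one page and snapshot it; a scheduler would call this at each interval."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return make_snapshot(url, resp.read())


def changed(previous: Snapshot, current: Snapshot) -> bool:
    """True when the page content differs between two captures."""
    return previous.sha256 != current.sha256
```

Comparing hashes between runs lets a crawler keep a new copy only when the page has actually changed, which helps limit the volume sent to the preservation repository.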


Screenshots

Companies can set up programs that will essentially take a screenshot, or screen capture, of a Web page and then store that image as a record of the page at that point in time. In most cases, the image will be converted to a PDF (or similar) file so that it can be indexed and searched within a preservation repository. A screenshot, though, is not a full capture of the information in a Web page. It lacks metadata and other context that may be important depending on the matter.

Publisher application programming interfaces

The major social-media publishers have APIs that third parties can write to in order to enable collection directly from the publisher. By writing to an API, it is possible to capture all of the data and metadata that the publisher makes available -- for example, a Facebook page -- and then map that data back into a preservation repository. A major consideration with the API method is bandwidth; social-media sites create massive volumes of content. In 2011, content aggregator Gnip estimated that Twitter created 35MB per second of sustained network traffic. That is a lot of content to ingest. It is wise to use third-party applications that connect to the social-media publishers directly.
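To put the Gnip estimate in perspective, this short Python calculation converts a sustained rate in megabytes per second into a daily volume. The function name daily_volume_tb is illustrative, and the conversion assumes decimal units (1 TB = 1,000,000 MB).

```python
MB_PER_SECOND = 35               # Gnip's 2011 estimate for Twitter, quoted above
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds


def daily_volume_tb(mb_per_second: float) -> float:
    """Convert a sustained rate in MB/s to terabytes per day (decimal units)."""
    return mb_per_second * SECONDS_PER_DAY / 1_000_000


print(f"{daily_volume_tb(MB_PER_SECOND):.2f} TB per day")  # prints "3.02 TB per day"
```

At that rate, a collector ingesting the full stream would take in roughly 3 TB a day, which is why narrowing collection to specific accounts or pages matters.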

There are many ways to execute an API collection approach. Many third-party vendors build connectors to social-media publishers and then provide applications that allow customers to collect and preserve content as needed. One approach is to have employees authorize an application that sits on the social-media site and have that application monitor and collect all information. This can be done automatically at the company firewall and gives the company an opportunity to restate policies and capture login information with the informed consent of users. This practice has user privacy implications that counsel should evaluate carefully, especially for global corporations with users or customers located in foreign countries with strong privacy protections.

Proxy method

In the context of collecting and preserving social media, a proxy approach is one where a company requires employees to interface with social media through a proxy server so that interactions can be monitored and captured.
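One decision a capturing proxy has to make is which requests count as social-media traffic. The Python sketch below shows one way that check might look. The domain list and the function name is_monitored are assumptions for illustration, not a description of any particular proxy product.

```python
from urllib.parse import urlsplit

# Illustrative list only; a real deployment would maintain its own.
MONITORED_DOMAINS = {"facebook.com", "twitter.com", "linkedin.com"}


def is_monitored(url: str) -> bool:
    """True when the URL's host is a monitored domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in MONITORED_DOMAINS)
```

Matching on the exact domain or a dot-prefixed suffix keeps lookalike hosts such as notfacebook.com from being captured by mistake.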

The most comprehensive approach to social-media collection and preservation would combine the API and proxy methods. Doing so would ensure complete capture of all of a user's social-media content. But this approach is probably overkill for any but the most highly regulated organizations (and even then, it will only be a small subset of employees in regulated companies that need to be monitored so closely).

Publisher-specific methods

It is worth noting that social-media publishers themselves make some collection possible, for example with Twitter's "public follow" and Facebook's "download your information." But these methods have limitations and aren't suitable for many cases. Twitter's public-follow feature enables access to all the past Tweets of a specified user and any new Tweets in real time without generating a formal "follow" request, but with a limit of 3,200 past Tweets. And the feature works only if the user allows Tweets to be public.

Developing clear policies on social media

Another lesson from the past: When email archiving first started, companies archived all emails, typically through journaling. That led to bloated archives that broke down and became more expensive than they were worth.

Companies need to get ahead of the social-media curve and have granular policies on what social media to collect, how to preserve it and how long to archive it. There is no need to simply keep everything; the right policies and defensible execution of those policies will manage the risks that social media can pose.
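A granular policy of this kind can be written down as data and applied consistently. The Python sketch below is one hypothetical way to express a per-source retention period and test whether a captured item is past it. The names RetentionRule and is_expired, the 365-day example and the legal-hold flag are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    """How long content from one source is kept (hypothetical policy record)."""
    source: str
    retain_days: int


def is_expired(captured_on: date, rule: RetentionRule, today: date,
               on_legal_hold: bool = False) -> bool:
    """True when an item is past its retention period and not held for a matter."""
    if on_legal_hold:
        return False  # content preserved for an ongoing matter is never expired
    return today > captured_on + timedelta(days=rule.retain_days)


# Example: keep Twitter captures for one year.
twitter_rule = RetentionRule(source="twitter", retain_days=365)
```

Applying the same rule to every item, and recording when it was applied, is part of what makes the execution of a policy defensible.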

It is wise to start by creating clear policies that state whether employees should expect to be monitored when using social media on company property (that is, when using company-issued devices or the company's network). Importantly, the policy should clearly delineate acceptable use of social media for business purposes and state whether employees can use social media for personal reasons. This social-media governance effort should be driven by legal and compliance profiles.

Clearly, regulated companies have more reason to aggressively monitor, collect and preserve social-media content. Even in highly regulated companies, however, the rules of storing electronic communications apply to only a subset of employees. Especially in these early days of social media, companies must be careful about over-collecting and running into storage problems down the road.

They also need to determine which content capture mechanisms are right for them. Companies must determine what constitutes a reasonable effort to collect and preserve social media, which employees to select when a historical point-in-time view is required, and when to capture additions, changes or deletions going forward to support an ongoing matter. And finally, companies need to plan for more and more forms of social media going forward. While companies can focus near-term efforts on the major publishers (Facebook, LinkedIn, Twitter), over time there will be more and more sources of social media for companies to govern. For example, the number of users of social networks like Google+ and Tumblr continues to increase. More and more companies share video content via YouTube. And just imagine the rich evidence that a site like Foursquare (which allows users to check in with their geographic location) could provide.

In short, the relevance of social media to the e-discovery process is obvious. It would be a mistake not to start thinking about how to address the challenges that the emergence of social media poses.

Barry Murphy is an analyst at the eDJ Group.
