Every four years the Chaos Computer Club (CCC) in Germany holds a Chaos Communication Camp, which is so cool that it inspired the American-flavored hacker camp ToorCamp. As always, there was a plethora of excellent talks, but today we are looking at one that covered hypothetical methods for obtaining knowledge locked behind academic paywalls for free...without getting arrested.
At Chaos Communication Camp 2015 in Germany, Storm Harding, a “researcher investigating the intersection of piracy and privacy,” presented Jumping the Paywall, or “how to freely share research without being arrested.” Harding is “deeply committed to contesting the notions of Intellectual Properties in all their nefarious manifestations (including copyleft).”
Jumping the Paywall explores “how to procure access to all-too-often restricted content sequestered behind extortionate academic paywalls, and how to then safely freely disseminate said content without being apprehended.” The talk, which included a waiver of liability, covered potential ways to get around paywalls as well as:
- Content access procurement (how to secure free access to knowledge from a variety of sources).
- Operational security during field deployment (how to stay safe when procuring content from physical sources such as libraries).
- Content defanging (how to remove three problematic types of shackles from digital content: content protection, metadata, and watermarks).
- Content distribution (how to distribute the now-defanged content safely).
Just the mention of fighting a paywall brought Aaron Swartz to mind; he successfully scraped 2.7 million documents from the paywalled PACER (Public Access to Court Electronic Records) system. The paywall, of course, makes “public access” something of an oxymoron, since the public has to pay for it.
Swartz’s later mass download of articles from JSTOR (Journal Storage), a digital library of academic journals and books, led to federal hacking charges. After his suicide, it also prompted the proposed Aaron’s Law, which aimed to fix vague language in the CFAA (Computer Fraud and Abuse Act). If you still haven’t seen The Internet’s Own Boy: The Story of Aaron Swartz, you should watch it with all haste. Harding, of course, mentioned Swartz and his Guerilla Open Access Manifesto.
There have long been loopholes for getting around paywalls that hide quality news content, but to this day there are huge disagreements about whether access to science journals and other academic research should be free. A few days ago, the academic publishing industry was likened to a “gigantic web of avarice and selfishness.” Knowledge is definitely not always free; instead, exploitative pricing schemes stand between readers and the knowledge within academic papers.
Hypothetical ways to procure academic papers for free
Let’s follow Harding down the rabbit hole of theoretical ways to jump paywalls and procure academic papers for free. Harding listed rules to always follow, such as “always be piratin’, or why you should always steal books from the library.” He advised the CCC audience to “never use sources which require identification,” since a journal you scan and post online can be traced back to the record of what you borrowed from the library.
When you are hungry for knowledge, alternatives to consider include Library Genesis, which held 38 million articles spanning 28 TB as of March 2015, and Sci-Hub, which uses EDU proxies to access academic articles. Harding also suggested checking crowd-sourced channels such as Reddit Scholar or #ICanHazPDF. “Obvious” alternatives included Google Scholar and checking the authors’ personal or work pages for email addresses or links to their articles.
Another tip is to use “open access terminals” at universities instead of logging in with your own credentials. Harding’s fun fact involved escalating privileges: “You can sometimes turn the library catalog computers into open access terminals by hitting the Windows key, then right-clicking on ‘show desktop’ and clicking the web browser icon.”
As a last resort, a person could use a Wi-Fi hotspot that has EDU access. Trying this method requires excellent operational security: never go back to the same source, use multiple “non-obvious exits,” and do not create any record of your presence in “the enemy compound.”
Steps to take before sharing procured academic papers
Before sharing the documents you obtained, you need to take “content defanging” steps, meaning “to remove all the poison venomous publishers inject into articles.” The presentation described three main types of “bad things” that must be removed before sharing: 1) Content protection, which is “utilized for restrictions on content disassembly;” 2) watermarks, which are “utilized for traitor tracing;” and 3) metadata, which is also “utilized for traitor tracing.”
Content protection is meant to prevent the document from being printed, edited, copied, and sometimes even read. Before sharing, a wise person might use a program like Advanced PDF Password Recovery Pro or some “other” brute-force PDF password cracker to remove such protection.
The presentation suggested this would work for basic content protection, but not for every type. For example, some documents are protected by Adobe’s “LiveCycle Rights Management,” which must be removed or the PDF will expire after a certain number of days, so a person might want to follow this removal guide to spoof the server and remove the content protection.
Watermark excision: Briss was suggested as a way to crop pages to remove watermarks, but “Briss performs what is called a non-destructive crop,” meaning the watermark is hidden by the new page boundaries but not deleted, so a forensic expert could still retrieve it. The next step, therefore, is to use PDF Creator to print the pages with new margins that exclude the cropped-out watermark. Slide 29 describes other potential watermarks and how to neutralize them.
Metadata spoofing: Metadata for a PDF might include the author’s name, a timestamp when the document was created, time zone and “the ever-mysterious UUID” (Universally Unique Identifier). Harding said, “Adobe has a built-in metadata scrub tool, but don’t you trust it!” Adobe’s checkbox to “discard document information and metadata” was referred to as a “lie.”
After running Adobe’s tool, open the PDF in some flavor of hex editor to “modify the timestamps and the Document/InstanceID UUID fields.” Harding delved into additional complexity, but “if the aim isn’t spoofing but simple removal, automated tools” such as the Metadata Anonymization Toolkit “are readily available.”
Time to share the knowledge!
Whether or not you agree with Harding about jumping paywalls to score academic documents, his presentation seemed fascinating. Harding said, “Stay safe. Eat the publishers before they eat us. Remember...we are at war.”