PDFs are handy for displaying articles and books in a well-designed format. But for data analysis? Not so much. Yet there are times when data you'd like to analyze is only available in a table within a PDF -- especially frustrating since odds are that data began in a much friendlier database or spreadsheet format.
Enter Tabula, a free, open-source tool designed for "liberating data tables locked inside PDF files." It was created by several journalists with the support of a number of organizations including Knight-Mozilla OpenNews, the New York Times and La Nación DATA.
To use it, download the software from the project website. It runs locally in your browser and requires a Java Runtime Environment compatible with Java 6 or 7. Import a PDF and then select the area of a table you want to turn into usable data. You'll have the option of downloading the result as a comma- or tab-separated file as well as copying it to your clipboard.
You'll also be able to look at the data it captures before you save it, which I'd highly recommend. It can be easy to miss a column and especially a row when making a selection.
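If you'd like a second check after saving, a few lines of Python can flag rows whose column count doesn't match the rest of the table, which is a common sign of a missed or merged cell. This is only a rough sketch using Python's standard csv module, not part of Tabula; the `check_table` function name and the sample data are made up for illustration.

```python
import csv
import io
from collections import Counter


def check_table(text, delimiter=","):
    """Return (row_count, expected_columns, ragged_rows) for exported table text.

    ragged_rows lists (row_number, column_count) for every non-blank row
    whose width differs from the most common width in the table.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    if not rows:
        return 0, 0, []
    # Treat the most common row width as the "expected" number of columns.
    expected = Counter(len(r) for r in rows).most_common(1)[0][0]
    ragged = [(i, len(r)) for i, r in enumerate(rows, start=1) if len(r) != expected]
    return len(rows), expected, ragged


# Hypothetical export in which the third row lost its last column.
sample = "state,year,total\nOhio,2012,41\nIowa,2012\nUtah,2012,17\n"
print(check_table(sample))  # (4, 3, [(3, 2)])
```

For a tab-separated download, pass `delimiter="\t"`. To check a saved file, read its contents into a string first and pass that in. A clean result doesn't prove the capture is complete (a whole missing row won't show up here), so it's still worth comparing the row count against the original PDF.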
A 30-second video produced by the Tabula project shows more of how it works on a Windows system. There are also versions available for OS X and Linux.
Note that Tabula is only designed for PDFs that were created from electronic text; it is not OCR software and won't work with scanned images. Its creators also caution that it works best on simple table formats, not those where some rows or columns span multiple cells.
Looking for other tools? Check out my chart of 30+ free tools for data visualization and analysis.