Usenix: Dartmouth updating diff, grep Unix tools
Dartmouth researchers are working on variants of diff and grep that can parse more complex data structures
IDG News Service - With some funding from Google and the U.S. Energy Department, a pair of computer scientists at Dartmouth College are updating the venerable grep and diff Unix command-line utilities to handle more complex types of data.
Such updates are needed because "we now tend to have more model-based configuration languages that have meaningful constructs spanning more than one line," said Gabriel Weaver, a Dartmouth graduate student who, along with Dartmouth computer science professor Sean Smith, is creating the variants of grep and diff. Weaver presented the new utilities at a poster session at the Usenix Large Installation System Administration (LISA) conference, being held this week in Boston.
The new programs will allow administrators to extract meaningful data from configuration files, log files and other sources of operational data, the researchers maintain.
Grep and diff are command line-based text analysis tools available in all Linux and Unix distributions. Both are designed to parse documents on a line-by-line basis. Grep offers the ability to search through multiple text files and folders for a specific chunk of text or regular expression. Diff compares two documents and highlights the differences between them.
As with most Unix utilities, the output from either of these programs can be linked, or piped, to other utilities, so they can be incorporated into scripts that automate routine system administration tasks.
The new programs, called Context-Free Grep and Hierarchical Diff, will provide the ability to parse blocks of data rather than single lines. For each new type of data structure, a vendor would provide a pattern library identifying the basic structure of the data, which the software would then use to "extract the constructs of interest from the document," Weaver said.
Such utilities could provide administrators the ability to work with more complex forms of data now being generated by network equipment and infrastructure software. For instance, Cisco's IOS (Internetwork Operating System), which is the company's operating system for its routers and switches, will provide operational data in block-like data structures.
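To make the idea of block-oriented searching concrete, here is a minimal Python sketch. It is not the researchers' code and does not use their pattern libraries. It assumes a simplified IOS-style layout in which an unindented line opens a block and the indented lines after it belong to that block. The sample configuration and the function names `split_blocks` and `find_blocks` are invented for illustration.

```python
# Illustrative sketch only; not the Dartmouth tools. The sample config is invented.
SAMPLE_CONFIG = """\
hostname edge-router
interface GigabitEthernet0/0
 description uplink
 ip address 10.0.0.1 255.255.255.0
interface GigabitEthernet0/1
 description lab
 shutdown
"""

def split_blocks(text):
    """Group each unindented line with the indented lines that follow it."""
    blocks = []
    for line in text.splitlines():
        if not line.strip() or line.strip() == "!":
            continue  # skip blank lines and "!" separator lines
        if line[0].isspace() and blocks:
            blocks[-1].append(line)  # indented: belongs to the current block
        else:
            blocks.append([line])    # unindented: starts a new block
    return blocks

def find_blocks(text, needle):
    """Return every whole block in which some line contains `needle`."""
    return ["\n".join(block)
            for block in split_blocks(text)
            if any(needle in line for line in block)]

if __name__ == "__main__":
    for block in find_blocks(SAMPLE_CONFIG, "shutdown"):
        print(block)
```

A line-based search for "shutdown" would return only the matching line. The sketch returns the whole interface block, so the administrator can see which interface is shut down.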
With this data, a tool such as diff "can be too low-level," Weaver said. "Diff doesn't really pay attention to the structure of the language you are trying to tell differences between." He has seen cases where diff reports that 10 changes have been made to a file, when in fact only two changes have been made, and the remaining data has simply been shifted around.
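The effect Weaver describes can be reproduced with a small Python sketch. It uses the standard `difflib` module as a stand-in for a line-based diff, and a hypothetical block-level comparison that treats each block as an unordered unit. That last point is an assumption made for illustration: in some configuration languages the order of blocks is meaningful. The two sample files are invented, and this is not Hierarchical Diff itself.

```python
import difflib

# Invented sample data: the same two blocks, in a different order.
OLD = """\
interface Gi0/0
 description uplink
interface Gi0/1
 description lab
"""
NEW = """\
interface Gi0/1
 description lab
interface Gi0/0
 description uplink
"""

def line_changes(old, new):
    """Count the lines that a line-based diff marks as added or removed."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for entry in diff if entry.startswith(("+ ", "- ")))

def block_changes(old, new):
    """Compare blocks as unordered units and return (removed, added)."""
    def blocks(text):
        out = []
        for line in text.splitlines():
            if line[:1].isspace() and out:
                out[-1] += "\n" + line  # indented line extends the current block
            elif line.strip():
                out.append(line)        # unindented line starts a new block
        return set(out)
    before, after = blocks(old), blocks(new)
    return sorted(before - after), sorted(after - before)

if __name__ == "__main__":
    print("lines flagged by line-based diff:", line_changes(OLD, NEW))
    print("blocks removed/added:", block_changes(OLD, NEW))
```

The line-based comparison flags lines as changed even though nothing was edited, while the block-level comparison reports no removed or added blocks.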
Grep has issues with data blocks as well. "With regular expressions, you don't really have the ability to extract things that are nested arbitrarily deep," Weaver said.
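The nesting problem can be shown with a short Python sketch. Matching a construct that nests to arbitrary depth requires keeping count of how many blocks are currently open, which classical regular expressions cannot do. The brace-delimited syntax, the sample text and the function `extract_block` below are all invented for illustration. This is one simple way to track depth, not a description of how Context-Free Grep works.

```python
def extract_block(text, name):
    """Return the brace-delimited block that follows `name`, at any nesting depth."""
    start = text.find(name + " {")
    if start == -1:
        return None  # no block with that name
    depth = 0
    for i in range(text.index("{", start), len(text)):
        if text[i] == "{":
            depth += 1          # entering a nested block
        elif text[i] == "}":
            depth -= 1          # leaving a block
            if depth == 0:      # the opening brace has been matched
                return text[start:i + 1]
    return None  # braces never balanced

# Invented sample input with three levels of nesting.
SAMPLE = "zone a { host x { port 80 { tls on } } } zone b { host y { } }"

if __name__ == "__main__":
    print(extract_block(SAMPLE, "zone a"))
```

The depth counter is what lets the function stop at the brace that closes "zone a" rather than at the first closing brace it meets.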
Context-Free Grep is still in the design stage, but should be completed within the next few months. A prototype of Hierarchical Diff has been completed, though the researchers have not posted the code yet.