Usenix: Dartmouth updating diff, grep Unix tools

Dartmouth researchers are working on variants of diff and grep that can parse more complex data structures

By Joab Jackson
December 8, 2011 05:40 AM ET

IDG News Service - With some funding from Google and the U.S. Energy Department, a pair of computer scientists at Dartmouth College are updating the venerable grep and diff Unix command line utilities to handle more complex types of data.

Such updates are needed because "we now tend to have more model-based configuration languages that have meaningful constructs spanning more than one line," said Gabriel Weaver, a Dartmouth graduate student who, along with Dartmouth computer science professor Sean Smith, is creating the variants of grep and diff. Weaver presented the new utilities at a poster session at the Usenix Large Installation System Administration (LISA) conference, being held this week in Boston.

The new programs will allow administrators to extract meaningful data from configuration files, log files and other sources of operational data, the researchers maintain.

Grep and diff are command line-based text analysis tools available in all Linux and Unix distributions. Both are designed to parse documents on a line-by-line basis. Grep offers the ability to search through multiple text files and folders for a specific chunk of text or regular expression. Diff compares two documents and highlights the differences between them.

As with most Unix utilities, the output from either of these programs can be linked, or piped, to other utilities, so they can be incorporated into scripts that automate routine system administration tasks.

The new programs, called Context-Free Grep and Hierarchical Diff, will provide the ability to parse blocks of data rather than single lines. For each new type of data structure, a vendor would provide a pattern library identifying the basic structure of the data, which the software would then use to "extract the constructs of interest from the document," Weaver said.

Such utilities could give administrators the ability to work with the more complex forms of data now being generated by network equipment and infrastructure software. For instance, Cisco's IOS (Internetwork Operating System), the company's operating system for its routers and switches, provides operational data in block-like data structures.
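To make the idea of searching by block rather than by line concrete, here is a minimal Python sketch. It is not the researchers' Context-Free Grep, whose code has not been published; the function names `split_blocks` and `block_grep` and the sample configuration are invented for illustration, and the block rule (an unindented line opens a block, indented lines belong to it) is an assumption about IOS-style layout rather than a vendor pattern library.

```python
def split_blocks(text):
    """Group lines into blocks: an unindented line starts a block,
    and the indented lines that follow belong to that block."""
    blocks = []
    for line in text.splitlines():
        if not line.strip() or line.strip() == "!":
            continue  # skip blank lines and IOS-style "!" separators
        if line[0].isspace() and blocks:
            blocks[-1].append(line)
        else:
            blocks.append([line])
    return blocks


def block_grep(text, needle):
    """Return every whole block in which any line contains needle."""
    return [b for b in split_blocks(text) if any(needle in ln for ln in b)]


# Hypothetical sample configuration, invented for this example.
CONFIG = """\
interface GigabitEthernet0/1
 description uplink
 ip address 10.0.0.1 255.255.255.0
!
interface GigabitEthernet0/2
 description lab
 shutdown
!
"""

for block in block_grep(CONFIG, "shutdown"):
    print("\n".join(block))
```

A line-based grep for "shutdown" would return only the matching line, with no indication of which interface it belongs to; the sketch returns the whole enclosing block.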

With this data, a tool such as diff "can be too low-level," Weaver said. "Diff doesn't really pay attention to the structure of the language you are trying to tell differences between." He has seen cases where diff reports that 10 changes have been made to a file, when in fact only two changes have been made, and the remaining data has simply been shifted around.
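The effect Weaver describes can be reproduced with Python's standard `difflib` module. The sketch below is an illustration only, not the Hierarchical Diff prototype: the sample data and the helpers `line_changes`, `to_blocks` and `block_changes` are hypothetical, and the block-aware comparison assumes that each unindented line names a block and that block order does not matter.

```python
import difflib

# Two versions of a file holding the same two blocks in a different order.
OLD = ["block A", " a1", " a2", "block B", " b1", " b2"]
NEW = ["block B", " b1", " b2", "block A", " a1", " a2"]


def line_changes(old, new):
    """Lines a line-based diff marks as added or removed."""
    return [l for l in difflib.ndiff(old, new) if l.startswith(("- ", "+ "))]


def to_blocks(lines):
    """Map each unindented header line to the indented lines under it."""
    blocks, header = {}, None
    for line in lines:
        if line.startswith(" ") and header is not None:
            blocks[header].append(line)
        else:
            header = line
            blocks[header] = []
    return blocks


def block_changes(old, new):
    """Headers of blocks that were added, removed or edited, ignoring order."""
    a, b = to_blocks(old), to_blocks(new)
    return sorted(h for h in a.keys() | b.keys() if a.get(h) != b.get(h))


print(len(line_changes(OLD, NEW)), "changed lines reported by the line diff")
print(block_changes(OLD, NEW), "changed blocks reported by the block comparison")
```

The line-based comparison flags the moved lines as changes, while the block-aware comparison reports that no block actually changed.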

Grep has issues with data blocks as well. "With regular expressions, you don't really have the ability to extract things that are nested arbitrarily deep," Weaver said.
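The nesting problem can be seen in a short Python sketch. The brace-delimited sample text and the helper `extract_block` are invented for illustration and do not come from the researchers' tools; the helper simply counts brace depth, which is one simple way to follow nesting that a pattern such as the non-greedy regular expression below does not.

```python
import re

# Hypothetical configuration text with one level of nesting inside "zone a".
SAMPLE = "zone a { pool { range 1; } option x; } zone b { option y; }"


def extract_block(text, name):
    """Return the full `name { ... }` block, however deeply it nests."""
    start = text.find(name + " {")
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # braces never balanced


# A non-greedy regular expression stops at the first closing brace,
# cutting the block short.
regex_result = re.search(r"zone a \{.*?\}", SAMPLE).group(0)

print(regex_result)
print(extract_block(SAMPLE, "zone a"))
```

The regular expression returns a truncated block ending at the inner closing brace, while the depth counter returns the complete "zone a" block. A greedy pattern would fail in the other direction, running on into "zone b".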

Context-Free Grep is still in the design stage, but should be completed within the next few months. A prototype of Hierarchical Diff has been completed, though the researchers have not posted the code yet.

Reprinted with permission from IDG.net. Story copyright 2012 International Data Group. All rights reserved.