Opera to Web developers: Come to MAMA

New search engine helps figure out which Web technologies are most popular

Opera Software ASA today revealed a search engine that indexes structural information about Web pages so Web developers and standards bodies can see what technologies are being used to build Web sites and how they are being used.

The Metadata Analysis and Mining Application search engine -- MAMA for short -- is being tested by the Oslo, Norway-based company and should be released in an invitation-only beta by the end of the year, said Snorre Grimsby, Opera's vice president of quality assurance.

MAMA grew out of tests that Opera routinely conducts to make sure its own browser products work well with existing Web pages built with the most common Web site creation technologies, he said.

"We realized internally that we needed to be able to find lots of live sites out there that used certain technologies in certain combinations so we could test our browser on them," Grimsby said.

The resulting search engine crawls the Web, but unlike most search engines it doesn't index the content of Web sites. Instead, it discards the content and indexes the types of technologies being used on sites, such as Cascading Style Sheets (CSS), HTML, Extensible HTML (XHTML) and the like, Grimsby said.

This information is helpful for Web developers, who can use MAMA to identify sites that are using certain kinds of technology and see how other developers have implemented it, he said.

"It's a known fact that Web developers borrow ideas from each other," Grimsby said. If developers building a Web application need, for example, a new menu system, they can use MAMA to find live sites that already use the technology they are considering and get ideas for their own implementation.

Developers also can use MAMA to see how well sites conform to current World Wide Web Consortium (W3C) specifications for commonly used Web standards, such as CSS, HTML and others. The W3C oversees the creation and maintenance of specs for many of the most prevalent Web site development technologies.

Grimsby said that in its own use of MAMA, Opera found that the average Web page has 47 discrepancies between the way it uses W3C-maintained technologies and what the W3C specifications themselves require.

MAMA could also help the W3C and other standards bodies set priorities for developing specifications. For example, if a technology is used a certain way on the majority of Web sites, or not used very much at all, the W3C "can change the spec or take something out of the spec," Grimsby said.

During an interview today, Grimsby demonstrated MAMA in real time, using it to crawl an International Data Group Web page to find out what technologies the site used.

According to the search engine, the site runs Version 2.2.8 of the Apache Web server on a 32-bit Windows server, has 56 hyperlinks and uses XHTML 1.0 and CSS, he said.

Over the next eight weeks, Opera expects to publish a series of articles on its developer Web site about its own internal use of MAMA, noting key findings, statistics and trends the search engine discovers, he said.

By the end of the year, the company will invite key people within standards bodies to test the search engine, with a goal of releasing it publicly to developers sometime in the first or second quarter of next year, Grimsby said.

Copyright © 2008 IDG Communications, Inc.
