Computer vision is advancing in leaps and bounds

Microsoft's Photosynth was initially touted as a consumer technology - and then slated for its servers' inability to cope with the curious hordes who rushed to take a look when it was unveiled recently - but there's a lot more to it than just a clever online photo album.

Most commentators have focused on Photosynth's ability to stitch several photos of the same scene together into a 3D model of that place - and yes, that really is impressive. However, the technology (which is based on graphics software developed by Seattle-area start-up Seadragon, itself bought by Microsoft in 2006) is only part of the story.

I recently had the chance to chew over the concepts behind Photosynth with Paul Foster, one of Microsoft's 'technology evangelists', and was a bit surprised at first when he didn't immediately hype the 3D aspect.

The bigger story is organising photos in context, he explained, with the composite 3D image simply being a good way to view that.

Sure, it's extremely clever how the 3D image is deduced from the 2D images, and then used as a framework upon which we can hang and review those images.

Want to zoom in on a feature? The synth software knows which high-resolution images cover the area you're currently looking at in low-res, and the Seadragon software knows how to smoothly transition between the two.
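As a rough illustration of the first half of that idea, picking out the high-resolution photos that cover the current view can be reduced to a rectangle-overlap test. The sketch below is a toy under stated assumptions, not Photosynth's actual method: the `Photo` record, its flat 2D `bounds` in a shared scene coordinate system and the `photos_covering` helper are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, top, right, bottom) in a shared, hypothetical 2D scene coordinate system
Rect = Tuple[float, float, float, float]


@dataclass
class Photo:
    name: str
    bounds: Rect             # part of the scene this photo covers (assumed known)
    pixels_per_unit: float   # resolution: image pixels per scene unit


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles, 0.0 if they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def photos_covering(view: Rect, photos: List[Photo],
                    min_fraction: float = 0.5) -> List[Photo]:
    """Return photos covering at least min_fraction of the view, sharpest first."""
    view_area = (view[2] - view[0]) * (view[3] - view[1])
    if view_area <= 0:
        return []
    hits = [p for p in photos
            if overlap_area(view, p.bounds) / view_area >= min_fraction]
    return sorted(hits, key=lambda p: p.pixels_per_unit, reverse=True)


photos = [
    Photo("wide_shot", (0, 0, 100, 100), 10),
    Photo("doorway_closeup", (40, 40, 60, 60), 200),
    Photo("roof_detail", (0, 0, 20, 20), 150),
]
view = (45, 45, 55, 55)   # the area the user has zoomed in on
best = photos_covering(view, photos)
print([p.name for p in best])   # ['doorway_closeup', 'wide_shot']
```

A real viewer would work against the 3D model rather than flat rectangles, but the principle of ranking overlapping photos by resolution is the same.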

It's also an online photo-sharing service, so it can bring in pictures of the same scene taken by other people to add both coverage and depth. And it can extrapolate 3D images - albeit fuzzy ones for now - of an object or scene from directions where no 2D photo exists.

Perhaps more significant for business, though, is that it works by analysing pictures for textures, not just patterns. This makes it easier to spot the same subject in two different photos - and you can do more with that than merely building 3D images.
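To give a flavour of what spotting the same subject in two photos can involve, here is a minimal sketch of nearest-neighbour descriptor matching with a ratio test, a common technique in computer vision. It is offered as an assumption rather than as a description of how Photosynth works: the `match_features` function, the three-number "texture descriptors" and the 0.75 ratio are all made up for illustration, and real systems use far longer descriptors extracted from the images themselves.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]


def match_features(desc_a: List[Descriptor], desc_b: List[Descriptor],
                   ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Pair each descriptor in photo A with its nearest neighbour in photo B.

    A pair is kept only if the best candidate is clearly closer than the
    second best (the ratio test), which discards ambiguous matches.
    """
    matches: List[Tuple[int, int]] = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, a in enumerate(desc_a):
        ranked = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        (best_d, best_j), (second_d, _) = ranked[0], ranked[1]
        if best_d < ratio * second_d:
            matches.append((i, best_j))
    return matches


# Hypothetical descriptors for features found in two photos of the same scene
photo_a = [(0.9, 0.1, 0.3), (0.2, 0.8, 0.5)]
photo_b = [(0.21, 0.79, 0.52), (0.88, 0.12, 0.31), (0.5, 0.5, 0.5)]
result = match_features(photo_a, photo_b)
print(result)   # [(0, 1), (1, 0)]
```

Enough confident matches between two photos suggests they show the same subject, which is the starting point both for building a 3D model and for recognition tasks.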

It could be used to identify a person or building, for instance - that's got enormous applicability, from security onwards. Photosynth can also group photos by time, to show ergonomists how people move around an event, say.
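Grouping by time is the simplest of these ideas to sketch. The example below is a hedged illustration, not Photosynth's method: it assumes each photo carries a capture timestamp (from its EXIF data, say), and the `group_by_time` helper, the file names and the ten-minute gap are invented for the example. Photos are sorted by time, and a new group starts whenever the gap since the previous photo exceeds the threshold.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

TimedPhoto = Tuple[str, datetime]   # (file name, capture time)


def group_by_time(photos: List[TimedPhoto],
                  max_gap: timedelta = timedelta(minutes=10)
                  ) -> List[List[TimedPhoto]]:
    """Split photos into groups separated by gaps longer than max_gap."""
    groups: List[List[TimedPhoto]] = []
    for photo in sorted(photos, key=lambda p: p[1]):
        # Compare with the capture time of the last photo in the latest group
        if groups and photo[1] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(photo)
        else:
            groups.append([photo])
    return groups


shots = [
    ("entrance.jpg", datetime(2008, 9, 1, 10, 0)),
    ("stage.jpg", datetime(2008, 9, 1, 11, 30)),
    ("foyer.jpg", datetime(2008, 9, 1, 10, 4)),
    ("crowd.jpg", datetime(2008, 9, 1, 11, 35)),
]
groups = group_by_time(shots)
for group in groups:
    print([name for name, _ in group])
# ['entrance.jpg', 'foyer.jpg']
# ['stage.jpg', 'crowd.jpg']
```

Combined with where each photo was taken, time-ordered groups like these would give a crude picture of how a crowd moves through a venue.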

There are still issues, of course. Texture recognition can have difficulty with highly reflective surfaces or all-white buildings, and while refined algorithms mean that a synth now takes minutes rather than hours, it's still not something you can do on a PC. Instead, the local program processes each new photo using your GPU, then uploads its calculations to the online service, where the synth is done.

But it's a clear harbinger of something very interesting, with huge potential for the future.

Indeed, while his job title says 'evangelist', Paul Foster clearly isn't overstating it when he adds: "We are at the very beginning of the sense of vision as a more powerful element for computers."

This story, "Computer vision is advancing in leaps and bounds" was originally published by


Copyright © 2008 IDG Communications, Inc.
