Diffbot, the "Visual Learning Robot," Helps Web Content Go Mobile

Sep 01, 2011

Diffbot is one of those applications (and companies) you probably don't know you're using, but that's not necessarily a problem for the company's co-founder and CEO, Michael Tung.

That's because his product is a "visual learning robot" that hundreds of developers are using to translate web content into better mobile apps, so it stays pretty much under the hood.

"We've invented this visual ID algorithm," says Tung. "One of our core insights is that the entire web can be classified down to 30 page types. There are product pages, event pages, news pages -- we can identify them visually with 99.999 percent accuracy."

Diffbot's technology picks out each page's components, such as nav bars and footers, as part of its identification process. Design standards are consistent enough that pages of the same type look very much alike.

One customer using Diffbot at present is AOL's recently launched Editions, which is a personalized daily magazine for the tablet.

"Editions uses thousands of news sites," says Tung. "They send the URLs to our servers, and our technology analyzes all of that content and extracts the headlines, bylines, and other key elements and assigns them to a topic. When Diffbot analyzes the front page of Huffington Post, for example, it can identify what the top story is because it recognizes the features that define the top story there, and it knows what the topic is, also."

He continues, "It's not just a black box. It leverages all that human knowledge and the work of all those news editors out there."

A key factor driving Diffbot's growth is the problem mobile devices like the iPad have in interfacing with web content. "It's kind of a crappy experience," as Tung puts it. "And this is a huge opportunity for us. Because we extract that web data and make it easy for developers to use it in creating mobile apps."

For me, one of Diffbot's most surprising features is that it works not only in English but in some 250 languages, "because we leverage Wikipedia" (which publishes in all those languages). "The ontology structure of data on Wikipedia allows us to use it as a training set," which lets Diffbot visually analyze a web page in any language by recognizing each page's telltale visual components.

The company, with a staff of five, is based in Palo Alto and was incubated by Stanford's StartX. Tung and his co-founder, Leith Abdulla, are Stanford grads. The company charges customers like AOL a licensing fee and a usage fee, and Tung says the startup is already profitable.
