Diffbot, the "Visual Learning Robot," Helps Web Content Go Mobile
Diffbot is one of those applications (and companies) you probably are not even aware of when you use it, but that's not necessarily a problem for the company's co-founder and CEO Michael Tung.
That's because his product is a "visual learning robot" that hundreds of developers are using to translate web content into better mobile apps, and as such it stays pretty much under the hood.
"We've invented this visual ID algorithm," says Tung. "One of our core insights is that the entire web can be classified down to 30 page types. There are product pages, event pages, news pages -- we can identify them visually with 99.999 accuracy."
Diffbot technology identifies each page's components, such as nav bars, footers, etc., as part of its identification process. Because of shared design standards, pages of the same type tend to look very much alike.
One customer using Diffbot at present is AOL's recently launched Editions, which is a personalized daily magazine for the tablet.
"Editions uses thousands of news sites," says Tung. "They send the URLs to our servers, and our technology analyzes all of that content and extracts the headlines, bylines, and other key elements and assigns to a topic. When Diffbot analyzes the front page of Huffington Post, for example, it can identify what the top story is because it recognizes the features that define the top story there, and it knows what the topic is, also."
He continues, "It's not just a black box. It leverages all that human knowledge and the work of all those news editors out there."
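For readers who build apps, here is a minimal sketch of what a developer might do with the kind of output Tung describes: a headline, a byline, and a topic for each URL. The record type, its field names, and the sample entries are all hypothetical placeholders for illustration. They are not Diffbot's actual response format, and the sketch makes no network calls.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ExtractedArticle:
    # Hypothetical record: field names are illustrative, not Diffbot's schema.
    url: str
    headline: str
    byline: str
    topic: str


def group_by_topic(articles):
    """Group extracted articles into sections keyed by topic, keeping input order."""
    sections = defaultdict(list)
    for article in articles:
        sections[article.topic].append(article)
    return dict(sections)


# Placeholder data standing in for whatever an extraction service returns.
sample = [
    ExtractedArticle("https://example.com/a", "Placeholder headline A", "Writer One", "politics"),
    ExtractedArticle("https://example.com/b", "Placeholder headline B", "Writer Two", "sports"),
    ExtractedArticle("https://example.com/c", "Placeholder headline C", "Writer Three", "politics"),
]

for topic, items in group_by_topic(sample).items():
    print(topic, [item.headline for item in items])
```

A personalized magazine app could then lay out one section per topic, which is roughly the step that structured extraction spares a developer from doing by hand for every news site.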
A key factor driving Diffbot's growth is the problem mobile devices like the iPad have in interfacing with web content. "It's kind of a crappy experience," as Tung puts it. "And this is a huge opportunity for us. Because we extract that web data and make it easy for developers to use it in creating mobile apps."
For me, one of Diffbot's most surprising features is that it works not only in English but in some 250 languages, "because we leverage Wikipedia" (which publishes in all those languages). "The ontology structure of data on Wikipedia allows us to use it as a training set," Tung says, which lets Diffbot visually analyze a web page in any language by recognizing each page's tell-tale visual components.
The company, with a staff of five, is based in Palo Alto and was incubated by Stanford's StartX. Tung and his co-founder, Leith Abdulla, are Stanford grads. The company charges customers like AOL a licensing fee and a usage fee, and Tung says the startup is already profitable.
under Tech + Gadgets, aol, David Weir, diffbot, editions, leith abdulla, michael tung, Stanford, startx, Wikipedia