Mashups and Aggregation - a Boundary Line

Mashups are truly attractive. They are built from Google Maps, Flickr, Google Video and other engaging sites, and they can hold you for a long time without boring you! No surprise that they have been good internet hits.

However, as I've been working on building a mashup myself, I keep finding it hard to describe a good mashup example for a particular scenario. The confusion lies in the definition itself. What is a mashup anyway? Is it mere aggregation of huge amounts of data on the web? But then, how large should the data be? And what kind of data? Is showing Google and Yahoo search results on a single page a mashup? If not, then why not?

Mashups came into the limelight with HousingMaps.com, a site whose appeal you'll understand once you have visited its parent site, Craigslist. It was built with Google Maps as the enabler. Two large but disparate sources of information, one being geographical detail that can be presented visually and the other being housing rent listings, were “mashed up” using the address as the “key” and presented in a useful manner. And guess what? People like it.

But what about simplyhired.com? It incorporates listings from almost all the job sites in the world and presents them to you. The data sources are huge, but the data are similar in nature (not disparate), so a common key is not rare; there are many of them. But then, are such sites mashups?

IMHO, they are not. They are mere aggregation. No matter how huge the data sources are, the result is essentially a collection of data rather than a matching of data from one source with data from another. The total data displayed surely increases, but the atomic size of the data (a job description or resume in the case of simplyhired.com) does not! Take another case, kayak.com. It gives flight details from almost all the airlines in the world. Even a normal search ends up with hundreds of flights. But it merely stands as an aggregation of data from multiple sites; it cannot be called a true mashup.
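To make the "atomic size" point concrete, here is a minimal Python sketch of what I mean by aggregation. The two job boards, their records and the `aggregate` helper are all made up for illustration; no real site's API is involved.

```python
# Hypothetical listings from two job boards. Both sources use the same shape.
site_a = [{"title": "Data Engineer", "city": "Pune"}]
site_b = [
    {"title": "Web Developer", "city": "Austin"},
    {"title": "QA Analyst", "city": "Leeds"},
]


def aggregate(*sources):
    """Concatenate same-shaped records from several sources into one list."""
    combined = []
    for source in sources:
        combined.extend(source)
    return combined


jobs = aggregate(site_a, site_b)

# More records are displayed, but each record still has the same fields.
fields_per_record = {frozenset(job) for job in jobs}
```

The list gets longer (three records instead of one or two), yet every record still carries only `title` and `city`. Nothing from one source enriches a record from the other.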

The distinction really stuck in my mind after visiting answers.com, which is essentially an aggregation of definitions of a topic from multiple web dictionaries and encyclopedias. I figured that this should not be called a mashup. In fact, these cases are even tougher, because they need a well-defined ontology base to ensure that terms from different sources are compared properly. For example, in the case of kayak.com, fares from all airlines have to be comparable in one place, and so do the codes for different cities and countries. In the case of answers.com, verb definitions can’t be mixed with adjective or noun definitions. In a mashup only a few properties of the data (usually one) are matched or compared; with raw aggregated data, the challenge is ensuring that similar properties end up grouped in the same basket.
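A rough sketch of the "same basket" problem, in Python. Everything here is an assumption for illustration: the two airlines, their field names, the fares and the tiny `CITY_CODES` table standing in for an ontology base. A real aggregator would need far more than a lookup table (currencies, dates, fare classes and so on).

```python
# Toy stand-in for an ontology base: every label a source might use
# for a city is mapped to one canonical code.
CITY_CODES = {
    "Mumbai": "BOM", "Bombay": "BOM", "BOM": "BOM",
    "Delhi": "DEL", "New Delhi": "DEL", "DEL": "DEL",
}

# Hypothetical feeds: same kind of data, different field names and labels.
airline_x = [{"from": "Mumbai", "to": "Delhi", "fare": 120}]
airline_y = [
    {"origin": "BOM", "dest": "DEL", "price": 95},
    {"origin": "Bombay", "dest": "New Delhi", "price": 110},
]


def normalize_x(rec):
    """Map airline X's record onto the shared vocabulary."""
    return {"origin": CITY_CODES[rec["from"]], "dest": CITY_CODES[rec["to"]],
            "fare": rec["fare"], "airline": "X"}


def normalize_y(rec):
    """Map airline Y's record onto the shared vocabulary."""
    return {"origin": CITY_CODES[rec["origin"]], "dest": CITY_CODES[rec["dest"]],
            "fare": rec["price"], "airline": "Y"}


flights = [normalize_x(r) for r in airline_x] + [normalize_y(r) for r in airline_y]

# Only after normalization can fares be compared in one place.
cheapest_first = sorted(flights, key=lambda f: f["fare"])
```

Before normalization, "Bombay", "Mumbai" and "BOM" look like three different places and `fare` and `price` look like two different properties. The hard part of aggregation sits in that mapping step, not in the final sort.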

Mashups work as a one-to-one kind of mapping: around the common key, you map one record from one source to one record from another source. Here the atomic size of the data increases (the visual location of the house address plus the rent details, in the case of HousingMaps.com), which in turn increases the total data displayed. The data are disparate in nature: most of their properties differ, apart from a few common attributes (location, in the case of Craigslist and Google Maps).
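The one-to-one mapping around a common key can be sketched as a simple join in Python. The addresses, rents and coordinates below are invented, and the `geocodes` dictionary only stands in for whatever a real map service would return; this is not how HousingMaps.com itself is implemented.

```python
# Source 1 (hypothetical rental listings): address + rent.
listings = [
    {"address": "12 Elm St", "rent": 1400},
    {"address": "9 Oak Ave", "rent": 1750},
    {"address": "3 Pine Rd", "rent": 990},  # no map data for this one
]

# Source 2 (hypothetical map data): address -> (latitude, longitude).
geocodes = {
    "12 Elm St": (37.77, -122.42),
    "9 Oak Ave": (37.80, -122.27),
}


def mash_up(listings, geocodes):
    """Join each listing to its coordinates, using the address as the key."""
    merged = []
    for listing in listings:
        coords = geocodes.get(listing["address"])
        if coords is None:
            continue  # nothing to match against, so nothing to plot
        lat, lng = coords
        merged.append({**listing, "lat": lat, "lng": lng})
    return merged


pins = mash_up(listings, geocodes)
```

Each merged record now carries `address`, `rent`, `lat` and `lng`, so the record itself has grown. That is the difference from aggregation, where only the number of records grows.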

Mashups are an attempt to integrate multiple dimensions of information around a single topic, so that a complete picture can be portrayed on the web. How it should be portrayed is something I think only the RIA people can answer (think about scenarios where the data sources being mashed up aren’t two but four or five!).
