Comparing Approaches to Information Filtering for Relevance

In a previous post, we looked at the big shift “From Numbers To Relevance.” Dozens of apps and sites focus on filtering information today, but which of them will succeed?

To attempt an answer, let’s first look at the approaches these apps and sites use in their search for relevance. The topic is usually the subject of academic research papers; this is only a layman’s overview.

The different approaches I observe are:

  • Algorithmic Filtering
  • Filtering Based on Social Graph
  • Human Filtering
  • Crowdsourced Filtering
  • Shared Sources Filtering (Meta)
  • Influence Filtering
  • Social Search
  • Location Filtering

Algorithmic Filtering

If you tell us what you want or like, our software can show you what you will like.

[Image: Google Suggest]

The predominant use of algorithmic filtering is in web search, where Google has dominated and driven the web economy for more than a decade. You search for something, and Google’s search algorithm filters billions of web pages to find the most relevant results.

Google also uses algorithmic filtering to suggest items in Google Reader’s “Sort by Magic” feature.
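
Google’s ranking algorithm is proprietary, so the following is not it. It is a minimal sketch of the general idea of algorithmic filtering, using classic TF-IDF scoring: terms that appear often in a document but rarely across the collection count for more. The names `tokenize` and `rank_by_tfidf` are hypothetical, and the whitespace tokenizer is a deliberate simplification.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace; a real engine would do far more."""
    return text.lower().split()


def rank_by_tfidf(query: str, docs: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Score each document by the summed TF-IDF weight of the query terms.

    Returns up to top_k (doc_id, score) pairs, best first, omitting
    documents that match none of the query terms.
    """
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for terms in tokenized.values():
        df.update(set(terms))

    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        score = 0.0
        for term in tokenize(query):
            if tf[term]:
                idf = math.log(n_docs / df[term])      # rare terms weigh more
                score += (tf[term] / len(terms)) * idf  # normalized term frequency
        scores[doc_id] = score

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(d, s) for d, s in ranked[:top_k] if s > 0]
```

With the documents "android phone review", "klout scores and influence" and "android tablet news android", the query "android" ranks the third document first and drops the second entirely. Note that every user issuing the same query gets the same ranking, which is the lack of personalization listed below.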

Pros: Highest relevance when searching for information.
Cons: No serendipity. Only useful for the goal-oriented task of search. No personalization (search engines are typically unaware of demographic information).

Filtering Based on Social Graph

If your friends like it, you’ll probably like it too.

This is the dominant approach used today by apps and websites. For example, Facebook uses the EdgeRank formula to determine what to display in your news feed:

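EdgeRank = Σ_e u_e × w_e × d_e, summed over every edge e (an interaction such as a like or comment) attached to a story, where u_e is the affinity score between you and the edge’s creator, w_e is the weight for that type of edge, and d_e is a time-decay factor.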

The key driving factor is the affinity score between you and the source.
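
Facebook has described EdgeRank as a sum, over a story’s edges, of affinity × edge weight × time decay. The sketch below computes that sum for a few made-up stories and orders a feed by it. The `Edge` class and the `edgerank`, `time_decay` and `build_feed` helpers are hypothetical, and the exponential decay with a 24-hour half-life is purely an assumption: Facebook never published its actual weights or decay function.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    """One interaction (like, comment, share) attached to a feed story."""
    affinity: float   # u_e: affinity between the viewer and the edge's creator
    weight: float     # w_e: weight of this edge type
    age_hours: float  # input to d_e, the time decay


def time_decay(age_hours: float, half_life_hours: float = 24.0) -> float:
    """d_e: ASSUMED exponential decay; the real function was never published."""
    return 0.5 ** (age_hours / half_life_hours)


def edgerank(edges: list[Edge]) -> float:
    """Sum of u_e * w_e * d_e over all edges of one story."""
    return sum(e.affinity * e.weight * time_decay(e.age_hours) for e in edges)


def build_feed(stories: dict[str, list[Edge]], limit: int = 10) -> list[str]:
    """Story ids ordered by descending EdgeRank score."""
    return sorted(stories, key=lambda s: edgerank(stories[s]), reverse=True)[:limit]
```

A single recent edge from a close friend (affinity 0.9) easily outranks two edges from acquaintances (affinities 0.1 and 0.2), which is why affinity is the key driving factor.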

Google also uses this approach when recommending posts in Google Buzz.

Most of the apps listed in my previous post, as well as the new Digg, use this or a similar approach that employs your Twitter or Facebook friends to recommend items.

Pros: High serendipity. Helps you stay “in the know”, which is socially cool. Higher personalization.
Cons: Relevance depends on your social graph, which is often not optimized for relevance, as Kevin Anderson noted.

Human Filtering

I trust a specific person to share all of the good stuff I like to know.

Some people make it a habit to go through news items every day and share what they deem the most significant ones. Others come to rely on them as trusted news sources.

Pros: High serendipity. Easy to use. You quickly become part of an influencer’s social circle.
Cons: Unreliable. Susceptible to preferences and agendas of other people.

Crowdsourced Filtering

Quickly see what’s most important to know.

TweetMeme, OneRiot, Digg, and many other social bookmarking services aggregate the actions of millions of people to surface the most popular links. Techmeme and MediaGazer add human curation to the aggregation of thousands of websites to surface the most important tech and media stories.
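
The core of crowdsourced filtering is aggregation: count how many people acted on each item and surface the top ones. This is a minimal sketch with hypothetical names (`most_popular`, and actions modeled as `(user_id, item_url)` pairs); it is not how TweetMeme, Digg or any other service actually ranks, and real services likely add further signals such as recency.

```python
from collections import defaultdict


def most_popular(actions: list[tuple[str, str]], top_k: int = 5) -> list[tuple[str, int]]:
    """Rank items by how many distinct users acted on them.

    actions: (user_id, item_url) pairs such as votes, retweets or bookmarks.
    Counting distinct users keeps one person from inflating a story.
    """
    users_per_item = defaultdict(set)
    for user, item in actions:
        users_per_item[item].add(user)

    counts = [(item, len(users)) for item, users in users_per_item.items()]
    # Highest count first; ties broken alphabetically for stable output.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:top_k]
```

The function never looks at who is asking, so every reader sees the same list: exactly the "no personalization" drawback noted below.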

Pros: Be up-to-date with the most important/popular need-to-know information.
Cons: No personalization. Popular doesn’t always equate to relevant.

Shared Sources Filtering (Meta)

If you read from sources similar to someone else, you’ll probably like their other sources too.

Facebook uses this approach to suggest new Fan Pages that you may like because your friends like them. Google Reader also uses this to recommend new RSS feeds. Toluu also compares your subscribed RSS feeds with other users to help you discover new feeds.
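
One common way to implement "people who read what you read also read…" is to measure the overlap between subscription sets with Jaccard similarity and then recommend the sources that similar users follow and you don’t. This is a sketch of that general technique, not the method Facebook, Google Reader or Toluu actually used; `jaccard`, `recommend_sources` and the example feed names are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two users' subscription sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_sources(me: str, subscriptions: dict[str, set[str]], top_k: int = 3) -> list[str]:
    """Score each source `me` doesn't follow by the summed similarity of users who do."""
    mine = subscriptions[me]
    scores: dict[str, float] = {}
    for other, theirs in subscriptions.items():
        if other == me:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0:
            continue  # no shared sources, so no evidence of shared taste
        for source in theirs - mine:
            scores[source] = scores.get(source, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [source for source, _ in ranked[:top_k]]
```

Note that the output is a list of sources, not individual news items, which illustrates the limited scope mentioned in the cons.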

[Image: Google Reader recommendations]

Pros: Useful for discovering new sources in social networks.
Cons: Filters sources, not actual news items, hence limited in scope.

Influence Filtering

Only read what influential people are saying/sharing.

This approach uses the influence scores of sources to filter the news feed. An example is HootSuite, which uses Klout to let you filter tweets according to their authors’ Klout scores.
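
Mechanically, influence filtering is a threshold on a per-author score. The sketch below does not call Klout or HootSuite; the `Tweet` class, the `filter_by_influence` function and the score dictionary are hypothetical stand-ins for whatever influence metric a service supplies.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    author: str
    text: str


def filter_by_influence(tweets: list[Tweet], influence: dict[str, float], min_score: float) -> list[Tweet]:
    """Keep tweets whose author's influence score meets the threshold.

    Authors with no known score are treated as 0, so they are dropped
    unless min_score is 0 or lower.
    """
    return [t for t in tweets if influence.get(t.author, 0.0) >= min_score]
```

The user-adjustable `min_score` is what gives this approach its flexibility; the weakness is that the result is only as good as the influence numbers fed into it.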

[Image: Klout filtering in HootSuite]

Pros: Flexibility. High serendipity. Helps you stay “in the know”.
Cons: Influence metric is unreliable. Currently only available for real-time feeds like Twitter.

Social Search: Algorithms + Social Graph

Let your social circle find the most relevant results for you.

Social Search combines algorithmic search with your social graph to find relevant results.
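
One simple way to combine the two signals is to start from an algorithmic relevance score and boost it by the number of your friends who shared each result. This is an illustrative sketch, not any particular search engine’s method; `Result`, `social_rank` and the `social_weight` parameter are hypothetical, and real systems would weigh friends differently.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    doc_id: str
    text_score: float                                  # relevance from the search algorithm
    shared_by: set[str] = field(default_factory=set)   # users who shared or liked it


def social_rank(results: list[Result], friends: set[str], social_weight: float = 0.5) -> list[str]:
    """Boost each algorithmic score by how many of the searcher's friends shared the result."""
    def score(r: Result) -> float:
        friend_shares = len(r.shared_by & friends)
        return r.text_score * (1 + social_weight * friend_shares)

    # Only results that actually match the query are eligible: search comes first.
    relevant = [r for r in results if r.text_score > 0]
    return [r.doc_id for r in sorted(relevant, key=score, reverse=True)]
```

A result with a weaker text match can overtake a stronger one when two friends shared it, but a result with no text match never appears, however popular it is among your friends. That reflects both the high relevance and the "requires searching" trade-off below.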

[Image: Social Search]

Pros: High relevance. Combines goal-orientation of search with serendipity of social. Very useful for news items from recent past.
Cons: Requires searching. Less useful for fresh, real-time news.

Location Filtering

If we know where you are, we can help you find relevant results.

Location is a treasure trove for relevance. As the mobile web explodes, services that provide information about nearby businesses or friends are gaining adoption.
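
At its simplest, location filtering keeps only places within some radius of the user. The sketch below uses the standard haversine great-circle formula; the `nearby` helper, the place names and the coordinates are illustrative assumptions, not data from any real service.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearby(places: dict[str, tuple[float, float]], here: tuple[float, float], radius_km: float) -> list[str]:
    """Names of places within radius_km of `here`, nearest first."""
    with_dist = [(haversine_km(*here, *coords), name) for name, coords in places.items()]
    return [name for dist, name in sorted(with_dist) if dist <= radius_km]
```

Everything depends on knowing `here`, the user’s position, which is exactly where the privacy concerns below come from.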

Pros: High relevance. Can be serendipitous, with real-life impact.
Cons: Privacy concerns. Limited in scope.

Conclusion: Which Approach is the Best?

None. Relevance depends on the requirements of an individual at a specific moment in time. These requirements change over time and from person to person. There is no killer approach to relevance.

Which app or service is likely to succeed? I think the following factors will make a difference:

  • Support for multiple approaches
  • Flexibility of degree of filtering
  • Number of mobile platforms supported
  • Next Step: What can you do with the info? (e.g. Siri lets you take actions)
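
The first two factors can be illustrated together: a service could compute a score per approach, let the user weight the approaches, and let the user dial a threshold up or down. This is a hypothetical sketch of that idea; `blended_filter`, the approach names and the assumption that each score lies in [0, 1] are illustrative, not taken from any existing app.

```python
def blended_filter(items: dict[str, dict[str, float]], weights: dict[str, float], threshold: float) -> list[str]:
    """Combine per-approach scores with user-chosen weights, then filter.

    items maps item id -> {approach name: score in [0, 1]}; a missing approach counts as 0.
    weights says how much each approach matters (support for multiple approaches).
    threshold sets how aggressively to filter (flexible degree of filtering).
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one approach needs a positive weight")

    blended = {
        item_id: sum(w * scores.get(approach, 0.0) for approach, w in weights.items()) / total_weight
        for item_id, scores in items.items()
    }
    kept = [item_id for item_id, s in blended.items() if s >= threshold]
    return sorted(kept, key=lambda item_id: blended[item_id], reverse=True)
```

Raising the threshold from 0.0 to 0.3 on the same items narrows the stream without changing the underlying scores, so the same engine can serve both a "show me everything" mood and a "just the essentials" one.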

What do you think? Are there other approaches that I missed? Which other factors matter?

This entry was posted in Social Web.
  • http://www.louisgray.com/live/ Louis Gray

    This is a very important and interesting topic! Nice work, Mahendra. Thanks for posting. Once my6sense debuts for Android, I look forward to getting your experience with that. There simply is no better tool for hyperpersonalization and getting this done right.

  • http://www.skepticgeek.com Mahendra

    Thank you, Louis. I find this topic very interesting personally, and am eagerly looking forward to my6sense on Android.

  • Henchan

    I proffer another category: Subjective.
    A subjective filter filters information according to preferences or interests that you've explicitly defined. Alternatively, interests may be implicit, based on your previous activity such as reading an email.
    Gravity allows explicit subscription to interest categories, whilst my6sense is an example of implicit, subjective filtering.

  • eriklumer

    Nice post, Mahendra. At Cascaad, we have indeed taken the approach of combining multiple methods to personalize access to social streams:
    - social graph based filtering
    - personalization based on your implicit (learned) profile of interest
    - delivery of a custom stream according to your explicitly tracked topics of interest

    An important distinction, supported by this approach, is between filtering of the sources you explicitly follow and discovery of content from sources that you have not subscribed to. You should check these features on the new Cascaad web app when you get a chance.

  • YannR

    Mahendra, I keep thinking about information 'resonance' these days, and this post is really interesting. You've laid out a nice foundation for understanding the 'relevance' parameters that ride alongside resonance.

  • http://www.skepticgeek.com Mahendra

    Thank you. I considered subjective filtering a feature (personalization) that I used while assessing pros and cons. But it can also be considered an approach in itself, as you rightly point out.

  • http://www.skepticgeek.com Mahendra

    Thanks for the update on Cascaad, Erik!

  • http://www.victusspiritus.com/ Mark Essel

    Finally caught up and finished this post, realizing I see a familiar face :D.
    Good luck to Google Alerts “seeing” through the information within an embedded image.

    You covered my thoughts on relevance, and how it shifts with the moment. The most functional relevance I've found comes from reliable people who share my interests. It's quite potent having other humans observing and filtering information. Superhuman filters are tough to outdo. The only missing element is customization, or post-filtering. So I aggregate the folks whose shared content I generally appreciate, and then search it in real time for topics I'm directly interested in.

    I commented on Chris Dixon's blog today on a related topic. He, Caterina and the Hunch team are going after relevance purely through an abductive reasoning approach, where folks are clustered by choice alone and not by any social ties.

  • http://www.skepticgeek.com Mahendra

    Information is exploding far faster than the brains of even superhuman filters can evolve. There's no way forward without algorithmic/computational integration, in my opinion.

    Yeah, Dixon's post came after I posted this. What they're doing with Hunch is very interesting. Of the above approaches, I'd like to think of Hunch as a Personalized Algorithmic approach. :)

  • http://twitter.com/refynr refynr

    At http://refynr.com, we’re taking a different approach and letting users decide explicitly what to see in their streams by making Keyword Lists. A tweet (and eventually a Facebook post) shows up in your stream if, and only if, it matches your Keyword Lists. Refynr applies this filtering to your Twitter home stream, so you don’t get the entire Twitter world, just the people you already follow. I’ve been wanting this for a long time, but the Twitter Search API doesn’t even support it, so I decided to build it myself. Over time, refynr can collect metrics and display the most popular keywords for users to consider, but ultimately I want to keep control in the hands of users, not some mythical, magic bot that tries to guess what they want to see in their streams…

    We’ll see if this idea takes off or not, but either way I will use it :)

    – Aaron Longnion

  • http://www.skepticgeek.com Mahendra

    Aaron,

    Thanks for sharing the info.

  • Anonymous

    http://winnowtag.org lets you use the “personalized algorithmic approach” (Mahendra’s phrase in a comment above). You provide examples for a tag representing each topic. The tags then continuously identify items on specific topics across a very large body of content.

    winnowTag.org downloads and tags 7,500 feeds daily and keeps the items for three months, thus currently has about 700,000 items on a huge variety of topics. Here are a couple of illustrative tags:

    entomology: http://winnowtag.org/#mode=all&tag_ids=468

    space: http://winnowtag.org/#mode=all&tag_ids=11

    A higher number at the left of an item means the Winnow classifier is more certain that item is a correct match for the selected tag. winnowtag.org features are explained in the Help tab of http://doc.winnowtag.org, and http://doc.winnowtag.org/open-source has info on Winnow.

    We’re very interested in feedback on winnowTag and other possible applications of Winnow content recommendation (which is available as open source).
