DataSift Curation Engine Aims for Relevance in Real-time

As I have said many times before, if 2009 was all about the hype of Real-time, the future is all about capturing Relevance in real time. Datasift has partnered with Twitter to get access to the full Twitter firehose and is building a platform for curation and filtering in real time.

Datasift posted an introductory video in its first blog post, but the video didn’t reveal much about how the platform works. Now, uber-geek Robert Scoble has posted a video of an extensive discussion with Datasift’s founder, Nick Halstead.

Robert Scoble with Datasift founder Nick Halstead

This post summarizes Datasift as discussed in the interview above and concludes with my own thoughts.

The Basics

Twitter’s firehose currently carries around 800 tweets/sec, or roughly 70 million tweets/day. Datasift can filter this firehose using over 20 variables, including:

  • Profile information like name, location, bio, number of accounts followed, number of followers, number of lists, etc.
  • Text and language of tweets
  • Geo-location of tweets
  • Verified users
  • Source of tweets – web, Seesmic, TweetDeck, etc.
  • Number of Retweets
  • Whether the tweet contains a hyperlink
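
The two firehose figures above are consistent with each other, since a day has 86,400 seconds:

```latex
800\,\frac{\text{tweets}}{\text{s}} \times 86{,}400\,\frac{\text{s}}{\text{day}} = 69{,}120{,}000\,\frac{\text{tweets}}{\text{day}} \approx 70\ \text{million tweets/day}
```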

Datasift is a rules-based engine that can filter this firehose using thousands of complex rules and deliver a filtered stream in real time, within milliseconds. It is built on a Service-Oriented Architecture and exposes an API.
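
To make the idea of a rules-based filter concrete, here is a minimal Python sketch, assuming tweets arrive as simple records carrying a few of the variables listed above. The `Tweet` class, the `filter_stream` function and the sample rules are hypothetical illustrations, not Datasift’s API or rule syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple


@dataclass
class Tweet:
    # Hypothetical record covering a few of the variables listed above;
    # it is not Datasift's actual data model.
    text: str
    lang: str = "en"
    bio: str = ""
    followers: int = 0
    verified: bool = False
    source: str = "web"
    retweets: int = 0
    geo: Optional[Tuple[float, float]] = None  # (latitude, longitude)

    @property
    def has_link(self) -> bool:
        return "http://" in self.text or "https://" in self.text


# A rule is simply a predicate over a tweet.
Rule = Callable[[Tweet], bool]


def filter_stream(tweets: Iterable[Tweet],
                  rules: Dict[str, Rule]) -> Iterator[Tuple[str, Tweet]]:
    """Evaluate every rule against each incoming tweet, yielding (rule name, tweet) matches."""
    for tweet in tweets:
        for name, rule in rules.items():
            if rule(tweet):
                yield name, tweet


rules: Dict[str, Rule] = {
    "verified_via_tweetdeck": lambda t: t.verified and t.source == "TweetDeck",
    "popular_with_link": lambda t: t.retweets > 100 and t.has_link,
}
stream = [
    Tweet("New release notes http://example.com", retweets=250),
    Tweet("Hello from TweetDeck", verified=True, source="TweetDeck"),
    Tweet("Just chatting"),
]
matches = [(name, t.text) for name, t in filter_stream(stream, rules)]
print(matches)
```

This sketch checks every rule against every tweet; an engine evaluating thousands of rules against 800 tweets/sec within milliseconds would presumably need some form of rule indexing, which the interview does not describe.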

The Rules

Rules can consist of any combination of filters on the above variables. Rules can be combined and merged, or added and subtracted, to form a single new rule. Output streams produced by such rules can become columns in Twitter clients like TweetDeck.
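
One plausible reading of combining, adding and subtracting rules is boolean composition: combining as AND, adding or merging as OR, and subtracting as AND NOT. The sketch below assumes that reading; the `Rule` class and its operators are hypothetical and not how Datasift actually represents rules.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Tweet = Dict[str, Any]  # hypothetical: a tweet as a plain dict of the variables listed earlier


@dataclass(frozen=True)
class Rule:
    """A hypothetical composable rule; not Datasift's actual rule language."""
    predicate: Callable[[Tweet], bool]

    def __call__(self, tweet: Tweet) -> bool:
        return self.predicate(tweet)

    def __and__(self, other: "Rule") -> "Rule":
        # Combine: both rules must match.
        return Rule(lambda t: self(t) and other(t))

    def __or__(self, other: "Rule") -> "Rule":
        # Add/merge: either rule may match.
        return Rule(lambda t: self(t) or other(t))

    def __sub__(self, other: "Rule") -> "Rule":
        # Subtract: match this rule but not the other.
        return Rule(lambda t: self(t) and not other(t))


mentions_google = Rule(lambda t: "google" in t["text"].lower())
has_link = Rule(lambda t: "http" in t["text"])
from_spambot = Rule(lambda t: t["source"] == "spambot")

combined = (mentions_google | has_link) - from_spambot
tweets = [
    {"text": "Google launches a new feature", "source": "web"},
    {"text": "read this http://example.com", "source": "spambot"},
    {"text": "lunch time", "source": "web"},
]
selected = [t["text"] for t in tweets if combined(t)]
print(selected)  # ['Google launches a new feature']
```

Because each operator returns another `Rule`, composite rules can themselves be shared and recombined, which fits the idea of building new rules out of existing ones.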

Here are a few examples of how rules can be used:

  • Show me tweets containing “google” from users who don’t have “social media” in their bio, and who have more than 500 followers.
  • Show me tweets from my curated Twitter list of tech brands that have more than 100 Retweets.
  • Show me tweets originating within a 5-mile radius of the XYZ Conference venue that don’t contain swear words, regardless of whether they contain the conference hashtag.
  • Show me tweets from users with “Verified Accounts” that originate from Starbucks shops around the world, regardless of what they’re about.
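
The first and third example rules above can be written as plain predicates. This sketch assumes tweets carry a bio, a follower count and optional coordinates, and uses the haversine formula for the 5-mile radius; the venue coordinates and the tiny swear-word list are made-up placeholders, and none of this reflects Datasift’s own rule syntax.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_MILES = 3958.8   # mean Earth radius


@dataclass
class Tweet:
    # Hypothetical fields; not Datasift's schema.
    text: str
    bio: str = ""
    followers: int = 0
    geo: Optional[LatLon] = None


def haversine_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in miles (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def google_rule(t: Tweet) -> bool:
    # First example: mentions "google", bio lacks "social media", more than 500 followers.
    return ("google" in t.text.lower()
            and "social media" not in t.bio.lower()
            and t.followers > 500)


SWEAR_WORDS = {"darn", "heck"}  # placeholder word list, purely illustrative


def near_venue_rule(venue: LatLon, radius_miles: float = 5.0) -> Callable[[Tweet], bool]:
    # Third example: geotagged within the radius and free of listed swear words.
    # The conference hashtag is deliberately not checked either way.
    def rule(t: Tweet) -> bool:
        if t.geo is None:
            return False
        words = set(t.text.lower().split())  # naive whitespace tokenization
        return haversine_miles(t.geo, venue) <= radius_miles and not (words & SWEAR_WORDS)
    return rule


venue = (37.7840, -122.4010)  # hypothetical coordinates for "XYZ Conference"
conference_rule = near_venue_rule(venue)

print(google_rule(Tweet("Google I/O recap", bio="engineer", followers=1200)))           # True
print(google_rule(Tweet("Google I/O recap", bio="Social Media guru", followers=1200)))  # False
print(conference_rule(Tweet("Great keynote", geo=(37.7860, -122.4040))))                # True
print(conference_rule(Tweet("Great keynote", geo=(37.3382, -121.8863))))                # False
```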

Datasift’s website is intended as a community site where curators and developers collaborate on developing these rules. You can reuse rules created by others to avoid duplicating effort. Rules are classified with tags, and Datasift provides search, ranking and trending to make rules easier to discover.

Partnerships for Influence Tracking and Sentiment Analysis

Datasift has partnered with PeerIndex and Klout to enable filtering using their influence and authority scores. It has also partnered with a firm for real-time sentiment analysis.

Thus, any of the above rules can be further filtered using these scores, and a stream of tweets expressing negative sentiment about a brand or product, combined with any other rules, can be monitored in real time.
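
As a minimal sketch of layering scores on top of a rule, assume each tweet has already been annotated with a numeric influence score and a sentiment score. The 0 to 100 and -1 to 1 scales, the thresholds, the `extra` parameter and the brand name “AcmePhone” are all hypothetical; the interview does not say what the partners’ scores look like.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ScoredTweet:
    text: str
    influence: float  # hypothetical 0-100 score from an influence provider
    sentiment: float  # hypothetical score in [-1.0, 1.0] from a sentiment provider


def negative_brand_rule(brand: str,
                        min_influence: float = 50.0,
                        max_sentiment: float = -0.3,
                        extra: Callable[[ScoredTweet], bool] = lambda t: True,
                        ) -> Callable[[ScoredTweet], bool]:
    """Match influential, negative tweets about `brand` that also pass any `extra` rule."""
    brand = brand.lower()

    def rule(t: ScoredTweet) -> bool:
        return (brand in t.text.lower()
                and t.influence >= min_influence
                and t.sentiment <= max_sentiment
                and extra(t))
    return rule


def monitor(stream: Iterable[ScoredTweet], rule: Callable[[ScoredTweet], bool]) -> List[str]:
    return [t.text for t in stream if rule(t)]


stream = [
    ScoredTweet("AcmePhone battery died again", influence=72, sentiment=-0.8),
    ScoredTweet("AcmePhone battery is fine", influence=80, sentiment=0.4),
    ScoredTweet("acmephone keeps crashing", influence=12, sentiment=-0.9),
]
alerts = monitor(stream, negative_brand_rule("AcmePhone"))
print(alerts)  # ['AcmePhone battery died again']
```

The `extra` hook is where any of the earlier location, keyword or list rules could be attached.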

Alerts and Analytics

For esoteric rules that match only infrequently, alerts can be set up. The example discussed was any politician from a Twitter list tweeting the word “scandal”. Developers can deliver these alerts as email, SMS, or smartphone notifications.
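
An alert is essentially a rule paired with delivery channels. In this hypothetical sketch, each notifier is any function that accepts a message, so an email, SMS or push sender could be plugged in; here a plain list collects the messages instead, and the politician account names are invented stand-ins for a Twitter list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Tweet = Dict[str, str]            # hypothetical: {"user": ..., "text": ...}
Notifier = Callable[[str], None]  # stands in for an email, SMS or push-notification sender


@dataclass
class Alert:
    rule: Callable[[Tweet], bool]
    notifiers: List[Notifier] = field(default_factory=list)

    def process(self, stream: Iterable[Tweet]) -> int:
        """Check each tweet and fire every notifier on a match; return the match count."""
        fired = 0
        for tweet in stream:
            if self.rule(tweet):
                message = f"@{tweet['user']}: {tweet['text']}"
                for notify in self.notifiers:
                    notify(message)
                fired += 1
        return fired


politicians = {"senator_a", "mayor_b"}  # stand-in for a curated Twitter list


def scandal_rule(t: Tweet) -> bool:
    return t["user"] in politicians and "scandal" in t["text"].lower()


outbox: List[str] = []  # a real deployment would send email/SMS here instead
alert = Alert(scandal_rule, [outbox.append])
count = alert.process([
    {"user": "mayor_b", "text": "No comment on the Scandal"},
    {"user": "random_user", "text": "What a scandal!"},
    {"user": "senator_a", "text": "Great town hall today"},
])
print(count, outbox)  # 1 ['@mayor_b: No comment on the Scandal']
```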

Datasift stores the output streams of all rules applied by the engine, so this data can be extracted, segmented, and analyzed later, for example to track the performance of social media campaigns.
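
As an illustration of later analysis on stored streams, the sketch below segments archived tweets for a campaign hashtag by day. It assumes stored records are (ISO 8601 timestamp, text) pairs; the record format, the hashtag and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple

StoredTweet = Tuple[str, str]  # hypothetical stored record: (ISO 8601 timestamp, text)


def campaign_daily_counts(stored: Iterable[StoredTweet], hashtag: str) -> Counter:
    """Count stored tweets containing `hashtag` as a whole word, segmented by calendar day."""
    tag = hashtag.lower()
    counts: Counter = Counter()
    for timestamp, text in stored:
        if tag in text.lower().split():
            day = datetime.fromisoformat(timestamp).date().isoformat()
            counts[day] += 1
    return counts


archive = [
    ("2010-11-01T09:00:00", "Loving the #NewPhone launch"),
    ("2010-11-01T12:30:00", "#newphone looks great"),
    ("2010-11-02T08:15:00", "Still thinking about #newphone"),
    ("2010-11-02T10:00:00", "Unrelated tweet"),
]
print(campaign_daily_counts(archive, "#NewPhone"))
```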

Relevance Filtering of Links

Datasift can use TweetMeme and other databases to check the links in tweets and determine whether they are relevant to a specific topic. There weren’t many details on how this is achieved, but Nick says that TweetMeme and other such databases have already classified sites by subject.
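
If sites have already been classified by subject, checking a tweet’s links reduces to extracting each URL’s host and looking it up. In this sketch the `SITE_SUBJECTS` table is a hypothetical stand-in for TweetMeme-style classifications, not real data from TweetMeme.

```python
import re
from typing import Dict, List
from urllib.parse import urlparse

# Hypothetical site-to-subject table standing in for TweetMeme-style classifications.
SITE_SUBJECTS: Dict[str, str] = {
    "techcrunch.com": "technology",
    "espn.com": "sports",
}

URL_PATTERN = re.compile(r"https?://\S+")


def link_subjects(text: str) -> List[str]:
    """Look up the subject of every recognized site linked from a tweet."""
    subjects = []
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[len("www."):]
        if host in SITE_SUBJECTS:
            subjects.append(SITE_SUBJECTS[host])
    return subjects


def links_relevant_to(text: str, topic: str) -> bool:
    return topic in link_subjects(text)


print(links_relevant_to("Big news http://www.techcrunch.com/2010/11/01/story", "technology"))  # True
print(links_relevant_to("Game recap http://espn.com/nba", "technology"))                       # False
```

Shortened links (bit.ly and similar) would have to be expanded before the lookup, which this sketch does not do.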

Blekko-style Twitter Search

Datasift has developed a prototype of Twitter search along the lines of Blekko’s slashtags. Thus, along with your query text, you can use filters such as “/nolinks” to get tweets without links, or “/California” to get tweets originating from CA.
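
A slashtag query can be handled by splitting it into plain search terms and /filters. The semantics below, where /nolinks drops tweets with links and any other slashtag is treated as a region name, are an assumption for illustration; the names and logic are hypothetical, not Datasift’s prototype.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ParsedQuery:
    terms: List[str] = field(default_factory=list)
    slashtags: Set[str] = field(default_factory=set)


def parse_query(query: str) -> ParsedQuery:
    """Split a query into plain search terms and /slashtag filters."""
    parsed = ParsedQuery()
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            parsed.slashtags.add(token[1:].lower())
        else:
            parsed.terms.append(token.lower())
    return parsed


def tweet_matches(text: str, region: str, q: ParsedQuery) -> bool:
    """Hypothetical semantics: /nolinks excludes links; any other slashtag names a region."""
    lowered = text.lower()
    if not all(term in lowered for term in q.terms):
        return False
    for tag in q.slashtags:
        if tag == "nolinks":
            if "http://" in lowered or "https://" in lowered:
                return False
        elif tag != region.lower():
            return False
    return True


q = parse_query("earthquake /nolinks /California")
print(q.terms, sorted(q.slashtags))  # ['earthquake'] ['california', 'nolinks']
print(tweet_matches("Felt an earthquake just now", "California", q))         # True
print(tweet_matches("Earthquake news http://example.com", "California", q))  # False
print(tweet_matches("Earthquake here too", "Nevada", q))                      # False
```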

RSS Feeds

Compared with the massive volume of the Twitter firehose, the volume of RSS is minimal. Datasift plans to run its own PubSubHubbub server, so developers and third parties can plug in any RSS feed and apply Datasift’s filtering rules to get a filtered output feed.
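
The same kind of rule can be applied to RSS items. With PubSubHubbub, the hub pushes updated feed content to subscribers, so a subscriber would run a filtering step like this one on each update; the sketch covers only that filtering step, using Python’s standard `xml.etree.ElementTree`, and the output channel metadata and sample feed are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Tuple

Item = Tuple[str, str, str]  # (title, link, description)


def rss_items(feed_xml: str) -> List[Item]:
    """Extract (title, link, description) from every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title", default=""),
             item.findtext("link", default=""),
             item.findtext("description", default=""))
            for item in root.iter("item")]


def filter_feed(feed_xml: str, rule: Callable[[Item], bool]) -> str:
    """Build a new RSS 2.0 document containing only the items that match `rule`."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Filtered feed"
    ET.SubElement(channel, "link").text = "http://example.com/filtered"  # placeholder
    ET.SubElement(channel, "description").text = "Items matching a filtering rule"
    for title, link, description in rss_items(feed_xml):
        if rule((title, link, description)):
            entry = ET.SubElement(channel, "item")
            ET.SubElement(entry, "title").text = title
            ET.SubElement(entry, "link").text = link
            ET.SubElement(entry, "description").text = description
    return ET.tostring(rss, encoding="unicode")


sample = """<rss version="2.0"><channel><title>Tech</title>
<item><title>Real-time search roundup</title><link>http://example.com/1</link>
<description>Notes on real-time filtering</description></item>
<item><title>Weekend recipes</title><link>http://example.com/2</link>
<description>Cooking ideas</description></item>
</channel></rss>"""

filtered = filter_feed(sample, lambda item: "filtering" in item[2].lower())
print([title for title, _, _ in rss_items(filtered)])  # ['Real-time search roundup']
```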

Revenue Model

One option is free access to the stream with in-stream ads. Ads will be tailored and designed for the target form factor – desktop/mobile/tablet/etc.

The second option is selling data B2B to developers and brands, charged by the volume of data consumed.

Prospective Partners

Datasift is seeking to work with startups like Flipboard that are creating new ways to consume curated content. This could also include any of the startups focusing on Relevance, such as TwitterTimes or Paper.li.

My Thoughts

When I compared approaches to filtering information for relevance, I suggested that the service most likely to succeed would be one that supports multiple approaches and platforms. Datasift clearly supports all platforms and several approaches, such as crowdsourced filtering, influence filtering, and location filtering. It is easily the most powerful relevance filtering engine I have seen yet.

The end-user market for curated real-time content is currently unknown. Startups creating pleasant experiences for consuming content have yet to find a monetization strategy. From an end-user perspective, Datasift’s success largely depends on:

  • The creativity of developers and curators in building compelling experiences, and
  • How the monetization strategies of presentation apps fare, and how well Datasift can work with them

Nevertheless, with the amount of content being created online growing exponentially, curation and filtering will eventually become necessities for any social media client. It is just a matter of time.

I also see a bright future on the B2B front. By partnering with influence and authority tracking companies, combined with sentiment analysis, Datasift may already be a compelling choice for brand monitoring and social media reputation tracking.

Lastly, thanks to Robert Scoble and Nick Halstead for the interesting interview.

This entry was posted in Social Web.

  • http://www.victusspiritus.com/ Mark Essel

    Datasift sounds like a valuable search filter for real-time content. Could there be a conflict with Twitter’s own search functionality?

    As a potential user, I’m looking forward to developing my own filters with Datasift’s access and tools. Hopefully we can restrict filters to lists, which is a useful filter for searching. I don’t necessarily care about a person’s Klout score or other external companies’ measures. I care about information from the folks I trust and understand (conversationlist by Kevin Marshal is more relevant to me than overall influence or engagement).

    Is there an open format for filtering updates that anyone can code to (i.e., self-hosted filtering servers), or is the idea to have an API like Twitter’s and restrict or sell access?

    Best of luck to TweetMeme and team!

  • http://www.skepticgeek.com Mahendra

    Mark,

    I also wondered about conflicts with Twitter, but couldn’t write anything about it as I don’t know anything about their biz relationship.

    I am not sure of the exact semantics of filters for Twitter Lists, but I imagine it must be possible.

    Nick did mention that the search portal would be open source, but I think the API will be restricted. Then again, I am not the person you should be asking :) You should join their alpha, which is open to developers right now.

  • http://twitter.com/AnthonyGadgetX AnthonyGadgetX

    http://bit.ly/dsWLqS

    Datasift looks awesome. I’ve linked a visualization I thought might work well with it, along with Microsoft’s Kinect.
