DataSift Curation Engine Aims for Relevance in Real-time

As I have said many times previously, if 2009 was all about the hype of Real-time, the future is all about capturing Relevance in real-time. Datasift has partnered with Twitter to get the full Twitter firehose and is building a platform to enable curation and filtering in real-time.

An introductory video about Datasift was posted in their first blog post, which didn’t reveal much about how the platform works. Now, uber-geek Robert Scoble has posted a video of an extensive discussion with Datasift’s founder, Nick Halstead.

Robert Scoble with Datasift founder Nick Halstead

This post summarizes Datasift as discussed in the video above, and concludes with my own thoughts.

The Basics

Twitter’s firehose at present has around 800 tweets/sec, or 70 million tweets/day. Datasift can filter this firehose using over 20 variables. Examples of these variables include:

  • Profile information like name, location, bio, number of follows, followers, lists, etc.
  • Text and language of tweets
  • Geo-location of tweets
  • Verified users
  • Source of tweets – web, Seesmic, TweetDeck, etc.
  • Number of Retweets
  • Whether tweet contains a hyperlink

Datasift is a rules-based engine that can filter this firehose using thousands of complex rules and provide a filtered stream in real-time within milliseconds. It is built using a Service Oriented Architecture and has an API.

The Rules

Rules can comprise any combination of filters using the above variables. Rules can be combined and merged, or added and subtracted, into a single new rule. Stream outputs from Datasift using such rules can become columns in Twitter clients like TweetDeck.

Here are a few examples of how rules can be used:

  • Show me tweets containing “google” from users who don’t have “social media” in their bio, and who have more than 500 followers.
  • Show me tweets from my curated Twitter list of tech brands that have more than 100 Retweets.
  • Show me tweets originating from within a radius of 5 miles from the location of XYZ Conference that don’t have swear words, irrespective of whether their tweets contain the hashtag for the conference.
  • Show me tweets originating from Starbucks shops around the world, of users who are “Verified Accounts”, irrespective of what they’re about.
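To make the idea of combinable rules concrete, here is a minimal Python sketch of the first example in the list above. It is not Datasift's API, which the video does not show. The `Tweet` fields and the helper names (`contains`, `bio_contains`, `more_followers_than`, `negate`, `all_of`, `filter_stream`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

# Hypothetical tweet record; field names are assumptions, not Datasift's schema.
@dataclass
class Tweet:
    text: str
    bio: str
    followers: int

# A rule is simply a predicate over a tweet.
Rule = Callable[[Tweet], bool]

def contains(word: str) -> Rule:
    """Rule: the tweet text contains the word (case-insensitive)."""
    return lambda t: word.lower() in t.text.lower()

def bio_contains(phrase: str) -> Rule:
    """Rule: the author's bio contains the phrase (case-insensitive)."""
    return lambda t: phrase.lower() in t.bio.lower()

def more_followers_than(n: int) -> Rule:
    """Rule: the author has more than n followers."""
    return lambda t: t.followers > n

def negate(rule: Rule) -> Rule:
    """'Subtract' a rule by inverting it."""
    return lambda t: not rule(t)

def all_of(*rules: Rule) -> Rule:
    """Merge several rules into one that requires all of them."""
    return lambda t: all(r(t) for r in rules)

# "google" in the text, no "social media" in the bio, more than 500 followers.
google_rule = all_of(
    contains("google"),
    negate(bio_contains("social media")),
    more_followers_than(500),
)

def filter_stream(stream: Iterable[Tweet], rule: Rule) -> Iterator[Tweet]:
    """Lazily yield only the tweets that satisfy the rule."""
    return (t for t in stream if rule(t))
```

Because every rule is just a function, a rule written by one curator can be passed into `all_of` or `negate` by another, which is the kind of reuse the community site aims for.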

Datasift’s website is intended as a community website for curators and developers to collaboratively work on developing these rules. You can leverage rules created by others to avoid duplication of effort. Rules are classified with tags, and Datasift provides search, ranking and trending for easier discoverability of rules.

Partnerships for Influence Tracking and Sentiment Analysis

Datasift has partnered with PeerIndex and Klout to enable filtering using their influence and authority scores. It has also partnered with a firm for real-time sentiment analysis.

Thus, any of the above rules can be filtered further using such scores, and a stream of tweets with negative sentiment about a brand or product, combined with any other rules, can be monitored in real-time.

Alerts and Analytics

For esoteric rules that may provide a result infrequently, alerts can be set up. The example discussed is of any politicians from a Twitter list tweeting the word “scandal”. Developers can send these alerts as email, SMS, or notifications on smartphones.

The resulting streams from all rules applied by the engine are stored by Datasift. This data can be extracted, segmented, and analyzed later. For example, this can be used to track the performance of social media campaigns.

Relevance Filtering of Links

Datasift can use TweetMeme and other databases to check the links in tweets, and determine whether they are relevant to a specific topic. There are not many details on how this is achieved, but Nick says that all sites are already classified into different subjects by TweetMeme and other such databases.

Blekko-style Twitter Search

Datasift has developed a prototype of Twitter search along the lines of Blekko’s slashtags. Thus, along with your query text, you can use filters such as “/nolinks” to get tweets without links, or “/California” to get tweets originating from CA.
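As a rough illustration of how a slashtag query might be split into search text and filters, here is a small Python sketch. The prototype's real syntax and behavior are not described in the video, so `parse_query` and `matches` are hypothetical, and only the "/nolinks" filter is implemented. A location slashtag would need geo data that a bare string does not carry.

```python
from typing import List, Tuple

def parse_query(query: str) -> Tuple[str, List[str]]:
    """Split a query into plain search text and lower-cased slashtags."""
    terms, slashtags = [], []
    for token in query.split():
        if token.startswith("/") and len(token) > 1:
            slashtags.append(token[1:].lower())
        else:
            terms.append(token)
    return " ".join(terms), slashtags

def matches(text: str, query: str) -> bool:
    """Check a tweet's text against the query; unknown slashtags are ignored."""
    terms, slashtags = parse_query(query)
    if terms.lower() not in text.lower():
        return False
    if "nolinks" in slashtags and ("http://" in text or "https://" in text):
        return False
    return True
```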

RSS Feeds

Compared to the massive volume of the Twitter firehose, the volume of RSS is minimal. Datasift plans to have their own PubSubHubbub server. Developers and third parties can plug in any RSS feed and use Datasift’s filtering rules to get an output feed.

Revenue Model

One option is free access to the stream with in-stream ads. Ads will be tailored and designed for the target form factor – desktop/mobile/tablet/etc.

The second option is selling data B2B to developers and brand companies, charged by the volume of data consumed.

Prospective Partners

Datasift is seeking to work with startups like Flipboard, who are creating new ways for curated content consumption. This can also include any of the startups focusing on Relevance, such as TwitterTimes or Paper.li.

My Thoughts

When I compared approaches to filtering information for relevance, I had suggested that the service most likely to succeed would be the one that supports multiple approaches and platforms. We can easily see that Datasift supports all platforms and several approaches like crowdsourced filtering, influence filtering, location filtering, etc. It is easily the most powerful relevance filtering engine I have seen yet.

The market of end-users for curated real-time content is at present unknown. Startups involved in creating pleasant experiences for consuming content have yet to find a monetization strategy. The degree of Datasift’s success from an end-user perspective is largely dependent on:

  • The creativity of developers and curators to create compelling experiences, and
  • How the monetization strategies of presentation apps fare and how Datasift is able to work with them

Nevertheless, with the amount of content being created online growing exponentially, curation and filtering will eventually become necessities for any social media client. It is just a matter of time.

I also see a bright future on the B2B front. By partnering with influence and authority tracking companies, combined with sentiment analysis, Datasift may already be a compelling choice for brand monitoring and social media reputation tracking.

Lastly, thanks to Robert Scoble and Nick Halstead for the interesting interview.


5 Suggestions for Twitter’s Whom To Follow

Here are 5 suggestions for Twitter’s “Who To Follow” feature that I have seen mentioned in the Twitterverse:

  1. Avoid users who have set tweets as Private
  2. Avoid users who haven’t tweeted for the past 15 days or have fewer than 10 tweets overall
  3. Avoid users I have added to Lists
  4. Avoid famous celebrities everyone knows
  5. Avoid users I have followed and unfollowed before
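The five suggestions above amount to a simple exclusion filter over candidate accounts. Here is a minimal Python sketch of that filter. Twitter's actual suggestion code is not public here, so the `Candidate` fields and `should_suggest` are hypothetical, and the follower threshold used as a stand-in for "famous celebrity" is purely an assumption.

```python
from dataclasses import dataclass
from typing import Set

# Hypothetical candidate record; these fields are assumptions for illustration.
@dataclass
class Candidate:
    handle: str
    is_private: bool
    days_since_last_tweet: int
    tweet_count: int
    followers: int

def should_suggest(c: Candidate,
                   my_listed: Set[str],
                   my_past_follows: Set[str],
                   celebrity_followers: int = 1_000_000) -> bool:
    """Return True only if the candidate survives all five exclusions."""
    if c.is_private:                                          # 1. private tweets
        return False
    if c.days_since_last_tweet > 15 or c.tweet_count < 10:    # 2. inactive
        return False
    if c.handle in my_listed:                                 # 3. already on my Lists
        return False
    if c.followers >= celebrity_followers:                    # 4. crude celebrity proxy
        return False
    if c.handle in my_past_follows:                           # 5. followed and unfollowed before
        return False
    return True
```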

Twitter Who To Follow

These simple things will improve the effectiveness of Twitter’s suggestions greatly.


What We Really Need: Discovering Whom To UnFollow

Twitter is rolling out a new feature to help you discover new people to follow:

The algorithms in this feature, built by our user relevance team, suggest people you don’t currently follow that you may find interesting. The suggestions are based on several factors, including people you follow and the people they follow.

This is a very welcome move by Twitter. TechCrunch says they’re building a Social Graph, while VentureBeat suggests a PeopleRank algorithm powering these suggestions.

The problem? Twitter badly needs a Matt Cutts.

Active Users

Here are RJMetrics stats from Jan 2010 on the number of tweets per Twitter user:


  • 80% of all Twitter users have tweeted fewer than 10 times.

That means only 20% are active users.

The 2009 Annual Report from Barracuda Labs independently confirms these findings.

  • 34% of Twitter users have no tweets
  • 73% of users have fewer than 10 tweets

Spam Accounts

Now, from the remaining 20% of “active Twitter users”, how many users are spam?

According to TwitSweeper in March 2010: 5%.

These are accounts that tweet "make money fast online!", "multiple sources of passive income", "view my naked pics!", etc.

That leaves 15% of Twitter users who are real and may be considered worth following.
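The 15% figure treats the 5% spam estimate as a share of all Twitter users. The quoted number does not state its base, so the short Python sketch below works the arithmetic both ways. The variable names are mine, and the second reading is only an alternative assumption.

```python
# Figures quoted above, in percent of all Twitter users.
inactive_pct = 80                 # fewer than 10 tweets (RJMetrics)
active_pct = 100 - inactive_pct   # 20% "active" users
spam_pct = 5                      # spam estimate (TwitSweeper)

# Reading 1 (used in this post): spam is 5% of ALL users.
real_pct_if_spam_of_all = active_pct - spam_pct

# Reading 2 (alternative assumption): spam is 5% of ACTIVE users only.
real_pct_if_spam_of_active = active_pct * (100 - spam_pct) / 100

print(real_pct_if_spam_of_all)      # 15
print(real_pct_if_spam_of_active)   # 19.0
```

Either way, fewer than one in five accounts is both active and real.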

Why This Is A Problem

If Twitter is trying to build a meaningful, relevant social graph, they have to clean up first.

Twitter’s PeopleRank faces the same challenge as Google’s PageRank: Blackhat SEO. These spam accounts are followed by each other and by other fake accounts – all to provide a semblance of an active social user graph and avoid algorithmic detection. These are virtually indistinguishable from real users and will become part of the suggested users ecosystem.

How many times do we encounter spam accounts on Facebook? How many times do we see spam results in the first page of Google search results? In contrast, how many times do we get @replies from spammers on Twitter?

A contaminated social graph or PeopleRank system is harmful to Twitter from an investor, user, and advertiser point of view. It will be great if Twitter is able to suggest whom to unfollow and get rid of all these inactive, fake, and spam accounts.


How I Live and Breathe Twitter

This is a companion post to How I Live and Breathe Google Reader. A few people have asked for some of my stats, so I am sharing them too.

Twitter Profile

Objectives behind using Twitter

Different people use Twitter for different purposes. My objectives are:

  • Get fresh tech news as fast as possible.
  • Learn what leading tech experts and analysts are reading and understand their opinions on current tech topics
  • Share fresh tech news and my views
  • Make a few friends and have fun.

I use @ScepticGeek as my professional account for my key goals, and @Palsule as a personal account for the last. The remainder of this post focuses on my @ScepticGeek account.

I think defining your objective behind using any network or service is important as it helps define the kind of social graph and relationships you build using it.

My Sharing Policy

I share tech news and opinion pieces about current tech trends. I try my best to keep my Twitter feed a relevant signal. I don’t tweet all of my friends’ blog posts just because they’re my friends.

I realize this may be construed as anti-social behavior among social media experts, but as I explained in Role of Curation in the Attention Economy, I don’t wish to increase noise for my followers and am more interested in curating my Twitter feed to keep it relevant for my followers.

My Follow Policy

My objectives drive my follow policy. I typically follow people who break tech news. I also follow people who may not break news themselves, but who constantly live in the breaking tech news world and are always sharing fresh stuff.

The key principle behind my Follow Policy is Relevance. Thus, I do not follow all experts and people I admire. A person’s greatness isn’t always proportional to the relevance of their Twitter feed to my objectives.

Because I follow relatively few people, some folks assume that I only follow “big shots”. That simply isn’t true. Sometimes, I also follow people who @Reply me with interesting comments on what I share.

I am constantly following new people and unfollowing some of them. This is a continuous process and I am brutal in curating my following list. Sometimes, I use the following benchmarks in deciding whether to unfollow someone:

  • Have I liked/retweeted any tweet or article shared by that person within the last month?
  • Can I explain to myself why I follow a person given my objectives?

Lastly, I don’t care if the people I choose to follow, follow me back or not.

What “Follow” Means To Me

To me, a Follow is more than a social gesture. A Follow means that I try my best to read tweets, read the articles being shared, listen, answer questions when I can, and offer help where possible.

Twitter as a Conversation Platform

A great many people complain that Twitter is not suitable for having conversations.

On the other hand, a great many people I admire and respect, from top tech bloggers and editors of leading tech blogs, to VCs and media/journalism experts, unaware of this inherent limitation of Twitter, continue to use it for meaningful conversations. So do I.

My Reply Policy

I make an effort to respond to each and every @Reply, as long as it is being made in good spirit and doesn’t reek of fanboyism.

Attribution Policy

I try to attribute my sources as far as possible, as I described earlier in Thanksgiving via Attribution.

On Automated Tweets

From Google Buzz + Reader + Twitter + Facebook = Noise: “When you auto-share, you’re not a human on that network, you turn into a bot. Bots are what we call spam.”

I neither use any tool that automates tweets, nor do I typically follow those who do.

Tweet Format

I try to make each tweet meaningful for my followers. My usual format is:

“Original Post Title” <Link> by “Author/Blog” /via @source /my comments if any

Frequently, the original blog title is too sensationalistic, entirely misleading, or link-baiting. In such cases, I dispense with the original title entirely and substitute my own.
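For anyone who wants to adopt the same format, here is a small Python sketch that assembles a tweet in that order. `format_tweet` is a hypothetical helper of my own naming. I compose my tweets by hand, so this only illustrates the layout, with the 140-character limit checked at the end.

```python
from typing import Optional

def format_tweet(title: str, link: str, author: str,
                 via: Optional[str] = None,
                 comment: Optional[str] = None,
                 limit: int = 140) -> str:
    """Build: "Title" <link> by Author /via @source /comment"""
    parts = [f'"{title}"', link, f"by {author}"]
    if via:
        parts.append(f"/via @{via}")
    if comment:
        parts.append(f"/{comment}")
    tweet = " ".join(parts)
    if len(tweet) > limit:
        # Too long: shorten the title or the comment by hand.
        raise ValueError(f"tweet is {len(tweet)} characters, limit is {limit}")
    return tweet
```

Passing your own wording as `title` covers the case where the original headline is misleading.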

My URL Shortener

I use Bit.ly as my preferred URL shortener. Here are my Bit.ly stats for the past month:

My Bit.ly Stats Jul 2010

My monthly stats typically range between 500 and 3000.

Though Twitter mostly works in real-time and I live in India, it is interesting that my followers are primarily based in the United States and EU.

Klout Score and Classification

For those who’re interested, my Klout Score varies from 58 to 62.

My Klout Score

According to Klout, I am a “Thought Leader”:

Klout Classification

I usually avoid ending my blog posts with the customary “You can follow me on xyz here” plea, but I will make an exception for this post. So if you’re interested in tech news, do follow @ScepticGeek on Twitter! :)


Matrix: Google Buzz, Twitter Chirp, Facebook F8

This is how the events launching new social network features compared:

Buzz-Chirp-F8

Pretty self-explanatory.


On the first day of the Chirp developer conference, Twitter announced “Annotations”:

The feature will allow developers to “add any arbitrary metadata to any tweet in the system.” So, just like a tweet can today be transmitted along with information about which other tweet it was in reply to, or what location it came from, or what application it was created on, now Twitter will allow developers to make up new stuff. Twitter is looking to see how developers use Annotations before it creates any sort of taxonomy for them, Sarver said.

Creative Possibilities

What can such metadata include? Apart from the obvious ones, let us consider possibilities:

  • The number of retweets and faves could be metadata
  • Apps could use plugins to add an “influence-rank” to all your tweets, like your Klout score
  • Apps could let you specify your Google Profile URL or Facebook URL and add that as metadata to your tweets
  • Apps may move all links from your tweets to the metadata section, leaving you the full 140 characters for plain text
  • Apps may move all media attachment links (pics/videos) to the metadata section
  • Number of your followers, number of lists you are a member of, can be metadata for your tweets

Using these, apps can come up with interesting filters that increase relevance for my Twitter experience:

  • Show me tweets from users above an influence-rank threshold
  • Show me tweets from users who have at least x followers or x list memberships
  • Show me tweets from a specific geo-location
  • Only show me tweets that contain links or pics or videos
  • There can be interesting mashups and visualizations based on such metadata.

As is apparent from these examples off the top of my head, there are lots of creative possibilities.
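Here is a minimal Python sketch of how an app might apply such filters once tweets carry metadata. Twitter has not published a format for Annotations in the announcement quoted above, so the dictionary layout and every key name (`annotations`, `influence_rank`, `followers`, `links`, `media`) are assumptions for illustration only.

```python
from typing import Optional

def passes(tweet: dict,
           min_influence: Optional[int] = None,
           min_followers: Optional[int] = None,
           require_media: bool = False) -> bool:
    """Filter a tweet on hypothetical annotation keys; missing keys count as 0 or empty."""
    meta = tweet.get("annotations", {})
    if min_influence is not None and meta.get("influence_rank", 0) < min_influence:
        return False
    if min_followers is not None and meta.get("followers", 0) < min_followers:
        return False
    if require_media and not (meta.get("links") or meta.get("media")):
        return False
    return True

# Example: a tweet annotated by some hypothetical client app.
example = {
    "text": "Chirp keynote is live",
    "annotations": {
        "influence_rank": 60,
        "followers": 1200,
        "links": ["http://example.com/keynote"],
    },
}
```

Note that the filter only works if every client writes the same keys, which is exactly the fragmentation risk discussed in the next section.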

The Problem

Annotations will be app-specific. Annotations devised by Tweetdeck will be incomprehensible to Seesmic and vice-versa. There is potential for vast fragmentation here, in the absence of a uniform taxonomy defined by Twitter.

I expect Twitter will wait to see what developers come up with and then absorb the best innovations in its native implementations. In the meantime, Annotations will increase “stickiness” of specific Twitter apps and may be used to lock-in users to certain apps.

Is this a good move on the part of Twitter? I don’t know. But in the absence of guidance from Twitter, this is a free-for-all that will hinder seamless interoperability between different Twitter clients, which may not be good for the ecosystem as a whole.


Google Buzz + Reader + Twitter + Facebook = Noise

I’m having a hard time deciding whom to follow on which network with duplicate shares everywhere. The problem is compounded further by folks who auto-share from one network to another. There is no value in following people who share the same thing on Reader, Buzz, Twitter, Facebook, and so on. Duplication simply amplifies noise and reduces signal.

This is a real problem with social media today. Everyone wants maximum likes, shares, retweets on each and every thing they share. Their hope, understandably, is that each morsel they throw into social media becomes a feast on which everyone will drool.

Well, count me out. If someone is auto-feeding the same thing on all networks, it doesn’t add any value to me to follow them on all networks. Especially if they are not engaging in conversation where their content is landing.

I have written before about why I do not use auto-tweeting tools like Reader2Twitter, because I take as much effort as possible to attribute my sources. If you are using such tools, it makes sense to auto-tweet to a different Twitter account, like some folks do. This gives your followers the choice whether to follow you on Reader or Twitter.

Enter Buzz and FriendFeed and Facebook. Each of these is capable of pulling items from multiple sources for each person. FriendFeed can further be imported into Facebook and Buzz. This is not just aggregation, it is super-aggregation or aggregation-squared. This amplifies signals to such enormous proportions that all this noise is deafening.

Each of my shares on Twitter, Reader, and Facebook is hand-picked and manual. It takes extra effort but I believe it adds value to those who follow me. I am happy not being a social media superstar with thousands of followers if even a single person likes a single share of mine in a day. My value is not in the number of retweets, number of likes, etc., but in the feedback I get from even a single @reply or comment.

None of the companies behind these social networks is working with the others to design better filters for all of us. Each simply wants us to use it exclusively. There lies the problem. We hop on to each new social network bandwagon, immediately discover tools that allow us to auto-share and auto-propagate our shared content down stream, up stream, cross stream, life stream, etc., ultimately drowning our followers in the flood.

I am skeptical that this problem will go away soon. As a curator, this is a challenge. The only way I see to successfully filter the signal out of this noise is to be brutal in curating sources. Auto-sharers, auto-tweeters, auto-feeders, or whatever these tools are called, will be the first on my radar as likely candidates to be unfollowed.

As a follower, I am a human. When you auto-share, you’re not a human on that network, you turn into a bot. Bots are what we call spam.


New RTs with @Replies and Localized Trending Topics

This post is a collection of small observations that may not be individually “post-worthy”.

New Style ReTweets with @Replies

We all know that @Replies to you are visible only in the home timeline of those following both you and the sender. Thus you will not see the following tweet unless you were following both @ScepticGeek and @LayeredByte:

ReplyTweet Example

Now, if I do an old style ReTweet by prefixing it with RT as below, my ReTweet is visible to everyone who follows me, even if they don’t follow @LayeredByte.

Old ReTweet Example

But what if I do a new style ReTweet? A new style ReTweet will not prefix anything, and is effectively the same as an @Reply. The question in my mind was:

Are new style ReTweets of @Replies visible to everyone who follows you (and not only to those following both)?

Some quick searching on Google did not yield an answer. Twitter’s help on @Replies and ReTweets does not clarify this, nor does Evan Williams’s post explaining organic RTs. So with the help of my colleague @MadLid, I performed a quick test.

I retweeted her @Reply to me from my @ScepticGeek account, and checked if the new style ReTweet appeared in my @Palsule account from which I was not following her:

New Style Retweet Reply

Voilà! Even if @Palsule is not following @MadLid, her @Reply to @ScepticGeek appeared in @Palsule’s home timeline when @ScepticGeek did a new style ReTweet of her @Reply. :)

If you’re wondering “what’s the big deal?”, there is none. This is what geeks like me who like to experiment and pay attention to detail do. I did not find it documented anywhere, hence doing it here.

Note that this is how RTs should work, and Twitter has implemented them in the correct way. When you ReTweet, you want all your followers to see it, irrespective of whether they’re following the original tweeter or not. Thus, in a way, I am also applauding Twitter’s developers for bypassing the @Reply visibility restriction when they implemented organic RTs.
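The visibility behavior observed in this test can be written down as a small Python function. This is only a sketch of the rules as I observed them, not Twitter's implementation, and the function and parameter names are hypothetical.

```python
from typing import Optional, Set

def visible_in_home_timeline(viewer_follows: Set[str],
                             author: str,
                             reply_to: Optional[str] = None,
                             retweeted_by: Optional[str] = None) -> bool:
    """Decide whether a tweet shows up in a viewer's home timeline."""
    # A new style ReTweet reaches every follower of the retweeter,
    # even when the original tweet was an @Reply.
    if retweeted_by is not None and retweeted_by in viewer_follows:
        return True
    # Otherwise the viewer must follow the author...
    if author not in viewer_follows:
        return False
    # ...and, for an @Reply, the recipient as well.
    return reply_to is None or reply_to in viewer_follows

# The test from this post: @Palsule follows @ScepticGeek but not @MadLid.
palsule_follows = {"ScepticGeek"}
plain_reply = visible_in_home_timeline(
    palsule_follows, author="MadLid", reply_to="ScepticGeek")
retweeted_reply = visible_in_home_timeline(
    palsule_follows, author="MadLid", reply_to="ScepticGeek",
    retweeted_by="ScepticGeek")
```

Here `plain_reply` is `False` and `retweeted_reply` is `True`, matching what appeared in @Palsule’s timeline.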

I also find it amazing that people are already using what is actually a “feature”, without realizing it.

Localized Trending Topics

Last week, Twitter started rolling out localized trends. On November 9th last year, Twitter announced its Trends API. Here is what I had tweeted hours before that happened, while it was still November 8th in the US:

Localized Trending Topics 

Disclosure Policy

Just a note that I have added a disclosure policy on the blog.


At the start of this year, Seesmic bought Ping.fm enabling status updates across 50 social networks. Mark Hopkins elaborated on why this is a threat to Twitter.

Scobleizer talks about Twitter’s declining traffic and offers suggestions for improvement, which people commenting on the post say would turn Twitter into FriendFeed/Facebook.

Seesmic’s Ping.fm acquisition had led me to wonder if that makes it a perfect candidate for a Twitter acquisition. Marshall Kirkpatrick seemed to agree.

MarshallK Retweet

Would it make sense for Twitter to acquire Seesmic and Ping.fm?

Does Twitter want to build its own social network and fight against Facebook? Contrary to what you might think, Evan Williams says Twitter is not a social network.

Twitter’s strategy is to be the “Pulse of the Planet”. What better way to become that pulse than be the conduit that people use across 50 social networks? This would bolster Jack Dorsey’s vision of Twitter’s success as Twitter becoming infrastructure.

When the goal of a service is to become the nervous system of the real-time web, the traffic to its website doesn’t matter. The pulse of the online world lies in status updates people make on various social networks. I am sure that Seesmic, with Ping.fm’s half a million users, looks a very attractive option for Twitter to grab that pulse.

The scenario can look gloomy for the open web, with the social graph of users in the hands of Facebook, and real-time pulse in the hands of Twitter.


Google Reader vs. Twitter for Discovery and Sharing

I started using Google Reader and Twitter for discovering and sharing content at roughly the same time in April last year. I share and tweet almost exactly the same content. After about 8 months, I have over 1100 followers on Twitter vs. 133 on Google Reader.

How do these two stack up against each other from a discovery and sharing perspective? As a newcomer to the social web, my experience can be illustrative of any new user of these services.

A Glance At Follower Stats

In less than a year, I have 1100+ followers and am on 130+ Twitter Lists. Neither did the #FollowFriday or @MrTweet recommendations I received lead to any increase in followers, nor did being a Techmeme Editor lead to any surge in followers. My Twitter following has increased organically, steadily, without any positive disruptive event, like a recommendation by an influencer in any blog post or tweet.

TwitterCounter ScepticGeek Followers

On the other hand, I have promoted my Google Reader shares on my blog in a sidebar widget, tweeted and written about Google Reader often, commented on other blog posts discussing Google Reader, and shared my Reader feed on FriendFeed earlier. The only positive disruptive event that increased my Google Reader following was when I was recommended by Holden on TechGeist (the blog is no longer active).

Google Reader Sharing Stats

Geographical Perspective

The people I follow on both networks are in US/EU. Even though I live in India, most of the people following me on both networks are also from US/EU. I overcome the local limits of real-time by using Google Reader for discovery. One may expect my Twitter following to be more local and my Google Reader following to be global, but interestingly, this is not the case.

Twitter Follower Geography

I can tell from my engagement and Retweets on Twitter that my audience is largely global and not local.

The Discovery Angle

There are two aspects of discovery: content and people. It is clear that Google Reader remains a great tool for discovery of content, especially for a non-US/EU person like me. However, Google Reader sucks at discovering great people to follow.

It is easier to find the Twitter profile of a person with a search on Google, than finding his or her Google Profile. FriendFeed remains the best bet for finding Google Reader profiles.

You can easily crawl an influencer’s network on Twitter and use Twitter Lists for discovering great people to follow. Ever tried finding out who an influencer is following on Google Reader?

The Sharing Angle

My shares on Twitter get retweeted and often lead to conversation. Some kind folks practice thanksgiving via attribution when they tweet content they discovered via my Google Reader shares. Both these lead to psychological payback on Twitter in terms of increased followers, mentions, and list memberships.

On Google Reader, my shares disappear into a black hole. I never know when my share was re-shared by others. These re-shares also appear on other users’ FriendFeed and Twitter accounts without any attribution to the curator. Sharing on Google Reader has virtually zero psychological payback, unless you are an established tech celebrity.

Closing Thoughts

Google Reader was designed as a personal RSS feed reader, and social features were added as an afterthought. I was always a skeptic of claims that Google Reader will replace FriendFeed. Google lacks a social network of people, and prefers taking an algorithmic approach to social relevancy. There is no psychological payback for sharing on Google Reader because fundamentally, Google perceives you not as a person, but as a data element whose shares can be indexed and ranked. Is this a reflection of Google’s engineers lacking emotional intelligence or simply a technical limitation? That said, it still remains a great tool for discovery of content, because of RSS.

As a result of all this, I see the influencer-ordinary user pyramid on Google Reader remaining more or less the same in the years to come. There will be a few tech influencers who will get engagement and drive traffic via Google Reader, but its opaque approach to social networking will remain its Achilles’ Heel for ordinary users. This weakness has even given rise to parallel feed-based social networks like Toluu and PostRank. The fundamental problem of monetization of Google Reader also persists.

What this also means is apps and services that use RSS and relevancy algorithms for discovery and ease sharing of content to other social networks (Twitter & Facebook) are well-positioned to diminish Google Reader’s dominance of the feed reader market. Apps like LazyFeed, my6sense, and RSSOwl are some examples. In my opinion, it would be a good strategy for apps like Feedly to disassociate themselves from the Google Reader platform.

Twitter is a great tool for discovery of content, and its transparency makes it a unique tool for discovery of people. This means that the influencer pyramid on Twitter is constantly evolving, unlike Google Reader. Lastly, Twitter rules over Google Reader when it comes to payback for sharing.
