Thursday 9 December 2010

Twitter, just show us the algorithm - please?

After all the various stories about the hashtag #wikileaks not appearing as a Twitter trending topic, and whether that was down to plain Twitter censorship, some magical (neither quantified nor explained) algorithm, or the hand of God, I decided to take some measurements myself

The basis for that is Twuniverse, a collection of services that operate directly on top of the Twitter API [disclosure: Twuniverse is one of my subsidiaries]

I compared the appearance of the hashtag #wikileaks with that of #leweb, the Paris Internet event. Every half hour to an hour I measured the frequency of both tags and their position as a Twitter Trending Topic (T3). Each time I took 1,000 tweets per hashtag, which gave me between 15 and 30 minutes' worth of tweets, depending on the hashtag: that is 25% to 50% coverage
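The measurement described above can be sketched roughly as follows. This is not Twuniverse's actual code: the function names `tweets_per_minute` and `coverage` are hypothetical, and it assumes the timestamps of the sampled tweets are already available as `datetime` values.

```python
from datetime import datetime, timedelta


def tweets_per_minute(timestamps):
    """Return (rate in tweets per minute, minutes spanned) for one sample.

    Assumption: the rate is the number of gaps between tweets divided by
    the time between the oldest and the newest tweet in the sample.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    span = max(timestamps) - min(timestamps)
    minutes = span.total_seconds() / 60
    if minutes == 0:
        raise ValueError("all timestamps are identical")
    return (len(timestamps) - 1) / minutes, minutes


def coverage(span_minutes, interval_minutes):
    """Fraction of the measuring interval that the sample covers (capped at 1)."""
    return min(1.0, span_minutes / interval_minutes)


if __name__ == "__main__":
    # Made-up sample: 1,000 tweets, one every 1.5 seconds.
    start = datetime(2010, 12, 9, 12, 0, 0)
    sample = [start + timedelta(seconds=1.5 * i) for i in range(1000)]
    rate, minutes = tweets_per_minute(sample)
    print(f"{rate:.1f} tpm over {minutes:.1f} minutes")
    print(f"coverage of an hourly measurement: {coverage(minutes, 60):.0%}")
```

With this reading, a sample that spans 15 minutes covers 25% of an hourly measurement and 50% of a half-hourly one.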

What I found out:
  • Justin Bieber does trend mega. Not as #justin, #bieber or #justinbieber, but just as justin bieber or either one of those words. What is mega? 1 to 3 tps (tweets per second), which extrapolates to a maximum of roughly 250,000 tweets per day
  • It takes roughly 10 tweets per minute (tpm) to make it as a trending topic, and double or triple that, or even more, when the US is fully awake coast-to-coast. No exact science there: sometimes topics make it in at the very bottom of the list with 15 tpm when the next one up is at 30 to 35 tpm
  • There seems to be an algorithm that determines which topics appear in T3, and which don't. I can tell you that #FB and #FF don't make it to T3, just as Justin Bieber doesn't (anymore)
  • There is also some kind of delay (interval) or history involved. Dreamforce, #df10, was getting 25-28 tpm at the end of today, well over the 15 tpm the last T3 got. But it didn't make it back into T3
  • According to Mashable, Twitter created their algorithm in May 2010 to 'get rid of Bieber'. What that algorithm is, or how it works, can't be found anywhere

What happened in 7.5 hours of monitoring both #leweb and #wikileaks (times are in UTC / GMT):

(Remember, tpm means Tweets Per Minute, whereas tps means Tweets Per Second. Twuniverse automatically switches to the smaller unit of time as soon as the rate in that unit is greater than one)
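As an illustration of that unit switch, here is a minimal sketch. It is my reading of the rule, not Twuniverse's actual code; `format_rate` and `tweets_per_day` are hypothetical names.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def format_rate(tweets, seconds):
    """Show a rate as tps when that exceeds one tweet per second, else as tpm."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    per_second = tweets / seconds
    if per_second > 1:
        return f"{per_second:.1f} tps"
    return f"{per_second * 60:.1f} tpm"


def tweets_per_day(tps):
    """Extrapolate a constant tweets-per-second rate to a full day."""
    return tps * SECONDS_PER_DAY


if __name__ == "__main__":
    print(format_rate(52, 60))    # 52 tweets in one minute stays in tpm
    print(format_rate(144, 60))   # 144 tweets in one minute switches to tps
    print(tweets_per_day(3))      # 3 tps sustained for 24 hours
```

Extrapolating 3 tps this way gives 259,200 tweets per day, which is where a maximum of roughly 250,000 comes from.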

During all this time, #leweb was T3 number one, and #wikileaks wasn't even on the list. Here come the exceptions to that:

At 15:17, #leweb moved to T3 number 2, whereas before it was number one - I'm ignoring the fact that there was a Promoted topic at no. 1 all the time, and just pretending that wasn't there; life's complicated enough as it is, and it shouldn't influence this exercise at all to begin with

At 17:25, #leweb disappeared from T3 entirely. For a Parisian event, #leweb lasted well beyond closing time, but by then it was below 20 tpm

What was really surprising:

Meanwhile, it's clear that, given the volume, #wikileaks should have made it to T3 easily - but it didn't
Then again, neither #FB, with 52 tpm, nor #FF, with 38 tpm, made it into T3 - and neither did bieber, with 2.4 tps

What would end all debates:

It's clear that there is an extra force or two active in the T3 sphere. Promoted topics are one, of course, and the algorithm is another - and maybe there are others as well.
There is one easy conclusion that is beyond debate, however: the system, as it is, is working - meaning that there is code that runs and controls all this

I'm not interested in the workings of Promoted Topics - that's most likely a fully manual process, fairly straightforward and undisputed. But I really would like to know how the magical algorithm works, and I don't see any harm in us getting to know that.
Where there's code, there's a defined algorithm - and Twitter would end all debates by just telling us what it is

So, Twitter, please come forward and play this out in the open - no need to hide, is there?
