Fascinating article in Cornell University's Chronicle Online, showing how
memes shift around a particular main story (in this case Sarah Palin's run for Vice President). They note that:
The ideal, Kleinberg said, would be to track "memes," or ideas, through cyberspace, but deciding what an article is about is still a major challenge for computing. The researchers sidestepped that obstacle by tracking quotations that appear in news stories, since quotes remain fairly consistent even though the overall story may be presented in very different ways by different writers.
Even quotes may change slightly or "mutate" as they pass from one article to another, so the researchers developed an algorithm that could identify and group similar but slightly different phrases. In simple terms, the computer identified short phrases that were part of longer phrases, using those connections to create "phrase clusters." Then they tracked the volume of posts in each phrase cluster over time. In the August and September data they found threads rising and falling on a more or less weekly basis, with major peaks corresponding to the Democratic and Republican conventions, the "lipstick on a pig" discussion, rising concern over the financial crisis and discussions of a bailout plan.
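To make the "phrase cluster" idea concrete, here is a minimal Python sketch of the simple-terms version described above: two phrases are linked when one appears word-for-word inside the other, and linked phrases end up in the same cluster. This is only an illustration, not the researchers' actual algorithm. The function names (`is_contained`, `cluster_phrases`), the whitespace tokenising and the union-find grouping are all assumptions of ours, and the sample phrases are made up.

```python
from collections import defaultdict


def is_contained(short, long):
    """True if the word list `short` appears as a contiguous run inside `long`."""
    n, m = len(short), len(long)
    return n <= m and any(long[i:i + n] == short for i in range(m - n + 1))


def cluster_phrases(phrases):
    """Group phrases that are linked by containment (hypothetical helper).

    Assumption: phrases are compared as lowercased, whitespace-split words,
    so punctuation and near-miss wordings are not handled.
    """
    words = [p.lower().split() for p in phrases]
    parent = list(range(len(phrases)))  # union-find forest over phrase indices

    def find(i):
        # Follow parent links to the cluster root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(phrases)):
        for j in range(len(phrases)):
            if i != j and is_contained(words[i], words[j]):
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = defaultdict(list)
    for i, phrase in enumerate(phrases):
        clusters[find(i)].append(phrase)
    return sorted(sorted(group) for group in clusters.values())


if __name__ == "__main__":
    # Made-up sample phrases, for illustration only.
    sample = [
        "lipstick on a pig",
        "you can put lipstick on a pig",
        "put lipstick on a pig but it is still a pig",
        "a bailout plan",
        "agree on a bailout plan",
    ]
    for group in cluster_phrases(sample):
        print(group)
```

Counting how many posts fall into each cluster per day would then give the volume-over-time curves the article talks about.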
Note that the blogosphere feeds off the mainstream media, in the main, which raises the question of
what happens if it kills its host....
The slow rise of a new story in the mainstream, the researchers suggest, results from imitation -- as more sites carried a story, other sites were more likely to pick it up. But the life of a story is limited, as new stories quickly push out the old. A mathematical model based on the interaction of imitation and recency predicted the pattern fairly well, the researchers said, while predictions based on either imitation or recency alone couldn't come close.
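The interplay of imitation and recency is easy to play with in a toy simulation. The Python sketch below is our own guess at the general shape of such a model, not the researchers' published one: at each time step a new story appears, and each of a fixed number of sources picks a story with probability proportional to (items already written about it) x (a decay factor for its age). The function name `simulate`, the geometric decay and every parameter value are assumptions chosen purely for illustration.

```python
import random


def simulate(steps=200, sources=50, decay=0.8, seed=0):
    """Toy imitation-plus-recency model (hypothetical, for illustration only).

    Returns a list `history` where history[t][j] is the number of items
    written about story j at time step t.
    """
    rng = random.Random(seed)
    counts = []   # counts[j]: total items about story j so far (imitation)
    births = []   # births[j]: time step at which story j first appeared
    history = []

    for t in range(steps):
        # One new story enters at every step, seeded with a single item.
        counts.append(1)
        births.append(t)

        # Weight = imitation (popularity so far) * recency (decay ** age).
        weights = [c * decay ** (t - b) for c, b in zip(counts, births)]

        # Every source writes about one story, drawn according to the weights.
        picks = rng.choices(range(len(counts)), weights=weights, k=sources)

        step = [0] * len(counts)
        for j in picks:
            step[j] += 1
        for j, n in enumerate(step):
            counts[j] += n
        history.append(step)

    return history


if __name__ == "__main__":
    history = simulate()
    # Volume over time for the story born at step 10 (zero before its birth).
    story = 10
    volume = [step[story] if story < len(step) else 0 for step in history]
    print(volume[:40])
```

Setting `decay=1.0` leaves imitation alone, while replacing the count `c` with a constant leaves recency alone, which makes it easy to compare the curves each ingredient produces on its own.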
Watching how stories moved between mainstream media and blogs revealed a sharp dip and rise the researchers described as a "heartbeat." When a story first appears, there is a small rise in activity in both spheres; as mainstream activity increases, the proportion blogs contribute becomes small; but soon the blog activity shoots up, peaking an average of 2.5 hours after the mainstream peak. Almost all stories started in the mainstream. Only 3.5 percent of the stories tracked appeared first dominantly in the blogosphere and then moved to the mainstream.
Fascinating to us, anyway. About 2 years ago we
built a similar device for the BBC Innovation Labs (which was then accepted for further development). It does a very similar job, but we haven't had the time to do this level of analysis; broadly speaking, though, the results look familiar. We did one extra thing, which was to track different news outlets to see how the story differed over time (the Memetic Difference Engine bit). That is in its own way as interesting as the development of a story.