Tech community chatter is rife with blather about the latest Technorati report from David Sifry. The stats that Technorati collects are useful, but they are not the kind of numbers that justify the Level! of! Interest! that many techie-type bloggers are raving about.
According to Sifry, the numbers show that 9% of all blogs are spam. This is assuredly a low estimate. Any number of relevant test searches can verify that. Try a search related to some popular tech topic and see if you don't get a return of 50 to 60% spam blogs or duplicitous phantom search sites in the top 100 items returned. If you get 20% bunkum, you're doing pretty well. The telling statistic is the number of blogs that are still being updated 3 months after initial tracking. Only 55% of all blogs are active after that time period. So forget the "there's more than 35 million blogs!" statistic.
What about the contention that 3.9 million bloggers are updating at least weekly? If, as I surmise, a spam or junk blog wants to be perceived as "relevant", it will be sure to update on a timely basis. My searches are turning up a preponderance of junk or auto-generated nonsense across a range of engines. Services like PubSub have become inundated with splog traffic and are becoming increasingly ineffective. Even if all 3.9 million of the weekly updaters were legitimate humans posting original material, the vast majority of that stuff is oriented toward either:
- banter ("yor hairdoo is kewl")
- rant ("politician so-and-so is worse than Hitler...")
- innuendo (vox populi gossip, see also "banter" above)
- repetition (check out all the "me too!" posts on tech.memeorandum.com, heck even this post may qualify...)
What statistics from the universe of collections cannot capture is relevance. For instance, it is purported that 60% of all baseball cards created -- ever -- have been created since 1999. What does that mean for collectors? Anything from that latter period is of virtually no significance whatsoever. Volume of content does not determine its value. The fact that technorati.com can slice and dice words is significant in establishing a type of filter for assessment of site content, but it fails as a measure of quality. Blog posts highlighting how many blogs there may be succeed only in highlighting that metrics cannot trump intellectual assessment. Throw all the science of algorithms you want at a pile of words; the art of those words can only be assessed by sentient readers.
In the blogging world, most words posted are of absolutely no value whatsoever. Most blogging content makes even horoscopes look like significant literature. Pick 1000 blogs at random and verify for yourself whether there is anything worth monitoring on a regular basis. Your final list will differ from mine but I am confident that we would agree that 90% of the sites are completely useless. That would leave us with a subset of 100 blogs from which each of us would find no more than 10 that are worth following and only 1 of which merits a regular read. Only a bare fraction of all web sites that purport to be legitimate blogs contain useful, relevant, literate and cogent writing.
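For anyone who wants to see that winnowing as plain arithmetic, here is a minimal sketch. The function name `funnel` and the choice to express each cut as a whole-number percentage are my own assumptions for illustration; the figures (1000 blogs, 90% discarded, 10 of 100 worth following, 1 of 10 a regular read) are the guesses from the paragraph above, not measured data.

```python
def funnel(sample_size, keep_percents):
    """Apply successive 'keep' percentages to a sample and return the counts.

    keep_percents holds whole-number percentages (hypothetical inputs),
    so integer arithmetic avoids floating-point rounding surprises.
    """
    counts = [sample_size]
    for pct in keep_percents:
        # Keep pct% of whatever survived the previous cut.
        counts.append(counts[-1] * pct // 100)
    return counts


# 1000 random blogs -> keep 10% (90% useless) -> keep 10% worth following
# -> keep 10% that merit a regular read.
stages = funnel(1000, [10, 10, 10])
print(stages)  # [1000, 100, 10, 1]
```

Put differently, by these rough guesses about 1 blog in 1000 merits a regular read. Change the percentages to match your own tolerance and the shape of the funnel stays the same.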
What about truth? Clearly a spam blog is not intended to reflect any kind of relative truth; its sole purpose is to deceive. Can an earnest writer be factually wrong and still be readable? Of course, but a reader's tolerance is measured only by a preponderance of perceived truth over a collection of writings. Not much takes place in the blog universe that passes for critical discussion. Individuals tend to smear rather than discourse, to vent rather than reflect. Good criticism is often received with the sensitivity of a 5-year-old.
Most readers search out the comfort of an opinion that matches their own rather than seek assurance in the continuum of reflected thought. That does not mean that we should not be willing to make immediate and profound value judgements regarding things like hate speech, nonsense, trivia or content devised to entrap. Rather, one should also assess voices -- thoughtful voices -- that reflect a range of opinion. If there were any sort of community inherent in the blog space, surely it would lead us to question, reason and discuss. Sadly, most of us are more familiar with a flame than a warm glow.

Comments
Splogs and Relevance
Hi Bradford:
Finding "relevance" in 25 million blogs is not easy. I've found, however, that if you pose a query string that is narrow, yet not so narrow that you miss important information, the signal-to-noise ratio will shift in favor of the signal. At PubSub, we have extensive boolean logic built into the system (see: http://www.pubsub.com/booleanhelp.php), which is very useful in narrowing down queries.
Regarding splogs, we are diligently working on the issue. In fact, over the past few weeks, we have made a lot of strides in this area. Our users are very important to us, which is why we're attacking this issue with ferocity.
Regards
Steven Cohen
PubSub Concepts, Inc
scohen@pubsub.com