Congrats to Tony Conrad and Sphere for landing a deal with Time Magazine. I like the idea of getting away from link cosmos and going with something like content semantic analysis. I have been a fan of Tony’s ideas since I saw him speak of his Bmodel at SXSW. Anytime you can land a distribution partner like Time when you have four employees, you are doing good business.

Susan Mernit made the following statement about Sphere’s new deal:

Very cool, but it makes me wonder why Sphere was successful in pulling this off when Technorati, Feedster, and probably every other start-up in blogosphere land has been pitching these guys (and every other big media outlet) since late 2004.

At Feedster, we did pull this off, and in fact, on a much larger scale than Time Magazine. In the end it comes down to the product, and only time will tell if the mainstream is ready for a ‘Sphere It’ product. I can see this product being built out into a dashboard: an editor could have the tool on their backend and quickly go through and check the blogs they want around their articles. This could provide immediate ‘conversation on news’ around their story, and all links could then open an additional page (in this case on Time Magazine) with a whole page of results, maybe tied into a dynamic blog community with other posts (headlines and excerpts), podcasts, etc. That would be a huge channel of blog content that makes all parties involved (publisher, reader, blogger) happy.

Right now, as far as I can see, ‘Sphere It’ is great technology, but it is not bringing Time any immediate additional page views. That is not a sustainable business model for Time, and not a good selling point for Sphere, considering the last thing most old media properties want is people jumping out of their walled gardens. (Golden Rule: old media has the money, they set the rules.)

Feedburner as a Meme Aggregator/Content Syndication Service

Another place a technology like ‘Sphere It’ could get interesting is with a partner like Feedburner. Feedburner will soon have 300,000 feeds in its index. No one has a deeper understanding of their index (publishers and their readers) or can give better on-demand analytics. So imagine if you had a content semantic tool like ‘Sphere It’ with a dynamic, spam-free, and smart index like Feedburner. And what if Feedburner allowed people to upload their ‘blog icon’ and develop an identity around their feed, à la ‘claim your blog’ on Technorati? (I think this is one issue that Dave Winer is starting to get worried about when he talks about the centralization of so many feeds, and maybe rightly so. But I don’t think we can blame Feedburner for being the brilliant technology and marketing company that put the ‘Really Simple Syndication’ in Feeds.) This new smart index could follow memes and could provide ‘conversation around news’ on demand by popularity/relevance (basing this off the number of times that the feed has been called, links, etc.), or by timeliness (who was the last person talking about X). Who needs an index of 40 million, especially if you can’t really make sense of it? Hell, turn on MySpace and some of these other “blog” hosted sites and you could have an index of 200 million blogs and spend all day monitoring your ping servers. (With 80% of blog comments being spam, what percentage of your typical blog index do you think is spam? How many are dead/irrelevant?) If the goal is to provide relevant/JIT syndicated conversations across a major news site, there is no need to draw from the millions. Feedburner’s index will continue to grow relative to blogging, and the demographics of the index will continue to look more like the overall blogger demographic. Since they provide so many good tools and so many of their users spend a lot of time on their site, setting up profiles would be nothing more than adding a tab to the user’s dashboard.
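
To make the popularity-versus-timeliness idea concrete, here is a minimal sketch of how posts in such an index might be ordered. Everything in it is an assumption for illustration: the `Post` fields, the sample data, and the scoring formula (feed requests plus weighted links) are hypothetical and are not anything Feedburner or Sphere has described.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    feed: str            # hypothetical feed name
    title: str
    feed_hits: int       # times the feed has been requested (assumed metric)
    inbound_links: int   # links pointing at the post (assumed metric)
    published: datetime


def popularity(post: Post, link_weight: int = 10) -> int:
    # Assumed scoring: feed requests plus weighted inbound links.
    # The weight of 10 is arbitrary and only for illustration.
    return post.feed_hits + link_weight * post.inbound_links


def rank(posts, by="popularity"):
    # Return a new list ordered by the chosen criterion, best first.
    if by == "popularity":
        return sorted(posts, key=popularity, reverse=True)
    if by == "timeliness":
        return sorted(posts, key=lambda p: p.published, reverse=True)
    raise ValueError(f"unknown ranking: {by}")


# Made-up sample data.
posts = [
    Post("feed-a", "Early take", 500, 2, datetime(2006, 5, 10, 9, 0)),
    Post("feed-b", "Widely linked", 300, 40, datetime(2006, 5, 11, 9, 0)),
    Post("feed-c", "Latest word", 50, 0, datetime(2006, 5, 12, 9, 0)),
]

by_popularity = [p.title for p in rank(posts)]
by_timeliness = [p.title for p in rank(posts, by="timeliness")]
```

With this sample data the two orderings differ: the widely linked post leads on popularity, while the most recent post leads on timeliness. A real service would presumably blend both signals with topical relevance, which this sketch leaves out.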

Where this feature could get even more interesting is in licensing the content to third parties, like BlogBurst is trying to do. Feedburner already has the smart index on publishers and the platform set up for payment to a publisher. This could just be an additional feature that a publisher could turn on if they wanted to, and Feedburner could offer the content up in an auction or license style manner to major content providers. Feedburner could become the content marketplace like Google currently is for advertising. :)

Blog Search is no longer Search

Anyone else notice that Technorati is looking less and less like search and more like a blog directory/community? I think this is because JIT (Just In Time) search is dead from a Bmodel perspective. The conventional approach, typing text into a query box and pressing the ‘search’ button, doesn’t really work with blogs. Do you get up in the morning and type Apple into a search box, or do you just look at the news portals to see if anything important has happened? Look at Google News, syndicated on-demand news (and blogs will come), and their new ‘suggest’ feature. People don’t want to search for what just happened; they want it given to them. That is why Technorati seems to be transforming into a blog directory/portal that provides multiple services and is built on top of a community.

In the end, big media thinks they get this “MySpace or CGC thing”; they have the ad demand but need the cheap additional pages of content. Bloggers want distribution and exposure (Esther Dyson’s thoughts on Attention), and the technology service providers need to get paid or bought before Web 2.0 comes under the laws of gravity.


8 Responses to “Sphere It! Service, Search Engines, Blog Syndication, and Blah”  

  1. Marshall Kirkpatrick

    I don’t get up in the morning and type anything into a search engine, no, but I do check my feeds first thing, many of which are blog searches. For that I’m very interested in rapid discovery and time based information.

  2. James

    I agree with you Marshall, but you are an advanced techie, not the average internet browser. This post was more about showing how these companies are going to make money right now, and the current business behind consumers searching and subscribing is worth next to nothing.

    Trying to subscribe to a search on Technorati is nearly impossible. Why? Because they don’t want you to. They want you coming back to the site, uploading your OPML (favorites?) there, and having your “watchlists” on site.

    I think that Technorati’s newest announcement this morning is spot-on to where their product is shifting. They can sell syndication to content portals or the JIT alerts/communication relationship to the enterprise, like they are doing with Edelman, but for the average internet browser the play looks to be more and more like a blog directory/community.

  3. Scott Rafer

    Four points on the above:

    1. What Feedburner has is probably not an index, it’s a directory with no (disclosed) underlying content archive. “Index” typically refers to having some sort of archive of the content and finding some tricky math to organize it. It’s only important to make this point as the lack of an index defines many of Feedburner’s limitations.
    2. There is no reason to believe that Feedburner will be spam-free, or is even spam-free now. If they turn into a complete and economic ecology (or if they organize their publishers into an index, see #1), the spammers will simply start running their feeds through Feedburner.
    3. You and Marshall are dancing around the right JIT search answer from my point of view, but are mixing two issues. Blog portal home pages are likely to be headline-driven a la Google News and the current ‘rati home page. The issue is that a search box is not suitable to setting up search feeds for 95% of the population. What is suitable is unclear so far.
    4. How do you determine that Sphere is even loosely semantic in nature? Tony’s a good marketer, but it’s unlikely in the extreme.

  4. James

    Thanks for the comment Scott. To address your points.

    1. I understand the need to develop the index with some “tricky math”, but I don’t believe the underlying content archive is an issue.

    2. Totally agreed about spam finding any viable source. But Feedburner has a gatekeeper. The ‘index’ would not be formed around unknown pings from ‘naked’ feeds; they have the content centralized (Feedburner feeds only), and the goal would never be to ‘index’, in this case, the whole blogosphere. The ability to discriminate is key here.

    3. Agreed

    4. I am just going off what I have seen and read so far, which is admittedly limited. I guess the person to ask would be Tony. :)

  5. Tony Conrad

    Now Scott - that’s a pointed comment :)

    The Sphere It! bookmarklet is a very robust piece of technology, developed over several years by my two cofounders, Martin Remy and Steve Nieker.

    Sphere It takes advantage of a novel text analysis technology that
    analyzes entire document texts (a news article or blog post in Sphere
    terms, but any text will do; the technology works just as well on
    product descriptions, Word documents, etc.). Each text gets passed
    through a proprietary pipeline that extracts key concepts and themes.
    These key concepts and themes are encoded in a data structure we call
    a Document Genome.

    As the name is meant to suggest, a Document Genome is unique to the
    text from which it was derived. It’s worth emphasizing at this point
    that DGs are not simply keyword extractions. There’s a complex
    analysis pipeline employing a number of both traditional and novel
    text-analysis routines to identify the concepts and themes. The
    tokens of the resulting Document Genome, in contrast to keywords, are
    machine readable abstractions that don’t mean much to the human eye.

    Like the biological data sets they’re named after, Document Genomes
    can be compared for similarity. (I’m 98% chimp; 63% iguana!) We
    generate Document Genomes for every blog post we crawl. When you’re
    looking at a page on the Web and click the Sphere It bookmarklet,
    Sphere grabs the text of the page you’re viewing, generates a
    Document Genome for it on the fly, and compares it to the DGs of the
    blog posts we’ve crawled. The closest matching blog posts are presented.

    In practice, the Sphere It approach to contextual matching is
    fundamentally different from other approaches out there. If you
    compare it to other text-based approaches, like Google’s Similar
    Pages or Yahoo’s Search Related Info, we’re consistently much more
    precise in our matches.

    Compared to other blog-space approaches, like Technorati This, Sphere
    It! gives a wider breadth of results on the topic. Technorati This!
    limits results to only those blog posts that link directly to the
    page you’re viewing. How do you find the posts that were published
    prior to the page you’re reading, but on the same subject? How do you
    find posts discussing similar topics from other sources (you’re
    reading WSJ, they’re reading NYT)? How do you find posts discussing
    the topic but not linking anywhere? Sphere It makes those
    connections. (I’ve covered this in more detail at
    http://sphere.wordpress.com/2006/05/12/week-one-in-the-rearview-mirror/)

  6. Scott Rafer

    Tony, thanks, we’re all sure you believe in what Sphere does and that you guys have something unique and cool. You’ve got too many options to be working on it otherwise. However, the issue was whether Sphere was semantic in the “semantic web” sense [ http://en.wikipedia.org/wiki/Semantic_web ]. Nothing you said seems to address the issue.

    James, (1) it’s theoretically reasonable to only store index information, but it doesn’t usually work in practice for serving a lot of users quickly and with a simple UI. (2) You’ve lost me. What gatekeeper? There’s already a lot of spam in Feedburner. All I have to do is get a bunch of bots to subscribe to the feeds and how are they going to know the difference? Google the feeds.feedburner.com domain for Viagra and guess what? Spam, spam, spam, spam…..

    http://tinyurl.com/j8jdr

  7. Dick Costolo

    Great thread. Scott, you mischaracterize spam within FB. Most of the search results for the query you highlight are from perfectly legitimate feeds that happen to mention Viagra, in posts like “Pfizer is adding radio frequency identification tags to all Viagra sold in the US in an attempt…”, blogs about the pharma industry, blogs commenting on the viagra spam problem, etc.

  1. 1 About Sphere, what am I missing? « SEND IT!!!

