<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Sphere It! Service, Search Engines, Blog Syndication, and Blah</title>
	<atom:link href="http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/</link>
	<description></description>
	<pubDate>Fri, 04 Jul 2008 23:43:14 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: About Sphere, what am I missing? &#171; SEND IT!!!</title>
		<link>http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-13564</link>
		<dc:creator>About Sphere, what am I missing? &#171; SEND IT!!!</dc:creator>
		<pubDate>Fri, 10 Nov 2006 23:44:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-13564</guid>
		<description>[...] In looking around the blogosphere for more discussion on Sphere (using Google of course ;-), I ran into a post by James Gross on May 23rd, 2006, that had a very interesting response in the comments section from Tony Conrad, Sphere&#8217;s CEO, in response to some comments from Scott Rafer. Below is Tony&#8217;s response as I&#8217;d like to focus on some of the things he says here in relation to the issues I&#8217;ve raised above: [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] In looking around the blogosphere for more discussion on Sphere (using Google of course ;-), I ran into a post by James Gross on May 23rd, 2006, that had a very interesting response in the comments section from Tony Conrad, Sphere&#8217;s CEO, in response to some comments from Scott Rafer. Below is Tony&#8217;s response as I&#8217;d like to focus on some of the things he says here in relation to the issues I&#8217;ve raised above: [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dick Costolo</title>
		<link>http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3989</link>
		<dc:creator>Dick Costolo</dc:creator>
		<pubDate>Mon, 29 May 2006 17:30:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3989</guid>
		<description>Great thread. Scott, you mischaracterize spam within FB. Most of the search results for the query you highlight are from perfectly legitimate feeds that happen to mention Viagra, in posts like "Pfizer is adding radio frequency identification tags to all Viagra sold in the US in an attempt...", blogs about the pharma industry, blogs commenting on the Viagra spam problem, etc.</description>
		<content:encoded><![CDATA[<p>Great thread. Scott, you mischaracterize spam within FB. Most of the search results for the query you highlight are from perfectly legitimate feeds that happen to mention Viagra, in posts like &#8220;Pfizer is adding radio frequency identification tags to all Viagra sold in the US in an attempt&#8230;&#8221;, blogs about the pharma industry, blogs commenting on the Viagra spam problem, etc.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scott Rafer</title>
		<link>http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3946</link>
		<dc:creator>Scott Rafer</dc:creator>
		<pubDate>Sat, 27 May 2006 19:19:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3946</guid>
		<description>Tony, thanks, we're all sure you believe in what Sphere does and that you guys have something unique and cool. You've got too many options to be working on it otherwise. However, the issue was whether Sphere was semantic in the "semantic web" sense [ http://en.wikipedia.org/wiki/Semantic_web ]. Nothing you said seems to address the issue.

James, (1) it's theoretically reasonable to only store index information, but it doesn't usually work in practice for serving a lot of users quickly and with a simple UI. (2) You've lost me. What gatekeeper? There's already a lot of spam in Feedburner. All I have to do is get a bunch of bots to subscribe to the feeds and how are they going to know the difference? Google the feeds.feedburner.com domain for Viagra and guess what? Spam, spam, spam, spam.....

http://tinyurl.com/j8jdr</description>
		<content:encoded><![CDATA[<p>Tony, thanks, we&#8217;re all sure you believe in what Sphere does and that you guys have something unique and cool. You&#8217;ve got too many options to be working on it otherwise. However, the issue was whether Sphere was semantic in the &#8220;semantic web&#8221; sense [ <a href="http://en.wikipedia.org/wiki/Semantic_web" rel="nofollow">http://en.wikipedia.org/wiki/Semantic_web</a> ]. Nothing you said seems to address the issue.</p>
<p>James, (1) it&#8217;s theoretically reasonable to only store index information, but it doesn&#8217;t usually work in practice for serving a lot of users quickly and with a simple UI. (2) You&#8217;ve lost me. What gatekeeper? There&#8217;s already a lot of spam in Feedburner. All I have to do is get a bunch of bots to subscribe to the feeds and how are they going to know the difference? Google the feeds.feedburner.com domain for Viagra and guess what? Spam, spam, spam, spam&#8230;..</p>
<p><a href="http://tinyurl.com/j8jdr" rel="nofollow">http://tinyurl.com/j8jdr</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Tony Conrad</title>
		<link>http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3811</link>
		<dc:creator>Tony Conrad</dc:creator>
		<pubDate>Thu, 25 May 2006 17:44:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3811</guid>
		<description>Now Scott - that's a pointed comment :) 

The Sphere It! bookmarklet is a very robust piece of technology, developed over several years by my two cofounders, Martin Remy and Steve Nieker.

Sphere It takes advantage of a novel text analysis technology that analyzes entire document texts (a news article or blog post in Sphere terms, but any text will do; the technology works just as well on product descriptions, Word documents, etc.). Each text gets passed through a proprietary pipeline that extracts key concepts and themes. These key concepts and themes are encoded in a data structure we call a Document Genome.

As the name is meant to suggest, a Document Genome is unique to the text from which it was derived. It's worth emphasizing at this point that DGs are not simply keyword extractions. There's a complex analysis pipeline employing a number of both traditional and novel text-analysis routines to identify the concepts and themes. The tokens of the resulting Document Genome, in contrast to keywords, are machine readable abstractions that don't mean much to the human eye.

Like the biological data sets they're named after, Document Genomes can be compared for similarity. (I'm 98% chimp; 63% iguana!) We generate Document Genomes for every blog post we crawl. When you're looking at a page on the Web and click the Sphere It bookmarklet, Sphere grabs the text of the page you're viewing, generates a Document Genome for it on the fly, and compares it to the DGs of the blog posts we've crawled. The closest matching blog posts are presented.

In practice, the Sphere It approach to contextual matching is fundamentally different from other approaches out there. If you compare it to other text-based approaches, like Google's Similar Pages or Yahoo's Search Related Info, we're consistently much more precise in our matches.

Compared to other blog-space approaches, like Technorati This, Sphere It! gives a wider breadth of results on the topic. Technorati This! limits results to only those blog posts that link directly to the page you're viewing. How do you find the posts that were published prior to the page you're reading, but on the same subject? How do you find posts discussing similar topics from other sources (you're reading WSJ, they're reading NYT)? How do you find posts discussing the topic but not linking anywhere? Sphere It makes those connections. (I've covered this in more detail at http://sphere.wordpress.com/2006/05/12/week-one-in-the-rearview-mirror/)</description>
		<content:encoded><![CDATA[<p>Now Scott - that&#8217;s a pointed comment <img src='http://www.jamesgross.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>The Sphere It! bookmarklet is a very robust piece of technology, developed over several years by my two cofounders, Martin Remy and Steve Nieker.</p>
<p>Sphere It takes advantage of a novel text analysis technology that analyzes entire document texts (a news article or blog post in Sphere terms, but any text will do; the technology works just as well on product descriptions, Word documents, etc.). Each text gets passed through a proprietary pipeline that extracts key concepts and themes. These key concepts and themes are encoded in a data structure we call a Document Genome.</p>
<p>As the name is meant to suggest, a Document Genome is unique to the text from which it was derived. It&#8217;s worth emphasizing at this point that DGs are not simply keyword extractions. There&#8217;s a complex analysis pipeline employing a number of both traditional and novel text-analysis routines to identify the concepts and themes. The tokens of the resulting Document Genome, in contrast to keywords, are machine readable abstractions that don&#8217;t mean much to the human eye.</p>
<p>Like the biological data sets they&#8217;re named after, Document Genomes can be compared for similarity. (I&#8217;m 98% chimp; 63% iguana!) We generate Document Genomes for every blog post we crawl. When you&#8217;re looking at a page on the Web and click the Sphere It bookmarklet, Sphere grabs the text of the page you&#8217;re viewing, generates a Document Genome for it on the fly, and compares it to the DGs of the blog posts we&#8217;ve crawled. The closest matching blog posts are presented.</p>
<p>In practice, the Sphere It approach to contextual matching is fundamentally different from other approaches out there. If you compare it to other text-based approaches, like Google&#8217;s Similar Pages or Yahoo&#8217;s Search Related Info, we&#8217;re consistently much more precise in our matches.</p>
<p>Compared to other blog-space approaches, like Technorati This, Sphere It! gives a wider breadth of results on the topic. Technorati This! limits results to only those blog posts that link directly to the page you&#8217;re viewing. How do you find the posts that were published prior to the page you&#8217;re reading, but on the same subject? How do you find posts discussing similar topics from other sources (you&#8217;re reading WSJ, they&#8217;re reading NYT)? How do you find posts discussing the topic but not linking anywhere? Sphere It makes those connections. (I&#8217;ve covered this in more detail at <a href="http://sphere.wordpress.com/2006/05/12/week-one-in-the-rearview-mirror/" rel="nofollow">http://sphere.wordpress.com/2006/05/12/week-one-in-the-rearview-mirror/</a>)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James</title>
		<link>http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3810</link>
		<dc:creator>James</dc:creator>
		<pubDate>Thu, 25 May 2006 17:37:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3810</guid>
		<description>Thanks for the comment, Scott. To address your points:

1. I understand the need to develop the index with some "tricky math" but I don't believe the underlying content archive is an issue.

2. Totally agreed about spam finding any viable source. But Feedburner has a gatekeeper. The 'index' would not be formed around unknown pings from 'naked' feeds; they have the content centralized (Feedburner feeds only), and the goal would never be to 'index', in this case, the whole blogosphere. The ability to discriminate is key here.

3. Agreed

4. I am just going off what I have seen and read so far, which is admittedly limited. I guess the person to ask would be Tony. :)</description>
		<content:encoded><![CDATA[<p>Thanks for the comment, Scott. To address your points:</p>
<p>1. I understand the need to develop the index with some &#8220;tricky math&#8221; but I don&#8217;t believe the underlying content archive is an issue.</p>
<p>2. Totally agreed about spam finding any viable source. But Feedburner has a gatekeeper. The &#8216;index&#8217; would not be formed around unknown pings from &#8216;naked&#8217; feeds; they have the content centralized (Feedburner feeds only), and the goal would never be to &#8216;index&#8217;, in this case, the whole blogosphere. The ability to discriminate is key here.</p>
<p>3. Agreed</p>
<p>4. I am just going off what I have seen and read so far, which is admittedly limited. I guess the person to ask would be Tony. <img src='http://www.jamesgross.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Scott Rafer</title>
		<link>http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3808</link>
		<dc:creator>Scott Rafer</dc:creator>
		<pubDate>Thu, 25 May 2006 14:46:42 +0000</pubDate>
		<guid isPermaLink="false">http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3808</guid>
		<description>Four points on the above:

1. What Feedburner has is probably not an index, it's a directory with no (disclosed) underlying content archive. "Index" typically refers to having some sort of archive of the content and finding some tricky math to organize it. It's only important to make this point as the lack of an index defines many of Feedburner's limitations.
2. There is no reason to believe that Feedburner will be spam-free, or is even spam-free now. If they turn into a complete and economic ecology (or if they organize their publishers into an index, see #1), the spammers will simply start running their feeds through Feedburner.
3. You and Marshall are dancing around the right JIT search answer from my point of view, but are mixing two issues. Blog portal home pages are likely to be headline-driven a la Google News and the current 'rati home page. The issue is that a search box is not suitable for setting up search feeds for 95% of the population. What is suitable is unclear so far.
4. How do you determine that Sphere is even loosely semantic in nature? Tony's a good marketer, but it's unlikely in the extreme.</description>
		<content:encoded><![CDATA[<p>Four points on the above:</p>
<p>1. What Feedburner has is probably not an index, it&#8217;s a directory with no (disclosed) underlying content archive. &#8220;Index&#8221; typically refers to having some sort of archive of the content and finding some tricky math to organize it. It&#8217;s only important to make this point as the lack of an index defines many of Feedburner&#8217;s limitations.<br />
2. There is no reason to believe that Feedburner will be spam-free, or is even spam-free now. If they turn into a complete and economic ecology (or if they organize their publishers into an index, see #1), the spammers will simply start running their feeds through Feedburner.<br />
3. You and Marshall are dancing around the right JIT search answer from my point of view, but are mixing two issues. Blog portal home pages are likely to be headline-driven a la Google News and the current &#8216;rati home page. The issue is that a search box is not suitable for setting up search feeds for 95% of the population. What is suitable is unclear so far.<br />
4. How do you determine that Sphere is even loosely semantic in nature? Tony&#8217;s a good marketer, but it&#8217;s unlikely in the extreme.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: James</title>
		<link>http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3804</link>
		<dc:creator>James</dc:creator>
		<pubDate>Wed, 24 May 2006 16:01:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3804</guid>
		<description>I agree with you Marshall, but you are an advanced techie, not the average internet browser. This post was more about showing how these companies are going to make money right now, and the current business behind &lt;em&gt;consumers&lt;/em&gt; searching and subscribing is worth next to nothing.

Trying to subscribe to a search on Technorati is nearly impossible. Why? Because they don't want you to. They want you coming back to the site, uploading your OPML (favorites?) there, and having your "watchlists" on site.

I think that &lt;a rel="nofollow" href="http://technorati.com/weblog/2006/05/107.html"&gt;Technorati's newest announcement&lt;/a&gt; this morning is spot-on to where their product is shifting.  They can sell syndication to content portals or the JIT alerts/communication relationship to the enterprise, like they are doing with Edelman, but for the average internet browser the play looks to be more and more like a blog directory/community.</description>
		<content:encoded><![CDATA[<p>I agree with you Marshall, but you are an advanced techie, not the average internet browser. This post was more about showing how these companies are going to make money right now, and the current business behind <em>consumers</em> searching and subscribing is worth next to nothing.</p>
<p>Trying to subscribe to a search on Technorati is nearly impossible. Why? Because they don&#8217;t want you to. They want you coming back to the site, uploading your OPML (favorites?) there, and having your &#8220;watchlists&#8221; on site.</p>
<p>I think that <a rel="nofollow" href="http://technorati.com/weblog/2006/05/107.html">Technorati&#8217;s newest announcement</a> this morning is spot-on to where their product is shifting.  They can sell syndication to content portals or the JIT alerts/communication relationship to the enterprise, like they are doing with Edelman, but for the average internet browser the play looks to be more and more like a blog directory/community.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Marshall Kirkpatrick</title>
		<link>http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3797</link>
		<dc:creator>Marshall Kirkpatrick</dc:creator>
		<pubDate>Wed, 24 May 2006 01:17:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.jamesgross.com/sphere-it-service-search-engines-blog-syndication-and-blah/#comment-3797</guid>
		<description>I don't get up in the morning and type anything into a search engine, no, but I do check my feeds first thing, many of which are blog searches. For that I'm very interested in rapid discovery and time-based information.</description>
		<content:encoded><![CDATA[<p>I don&#8217;t get up in the morning and type anything into a search engine, no, but I do check my feeds first thing, many of which are blog searches. For that I&#8217;m very interested in rapid discovery and time-based information.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
