
Cuil relaunches as 'cpedia.' The results are not pretty. - raganwald
http://cpedia.com/wiki?q=raganwald
======
gojomo
Hidden inside the vaguely markov-chain/chinglish-like nonsense is an
interesting idea: deduplicate the reference-information web.

Even before the recent explosion of spablum (spam/pablum/made-for-adsense)
sites, using the web to learn about something involved reading a lot of
duplicate info. When learning about topic X, you might visit site #1 and learn
A, B, C, D, E. Then, site #2 has B, C, D, F, G. Then, site #3 has A, C, E, G,
H.

Even if H is the key thing you personally wanted to know, by the time you got
to it, you'd encountered most other things 2 or more times, and even the 'gem'
of a result, site #3, was 80% redundant.
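
To make the arithmetic concrete, here is a minimal sketch that replays the three hypothetical sites in reading order and reports what fraction of each one repeats facts already seen. The letter sets are just the example above, and the function name is made up for illustration.

```python
# Facts offered by each site, in the order a reader visits them
# (taken from the A..H example above; purely illustrative).
sites = [
    set("ABCDE"),  # site #1
    set("BCDFG"),  # site #2
    set("ACEGH"),  # site #3
]


def redundancy_by_site(sites):
    """Return, for each site in order, the fraction of its facts already seen."""
    seen = set()
    fractions = []
    for facts in sites:
        repeated = facts & seen          # facts the reader has met before
        fractions.append(len(repeated) / len(facts))
        seen |= facts                    # everything on this page is now known
    return fractions


print(redundancy_by_site(sites))  # [0.0, 0.6, 0.8]
```

Site #3 scores 0.8 because only H is new to the reader by that point.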

A wonderful summary page would have A, B, C, D, E, F, G, and H, all in one
place, properly contextualized. The best Wikipedia articles serve this role
and that's why they're deservedly atop many search results. But the obvious
utility of summary pages has prompted a race to create more of them, by
automated and editorial means. When such attempts fall even a little bit
short, they make the problem worse not better -- polluting the web with more
permutations of the exact same information, hiding the rare gems in more muck.

I'd like to start a project to attack this problem head-on, and would love to
hear from others who might want to collaborate.

------
danieldon
I really don't understand cuil. I feel really bad criticizing someone's hard
work, but this company is not a one man operation; it's a team of smart,
experienced people with tens of millions of dollars of venture capital. They
_must_ be aware of their reputation and that they are the butt of jokes for
exactly the kind of stuff we are now seeing in cpedia. What is the strategy?
Do they think they can just ride out having such a bad reputation while they
try to perfect the service?

~~~
techiferous
"it's a team of smart people with tens of millions of dollars of venture
capital."

This is probably a weakness, not a strength. If cuil were a bootstrapped one-
person operation it would be very easy to cut the cord and move on to
something else.

~~~
stcredzero
Also, it would only be subject to self-delusion and not the feedback loop of
groupthink.

~~~
ahoyhere
And the "us" vs "them" mentality. It's much easier to fall into for smart,
sane people when they're surrounded by other people doing the same thing.
After all, if it's so nuts, why doesn't Joanna / Bob quit? Social proof.

------
jacquesm
What bugs me is how these guys got their funding simply on the off chance that
if you're an ex-googler you might know something that others don't.

Gigablast, duck-duck-go are outperforming cuil where it matters, and they're
getting quite close to being able to supplant google.

Cuil's founders ought to be at least slightly embarrassed by that.

~~~
elblanco
Even the most cursory of testing reveals that the results of this are no good.
I have no idea why they would pull the trigger and release this.

~~~
holygoat
They already did it once (the first time they launched, and Cuil was a POS).
Insane.

------
petewarden
Here's an article from Cuil's blog about the launch:

[http://www.cuil.com/info/blog/2010/04/08/introducing-cpedia-...](http://www.cuil.com/info/blog/2010/04/08/introducing-cpedia-the-automated-encyclopedia)

"I was a stay at home dad, I dropped by Kleiner Perkins to try to get some
money to write a search engine". This _has_ to be an elaborate piece of
satirical performance art.

~~~
jimboyoungblood
Personally I thought the link to "gay sex" in a product announcement was the
kicker. (It was the _only_ link in the article, btw)

~~~
akkartik
And why was it going to cuil rather than cpedia?

------
slig
I searched for "wikipedia" and the results couldn't be worse:
<http://imgur.com/nXKXj.png>

~~~
statictype
Strangely enough, the auto-completion box, when you type 'wikip', shows
www.wikipedia.org complete with Wikipedia logo as the first result.

------
mark_l_watson
This is useful because I thought about something similar: clustering text, and
using my auto-summarization library to extract key bits and then stitch things
together.

Now, I am not going to try. Lesson learned.

~~~
elblanco
This looks like a hobbyist effort built by somebody messing around with too
big of a dataset. Anybody with even a smattering of NLP background could have
predicted this quality of results. Computers are simply too stupid to assemble
coherent narrative summaries on specific topics. They would require a true AI,
and all of the major efforts that _might_ build such a thing (Cyc, Semantic
Web, etc.) have produced very little of value in the area.

------
aheilbut
This would have made a great April Fool's joke.

~~~
nash
<http://cpedia.com/search?q=april+fools+joke>

Not really.

~~~
MikeCapone
Totally surrealistic! Reminds me of the Cuil Theory poster from that guy on
Reddit:

[http://www.reddit.com/r/blog/comments/a5byc/interrobang_your...](http://www.reddit.com/r/blog/comments/a5byc/interrobang_your_wall_with_this_new_cuil_theory/)

------
techiferous
There is a potential for misquoting people. Read the last line of the summary
section here: <http://cpedia.com/search?q=railsconf>

Read literally, Paul Krill was the author of the entire article after the
summary.

Because this content is pieced together by computers who have no idea of how
the resulting piece reads as a whole, I wonder if there are hidden legal
liabilities when words get stuffed into people's mouths or when the
combinations of sections lead to false content (imagine one paragraph ending
with "and found the following to be completely false:").

EDIT: Here's an article misquoting a lawyer. Not smart. ;)
<http://www.cpedia.com/wiki?q=palin#headline_31>

------
byrneseyeview
Demand Media must be relieved. Despite all the jokes, Cuil's team has a very
impressive background. But they've just made it clear that the cheapest way to
create prose online is still to pay people $10-$15 per article.

------
raquo
Wrap this up in Adsense and you get Mahalo 2.0

~~~
slig
You joke, but it's not that far: <http://cpedia.com/robots.txt> — No disallow
on /wiki

Except, maybe, that mahalo's "content" is far better scraped.

~~~
neilk
Their robots.txt is password protected now. (?)

I'm not sure what a crawler is supposed to conclude in that case.

I don't see a lot of cpedia content on Google yet. The only links I can see
are clearly due to the threads here and on sites like Reddit.

~~~
gojomo
Last I checked, Google's crawler treated 4xx errors on a /robots.txt (which
includes unauthorized/not-allowed) the same as 404-not-found. (In most cases,
it's a configuration error by the site owner, and for those sites where it's
intentional, the operator is savvy enough to block crawler access to other
URLs by password or user-agent if desired.)
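
Restated as code, the behavior described above looks roughly like this. It is only a sketch of the rule as I remember it, not Google's actual implementation; the function name, the return labels, and the handling of anything outside 2xx/4xx are assumptions.

```python
def robots_policy(status_code):
    """Map the HTTP status of a /robots.txt fetch to a crawl decision.

    Hypothetical policy for illustration only.
    """
    if 200 <= status_code < 300:
        return "parse"      # file fetched: read it and obey its rules
    if 400 <= status_code < 500:
        return "allow_all"  # 401/403 handled like 404: no rules to obey
    return "defer"          # anything else (e.g. 5xx): assumed, hold off


for code in (200, 401, 403, 404):
    print(code, robots_policy(code))
```

Under this rule a password-protected robots.txt leaves the rest of the site crawlable, which matches the point about operators needing to block other URLs themselves.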

------
tptacek
Cuil itself is still there. But yeah that's pretty bad.

Here's one of the first grafs on "cuil" in cpedia:

 _Best Web Hosting Offer Cuil launches with an index of 120 billion web pages
making it a most comprehensive magic! Affiliate Secrets search engine on the
web and also a potential try to get the Google competitor. Clickbooth Cuil but
not avail due to flooding traffics and making their servers 'too hot' to
handle. After googling Cuil is down at the moment to 'cool' down._

------
techiferous
My professional web development career, boiled down to one piece of useless
trivia:

<http://cpedia.com/search?q=techiferous>

~~~
blasdel
Mine is way worse: _"Careful, blasdelf ineptness is one of their core
competencies. Sounds like copyright infringement."_ --
<http://cpedia.com/search?q=blasdelf>

~~~
mos1
Results for my real name indicate that I have a doctorate, 4 bedrooms, 2
baths, and that I was last sold for $650,000.

~~~
techiferous
Hilarious!

The results for my real name are disturbing:
<http://www.cpedia.com/wiki?q=wyatt+greene>

------
jbyers
In the last two minutes the raganwald page was removed.

The page for us is interesting. Huge amounts of content, flashes of
brilliance, and moments of sheer terror.
<http://cpedia.com/search?q=wikispaces>

~~~
raganwald
Well then, I guess the mods can kill this. I wondered if this might happen, so
I have a screen shot:

<http://twitpic.com/1ek18g/full>

The moods could also relink this thread of conversation to the screen shot.

~~~
stcredzero
Mods autocorrected to moods? Unintentionally funny, just like cpedia!

~~~
raganwald
I'll let it stand, you deserve upmods for pointing that out!

~~~
cubicle67
upmoods?

------
ottbot
Interesting, very optimistic project. But it just produces a mash of out of
context snippets. The biggest problem is that it seems unable to identify
introductory information very well, or eliminate references due to news items.
I tried a couple of things:

<http://cpedia.com/wiki?q=computational+fluid+dynamics>

<http://cpedia.com/search?q=London%20Underground>

Can anyone find something useful? I'd like to see a good article.

~~~
algolicious
Here is one. One of the top results right now on Reddit is a picture of Turnip
Rock. <http://www.reddit.com/r/pics/comments/bp49v/island_pic/> But Wikipedia
doesn't have a page on Turnip Rock. Cpedia has one that isn't completely
worthless. <http://cpedia.com/search?q=turnip+rock>

------
thorax
Are we sure this isn't just an experiment of theirs? Some 20% project from one
of their engineers using info they already had at their disposal?

For a side experiment, I'm not going to fault or bash any company-- we've got
a few of our own experiments which aren't the prettiest/best sites in the
world.

~~~
algolicious
Nope, they seem to think it reflects their vision:
[http://www.cuil.com/info/blog/2010/04/08/introducing-cpedia-...](http://www.cuil.com/info/blog/2010/04/08/introducing-cpedia-the-automated-encyclopedia)

~~~
thorax
Well, it does sound like they know it's more of a long-shot "weird" idea and
they call it an "alpha". One quote from the blog:

> _At other times it is weird — it does reflect the web after all._

I can see a lot of hackers wanting to try something like this and see if they
can iteratively make it better and better and better.

Doesn't this sort of a site sound like something people here would love to toy
with?

> _For each query, Cpedia algorithmically summarizes and clusters the ideas on
> the web and uses this to generate a report. We do the heavy lifting of
> removing all the repetition, so that unique and novel content surfaces._

I personally think it's cool for people to try weird things like this-- even
if the attempts fail.

~~~
i80and
I agree. While at the moment it's kind of cruddy in the current
implementation, I gotta respect them for playing with what really is a fun
idea. If they do manage to get it working better (extremely difficult and
iffy), Cpedia could be a pretty cool tool.

------
Detrus
I think you guys are missing the big picture here. This is a brilliant PR
move. The hilarity contained in this thing is on par with encyclopedia
dramatica. It will become part of the 4chan, lolcats web culture and what can
be better than that?

~~~
jimboyoungblood
Yes, I'm sure this is _exactly_ what their VC's were hoping for with their
$millions.

------
jluxenberg
People give Cuil search a lot of crap, but I actually find their results to be
pretty good. Their clustering algorithm is cool too and good for exploration.
Which is something that other search engines don't seem to do very well.

~~~
spc476
I've found Clusty ( <http://clusty.com> ) to be quite good. I can't quite make
out what the heck is supposed to be going on with this cpedia thing.

------
vaksel
the c stands for crap

the goal here isn't to find the one funny/stupid article...with this site the
goal should be to find one article that actually makes sense

------
cmars232
Maybe they're looking to get into the parked domain business?

------
Hates_
The Ruby on Rails one is particularly amusing.

<http://cpedia.com/search?q=ruby+on+rails>

~~~
jackowayed
Wow, yeah. The first section is Merb ... ok. Then there's a ROR subsection of
Merb. Um, what? And the first paragraph of the ROR subsection clearly was
copied and pasted from some consultancy's "Our decicated RoR developers can
create web 2.0 applications using latest Ruby on Rails web services." (Ok, not
_quite_ copied and pasted. They managed to work in a typo ("decicated").)

And then the paragraph immediately after that doesn't follow at all. It
starts:

> _When he [Brian Mastenbrook] was able to reproduce the glitch at Basecamp,
> he began to suspect that the flaw was inherent to Ruby on Rails, the popular
> Web framework used by both websites._

Did they do QA before launching this thing?

~~~
senki
[http://www.cuil.com/info/blog/2010/04/08/introducing-cpedia-...](http://www.cuil.com/info/blog/2010/04/08/introducing-cpedia-the-automated-encyclopedia)

> _I find Cpedia best on topics that I thought I knew about. I find out things
> I should have known but didn’t. I’ve noticed productivity has slowed in the
> company since we have had it up for internal testing, as people ask each
> other about stranger and stranger trivia, or exclaim, “I didn’t know your
> middle name was Hector?”_

------
techiferous
If you search for George Bush, all you get is an airport.
<http://www.cpedia.com/search?q=george+bush>

And a search for Cheney does not show the former Vice President:
<http://www.cpedia.com/wiki?q=cheney>

------
stcredzero
Could this be a result of optimizing algorithms to solve hard problems? (Not
understanding that the first trials people would send the thing would be
simple searches to be compared to the Google results?)

Some of the results are so funny that some groupthink has to be involved as
well.

------
unignorant
Were this done correctly, it would be an amazing feat of NLP. Unfortunately,
it is not... You can't just mash snippets together piecemeal. Context matters.

I'm surprised they even released it; the few search terms I attempted
generated useless results.

~~~
Daemmerung
M-x dissociated-press

------
moe
I tried a few seemingly easy searches ("Obama", "ipod") and it only ever
returns random noise.

I just hope this garbage gets blacklisted in real search engines ASAP, before
it starts polluting results.

~~~
secret
I just tried your searches. Is there a word in English for third-party
embarrassment?

~~~
blasdel
Closest thing in colloquial English is _'douche chills'_

In Dutch it's called _'plaatsvervangende schaamte'_ which literally means
"place exchanging shame". Shame felt on behalf of someone else, shame you feel
someone else should feel. I'm embarrassed _for_ them.

------
elblanco
Somebody should build a mashup of the cpedia results with translation party.

<http://translationparty.com/>

------
pclark
"PClark has never once asked for someone to pay for his passion to ride the
old Auroras"

true that.

~~~
lepht
The amazing part is that this seems to be universally true. I'm also a
"pclark", and I can attest to the accuracy of this statement.

Watch your backs, Google, these guys are onto something.

------
JeffJenkins
I have a really common name, which I apparently share with a baseball player.
The page on the baseball player is _awful_ :

<http://cpedia.com/search?q=jeffrey+jenkins>

What's really funny about this page is that a quick search reveals that the
Milwaukee Brewers have a "Geoff Jenkins" and a "Jeffrey Hammond"

------
adharmad
This is hilarious: <http://cpedia.com/wiki?q=cuil>

------
jrockway
The jrockway page is similarly excellent:

"Git doesn't include a source code editor, profiler, or web browser, either.
That means your syntax-highlighted code-blocks look the same on your slides as
in Emacs :)"

I'm not sure how it gets from here to there. I did write eslide, however...

------
kbrower
This reminds me of T.S. Eliot's The Waste Land. We just don't get it yet.
<http://www.writtenhumor.com/wasteland.html>

------
Tycho
I quite like the pane on the right though with the hover-over feature. Let me
learn a few things within a field that I wouldn't have known otherwise nor had
the motivation to find out.

------
techiferous
Perhaps this is more a branding/marketing fail than a technology fail. First
of all, the cuil brand is so badly marred that it should be abandoned, but
cpedia holds onto it. Secondly, what is cpedia? Is it just an interesting side
project/proof-of-concept? Or is it trying to be a search engine game changer?
Depending on what it is, I have different opinions on it, so it's important
when launching it to describe what it is. It's actually a really cool side
project/toy app.

------
acangiano
What a train wreck: <http://cpedia.com/search?q=antonio+cangiano>

------
modeless
After I stopped laughing (which took quite a while), I noticed that this thing
occasionally surfaces some interesting information that I didn't even realize
was on the web. If they presented the results in a way that was a bit more
coherent, and linked back to their sources, this thing might actually be
useful for something other than comedy.

------
edmccaffrey
You would expect that something with a disambiguation page for your search,
and a topic-specific page for what you are looking for, would have relevant
information on that topic, but:

[http://www.cpedia.com/wiki?q=Haskell&disambig=Functional...](http://www.cpedia.com/wiki?q=Haskell&disambig=Functional%20Programming)

------
snorkel
Have no fear. If you click "View web search results" you get back to the same
classic cuil you know and love.

------
api
How does crap on a plate like this get funding when there are so many more
worthy projects out there?

------
iuguy
Just look at
[http://cpedia.com/search?q=Paul%20Graham&d=raganwald](http://cpedia.com/search?q=Paul%20Graham&d=raganwald)
- I'd get more relevant and complete information from a markov chain...

------
jamesjyu
Sadly, Bing's concept of search overload comes to mind on every page of
cpedia.

------
benatkin
I thought it was "curl" at first, and it was due to some crazy late trademark
battle. I was glad to see it was Cuil. (Just got in from a bike ride, sitting
farther away from the monitor than usual.)

------
schwanksta
Try googling a journalist or someone with a byline that appears often.
<http://cpedia.com/search?q=Hilary%20Lehman>

------
orblivion
Oh, this poor company. I don't think this is going to work out.

------
mattdennewitz
oh, this is a complete disaster :(

------
chbarts
<http://en.wikipedia.org/wiki/Cut-up_technique>

<http://cpedia.com/search?q=william+s+burroughs>

Had they positioned this as an art project applying the cut-up technique to
the Web as a whole, it would be seen as a real success.

cpedia itself speaks:

"Along with JG Ballard, Angela Carter and Anna Kavan, William S. Burroughs has
probably done more to influence my writing, my reading, and my appreciation of
literature than any other writer."

(No attribution in article. I like to think of this as the software's own
appreciation showing through.)

