

Directed Edge (wheels' startup) launches public beta, finds related articles in Wikipedia - wheels
http://blog.directededge.com/2008/08/13/directed-edge-launches-recommender-engine-public-beta/

======
randomwalker
"We’ve got a super-fast in-house graph storage system that makes it possible
to do interesting stuff with graphs quickly, notably figure out which pages
are related."

I'm working with large graphs and I find that using a relational database as a
backend is horribly slow if you want to run somewhat complex graph algorithms.
Looks like everyone who does this ends up developing an in-house system.
Anyone know if there's a library out there for doing this sort of thing? The
order of magnitude I'm talking about is ~10^8 nodes, ~10^9 edges.

~~~
fizx
I think your best bet is an RDF tool like Jena or Sesame. 10^9 is pushing
these engines, though. There's an HBase-SPARQL lib, but that's gotta be two
years away from stable.

For real world problems, I tend to write custom code that's Java NIO heavy.
Try to pack the data as efficiently as possible, and minimize disk seeks.
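A minimal sketch of the "pack the data tightly, minimize seeks" advice, written in Python rather than Java NIO (Python's `mmap` plays roughly the role of NIO's `FileChannel.map` here). It assumes a compressed sparse row (CSR) layout: every node's outgoing edges sit contiguously in one array, so fetching a node's neighbors costs one offset lookup plus one contiguous read. The file layout and the names `pack_csr` and `CSRGraph` are hypothetical, not anything fizx or Directed Edge described.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<QQ")  # node count, edge count (little-endian uint64)


def pack_csr(num_nodes, edges, path):
    """Write (src, dst) edges to `path` in a hypothetical CSR file layout:
    header | offsets (num_nodes + 1 uint32) | targets (len(edges) uint32).
    Each node's targets end up contiguous, so one read fetches them all."""
    counts = [0] * num_nodes
    for src, _ in edges:
        counts[src] += 1
    offsets = [0] * (num_nodes + 1)
    for i in range(num_nodes):
        offsets[i + 1] = offsets[i] + counts[i]
    targets = [0] * len(edges)
    cursor = offsets[:num_nodes]  # next free slot for each source node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    with open(path, "wb") as f:
        f.write(HEADER.pack(num_nodes, len(edges)))
        f.write(struct.pack(f"<{num_nodes + 1}I", *offsets))
        f.write(struct.pack(f"<{len(edges)}I", *targets))


class CSRGraph:
    """Read-only view of a CSR file through mmap; the OS pages data in on demand."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self.num_nodes, self.num_edges = HEADER.unpack_from(self._mm, 0)
        self._offsets_at = HEADER.size
        self._targets_at = self._offsets_at + 4 * (self.num_nodes + 1)

    def neighbors(self, node):
        if not 0 <= node < self.num_nodes:
            raise IndexError(node)
        # Two adjacent offsets bound this node's slice of the targets array.
        start, end = struct.unpack_from("<II", self._mm, self._offsets_at + 4 * node)
        return list(struct.unpack_from(f"<{end - start}I", self._mm,
                                       self._targets_at + 4 * start))

    def close(self):
        self._mm.close()
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "graph.csr")
    pack_csr(4, [(0, 1), (0, 2), (2, 3), (1, 2)], path)
    graph = CSRGraph(path)
    print(graph.neighbors(0))  # [1, 2]
    print(graph.neighbors(3))  # []
    graph.close()
```

With uint32 offsets the edge count tops out at 2^32 - 1, which still covers the ~10^9 edges mentioned above; building the arrays in Python lists is only practical for small graphs, so a real loader would stream them.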

~~~
wheels
The tools that you mention are honestly not just one, but a couple of orders
of magnitude too slow for interesting applications at this scale.

------
michael_dorfman
Count me as one of those people more interested in cool graph theory stuff
than Wikipedia. What else have you got up your sleeves?

Also, while we're talking about Wikipedia-- any plans to play around with
"shortest path" algorithms? For example, how few pages do I have to navigate
to get from, say, "Turing machines" to "Miles Davis"?
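The "how few clicks" question is an unweighted single-pair shortest path, which plain breadth-first search answers. A minimal sketch, assuming the link graph fits in memory as a dict from page to linked pages; the function name and the toy graph are made up for illustration and are not real Wikipedia links. At ~10^8 pages one would more likely search from both ends (bidirectional BFS) over a compact on-disk layout, but the core idea is the same.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Fewest-clicks path from `start` to `goal` via breadth-first search.
    `links` maps a page to the pages it links to; returns a list of pages,
    or None if `goal` can't be reached by following links."""
    if start == goal:
        return [start]
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt in parent:
                continue
            parent[nxt] = page
            if nxt == goal:
                # Walk parent pointers back to `start`, then reverse.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


if __name__ == "__main__":
    toy_links = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["E"]}
    print(shortest_path(toy_links, "A", "E"))  # ['A', 'C', 'E']
    print(shortest_path(toy_links, "E", "A"))  # None
```

Returning as soon as the goal is first discovered is safe because BFS visits pages in order of click distance, so the first discovery is always at the minimum depth.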

~~~
wheels
Wikipedia's just a big data set to practice on. I've been musing on
information graphs (and giving talks about them) since 2004 or so.

It also happens that I've done a bit of work in writing fast, embedded
databases, so when I realized that none of the off-the-shelf graph databases
that I found were fast enough for our needs, and that mapping graphs to an SQL
database was abysmally slow, I wrote a small one. Honestly, I was surprised
when I discovered some graph databases that _advertise_ being an order of
magnitude slower.

I'd love for our product to be a graph database, but I'm convinced that such a
product would take too long to get to market. I'd probably need a year of
concentrating on just that, without also having to do business stuff, to get
it ready for general-purpose use. It'd also be cooler to geek out on, but I
don't know if people would actually pay for it.

So right now we've got a database that's really fast at two things: traversing
lots of edges and filtering on tags. That's what we need for finding related stuff.
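The thread doesn't say how the database scores related pages, so the following is only an illustrative sketch of "traverse lots of edges, then filter on tags": count the walks of up to two hops from a start node to every other node, and keep only nodes carrying a given tag. The walk-count score, the function name `related_by_paths`, and the `links`/`tags` dict shapes are all assumptions, not Directed Edge's method.

```python
from collections import Counter


def related_by_paths(links, tags, start, tag, hops=2):
    """Score nodes within `hops` steps of `start` by how many walks reach them,
    keeping only nodes whose tag set contains `tag`.
    `links`: node -> list of neighbors; `tags`: node -> set of tags."""
    frontier = Counter({start: 1})  # walks of the current length ending at each node
    scores = Counter()
    for _ in range(hops):
        step = Counter()
        for node, walks in frontier.items():
            for neighbor in links.get(node, ()):
                step[neighbor] += walks
        frontier = step
        # Filter on tags only when scoring: untagged nodes are still traversed.
        for node, walks in frontier.items():
            if node != start and tag in tags.get(node, ()):
                scores[node] += walks
    return scores.most_common()


if __name__ == "__main__":
    links = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"]}
    tags = {"b": {"person"}, "c": {"place"}, "d": {"person"}, "e": {"person"}}
    print(related_by_paths(links, tags, "a", "person"))  # [('d', 2), ('b', 1), ('e', 1)]
```

Node "d" wins because two separate routes lead to it; "c" is traversed but never returned because it lacks the tag. The number of edges touched grows quickly with `hops`, which is why raw edge-traversal speed matters for this kind of query.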

As for shortest path stuff ... well, it's never had much more than novelty
value for me. I don't see much of a business case for implementing it, and
well, we're a business.

~~~
michael_dorfman
I agree about the shortest path stuff being purely novelty stuff-- but I must
admit it was the first thing that came to my mind when I tried to connect
Wikipedia with graph theory.

Personally, I like playing around with graph theory, but I have yet to come up
with a compelling business case-- I wish you luck, and hope you find something
cool (and lucrative). I'd probably be a customer of a fast graph database, but
I doubt there are a lot of other graph theory hobbyists out there.

~~~
wheels
PageRank was a graph algorithm that seems to have done pretty well. ;-)
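For anyone who wants to play with it, here is a minimal sketch of the textbook power-iteration form of PageRank, with the commonly cited damping factor of 0.85 and the rank of dangling pages (no out-links) spread evenly over all pages. The function name and parameters are illustrative; Google's production system certainly differs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over `links` (node -> list of outgoing nodes).
    Dangling nodes (no out-links) spread their rank evenly over all nodes."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets the "random jump" share first.
        new = {node: (1.0 - damping) / n for node in nodes}
        dangling = 0.0
        for node in nodes:
            out = links.get(node, ())
            if out:
                share = damping * rank[node] / len(out)
                for target in out:
                    new[target] += share
            else:
                dangling += rank[node]
        for node in nodes:
            new[node] += damping * dangling / n
        rank = new  # ranks still sum to 1 after each iteration
    return rank


if __name__ == "__main__":
    ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
    print(max(ranks, key=ranks.get))  # c
```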

~~~
michael_dorfman
Touché.

------
mattmcknight
It's a nice feature for browsing Wikipedia aimlessly. I'd be more interested
in an article about this: "We’ve got a super-fast in-house graph storage
system that makes it possible to do interesting stuff with graphs quickly,
notably figure out which pages are related." Sounds cool; how does it work?

------
mrjbq7
Your layout and page design are a huge improvement over the original
Wikipedia; kudos.

And the related-links feature makes ad hoc browsing of Wikipedia entries quite
useful; again, kudos.

------
rst
Hmmm... simple queries turned up mostly on-topic results, so I tried Pierre
Herme, a famous French pastry chef (at least if you follow French pastry). The
top result was a stub article for a French immunologist; others included a
musician and a TV newscaster.

~~~
wheels
There are almost no links going into or out of his page, and since our
algorithm is, as our name implies, mostly graph-based, there's not much we can
do there currently. His is the rare reasonably well-written article that
doesn't really use wiki markup. I started to say, "We'll be adding the French
Wikipedia soon; it should be better there." But then I saw that the English
entry is just a translation of the French one, with the same lack of links.

We'll be adding other techniques to our analysis over time, though; this is
just our first step...
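To see why a page with almost no links leaves a purely link-based method stuck, here is a minimal sketch using one simple stand-in measure, Jaccard overlap of outgoing-link sets. This is not Directed Edge's algorithm (the thread doesn't describe it); the function name and the toy graph are hypothetical.

```python
def related_by_overlap(links, page, top_k=10):
    """Rank other pages by Jaccard similarity between their outgoing-link
    sets and `page`'s: |shared links| / |all links of either page|."""
    mine = set(links.get(page, ()))
    if not mine:
        return []  # no links at all: a purely link-based score has nothing to use
    scored = []
    for other, theirs in links.items():  # full scan: fine for a toy graph only
        if other == page:
            continue
        theirs = set(theirs)
        shared = mine & theirs
        if shared:
            scored.append((other, len(shared) / len(mine | theirs)))
    scored.sort(key=lambda item: (-item[1], item[0]))  # best first, ties by name
    return scored[:top_k]


if __name__ == "__main__":
    links = {"p": ["x", "y"], "q": ["x", "y", "z"], "r": ["y"], "lonely": []}
    print(related_by_overlap(links, "p"))       # [('q', 0.6666666666666666), ('r', 0.5)]
    print(related_by_overlap(links, "lonely"))  # []
```

The "lonely" page gets nothing back, mirroring the pastry-chef case above; adding non-link signals (text, categories) is the usual way around that.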

------
llimllib
So... I load up a page, and the benefit is supposed to be that I get 10 of the
links from it on the sidebar as "related links".

Why do I care, exactly? What's the value here?

~~~
wheels
The idea is that it's like Amazon recommending books or Last.fm recommending
music (though the technique is quite different). In practice, once you're used
to it, it's a really fast way to jump into a topic, since you immediately see
the cluster around an article. For example, if you don't know anything about
Literary Theory and want to figure out what the important articles and authors
in the area are, you can do so quickly. (That's an example a friend of mine
actually used, successfully, when looking for books on the topic.)

My question would be: do you not associate what we're doing with
recommendations, or do you not see recommendations as valuable?

~~~
dualogy
Cool for all the "knowledge management" aficionados; there's a whole industry
around that. Also potentially useful for intranets / collaboration systems, if
at some point you can offer your API offline, i.e. as a library callable from
code without having to depend on your web service. If I can package your
algorithms with a product I'm deploying, I'll be happy to pay for it (provided
that it adds real customer value, which for now I'm assuming).

------
nsrivast
Check out www.wikiwarp.com/go

 _made along with HN user jsomers_

