

Links between Paul Graham's essays - revisited - RiderOfGiraffes
http://www.solipsys.co.uk/new/PaulGrahamEssays.html?YC_News

======
Eliezer
Is there a standard tool that does this? Especially for, shall we say, _highly
interconnected_ webpages? I need it for Overcoming Bias.

~~~
jjs
Yes, it's called python.

Write a crawler that examines all local links, and builds a graph (it needn't
be anything fancy; a collection of nodes and edges is enough, but you'll have
to detect cycles when building the graph), and then print it out in graphviz
.dot language.

Then you can feed it into neato [either through a command-line pipe, or
construct one using Python's subprocess module], _et voilà_ , instant pagemap!
:)

You can also use some of the rendering modes and scrape position data for the
rendered nodes, if you want to generate a clickable imagemap.
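
A minimal sketch of the crawler described above, assuming the pages are plain HTML reachable over HTTP. The names `LinkParser`, `crawl`, `to_dot`, and the `fetch_page` parameter are made up for illustration, and there is no error handling, politeness delay, or robots.txt support. The `seen` set is what keeps cycles from looping forever.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Default fetcher: download a page and decode it leniently."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=500):
    """Return a set of (source, target) edges between pages on one host."""
    host = urlparse(start_url).netloc
    seen = {start_url}  # pages already queued; prevents revisiting on cycles
    queue = [start_url]
    edges = set()
    while queue:
        page = queue.pop()
        parser = LinkParser()
        parser.feed(fetch_page(page))
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            target = urldefrag(urljoin(page, href)).url
            if urlparse(target).netloc != host:
                continue  # keep local links only
            edges.add((page, target))
            if target not in seen and len(seen) < max_pages:
                seen.add(target)
                queue.append(target)
    return edges


def to_dot(edges):
    """Render the edge set as a Graphviz digraph."""
    lines = ["digraph pagemap {"]
    for source, target in sorted(edges):
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)
```

Printing `to_dot(crawl(start_url))` to stdout and piping it into `neato -Tpng -o map.png` should then give the picture, memory permitting.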

~~~
jey
Here's a first cut implementation in ruby. Requires the "hpricot" gem. Outputs
a dot file to stdout. <http://gist.github.com/35841>

Output from the above is at: <http://jey.kottalam.net/tmp/obgraph.out>

I tried rendering this with "dot -Tcmapx -oob.map -Tgif -oob.gif" but dot
segfaulted after 70 minutes. The code as posted at gist only outputs nodes for
articles written by Eliezer in an attempt to make the dot file a manageable
size. Tips/fixes appreciated.

~~~
Eliezer
Yeah, it's that thing with the segfault that I worried about. "Highly
interconnected." Thanks for the graph, which I may be able to use for my own
purposes even if I can't show it.

There have got to be standard graphing tools for things that are highly
connected...

~~~
khafra
When Daniel J. Bernstein made his DAGs of standard crypto algorithms
(<http://cr.yp.to/cipherdag/cipherdag-20070630.pdf>) he mentioned that they'd
crashed every standard drawing tool he tried. He makes a reference to "...my
own drawing tools, which are much more careful in their use of memory."
Unfortunately, I'm not sure where to obtain DJB's drawing tools.

------
markessien
A tool like this would be ideal in locating the moment when a person stops
producing original thought, and starts repacking his old thoughts in new
structures.

~~~
jyothi
One or two links to an old post would not hurt the quality or freshness
of the content. In fact, a reader is often interested in getting a broader idea
of the author's stream of thought; those who are not interested just have
to ignore the hyperlinks.

Maybe it would count as restructured content if there were too many outward
references, like the... "Undergraduate" post. But even there, when you look into
the post, the hyperlinks are for keywords like "essays", "hacker", "computer
science", what you "love", hack a "blub" on windows ..

------
shaunxcode
This is cool - led me to <http://www.paulgraham.com/javacover.html> in which I
read "Historically, languages designed for other people to use have been bad"
which makes me feel a little less insane for spending the morning working on
my "pet language".

------
jyothi
In your words, what does this graph imply? It would be great if PG himself could
see something in this and respond.

My take:

- The most referenced (high in-links) posts are dearer to PG or the most
popular/strong in his line of thought.

- The ones with many outgoing links might at times be restructured
thought, as 'markessien' noted.

- For a personal blog like PG's, does heavier interlinking reduce readability
or make the content richer? I guess the reader chooses this.

~~~
pg
It measures some combination of how general an essay is, how early it was
written (I usually add these links when I first write something, so most links
are only backward in time), how much I like it, and sheer randomness (at least
half the time I don't bother to add crosslinks).

Larger numbers of outgoing links probably don't imply recycling. If anything
it's a sign I think an essay is good and will get lots of readers, so it's
worth trying to spread that traffic around. (These predictions are often
wrong, though. I find it practically impossible to guess which essays will get
lots of traffic.)

~~~
RiderOfGiraffes
Is it not worth considering putting "forward" links into older essays, so that
when you write an essay it gets referenced by other essays, even when they're
older?

Perhaps you don't have time, but would you consider adding links that other
people suggest?

It would be interesting to do to this what I do with my SiteMap, and colour
nodes darker if they have more traffic.

<http://www.solipsys.co.uk/new/SiteMapExpt.html?YC_News>
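
A rough sketch of the darker-with-more-traffic idea, assuming the graph is written out as Graphviz dot and that per-page view counts are available as a dict. The function names `shade` and `traffic_nodes` and the linear grey scale are assumptions for illustration, not how the SiteMap above is actually produced.

```python
def shade(views, max_views):
    """Map a view count to a grey hex colour: more views, darker node."""
    fraction = views / max_views if max_views else 0.0
    level = round(255 * (1.0 - fraction))
    return f"#{level:02x}{level:02x}{level:02x}"


def traffic_nodes(traffic):
    """Return one dot node statement per page, filled according to traffic."""
    max_views = max(traffic.values(), default=0)
    lines = []
    for name, views in sorted(traffic.items()):
        fill = shade(views, max_views)
        # Use white text once the fill is dark enough to hide black text.
        font = "white" if views > max_views / 2 else "black"
        lines.append(
            f'  "{name}" [style=filled, fillcolor="{fill}", fontcolor={font}];'
        )
    return lines
```

With traffic as skewed as it usually is, a logarithmic scale would probably separate the low-traffic nodes better than this linear one.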

------
gibsonf1
I like the idea, but it would be more meaningful if you could link for
semantic reasons - that would be a very nice tool, could even be used by
authors who want to put a book together from various essays they've written,
etc. Or help you focus on an area of writing you're interested in by the
author. I mention this because looking at the graph you can see that things
that are related are not necessarily linked.

~~~
RiderOfGiraffes
This would be really cool. Can you suggest a way, given two essays, to decide
if, semantically, the first should point to the second? If an author could use
a tool like that it would mean that links could be discovered rather than
inserted by hand.

That would be valuable ... you've made me think ... I may have a way of doing
something close enough.

Hmm.

~~~
zupatol
Latent semantic analysis can be used to measure the similarity of two
documents.

<http://en.wikipedia.org/wiki/Latent_semantic_analysis>

I found examples in ruby and python in this blog, which is unfortunately down
just at the moment:
<http://blog.josephwilk.net/ruby/latent-semantic-analysis-in-ruby.html>
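
To give a flavour of the comparison step, here is a small sketch of cosine similarity between two documents. Note that this is not LSA itself: LSA would first build a term-document matrix and reduce it with a truncated SVD, then compare documents in the reduced space. The sketch below compares raw term counts only, and the names `term_counts` and `cosine_similarity` are made up for the example.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its words (a crude tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Raw counts only catch shared vocabulary; the point of the SVD step in LSA is to also catch documents that use different words for the same topic.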

~~~
Prrometheus
Gregor Heinrich has a good paper on Latent Dirichlet Allocation, which I
believe is an extension of Latent Semantic Analysis. It is a model which can
be used to group documents based on semantic content. He gives the
mathematical details and the "punchlines" for implementation. The model takes
as input a collection of documents and outputs a topic label for each word in
each document. The documents can be plotted in K-dimensional space, where K is
the number of possible topics, by using the proportion of each topic in a
document as its coordinates. Documents which are closer to each other have
topics more similar to each other. You could then use your favorite clustering
algorithm or a simple distance threshold to decide which documents
should link to each other.

The paper is here [PDF]: <http://www.arbylon.net/publications/text-est.pdf>

C++ implementation: <http://gibbslda.sourceforge.net/>

(note, I haven't used the C++ implementation).
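
The simple distance threshold could be sketched like this, assuming you already have per-document topic proportions out of some LDA implementation. The dict layout, the Euclidean distance, and the names `distance` and `suggest_links` are my assumptions for illustration; none of it comes from the paper or from GibbsLDA.

```python
import math
from itertools import combinations


def distance(p, q):
    """Euclidean distance between two topic-proportion vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def suggest_links(topic_mix, threshold):
    """Return pairs of documents whose topic mixes are within threshold."""
    pairs = []
    for (name_a, mix_a), (name_b, mix_b) in combinations(
        sorted(topic_mix.items()), 2
    ):
        if distance(mix_a, mix_b) <= threshold:
            pairs.append((name_a, name_b))
    return pairs
```

The pairs are undirected; deciding which essay should point to which (say, newer to older) would be a separate step.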

~~~
zupatol
That looks even better. I'm going to try this out in my pet project.

Thank you.

------
kylec
I wonder if someone can use the PageRank algorithm or something similar to
determine which essays are the most prominent. It's hard to tell with this
graph which is the most connected essay - is it "Taste for Makers", "What You
Can't Say", "Great Hackers", or something else? It would also be interesting
to correlate this with the Google PageRank for these pages, and traffic data
(if it's available).
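
For what it's worth, a basic power-iteration PageRank is short enough to sketch. This assumes the link graph is a dict mapping each essay to the list of essays it links to; the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are illustrative choices, not Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a dict of node -> rank for a dict of node -> outgoing links."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    if not nodes:
        return {}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node gets the same small baseline share each round.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

Sorting the result by rank would give a "most prominent essay" list to set against the traffic numbers.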

~~~
pg
Here are the ones that get the most traffic, with daily page views:

    
    
         1760 Stuff
          770 What You'll Wish You'd Known
          403 How to Start a Startup
          365 Why to Start a Startup in a Bad Economy
          349 Why Nerds are Unpopular
          294 How to Do what You Love
          274 Web 2.0
          247 Great Hackers
          207 How to Make Wealth
          187 Lies We Tell Kids

~~~
ph0rque
"Stuff" is me sending that link to my female relatives to dissuade them
from being packrats, or at least to get them to do it less. I'm pretty sure
there are other users out there who do the same...

~~~
pg
You're right. There are rarely referring urls for it, which implies that it
spreads primarily by email.

------
RiderOfGiraffes
Updated again to include new essays. Also, the RSS feed has been removed, and
a couple of other spurious links, just 'cos I was there and could do it.

I'm still interested in additional links based on semantics.

------
paulgb
Awesome!

One small thing, it looks like you included the RSS link from the bottom of
the page in the list of essays.

~~~
RiderOfGiraffes
Re RSS: Hmm. Oh well, Wabi Sabi.

