
Interactive map of all scientific papers from the arXiv - robjk
http://paperscape.org
======
robjk
We use HTML5 canvas elements to draw pregenerated tiles (bitmaps) together
with interactive overlays and underlays, such as the outlines and halos you
see when you click on or search for papers. The tiles are redrawn each morning
(using cairo) after the map has been updated with the new arXiv papers for
that day.
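
As a rough illustration of the tile idea (not the actual Paperscape code), the sketch below works out which pregenerated tiles intersect the current viewport, so only those bitmaps need to be drawn on the canvas. The function name `visible_tiles`, the 256-pixel tile size and the integer pixel coordinates are all assumptions made for this example.

```python
import math


def visible_tiles(view_x, view_y, view_w, view_h, tile_size=256):
    """Return (col, row) indices of the tiles that intersect the viewport.

    Coordinates are integer pixels in map space at the current zoom level.
    The tile size of 256 is an assumed value, not taken from the site.
    """
    first_col = math.floor(view_x / tile_size)
    first_row = math.floor(view_y / tile_size)
    # Subtract 1 so a viewport ending exactly on a tile edge does not
    # pull in the next (invisible) tile.
    last_col = math.floor((view_x + view_w - 1) / tile_size)
    last_row = math.floor((view_y + view_h - 1) / tile_size)
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


if __name__ == "__main__":
    # A 100x10 pixel viewport straddling the boundary between two tiles.
    print(visible_tiles(200, 0, 100, 10))
```

Overlays such as halos would then be drawn on a separate layer, so the tiles themselves never need to be regenerated in the browser.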

~~~
balsam
Is there any long-range order? For example, am I allowed to deduce from the
layout that mathematical physics is "the" bridge between condensed matter and
high energy physics? Would I also be able to use this graph to plan my career
(e.g. working on topics located at strategic points)?

~~~
robjk
The interface between hep-th and cond-mat is quite a mixed bag, and math-ph is
quite diffuse (the label just represents its centre of mass so to speak). So
I'm not sure if I would call math-ph the bridge between the two, but the map
is based on simple principles/forces so it's very open to interpretation. I
would say that hep-th is the bridge between a lot of the other categories,
namely hep-ph, astro-ph, gr-qc, quant-ph, cond-mat and math-ph. Also the
various interfaces between different categories are examples of long-range
structure. For example the field of dark matter is located where hep-ph meets
astro-ph, and the field of cosmological inflation occurs at the interface of
hep-th, astro-ph and gr-qc.
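
To make the "centre of mass" remark concrete, here is a minimal sketch of how a category label could be positioned at the weighted centroid of its papers. The function `label_position` and the choice of weight (for instance, citation count) are assumptions for illustration only; the thread does not say how the weights are chosen.

```python
def label_position(papers):
    """Return the weighted centroid (x, y) of a category's papers.

    `papers` is an iterable of (x, y, weight) tuples. The weight is a
    hypothetical "mass", e.g. the number of citations of the paper.
    """
    papers = list(papers)
    total = sum(weight for _, _, weight in papers)
    if total == 0:
        raise ValueError("total weight must be non-zero")
    x = sum(px * weight for px, _, weight in papers) / total
    y = sum(py * weight for _, py, weight in papers) / total
    return (x, y)


if __name__ == "__main__":
    # A heavy paper on the right drags the label towards it.
    print(label_position([(0, 0, 1), (4, 0, 3)]))
```

A diffuse category like math-ph can therefore have its label sitting in a region that contains few of its own papers.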

------
robjk
A technical explanation of how the map is generated can be found at:
[http://blog.paperscape.org/?page_id=2](http://blog.paperscape.org/?page_id=2).
And here are two introductory blog posts by Sean Carroll and Physics World:
[http://www.preposterousuniverse.com/blog/2013/08/17/a-map-of-the-research-literature/](http://www.preposterousuniverse.com/blog/2013/08/17/a-map-of-the-research-literature/),
[http://blog.physicsworld.com/2013/08/16/welcome-to-the-arxiv-galaxy/](http://blog.physicsworld.com/2013/08/16/welcome-to-the-arxiv-galaxy/)

------
manicbovine
Something is apparently wrong with the positions of these papers, in
mathematics at least. My own papers, as well as the key papers in my field,
are surrounded by unrelated papers.

~~~
robjk
Our success rate for extracting the reference information from papers in some
categories, notably math and cs, is still pretty low. This means we can't
place these papers very well and also that their radius is not a good
representation of their true number of citations. Furthermore, from a distance
these categories appear to lack structure. If we don't have references for a
paper, we use its keywords to place it, in which case it can certainly be
misplaced. Improving the reference extraction means building a better database
of the journals available to that field coupled with more robust regex.
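
A minimal sketch of the "journal database plus regex" approach, assuming references look like the common `Journal Volume (Year) Page` form or carry an arXiv identifier. The tiny `KNOWN_JOURNALS` set, the function names and the exact patterns are hypothetical and only meant to show the shape of the problem, not how Paperscape actually does it.

```python
import re

# Hypothetical, tiny sample of a journal database; a real one would be
# far larger and would need the abbreviations used in each field.
KNOWN_JOURNALS = {"J.Math.Phys.", "Phys.Rev.D", "Nucl.Phys.B"}


def build_journal_pattern(journals):
    """Build a regex matching 'Journal Volume (Year) Page' for known journals."""
    # Longest names first so a short name never shadows a longer one.
    names = "|".join(re.escape(j) for j in sorted(journals, key=len, reverse=True))
    return re.compile(
        rf"(?P<journal>{names})\s*(?P<volume>\d+)\s*\((?P<year>\d{{4}})\)\s*(?P<page>\d+)"
    )


# Old-style (math/9912209) and new-style (YYMM.NNNN or YYMM.NNNNN) arXiv ids.
ARXIV_ID = re.compile(r"\b(?:[a-z\-]+(?:\.[A-Z]{2})?/\d{7}|\d{4}\.\d{4,5})\b")


def extract_references(text, journals=KNOWN_JOURNALS):
    """Return journal references and arXiv ids found in a bibliography string."""
    pattern = build_journal_pattern(journals)
    journal_refs = [
        (m["journal"], int(m["volume"]), int(m["year"]), int(m["page"]))
        for m in pattern.finditer(text)
    ]
    arxiv_ids = [m.group(0) for m in ARXIV_ID.finditer(text)]
    return {"journal": journal_refs, "arxiv": arxiv_ids}


if __name__ == "__main__":
    sample = "G. Hatayama et al., J.Math.Phys. 42 (2001) 274, arXiv: math/9912209"
    print(extract_references(sample))
```

A reference whose journal is missing from the database simply goes unmatched, which is one way a low extraction rate in math and cs could arise.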

------
mmcdan
Beautiful visualization. One thing missing though is a color-scale for age.
Difficult to tell what color represents older vs newer and what are the
magnitudes of the differences in age.

~~~
robjk
Thanks, good point, the colour-key box should change when the "age" colour
scheme is selected. For now: the newest papers are red and the oldest papers
are gray, and, as you may have noticed, the gradient from gray to red is
one-to-one but not linear with age. For a bit more of an explanation see this post:
[http://blog.paperscape.org/?p=60](http://blog.paperscape.org/?p=60)
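
For readers wondering what "one-to-one but not linear" can look like, here is a small sketch that maps age to a colour between red and gray using a logarithmic curve. The log scaling, the RGB endpoints and the name `age_to_colour` are assumptions chosen for the example; the real mapping is described in the linked post and may differ.

```python
import math

RED = (255, 0, 0)        # assumed colour for the newest papers
GRAY = (128, 128, 128)   # assumed colour for the oldest papers


def age_to_colour(age_days, max_age_days):
    """Map a paper's age to an RGB colour between RED (new) and GRAY (old).

    The blend factor grows with log(1 + age), so it is strictly increasing
    in age (one-to-one before rounding) but not linear: recent papers are
    spread over more of the gradient than old ones.
    """
    if max_age_days <= 0:
        raise ValueError("max_age_days must be positive")
    age = min(max(age_days, 0.0), max_age_days)
    t = math.log1p(age) / math.log1p(max_age_days)  # 0 = newest, 1 = oldest
    return tuple(round(r + (g - r) * t) for r, g in zip(RED, GRAY))


if __name__ == "__main__":
    for age in (0, 10, 100, 1000):
        print(age, age_to_colour(age, 1000))
```

A colour key for this kind of scheme would need a non-uniform age axis, which is probably why a simple legend is not enough.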

------
samograd
That's the most beautiful chart I've ever seen.

I found:

The A^(1)_M automata related to crystals of symmetric tensors G. Hatayama, K.
Hikami, R. Inoue, A. Kuniba, T. Takagi, T. Tokihiro J.Math.Phys. 42 (2001) 274
[Open paper's DOI page in a new tab] arXiv: math/9912209 [math.QA,nlin.SI]
[Open paper's PDF from the arXiv] [Open paper's arXiv page in a new tab]
inSPIRE: [Open paper's Inspire page in a new tab]

after just clicking around wondering if it was functional as a browser too :)

------
bane
Would it be correct to interpret the sparse areas as "needing more research"?
Picking a PhD topic can be difficult; a map like this could show where human
knowledge needs expanding.

~~~
beambot
You'd have to be really careful about that. arXiv.org isn't peer reviewed, so
there's a lot of crackpot science on there. For some of the best examples,
search for P!=NP proofs.

(It also depends on discipline. A lot of respectable math and physics
preprints first appear on arXiv.org and can be exceptionally solid.)

~~~
jmmcd
Moreover, it's skewed by which disciplines tend to publish on arXiv, as
opposed to more traditional venues, i.e. conferences and journals. Physics has
always been the main mover on arXiv, while computer science is surprisingly (?) rare.

------
bierik
Very cool! My scientific career ended some years back, but it's nice to see
the papers showing up. The size of the entries compared to others, however, is
kinda depressing.

------
mbq
Poor stat, cs and math look like a dark void without zooming-in; quite
expected though, the arXiv coverage of publications in those areas is still
pretty low.

~~~
carbon12
Yes, stat, cs and math researchers don't use arXiv as exclusively as
physicists. And we don't pick up references in those areas as well as we do
for physics papers, so they are small and not well placed. We plan to improve
the citation extraction for math and cs so they have a proper representation
in the map.

------
AliCollins
This appears to be a great tool, for instance, for visualizing how widely an
author has posted across a subject area. Nicely done!!

------
Gatsky
This is a wonderful way to examine the key references in a field. I wish
someone would do this to biomedical science/medicine.

~~~
carbon12
The problem is getting the data: a list of papers and their
references/bibliography. The great thing about the arXiv is that the papers
are open-access, that it is updated daily, and that the daily-update is
immediately available to data-mining. Is there a similar thing for med/biomed?

~~~
Gatsky
Unfortunately no. Many med/bio publications are subscription-only. To do
something like paperscape, you would need a massive corpus of papers, which
would really cost a lot. PubMed only has abstracts, not the citations from the
paper.

~~~
olihb
Right, but there's a subset of Open Access articles from Pubmed:
[http://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/](http://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/)

~~~
Gatsky
That's true, but coverage of seminal papers is not that great in PubMed
Central in my experience. This may change now that there is a push for all
publicly funded research to become open access immediately or after 1 or 2
years.

------
conjectures
Nice, what lib is generating the graphics?

