Interactive map of all scientific papers from the arXiv (paperscape.org)
95 points by robjk on Sept 2, 2013 | 25 comments

We use HTML5 canvas elements to draw pregenerated tiles (bitmaps) together with interactive overlays and underlays, such as the outlines and halos you see when you click on or search for papers. The tiles are redrawn each morning (using cairo) after the map has been updated with the new arXiv papers for that day.
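To make the tile idea concrete, here is a rough sketch of how a viewer could work out which pregenerated tiles overlap the visible part of the map. It is written in Python for brevity rather than in the site's own browser code, and the tile size, the function name `visible_tiles` and the coordinate convention are all assumptions for illustration, not details taken from Paperscape.

```python
import math

TILE_SIZE = 256  # assumed tile edge in pixels; the thread does not state it


def visible_tiles(view_x, view_y, view_w, view_h, tile_size=TILE_SIZE):
    """Return (col, row) indices of every tile that overlaps the viewport.

    view_x, view_y: top-left corner of the viewport in map pixels at the
    current zoom level. view_w, view_h: viewport size in pixels.
    """
    first_col = math.floor(view_x / tile_size)
    first_row = math.floor(view_y / tile_size)
    # Subtract 1 so a viewport ending exactly on a tile edge does not
    # pull in the neighbouring tile.
    last_col = math.floor((view_x + view_w - 1) / tile_size)
    last_row = math.floor((view_y + view_h - 1) / tile_size)
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A 100x100 viewport that straddles the boundary between two tiles.
print(visible_tiles(200, 0, 100, 100))
```

Only the tiles in that list need to be fetched and drawn; overlays such as outlines and halos would then be drawn on a separate layer on top.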

Is there any long-range order? For example, am I allowed to deduce from the layout that mathematical physics is "the" bridge between condensed matter and high energy physics? Would I also be able to use this graph to plan my career (e.g. working on topics located at strategic points)?

The interface between hep-th and cond-mat is quite a mixed bag, and math-ph is quite diffuse (the label just represents its centre of mass, so to speak). So I'm not sure I would call math-ph the bridge between the two, but the map is based on simple principles/forces, so it's very open to interpretation. I would say that hep-th is the bridge between a lot of the other categories, namely hep-ph, astro-ph, gr-qc, quant-ph, cond-mat and math-ph. The various interfaces between different categories are also examples of long-range structure. For example, the field of dark matter is located where hep-ph meets astro-ph, and the field of cosmological inflation occurs at the interface of hep-th, astro-ph and gr-qc.

A technical explanation of how the map is generated can be found at: http://blog.paperscape.org/?page_id=2 . And here are two introductory blog posts by Sean Carroll and Physics World: http://www.preposterousuniverse.com/blog/2013/08/17/a-map-of... , http://blog.physicsworld.com/2013/08/16/welcome-to-the-arxiv...

Something is apparently wrong with the positions of these papers, in mathematics at least. My own papers, as well as the key papers in my field, are surrounded by unrelated papers.

Our success rate for extracting the reference information from papers in some categories, notably math and cs, is still pretty low. This means we can't place these papers very well, and also that their radius is not a good representation of their true number of citations. Furthermore, from a distance these categories appear to lack structure. If we don't have references for a paper, we use its keywords to place it, in which case it can certainly be misplaced. Improving the reference extraction means building a better database of the journals used in each field, coupled with more robust regular expressions.
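As an illustration of the regex side of this, here is a minimal sketch that pulls explicit arXiv identifiers out of a reference string. The patterns and the function name `extract_arxiv_ids` are assumptions made for this example, not Paperscape's actual extraction code. It only catches references that quote an arXiv id; references given purely as a journal citation would still need the journal database mentioned above.

```python
import re

# New-style ids look like 1308.1234, optionally followed by a version (v2).
NEW_STYLE = re.compile(r"\b(\d{4}\.\d{4,5})(?:v\d+)?\b")
# Old-style ids look like math/9912209 or hep-th/9901001.
OLD_STYLE = re.compile(r"\b([a-z][a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?\b")


def extract_arxiv_ids(reference_text):
    """Return the arXiv ids found in the text, without version suffixes.

    New-style ids come first, then old-style ids; duplicates are dropped
    while keeping the first occurrence.
    """
    ids = NEW_STYLE.findall(reference_text) + OLD_STYLE.findall(reference_text)
    return list(dict.fromkeys(ids))


reference = "J.Math.Phys. 42 (2001) 274 arXiv: math/9912209 [math.QA,nlin.SI]"
print(extract_arxiv_ids(reference))
```

A paper whose reference list yields no ids at all would then fall back to keyword-based placement, as described in the comment.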

Beautiful visualization. One thing missing though is a color-scale for age. Difficult to tell what color represents older vs newer and what are the magnitudes of the differences in age.

Thanks, good point: the colour-key box should change when the "age" colour scheme is selected. For now, the newest papers are red and the oldest papers are gray and, as you may have noticed, the gradient from gray to red is one-to-one with age but not linear in it. For a bit more of an explanation see this post: http://blog.paperscape.org/?p=60
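To show what a one-to-one but non-linear gradient can look like, here is a small sketch. The power-law curve, the `gamma` value, the exact gray and red values and the function name `age_to_colour` are all assumptions chosen for illustration; the real mapping is the one described in the linked blog post and may differ.

```python
GRAY = (128, 128, 128)  # assumed colour for the oldest papers
RED = (255, 0, 0)       # assumed colour for the newest papers


def age_to_colour(age_days, max_age_days, gamma=3.0):
    """Map a paper's age to an RGB colour between red (new) and gray (old).

    The 'heat' falls off as a power law, so it is strictly decreasing in
    age (one-to-one before rounding to integers) but not linear: with
    gamma > 1 most of the red is concentrated in the newest papers.
    """
    t = min(max(age_days / max_age_days, 0.0), 1.0)  # normalised age in [0, 1]
    heat = (1.0 - t) ** gamma
    return tuple(round(g + (r - g) * heat) for g, r in zip(GRAY, RED))


for age in (0, 50, 100):
    print(age, age_to_colour(age, 100))
```

With a curve like this, a paper halfway through the age range is already much closer to gray than to red, which is why a colour key matters for reading ages off the map.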

That's the most beautiful chart I've ever seen.

I found:

The A^(1)_M automata related to crystals of symmetric tensors. G. Hatayama, K. Hikami, R. Inoue, A. Kuniba, T. Takagi, T. Tokihiro. J.Math.Phys. 42 (2001) 274. arXiv: math/9912209 [math.QA,nlin.SI]

after just clicking around wondering if it was functional as a browser too :)

Would it be correct to interpret the sparse areas as "needing more research"? Picking a PhD topic can be difficult, and a map like this could show where human knowledge needs expanding.

You'd have to be really careful about that. arXiv.org isn't peer reviewed, so there's a lot of crackpot science on there. For some of the best examples, search for P!=NP proofs.

(It also depends on discipline. A lot of respectable math and physics preprints first appear on arXiv.org and can be exceptionally solid.)

Moreover, it's skewed by which disciplines tend to publish on arXiv as opposed to more traditional venues, i.e. conferences and journals. Physics has always been the main mover on arXiv; computer science is surprisingly (?) rare.

Very cool! My scientific career ended some years back, but it's nice to see the papers showing up. The size of the entries compared to others, however, is kinda depressing.

Poor stat, cs and math look like a dark void without zooming in. That's quite expected though: the arXiv coverage of publications in those areas is still pretty low.

Yes, stat, cs and math researchers don't use arXiv as exclusively as physicists. And we don't pick up references in those areas as well as we do for physics papers, so they are small and not well placed. We plan to improve the citation extraction for math and cs so they have a proper representation in the map.

This appears to be a great tool, for instance, for visualizing how widely an author has posted across a subject area. Nicely done!!

This is a wonderful way to examine the key references in a field. I wish someone would do this for biomedical science/medicine.

The problem is getting the data: a list of papers and their references/bibliography. The great thing about the arXiv is that the papers are open-access, that it is updated daily, and that the daily update is immediately available for data mining. Is there a similar thing for med/biomed?

Unfortunately no. Many med/bio publications are subscription only. To do something like paperscape, you would need a massive corpus of papers, which would really cost a lot. Pubmed only has abstracts, not the citations from the paper.

Right, but there's a subset of Open Access articles from Pubmed: http://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/

That's true, but in my experience coverage of seminal papers is not that great in Pubmed Central. This may change now that there is a push for all publicly funded research to become open access, either immediately or after 1 or 2 years.

Thanks for the link. We will look into the Pubmed data and see if any of it can be included in the map.

There is always pubmed[1].

[1]: http://www.ncbi.nlm.nih.gov/pubmed/

On http://scicurve.com/ you can visualize networks of papers based on biomedical data.

Nice, what lib is generating the graphics?
