Show HN: Arxiv Vanity – Read academic papers from Arxiv as responsive web pages (arxiv-vanity.com)
721 points by bfirsh on Oct 23, 2017 | 134 comments



We were frustrated by the experience of reading machine learning papers on screens (particularly phones/tablets). There are lots of good tools for authoring HTML papers (Distill, Authorea, etc) but nothing that deals with the vast number of PDF papers that already exist.

So, we built Arxiv Vanity: a site that renders Arxiv papers as web pages. It’s still pretty janky, but for the papers that do render correctly, the experience is so much better than reading a PDF. For example:

https://www.arxiv-vanity.com/papers/1705.04085v3/

https://www.arxiv-vanity.com/papers/1708.00884/

https://www.arxiv-vanity.com/papers/1705.06031v2/

The source for the LaTeX to HTML renderer is on GitHub[0]. It’s built on Pandoc[1] and Distill.pub’s template[2].

[0] https://github.com/arxiv-vanity/engrafo

[1] https://pandoc.org

[2] https://github.com/distillpub/template


One of the things that I came across when writing my own janky PDF/LaTeX->HTML converter for lecture notes[0] is that Pandoc doesn't handle references and subfigures correctly, even with pandoc-crossref and pandoc-citeproc enabled. I had to write a little Python module[1] that uses regexes to extract those and then handles them separately... This is definitely something you should look at.

[0] https://dmaitre.phyip3.dur.ac.uk/NPP/notes/

[1] https://github.com/JBorrow/latex-pandoc-preprocessor
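
The regex preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the code from the linked module: the function names are hypothetical, and it assumes a single running counter for all labels, whereas real LaTeX keeps separate counters for figures, equations, and sections.

```python
import re

# Patterns for \label{key} and \ref{key}; keys are assumed to contain no braces.
LABEL_RE = re.compile(r"\\label\{([^}]*)\}")
REF_RE = re.compile(r"\\ref\{([^}]*)\}")


def number_labels(tex: str) -> dict:
    """Assign 1-based numbers to labels in order of first appearance."""
    numbers = {}
    for match in LABEL_RE.finditer(tex):
        numbers.setdefault(match.group(1), len(numbers) + 1)
    return numbers


def resolve_refs(tex: str) -> str:
    """Replace each \\ref{key} with its number, or '??' if the key is unknown."""
    numbers = number_labels(tex)
    return REF_RE.sub(lambda m: str(numbers.get(m.group(1), "??")), tex)
```

Running `resolve_refs` on the source before handing it to Pandoc means the converter only ever sees plain numbers, so it has nothing left to get wrong for those references.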


I was looking for a way to turn my (soon-to-be-defended) PhD thesis into an epub, and investigated the various LaTeX2Html converters. I was pretty disappointed when I realized that all of them are terrible and have no hope of handling my manuscript. My current solution is to create a rendering of my thesis in a5 format. :/

This looks quite a bit better, so here is the question: what do you not support at the moment?


Have you seen pandoc? That should be able to do that and the comments about results are usually positive.

https://pandoc.org/epub.html

One final thing, and wildly off topic, is that when you do your defense, remember that you probably know more about the specifics of the subject than anyone else in the room. Many folks stress over it, but you're almost certainly going to be the actual expert in the room. Good luck!


A lot of things. LaTeX and its packages have so much surface area. Our approach so far is to just make the papers that we read readable. That probably covers the 20% of LaTeX features that 80% of people use.

Here is the broken stuff we are keeping track of: https://github.com/arxiv-vanity/engrafo/issues (feel free to add to it!)


Is there a reason for relying on pandocfilters instead of on Panflute [1]?

I would think that panflute would allow for more readable code, which helps when dealing with all the corner cases and rough edges of LaTeX.

[1] https://github.com/sergiocorreia/panflute


Because we didn't know that existed! That looks so much better, thank you. The pandocfilters library is really hard to use.

https://github.com/arxiv-vanity/engrafo/issues/160


I'm not sure I understand. Well, I understand what you're doing but I'm not sure why you'd dislike PDF.

PDF has the great benefit of rendering the same on every system. With very few exceptions, PDF will look exactly the same on every system and will print the same on every system.

HTML doesn't really have that same benefit.

Don't get me wrong, I think your service is a great idea for those who would like HTML formatted results, but I'm not understanding the complaint about PDF.

Could you expand on why you don't like PDF?


PDF pages are usually based on A4 size, which is 210mm wide. Even at full size the writing is often tiny. Once rendered on a 10cm wide screen (landscape) it's pretty darn hard to read.

Also in general the mobile pdf reading experience sucks.

For example, on Android you have to download a file (rather than browse to it) and then hunt it down to open it.

The PDF readers I've used easily scroll you to a random page by accident if you touch the screen in the wrong place. Kindle's probably the best, but then you have to email yourself the PDF, which is a hassle.
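
The arithmetic behind the "tiny writing" complaint can be made concrete. This is a small sketch; the 10pt body font is an assumption (a common default for papers), not something stated in the comment.

```python
A4_WIDTH_MM = 210.0


def effective_font_pt(font_pt: float, screen_width_mm: float,
                      page_width_mm: float = A4_WIDTH_MM) -> float:
    """Apparent font size when a full page is scaled to fit the screen width."""
    return font_pt * screen_width_mm / page_width_mm


# Assumed 10pt body text on an A4 page, shown on a 100mm (10cm) wide screen.
print(round(effective_font_pt(10, 100), 2))
```

Under that assumption the text ends up a bit under 5pt, less than half its printed size.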


IMO, reading two-column papers on an iPhone (through PDF) is a real pain -- the format relies on you using your eyes to jump from bottom left to top right, rather than having to scroll from the very bottom to the very top (diagonally). The same problem exists even for single-column styles -- you need to zoom in so much that you have to scroll horizontally as well as vertically.

The need to scroll doesn't exist on a large screen or on a piece of A4, but on smaller devices like mobile phones or even tablets, it's annoying. Having a responsive page means you can scroll vertically as you read, rather than having to make big jumps (or constant horizontal scrolls) that can really break the flow.


I wonder if that's a personal thing? Over the past year, I've been trying to join the mobile revolution - sort of. The majority of my browsing is now done on a tablet.

I read quite a few PDFs and don't actually have any complaints. I am not personally seeing any readability issues and don't mind consuming PDFs at all.

That said, I think I now understand your complaint. Thanks! I just don't personally have any trouble with it. I use multiple tablets, of varied sizes, and I've had good experiences with all of the devices. While some PDFs are horribly formatted, I find that the device choice doesn't help that and it's a design choice from the author.

But, again, thanks for helping me understand.


> The majority of my browsing is now done on a tablet.

Reading PDFs on a tablet isn't too bad because of large screen real estate.

Reading PDFs on a small mobile phone requires me to zoom in to make the font big enough for me to read, and then I have to scroll right to read, and left and down to move to a new section of the column.

Try reading a PDF on a smaller device than a tablet. I'm sure you'll be able to see what we mean.


Two column papers are the worst format. They emphasize compact printability in a world where no one buys proceedings.


I consume almost all my media on phone. The problem with pdf is precisely that it renders the same on every screen - this makes most PDFs virtually unusable on the phone as you have to scroll down one column in a page, then up for the second column etc.


Not the OP, but PDF is bad on tablets and horrible on phones.


What size tablets are people talking about? I find the iPad size practically ideal for consuming PDF papers.


I think part of the issue is that PDF isn't responsive to the size of the device. A PDF is not much more than an image from the perspective of layout. I'd love to be able to reflow text from a PDF such that a single column fills my screen edge-to-edge and scrolling allows me to advance through the paper, as opposed to requiring me to reposition the viewport every time I reach the end of a column. I know this isn't the purpose of PDFs, and I love them in different contexts where layout (including typography) does matter to me. But I also really want to be able to easily consume papers in a way that isn't constrained by the PDF layout.

Yes, I want my cake and a pony. Cakepony.


Fwiw, this conversation greatly varies depending on who is doing the reading -- the rather banal fact is that the average 25-year-old student has much different ability to screen-read than the average 60-year-old professor (or a 60-year-old student, for that matter) :)

so no need to search tablet specs for the culprit. PEBCAK :)


The screen of a tablet is large enough to display a PDF. But PDFs are split into pages. That's perfect with paper, where we flip pages. It's very unnatural on screens, especially touchscreens, where we use vertical scrolling to move around.

Then there are minor issues of margins, possibly zooming to make text readable, etc.

That's why PDFs are so bad on mobile. The ideal format is one column text, figures and tables between paragraphs of that column, no page breaks, bidirectional links to notes. That's HTML, I guess.


Yeah fair point, I use an iPad mini. But I have heard similar complaints from older folks (40+) who have full-size iPads. I think much of it stems from dual-column printing, which is just kind of antiquated/annoying on digital.


Yup. I agree. I even find the experience fantastic on the original iPad, as well as a brand new one. I find it just fine on my phone, which isn't nearly as large a screen.

I am guessing it is an individual taste thing. That makes some sense.


> HTML doesn't really have that same benefit.

Yes, it does.


Not really, look at the myriad rendering issues between the most popular browsers. PDF should result in pixel-for-pixel reproducibility, browsers don't do that in practice.

That's why we still test pages in different browsers and end up using browser specific code to ensure proper rendering - which often only reaches the 'close enough' format.


Not sure if pixel-perfect matters for consuming papers.


No way. We're testing in multiple browsers because of Javascript only.


> particularly phones/tablets

I understand the problem with a phone, but PDFs on an ipad/tablet are beautiful and a joy to read. Much better to read the text as originally typeset than to put it through a process such as this which risks corrupting minor but important details in the mathematical content.

On my phone I put it in landscape mode and that allows me to read a PDF OK, but I don't really get why one would read academic papers on a phone, why not use a tablet?

However I'm very interested in engrafo. It sounds like it will allow me to automatically publish blog style content from my LaTeX sources without having to fork the LaTeX content into a markdown / HTML version.

I just don't understand why you don't like reading academic papers as PDFs on tablets!


This is cool, it would be nice to have a chrome extension to take me directly to this from the page/pdf.


I actually made one quickly, published here: https://chrome.google.com/webstore/detail/arxiv-vanity-plugi... . It injects the arxiv vanity link on abstract pages and if you click the button when viewing an online arxiv pdf it opens the respective arxiv vanity link.


Yes! This is a great idea, and something we have been thinking of. https://github.com/arxiv-vanity/arxiv-vanity/issues/67


It would be amazing if we could browse "Latest" by category, and for a certain day, much like: https://arxiv.org/list/math.NT/recent


It's very nice. You should expand to cover bioRxiv (biology) too.


bioRxiv doesn't expose LaTeX files; they explicitly only use PDFs to make things easier. That means you're going to need to reflow PDFs (a la https://docushow.com/), and I would guess there are a lot more edge cases there.


Thanks for linking to https://docushow.com Also a work in progress, but PDF reflow is a hard problem so you never ship if you want to solve all cases :)

Your solution using the LaTeX source generates really nice HTML, congrats!


Lovely idea, and I can't wait until it gains super deep-learning smarts and gets everything perfect :-)

For now, it's hard to read: https://docushow.com/viewdoc?url=https%3A%2F%2Farxiv.org%2Fp...


That first article and how it looks when imported into Authorea in one click: https://www.authorea.com/users/3/articles/208068-automatic-e... (just a couple of labels and si units which do not render). Note: it is forkable and can be commented upon.


In all three cases I find the original PDFs more pleasant to read. HTML typography is not up to snuff. I read them on a laptop, however, and I can see that this would be useful if one is forced to read on a phone.

(One thing that is very ugly in the PDFs, and most scholarly papers, is the use of different-colored boxes for hyperlinks. Authors, please consider putting

\usepackage[colorlinks]{hyperref}

in your LaTeX preambles.)


For me the PDF was fuzzier https://imgur.com/a/WJ5y3 and the HTML version was more convenient to read in a single column. The two-column format is nice if I'm skimming to see if a paper is going to be interesting, but when I sit down to read it the HTML version definitely wins.


Just tried your suggestion: it ends up looking much uglier with font colors imo.


The default saturated colors are a bit garish. But you can set them to be anything you want. See the hyperref documentation.


This is really cool and has a lot of potential. Academic papers are dense and heavily cross-referenced, so experimenting with new display formats that do more to help the reader could make researchers a lot more productive. For example, citation tooltips are a big time saver compared to cross-referencing the bibliography. However, it's also beneficial for every paper to look the same because this makes skimming easier. The way to get both innovation and consistency is to develop tools, like Arxiv Vanity, that automatically transform the source document. This example makes me hopeful that we'll someday have similar tools for the commercial publishers' papers.

As for immediate tweaks, I tentatively suggest making the text 100% black (like the original PDF) instead of rgba(0, 0, 0, 0.8). The higher contrast will help those of us with less-than-great eyes.
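
To put a number on the contrast suggestion, here is a small sketch using the WCAG 2.x contrast-ratio formula. It assumes the text sits on a pure white background (the thread does not say what the page background is), and the helper names are made up for illustration.

```python
def srgb_to_linear(channel: float) -> float:
    """Convert one sRGB channel in [0, 1] to linear light (WCAG 2.x formula)."""
    if channel <= 0.03928:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    r, g, b = (srgb_to_linear(c / 255) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg) -> float:
    lum_a, lum_b = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


def composite_over(fg, alpha: float, bg):
    """Blend a translucent foreground colour over an opaque background."""
    return tuple(round(alpha * f + (1 - alpha) * b) for f, b in zip(fg, bg))


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
effective = composite_over(BLACK, 0.8, WHITE)  # what rgba(0, 0, 0, 0.8) looks like on white
print(effective, round(contrast_ratio(effective, WHITE), 2))
print(BLACK, round(contrast_ratio(BLACK, WHITE), 2))
```

On white, the translucent text comes out as a dark grey with a ratio of roughly 12.6:1, against 21:1 for pure black. Both clear the usual WCAG thresholds, so the suggestion is about comfort for weaker eyes rather than strict compliance.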


This is amazing. I hope you'll keep working on it. There's always a long tail of details that need taking care of when trying to cover a large corpus, and ploughing through successive 80%'s is (as you are no doubt acutely aware) serious grunt work. But you've made a fabulous start, so I hope you find the stamina to do it!


Yeah, even building upon Pandoc's LaTeX parsing, 3 months of grunt work got us this 20% working. Over the next 12 months we'll get the other 80% working. :)


A big challenge is to get references working correctly. LaTeXML is quite good at converting latex documents to html [1], including references such as Theorem 2.1, equation (8.1) etc.

For instance, the paper [2] appears to be quite readable on mobile, and clicking/tapping on a reference such as (8.1) leads you to equation (8.1) as you would expect.

The auto-generation of Arxiv-Vanity is really nice, maybe it would be easy to add the LaTeXML output too?

[1]: http://www.albany.edu/~hammond/demos/Html5/arXiv/lxmlexample...

[2]: http://www.albany.edu/~hammond/demos/Html5/arXiv/LaTeXML/110...


This is an awesome tool. Thanks!

Only issue I've run into so far is that cross-references to theorem numbers don't seem to always work correctly, e.g. you'll see a lot of "Theorem ?" in https://www.arxiv-vanity.com/papers/1607.06711/.


Ah, looks like we don't support theorems. You can track it here: https://github.com/arxiv-vanity/engrafo/issues/157

Thanks!


Not sure if all these are x-refs to theorems or not, but there seem to be lots of [?] links: https://www.arxiv-vanity.com/papers/1602.08927/

That said, on cursory look, this is pretty impressive. latex->web converters have existed for a long time, and this appears to have navigated some aspects quite well!



We haven't implemented many of the LaTeX packages used in papers that aren't machine learning papers yet - sorry. :(


This looks really cool. (The program had some issues with the bibliography and with custom layout, but other than that, was great.)

It would be nice if an option to output MathML existed.

(Why MathML?

In brief, it allows treating Maths as a first-class citizen on the web.

For instance, with MathML the reader can choose what font the equations will be rendered in — if you prefer STIX or Latin Modern Math, then you can specify it with CSS, and the browser will correctly render it. With the mash of spans within spans that arXiv-vanity uses, you couldn't change the font, as then the pre-calculated spacings would be wrong. (Alternatively, the publisher could easily offer several styles, without having to re-render everything, just by changing the CSS.)

Arguably, client-side MathJax offers the same flexibility as MathML, but it's much, much slower, while rendering MathML in firefox is as fast as rendering standard, static HTML.

Another application of MathML is embedding it in SVGs for beautiful graphs.

MathML can also be pasted into other applications that support it, such as Thunderbird and Mathematica. )


This is awesome! I was literally rolling my eyes this morning about trying to read an arXiv paper on my phone.


shameless plug: give https://docushow.com a try!


Great work.

I've also been working on a similar open-source project "Sharead".

https://github.com/strin/sharead

It has a chrome extension that uploads Arxiv papers, and you can manage papers with tags.

It also automatically converts PDF to HTML using a library called pdf2htmlEX:

https://github.com/coolwanglu/pdf2htmlEX


Looks like it's failing to process some standard tex commands (e.g. \textup) as well as some user defined macros. See the many display errors in https://www.arxiv-vanity.com/papers/1710.07406/

Of course, it goes without saying that I want this.


When the render fails, why are you redirecting to the PDF file instead of redirecting to the abstract? E.g. here (link stolen from another comment on this page): https://www.arxiv-vanity.com/papers/1608.04012/


Noob question but how far does Calibre take pdf to epub conversion? I've been really interested in learning more about the epub file format and was greatly intrigued to discover it extends xhtml and is essentially a zip folder if I've gathered that much correctly.


Unfortunately it's failing on the first things I tried, with a not-so-helpful error message:

https://www.arxiv-vanity.com/papers/1608.04012/

https://www.arxiv-vanity.com/papers/0903.3065/

Also a lot of MathJax failures (maybe LaTeX variable names?): https://www.arxiv-vanity.com/papers/1709.09439/


Those problems are normally Pandoc parsing errors. Considering it's open source, perhaps we should print the error message so people can actually help fix it...

The MathJax failures are either things that MathJax doesn't support, or use of \DeclareMathOperator which we haven't added support for yet.

Edit: Added a more useful error message. :) https://www.arxiv-vanity.com/papers/1608.04012/
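
One way the missing \DeclareMathOperator support could be approached is to scan the preamble and turn each declaration into an equivalent \operatorname macro, which MathJax does understand. The sketch below is an assumption about a possible approach, not engrafo's implementation; the function name is hypothetical and the regex only handles operator bodies without nested braces.

```python
import re

# Matches \DeclareMathOperator{\name}{body} and the starred variant.
# Assumption: the body contains no nested braces.
DECL_RE = re.compile(
    r"\\DeclareMathOperator(\*?)\s*\{\\([A-Za-z]+)\}\s*\{([^{}]*)\}"
)


def operator_macros(preamble: str) -> dict:
    """Map each declared operator name to an \\operatorname-based definition."""
    macros = {}
    for star, name, body in DECL_RE.findall(preamble):
        macros[name] = r"\operatorname" + star + "{" + body + "}"
    return macros
```

The resulting dictionary could then be serialised into whatever macro table the math renderer accepts.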


Thank you thank you thank you! I detest the reader-hostile PDF (and WTF? why would you write something and then make it inconvenient to read???)

Unfortunately, among its sins, PDF discards a lot of the presentation semantics (headers, footnotes etc). Congrats on doing a credible job trying to reconstruct some of that! It's a tough, thankless job.

I was horrified when Adobe introduced PDF and indeed it has turned out at least as badly as I had feared.


I believe it's reconstructed from the LaTeX source, which is how every paper is submitted to the arXiv. Not to diminish this site, but I'm guessing that generating HTML from LaTeX is a lot easier than doing it from PDF.


Thanks. Soon I suppose we'll be able to run a LaTeX->TeX rendering system compiled to web assembly!


I am curious, would you mind explaining why you dislike PDF?

(I'm an academic and I'm used to PDFs and I like them myself.)


This is awesome! Going on my home screen now. Love the design. Maybe you could ask Arxiv to add a button on their site that opens the paper on your site.


This is so good. I do prefer the HTML over PDF in this scenario.


Not really my use case as I read PDF papers on an iPad Pro 12.9 inch, which is just fine, but very neat work!

I tried it on this one: https://www.arxiv-vanity.com/papers/1702.03277/

Some commands don't work (\textsl, \rotatebox, ...) and the thank you footnote is incorporated into the title, but otherwise very readable!


This is so much awesome. Thanks for building this.


Nice! PDF is the worst format I can think of to present papers. Especially for reading on mobile this will be of great help.


PDF is the only format that will preserve the typographical details that are important in many technical papers; it also avoids the relatively bad rendering created by the browsers.

PDF is usually bad, of course, on small screens, unless the publisher makes special versions.


Another useful thing is ability to put comments/annotations - I'm reading on iPad, and annotate quite a lot


I think this is a dramatic difference of opinion between coders and everyone else in science, if you excuse the generalization. I say that because of the large number of tools to try to one-up the PDF for academic scholarship (e.g. Readcube). Publishers looking to invent ways to define their own value have been slowly trying to force these things onto readers, but almost everyone in my field hates them and wishes they would die. Formatting and typography are critical in many fields, and a PDF is the canonical way to maintain these aspects of a paper.


Definitely useful in certain situations. Can't comment on how well the conversion works (yet) but I can see how this might be useful to a lot of people.

Me? I still mostly prefer reading physical academic papers because of needing to flip back and forth for re-reading (clarification) and adding personal notes/graphs/calculations.

Good job guys.


Tried three papers: "Could not find Arxiv ID in that URL. Are you sure it's an arxiv.org URL?" Why, yes, I copied-and-pasted directly from the browser tab showing the Arxiv URL.

Tried a couple other papers: "This paper failed to render. Take a look at the original PDF instead."

So...with what probability does this actually work?


What Arxiv URLs were you trying?

LaTeX is really tricky to parse, which is why you're seeing those "failed to render" errors. Judging from our logs, it works about 80% of the time. That's up a lot from plain Pandoc though - it could render hardly anything from Arxiv.


I tried some more and still got the "are you sure this is a valid URL?" e.g., https://arxiv.org/abs/gr-qc/0702106

Tried https://arxiv.org/abs/1511.06343 and a couple others and got the "failed to render."

Tried cloning engrafo, then installing docker, then building engrafo, then my disk was filling up and decided I'm done with this for now.
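
The failing gr-qc URL above uses arXiv's older identifier scheme (archive/number), which looks quite different from the newer YYMM.NNNNN form. A URL parser that accepts both might look like the sketch below. This is a guess at the shape of the problem, not the site's actual code; the function name and the exact patterns are assumptions.

```python
import re
from typing import Optional
from urllib.parse import urlparse

NEW_STYLE = r"\d{4}\.\d{4,5}"                 # e.g. 1511.06343
OLD_STYLE = r"[a-z-]+(?:\.[A-Z]{2})?/\d{7}"   # e.g. gr-qc/0702106, math.GT/0309136
ID_RE = re.compile(
    rf"^/(?:abs|pdf)/((?:{NEW_STYLE}|{OLD_STYLE})(?:v\d+)?)(?:\.pdf)?/?$"
)


def arxiv_id_from_url(url: str) -> Optional[str]:
    """Return the arXiv identifier (with version, if present) or None."""
    parsed = urlparse(url)
    if parsed.hostname not in ("arxiv.org", "www.arxiv.org"):
        return None
    match = ID_RE.match(parsed.path)
    return match.group(1) if match else None
```

With this, `https://arxiv.org/abs/gr-qc/0702106` yields `gr-qc/0702106` and `https://arxiv.org/abs/1511.06343` yields `1511.06343`, while non-arXiv hosts return `None`.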


Sad to see that it doesn't render the figures properly for this paper: https://www.arxiv-vanity.com/papers/1701.03757v1/

I hope this can be made to work reliably. I generally prefer web pages to pdfs.


LaTeX is hard, but we'll keep at it! That issue is being tracked here: https://github.com/arxiv-vanity/engrafo/issues/12


For bitmap-based PDFs it would be possible to segment the document into words and images (just bounding boxes, not OCR), then "reflow" them to a different page size by allocating fewer words per row.

Does anyone know if this kind of PDF reader exists? Such a PDF reflow reader would work on scanned old books.
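
The reflow step described above can be sketched as a greedy line-packing pass over word boxes. The sketch assumes segmentation has already produced the boxes in reading order, which is the hard part and is not shown; the `Box` type, the `gap` parameter, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Bounding box of one segmented word or image, in pixels."""
    width: float
    height: float


def reflow(boxes: List[Box], page_width: float, gap: float = 4.0) -> List[List[Box]]:
    """Greedily pack boxes, in reading order, into rows no wider than page_width."""
    rows: List[List[Box]] = []
    current: List[Box] = []
    used = 0.0
    for box in boxes:
        needed = box.width if not current else used + gap + box.width
        if current and needed > page_width:
            # Row is full: start a new one with this box.
            rows.append(current)
            current, needed = [], box.width
        current.append(box)
        used = needed
    if current:
        rows.append(current)
    return rows
```

A box wider than the target page still gets a row of its own, so a real reader would also need to scale oversized figures down.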


I'm working on it, email me for early access.


Hallelujah, we need to force academics and people who go around touting "whitepapers" to be ripped out of the proprietary PDF era of terrible UX/readability and into modern readable web documents. This is definitely the way to do it, this is brilliant!


FWIW, PDF is an open standard. There exist both open and proprietary readers and creators. It has been an open standard for nearly a decade.

https://en.wikipedia.org/wiki/Portable_Document_Format


I would pay at least 10$ for an app that made an aesthetically-pleasing HTML flow of any pdf document I’m looking at. At least. And I can’t imagine I’d be the only one. So much of my reading is on-screen now, and PDFs do kind of suck for two column reading.


What kind of pricing schema would you prefer for this? Monthly or one time payment?


It definitely feels like a one time payment, since my mental model for it would be “pdf reader.” Also, since I’m accustomed to SaaS in my business dealings, but detest it in the private consumer space, and this would appeal to me as a consumer; my business documents are already mostly PowerPoints, with the occasional word doc.


If you're on Android or an ereader and frequently read two-column PDFs, consider using Koreader: https://github.com/koreader/koreader


This is a superb idea! Still not working on some of the papers I read, but hopefully it will soon.

I would love to see a bookmarklet that lets me hop from an arxiv page straight to Arxiv Vanity.

Also, the manicure emoji for the favicon was a great choice!


Great tool for mobile! Have to say I'm too used to PDFs on my laptop, although I can see this replacing it. Two suggestions-

1. Center the text on the screen

2. Justify the text

(I'm not sure how difficult these are though)


Really like this! I'm working on something similar, a generic PDF to HTML converter that enables reflowing of documents on a mobile device.

Any recommendations for HTML templates other than the distill.pub one?


Great job!

Personally, I prefer the PDF versions, but this could be very useful on a phone.


Awesome thank you so much :) The GAN paper somehow feels more readable :D


Thanks! This is useful.

I always have trouble reading papers on Kindle, as the screen is small. Panning and zooming are also painful as the device is slow.

I kinda hope papers can be turned into single column (more kindle friendly.)


You might add a contact link so people can easily get in touch. The first-listed personal homepage is kind of insane with all the flashing colors, and it contains dead links (e.g., your Twitter).


Yeah, that's a good point. Andreas's site is a bit out of date. I've made them link to more useful places. Thanks!


I've a large collection of scientific papers and interesting publications all in PDF format, and I've never had a problem reading a PDF document.


Now, if this can be integrated with arXiv-sanity.com


So I tried this on a couple of papers I need to read: awesome awesome awesome awesome. Would upvote a thousand times if I could. Awesome.


Looks cool although the first paper I tried it on other than the examples didn't work :( Looking forward to seeing this improve!


Please make the References section hyperlinked so clicking on a reference takes you to the paper.


So much this. This is one of the main reasons I wanted to build this.

https://github.com/arxiv-vanity/engrafo/issues/127


It's also one of the main reasons Larry built Google.


An excellent project, though maybe Sakura CSS would be a lighter alternative to distill...


It will be great if I can bookmark the position where I quit reading a paper.


Looks great! Would be even better if the references are urls to other papers.


That looks great! Well done!


Narrow column would be easier. Full screen width is not best practice.


Somebody please, PLEASE do this for Project Gutenberg.


I just visited Project Gutenberg and tried opening a few books. There seems to be an HTML option, what did you mean?


They are inconsistently formatted, and the default formatting usually takes up the whole page width, making it hard to actually read.

I guess I could probably solve this with a custom stylesheet, though.


Really neat!! Thanks for sharing :)


I love this, thank you so much!


This will save so many trees.


Why "Vanity"?


A bit of wordplay on our favourite Arxiv tool. :) http://arxiv-sanity.com/


fuck that's cool


This site is nice even for mobile!

Shameless plug: I made an Android app for arXiv if anyone wants something simple to search articles on mobile. Graduating soon so if you try and enjoy it, any positive (but honest) views help the looming job search ;)

https://play.google.com/store/apps/details?id=xyz.imaginatri...



I have to admit I am not impressed. My first paper, which I tried to render, does not work properly; references are removed and rendered poorly, figures are misplaced and tables incomplete. Given that not all arXiv papers are under a permissive license and you do not have permission to do this, I would much prefer if you at least made sure that arxiv-vanity rendered papers do not show up in search results, e.g. by offering a suitable robots.txt and with a bigger link to the author-endorsed version of the paper.

Edit to clarify: If people want to use or develop a broken sort-of-PDF viewer, that’s fine. However, if someone searches for a paper of mine, I would like them to only find the version where I at least had a chance to see that it renders correctly and is complete. In particular, I do not want to be "responsible" for broken rendering on random third-party websites. This website actually operating illegally does not make me more inclined to support it.


"Some papers do not render correctly, for example figures and tables in mine [1][2]. Beside that, some articles may not be posted under a permissive license, so you want to double check that you're not running into copyright violation troubles by modifying or publishing them.

[1] https://... [2] https://..."

Sounds a lot more positive and might get better results than being as adversarial and negative (imo) as the original comment?


I tried a bunch of papers that weren't yours and they came out really well.


This is fair. If somebody stumbles across this thinking it is how you intended it to be displayed, I can understand you'd be unhappy. We should make it clearer that we're just a conversion tool, not a source.

If you want us to remove your paper and just point at the PDF, we're happy to do so. My email's in my profile if you don't want to post the broken render here!


Thank you for your reply. Ideally I’d prefer for you to respect the license associated to each paper and only re-compile and re-host if the license actually allows you to do that (i.e. CC0, CC-BY, CC-BY-SA and maybe CC-BY-NC-SA, depending on whether you think you act commercially).

I also don’t want to keep tabs on every arXiv rehoster and inform them manually by e-mail every time a new paper goes up.

May I ask why this was not done together with the arXiv itself? I.e. have the infrastructure run there, let authors check the HTML render at the same time as the PDF render and then, if the author thinks they look ok, have them show directly on the abstracts page? This would even avoid all your license problems, as the arXiv already has the corresponding license!


I believe in most countries only a court can decide if a site is illegal or not. Not you. And as far as I know this is true in both France [0] and Germany[1].

IANAL.

[0] I got the impression it was a French site

[1] just guessing where you live


Each arXiv paper has a well-defined license linked-to from the upper right hand corner "(license)" link below the PDF and source downloads. If they only re-hosted and re-compiled papers for which they have a license to do so, I wouldn’t complain at all, but re-hosting and modifying content without a valid license is clearly illegal, no?


> re-hosting and modifying content without a valid license is clearly illegal, no?

No.

Longer version: it's illegal only if a license is required, which is a matter of the copyright law of the jurisdiction relevant to the act. In the US, that question may turn on things like fair use analysis, which can be tricky.


Fair use applies to citations (not of the whole work), parodies and similar creative processes. Similar requirements in Germany include some creative input by the person claiming fair use which usually should exceed the creative content taken from the original work. Simply re-compiling the LaTeX source is certainly not creative work sufficient for a fair use exception. Checking the other limits of copyright law in e.g. Germany ( https://de.wikipedia.org/wiki/Schranken_des_Urheberrechts ) nothing remotely applies to this site.

Could you clarify why you think that this site does not require a valid license to re-host and re-compile papers?


> Fair use applies to citations (not of the whole work)

Time-shifting is one of many examples of where copying a whole work was found to be fair use; the idea that fair use applies only to citations is very, very wrong.

Fair use is extremely precedent dependent (and very hard to predict without clear applicable precedent) because the statute law gives only factors to weigh in the analysis.

> Could you clarify why you think that this site does not require a valid license to re-host and re-compile papers?

I didn't state an opinion on that; I said that, because it skips the question of whether license is required, the blanket statement that rehosting without a valid license is “clearly illegal” is inaccurate and overbroad.


Which papers are yours, so OP can check them, and I can avoid them?


A simple robots.txt deny rule should be sufficient to prevent the papers from getting indexed.
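
A deny rule of the kind suggested here can be checked with Python's standard-library robots.txt parser. The rule below is only an example: it assumes rendered papers live under `/papers/`, as in the links posted in this thread, and `is_crawlable` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Example rule set (an assumption): block all crawlers from rendered papers only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /papers/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Report whether the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Note that a robots.txt rule stops well-behaved crawlers from fetching pages but does not by itself guarantee a URL never appears in search results; a `noindex` directive on the pages is the usual complement.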


Very cool idea! Thank you.


Very cool. Downloading PDF's has always bothered me and this is a fantastic and easy way to view papers, esp while commuting.

A Native android or iOS tablet app would be neat to track your papers etc.


I'm surprised there doesn't exist a more responsive/reactive PDF viewer. It seems like that would be easier than converting the document.


You mean like some embed like Scribd? I've never really liked that sort of stuff, always works wonky for me. Text is great.


Why PDFs in the first place? They're hard to write, hard to format, hard to generate, and they do no good at all to anyone.

HTML is better read, smaller, faster, has more formatting options, and can have all contained in a single file.

Seriously, stop creating PDFs.


Why the hate for PDFs? I can see why you would not want to communicate dynamic content with them, sure. But for academic papers, which are usually static text and equations and maybe some plots, PDFs do exactly what you want: at least in principle, they look the same on every system. If I want to email them to somebody, store a copy, read them offline on a reader, etc., I have to deal with one single file, in a format that just works on many readers. What am I going to do when I typeset a paper with web technology? Fire up a local server to display some inline math in my HTML page? No thank you.

I agree that basing layout on physical paper may not ultimately be necessary, but it gives the reader a familiar structure. The way the advertised website turns papers into one long scrolling column of text... I find it rather disorienting and unstructured. Text that is split up into "pages" (whatever size they are in the end) somehow helps break up the reading flow.

In the end it remains to be shown that the gain from having academic papers not typeset in PDF outweighs the hassle of having to deal with non-standardized ways of rendering properly formatted text on websites (things like MathJax do not support everything that is available in full LaTeX).


Thank you for your response. I understand your points and will rethink my hate for PDFs.

What I don't understand is why I got at least 2 downvotes. These days I'm getting downvotes for every opinion I express on HN. It's very annoying.


Your question "Why PDFs in the first place?" was answered in bfirsh's comment (https://news.ycombinator.com/item?id=15534583).

But anyway, in the context of the discussion about this webpage/project, it's not relevant to ask why these PDFs exist. They do, and the scientific community is nowhere near a transition away from them. So bfirsh is trying to find a solution to consume those existing PDFs.

So think of it not as getting downvoted for expressing your opinion, but rather for not contributing to the discussion about this particular project.



