Hacker News | thecodeviking's comments

+1 to what @undefined said.

If all goes well, we hope to act as a front-facing host for the Engrafo engine. That'll enable the ArXiv Vanity team to focus their efforts on improving the conversion process (which is what they'd like to focus on) while we can handle the logistics of serving this up to users (as quickly as possible).

I'm really excited about the opportunity to collaborate and support the ArXiv Vanity / Engrafo team!


It's not that different, at the moment. The only real difference is that we're pre-computing the HTML so it's faster (ArXiv Vanity runs at request time).
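To make the pre-compute versus request-time distinction concrete, here is a minimal sketch. Every name in it (`render_on_request`, `PrecomputedStore`, the `convert` callable) is hypothetical and only illustrates the general idea; it is not how Semantic Scholar or ArXiv Vanity is actually implemented.

```python
from typing import Callable, Dict, Iterable, Optional


def render_on_request(paper_id: str, convert: Callable[[str], str]) -> str:
    """Request-time model: the (slow) conversion runs when the page is requested."""
    return convert(paper_id)


class PrecomputedStore:
    """Pre-compute model: convert ahead of time, then serve from a lookup table."""

    def __init__(self, convert: Callable[[str], str]) -> None:
        self._convert = convert
        self._html: Dict[str, str] = {}

    def build(self, paper_ids: Iterable[str]) -> None:
        # Pay the conversion cost once, offline, for each paper.
        for paper_id in paper_ids:
            self._html[paper_id] = self._convert(paper_id)

    def serve(self, paper_id: str) -> Optional[str]:
        # Serving is a dictionary lookup; no conversion happens here.
        return self._html.get(paper_id)
```

The trade-off is the usual one: pre-computing moves the cost to a batch job and makes serving cheap, while papers that were never converted are simply unavailable until the batch job covers them.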

We've talked a lot with the ArXiv Vanity team. If all goes well and our users love the feature, we have an opportunity to support (and contribute to) their efforts at improving Engrafo and LaTeXML, and to maintain the front-end-facing portion of the system. That way they don't have to worry about hosting / providing a functioning front-end, which we're happy to foot the bill for (and maintain)!


Yup, @kpsns nailed it. LaTeXML does the heavy lifting in converting TeX to XML. From there, some post-processing converts it to a nice responsive template (that's done by Engrafo / the ArXiv Vanity team).
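For readers who haven't used LaTeXML, the two stages can be sketched roughly as below. This uses LaTeXML's own `latexml` and `latexmlpost` command-line tools with only the `--dest` option. It is an assumption-laden illustration: it does not show Engrafo's actual invocation or its responsive styling step, and the helper names (`latexml_command`, `latexmlpost_command`, `convert`) are made up for this example.

```python
import subprocess
from pathlib import Path
from typing import List


def latexml_command(tex_file: Path, xml_file: Path) -> List[str]:
    # Stage 1: TeX -> LaTeXML's XML representation.
    return ["latexml", f"--dest={xml_file}", str(tex_file)]


def latexmlpost_command(xml_file: Path, html_file: Path) -> List[str]:
    # Stage 2: XML -> HTML via LaTeXML's post-processor.
    return ["latexmlpost", f"--dest={html_file}", str(xml_file)]


def convert(tex_file: Path, out_dir: Path) -> Path:
    """Run both stages; requires LaTeXML to be installed and on PATH."""
    out_dir.mkdir(parents=True, exist_ok=True)
    xml_file = out_dir / (tex_file.stem + ".xml")
    html_file = out_dir / (tex_file.stem + ".html")
    # check=True raises CalledProcessError if either stage fails,
    # e.g. when a package used by the paper is not supported.
    subprocess.run(latexml_command(tex_file, xml_file), check=True)
    subprocess.run(latexmlpost_command(xml_file, html_file), check=True)
    return html_file
```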

We love OSS at AI2, and are looking to collaborate with the Engrafo / ArXiv Vanity team as we expand the functionality.


So I dug into the code (engrafo repository) and was quite surprised that -- contrary to the suggestive title -- the method inherits all the problems LaTeXML already has. The main one is that (compared to the TeXLive distribution, for instance) tons of widespread sty files lack a LaTeXML integration, and thus the conversion fails for a wide range of papers. Converting a TeX document to XML with LaTeXML really requires a lot of debugging, ideally starting from a plain LaTeX paper/book and compiling with pdflatex and latexml at the same time, making sure nothing breaks.


Yea, it's tough work. We're hoping to invest more in the conversion library (and support the Engrafo team in doing so).

It's going to take a lot of time and elbow grease to get it to where it needs to be!


Hi HN,

I'm an engineer on the Semantic Scholar team that worked on integrating this feature into the site.

Here's a blog post that talks a bit more about what we're doing: https://blog.semanticscholar.org/announcing-a-new-way-to-rea...

I'm around to answer questions / discuss the approach. We're super excited and would love to hear your feedback!


Feature idea 1: perhaps you could make the references section at the bottom render links, at least for references that provide an arXiv identifier.
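As a rough illustration of this idea, the sketch below turns new-style arXiv identifiers (e.g. `arXiv:1234.56789`) found in a reference string into links to `https://arxiv.org/abs/...`. The function name `linkify_arxiv` and the regular expression are assumptions for this example; old-style identifiers and references without an explicit `arXiv:` prefix are not handled.

```python
import html
import re

# New-style arXiv identifiers: four digits, a dot, four or five digits,
# optionally followed by a version suffix such as "v2".
ARXIV_ID = re.compile(r"arXiv:\s*(\d{4}\.\d{4,5}(?:v\d+)?)", re.IGNORECASE)


def linkify_arxiv(reference: str) -> str:
    """Return the reference as HTML, with arXiv identifiers wrapped in links."""
    # Escape first so the reference text cannot inject markup;
    # identifiers contain no characters that escaping would change.
    escaped = html.escape(reference)
    return ARXIV_ID.sub(
        lambda m: f'<a href="https://arxiv.org/abs/{m.group(1)}">{m.group(0)}</a>',
        escaped,
    )
```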

Feature idea 2: a better/shorter URL structure. The current URL https://www.semanticscholar.org/paper/{title}-{authors}/{uui... ends up quite long and unreadable (may be good for SEO though). If you're rendering mostly arXiv papers, you could set up a short URL scheme that mirrors the arXiv URL paths: e.g., if the original URL is https://arxiv.org/abs/XXXX.YYYY, your URL could be https://www.semanticscholar.org/paper/arxiv/XXXX.YYYY

Good stuff!


(I also work on the Semantic Scholar team)

1. Totally agree. We know how we can / will do this and plan to do so given enough interest in the MVP reading experience.

2. The URL structure is indeed for SEO purposes. (A large majority of our users discover our paper pages through organic search.)


The proposed shorter format could maybe be a redirect then? Quickly being able to turn one link into the other would be quite useful.
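A redirect like this could look roughly as follows. This is only a sketch of the proposal in this thread: the `/paper/arxiv/<id>` path, the lookup table from arXiv identifier to canonical URL, and both function names are hypothetical, not an existing Semantic Scholar API.

```python
import re
from typing import Dict, Optional, Tuple

SHORT_PATH = re.compile(r"^/paper/arxiv/(\d{4}\.\d{4,5})$")
ARXIV_ABS = re.compile(r"^https?://arxiv\.org/abs/(\d{4}\.\d{4,5})(?:v\d+)?$")


def short_path_from_arxiv_url(url: str) -> Optional[str]:
    """Turn an arXiv abstract URL into the proposed short path, if it matches."""
    m = ARXIV_ABS.match(url)
    return f"/paper/arxiv/{m.group(1)}" if m else None


def resolve_short_path(
    path: str, canonical_by_arxiv_id: Dict[str, str]
) -> Optional[Tuple[int, str]]:
    """Return (301, canonical_url) for a known short path, else None."""
    m = SHORT_PATH.match(path)
    if m is None:
        return None
    target = canonical_by_arxiv_id.get(m.group(1))
    if target is None:
        return None
    # A permanent redirect keeps the long, SEO-friendly URL canonical.
    return (301, target)
```

Because the short form only redirects, the long URL stays the one search engines index, which keeps the SEO benefit mentioned above.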


Definitely, thanks!

