

Show HN: Unstructured Citations to BibTeX with Conditional Random Fields - rmcgibbo
http://reftag.rmcgibbo.org/

======
ygra
Side note: Page ranges should be formatted as 1044–1048, i.e. 1044--1048 in
TeX. Since you already seem to correctly parse the range, using the correct
dash shouldn't be too hard (couldn't really find the place in the code right
now, though, so no PR, sorry).
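For anyone wanting to patch this locally, here is a minimal sketch of the normalization ygra describes. It assumes the parser hands back the page field as a raw string such as "1044-1048". The function name `bibtex_pages` is hypothetical and not taken from reftag's code.

```python
import re


def bibtex_pages(raw: str) -> str:
    """Normalize a parsed page range like '1044-1048' to BibTeX's '1044--1048'."""
    # Accept a hyphen, double hyphen, en dash, or em dash between the two page tokens.
    match = re.fullmatch(r"\s*(\w+)\s*(?:-{1,2}|\u2013|\u2014)\s*(\w+)\s*", raw)
    if match is None:
        # Single page (e.g. an article number) or unrecognized format: keep as-is.
        return raw.strip()
    start, end = match.groups()
    # BibTeX/LaTeX renders "--" as an en dash.
    return f"{start}--{end}"
```

The result goes straight into the entry as `pages = {1044--1048}`.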

~~~
rmcgibbo
Oh good point. Thanks!

------
a3_nm
Didn't work on "[6] J. Wang, J. Han, Y. Lu, and P. Tzvetkov. TFP: An efficient
algorithm for mining top-k frequent closed itemsets. TKDE, 17(5), 2005." I
tried reporting it by clicking the red cross but it doesn't seem like anything
happens when you do it.

It's an interesting idea, however. It looks like it would be especially
relevant to structure citations extracted automatically from a PDF. I wonder
if existing tools like Google Scholar are using something of the kind.

~~~
rmcgibbo
I forgot to add any acknowledgement via the UI when the red cross gets
clicked (but it was logged :))

~~~
a3_nm
Thanks! By the way, your blog
[http://rmcgibbo.org/posts/](http://rmcgibbo.org/posts/) seems to be missing
an RSS/Atom feed; I would subscribe if there were one. (I would also tell you
this via email, but your website is also missing an email address.)

~~~
rmcgibbo
Oh good points. I'll add those. Especially the email.

------
jessriedel
It would be nice if this could be configured to take you to the Google Scholar
result, or the journal webpage (DOI), since I often want that rather than the
BibTeX entry.

Incidentally, I've always been blown away that Google Scholar isn't smarter
with unstructured citations. I've literally had cases where it choked on a
citation with "Phys. Rev. Lett." but then correctly identified it when un-
abbreviated to "Physical Review Letters". Crazy.
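One cheap workaround for the abbreviation problem is to expand journal names before searching. This is a minimal sketch, assuming a small hand-written lookup table; the table contents and the name `expand_journal` are hypothetical, and a real tool would load a much larger abbreviation list.

```python
# Hypothetical, hand-picked table; keys are lowercased with whitespace collapsed.
JOURNAL_ABBREVIATIONS = {
    "phys. rev. lett.": "Physical Review Letters",
    "j. chem. phys.": "The Journal of Chemical Physics",
}


def expand_journal(name: str) -> str:
    """Return the full journal title if the abbreviation is known, else the input unchanged."""
    # Normalize case and whitespace so "Phys.  Rev. Lett." still matches.
    key = " ".join(name.lower().split())
    return JOURNAL_ABBREVIATIONS.get(key, name)
```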

~~~
rmcgibbo
Yeah, this is on the list of things to add.

I think at this point Google Scholar is mostly abandonware. The team
maintaining it is very small, and not really adding new features.

~~~
jessriedel
Ahh, yes I see now that your website is doing parsing, and that linking up
with another service to match, say, DOIs, would be a significant amount of
additional work. Still, yours is a great tool!

You're probably already aware of it, but it might be worth checking out
CrossRef's Simple Text Query

[http://www.crossref.org/SimpleTextQuery/](http://www.crossref.org/SimpleTextQuery/)
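Besides the Simple Text Query form, CrossRef also exposes a REST API that can match a free-text citation to a DOI, which is one way the DOI lookup could be wired in. This is a sketch, assuming the public `https://api.crossref.org/works` endpoint with its `query.bibliographic` and `rows` parameters; the helper names are hypothetical, and the top hit should still be sanity-checked before trusting it.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(citation: str, rows: int = 1) -> str:
    """Build a CrossRef REST API URL that matches a free-text citation."""
    params = {"query.bibliographic": citation, "rows": rows}
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def best_doi(citation: str) -> Optional[str]:
    """Return the DOI of CrossRef's top-ranked match, or None (needs network access)."""
    with urlopen(crossref_query_url(citation), timeout=10) as resp:
        data = json.load(resp)
    items = data["message"]["items"]
    return items[0]["DOI"] if items else None
```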

------
wodenokoto
This is really cool and something I have been missing. It still gets little
things wrong, but it is pretty good.

Next step is to automatically extract these from PDF and integrate it into
various citation management programs :)

------
stared
It would be cool if it were searching for DOI as well.

------
Rhapso
Thank you! I wish I had this years ago.

