About Citation Files on GitHub (docs.github.com)
142 points by tgymnich on Jan 25, 2022 | 28 comments



This looks really useful, but one thing I wish citation managers would consolidate around is a well-defined format for citation data, such as CSL-JSON [^1]. Regrettably, I have never seen CITATION.cff files anywhere else, whereas CSL-JSON seems to be getting concerted support from:

* Zotero, with the Better BibTeX plugin [^2]

* pandoc, through its updated citeproc library [^3]

* the citation plugin for Obsidian, which uses CSL-JSON [^4]

This might in the end be my personal preference, but a standardized JSON format (which is just as easily adaptable to YAML [^5]) seems much easier to parse and modify than BibTeX, with its sheer complexity. If we want to be able to easily cite anything, then this direction of standardization is, I believe, a must.
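To make the "easy to parse and modify" point concrete, here is a minimal sketch in Python using only the standard library. The record is hypothetical: the field names (`id`, `type`, `title`, `author`, `issued`, `DOI`) follow the CSL-JSON schema, but the values, the placeholder DOI, and the `author_names` helper are made up for illustration.

```python
import json

# A hypothetical CSL-JSON record; values are invented for illustration.
raw = """
[
  {
    "id": "example2022",
    "type": "article-journal",
    "title": "An example article",
    "author": [{"family": "Doe", "given": "Jane"}],
    "issued": {"date-parts": [[2022, 1, 25]]}
  }
]
"""

# Parsing is a single standard-library call.
records = json.loads(raw)

# Modifying a record is ordinary list/dict manipulation.
records[0]["author"].append({"family": "Roe", "given": "Richard"})
records[0]["DOI"] = "10.0000/example"  # placeholder, not a real DOI


def author_names(record):
    """Return 'Given Family' strings for each author in a CSL-JSON record."""
    return [
        f"{a.get('given', '')} {a.get('family', '')}".strip()
        for a in record.get("author", [])
    ]


print(author_names(records[0]))
print(json.dumps(records, indent=2))
```

Nothing here is specific to a citation library; any language with a JSON parser can do the same.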

[^1]: https://github.com/citation-style-language/schema#csl-json-s...

[^2]: https://retorque.re/zotero-better-bibtex/exporting/pandoc/#u...

[^3]: https://github.com/jgm/citeproc

[^4]: https://github.com/hans/obsidian-citation-plugin

[^5]: https://github.com/citation-style-language/schema/issues/278


Yep, came here to say the same thing.

There's already a lot of open source work going on around CSL, which I think originally comes out of Zotero.

While CSL comes from the JavaScript world, there is already a complete Ruby implementation, which can format a citation in any citation style in the CSL style repository. https://github.com/inukshuk/csl-ruby https://github.com/inukshuk/citeproc

Even if github or other open source authors weren't happy with that implementation (whether for legitimate reasons or "not invented here"), it would still have been nice to do something around the basic CSL standards in some way, instead of inventing a new similar standard in ".cff".


Quick question: are the two formats semantically identical in the sense that a program can convert from one format to the other and back, with 100% accuracy? Thanks.
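One way to probe this is to write the mapping for a small subset of fields and see what survives a round trip. The sketch below is only that: it assumes the CFF keys `title`, `authors` (`family-names`, `given-names`), `version`, `doi`, and `date-released`, and the CSL-JSON keys `title`, `author`, `version`, `DOI`, and `issued`. The functions `cff_to_csl` and `csl_to_cff` are hypothetical, the record is made up, and the sketch simply does not map `cff-version` or `message`, so it says nothing definitive about the full formats.

```python
def cff_to_csl(cff):
    """Map a small, assumed subset of CFF keys to CSL-JSON keys."""
    csl = {
        "type": "software",
        "title": cff["title"],
        "author": [
            {"family": a["family-names"], "given": a["given-names"]}
            for a in cff.get("authors", [])
        ],
    }
    if "version" in cff:
        csl["version"] = cff["version"]
    if "doi" in cff:
        csl["DOI"] = cff["doi"]
    if "date-released" in cff:
        # CFF stores an ISO date string; CSL-JSON stores nested date parts.
        year, month, day = (int(p) for p in cff["date-released"].split("-"))
        csl["issued"] = {"date-parts": [[year, month, day]]}
    return csl


def csl_to_cff(csl):
    """Inverse of cff_to_csl for the same subset of keys."""
    cff = {
        "title": csl["title"],
        "authors": [
            {"family-names": a["family"], "given-names": a["given"]}
            for a in csl.get("author", [])
        ],
    }
    if "version" in csl:
        cff["version"] = csl["version"]
    if "DOI" in csl:
        cff["doi"] = csl["DOI"]
    if "issued" in csl:
        year, month, day = csl["issued"]["date-parts"][0]
        cff["date-released"] = f"{year:04d}-{month:02d}-{day:02d}"
    return cff


# A made-up CFF record, written as a dict rather than parsed from YAML.
original = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite it as below.",
    "title": "Example Tool",
    "authors": [{"family-names": "Doe", "given-names": "Jane"}],
    "version": "1.0.0",
    "doi": "10.0000/example",  # placeholder, not a real DOI
    "date-released": "2022-01-25",
}

round_tripped = csl_to_cff(cff_to_csl(original))

# Keys this sketch dropped on the way through CSL-JSON.
lost = sorted(set(original) - set(round_tripped))
print(lost)
```

In this sketch the shared fields come back unchanged, while the keys with no mapping are lost. Whether a fuller mapping could preserve everything in both directions is exactly the open question.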


Looks like I posted prematurely without looking up prior discussions on this topic, but here's a detailed comparison from someone much more knowledgeable than me:

https://news.ycombinator.com/item?id=28254775


Hm, that comment mainly seems to talk about how CFF is very specialized at citing software, as opposed to the more general CSL.

"It is specifically for citing software and datasets… Unfortunately, though, because the schema only has enough fields for citing those two…"

However, the link in OP here suggests they've expanded CFF beyond that to be used to cite anything. https://docs.github.com/en/repositories/managing-your-reposi...


I can't tell from what you linked: does this JSON format have anything like bib(la)tex's \mkbibquote and \mkbibemph?


Count me mystified as to why there needs to be another format for citations.

There are orgs like Crossref and ORCID doing super heavy lifting to make discovery-as-data better, while Google etc. make things worse by encouraging discovery-as-content without offering standard outputs.

At least in this case, Github will output standards in the UI, but it would be much better to encourage storage in those formats too, instead of adding one.


There are at least six competing formats for citations and cross-references in Markdown documents[0]. At some point, I'd like to update my text editor[1] to use a "standard" syntax, but CommonMark hasn't proposed one and the forums are quite quiet on the subject.

[0]: https://talk.commonmark.org/t/cross-references-and-citations...

[1]: https://github.com/DaveJarvis/keenwrite


I agree on the "why another format" point. Reading some of the discussion that someone posted from when CFF was introduced, I'm even more mystified. One of the arguments is that bibtex treats @software as an alias for @misc, which is weird because this seems to be an implementation detail of what the bibtex program does, not of the file format. In fact, I believe that with biblatex one can distinguish between @software and @misc.


Search engines, including Google, have endorsed discovery of structured data via the schema.org standards. They're not as focused on the issue as Crossref or ORCID, but they're doing their part.


Obligatory related xkcd: https://xkcd.com/927/


These comments insisting on citing 927 are looking more and more bot-like.


Quoting XKCD is mostly tired, but this seems to be an exceptionally fitting instance.


Exceptionally fitting? I don’t even care about other xkcd references–927 is the only number that I remember by heart because it is the poster-child of overused, tired, boring, thought-terminating (again), stock responses to people or posts which are about doing things in a better way than established things which happen to have been around for a while, have a spec, and maybe that have multiple implementations; i.e. things that broadly have a standard! (And of course there are always multiple “standards” because no one anoints a king to implement the-one-true-solution—multiple people do it independently. And then we get to snicker about “multiple implementations”… i.e. multiple people having minds of their own. Humans do not possess some kind of insect-like hivemind.)

A lot of what people present on Hacker News are suggestions on how to do things better than we have collectively done up until now. And some people—apparently enough to populate every such thread—feel that they are adding to the discussion by trying to shoot the whole thing down by posting the “mandatory/obligatory” 927 link (note those words—they even tacitly admit that their point is wholly worn down and tired).


I feel similarly to the other commenters that xkcd references tend to get overused on HN, but it seemed too perfect to not reference in this case.


I think everyone's seen this by now and it's no longer necessary and certainly no longer funny or appreciated to keep plugging it in. Maybe find a less popular XKCD to "obligatorily spam" everywhere.


Not everyone [1].

There's a voting mechanism to decide if comments aren't relevant/contributing.

[1] https://xkcd.com/1053/


I wouldn’t use that to justify deploying thought-terminating cliches.


There are valid reasons to cite software, but just 'using it' is not a good reason to request citation in a paper. The software package needs to be somehow relevant to the paper's content. Why cite a tool for running experiments in parallel when the paper is about breast cancer research? Not every AI paper needs to cite PyTorch.

The proper place to surface such software dependencies is not the paper but the open source code released together with the paper.


Many open-source libraries are developed and maintained by academia, and academia often insists that citation reach is a metric for impact and job security. Using an open-source library or tool might be the critical difference between producing research results or not.

Software is applied computation research, and if using a published research method is enough to warrant citation, then using an open-source library certainly warrants inclusion.

If the world isn't going to come up with ways to monetarily fund open-source, giving open-source developers the tools to help them compete in academia is a great move.


Counterpoint: not everything on Github is software. The linked instructions use the phrasing "let them cite your work". You might use Github as a canonical source for, say, a data dump. Or really any kind of non-software information. This would be a useful way to allow folks to cite your work.


That's a good idea. Citing repos or, worse, websites really ought to have a proper citation format.


I recently tried searching for the name of my main open source package on Google Scholar and was delighted to find it had been referenced by a few papers! I recommend doing that with your own projects, you never know what you might find.


Is there a JS library which adds robust support for citations to webpages?

If MathJax/KaTeX is the Javascript version of LaTeX math mode, then what's the Javascript version of Bib(La)TeX?


I don't know if this fits your KaTeX analogy, but citeproc has a javascript implementation. Uses CSL-JSON to render citations.

https://github.com/Juris-M/citeproc-js


That's perfect. This is a first step toward combining research with general development, and I am sure the integration will become much stronger soon. Cheers!


I had never heard of using a @software citation, I always just used @misc... Does it work everywhere?


I've used Zenodo to generate cite-able archives of Github projects. This is interesting though!



