This looks really useful, but one thing I wish citation managers would consolidate around is a well-defined format for citation data, such as CSL-JSON [^1]. I have regretfully never seen CITATION.cff files anywhere else, but CSL-JSON seems to see concerted efforts from:
* Zotero, with the Better BibTeX plugin [^2]
* pandoc, through its updated citeproc library [^3]
* the citation plugin for Obsidian, which uses CSL-JSON [^4]
This might in the end be my personal preference, but a standardized JSON format (which is just as easily adaptable to YAML [^5]) seems much easier to parse and modify than BibTeX, given its sheer complexity. If we want the ability to easily cite anything, then this direction of standardization is, I believe, a must.
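To illustrate the parsing point, here is a minimal Python sketch using only the standard `json` module. The record is made up, the `software` item type assumes a CSL version whose schema includes it, and `short_reference` is a hypothetical helper, not part of any citation library.

```python
import json

# Hypothetical CSL-JSON data: a top-level array of items. The field names
# (id, type, title, version, author, issued, URL) follow the CSL-JSON
# schema; the values are made up for illustration.
CSL_JSON = """
[
  {
    "id": "example-tool",
    "type": "software",
    "title": "example-tool",
    "version": "1.2.0",
    "author": [{"family": "Doe", "given": "Jane"}],
    "issued": {"date-parts": [[2021, 8, 1]]},
    "URL": "https://example.org/example-tool"
  }
]
"""


def short_reference(item: dict) -> str:
    """Return a 'Family, Given (Year). Title.' string for one CSL-JSON item."""
    names = "; ".join(f"{a['family']}, {a['given']}" for a in item.get("author", []))
    # CSL-JSON dates are nested lists: [[year, month, day]]
    year = item["issued"]["date-parts"][0][0]
    return f"{names} ({year}). {item['title']}."


items = json.loads(CSL_JSON)
print(short_reference(items[0]))  # → Doe, Jane (2021). example-tool.

# Editing a record is plain dict manipulation, then re-serialising to JSON.
items[0]["version"] = "1.3.0"
print(json.dumps(items, indent=2))
```

Compare that with BibTeX, where even reading an entry reliably means dealing with braces, string macros and LaTeX escapes in field values.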
Even if GitHub or other open source authors weren't happy with that implementation (whether for legitimate reasons or "not invented here"), it would still have been nice to build on the basic CSL standards in some way, instead of inventing a new, similar standard in ".cff".
Quick question: are the two formats semantically identical in the sense that a program can convert from one format to the other and back, with 100% accuracy? Thanks.
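One way to make the round-trip question concrete is a minimal Python sketch of a hand-written mapping between a few CFF 1.2.0 keys and their CSL-JSON counterparts. The mapping, the helper names `cff_to_csl`/`csl_to_cff`, and the sample data are all hypothetical, and the CITATION.cff file is assumed to be already parsed into a dict (e.g. by a YAML loader). It shows how to check a round trip, not whether the real formats are equivalent.

```python
# Hypothetical CITATION.cff content, already parsed from YAML into a dict.
# Key names follow CFF 1.2.0; the values are made up.
cff_record = {
    "cff-version": "1.2.0",
    "message": "If you use this software, please cite it as below.",
    "title": "example-tool",
    "version": "1.2.0",
    "date-released": "2021-08-01",
    "authors": [{"family-names": "Doe", "given-names": "Jane"}],
}


def cff_to_csl(cff: dict) -> dict:
    """Map a few CFF keys onto a CSL-JSON item (hypothetical, partial mapping)."""
    y, m, d = (int(part) for part in cff["date-released"].split("-"))
    return {
        "id": cff["title"],
        "type": "software",
        "title": cff["title"],
        "version": cff["version"],
        "author": [
            {"family": a["family-names"], "given": a["given-names"]}
            for a in cff["authors"]
        ],
        "issued": {"date-parts": [[y, m, d]]},
    }


def csl_to_cff(csl: dict, cff_version: str = "1.2.0") -> dict:
    """Inverse of cff_to_csl for the same handful of keys."""
    y, m, d = csl["issued"]["date-parts"][0]
    return {
        "cff-version": cff_version,
        "title": csl["title"],
        "version": csl["version"],
        "date-released": f"{y:04d}-{m:02d}-{d:02d}",
        "authors": [
            {"family-names": a["family"], "given-names": a["given"]}
            for a in csl["author"]
        ],
    }


round_trip = csl_to_cff(cff_to_csl(cff_record))
# Keys present in the original but missing after CFF -> CSL-JSON -> CFF
lost = sorted(set(cff_record) - set(round_trip))
print(lost)  # → ['message']
```

In this particular mapping, `message` has no destination field, so it is dropped. A fuller converter might store it in a CSL field such as `note`, so whether a round trip is lossless depends on the converter's mapping as much as on the formats themselves.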
Looks like I posted prematurely without looking up prior discussions on this topic, but here's a detailed comparison from someone much more knowledgeable than me:
Count me mystified as to why there needs to be another format for citations.
There are orgs like Crossref and ORCID doing super heavy lifting to make discovery-as-data better, while Google etc. make things worse by encouraging discovery-as-content without offering standard outputs.
At least in this case, GitHub will output standard formats in the UI, but it would be much better to encourage storage in those formats too, instead of adding yet another one.
There are at least six competing formats for citations and cross-references in Markdown documents[0]. At some point, I'd like to update my text editor[1] to use a "standard" syntax, but CommonMark hasn't proposed one and the forums are quite quiet on the subject.
I agree: why another format? Reading some of the discussion that someone posted when CFF was introduced, I'm even more mystified. One of the arguments is that BibTeX treats @software as an alias for @misc, which is odd, because that seems to be an implementation detail of what BibTeX the program does, not of the file format. In fact, I believe that with biblatex one can distinguish between @software and @misc.
Search engines, including google, have endorsed discovery of structured data via the schema.org standards. They're not as focused on the issue as CrossRef or Orcid, but they're doing their part.
Exceptionally fitting? I don’t even care about other xkcd references; 927 is the only number I remember by heart, because it is the poster child of overused, tired, boring, thought-terminating (again) stock responses to posts about doing things better than established approaches that happen to have been around for a while, have a spec, and maybe have multiple implementations, i.e. things that broadly have a standard! (And of course there are always multiple “standards”, because no one anoints a king to implement the one true solution; multiple people do it independently. And then we get to snicker about “multiple implementations”, i.e. multiple people having minds of their own. Humans do not possess some kind of insect-like hivemind.)
A lot of what people present on Hacker News are suggestions on how to do things better than we have collectively done up until now. And some people—apparently enough to populate every such thread—feel that they are adding to the discussion by trying to shoot the whole thing down by posting the “mandatory/obligatory” 927 link (note those words—they even tacitly admit that their point is wholly worn down and tired).
I think everyone's seen this by now and it's no longer necessary and certainly no longer funny or appreciated to keep plugging it in. Maybe find a less popular XKCD to "obligatorily spam" everywhere.
There are valid reasons to cite software, but just 'using it' is not a good reason to request citation in a paper. The software package needs to be somehow relevant to the paper's content. Why cite a tool for running experiments in parallel when the paper is about breast cancer research? Not every AI paper needs to cite PyTorch.
The proper place to surface such software dependencies is not the paper but the open source code released together with it.
Many open source libraries are developed and maintained by academia, and academia often insists that citation reach is a metric for impact and job security. Using an open source library or tool might be the critical difference between producing research results or not.
Software is applied computational research, and if using a published research method is enough to warrant citation, then using an open source library certainly warrants inclusion.
If the world isn't going to come up with ways to monetarily fund open source, giving open source developers the tools to help them compete in academia is a great move.
Counterpoint: not everything on GitHub is software. The linked instructions use the phrasing "let them cite your work". You might use GitHub as a canonical source for, say, a data dump, or really any kind of non-software information. This would be a useful way to allow folks to cite your work.
I recently tried searching for the name of my main open source package on Google Scholar and was delighted to find it had been referenced by a few papers! I recommend doing that with your own projects, you never know what you might find.
That's perfect. This is a first step toward combining research with general development, and I am sure the integration will become much stronger soon. Cheers!
[^1]: https://github.com/citation-style-language/schema#csl-json-s...
[^2]: https://retorque.re/zotero-better-bibtex/exporting/pandoc/#u...
[^3]: https://github.com/jgm/citeproc
[^4]: https://github.com/hans/obsidian-citation-plugin
[^5]: https://github.com/citation-style-language/schema/issues/278