Zotero: An open-source tool to help collect, organize, cite, and share research (zotero.org)
659 points by autocorr on July 25, 2018 | 163 comments

I use a little script [1] and a passive approach to quickly find a PDF I am looking for among a few thousand academic PDFs. The workflow (illustrated in the GIF [2]):

- as I read new PDFs in the browser, the PDFs are passively downloaded typically in a Downloads/ folder.

- This results in thousands of papers lying in Downloads/ or elsewhere.

- The command p from the script [1] lets me instantaneously fuzzy-search over the first page of each PDF (the first page of each PDF is extracted using pdftotext, but cached so it's fast). The first page of academic PDFs usually contains the title, abstract, author names, institutions, and keywords of the paper, so typing any combination of those will quickly find the PDF.
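For anyone curious how the extract-cache-search idea above fits together, here is a minimal Python sketch. It is not the actual fast-p implementation: the function names, the JSON cache file, and the content-hash cache key are all assumptions made for illustration, and the plain "every word must appear" match is a much simpler stand-in for fzf's fuzzy matching. The only external tool assumed is poppler's pdftotext, called with its `-f`/`-l` page-range options.

```python
import hashlib
import json
import subprocess
from pathlib import Path


def first_page_text(pdf_path):
    """Extract the first page of a PDF with poppler's pdftotext (must be installed)."""
    result = subprocess.run(
        ["pdftotext", "-f", "1", "-l", "1", str(pdf_path), "-"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def build_index(pdf_paths, cache_file, extract=first_page_text):
    """Return {path: first-page text}, reusing cached text for unchanged files."""
    cache_file = Path(cache_file)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    index = {}
    for path in map(Path, pdf_paths):
        # Key on a content hash (an assumption) so moved or renamed files still hit the cache.
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        if digest not in cache:
            cache[digest] = extract(path)
        index[str(path)] = cache[digest]
    cache_file.write_text(json.dumps(cache))
    return index


def search(index, query):
    """Return paths whose first page contains every query word (case-insensitive)."""
    words = query.lower().split()
    return [p for p, text in index.items() if all(w in text.lower() for w in words)]
```

Something like `search(build_index(Path.home().glob("Downloads/*.pdf"), "cache.json"), "lasso smith")` would then list candidate files; the second run is fast because only new files go through pdftotext.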

What is particularly convenient is that no time is spent trying to organize the papers into folders, or importing them into software such as Zotero. The papers are passively downloaded, and if I remember ever downloading a paper, it's one fuzzy-search away. Of course it does not solve the problem of generating clean BibTeX files.

[1]: the script: https://github.com/bellecp/fast-p

[2]: an illustration in GIF: https://user-images.githubusercontent.com/1019692/34446795-1...

edit: The script has been moved from the gist to the public repository https://github.com/bellecp/fast-p

I love the passive nature of your workflow. I’ve always thought that, as soon as I had consumed (read, viewed, heard) some content (text, audio, video) it would be nice to have a “shadow copy” of it stored in a personal, private knowledgebase, with a simple keyword or more complex semantic search on top.

Basically, a personal search engine with a passively gathered corpus of my experienced content - maybe even filtered at times as in your case where you limited it to academic PDFs (to keep the knowledgebase focused). Kind of like an extension of our human memory.

Consume -> add as extension of knowledgebase -> recall.

Thank you for sharing your workflow - simple and ingenious!

Have you had any issues or thoughts for future enhancements? I can think of a number of other helpful things you could do with the corpus you’ve built, for yourself.

It is not limited to academic PDFs, although I mostly use the script to find those. When typing academic keywords (author names, scientific jargon, etc), the personal PDFs that also lie in ~/Downloads/ are filtered out.

I recently used the command with some combination of airport/city/airline and the only match was the boarding pass I was looking for. It could probably be used for receipts from hotel or whatnot, as soon as pdftotext can retrieve the text. It should find tax returns and related PDFs by querying "IRS + SSN".

A current issue that I would like to fix is the preview window that does not always highlight the query in full if a single match was found before the full query was typed. It is linked to how fzf handles previewing. I do not have plans for any big enhancements.

edit: I created a public repo to replace the gist. Feel free to post your thoughts or suggestions in the issues!

Take a look at Recoll, and its web extension:


I did something similar for search until I found Recoll. It has similar functions to what you describe (caching, fuzzy search) with a slick workflow that shows Google Scholar-like context previews, with optional remote access to your library through a web UI. It also searches compressed archives and generally simplifies searching many unorganized files.


Thank you for the pointer to Recoll!

I'm trying it out now -- I have way too many pdfs, and I could really use the extra features (like context previews).

I agree with what others have mentioned here: I really like your elegant workflow (thanks for sharing!!). I also like that it is generally applicable to any collection of PDFs.

However, to be fair, you can follow a somewhat similar workflow with Zotero in combination with the Firefox plugin: download pdf by adding it to the Zotero database in Firefox and Zotero takes care of the indexing. Zotero misses the fancy interactive fuzzy searching you have in your workflow thanks to fzf, but I've added it as a feature request for Zotero [1].

You don't have to organize your papers into folders (or collections in Zotero parlance since a single item can appear in multiple collections). For most academic papers the Zotero plugin will also grab the pdf's metadata as a bonus without additional costs.

[1] https://github.com/zotero/zotero/issues/1536

This is awesome! Interesting enhancement idea: slap an NLP topic model on top of it and explore by clusters. I'd pay for a product that did that (and if I ever find the time might try and do it myself).

Neat workflow. I use a conceptually similar system though it’s proprietary (DEVONthink). Drop PDFs or websites or bookmarks in the inbox, burn through them, and decide if I want to archive them in the database.

I’ve enjoyed having access to the database from my laptop as well as iPhone and iPad. It’s definitely been a workflow I’ve cobbled together. This seems to be working out better though.

A downside would be that you can't search all the pdfs of a particular author? For me that's crucial.

I don't think full names of non-authors are very commonly mentioned on the first page of papers, so just including the name in the query should be a useful approximation?

This heavily depends on the field/journal style.

This is terrific! Thanks for sharing. fzf is a wonderful piece of software.

I tried to use the script but it just searches over the file names.

It seems that the pdftotext command does not go through. Create an issue on github if this still occurs.

What are you using for passively downloading all those pdfs?

Firefox or any other browser that downloads the PDFs that you browse. When searching/browsing PDFs, my firefox is set up to download the file into ~/Downloads/.

this is amazing.

I used Zotero through my MA and into my PhD, when I discovered and began writing in LaTeX in emacs/AucTeX instead of LibreOffice. I used the Better BibTeX plugin [0] to maintain a BibLaTeX file, but as I developed my emacs skills I moved to RefTeX.

At that point I realised that my BibLaTeX file was really a mess. Better BibTeX created tons of needless double curly brackets {{like this}} in the BibLaTeX file, making searching it directly a pain. And it created lots of @misc entries, the BibLaTeX entry of last resort, when it should have made @url's and other things.

Zotero has massively benefited my work, but it's also been something of a training wheel which in the longer term slowed me down. Emacs/AucTeX/RefTeX does everything Zotero does (at least according to my use) but faster, cleaner, more holistically, and with more features (eg crossref'ing entries). And two years on I'm still cleaning up that Better BibTeX's BibLaTeX file.

[0] https://retorque.re/zotero-better-bibtex/

* edited because I confused AucTeX with RefTeX.

I guess I don't see how you can easily add citations without something like Zotero (or its closed source equivalent Endnote). At least in my use case, I go to Pubmed and click the Zotero Chrome plugin to add the reference to Zotero. What do you do, manually enter the info?

Nah, not manual at all.

I just call up RefTeX by hitting ctrl-x + (or "C-x +" in emacs parlance). This calls up a list of all possible references and a fuzzy search box which lets me narrow it down. When I've found my reference I have ten options, including:

f1: Open associated pdf, url or doi

f3: Insert the citation into my thesis and then be asked for what kind of note I want to add (footnote? Title? author name? Year? Just a bibliography entry?), then any text to add before the reference (eg, "This argument has been made by,"), then after (p. 67). -- This answers your question

f7: Attach pdf to email

f9: Show notes, if there's an .org file of the same name.

f10: Add pdf to library

If RefTeX can't find my query, not only does it ask if I want to add a new one, it asks whether to search various databases for it including arXiv, DBLP (computer science bibliography), Google Scholar, Bodleian Library, HAL, Library of Congress, British Library and others.

Adding entries to the database is as simple as editing a plain text file, and BibTeX provides quick shortcuts for all the different kinds of entries, checks them for you and provides a sane and customizable key.

Everything is completely integrated - my writing, my pdf library, my org notes, my bibliography database, my email as well as my thesaurus (wordnut), my pdf reader (pdf-tools) and my git repo (magit). It's just brilliant. All together it makes emacs the ultimate writer's tool, as far as I'm concerned.

I think what your OP was referring to is not adding a citation to a document that you're editing, but rather ingesting PDFs and bibliographic data from the web.

I love Emacs (not as much as you do it seems ;) but Zotero is amazing for ingesting academic references including fulltext and bibliographic data from the web. One click of a button in your browser, and everything is ready to read and cite.

(Source: Wrote quite a number of papers and co-authored one textbook, all using Zotero. My workflow was Zotero for ingestion -> per-publication bibtex files for authoring)

Aha, apologies, my misunderstanding. Looks like I got a bit emaxcited 8)

So either you add them manually or you query Crossref, arXiv, DBLP or HAL (French open archive) within emacs and then can copy their bibtex entry from there. No pdf or web reference importer as far as I'm aware. Yet!

AucTeX can save snapshots of entire Web pages with images, PDFs, etc. in a personal library?

I use Mendeley instead of Zotero. I really wanted to like Zotero, since I hate Elsevier as much as the next guy, but Zotero didn't have any of the look-up features or the ability to quickly import my papers (thousands). In Mendeley, I can just throw a physical folder into a folder and it'll tag and categorize the PDFs mostly correctly.

I mean it is nice that Zotero has an import browser plugin (which, btw, Mendeley does as well), but once you have a substantial library of pdfs, it is just too time consuming to re-import it all again, type in names, years, journals etc. ugh..

Also, Mendeley can open pdfs within the app and has great features for mark-up and comments. Reviewing a paper is really nice this way. In fact, Mendeley is the program with the best mark-up features on all of Linux, in my opinion. And all this is synced across devices.

I'd love to use Zotero but tbh Mendeley is just better.

Your information might be outdated -- In the latest version of Zotero, when you add a PDF to Zotero, it automatically retrieves the metadata via DOI lookup.

(I don't think it would work with papers too old to have DOIs but haven't checked lately.)

The metadata retrieval pairs with a DOI or ISBN [1]. Articles without such handles might not work, nor will articles which are not OCR'ed (obviously) but this can be easily fixed with OCR scanning software.

The metadata retrieval process is a very handy feature, especially paired with Zotero's web browser plugin.

Zotero also keeps a repository of all the citation styles, which are curated and administered by the Zotero team and stored in a Github repo [2].

[1] https://www.zotero.org/blog/zotero-5-0-36/

[2] https://github.com/citation-style-language/styles

>> I mean it is nice that Zotero has an import browser plugin (which, btw, Mendeley does as well), but once you have a substantial library of pdfs, it is just too time consuming to re-import it all again, type in names, years, journals etc. ugh..

It automatically added 90-95% of PDFs I imported, most directly from Sci-Hub with no editing from me.

Mendeley is an Elsevier product? Might make a few academics come to a differing conclusion given Elsevier's reputation is one they've well and truly earned...

It was a startup and Elsevier acquired them a few years ago. I'm not sure how integrated the teams are but I'd expect the worst :)

I reinstalled Zotero just now, since it has a Mendeley importer now.

What is the proper way to store PDFs that I annotate (say, using Okular in Linux) on Zotero with the ability to send them to other researchers and then update them?

I don't mind paying money for cloud storage, but I gotta be able to work with the pdfs.

You can choose to set a custom PDF reader in the General tab of preferences --> Open PDF using --> custom [1].

Then you can annotate the file and save it. If Zotero creates a copy of the file when you save your annotations, you might need to use Show file by right clicking the article in Zotero and make changes to the file in Zotero's storage. In Linux, that'll be in ~/Zotero.

[1] https://forums.zotero.org/discussion/1977/changing-the-defau...

No need to use Show File. You can just open the PDF from Zotero, annotate, and save.

It's a bit of a pain, but you can store `~/.local/share/okular/docdata/` in version control or your favourite cloud storage provider, and then you'll get synchronised annotations.

When I say a bit of a pain, I mean it - I switched to Mendeley with great sadness because its mobile application means I can read and annotate papers on my tablet and have it synchronised perfectly with my desktop.

On a semi-related note: the Mendeley mobile application makes selecting text a joy on a touchscreen with a kind of magnifying glass. I really wish it was built into Android system wide.

> you can store `~/.local/share/okular/docdata/` in version control

What kind of annotations are those? I think GP refers to document annotations, which are stored in the PDF file itself.

Annotations meaning notes, highlights, etc.

If they are stored in the file itself, that's news to me. Last time I used it, Okular didn't modify the actual PDF and considered that a feature.

You have to explicitly save the file after making the annotations. I've been doing this for years with Okular. Indeed, Okular having out-of-file annotations is news to me :)

Oh, wow. That will make collaboration so much easier. I resorted to Adobe Reader on multiple occasions in the past...


Yes. That's why I uninstalled and switched to Zotero. The import tool worked great, btw.

yeah so I just reinstalled Zotero and imported my Mendeley library.

It does that, and it can actually look up metadata from pdfs. But: It can NOT update already existing entries from such a lookup. You have to delete the pdf and re-import. Why? No reason.

It's this sort of really bad usability decision that makes Zotero just not very good imo.

It's not a usability decision — it's just one feature that hasn't yet been implemented [1]. Updating metadata is more complicated than creating a new item, because you have to deal with the existing metadata somehow. (Mendeley just overwrites the existing item, often with incorrect data, which we don't consider acceptable.)

Disclosure: Zotero developer

[1] https://github.com/zotero/zotero/issues/1515

I really love the "duplicate finder" and its ability to pick and choose metadata to keep. Perhaps just create a duplicate item and deal with merge conflicts later?

I'm not sure what feature exactly you're missing, but are you aware of the "Rename file from parent metadata" option you get when right clicking on a PDF in the list? Perhaps that does what you need.

If not, with Zotero you have a chance of getting a missing feature added. Not so with the closed-source Mendeley. Especially now they have started doing user-hostile lock-in stuff like gratuitously encrypting users' data, making it no longer accessible to the user (see downthread).

With Zotero, in principle you could hack together something using `bash` and `sqlite3` that batch-updated documents; you may not even need to look at the source code to the official app. It's ad-hoc munging like that that Mendeley has recently gone out of its way to prevent, stopping people from working effectively with their own data.
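As a rough illustration of that kind of ad-hoc access, here is a Python sketch that reads item titles straight out of a Zotero-style SQLite file. The table and column names (`itemData`, `itemDataValues`, `fields`, `fieldName`) reflect an assumed entity-attribute-value layout and should be checked against your own database before relying on them; `titles_by_item` is a hypothetical helper, not part of Zotero.

```python
import sqlite3


def titles_by_item(conn):
    """Return {itemID: title} from an assumed Zotero-style EAV schema."""
    rows = conn.execute(
        """
        SELECT itemData.itemID, itemDataValues.value
        FROM itemData
        JOIN fields ON fields.fieldID = itemData.fieldID
        JOIN itemDataValues ON itemDataValues.valueID = itemData.valueID
        WHERE fields.fieldName = 'title'
        """
    )
    # Each row is an (itemID, title) pair, so dict() builds the mapping directly.
    return dict(rows.fetchall())
```

It is probably safest to run this against a copy of the database file, opened with `sqlite3.connect(path_to_copy)`, rather than the live file the desktop app is using, and to treat any write access with much more caution than reads.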

Zotero can look up metadata if you add a PDF to your library

Zotero can not update metadata for a PDF you already have in your library

Hence, adding an existing library means you have to do it manually for each pdf. This is bad design, because the functionality exists, but it's implemented in a way that makes everything complicated for no reason at all.

Why disable the "lookup metadata" button for a pdf which is already in a library? WHYYYYY

I see! Yes, that's not great.

You can just add all the pdfs with the "Store copy of file" option. It allows multi-select. Then you can select all of them and fetch the metadata together.

My experience of auto-import of metadata in Mendeley is that it gets about 80% of it right. Not good enough for publication: I still had to check over each record with a fine-toothed comb when I wanted to actually use an entry to cite something.

So far, Zotero's metadata lookup seems about the same quality.

For my university essays, I keep my research citations in Zotero and write in Markdown using Ulysses. Then when it is ready to submit, I'll use Raphael Kabo's technique [1] to export into Microsoft Word. It is a truly simple workflow to get perfect formatting with little effort.

[1] http://raphaelkabo.com/blog/posts/markdown-to-word

Genuine question: what do you do with Word docs sent back to you as email attachments with track changes and embedded comments?

I'm slowly realising that my colleagues for whom this "works" are never going to change. Our shared folders are littered with "Copy of FINAL final + comments 2.7.18-my-copy.docx.docx". It doesn't matter how slick my git + markdown + pandoc workflow is when conversations go like this:

"Just use track changes."



Is your markdown to Word reversible? If so, accept all changes, convert back to Markdown and run git diff. I guess you're losing comments though.

Definitely worth a go.

Comments might be extractable, I'm not familiar with docx format but it's zipped X(?)ML type data so there will be parsers. Or a conversion to an intermediate format that's more amenable to computer processing, perhaps.
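To make the "zipped XML" point concrete, here is a small Python sketch that pulls reviewer comments out of a .docx using only the standard library. It assumes the comments live in the `word/comments.xml` part of the archive as `w:comment` elements with a `w:author` attribute, which is how WordprocessingML files are normally laid out; `extract_comments` is a hypothetical helper name, and tracked insertions and deletions are not handled here.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_comments(docx_file):
    """Return (author, text) pairs from a .docx file's word/comments.xml part."""
    with zipfile.ZipFile(docx_file) as archive:
        if "word/comments.xml" not in archive.namelist():
            return []  # the document has no comments part
        root = ET.fromstring(archive.read("word/comments.xml"))
    comments = []
    for comment in root.iter(f"{{{W}}}comment"):
        author = comment.get(f"{{{W}}}author", "")
        # A comment's text may be split across several w:t runs.
        text = "".join(t.text or "" for t in comment.iter(f"{{{W}}}t"))
        comments.append((author, text))
    return comments
```

The resulting pairs could be dumped into a plain-text file next to the Markdown source, so at least the comments survive the round trip into git.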

The problem remains in reverse, though: it's expected that I will produce Word docs full of track changes edits.

I second this workflow - thanks to Zotero and Ulysses I felt I was able to keep track of research for my thesis and easily restructure as I went along. My editor on the other hand was not so happy when I thought it was a good idea to use MS Word's built-in citation tool.. he made me pull them all out and add them back in manually. A good lesson to chat with your editor first before trying to be efficient.

Zotero's cross-platform plugins meant I could use any device for reading and not worry that if I followed important links I'd never find them again.

I had tried Zotero before but I stuck with Mendeley for a long time. I ended up using Zotero a couple years ago to share some references with my advisor and since then I've grown to love it. One of the nice things about it being based on Firefox is that there's a lot of extensions written for it (which IMO, are not well-advertised).

The API is also great and I've been using it to automatically sync PDFs of papers I want to read to my reMarkable tablet.


Do you like the reMarkable tablet?

I currently have an iPad Pro but I really don’t like reading PDFs on it. I’d love to stop printing out papers and an eink display would be nice.

EDIT: just read your review linked in the post

One thing I'll add that has happened since the post was made is that the reMarkable team has been pretty good at churning out updates. The last firmware update adds table of contents and text search for PDFs and it's incredibly fast.

My two biggest annoyances are currently:

1) No web app, so the only way to export PDFs is via the desktop app which is only available for Windows and Mac. The Android app is also pretty terrible. I think not starting with a web interface was a bad decision.

2) The PDFs exported from the desktop app are huge (a PDF that started out ~400KB ended up >100MB after export). Support tells me they're working on this.

5 years and 7464 references later Zotero has become an integral part of my workflow. The combination of Zotero and the custom renaming of Zotfile lets me store all my pdfs in Dropbox, but sync all database by Zotero such that I can add references at work and have them waiting for me on my PC at home.

I abandoned Mendeley because it would not permit relative paths, and I have not looked back.

I loved Zotfile for its feature which lets the user highlight and comment on articles, then import that text into notes attached to the article.

Our team switched from a custom solution to Mendeley last year instead of Zotero because Mendeley lets you highlight and add notes in the PDF, which gets shared among the whole team with those markups. Zotero didn't have that. Otherwise we would have gone with Zotero.

In Zotero you just do that in your PDF reader of choice. If you're using syncing, as long as the PDF reader saves annotations back to the original file, Zotero will automatically sync the updated file to other group members.

The one downside of this approach is that multiple people can't modify the same PDF at the same time. The upside is that you can use whatever PDF tools you want and annotations remain accessible in the file even if you stop using Zotero, which goes with our philosophy of leaving people in control of their own data. (Mendeley stores annotations and highlights in its own encrypted database, and you can't even export PDFs with annotations in batch. If you want to get a PDF with your annotations out of Mendeley, you have to do it one file at a time.)

Disclosure: Zotero developer

Thanks for explaining how it can be achieved in Zotero. We did try this out, but just as you mentioned, the annotated/highlighted PDF changes the file. We liked how Mendeley keeps that information in a separate DB and lets you make the markups right in the app, instead of having to do it yourself outside the application. It was simply more seamless for our non-tech people.

6 months to go before handing in my PhD thesis and I've just discovered it. It was well worth the hassle of a slightly fiddly transfer from Endnote. Working great so far.

For LaTeX users I recommend the Better Bibtex plugin, making my life a lot easier!!

Zotero really needs more advertisement. I prefer Zotero over Mendeley, but Mendeley seems to advertise much more to universities.

Also, Mendeley was bought by Elsevier. Zotero is open source.

I have just recently switched away from Mendeley because of an insanely user-hostile thing they've just done to lock-in their users.

Mendeley has started encrypting users' bibliography files (from version 19), claiming variously that encryption is required by GDPR (?!?!) and/or that it improves security on a multiuser system. (Neither of these excuses holds water.) The keys are not available to the users whose data were encrypted. The encryption is completely proprietary and there are no tools available for letting users work with the now-encrypted databases.

Previously, users could access their data by running `sqlite3` on a plain sqlite database in their profile directory.

Not only did users take advantage of this, but a small ecosystem of third-party tools and scripts had sprung up to help people take control over automating repetitive tasks in managing their bibliographies.

Well, now Mendeley has encrypted everything, that's the end of that. No more tools and scripts. No more user control over one's own data and workflow.

There's now only the limited (for my purposes, useless) "export database" facility in the client. It only exports a limited subset of the fields in the database. Alternatively, I could register as a developer, get an API key, and develop a full web-style app to get hold of my own data. ("Did you just tell me to go fuck myself, Bob?")

The Zotero people suspect that Mendeley's move was retaliatory for Zotero's implementation of import-from-Mendeley [1]. The Mendeley twitter account has dismissed this - literally - as being "fake news" [2]. Super weird.

I reverse-engineered the encryption they'd put in, in order to export my own data and migrate to Zotero. I wrote up the instructions for others to follow [3], but it's really not easy, not portable, not reliable, and not something users should ever have to do just to get access to their own data!

Anyway, at this point, I don't trust Mendeley as far as I can throw them, and I thoroughly regret ever recommending to my friends that they use the tool. After nine years of using Mendeley, though, I've switched to Zotero, and it's at least as good, and in some ways better. Plus it's properly open.

[1] https://www.zotero.org/support/kb/mendeley_import

[2] https://twitter.com/mendeley_com/status/1006919608471818240 and several others. Really, the official twitter responses from Mendeley to people discussing this issue were bizarrely dismissive and mocking.

[3] https://eighty-twenty.org/2018/06/13/mendeley-encrypted-db

That sounds against GDPR: https://gdpr-info.eu/art-20-gdpr/

"The data subject shall have the right to receive the personal data concerning him or her, which he or she has provided to a controller, in a structured, commonly used and machine-readable format and have the right to transmit those data to another controller without hindrance from the controller to which the personal data have been provided [...]"

Wow, that is seriously draconian. In my opinion it should be illegal if it isn’t, intentionally obscuring data storage to prevent marketplace competition.

Even though I also like Zotero better because of its open nature, I prefer Mendeley in practice mostly because it has a decent Android client where you can just sync all your PDFs and read them on a tablet or something. Seemed a bit more involved to do the same with Zotero (where it seemed there were only minimal Android clients quickly hacked together). Mendeley also has 2GB storage, so that's nice for personal use and paper/book syncing.

That said, if I had to collaborate in a larger group I'd surely choose Zotero instead of locking everybody into some proprietary ecosystem. But for personal use I definitely don't feel that I'm enabling or supporting Elsevier somehow in using their freeware.

> I definitely don't feel that I'm enabling or supporting Elsevier somehow in using their freeware.

If instead of using Elsevier's freeware you were to file bug reports - not code just reports - on Zotero then you could be helping improve the software.

And when you suddenly _do_ need to collaborate you'll find that you are using the same tool you have gotten used to, instead of the dissonance of using a tool that is similar-but-not-quite what you know.

Go ahead and use Zotero, check out the two or three leading Android clients, and file bugs. You've got these clients currently: https://play.google.com/store/apps/details?id=com.gimranov.z... https://play.google.com/store/apps/details?id=computer.benja... https://play.google.com/store/apps/details?id=net.ezbio.zote...

I've not got any experience with Zotero, but I do with Mendeley. Unfortunately, the apps you've linked above suffer the same issues as many OSS alternatives to commercial software. For a start, the reviews on the Mendeley client [0] are far better than any of the three alternatives you've provided. All three of them have the same complaints - Functionality/UI leave a lot to be desired. My experience of this fragmentation (KeePass) is you have N clients which offer slightly different features, but none of them offer the same features that the commercial offering offers, meaning people who rely on one or more of the specific features of the commercial offering will never switch.

[0] https://play.google.com/store/apps/details?id=com.mendeley&h...

I have found syncing the PDFs via Dropbox works fine. Zotero puts them in a Dropbox folder with a reasonable filename, and then the regular embedded Dropbox PDF reader does an OK job on my phone/tablet.

It'd be nice to have the database synced, though. Funny how "synchronization of replicas of information", for all its centrality to modern internet-based life, isn't something that our operating systems provide as a system service.

> Funny how "synchronization of replicas of information", for all its centrality to modern internet-based life, isn't something that our operating systems provide as a system service.

Just FYI on that front, macOS has iCloud, which does this. I put something on my Desktop, it appears on all my machines. I can access it from my phone, etc. etc. - it's basically a built-in Dropbox.

That's great for synchronising files. It's not really the kind of thing that would help with synchronisation of, say, contact databases, email folders, shared text documents, calendars, bibliographies, photo archives, etc etc. It'd be interesting to consider an OS-wide API for flexible data synchronisation and conflict resolution.

Mendeley was very popular (until its unfortunate acquisition) because of its excellent PDF full text search and the ability to automatically get metadata (i.e. a bibtex entry) from PDFs. Can Zotero do that?

Yes, I use that function in Zotero all the time.

Why not just use pdfgrep?

Because the info often isn't in the pdf file itself (at least in a parseable way). Typically metadata readers figure out what the paper is and then go to a bibliographic database to pull out the full title, author list, page info, etc.
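A minimal Python sketch of that two-step idea, for illustration only: find something that looks like a DOI in the extracted text, then build the URL of a bibliographic lookup. This is not how Zotero or Mendeley actually implement it. The regular expression and the trailing-punctuation trimming are heuristics I am assuming here, and the Crossref REST endpoint is used purely as an example database; no network request is made.

```python
import re
from urllib.parse import quote

# Heuristic pattern for modern DOIs of the form "10.<registrant>/<suffix>" (an assumption).
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text):
    """Return the first DOI-looking string in text, or None if there is none."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI (heuristic).
    return match.group(0).rstrip(".,;)")


def crossref_url(doi):
    """Build a Crossref REST API URL for the DOI's metadata record."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")
```

Feeding the first page of a paper through `find_doi` and fetching `crossref_url(doi)` would return a JSON record with title, authors, and journal details. Papers without a printed DOI would need a fallback such as a title search.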

Zotero blew my mind when I discovered it last year. Works almost flawlessly.

I subsequently tried contributing but got lost in the weeds of the somewhat unwieldy codebase. In part because its interface is based on XUL and I don't know wtf is going on there. I think they're slowly migrating to either Electron or something else. If you know XUL and have some free time, please consider contributing.

Wow what a throw back...I remember using Zotero in middle school over a decade ago. It was a great tool then, glad to see they're still going strong.

I thought the same thing - It looks like they've come a long way since I used it eons ago.

Zotero rocks! I encourage all my students to use it as well even though the University has a license from Endnote. Power tip: reduce the number of characters searched from the default 500000 to more like 10000 for large (>5k) libraries otherwise I find it slows down quite a bit.

It makes so much sense to stimulate students to use a free tool they can keep using after they leave university or change schools, without resorting to piracy or expensive subscriptions ($250!).

The site is not very clear about if it's possible to self host the server part, but hopefully the server is the PHP application at https://github.com/zotero/dataserver

Yes. You can self-host a WebDav server and easily point Zotero at it.

Can I use this completely isolated and offline, or does it require you to use it online, or is it otherwise severely limited if you don't? And is it easy to get it into this mode? Their website doesn't make clear whether this is possible.

Yes, by default it is completely offline. You have to take extra steps to connect it to any kind of online service, such as a Zotero web account, or Dropbox for file syncing.

It is completely isolated, if for example you want to include references in Word. However some functions like adding references quickly with DOIs will fail without an internet connection.

I've used Zotero for about 10 years. Its one-click citation saving is awesome.

But since I am a LaTeX/LyX person, I also wrote an extension that lets Mac users automatically add newly saved citations to BibDesk: http://mackerron.com/zot2bib/

Zotero really needs to become self-hostable. Other than that, it's a great tool.

> Zotero is the only software that automatically senses research on the web. Need an article from JSTOR or a preprint from arXiv.org? A news story from the New York Times or a book from a library? Zotero has you covered, everywhere.

Is there a way to make it work with SciHub? Asking for a friend... ;).

Can confirm, this works.

Citationsy works with Sci-Hub... just saying :) https://citationsy.com/blog/new-feature-citationsy-archives/

No export feature prominently advertised, this looks like another startup silo.

It can export CSL-JSON, BibTeX, Endnote, and RefWorks formats.

Zotero can extract meaningful data from PDF. Alternatively, you can use the paper DOI.

I want to love Zotero, I really do. But the last time I tried it seriously, I found that:

- metadata extraction/automated adding really doesn't work right with my research workflow (a lot of google scholar searches, a lot of humanities and social science sources)---lots of inaccurate or incomplete info, lots of downloading RIS files and then manually importing and separately manually importing the PDF.

- documentation for things that would be useful like hooking up to academic library proxies is nonexistent. Take a look at the chain of empty links when you try to get proxy info: https://www.zotero.org/support/proxies

- no better bibtex for zotero 5... Although maybe this has changed recently? Which would be amazing.

> metadata extraction/automated adding really doesn't work right with my research workflow

I'm not sure exactly what you mean, but the primary way of adding items to Zotero is with the Zotero Connector browser extension, which lets you save high-quality metadata and PDFs with a single click from a huge variety of sources (certainly including humanities sources, since Zotero was created by historians). No other tool comes close to Zotero's abilities here. Metadata quality does vary by site, though — Google Scholar specifically only provides limited metadata, so you'll often get better results by clicking through to the linked article and saving from there. We have plans for functionality to flesh out incomplete metadata retrieved from subpar sources.

Zotero can also automatically retrieve metadata from PDFs you drag in, which should work for the vast majority of recent PDFs and many older ones with DOIs assigned, though that's not meant to be the primary workflow.

> documentation for things that would be useful like hooking up to academic library proxies is nonexistent

Current proxy documentation is here [1], and that's what's linked from the main documentation page. I've fixed the outdated page you pointed to — thanks.

Note, though, that the proxy functionality is meant to work automatically for the popular academic proxies, so most people don't need to configure anything to use it. (And as far as I know other competing tools don't offer anything like this.)

> no better bibtex for zotero 5

BBT has worked with Zotero 5 since last year.

Disclosure: Zotero developer

[1] https://www.zotero.org/support/connector_preferences#proxies...

Thank you---this is super encouraging, and a good signal to maybe go back to it.

Incidentally, one thing that would be really helpful for a documentation standpoint would be a series of articles about best practices, like which metadata sources work best.

I recently switched to Zotero from Papers, which is only for OS X and I needed to work on a Linux computer. After Papers was acquired by ReadCube, I was also concerned it would end up like Mendeley, which has a social network model that I don't particularly like.

On the whole, the transition was pretty effortless and I am pleased with Zotero (plus the Better BibTex plugin and webdav syncing). I like having the rss feeds inside the client. The firefox add-on is far superior to the Papers version. And finally, because it's an open source project with a reasonable ecosystem, many of the things I found annoying about Papers have been solved (like customizing which fields to exclude from a bibtex export).

I have been really happy with ReadCube. It's kinda clunky and awkward in some places, at least compared to Papers, but it works very well. I'm a happy paying customer.

Zotero is awesome. 80% of the academic type people I've introduced it to have switched. The other 20% have Endnote stockholm syndrome.

Having recently persuaded my research group leader not to spend >£500 on Endnote licenses, this comment tickled me. It may even be the case that Endnote is now a largely functional piece of software, but its history is too exquisite an exercise in Worse is Better for me to go near it again.

Bench biologists in particular tend to have a "the only way to write a paper is Microsoft Word and Endnote" attitude. Zotero is considered a bit eccentric and using LaTeX is practically unheard of. As a computational biologist I like Zotero and LaTeX but am often forced not to use them in collaborations.

I wrote my (social science) PhD with LaTeX (eventually). Corralling it to BibTeX from Zotero was a bit of a pain, but worked fairly smoothly. What it did achieve was that it made it look like I had paid much more attention to layout and cross referencing than I actually had, which got me through with two nit-picking corrections.

Endnote. Ugh. When I used it a decade ago it was the worst program I have ever used. Try to look up something and nothing happens. Is it thinking? Is it connecting to a server? Did it just not find any results? There's no way to tell because it doesn't tell you anything.

In the interests of nit-picking, back in the day I made some contribution to zotero's back end docs, and I was a bit unhappy with the way the open source project is managed as a potential contributor. But it's still pretty neat stuff.

Important supporting institutions. GUI looks like a lot of overlap with Citavi.


The major differences seem to be cloud-based Zotero's advantages for sharing and collaborating, and Citavi's better fine-grained document quote/cite tools.


I am a big fan of paperpile:


I've used Endnote, Mendeley, and Zotero before and found them all to have their own issues. Paperpile is not perfect but it shines in a few places and I really like it. The chrome plugin adds an 'add to paperpile' button to places like google scholar making it easy to add citations.

It also is designed for writing in Google Docs and makes it very easy and quick to add citations. For some, Google Docs may be a dealbreaker, but if you need to collaboratively write/edit a manuscript it's much better than the alternatives in my opinion (esp. because everyone can write/edit at the same time).

Zotero also has a Chrome plugin. I don't remember whether it works on Google Scholar, but I use it all the time on actual journal sites (IEEE Xplore, arXiv, Nature, ACM, etc.).

I used Zotero while going through University and it seemed revolutionary at the time. There wasn't really anything else that could properly import the citation info from journals, web sites, etc so easily. Most of them you would have to type everything in manually. Zotero changed all of that and truly made it easy. Glad to see that it has kept getting better and better.

My company's R&D department uses Zotero a lot for our academic publishing. It's a little steep to get started as you build out the database but it's incredibly handy and well worth the time investment.

I've heard of other alternatives being better but, shrug, Zotero works well for us.

I'd like to switch to Zotero, but I've been using Mendeley for a good 5 years, and built a decent workflow around it. I was wondering if anyone here took the leap and made the switch, and could perhaps point out some of the biggest differences between both tools?

I did, just recently; the biggest difference, to me, is that you have to configure synchronisation of PDFs yourself, if you don't want to use the Zotero web account thing to do it for you. I haven't tried the Zotero account file-syncing option, myself; I just use the Zotfile extension to make it copy the files to a Dropbox folder.

ETA: The firefox integration of Zotero is great, and something I missed from Mendeley. Also, I should say, before I switched to Zotero, I was a Mendeley user for about nine years, and did my PhD dissertation's bibliography using it. I'm quite confident Zotero would have been just as good.

I used to use Zotero exclusively for managing citations in my papers. In the end though, I always had to correct many errors in the reference section. This is not Zotero's fault; the journal sites and indexes from which Zotero gleaned its information are often incorrect or incomplete. Not to mention that many journals do not follow a standard such as MLA or APA, but a custom format, which always required me to do more edits. Nowadays I just do it all by hand. Not sure if things have gotten better in the 4 years or so since I last used it.

That said, Zotero as a research tool for organizing papers by topic, adding notes, etc. is just fantastic. I really do love Zotero and its potential.

I've tried Zotero but have been using JabRef (http://www.jabref.org/) for a good few years now and it works well for me on Linux.

Docear is another interesting project. Unfortunately though, Docear isn't nearly as active as it once was.


I came in just to give a shout out to Docear as well. There is an extensive blog post comparing Docear, Zotero and Mendeley by Joeran Beel (of Docear's team) [0]. I can't recommend it enough. It helped me get the literature review of my MSc thesis done easily, combining a mind map approach with actually clickable PDF links (in sync across multiple PCs), all with bibfile integration. I actually blogged about it in Portuguese [1].

However, due to the literary suite paradigm, I would recommend JabRef (which is integrated in Docear) instead [2] if you are coming from Zotero/Mendeley, which is what I did. Docear alone is capable of much more. I was not aware it is not really active these days. It is really unfortunate.

[0] https://www.scss.tcd.ie/joeran.beel/blog/2014/01/15/comprehe...

[1] https://gtpedrosa.github.io/blog/apresentando-o-docear/

[2] http://www.jabref.org/

I use Zotero to maintain a catalogue of the books I own. Being able to find the metadata for almost any book just by typing the ISBN (10 or 13) in the search field is a really nice feature.

I'm a big fan of Zotero. One of the greatest features is that you can use your own WebDAV server to store and sync PDF files. There is also this tutorial on how to install your own Zotero server [1], but I don't know how to configure Zotero to use it. Does anyone know?

[1]: http://git.27o.de/dataserver/about/Installation-Instructions...

Back in the day I used Zotero's BibTeX export with Markdown and this workflow/template https://github.com/tompollard/phd_thesis_markdown for my thesis. Zotero is very good (especially with the browser extension, it's like pinning something in Evernote) and the best part is: it's free.

Being able to quickly import literature via embedded metadata or specific extractors within zotero is incredibly convenient.

They have a super simple interface to do this from firefox, but I also configured my qutebrowser via a userscript:


I see references here about zotero being able to look up metadata for a pdf.

I'm slightly confused about what that means. Is it just metadata embedded in the PDF file that isn't visible on the page?

For a while I've wanted to make something that can extract the title, authors, and bibliography visually from a pdf. Is that what zotero can do also?

It doesn't use embedded metadata in the file, which is usually pretty low quality. It looks for various identifiers (DOI, ISBN) in the first few pages that can be used to retrieve high-quality metadata from external services. It also does some analysis to try to identify the title, authors, abstract, etc., and compares those against known metadata for further lookups and/or to supplement the retrieved metadata. (This is a web service [1] because of the database requirements, but we don't log any data about the contents or results of searches, and it's an optional feature.)

For extracting metadata from a formatted bibliography you can use AnyStyle [2], which is a separate service written by a Zotero developer.

[1] https://github.com/zotero/recognizer-server

[2] https://anystyle.io
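To make the identifier step described above concrete, here is a minimal sketch of pulling a DOI out of text that has already been extracted from a PDF's first pages. It is not Zotero's recognizer code: the `find_doi` name and the regular expression are assumptions chosen for illustration, and the pattern covers only common modern DOIs, not ISBNs or the title/author analysis.

```python
import re
from typing import Optional

# Illustrative pattern for common modern DOIs ("10.", a 4-9 digit
# registrant code, "/", then a suffix). This is an assumption for the
# sketch, not the rule Zotero itself uses.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(first_pages_text: str) -> Optional[str]:
    """Return the first DOI-looking string in the text, or None."""
    match = DOI_PATTERN.search(first_pages_text)
    if match is None:
        return None
    # Strip punctuation that usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")
```

The returned identifier would then be handed to an external metadata service for the actual lookup, which is the part this sketch leaves out.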

Gotcha, thanks a lot for your reply!

We also used Zotero during our university studies for keeping track of citations. It's a great tool!

I used to use Zotero and Mendeley, but recently switched to Paperpile and haven't looked back. It's a Chrome plugin, and saves all your papers to Google Drive. It has a really nice cite-while-you-write extension for Google Docs that really makes it worth it.

Excellent to see and plan to try it out. I almost exclusively use DEVONThink Pro for collecting, organizing and searching local documents and Web resources. It's a key part of my research workflow, but always open to testing (or creating) improved tooling.

Always been a very happy Zotero user throughout my degree to manage my citations. Without it writing papers would have been a much bigger headache.

Zotero is great, does everything except actually read your papers for you. Although, there is a tool out there now that does that.

Assuming you managed to install your own Zotero sync server (not webdav), does anyone know how to use it with the Zotero client?

The ability to host your attachments (.pdf files mostly) on your own WebDAV server is very useful.

This software is available in 30+ languages, but not in any Indian language. I wonder why developers ignore translating software into Indian languages. If the dev team is reading this, please contact me. I would translate this software into Hindi.

Update: I wonder why this comment got downvoted without any explanation. HN is not the same as it used to be.

>> I wonder why this comment got downvoted without any explanation. HN is not the same as it used to be.

Because this is a freeware program that is community-driven and you're blaming the developers when in reality it is the community that is failing to provide translations; largely the Indian language community, I might add.

Translators are welcome to help. Read this:


Thanks, I will do the translation in my free time. I hope other people also join in.

Same reason booking.com shuttered the Hindi version of their website: educated Indians speak English and are very comfortable using it. When politicians have Hindi speeches written in the Latin alphabet instead of Devanagari, it shows something of the grip English has on the South Asian subcontinent, not just India.

I don't want a third party cloud-based solution and Zotero is not clear on where your data is by default. So I went with JabRef (http://www.jabref.org/)

It literally says on the Zotero front page that all cloud functions are optional.

Yes but it isn't clear to me whether it's enabled by default.

Happy zotero user for the last decade here.

Feature request: integration with Sci-Hub

That's not gonna happen, unless they dislike being sponsored/funded.

Reminds me of an ancient site called diigo.

Still operating.

Does it only store the metadata or does it have the power to store the full text, whether pdf or website or other document?

Could you use this like a digital "commonplace book"?

It allows you to upload PDF or website snapshots, and by default an account is granted 300 MB free storage. They also have paid upgrades:

- 2 GB: $20/yr

- 6 GB: $60/yr

- Unlimited: $120/yr

I've had the paid 2 GB plan for a while, I've got about 800 MB stored with them, and I've been very happy with the service.

Between the Firefox connector, the automated PDF OCR metadata lookup, and the ability to pull in metadata from DOIs, Zotero makes indexing things insanely easy. It's like MusicBrainz for articles. I'm always surprised how many obscure PDFs it correctly recognizes and indexes for me.

Personally, I just sync metadata and use my institutional Google Cloud storage for PDFs. A little more convoluted (I think this wouldn't work without the ZotFile plugin) but it appears to work across my 2 devices.

You can store pdfs and other article files, and sync them across devices. The first 300MB are free, and there are 2GB, 6GB, and unlimited-storage paying tiers.

Actually you can couple it to any WebDAV compatible cloud system. You can connect it to your own Nextcloud instance and have the limitations be your hardware.

You can also just set up a WebDAV server and sync that across devices. Works pretty damn well.

Huh, that's interesting! Do you know of a blog post where this is spelled out?

I use my Nextcloud install (since it provides a WebDAV interface).

In the options under the Sync tab, I select WebDAV as the File Syncing type and for the URL I put:

with the appropriate credentials.

Works flawlessly.

Thanks a lot :)

I set it up to sync the full-texts via syncthing, has been working flawlessly.

Mendeley is supported by Elsevier, closed-source, and has premium-only features only for institutional users. Its publisher affiliation is important, when you see things like "Mendeley supports responsible sharing."

Zotero is the full package, GPL, the only premium service is extra data for syncing with their server (not necessary if you only have one client). None of this Orwellian hinting about keeping researchers from using libgen.

Yeah. Was just sharing it because it's what I used 5 years ago for my PhD. Worked well at the time and I found it useful. I don't think Elsevier owned it then. Got heavily down voted so I'll take the hint :)

Reading about Elsevier now and I understand the objection. Thanks.


It's an Elsevier product, so no interest there.

I'd rather just use JabRef http://www.jabref.org/

They are direct “competitors”. I find Zotero more power-user-friendly. For instance, there is a powerful bibtex plugin for Zotero that can run a server on localhost, so I can fetch the up-to-date bibliography for any particular collection from a make build with curl. I don’t see this ever happening in Mendeley.
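For anyone who would rather do that fetch from a script than from curl, a minimal Python sketch of the same idea follows. The `fetch_bibliography` name is hypothetical, and the export URL is whatever the plugin reports for your collection; nothing here is the plugin's own API.

```python
import urllib.request
from pathlib import Path


def fetch_bibliography(export_url: str, dest: Path, timeout: float = 10.0) -> int:
    """Download the bibliography served at export_url and write it to dest.

    export_url is an assumption: copy the real per-collection URL from the
    plugin. Returns the number of bytes written.
    """
    with urllib.request.urlopen(export_url, timeout=timeout) as response:
        data = response.read()
    dest.write_bytes(data)
    return len(data)


# Example call (placeholder URL, replace with the one your plugin shows):
# fetch_bibliography("http://localhost:<port>/<export-path>", Path("refs.bib"))
```

Calling this from a build step before the LaTeX run keeps `refs.bib` in sync with the collection, which is the same effect as the curl rule in the Makefile.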

But unlike Mendeley, it doesn't send you email advertising for PhDs based on the papers you have been reading in it (getting it totally wrong, I might add).

according to wikipedia, zotero was first released around a year before mendeley

It is OK, not great. It lacks many features. If they made an app, they could bring in many features from other citation managers in a very simple way. But all the citation managers out there are half-baked software, and so is this one. It is just a very basic tool with no features. Mendeley is better, Papers is good, and Endnote is good, but all of those are expensive and not free, except Mendeley. So if you want just a regular bare-minimum tool, it is OK; for power users, the tools mentioned above are more trouble-free. However, even these paid products lack serious features, so you can't expect much from free ones. Also, Zotero's design is not sleek or clean. It looks very raw and clumsy, which could easily be improved by better interface design.

people should try readcube.com, it's not free, but the best I can find to organize papers/references from any platform (iOS, android, pc/mac/linux).
