Recoll – Full-text search for your desktop (lesbonscomptes.com)
125 points by nanna on Dec 1, 2022 | 33 comments



Invaluable tool. Can handle massive collections of files. For power users it feels more appropriate to have a dedicated search engine application like recoll rather than embed search in the desktop (for a long time I was disabling baloo on KDE desktops as it would make the machine unusable while it was indexing).

The recoll UI could be improved, and in general the integration, e.g. with Python scripting or other tools, could be made easier, but this is a much appreciated project and it is good to see it keeps being developed.

In an ideal universe projects such as this should converge with other desktop apps to create truly empowering information tools (repatriating the agency that has been relinquished to the "cloud")


The resource use has been the death of any full-text desktop search I have tried to use. How lightweight is Recoll? I don't see much on this on the website, other than that I have to use a third-party CPU limiter if I want to limit CPU use.


Both recoll and Xapian (the index engine) are written in C++. The document filters are in Python but they only run once per document and tend to be simple and fast (easy to add your own too btw). For my use case of 15GB or so PDF files it has been lean and fast. It has a pragmatic Unix tool feel. I am only using the CLI though so can't speak for the GUI.

Another Xapian-based tool I use is notmuch (email) and that one is very snappy too.


never had to dig too deep as it is not noticeable (up to a certain index size of course https://www.lesbonscomptes.com/recoll/usermanual/usermanual....). index scheduling can be configured (cron) but for my purposes it works ok out of the box.

what is quite handy usability-wise: incremental index updates (after inserting new files) are fast and can be done on the fly while fully using the desktop


Recoll is not just useable for desktop applications, it can also be used as a local web search engine through recoll-webui [1] (link goes to my own repo which has some modifications to make it work with the Searx/SearxNG engine) which in turn can be used as an "engine" in Searx and SearxNG through the recoll engine [2] (which has been merged so it is no longer necessary to pull it from my repo).

This last option makes Searx/SearxNG useable for all types of searches, both local as well as remote. I've been using this exclusively for many years now over a large collection of documents (about 600.000 entries) with good results.

[1] https://github.com/Yetangitu/recoll-webui

[2] https://docs.searxng.org/admin/engines/recoll.html

[2] https://searx.github.io/searx/admin/engines/recoll.html


It is also available for macOS, and I was wondering what benefits Recoll would give. From the website:

> It seems that Recoll will sometimes find data that Spotlight misses (especially inside pdfs apparently, which is probably more to the credit of poppler than recoll itself).


I have an additional text search tool for macOS. It was slightly better than Spotlight, but I used it so rarely that I forgot its name. The thing I miss most often on macOS is “which movie file has that scene that I remember so clearly”. Isn’t it time already to do something like that? That would be a noticeable breakthrough indeed. No?


I'm not aware if it exists in software form yet, but presently your use case can be solved by having a mutually advantageous social transaction with a /r/tipofmytongue user who is happy to help by recalling the title of the movie (or similar) you're thinking of.


Thank you for the suggestion. Not all my video files are publicly available movies, though. And sometimes I know the movie, but am struggling to find the exact time the episode takes place.

But TIL that some software already exists - right on the iPhone’s Camera app: https://support.apple.com/guide/iphone/use-voiceover-for-ima... Use VoiceOver for images and videos on iPhone. We are almost there!


Interesting. But more something for a website like IMDB I guess?


Spotlight misses inside pdf? Bit strange when it can search text inside images.


Everything (https://www.voidtools.com) can also index file contents


Everything indexes everything EXCEPT contents, unfortunately.

https://www.voidtools.com/faq/#does_everything_search_file_c...


Also, "Everything" is only available for Windows and is closed source, you can't add support for document types you care about.


There is FSearch for GNU/Linux: https://github.com/cboxdoerfer/fsearch


Can this search browser history? I've seen browserparrot[0] for this purpose but it's a bit abandoned from what I can tell.

[0]https://www.browserparrot.com/


Supported "format": https://www.lesbonscomptes.com/recoll/pages/features.html#do...

If I recall correctly, Firefox keeps your browser history in SQLite, so it seems trivial to add your own "doctype" to support it, since Recoll is open source (https://framagit.org/medoc92/recoll) and written in a modular way (check the pdf handler as an example: https://www.lesbonscomptes.com/recoll/usermanual/usermanual....)
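As a rough illustration of the SQLite point (not a real Recoll handler, just a sketch): Firefox stores visited pages in the `moz_places` table of `places.sqlite`, with `last_visit_date` in microseconds since the Unix epoch. The function name `export_history` and the one-HTML-file-per-URL layout are my own assumptions; the idea is only to turn history rows into plain files that an indexer such as Recoll could then pick up from a directory.

```python
import sqlite3
from datetime import datetime, timezone
from html import escape
from pathlib import Path


def export_history(places_db, out_dir):
    """Write one small HTML file per visited URL (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Open read-only. In practice, work on a copy of places.sqlite,
    # because Firefox keeps the live file locked while it is running.
    uri = Path(places_db).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT id, url, title, last_visit_date FROM moz_places "
            "WHERE last_visit_date IS NOT NULL ORDER BY id"
        ).fetchall()
    finally:
        conn.close()

    written = []
    for place_id, url, title, last_visit in rows:
        # Convert microseconds since the epoch to a UTC timestamp.
        visited = datetime.fromtimestamp(last_visit / 1_000_000, tz=timezone.utc)
        page = (
            "<html><head><title>{t}</title></head><body>"
            '<p><a href="{u}">{u}</a></p><p>Last visit: {d}</p>'
            "</body></html>"
        ).format(
            t=escape(title or url),
            u=escape(url, quote=True),
            d=visited.isoformat(),
        )
        path = out / f"history-{place_id}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written
```

Pointing the indexer at the output directory would then make titles and URLs searchable; the page text itself is not in the history database, so this only covers what Firefox records there.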


Use a browser extension like SingleFile to save pages you want to refer back to later to local HTML, then let Recoll index them.

If you have something doing this to every page you visit, and Recoll can see it, then Recoll can index it.

Regarding automatically saving every page you visit, there's multiple tools that do this. One I played with and liked - but I can't remember the name - it's 5 numbers and refers to a port you can type with localhost to search through all recorded pages. That or something like it would work really well with Recoll.


If you are on macOS, HistoryHound is a decent browser search. It supports many browsers and even text you may just want to search from a directory.

https://www.stclairsoft.com/HistoryHound/


If you like Recoll you might also like Docfetcher.


I use this script to make recoll produce pdfgrep-like output so that I can use it with Emacs and pdfgrep.el. This gives a nice interactive way to wade through thousands of pdf files.

https://github.com/jeremy-compostella/pdfgrep/pull/8#issueco...



I use HoudahSpot and/or DEVONthink for this on Mac. Can anyone knowledgeable tell me whether I’m missing out on something by not using Recoll instead?


Interesting, I'm going to have to try this out. By description it reminds me of Everything.


Not bad, I wonder if someone made something similar for ripgrep.


Yes, I haven’t tried it myself yet.

https://github.com/phiresky/ripgrep-all


What's the advantage over using plocate/rg?


indexing (for rg) and content search (for plocate)
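To make the "indexing" half concrete, here is a toy inverted index. It is only a sketch of the general idea and not how Xapian or Recoll actually store data; `build_index` and `search` are made-up names. A tool like rg rereads file contents on every query, while an index pays the reading cost once and then answers queries with set lookups.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word of the query."""
    terms = re.findall(r"\w+", query.lower())
    if not terms:
        return set()
    # .get() avoids creating empty entries in the defaultdict.
    hits = [index.get(term, set()) for term in terms]
    return set.intersection(*hits)


docs = {
    "a.txt": "Recoll uses the Xapian index engine",
    "b.txt": "ripgrep scans files without an index",
}
index = build_index(docs)
print(search(index, "xapian index"))
```

The "content search" half is the other direction: plocate has an index too, but only of file names, so it cannot answer a query like the one above at all.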


On Windows I have been using Listary because it integrates into all file dialogs and Windows File Explorer. The killer feature is that it will navigate in file dialogs to where you are in the file explorer. Makes saving files so much easier.

It doesn't index contents, just filenames, so it is fast.

https://www.listary.com/


former listary user here: pro tip, switch to fluent search. it is even more powerful. and for directory switching i found "direct folders".



I am a bit surprised by the downvotes. I guess people are mostly interested in full-text search, which for me has never really worked well because the results are too noisy (thinking back at least to solutions such as Copernic).


Yeah, there's a number of options for searching by file name etc.; Everything works well for me.

Searching within file contents seems to lack good options right now. Historically you had X1 desktop search, Google had a desktop search product, I think copernic. But most seem to be out of date.



