
What do y’all like for searching by file contents? For Windows and Linux?



I use recoll[1] as my main homedir index/search tool. It supports many document formats and digs recursively into archives. It has a no-nonsense GUI and a simple CLI. The search syntax is easy and flexible, allowing searches by many kinds of metadata in addition to simple full-text search. It's slow with my collection of ~2.8 million files and the index on spinning rust, but it's thorough and reliable.

For email, I store everything in mboxes, and index/search with Mairix[2]. It's wickedly fast and the search capabilities make gmail blush. I use a little script to search:

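  # Write matches to a scratch mailbox named after this shell's PID.
  if mairix -o $$ "$@"; then
      mutt -e 'set quit=yes' -f ~/Mail/$$
  fi
  # Clean up the scratch mailbox whether or not the search matched.
  rm -f ~/Mail/$$

It's very common that I want to find an email from sometime in the last month, so my most common search is something like: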

  search d:1m- alice

And I'll have an instant view of all emails from alice in the last month.

  [1] https://www.lesbonscomptes.com/recoll/
  [2] https://github.com/rc0/mairix


Recoll is the best I've found for PDFs on Linux. Recommended.

For everything else that's just plain text, ripgrep or Ctrl+Shift+F in the IDE.


There are great CLI tools already in this thread, but for some of my side-gig work I'm searching large piles of PDFs, document formats, and EPUBs with a GUI word processor open, and I need to reference the source by page/paragraph number. For those I use DocFetcher[1], a quirky and intermittently updated Java app that indexes file contents and provides rudimentary relevance searching along with regex. I index my docs, put the database it generates into a read-only shared directory, and point systems across OSs at that db so I can search quickly regardless of which box (or where) I'm working from. I can also toss the app, db, and docs onto a thumb drive for portability.

There's a commercial version that prioritized bugfixes, making the free and open version less attractive than it used to be. But it's still one of the better tools for the job when you want more than a grep-equivalent.

[1] http://docfetcher.sourceforge.net/en/index.html


Interesting.

At my previous job I created something similar for a recruitment sister company. They had a ton of CVs in all kinds of formats (Word, Excel, PDF, RTF, plain text, etc.). I used Lucene.NET to do the indexing.

Both companies no longer exist and I've needed to find some text in docs of my own. If I have a bit of time I could recreate the app pretty easily.


On Windows I use Agent Ransack. I don’t know if it’s the best, but it works well and predictably, unlike the built-in Windows search, where I still don’t understand how to reliably search for something.


I have used this too and found it to work well. I think they shot themselves in the foot with the name they chose for the program, though.


For source code specifically: ggreer's The Silver Searcher (ag): https://github.com/ggreer/the_silver_searcher


ripgrep has been a godsend for my bash workflow. It feels impossibly fast when used on git repositories. The caveat is that by default it omits .gitignored files, hidden files/directories, and binary files.
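
For anyone bitten by that caveat, here is a minimal sketch of a wrapper that turns the three default filters back off. The function name rg_all is made up for illustration; the flags are ripgrep's own long options, and on recent versions -uuu should be shorthand for the same combination.

```shell
# Hypothetical wrapper: rg_all is an illustrative name, not part of ripgrep.
#   --no-ignore : also search files excluded by .gitignore and other ignore files
#   --hidden    : also search hidden files and directories
#   --binary    : also search binary files instead of skipping them
rg_all() {
    rg --no-ignore --hidden --binary "$@"
}
```

Usage would look like `rg_all needle .` from the top of a repo. Passing the path explicitly avoids ripgrep deciding to read from stdin when it is run inside a script or pipeline.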


rga

It works both on Windows and Linux and can be pretty fast.

https://github.com/phiresky/ripgrep-all


These days I usually open the folder in VSCode and hope.


git grep is the best option, in my view.

If you want to go beyond that, more advanced stuff like Elasticsearch might be necessary, though.



