I see they have enabled the Snowball stemmers. I wonder whether using other Lucene analyzers, such as Voikko for Finnish, is feasible. Snowball wasn’t particularly good once the text gets complex. I used to work with Lucene and Solr way back when. Based on the OP, I see that Graal requires changes.
I'd take a look.
I realize that a relatively obscure Finnish stemmer combined with Lucene on GraalVM isn't exactly a common use case. I did some testing and described my use case. I certainly have plenty of English-language content to search with lucene-grep. So, thank you for making it!
Is it automated in some way during web browsing, do you have to remember to copy an article to a folder when you enjoyed it enough, or do you use a reading app/e-reader to read them so they're already downloaded?
I wrote some Python to drive Selenium to get the URLs (not the full text) from Instapaper, then pass those URLs to newspaper3k, where a lot of the downloading and parsing work is done. I then save the output to SQLite. From there I previously had ES build the indexes, but I recently switched to hosted Algolia, which seems to be basically free for my use case and has some nice libraries for building real-time search front ends too. I’ll be trying lmgrep as a substitute though.
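To make the storage step concrete, here is a minimal sketch of the "URLs in, SQLite rows out" part. The table layout, the save_articles name, and the fetch callback are my own assumptions for illustration, not the actual script. With newspaper3k, fetch would typically wrap Article(url), download(), and parse().

```python
import sqlite3
from typing import Callable, Iterable, Tuple

# Hypothetical schema: one row per article, keyed by its URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    url   TEXT PRIMARY KEY,
    title TEXT,
    body  TEXT
)
"""


def save_articles(
    conn: sqlite3.Connection,
    urls: Iterable[str],
    fetch: Callable[[str], Tuple[str, str]],
) -> int:
    """Fetch each URL and upsert (url, title, body); return how many were saved.

    `fetch` is an assumed callback returning (title, body). With newspaper3k
    it could be: a = Article(url); a.download(); a.parse(); (a.title, a.text)
    """
    conn.execute(SCHEMA)
    saved = 0
    for url in urls:
        try:
            title, body = fetch(url)
        except Exception:
            # Skip pages that fail to download or parse; keep the run going.
            continue
        conn.execute(
            "INSERT OR REPLACE INTO articles (url, title, body) VALUES (?, ?, ?)",
            (url, title, body),
        )
        saved += 1
    conn.commit()
    return saved
```

Keying on the URL means a re-run replaces existing rows instead of duplicating them, which is handy when the reading list is scraped repeatedly.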
The key thing about searching the text of articles you’ve read is that you want an intelligent ranking of all articles that bear on a subject, in order of relevance. That’s not something you can get with grep/ripgrep. ES is pretty good at it out of the box. But it’s also a pain to set up and run - you’ll probably end up needing something like Docker.
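To show what relevance ranking means in practice, here is a rough, self-contained sketch of BM25-style scoring, the family of scoring that ES uses by default. The function names and the k1/b values are my own choices for illustration, and a real engine adds analyzers, stemming, and an inverted index on top of this.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumerics (no stemming in this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(
    query: str, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs, best match first; non-matching docs are omitted."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs or 1.0

    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores: Dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms count for more than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates, and long documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

grep tells you which articles contain a word. A score like this additionally tells you which articles are most about it, which is the part that matters when you have hundreds of hits.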
There are a thousand different ways you could do something like this - this is just the way I do it.
I guess that's what Google uses internally? Is there an open source alternative?
 - https://kythe.io/
There is also Sourcegraph.
Except rather than building an index, I brute-force the search each time. For most repositories it’s fast enough, even with ranking.
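For anyone curious what index-free search looks like, here is a naive Python sketch of the general idea: walk the tree, read each file, and rank by raw match count. This is not how cs works internally; the function name, the size cap, and the ranking rule are all assumptions made for illustration.

```python
import os
from typing import List, Tuple


def brute_force_search(
    root: str, query: str, max_bytes: int = 1_000_000
) -> List[Tuple[int, str]]:
    """Scan every file under root; return (match_count, path), best first.

    A file matches only if it contains every query term (case-insensitive).
    """
    terms = query.lower().split()
    results: List[Tuple[int, str]] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    # Read at most max_bytes characters of each file.
                    text = fh.read(max_bytes).lower()
            except OSError:
                continue  # Unreadable file: skip it.
            counts = [text.count(term) for term in terms]
            if terms and all(counts):
                results.append((sum(counts), path))
    # Highest match count first; ties broken by path for stable output.
    results.sort(key=lambda item: (-item[0], item[1]))
    return results
```

There is no index to build or keep in sync. The cost is that every query rereads the whole tree, which is acceptable for a typical repository but not for a very large one.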
https://github.com/boyter/cs for those interested. It’s still very much a WIP, with noticeable issues in TUI mode.
"Then the most complicated part was to prepare executable binaries for different operating systems. Plenty of CPU, RAM, VirtualBox with Windows and macOS virtual machines, and here we go."