I think your project is great for how it does so much with so few resources, but most big search engines have an index which is more than 1,000X as large as yours (per document), in order to improve results.
It's alive and well. The TIOBE index still lists it ahead of Ruby, Swift, Objective-C, GoLang...
And I started this software 20 years ago. Granted, a LOT of the software has changed since then. But I don't see a reason to throw away existing code unless it is in need of so much change that rewriting from scratch would be easier. And even then I might stick to what I know best, and what fits best with other parts of the software.
File formats will be documented when I publish the data-files in a few weeks.
What do you mean by postings?
The main index is split into 32 shards (there is also an additional news-index which is updated about every 5-10 minutes). Each shard is updated and queried separately. The query actually runs 2/3 on a Windows server and 1/3 on a Linux server, the latter in Docker containers. I want to move everything to Linux over time.
The query has two phases. First, only a rough but fast ranking is done on each shard. Then the top results of all shards are combined and completely re-ranked. This is basically a meta search engine hidden within.
First query phase is in src/searchservernew.dpr, and the second phase is in src/cgi/PostProcess.pas.
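For readers who don't want to dig through the Pascal sources, the two-phase idea can be sketched roughly like this. This is a minimal illustration in Python, not the actual DeuSu code: the toy shards, both scoring functions, and the names `rough_score`, `full_score` and `search` are assumptions made up for the example.

```python
import heapq

# Hypothetical toy data: each shard maps doc id -> page text.
SHARDS = [
    {1: "pascal compiler for linux", 2: "free pascal and delphi pascal"},
    {3: "linux docker containers", 4: "pascal pascal pascal"},
]

def rough_score(text, terms):
    # Phase 1 (assumed scoring): cheap, counts how many query terms occur at all.
    words = set(text.split())
    return sum(1 for t in terms if t in words)

def full_score(text, terms):
    # Phase 2 (assumed scoring): costlier, term frequency normalized by length.
    words = text.split()
    return sum(words.count(t) for t in terms) / len(words)

def search(query, per_shard=2, limit=3):
    terms = query.split()
    candidates = []
    for shard in SHARDS:  # each shard is queried separately
        scored = [(rough_score(text, terms), doc_id)
                  for doc_id, text in shard.items()]
        top = heapq.nlargest(per_shard, scored)
        candidates.extend((doc_id, shard[doc_id])
                          for score, doc_id in top if score > 0)
    # Combine the top candidates of all shards and re-rank them completely.
    reranked = sorted(candidates,
                      key=lambda c: (-full_score(c[1], terms), c[0]))
    return [doc_id for doc_id, _ in reranked[:limit]]

print(search("pascal linux"))
```

The point of the split is that the expensive scoring only ever sees a handful of candidates per shard, not the whole index.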
Thank you. "Postings" is another word for the format of the doc ids and related information in the inverted file. A google for "inverted index postings" will turn up a bunch of references.
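To make the term concrete, here is a small sketch of an inverted index in Python where each term maps to its postings list. The layout (doc id plus word positions) and the name `build_index` are assumptions for illustration only; real engines usually store postings in a compressed binary format.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to its postings list: [(doc_id, [positions...]), ...]."""
    index = defaultdict(list)
    for doc_id in sorted(docs):  # keep postings ordered by doc id
        positions = defaultdict(list)
        for pos, word in enumerate(docs[doc_id].lower().split()):
            positions[word].append(pos)
        for word, pos_list in positions.items():
            index[word].append((doc_id, pos_list))
    return dict(index)

# Hypothetical example documents.
docs = {
    1: "free pascal compiler",
    2: "pascal search engine written in pascal",
}
index = build_index(docs)
print(index["pascal"])  # postings for the term "pascal"
```

Here the postings for "pascal" list both documents, together with the word positions where the term occurs in each.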
A free search API will be fully available probably next week. It's in testing already. It's just a matter of putting the finishing touches on the documentation.
And the crawl- and index-data will be available for download in a few weeks. It's also just a matter of documenting the data-format.
BTW: I disagree with your points about privacy. I see DeuSu as a way of fighting back.
Originally it was written in Delphi. But I now use FreePascal for the development. I'm even compiling both Windows and Linux versions on my Linux machine.
The snippets are currently the first 255 characters of the page's text. For snippets to be customized to the search term, I would have to store all the text of the page. And that would require a lot more disk space. Space that I can't afford at the moment.
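The trade-off can be sketched in a few lines of Python. This is only an illustration under assumptions: the whitespace collapsing, the centering of the window around the first hit, and the names `static_snippet` and `query_snippet` are not taken from the DeuSu sources.

```python
SNIPPET_LENGTH = 255

def static_snippet(page_text, length=SNIPPET_LENGTH):
    # Can be computed once at index time; only `length` characters are stored.
    return " ".join(page_text.split())[:length]

def query_snippet(page_text, term, length=SNIPPET_LENGTH):
    # Needs the full page text at query time, hence the extra disk space.
    text = " ".join(page_text.split())
    hit = text.lower().find(term.lower())
    if hit < 0:
        return text[:length]  # term not found: fall back to the page start
    start = max(0, hit - length // 2)  # center the window on the first hit
    return text[start:start + length]

# Hypothetical page where the search term appears far from the start.
page = "a " * 300 + "needle" + " b" * 300
print("needle" in static_snippet(page))           # False
print("needle" in query_snippet(page, "needle"))  # True
```

The static version only needs 255 stored characters per page, while the query-dependent version has to keep the whole page text around.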
https://deusu.org