They're very similar and share a lot of principles, though Matthias went all-in on the algorithmic aspect while I focused on the experience: including the UI (copy-pastable from the code on the home page) and building a search index.
I think WASM-aided in-browser search is really exciting! There are clear benefits for content creators (embedding search on a Jamstack site has historically been tricky) and users alike (caching & offline support is pretty rad if your users do a lot of searching). I'm excited to see Matthias' project get attention here!
It's becoming so easy to compose software from available open source components, and migrate functionality (like full-text search) to different layers of the stack (and that's fantastic!).
It's just tricky to keep all the requirements and constraints (and implications) in mind when selecting the appropriate libraries :)
Yup. Every highly paid professional is paid for their knowledge and expertise. They need to be aware of all the relevant solutions and the best way to apply them.
Top lawyers know about law, top doctors know about medicine, and they farm out the work to their less-highly paid nurses and paralegals.
When you reach the top of your field your main value is knowing what's out there and how to implement it, making sure more junior employees don't make mistakes along the way. (That is, if you stay on the IC track and don't go into management.)
It makes me wonder whether some/all of this can be encoded into an actual search engine or constraint satisfier. Arguably it's the dream of the Expert System all over again.
 - https://en.wikipedia.org/wiki/Expert_system
Someone should define a common API, and every language should adhere to it (or risk not being taken seriously). This is not trivial, since some languages have garbage collection, but it should be possible.
A more flexible – though less efficient – approach would be a service-oriented protocol. You'd send requests in the form of messages (binary or text) over a byte-oriented bidirectional channel and receive the replies on the same channel. Unfortunately, this approach would require more code to set up than primitive function calls, and fine-grained interaction with the library would be harder.
 "primitive" as in lower-level, not as in dumb.
Edit: From the perspective of the interop protocol, it wouldn't make much difference whether the library runs in the same address space or in a different process. Large blobs of data, like a picture or a long string, could be passed via pointers (in the same process) or via shared memory (in different processes).
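To make that concrete, here's a minimal sketch in Python of what such a byte-oriented channel could look like: length-prefixed JSON messages over a socket pair. The "to_upper" method and the message shape are invented for illustration.

    import json
    import socket
    import struct

    def send_msg(sock, payload):
        # Length-prefix a JSON message and write it to the channel.
        data = json.dumps(payload).encode("utf-8")
        sock.sendall(struct.pack(">I", len(data)) + data)

    def recv_msg(sock):
        # Read one length-prefixed JSON message from the channel.
        # (Assumes the 4-byte header arrives in one read; fine for a sketch.)
        (length,) = struct.unpack(">I", sock.recv(4))
        data = b""
        while len(data) < length:
            data += sock.recv(length - len(data))
        return json.loads(data)

    # "Library" side: answer one request coming in over the channel.
    def serve_one(sock):
        request = recv_msg(sock)
        if request["method"] == "to_upper":
            send_msg(sock, {"result": request["arg"].upper()})

    client, server = socket.socketpair()
    send_msg(client, {"method": "to_upper", "arg": "hello"})
    serve_one(server)
    print(recv_msg(client))  # {'result': 'HELLO'}

The same framing works whether the "library" side is a thread in the same process, a child process behind a pipe, or a remote machine, which is exactly the appeal.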
You can store and manipulate C pointers within Python.
The other way around should also be possible: manipulate references to Python objects from within C.
I suppose this could be extended to Java objects, etc., and it could be based on a single API.
Genuinely curious, because I don't fully understand this myself but the idea is interesting.
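For the first direction, a small sketch using ctypes (which ships with CPython); this is just one possible way to do it:

    import ctypes

    # Direction 1: hold and manipulate a raw C pointer from Python.
    buf = ctypes.create_string_buffer(b"hello world")  # a C char array
    ptr = ctypes.cast(buf, ctypes.c_void_p)            # its raw address, as an int
    cstr = ctypes.cast(ptr, ctypes.c_char_p)           # reinterpret the address
    print(cstr.value)                                  # b'hello world'

    # Direction 2: a C-level cell holding a reference to a Python object,
    # which C code could store and hand back to the interpreter later.
    obj = ["a", "python", "list"]
    boxed = ctypes.py_object(obj)
    print(boxed.value is obj)                          # True

The hard part isn't holding the reference, it's ownership: the C side has to cooperate with Python's reference counting (Py_INCREF/Py_DECREF) or the object gets collected out from under it.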
Garbage collectors ruin embeddability, which is why people have been using C/C++ all this time. The alternative is to serialize and deserialize as if you were doing network communication.
Even something relatively simple like generics works very differently depending on the language. C++'s templates are just fundamentally incompatible with how Java generics work.
Is that too architecture-specific? You could force all programming languages to compile down to Java bytecode, but that would be too restrictive. Not everyone wants to be bound by bytecode.
You could create a language that aims to accomplish what you describe, but then you'd just have yet another programming language. (Insert relevant XKCD here.)
And humans either aren't smart enough to create a perfectly flexible language without warts that works for people doing dramatically different things, or there are too many tradeoffs for those people to share the same den. Someone who has thought more about the philosophy of programming languages could probably answer that better than I can.
0 - https://github.com/lucaong/minisearch
1 - https://lucaongaro.eu/blog/2019/01/30/minisearch-client-side...
What began as a simple side project 4 years ago has consumed a significant part of my free time over the last couple of years.
WebAssembly is certainly going to open up a lot of new avenues for doing interesting things in the browser.
If you tweak the search box so it doesn't do anything until you've typed at least 2 or 3 letters, you could then serve pregenerated JSON files that only contain matches with that prefix... No need for any of the JSON payload for words/phrases starting with aa..rt if someone's typed "ru" into the search box.
That means you'd have 676 distinct JSON files, but you'd only ever load the one you need...
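A rough build-step sketch of that sharding idea in Python; the flat index shape (term to page IDs) and the file naming are made up:

    import itertools
    import json
    import string

    # Hypothetical flat index: search term -> list of page IDs.
    index = {"rust": [1, 7], "ruby": [2], "runtime": [2, 5], "search": [3, 7]}

    # One shard per two-letter prefix: at most 26 * 26 = 676 files.
    for a, b in itertools.product(string.ascii_lowercase, repeat=2):
        prefix = a + b
        shard = {t: ids for t, ids in index.items() if t.startswith(prefix)}
        if shard:  # skip empty prefixes; the client treats a 404 as "no matches"
            with open(f"search-{prefix}.json", "w") as f:
                json.dump(shard, f, separators=(",", ":"))

The search box would then fetch search-<first two letters>.json once and filter the rest client-side.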
Seems to me that often, though not always, this network request would happen while the user is still typing, say while they're busy typing characters 3, 4 and 5, so it wouldn't be noticeable to the human.
And if they type more characters, or backspace-delete to fix a typo, no network request is required.
Same if they start typing a second word.
I'm guessing that in something like 90% of cases it'll seem as if there was never a network request at all.
I'm continually amazed at how featureful SQLite is.
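Case in point: most distributed SQLite builds include the FTS5 full-text search extension, so you get indexed search with a couple of SQL statements. A minimal sketch (it will error if your build was compiled without FTS5):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE VIRTUAL TABLE posts USING fts5(title, body)")
    con.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [
            ("Hello WASM", "full-text search in the browser"),
            ("Zola vs Jekyll", "comparing static site generators"),
        ],
    )
    # MATCH queries the full-text index; rank orders by relevance.
    for (title,) in con.execute(
        "SELECT title FROM posts WHERE posts MATCH ? ORDER BY rank", ("search",)
    ):
        print(title)  # Hello WASM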
- Stripping all whitespace from formatted JSON can make a huge difference
- Making property names shorter (these get repeated for every element in a large dataset!)
- If your data is relatively flat, you could replace an array of objects with an array of arrays
Or you could go all the way and serve data in CSV format, which is very space-efficient and has the neat property of being tolerant of being broken up into pieces. Though it may not parse as quickly, since JSON has native parsing support in the browser.
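A quick sketch of the first and third tips combined (whitespace-free separators plus array-of-arrays); the record shape here is invented:

    import json

    records = [
        {"title": "Hello WASM", "url": "/hello-wasm/", "words": 1250},
        {"title": "Zola vs Jekyll", "url": "/zola-vs-jekyll/", "words": 980},
    ]

    # Array of objects, pretty-printed: keys repeated for every element.
    verbose = json.dumps(records, indent=2)

    # Header row once, then plain arrays, with whitespace-free separators.
    header = ["title", "url", "words"]
    compact = json.dumps(
        [header] + [[r[k] for k in header] for r in records],
        separators=(",", ":"),
    )
    print(len(verbose), len(compact))  # the second number is noticeably smaller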
* server-side WASM, such as Cloudflare Workers and KV, to build and maintain the index
* a streaming copy of the simplified index, to be pulled in by browser-side WASM
* queries that go beyond the simple index, forwarded to the worker
One way of simplifying would be to limit search terms to a certain length, or only expose the most popular results.
By sharing the same WASM code on both sides, the index format can be optimized without requiring a compatibility layer or serialization/deserialization step.
I guess this isn't going to work out for bigger indexes, as the payload will be huge and it pushes all the work to the client.
Still very impressive work, and gives me a new reason to learn Rust.
Jekyll:
+ plugin support
+ large community with lots of themes/plugins
- need to install Ruby and dependencies
- slow to build large sites

Zola:
+ easy install (precompiled binary)
- smaller feature set and community
- no plugins
A word "elasticlunr", appears in the linked article, and the linked article appears in search results, but searching any partial string such as "elastic", "elasticl" "elasticlu" and "elasticlun" will not result in finding the linked article. Perhaps this behaviour is intended by the author, but it may not be intended by the various users of the site.
> elastic* and elasticl*
does find the linked article, but
> elasticlu* and elasticlun*
do not.
The reason is that I'm working on decoupling the search frontend from the JSON search blobs. I want to make the frontend part installable through npm as well (and not just cargo as it is now). I didn't get around to adding the search index generation to GitHub Actions yet due to limited time.
Here's the pipeline if you want to give me a hand and add the tinysearch build: https://github.com/mre/mre.github.io/blob/source/.github/wor...
For dynamic content, I love Gatsby. It can read from whatever you throw at it: files (Markdown, plaintext, YAML), APIs (e.g. GitHub), ...
I'm not a frontend dev so my React knowledge is quite limited, but it's enough for simple components.
Sites I'm maintaining with Gatsby: https://analysis-tools.dev/
I haven't used Gatsby.js, as doing React/GraphQL for my static site isn't really something I'm interested in. If your site is very dynamic, pulls from various remote sources, or you just want to use React, it's better than Zola.