
Lunr.js - Simple full-text search in your browser - jchapron
https://github.com/olivernn/lunr.js
======
kristopolous
I see you are using my stemmer implementation ... Snowball and Porter2 are
better; git clone those instead.

In fact, if you had poked around, you probably could have snagged about 90% of
this code from various projects ... too bad I didn't put it together like you
did.

Ah well ... internet fame points to you I guess.

~~~
jchapron
I'm not the author; I just stumbled on the project and found it interesting. I
guess you could suggest it to him on Twitter (@olivernn).

------
FuzzyDunlop
There's a detailed write-up from the author about how it all works at:
<http://blog.new-bamboo.co.uk/2013/02/26/full-text-search-in-your-browser>

~~~
jchapron
There's also a website with docs and an example by the author here:
<http://lunrjs.com/>

------
smagch
As for the server side, reds is a simple full-text search module for Node.js.
<https://github.com/visionmedia/reds>

------
tantalor
How does this compare to <http://reyesr.github.com/fullproof/>?

------
slashdotdash
I created a small Jekyll plugin to add full-text search, using lunr.js, to the
generated static sites.

<https://github.com/slashdotdash/jekyll-lunr-js-search>

------
augustl
I think I'll put this to use in an internal admin/support system that needs
search. All the data is on the client already (AngularJS), and it's fewer than
a thousand docs for now.

------
ErikRogneby
I am sure there are some applications that might need this due to obfuscation
of data from the users... But doesn't the browser already have full-text
search (Ctrl-F)?

~~~
nzadrozny
Full-text search usually presumes an index, which brings a lot of functional
differences compared to the browser's naive substring-matching Ctrl-F. And any
proper search index is going to give a better user experience than naive string
matches.

I haven't read through all of Lunr's docs and source, but based on my
Solr/Elasticsearch experience, I'd expect to see (in time)…

Tokenization and (presumably) term normalization/analysis; a faster and
smarter query language, for term order independence and boolean combinations
of clauses; relevance scores and maybe even score boosting per field.

Better queryability really shouldn't be understated here. Just having term
order independence focused on a specific set of JSON is going to be way better
than naively matching any substring on the entire rendered page.
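
To make the term-order point concrete, here is a minimal sketch (hypothetical code, not taken from lunr) of an inverted index that answers an AND query regardless of word order, shown next to a substring match of the kind find-in-page performs. The function names and sample documents are made up for illustration.

```javascript
// Hypothetical sketch, not lunr's code: a tiny inverted index that answers
// queries regardless of term order, next to a Ctrl-F-style substring match.

const docs = [
  { id: "a", body: "Full-text search in your browser" },
  { id: "b", body: "Your browser can search text with Ctrl-F" },
];

// Lowercase the text and split on anything that isn't a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Inverted index: each term maps to the set of document ids containing it.
function buildIndex(documents) {
  const index = new Map();
  for (const doc of documents) {
    for (const term of tokenize(doc.body)) {
      if (!index.has(term)) index.set(term, new Set());
      index.get(term).add(doc.id);
    }
  }
  return index;
}

// AND query: a document matches if it contains every query term, in any order.
function searchAll(index, query) {
  const terms = tokenize(query);
  if (terms.length === 0) return [];
  let matches = null;
  for (const term of terms) {
    const ids = index.get(term) || new Set();
    matches = matches === null
      ? new Set(ids)
      : new Set([...matches].filter((id) => ids.has(id)));
  }
  return [...matches].sort();
}

// Naive substring match over the raw text, roughly what find-in-page does.
function substringSearch(documents, query) {
  const q = query.toLowerCase();
  return documents
    .filter((doc) => doc.body.toLowerCase().includes(q))
    .map((doc) => doc.id);
}

const index = buildIndex(docs);
console.log(searchAll(index, "browser search"));     // both documents: a, b
console.log(substringSearch(docs, "browser search")); // no documents
```

Querying "search browser" instead gives the same result from `searchAll`, while the substring version finds nothing in either order.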

~~~
olivernn
That is almost exactly what lunr is doing. It tokenises the input text, stems
the tokens and filters out any stop words. The index can then be searched;
term order is not relevant, and a prefix search is currently used so that you
can find documents containing terms without having to type the whole term
exactly. The matching documents are also scored according to how relevant they
are to the search term.

In the future I want to add even more powerful querying: restricting search to
specific fields, taking into account the distance between terms, and adding
faceted search to reduce the number of documents being searched.

One of the original goals of the project was specifically to provide a better
alternative to just using the browser's built-in find-in-page functionality.
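
The pipeline olivernn describes (tokenise, filter stop words, prefix-match, score) can be sketched roughly as below. This is not lunr's code or API: the stop-word list is a tiny assumed one, stemming is left out, and the score is a plain term-frequency sum standing in for whatever relevance measure lunr actually uses.

```javascript
// Hypothetical sketch of the described pipeline; not lunr's API or implementation.
// Stemming is omitted and scoring is a simple term-frequency sum.

// A small, illustrative stop-word list (assumed, not lunr's list).
const STOP_WORDS = new Set(["a", "an", "and", "in", "is", "of", "the", "to"]);

// Tokenise: lowercase and split on non-alphanumeric characters.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Drop stop words (a real pipeline would also stem each token at this step).
function pipeline(text) {
  return tokenize(text).filter((t) => !STOP_WORDS.has(t));
}

// Inverted index: term -> Map(docId -> number of occurrences).
function buildIndex(docs) {
  const index = new Map();
  for (const { id, body } of docs) {
    for (const term of pipeline(body)) {
      if (!index.has(term)) index.set(term, new Map());
      const postings = index.get(term);
      postings.set(id, (postings.get(id) || 0) + 1);
    }
  }
  return index;
}

// Prefix search: each query token matches every indexed term that starts with
// it, so "brow" finds "browser". Scores are summed per document, highest first.
function search(index, query) {
  const scores = new Map();
  for (const q of pipeline(query)) {
    for (const [term, postings] of index) {
      if (!term.startsWith(q)) continue;
      for (const [id, tf] of postings) {
        scores.set(id, (scores.get(id) || 0) + tf);
      }
    }
  }
  return [...scores]
    .map(([ref, score]) => ({ ref, score }))
    .sort((x, y) => y.score - x.score || x.ref.localeCompare(y.ref));
}

const docs = [
  { id: "1", body: "Search in the browser, search offline" },
  { id: "2", body: "The browser is a runtime" },
];
const index = buildIndex(docs);
console.log(search(index, "sea brow")); // ref "1" (score 3), then ref "2" (score 1)
```

Scanning every indexed term for each prefix is the simplest correct approach; a real index would likely use a sorted term list or a trie so prefix lookups don't touch the whole vocabulary.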

------
tantalor
> A browser is required for running the tests.

Why? This is a red flag.

~~~
olivernn
Why is it a red flag for a browser-based JavaScript library to require a
browser for testing?

~~~
tantalor
Continuous integration usually runs JavaScript tests in browserless
environments.

~~~
altcognito
While I agree with downstream comments that you _can_ run tests headless or
browserless, arguably, you're not really testing the user experience until you
execute it the way that the user will execute it. Perhaps this is a case of
perfect vs. good enough.

------
hajrice
Any stats on the limitations?

I'm wondering how efficient this would be, given that indexing a lot of data
via JavaScript might not be a good idea.

~~~
piranha
Maybe you can create the index on the server and then just load it on the
client?

edit: looking at the docs, it's unclear if it's possible. I guess the index
should be a JS object, so it's pretty simple to save it to disk and then fetch
it from the client.

~~~
olivernn
Since the library can be run outside of the browser (using node.js for
example) the index could be generated server side, and then just passed to the
client. I hadn't considered this before but it might be worth looking at.
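
A minimal sketch of the build-on-the-server idea, assuming a hand-rolled index rather than lunr's own data structures: build an inverted index in Node.js, serialise it to JSON, and rebuild it on the client without re-tokenising the documents. The JSON layout (nested `[key, value]` entry arrays) and the helper names are hypothetical.

```javascript
// Hypothetical sketch, not lunr's serialisation format: build an inverted index
// once (e.g. with Node.js at build time), ship it as JSON, rebuild it on the client.

function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Map<term, Map<docId, termFrequency>>
function buildIndex(docs) {
  const index = new Map();
  for (const { id, body } of docs) {
    for (const term of tokenize(body)) {
      if (!index.has(term)) index.set(term, new Map());
      const postings = index.get(term);
      postings.set(id, (postings.get(id) || 0) + 1);
    }
  }
  return index;
}

// JSON.stringify turns a Map into "{}", so store nested [key, value] entry arrays.
function serialize(index) {
  return JSON.stringify([...index].map(([term, postings]) => [term, [...postings]]));
}

// Rebuild the Maps from the entry arrays; no documents are re-tokenised.
function deserialize(json) {
  return new Map(JSON.parse(json).map(([term, postings]) => [term, new Map(postings)]));
}

// Sorted ids of the documents containing an exact term.
function lookup(index, term) {
  const postings = index.get(term.toLowerCase());
  return postings ? [...postings.keys()].sort() : [];
}

// "Server": build and serialise; in practice this string would be written to a
// static file that the page fetches.
const json = serialize(buildIndex([
  { id: "intro", body: "Lunr runs in the browser" },
  { id: "server", body: "Build the index in Node" },
]));

// "Client": parse the fetched JSON back into a usable index.
const restored = deserialize(json);
console.log(lookup(restored, "index")); // server
```

Entry arrays keep the format trivial to round-trip; the trade-off is that the client still pays for `JSON.parse` on the whole index, which matters more as the document count grows.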

------
pudo
Depending on the performance of this, it might be awesome to have some
serialization format (i.e. inverted, normalized, tokenized JSON).

------
BaconJuice
How do you index your pages? Is that a manual process of creating a JSON file
to be read?

------
napoleond
This is amazing, and perfect timing for me. Thanks!!

