
Show HN: WorldBrain – full text, local search of your browsing history - AJRF
https://worldbrain.io/#
======
diggan
One of the reasons I keep coming back to Firefox is that Firefox seems to
already be doing this. At least it searches all parts of the URL and the
title, so I'm 99% successful at getting the right URL in the awesomebar when
I'm looking for something.

Chrome is just horrible when it comes to this, and I can never get back to
previous pages when searching via the addressbar.

~~~
albertgoeswoof
Chrome does this deliberately because they want you to do another google
search instead of looking at your history.

~~~
jklehm
Citation?

~~~
boramalper
Well just try it yourself, it’s really that simple.

On the other hand, Chromium (the open-source version of Chrome) does this too
so maybe it’s simply a UX decision rather than an evil plot to send more data
to Google, though I personally believe that it’s the latter…

------
agotterer
I’ve tried to find a solution to bookmark searching for a while. I’ve never
found a product I liked or trusted. Lately I’ve been manually adding bookmarks
to a custom google search engine. I’m considering building an extension that
will add them directly or sync chrome bookmarks. I figure google already knows
what I’ve searched, so I feel much less sketchy about it.

Would this extension be interesting to anyone? It would be very simple, open
source, and have no middle man. It would send links directly to a google CSE
via their API.

~~~
JohnStrangeII
I would have privacy concerns, because I'm not using Google all the time for
my searches and there is no need to give them even more data.

The only solution I'd accept is one where all data is stored and indexed
locally and there are good guarantees that deleted entries are actually
deleted and/or wiped from the indices.

~~~
agotterer
I’m not suggesting that you have to use google for the originating search. But
you would need to use CSE to index any content you wanted to search for later.
Arguably you could do this with a new google account if you’re concerned.

~~~
ChristianBundy
A few years ago I would have been excited about this, but I _personally_ won't
be giving Google any more data unless I'm forced to. I'd love a self-hosted
and local-first option, but either way I wish you good luck on your potential
project.

------
Pistos2
I get the point of this, but personally I prefer to have aspects of me
forgotten/gone rather than remembered, stored and searchable in the future.
Yes, it's true that, about once every two months, I am looking for something
that I swear I came across on the Internet at some point. However, the rest of
the time, I'm able to re-find it just by doing another search, whether on
search-engine-of-choice, or a search box on particular-website (e.g. socnet,
stackoverflow, reddit, github, hacker news...).

------
AJRF
Extension source code ->
[https://github.com/worldbrain/Memex](https://github.com/worldbrain/Memex)

------
mickelsen
To think that the classic Opera browser used to do this within the search
bar... man, good times those.

------
DyslexicAtheist
this project has been around for a while, see also some interesting comments:
[https://news.ycombinator.com/item?id=13427360](https://news.ycombinator.com/item?id=13427360)

------
xtiansimon
Looks like I can mark my Softwarerecs question as answered. hahah.

[https://softwarerecs.stackexchange.com/q/46270/16751](https://softwarerecs.stackexchange.com/q/46270/16751)

While I've been waiting for this, I've been using 'Export History (2.2)'
browser extension in Chrome. This saves your history to CSV or JSON.
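
A minimal sketch of searching such an export, assuming the JSON is a list of objects with `title` and `url` keys (the extension's real field names are not given here, so these are guesses, and `search_history` is a hypothetical helper):

```python
import json


def search_history(entries, query):
    """Return entries whose title or URL contains every word of the query."""
    words = query.lower().split()
    matches = []
    for entry in entries:
        # Assumed field names: "title" and "url".
        haystack = (entry.get("title", "") + " " + entry.get("url", "")).lower()
        if all(word in haystack for word in words):
            matches.append(entry)
    return matches


# Stand-in for the contents of an exported JSON file (structure is assumed).
exported = json.loads("""
[
  {"title": "Memex on GitHub", "url": "https://github.com/worldbrain/Memex"},
  {"title": "Hacker News", "url": "https://news.ycombinator.com/"}
]
""")

for entry in search_history(exported, "github memex"):
    print(entry["url"])
```

This only matches titles and URLs, not page text, which is the gap a full-text tool is meant to close.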

~~~
xtiansimon
OH! Not so fast. After a test drive, I think the Results page needs to have a
button to use the same search on the web if no (useful) results are found.

------
eloff
Nice! I hacked something together to do this 12-odd years ago. I had something
to save the pages I visited to a folder, and then a desktop search tool, dtSearch,
that I bought to index them. It was invaluable when I really needed it, but
too awkward to be really useful. Often it ended up being easier to find the
page again with Google if I remembered something unique about it.

This extension looks like it could finally make it convenient enough to be
more commonly useful, and privacy focused enough that I'm willing to try it.
Great work! Are you planning to charge something for this in the future, or
for extra features? I would definitely be willing to pay for it.

~~~
RealWorldBrain
Hey eloff,

Oli here, from the team developing Memex. Memex is open-source, so the browser
extension will always stay free to use.

What we will charge for are some of the services that require us to host
stuff, like backups, multi-device syncing, API calls etc. We will run it as a
completely modular pricing model, where you can upgrade only those features
you need. We don't like the usual 3-tier model, where you have to upgrade to
the 'monster mega plan' in order to just get one feature :) But you can also
completely self-host that, as we will make the server software open-source as
well.

Hope Memex can be useful to you. We are running a crowdfund to support its
development, where we offer some good discounts on the future features in
return: worldbrain.io/pricing

Let me know if I can be of more help.

------
kovek
Seems like the Falcon extension has not gotten any updates in the last year or
two:

[https://github.com/lengstrom/falcon](https://github.com/lengstrom/falcon)

------
jaytaylor
The landing page emphasizes the word "focussed", with double S's, which didn't
look right to me. Apparently both "focused" and "focussed" are acceptable,
with the single S version "focused" being highly preferred [0].

[0] [http://www.future-perfect.co.uk/grammar-tip/is-it-focussed-o...](http://www.future-perfect.co.uk/grammar-tip/is-it-focussed-or-focused/)

~~~
Improvotter
TIL, was about to say the same thing.

------
mark_l_watson
A nice idea, but an open source browser plugin with all local storage would
fit my needs better.

I prototyped something roughly like this several years ago. I wrote a simple
Firefox plugin that communicated with a locally running server written in
Clojure with a ClojureScript web app for browsing that used the same server
backend. I stopped working on it because services like Evernote do a better
job, at the loss of some privacy.

Edit: I didn’t intend to imply that Evernote reads or uses user data.

~~~
olejorgenb
[https://github.com/hedning/recoll-web](https://github.com/hedning/recoll-web)

~~~
thatcat
The description is horrible on their github page, do you know if this
integrates your browser history into recoll or what exactly?

------
jarsta
It would be great to be able to change the search engine keyword from 'w' to
something else. My brain has hardwired that to the Wikipedia search.

Otherwise, I think this is super!

------
JetSpiegel
This seems a very interesting project. I'll see how this works in practice.

It starts to get icky when you notice the devs have a business plan to
outsource the indexing to the cloud, but there's a commitment to keep the
server parts open source too, so that you can self-host.

~~~
RealWorldBrain
Oli here from the team building Memex.

Yeah indeed, having stuff in the cloud is not ideal when it comes to privacy
and centralisation.

One of our core values is privacy and data ownership. So we do our best to
make our business not dependent on (analysing and selling) your data, and
instead provide you with service value you're willing to pay for. We are built
with interoperability in mind, which will allow you to switch providers of
Memex and Memex Cloud without friction, in case there are breaches of trust,
or simply better service.

We follow these values by currently building for offline first usage, where
your data is locally indexed and searchable primarily. With our search
technology you'll be able to get up to 5 years of your research done in the
browser.
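
As a rough illustration of what "locally indexed and searchable" can mean, here is a minimal in-memory inverted index. It is a sketch only, not Memex's actual implementation; `LocalIndex`, `tokenize` and the example URLs are hypothetical names chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class LocalIndex:
    """A tiny inverted index kept in memory: token -> set of page URLs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, url, text):
        """Index every token of a visited page under its URL."""
        for token in tokenize(text):
            self.postings[token].add(url)

    def search(self, query):
        """Return the URLs of pages that contain every token in the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


index = LocalIndex()
index.add_page("https://example.com/a", "Full text search of your browsing history")
index.add_page("https://example.com/b", "Search engines and privacy")
print(index.search("search history"))  # {'https://example.com/a'}
```

Nothing leaves the machine in this model; a real extension would also need persistent storage and ranking, which this sketch leaves out.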

For the cloud part, it is unfortunately not yet possible to do performant
search on encrypted data, otherwise it would not be such a big issue to have
your index in the cloud. Equally unfortunate is that replicating all your data
on all nodes, as opposed to having a central point to query, comes with a lot
of drawbacks, especially when we are looking at phone usage. There
it is really not practical, so there is a need to have some sort of cloud - UX
is still very important. Most people can't be bothered with the drawbacks of
decentralised and distributed systems (yet). We hope to get that switch in
multiple smaller steps that guide (non-technical) users through a smooth
transition to a Memex system that is as distributed as possible. (Check out
Dat [https://datproject.org/](https://datproject.org/), a technology we will
likely use to make that first step possible.)

And as you already noted, this stuff will be self-hostable. We see ourselves
as a service provider first and want to serve people who can't/don't want to
run their own server. A bit like the Wordpress model.

You can read more about our approaches to running this business in our vision
post: worldbrain.io/vision

~~~
JetSpiegel
Thanks for the long response.

I was wondering about mobile usage too, and agree that having a server
somewhere available for queries is the best solution. Since this deals with
such private data, having a possibility for self-hosting is the correct
solution.

------
fsiefken
Wouldn't it be a better alternative to use an autosave plugin for the
bookmarks and use a local indexer like DocFetcher or OpenSemanticSearch
through the browser? Then you can also search other resources and files.

------
stewbrew
"Search every word of every website you visited"

That will most likely return a lot of hits for sites I didn't like and would
like to forget about.

~~~
CGamesPlay
But a normal web search already has this trait, which is presumably how you
ended up on the garbage site in the first place. If you eventually found your
result, a more narrow search can help you re-find it a few months down the
road.

~~~
RealWorldBrain
Oli here, from the team developing Memex.

Yeah indeed, full-text search alone can leave you with a lot of garbage.
This is why it is so important that you can search for various other "vague
memories" to narrow down your search.

What you often remember about an article is stuff like: Did I bookmark it,
when did I visit it, did I like/share/cite it on social media? You can already
filter by time, tags, domains, bookmarks, and soon also by whether you
liked/shared it or even saw it in your newsfeed, or on a friend's wall, on
Twitter and Facebook.
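
Combining a text match with filters like these can be sketched as follows. This is an illustration under assumptions, not how Memex is built: the `Visit` record, its field names and `search_visits` are all hypothetical, and the text match is a naive substring check.

```python
from dataclasses import dataclass, field
from datetime import datetime
from urllib.parse import urlparse


@dataclass
class Visit:
    """One visited page; all field names here are hypothetical."""
    url: str
    text: str
    visited_at: datetime
    tags: set = field(default_factory=set)
    bookmarked: bool = False


def search_visits(visits, query, since=None, domain=None, tag=None, bookmarked=None):
    """Full-text match narrowed by optional metadata filters."""
    words = query.lower().split()
    hits = []
    for visit in visits:
        if since is not None and visit.visited_at < since:
            continue  # visited before the requested time window
        if domain is not None and urlparse(visit.url).hostname != domain:
            continue  # different site
        if tag is not None and tag not in visit.tags:
            continue  # not tagged as requested
        if bookmarked is not None and visit.bookmarked != bookmarked:
            continue  # bookmark state does not match
        body = visit.text.lower()
        if all(word in body for word in words):  # naive substring match
            hits.append(visit)
    return hits


visits = [
    Visit("https://example.com/post", "Notes on local search", datetime(2018, 5, 1),
          tags={"search"}, bookmarked=True),
    Visit("https://example.org/spam", "Search results you would rather forget",
          datetime(2017, 1, 1)),
]

for visit in search_visits(visits, "search", since=datetime(2018, 1, 1), bookmarked=True):
    print(visit.url)
```

Both sample pages contain the word "search", but only the bookmarked 2018 visit survives the filters, which is the narrowing effect described above.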

We are gradually expanding it so you can search with as much of your associative
memories as possible.

------
samat
I have been imagining this for a long time. I wonder how much more space/energy
Chrome uses with this extension enabled.

Any research on that?

~~~
samat
[https://worldbrain.io/#faq](https://worldbrain.io/#faq)

Info on RAM/CPU/Storage, but no info on energy pressure.

------
NeedMoreTea
Sounds like the very first version of Google Desktop Search before they turned
it into a widget engine.

------
warent
Nice, now you just need to enable a feature to search anyone else's browsing
history

~~~
RealWorldBrain
Oli here from the team building Memex :)

Good news: that's the plan! worldbrain.io/vision_deck

------
agussell
Sadly there is a lot to improve to be useful. I had horrible results with the
searches.

------
iovrthoughtthis
I've been building my own for a while. Will definitely give this a go!

------
pranshuchittora
for researching on a topic and bundling them.

------
0xkeff
How do you solve the problem of bans / "not a robot" captchas after too many
requests to a site while indexing?

