
Memex: Browser Extension to full-text search your browsing history and bookmarks - kick
https://github.com/WorldBrain/Memex
======
dyeje
This looks cool. It reminds me that someone did a lightning talk at RailsConf
2018 about a Memex app they built and it was pretty mindblowing to me at the
time:

[https://youtu.be/Ld414EypQzg?t=2709](https://youtu.be/Ld414EypQzg?t=2709)

------
h2odragon
Tried using this for months; it never worked well. Settings changed back to
"intrusive & useless" every update (seemed like) and the few times i needed it
-- it hadn't been indexing anything for the past week, sorry.

Hope it's working better now

~~~
BlackForestBoy
Oliver here, from the Memex team.

Yeah, we know it has not been functioning well over the past few months. It
had a lot of bugs.

We finally got a bigger round of funding and are now able to fix those. Until
February our main focus is improving release stability through full
integration testing of the UI and backend, fixing all major bugs, improving
the UI/UX, and releasing the mobile apps and sync.

Too bad this post has been upvoted so much now; a few weeks later, things
would already have been much, much better :)

Regarding the bugs you mention, I would like to know more, if you could help
me out here:

1\. Which browser and system are you working on?

2\. When you say the settings changed back to "intrusive & useless", what do
you mean by that?

3\. When you say "it never worked well", what were you expecting to work that
didn't?

Thanks, and sorry for the trouble Memex has caused.

~~~
h2odragon
Sorry to give you a bad review, and i look forward to seeing the version where
everything works and it just magically does what you want.

My browsers would be firefox on windows. To expand on "intrusive and useless":
I found myself turning off the "respond when i type" setting several times.
the few times i did search for something, it hadn't been "seen" because the
indexing was off again, or terms just were not found.

~~~
BlackForestBoy
> Sorry to give you a bad review

Please don't feel sorry. :) If something does not work, then we deserve to
have those things said, also publicly. After all, it's what helps us improve
and keeps us accountable.

> "respond when i type" setting several times.

I am confused about that setting. Not aware we have that anywhere? Can you
explain a bit more what you mean by that?

> the indexing was off again

That could be. We had some issues with FF permissions for copying things to
the clipboard. We had to manually add that permission to the install process
(only for FF), and I missed doing that a few times (sorry about that). We are
about to automate that process in the next releases so that it does not happen
anymore. I assume what happened is that the extension was turned off because,
on update, it required some of the permissions to be granted again.

~~~
h2odragon
it has been a month or more since i turned it off, so i can't be more
specific; but it was popping the sidebar up in circumstances where i didn't
want it, and i recall chasing down a setting that stopped it.

Your explanation of permissions sounds like a very likely cause for most of
the things that annoyed me :) Text search is hard, I know that from mere
experiments and demos; adding the "be a browser plugin" to that must make life
fun indeed.

~~~
BlackForestBoy
> but it was popping the sidebar up in circumstances where i didn't want it

Ok, so the core issue was that the sidebar was popping up in places you didn't
want it to? Or was it also that the keyboard shortcuts were triggering the
sidebar in places you didn't want?

> Text search is hard, I know that from mere experiments and demos; adding the
> "be a browser plugin" to that must make life fun indeed.

Yes, it was/is quite painful, especially in the browser. Changing APIs, and
websites that constantly change and use different technologies, make browsers
a very hostile environment to work with. We can't even do test releases with
the Chrome/Firefox stores, so we have to roll out all updates to all users at
once. Crazy.

Before, we focused on building something to show how our vision could work, at
the expense of stability. In a lot of ways it was not a good move, because we
probably could have been financially independent with somewhat fewer features
that were more stable and better worked out. However, our approach also
allowed us to show the vision and get bigger amounts of funding. Now is the
time to make the current feature set nice to use, so that people feel it is
worth supporting financially.

------
BenGosub
I have been looking for a cross-browser solution that will allow me to
bookmark, tag, highlight and group URLs; I have even been contemplating
writing my own solution.

I will give this a spin, on paper it looks like what I need.

~~~
njovin
I fondly remember del.icio.us which was an amazingly simple tag-based
bookmarking tool.

I believe it shut down shortly after being acquired by Yahoo and appears inop
today. There may be a decent alternative out there but my system for
bookmarking things in that way died with their service, sadly.

~~~
BlackForestBoy
Oliver here from the Memex team.

Glad you brought that up :) Once we have finished the current work on making
Memex more stable and user friendly, we will be working on shareable
collections. With those you can share lists of websites, papers, notes and
annotations with your peers, or co-curate those collections together.

~~~
ddorian43
Any connection to DARPA Memex?

~~~
BlackForestBoy
No that is an entirely different beast :)

Both projects took the inspiration for the name from the same thing though:
[https://en.wikipedia.org/wiki/Memex](https://en.wikipedia.org/wiki/Memex)

------
stelonix
I've been wanting this for Firefox since Opera exploded. Nice!

It also amazes me this kind of "store everything [locally] so life is easier"
mindset isn't more common in today's world of extensions. It's one of the few
things that can make the computer work _for you_ instead of you working in a
way _the computer_ understands.

~~~
steveeq1
opera exploded?

~~~
stelonix
By "exploded" I meant when Opera ditched Presto in favor of Blink, becoming
basically a Chromium shell.

------
journalctl
Analytics should be opt-in, not opt-out.

~~~
ComodoHacker
Opt-in analytics generally don't work.

~~~
TeMPOraL
Asking people to do things without a threat or weapon to coerce them with
often doesn't work either.

Just because something doesn't work well when done right, doesn't mean it
should be done wrong. There are ways to gather data necessary to inform
product decisions without surveillance, or turning your users into non-
consenting test subjects.

(EDIT: Nevertheless, I'm very happy about the values they describe and that
they chose to store the data locally. It makes me trust the authors that much
more, and conversely, if it wasn't local, I wouldn't even consider using it.)

------
disqard
This is definitely a step in the right direction! Ideally, I’d like to have
something that I can set up on my phone and desktop(s), so I can have an
aggregate search over all the things I visited, and on whatever device. Is
there a way (architecturally) to make that happen, without having to rely on
(say) using one browser on every device?

~~~
kick
Memex is a WebExtension, so it works on all browsers that support that API,
which is most of them.

But there are many ways you can do that architecturally.

Here's one off the top of my head that might work:

1\. Set up a DNS server on a home server.

2\. Run a cron job to automatically wget every URL checked by the DNS server.

3\. Set all of your devices to your DNS server.

4\. Set up a local installation of one of the thousands of open source search
engines and set it to go through all of your pages once an hour.

5\. Enjoy.

~~~
pythux
Doesn’t a DNS server only see the hostname rather than the full URL? The
approach would then only allow scraping home pages, I guess. Also, what about
pages behind a login?
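
To make the hostname point concrete, here is a small Python sketch (the
function name is made up for illustration). Of a full URL, a DNS lookup only
ever carries the hostname, so the path and query string never reach the DNS
server:

```python
from urllib.parse import urlsplit


def dns_visible_part(url):
    """Return the only piece of a URL that a DNS lookup carries: the hostname."""
    return urlsplit(url).hostname


# The path (/wiki/Memex) is not part of the lookup.
print(dns_visible_part("https://en.wikipedia.org/wiki/Memex"))
```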

~~~
kick
Correct for the first two, I hadn't thought about that. Could be done by
uploading history once every hour but that's annoying. Pages behind a login
aren't ideal for this under any approach because they tend to be dynamic.

------
cr0sh
I haven't looked over the whole site, but I find it curious that this wasn't
mentioned here (or possibly there):

[https://en.wikipedia.org/wiki/As_We_May_Think](https://en.wikipedia.org/wiki/As_We_May_Think)

Because it seems this is yet another iteration of, or attempt at, the device
Bush described?

Really, hypertext, HTML, the web, browsers, etc - are all a part of such a
system, but what is needed to really complete the loop - so to speak - is for
everything to be P2P - so anyone can easily create and share information
(hypertext "documents" - whatever a "document" may be) and share contextual
annotations (highlights and notes) on those pages, with maybe some manner to
incorporate them into the main "text" as needed when they become too
cumbersome outside the main text, along with easy P2P indexing and searching
of the collective corpus of "works".

But this is completely at odds with how the current "web ecosystem" works, and
certainly doesn't appear to be how this particular incarnation works (this
seems to be a siloed system, if anything). Today's system takes control away
from the users for the most part, unless they pay extra (ie - to create and
maintain a server or whatnot for their own personal content - or not pay, but
pay by giving away other information that the company giving them "free
hosting" or whatnot can use). A complete P2P system would upend that model,
assuming it could be made to work well (the asymmetry of the broadband
infrastructure for consumers doesn't help, either).

I know there are more than a few P2P competitors out there that do much of
what I am speaking of, but I don't think any of them do the complete "Bush
Memex" system. Then again, maybe that's a good thing - otherwise we might end
up with an unholy combo of wikipedia coupled with 4/8chan with a dab of reddit
or something like that...?

~~~
dredmorbius
Memex was Bush's term. The project's name is an obvious nod, directly
referenced in the README.

------
gpderetta
Random factoid of the day: a fictional memex machine is featured prominently
in cstross's Laundry Files.

------
haddr
I have been looking forward to something like this for at least 15 years.
Today's minds are shaped by what we read online, and this is our source of
knowledge on many topics (apart from books and other media), so why not
organize it? But then I noticed that this would add another source of data
to be managed, evolved, etc., and today I'm more inclined to think that it's
about being organized as an individual rather than simply relying on just
another tool for that.

Anyway I will give it a spin! Looks like a good piece of work.

~~~
jplayer01
I'd say the biggest problem with something like this is that it's a silo.
You're suddenly 100% reliant on them providing the right tools and functions
to access your data in the way you want/need. And if your needs/preferences
change, then you're entirely reliant on whether the silo has foreseen the new
use case.

I already have an existing knowledge base (that not only consists of webpages,
but also org files, videos, pdf's, etc.) that's accessible and synchronized
across multiple devices - and while having a complete searchable history of
all my browsing would be fantastic, there's no way to integrate it into my
system (or any other system) with my own tools.

~~~
BlackForestBoy
Oliver here from the Memex team.

Yes! That is indeed a major problem with all knowledge management tools. They
tend to not be interoperable enough so you can easily integrate them into your
existing workflows. Also they are not built to be adaptive to the individual
workflows of people, so you have to wait for the dev's priorities to be high
enough to implement your features. You can't do it yourself.

We are also not there yet when it comes to the level of interoperability or
flexibility needed. However we started from a fundamentally different angle by
changing our economic model and not taking venture capital money:
[https://community.worldbrain.io/t/why-worldbrain-io-does-not...](https://community.worldbrain.io/t/why-worldbrain-io-does-not-take-venture-capital/75)

In essence what we want to achieve is that you can copy/fork Memex, adapt it
to your needs and still use your old data and social connections. Once that
transition is complete you'll be able to even use 2 different Memex tools at
the same time, both maybe serving different use cases for you.

May I ask what tools you use and how Memex in your ideal world would integrate
them? What is the workflow you'd like to implement?

~~~
jplayer01
After experimenting with everything under the sun from Evernote to OneNote to
TiddlyWiki and everything in between, I’ve settled on plain and simple files in
a deep folder structure. The whole folder (now 80GB in size) is permanently
kept in sync with SyncThing across my Android, laptop and desktop.

Using normal files allows me to store anything I need, whether it’s webpages
as html files saved with SingleFile (FF extension), videos downloaded from
YouTube, notes made with emacs orgmode, podcast MP3’s, eBook PDF's, etc.

Folders are deeply nested according to field/topic, and I have a git repo that
ignores all non-org and non-html files. This lets me use ripgrep or emacs Helm
to immediately search text for whatever I’m looking for. z allows me to
traverse the tree without double-clicking through a deep tree of directories
or cd'ing and typing crazy amounts.
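
For anyone without ripgrep at hand, the search part can be approximated in a
few lines of Python. This is only a sketch of the idea, not my actual setup:
the function name and the suffix whitelist are assumptions chosen to mirror
the "org and html only" rule above.

```python
from pathlib import Path

# Assumption: the same file types the git repo tracks.
TEXT_SUFFIXES = {".org", ".html"}


def search_notes(root, needle):
    """Yield (path, line_number, line) for each case-insensitive match under root."""
    needle = needle.lower()
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything that is not a tracked text format.
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    yield path, number, line.rstrip("\n")
```

Calling `list(search_notes("notes", "memex"))` would return every matching
line in the `.org` and `.html` files under a (hypothetical) `notes` folder,
while videos, MP3s and PDFs are skipped.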

So tools can be anything - Firefox with extensions, ripgrep, emacs, vim, git,
z, or even whatever Python script I write to fill a unique use case that I
discover for something that feels tedious. Normal files mean that if I find a
cool program that does something useful with files, I can easily integrate
that. I'm also working on ways to give me easier/quicker access to the
metadata like the most recent files of a subtopic, or even add my own metadata
like ratings and tags.

Ideally, something like Memex would provide some sort of API that I could
query automatically for all the browsing history and text data, so I could
potentially add it to my knowledge base in some way. Or maybe Memex could sync
automatically to a DB file or some other well-documented file(s) that I could
easily access, parse and sync.

~~~
kevinslin
reading your comment, I feel a sense of déjà vu because it's almost the same as
the response I would have written. I've gone through a similar path of trying
every knowledge base under the sun and settling on a text-based, git-managed
custom knowledge base with plain text files (currently managing 10k+ notes
this way).

currently building a service that can index and query across text-based
knowledge bases. you can find a demo here:
[http://demo.alphacortex.io](http://demo.alphacortex.io)

would love to hear your thoughts and talk further about organizing knowledge
:)

------
nmstoker
Looks like there's huge potential value with a tool like this.

Does it have a way to handle page content that's less relevant? For instance
plenty of genuinely useful articles have footers full of clickbait ads &
comments - whilst trying to avoid such content, sometimes the article you need
is only available with that dross tacked on the end, and yet ideally users
wouldn't want it polluting search results (unless they really were looking for
"that amazing trick that only grandmothers know"!)

~~~
BlackForestBoy
> Does it have a way to handle page content that's less relevant?

It already collects some basic interaction data like visit frequency, stay
time and scroll %. This data could be used to clean the db a bit.

Other than that, there is definitely room to improve at cleaning out terms
that are captured but don't really add value. It's a difficult task though,
because every page is structured so differently.

What ideas do you have to reduce the number of unhelpful terms? A spontaneous
one is to detect the footers of this Outbrain et al. crap and remove them
from the HTML before extracting the words to index.
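
A minimal sketch of that footer idea in Python, using only the standard
library. The tag list and the names `MainTextExtractor` and `index_terms` are
assumptions for illustration, not Memex code. Real ad widgets often sit in
plain `div`s, so this only catches markup that uses semantic tags:

```python
import re
from html.parser import HTMLParser

# Assumption: content inside these tags is rarely worth indexing.
SKIPPED_TAGS = {"footer", "aside", "nav", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collect text nodes, ignoring everything nested inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def index_terms(html):
    """Return the set of lowercase terms found outside the skipped tags."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))
```

Fed a page with an `<article>` followed by a clickbait `<footer>`, this
returns only the article's terms.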

Right now we are focussing on developing a stable service with better UX that
does the current feature set really well. I'll take up your suggestions so we
can think about how to use them in the upcoming overhaul of the search.

Thanks for your input!

------
solarkraft
I have been using it for a few months, maybe a year. Maybe I'm doing it wrong,
but I can't seem to get any benefit out of it.

Text highlights are a pain to use (so I don't, even though I love
highlighting text while reading), the search doesn't find anything despite the
keyword being in the title of the page, and all the context menu and side bar
do is unexpectedly hijack my clicks.

I really hoped for more and still believe that it has the potential to be
useful. I hope one day it will be.

~~~
BlackForestBoy
Oliver here from the Memex team.

That's a bummer; sorry you are experiencing trouble.

> Text highlights are a pain to use (so I don't, even though I love
> highlighting text while reading)

Why are they a pain to use for you? What would be 1-2 things we could improve
that would make it a lot better?

> the search doesn't find anything despite the keyword being in the title of
> the page

It might be that you have some indexing settings changed. Usually a page is
only indexed after you have visited it for at least 5 seconds, but you can
change that setting.

> all the context menu and side bar do is unexpectedly hijack my clicks.

On which pages does this happen to you?

Good news: We finally got a bigger round of funding and are now able to fix
those. Until February our main focus is improving release stability through
full integration testing of the UI and backend, fixing all major bugs,
improving the UI/UX, and releasing the mobile apps and sync.

~~~
solarkraft
Highlighting:

I think it used to be worse, with lots of glitches, but I'm glad those now
seem to be gone. I'm still annoyed by the multi-step process to highlight
something, however. Select something -> Click "annotate", sidebar opens ->
Find the annotation you just created -> Save it without typing in any text
(remember to save it, because there's no auto-saving for some reason!). It
should be done with a single "Click 'highlight'". Since clicking the highlight
still expands it in the side bar, no functionality would be lost by modifying
the annotation feature to just save a highlight directly (if you still don't
know what I would like to see, have a look at Medium's implementation).

Search:

Thanks. After testing it out again half-heartedly, it seems to find things.
However, it's just frustrating when you want to find your way back to an
obscure site you visited weeks ago, Firefox history is comically useless, and
you think "Oh, nice thing I have Memex installed", and then it _doesn't_ find
it, even though I remember that being the promise.

Click jacking:

When I want to open a context menu on some text using my finger I have to long
press on it, but the degraded accuracy can lead to the click landing on the
Memex menu (admittedly this seems hard to fix). Trying to reach far-out page
elements can lead to the side bar obstructing them. Both of these cases are
very annoying.

I don't generally oppose the popup: it would be pretty convenient for quick
highlighting if highlighting was actually quick. But then again, highlighting
is _not_ very quick, and the "share highlight" feature (which is very cool) is
not only of much rarer use, but also extremely confusingly implemented. Why do
you need to be able to share a highlight before it has been created? Why is it
not an option for an actually created highlight? Why does it not show me the
link, but try to copy it to my clipboard, which doesn't even work on my
machine (Manjaro/KDE/X11/Firefox)?

I like that there is a close-button on the popup, but I do find it way too
obtrusive. It's not a feature a normal user (or at least I) will want often.
That it's shown at the same priority of the other items confuses me hard, same
with the options in the side bar. I think I will disable it, since the icon
does all the things I need.

Why are there collections AND tags? Don't they do the exact same thing?

Why is there no option for a denser page listing so I can see more than 6
pages (which would be good while trying to find something, and why else would
one open it)? Why can I only have it sorted chronologically, not by number of
visits? Can I even see that anywhere? How about length of visit (do you record
such things)? Minor UI addition (hey, seems like the place for that): the
opening of the pop-up looks pretty jarring. A more fluid animation would be
nice :-)

These are mostly complaints, but I'm glad you're working on this project.

I'm also really happy that it seems to be continually improving. I wish you
lots of success going forward!

~~~
BlackForestBoy
> These are mostly complaints, but I'm glad you're working on this project.

Wow. Thanks so so so much for taking the time to write this all out. This has
been super useful.

I added all your feedback to our UX/UI/bug prioritisation board. Some of these
things were already on our radar (like improving the way annotating works)
and will be improved very soon.

------
month13
Great idea, but the implementation is a bit intrusive. Not a fan of the
on-by-default sidebar ribbon or the highlight popup menu. Luckily these can be
disabled.

Importing history and indexing that is cool.

~~~
BlackForestBoy
Oliver here, from the Memex team.

Thanks for the feedback. Apart from being able to turn it off, how could we
make the implementation less intrusive for you?

------
rane
Installed the extension and added some pages, but it's clearly not indexing
content of the pages because there are no hits for words that appear on those
pages.

~~~
BlackForestBoy
Oliver here, from the Memex team

By default that setting is disabled, but you can change to full indexing in
the onboarding or in the settings.

~~~
rane
Thanks, I found it.

The label for the setting is unclear.

> [ ] Visited for at least n seconds

Given that the setting above it says "Make title and URL always searchable
(recommended)", I think the setting for this option also should be something
like:

> [ ] Make pages visited for at least n seconds full-text searchable

It's difficult to connect the grey text that says "Which websites do you want
to make full-text searchable?" to the label of the option that controls full-
text searching.

context: [https://i.imgur.com/2p7OyHJ.png](https://i.imgur.com/2p7OyHJ.png)

~~~
BlackForestBoy
Great feedback. Will incorporate it in the upcoming UX/UI updates. Thanks for
the effort to make us aware.

------
NoGravitas
This looks interesting. I wrote something quite similar back around 2000 as a
proxy; today, of course, you'd have to implement it as they have, as a browser
extension, because of https. It was a fun little project, and for the feature
set I included (full text search of your history), it was quite easy to
implement. At the time, I even did it with no dependencies that weren't in the
Python standard library.
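
To give a feel for why the search part is easy, here is a toy inverted index
written with nothing but the standard library. It is a fresh sketch, not the
code from back then; `HistoryIndex` and `tokenize` are names invented for this
example, and the proxy part that captures pages is left out entirely.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class HistoryIndex:
    """Toy inverted index: maps each term to the set of URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, url, text):
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query):
        """Return the URLs that contain every term of the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        matches = [self.postings.get(term, set()) for term in terms]
        return set.intersection(*matches)
```

Every captured page goes through `add_page(url, text)`, and `search("some
words")` returns the URLs whose text contained all of the words.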

------
jimktrains2
I was trying to build something similar based on Firefox sync, but it took
quite a while to get a client working, and I never got back to it. (I found
some bugs in their documentation along the way and helped them fix that, so
that's good?)

Maybe I should pick it up again. My intent was that it'd run on a server and
be your own personal search engine covering activity on all devices.

~~~
NoGravitas
I would be interested in helping you on this, if it's in a language I'm
familiar with. Having it integrate with Firefox sync solves the problem of not
being on any one device enough to make it your full history.

~~~
jimktrains2
It's in python3
[https://github.com/jimktrains/ffsyncsearch](https://github.com/jimktrains/ffsyncsearch)
I just pushed some of my last changes and it'll grab and decrypt all of the
collections stored. Like I said, I hadn't got it to a point where I saved and
indexed everything, but my intent was to start with postgresql's full text
search and go from there. I also wouldn't mind some help cleaning up the code
some as it is more proof-of-concept right now.

~~~
NoGravitas
Thanks. I don't have a ton of time to work on it, but I've cloned it and will
look at it.

------
100011
I've wanted this ever since Firefox 2 or so.

------
lmilcin
Years ago I was using UltraRecall for this purpose. It has an FF plugin that
allows for quickly bookmarking a page, and UltraRecall would then build an FT
index of it. It was also possible to do the same from any other browser, as
the browser support just has to push the link to the standalone application,
but I never felt the need to invest the time.

------
ComodoHacker
The idea is promising. I have some questions though.

1\. How do you handle content updates/corrections? Do you update the index on
subsequent visits?

2\. How do you handle fully dynamic pages like Facebook feed, Reddit, HN etc.?

3\. What is the performance and storage overhead of indexing "every word of
all websites & PDFs you visited"?

4\. What about i18n? Do you have any plans?

~~~
BlackForestBoy
Oliver here from the Memex team.

1\. Yes, the index is updated every time you visit. It appends the new terms
and keeps all the old ones.

2\. That is a bit more difficult and not as reliable unfortunately. If there
is a lazy load on the page it often fails to capture the content, because it
starts indexing the page after the initial page load is finished/successful.
These are improvements we want to work on a bit later.

3\. For about a year's worth of history (~20k pages), without also capturing
the screenshots, it needs about 400 MB of storage. Indexing performance is
still good with 20-25k pages, but querying gets slower. You shouldn't feel the
performance impact on a reasonably fast computer (recommended: 8 GB of RAM and
a dual core with at least 2 GHz). We are about to work on performance
improvements to make it fast and scalable well beyond that amount and with
fewer resources.

4\. Unfortunately we haven't gotten around to optimising the indexing for CJK
characters, but all Latin characters should work fine.
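
A rough sketch of the append-only behaviour from point 1, written in Python
for brevity. This is not Memex's actual code: `update_index` and the plain
dict layout are assumptions made for the example.

```python
import re


def update_index(index, url, page_text):
    """Add the page's current terms for url; terms from earlier visits are kept."""
    known_terms = index.setdefault(url, set())
    known_terms.update(re.findall(r"[a-z0-9]+", page_text.lower()))
    return known_terms


index = {}
# First visit, then a second visit after the page content changed.
update_index(index, "https://example.com/post", "draft about bookmarks")
update_index(index, "https://example.com/post", "final post about highlights")
```

After the second visit the entry still contains "draft" and "bookmarks", so a
page stays findable by words it no longer shows.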

~~~
ComodoHacker
Thank you.

On 2, I mean these (feeds, topic lists etc.) are probably not worth indexing
at all, esp. since you keep all old content in the index.

Is your code all home-built, or you're using some FT engine compiled to Wasm?

~~~
BlackForestBoy
> On 2, I mean these (feeds, topic lists etc.) are probably not worth indexing
> at all, esp. since you keep all old content in the index.

Yeah indeed. A better implementation for now would be to let people save
single posts, or sync with likes, shares, tweets and retweets, so people can
search with those facets.

> Is your code all home-built, or you're using some FT engine compiled to
> Wasm?

For the search we are using dexie.js, the rest is home-built. Our storage
engine is:
[https://github.com/worldbrain/storex](https://github.com/worldbrain/storex)

------
NelsonMinar
I feel like this idea pops up every two years or so; I was definitely running
something like it in the early 2000s. How come none of the previous efforts
succeeded?

Firefox's built in history is so, so close to being useful. My impression is
it's hampered mostly by limited UI.

------
jborichevskiy
Been using this for a week or so and am liking it so far. I’ve used full text
search extensions before but this feels less opaque and more usable. Looking
forward to continued development!

------
oknoorap
Sorry to bother you, in Indonesia, MEMEX means female genital.

~~~
alfanhui
[https://en.m.wikipedia.org/wiki/Memex](https://en.m.wikipedia.org/wiki/Memex)

There's no bother, since it's a different culture and words are bound to mean
something else.

------
playpause
The intro panel on the homepage (worldbrain.io) is dark grey text on a dark
grey background. Is that intentional or a bug?

------
tmikaeld
This is really nice, now I can get rid of HistorySearch which does the same
thing but stores everything on the cloud.

------
techntoke
Doesn't the search bar already have the option to search your history and
bookmarks?

~~~
tmikaeld
This one also seems to index the content of pages.

~~~
techntoke
Ah. May make sense for some people. For indexing, I personally would prefer to
use a bookmark service (self-hosted) that handles this.

~~~
tmikaeld
Oh, there's a self-hosted one? Any link?

~~~
tmikaeld
Nice, too bad the organization part is card-based, would have loved a
drag/drop interface like with Bookmarkninja [1]

[1]
[https://www.bookmarkninja.com/images/dashboard3-650.png](https://www.bookmarkninja.com/images/dashboard3-650.png)

~~~
BlackForestBoy
Oliver here, from the Memex team.

Are you referring to Memex being card based? In which part of Memex would you
like to have that drag&drop ability to organise stuff?

~~~
tmikaeld
For organizing collections, that would be awesome!

~~~
BlackForestBoy
How would you like to organise it? Like in your screenshot? Is it the ordering
inside a collection that is important to you? Would it solve it if you were
able to order items inside a collection?

------
foobar_
How does this implement full-text search?

------
sneak
I have wanted this for years! Thank you!

------
olliej
Safari has this built in; I’m surprised Chrome and Firefox don’t.

~~~
nloa
What do you mean by "built in"?

