Memex: Browser Extension to full-text search your browsing history and bookmarks (github.com)
288 points by kick 46 days ago | 95 comments



This looks cool. It reminds me that someone did a lightning talk at RailsConf 2018 about a Memex app they built and it was pretty mindblowing to me at the time:

https://youtu.be/Ld414EypQzg?t=2709


Tried using this for months; it never worked well. Settings seemed to change back to "intrusive & useless" with every update, and the few times I needed it, it hadn't been indexing anything for the past week, sorry.

Hope it's working better now


Oliver here, from the Memex team.

Yeah, we know it has not been functioning well over the past few months. It had a lot of bugs.

We finally got a bigger round of funding and are now able to fix those. Until February our main focus is improving release stability through full integration testing of the UI and backend, fixing all major bugs, improving the UI/UX, and releasing the mobile apps and sync.

Too bad this post has been upvoted so much now; a few weeks later things would already have been much, much better :)

Regarding the bugs you mention, I would like to know more, if you could help me out here: 1. Which browser and system are you working on? 2. When you say the settings changed back to "intrusive & useless", what do you mean by that? 3. When you say "it never worked well", what were you expecting to work that didn't?

Thanks, and sorry for the trouble Memex has caused.


Sorry to give you a bad review, and I look forward to seeing the version where everything works and it just magically does what you want.

My browser would be Firefox on Windows. To expand on "intrusive and useless": I found myself turning off the "respond when i type" setting several times. The few times I did search for something, it hadn't been "seen" because the indexing was off again, or the terms just were not found.


> Sorry to give you a bad review

Please don't feel sorry. :) If something does not work then we deserve those things to be said, also publicly. It's what after all helps us to improve and keeps us accountable.

> "respond when i type" setting several times.

I am confused about that setting. Not aware we have that anywhere? Can you explain a bit more what you mean by that?

> the indexing was off again

That may well be; we had some issues with FF permissions for copying things to the clipboard. We had to manually add that permission to the install process (only for FF), and I missed doing that a few times (sorry about that). We are about to automate that process in the next releases so that it does not happen anymore. I assume what happened is that the extension was turned off because, on update, it required some of the permissions to be granted again.


It has been a month or more since I turned it off, so I can't be more specific, but it was popping the sidebar up in circumstances where I didn't want it, and I recall chasing down a setting that stopped it.

Your explanation of permissions sounds like a very likely cause for most of the things that annoyed me :) Text search is hard, I know that from mere experiments and demos; adding the "be a browser plugin" to that must make life fun indeed.


> but it was popping the sidebar up in circumstances where I didn't want it

Ok so the core issue was that the sidebar was popping up in places you didn't want to? Or was it also that the keyboard shortcuts were firing the sidebar to appear in places you didn't want?

> Text search is hard, I know that from mere experiments and demos; adding the "be a browser plugin" to that must make life fun indeed.

Yes, it was/is quite painful, especially in the browser. Changing browser APIs, plus websites that constantly change and use different technologies, make browsers a very hostile environment to work with. We can't even do test releases with the Chrome/Firefox stores, so we have to roll out all updates to all users at once. Crazy.

Before, we focused on building something to show how our vision could work, at the expense of stability. In a lot of ways it was not a good move, because we probably could have been financially independent with fewer features that were more stable and better worked out. However, our approach also allowed us to show the vision and get bigger amounts of funding. Now is the time to make the current feature set nice to use, so that people feel it is worth supporting financially.


Indeed, it was awfully buggy a couple of months ago when I gave it a try. I hope to try it again in a couple of months. It would be a very useful tool.


Same experience here. Ended up settling on app.getpolarized.io which seems to be less "intrusive."


I have been looking for a cross-browser solution that will allow me to bookmark, tag, highlight and group URLs. I have even been contemplating writing my own solution.

I will give this a spin, on paper it looks like what I need.


Oliver here, from the Memex team.

Memex is not able to do cross-browser/device yet, but we are close to releasing our mobile apps for iOS and Android, as well as syncing between devices.

Enjoy :)


Hi, thanks for answering questions. Does the Firefox extension work on mobile Android? Thanks.


The Firefox extension is not officially supported there, because it is still very spotty to use. It's missing APIs, and for now it was not worth the effort to support it with a limited feature set.

The Memex app will be a dedicated app that in its MVP works just like Pocket, except that you can also add notes, favorites and lists when you save content.

In the MVP+ you can also search your knowledge on the go and do annotations on the go.


I fondly remember del.icio.us which was an amazingly simple tag-based bookmarking tool.

I believe it shut down shortly after being acquired by Yahoo and appears inop today. There may be a decent alternative out there but my system for bookmarking things in that way died with their service, sadly.


I self-host Shaarli[1] with the 2004licious theme[2], as an emulation of early del.icio.us. It works really well - no database (it saves a single .php file), minimal interface, and it has an Android app. It also has tag and note support. I only wish it supported plain notes; you have to have a URL for each entry.

[1] https://github.com/sebsauvage/Shaarli

[2] https://github.com/Ruin0x11/shaarli-2004licious-theme


Oliver here from the Memex team.

Glad you brought that up :) After we are finished with the current work on making Memex more stable and user friendly, we will be working on shareable collections. With those you can share lists of websites, papers, notes and annotations with your peers, or co-curate those collections together.


Fantastic, I gave Memex a try and I really like it, well done! I am also very much looking forward to the upcoming features; I think such a bookmarking system is much needed.


Any connection to DARPA Memex?


No that is an entirely different beast :)

Both projects took the inspiration for the name from the same thing though: https://en.wikipedia.org/wiki/Memex


Delicious was purchased by pinboard.in; their system is very similar.


I've been wanting this for Firefox since Opera exploded. Nice!

It also amazes me this kind of "store everything [locally] so life is easier" mindset isn't more common in today's world of extensions. It's one of the few things that can make the computer work for you instead of you working in a way the computer understands.


opera exploded?


By "exploded" I meant when Opera ditched Presto in favor of Blink, becoming basically a Chromium shell.


Analytics should be opt-in, not opt-out.


Opt-in analytics generally don't work.


Asking people to do things without a threat or weapon to coerce them with often doesn't work either.

Just because something doesn't work well when done right, doesn't mean it should be done wrong. There are ways to gather data necessary to inform product decisions without surveillance, or turning your users into non-consenting test subjects.

(EDIT: Nevertheless, I'm very happy about the values they describe and that they chose to store the data locally. It makes me trust the authors that much more, and conversely, if it wasn't local, I wouldn't even consider using it.)


Consider forking it.


Oliver here, from the Memex team

At the moment analytics are turned off, but in general we only do telemetry-style analytics. So we only track the way you interact with the software, like the buttons you click, to optimise user flows. We don't ever send anything that is user-generated content, like tags, annotations, visited URLs or anything that can be considered your personal knowledge.

You can see more here: http://worldbrain.io/privacy


What is your business model then?


Oliver here from the Memex team.

Yes indeed we will be a regular SaaS business, but with a strong focus on privacy, data ownership and interoperability. More can be found here: https://community.worldbrain.io/t/why-worldbrain-io-does-not...

Initially our services like Sync and Backup will only be available for the premium upgrade, but our infrastructure is already built in a way that allows making it self-hostable too. It's just a matter of resources we have available. We very much value your data sovereignty but we first need to make enough money to be able to continue development and have enough time and money to make it self-hostable. It's all a bit more difficult without raising VC capital (see link above).


Even the most basic of cursory views of their homepage would lead you to https://worldbrain.io/pricing/.

They charge for synchronization and backups, with no doubt more to come.


Fair enough, you're right.


This is definitely a step in the right direction! Ideally, I’d like to have something that I can set up on my phone and desktop(s), so I can have an aggregate search over all the things I visited, and on whatever device. Is there a way (architecturally) to make that happen, without having to rely on (say) using one browser on every device?


Memex is a WebExtension, so it works on all browsers that support that API, which is most of them.

But there are many ways you can do that architecturally.

Here's one off the top of my head that might work:

1. Set up a DNS server on a home server.

2. Run a cron job to automatically wget every URL checked by the DNS server.

3. Set all of your devices to your DNS server.

4. Set up a local installation of one of the thousands of open source search engines and set it to go through all of your pages once an hour.

5. Enjoy.


Doesn’t a DNS server only see the hostname instead of the URL? I guess the approach would then only allow scraping home pages. Also, what about pages behind a login?


Correct for the first two, I hadn't thought about that. Could be done by uploading history once every hour but that's annoying. Pages behind a login aren't ideal for this under any approach because they tend to be dynamic.


you can only get domains out of DNS though not URLs
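
To make the DNS limitation raised here concrete, a minimal Python sketch (the function name and the sample URLs are made up for illustration): a resolver is only ever asked about the hostname, so the path and query string of a visited page never reach it.

```python
from urllib.parse import urlsplit


def dns_visible_part(url: str) -> str:
    """Return the only part of a URL that a DNS lookup reveals: the hostname."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no hostname in {url!r}")
    return host


# Hypothetical browsing history: three pages on two hosts.
visited = [
    "https://news.ycombinator.com/item?id=1",
    "https://news.ycombinator.com/newest",
    "https://example.org/private/page?token=abc",
]

# All a DNS-based logger could recover is the set of hostnames.
seen_by_dns = {dns_visible_part(u) for u in visited}
```

Three distinct pages collapse into two hostnames, which is why a wget job driven by DNS logs could at best fetch each site's front page.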


Oliver here, from the Memex team.

Memex is not able to do cross-browser/device yet, but we are close to releasing our mobile apps for iOS and Android, as well as syncing between devices. Both the extension and the mobile app will be offline first, and sync happens with end-to-end encryption via a relay server that deletes each update message after every device has picked it up.


I haven't looked over the whole site, but I find it curious that this wasn't mentioned here (or possibly there):

https://en.wikipedia.org/wiki/As_We_May_Think

Because it seems this is yet (another) iteration or attempt on the device Bush described?

Really, hypertext, HTML, the web, browsers, etc - are all a part of such a system, but what is needed to really complete the loop - so to speak - is for everything to be P2P - so anyone can easily create and share information (hypertext "documents" - whatever a "document" may be) and share contextual annotations (highlights and notes) on those pages, with maybe some manner to incorporate them into the main "text" as needed when they become too cumbersome outside the main text, along with easy P2P indexing and searching of the collective corpus of "works".

But this is completely at odds with how the current "web ecosystem" works, and certainly doesn't appear to be how this particular incarnation works (this seems to be a siloed system, if anything). Today's system takes control away from the users for the most part, unless they pay extra (ie - to create and maintain a server or whatnot for their own personal content - or not pay, but pay by giving away other information that the company giving them "free hosting" or whatnot can use). A complete P2P system would upend that model, assuming it could be made to work well (the asymmetry of the broadband infrastructure for consumers doesn't help, either).

I know there are more than a few P2P competitors out there that do much of what I am speaking of, but I don't think any of them do the complete "Bush Memex" system. Then again, maybe that's a good thing - otherwise we might end up with an unholy combo of wikipedia coupled with 4/8chan with a dab of reddit or something like that...?


Memex was Bush's term. The project's name is an obvious nod, directly referenced in the README.


Oliver here from the Memex team.

Thanks for adding so much input to the conversation. :)

Let me address a few points you made:

> A complete P2P system would upend that model, assuming it could be made to work well (the asymmetry of the broadband infrastructure for consumers doesn't help, either).

There have been a few decisions we had to make in the past in order to provide a non-technical user friendly, scalable, affordable and privacy focused product.

One option would be to use newer p2p technologies like IPFS (https://github.com/ipfs/) or Dat (http://datproject.org). Those would provide high decentralisation but have a couple of drawbacks that make them unsuitable for our use case. Indeed, they are difficult to make work well.

1) The technologies are not ready yet. They all still have significant performance and scalability issues, which won't be solved in the next 1-2 years (optimistically).

2) They are unsuitable for private data, as they shard your very personal knowledge across nodes you don't know. At least that is how they work now. Private networks are planned, but there is still a long way to go.

3) They are not suitable for non-technical users yet.

The next option is blockchains. Let's not talk about that :P But seriously, who wants to store personal data like their history on an immutable ledger? Nope, not gonna happen.

The next option is p2p sync via WebRTC, which we actually do use. Our servers are only there to offer a relay service that passes your messages along for asynchronous syncing, to do signalling for synchronous syncing, and to punch through your firewalls. The sync messages are end-to-end encrypted and deleted from our servers once all devices have picked them up. This approach lets us be much cheaper than what's out there, because we don't have to store the data in a cloud constantly.
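
For readers who want to picture the "deleted once every device has picked it up" behaviour, here is a rough Python sketch. It is only an illustration of the idea as described in this comment, not Memex's actual code: the class and method names are hypothetical, and the payload is treated as opaque bytes that were already encrypted on the sending device.

```python
from dataclasses import dataclass


@dataclass
class _Pending:
    ciphertext: bytes   # opaque to the relay; encrypted on the sending device
    waiting: set        # device ids that have not fetched this message yet


class RelayMailbox:
    """Toy relay: holds a message only until every other device has fetched it."""

    def __init__(self, device_ids):
        self._devices = set(device_ids)
        self._pending = []

    def post(self, sender, ciphertext):
        recipients = self._devices - {sender}
        if recipients:
            self._pending.append(_Pending(ciphertext, recipients))

    def fetch(self, device):
        delivered = []
        for msg in self._pending:
            if device in msg.waiting:
                delivered.append(msg.ciphertext)
                msg.waiting.discard(device)
        # Drop every message that no device is waiting for anymore.
        self._pending = [m for m in self._pending if m.waiting]
        return delivered

    def stored_count(self):
        return len(self._pending)


relay = RelayMailbox(["laptop", "phone", "tablet"])
relay.post("laptop", b"<encrypted update>")
relay.fetch("phone")    # delivered to the phone, still held for the tablet
relay.fetch("tablet")   # last pickup, so the relay forgets the message
```

The point of the design is that the relay is a mailbox rather than a datastore: once `stored_count()` is back to zero, nothing about the update remains on the server.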

> Today's system takes control away from the users for the most part, unless they pay extra (ie - to create and maintain a server or whatnot for their own personal content - or not pay, but pay by giving away other information that the company giving them "free hosting" or whatnot can use).

For both p2p and cloud infrastructure, you won't get around servers. Someone needs to pay for those too (and we are not even including the development costs for the software). Even if you use a full p2p system like IPFS or Dat, once they become more common there will be a need for infrastructure that someone maintains. It's probably your ISP that then starts charging more. There is no free lunch. In the end it raises the question of whether it is so bad to pay a bit for services that really make your life better. I don't think that's a bad thing. We got so used to getting things for free without valuing how they contribute to our lives. Unless, of course, you give away your data in (implicit) exchange for those free services, which is not an option for us. We won't rule out that there might be a very consensual relationship between users and us to share data and do some amazing stuff with it, and also let them participate in the fruits. But that is definitely not the default, like it is on most other services. By default your data is always yours.

In our case we will initially offer the syncing service for a premium subscription. The code is all there though, so it can be made self-hostable. We either need a committed group of contributors taking this into their hands, or we need to make the money first to have the capacity to do that. Either way it's unlikely to happen immediately, but we definitely want to see it.

> (this seems to be a siloed system, if anything).

Most importantly, though, I want to express one of our core values: we think optimising for interoperability is far, far more important than decentralisation. We believe that if users can easily move between different providers/silos of a piece of software and take their data and social graph with them, a lot of the trust issues we experience today, which people hope decentralisation will fix, would be solved. If users can easily migrate to more ethical and privacy-focused services, that creates an incentive for ethical and trustworthy behaviour, and we can all still use the many advantages of centralised systems (iteration speed, cost efficiency, development convenience, performance).

To protect your privacy, data ownership and freedom to move, we have already invested considerable effort.

1) We focus on the software being offline first. This came with quite some challenges, among others getting search and storage performant enough in the browser.

2) We are building a database and storage layer that will turn into an interoperable datastore for knowledge data and gives you full control over your data (https://github.com/worldbrain/storex). Memex will turn into a light client, so you can copy/fork it, adapt it to your needs and use the same database. So you don't even have to migrate anymore.

3) We take a completely different approach to setting the economic incentives in our company. We don't raise venture capital, so we have no incentive (a) to lock you into our service or (b) to provide you free services that exploit your privacy for the sake of growth. To raise capital we use a model called Steward Ownership, which aligns with the incentives for interoperability. We did that because we believe there needs to be an ecosystem of many "memex"-like tools, developed by other people, that work together interoperably. You can read more here: https://community.worldbrain.io/t/why-worldbrain-io-does-not... It took us almost 2 years to find money from investors because not taking venture capital was so fundamentally important for our vision. So we had to refuse a lot: https://community.worldbrain.io/t/how-worldbrain-io-tries-to...

I hope that answers many of your questions, and likely spurs many more. Happy to answer them :)


Random factoid of the day: A fictional memex machine is featured prominently in cstross's Laundry Files.


I have been looking forward to something like this for at least 15 years. Today's minds are shaped by what we read online, and this is our source of knowledge on many topics (apart from books and other media), so why not organize it? But then I noticed that this would add another source of data to be managed, evolved, etc., and today I tend to think that it's more about being organized as an individual than about simply relying on yet another tool for that.

Anyway I will give it a spin! Looks like a good piece of work.


I'd say the biggest problem with something like this is that it's a silo. You're suddenly 100% reliant on them providing the right tools and functions to access your data in the way you want/need. And if your needs/preferences change, then you're entirely reliant on whether the silo has foreseen the new use case.

I already have an existing knowledge base (that not only consists of webpages, but also org files, videos, pdf's, etc.) that's accessible and synchronized across multiple devices - and while having a complete searchable history of all my browsing would be fantastic, there's no way to integrate it into my system (or any other system) with my own tools.


Oliver here from the Memex team.

Yes! That is indeed a major problem with all knowledge management tools. They tend to not be interoperable enough so you can easily integrate them into your existing workflows. Also they are not built to be adaptive to the individual workflows of people, so you have to wait for the dev's priorities to be high enough to implement your features. You can't do it yourself.

We are also not there yet when it comes to the level of interoperability or flexibility needed. However we started from a fundamentally different angle by changing our economic model and not taking venture capital money: https://community.worldbrain.io/t/why-worldbrain-io-does-not...

In essence what we want to achieve is that you can copy/fork Memex, adapt it to your needs and still use your old data and social connections. Once that transition is complete you'll be able to even use 2 different Memex tools at the same time, both maybe serving different use cases for you.

May I ask what tools you use and how Memex in your ideal world would integrate them? What is the workflow you'd like to implement?


After experimenting with everything under the sun from Evernote to OneNote to TiddlyWiki and everything inbetween, I’ve settled on plain and simple files in a deep folder structure. The whole folder (now 80GB in size) is permanently kept in sync with SyncThing across my Android, laptop and desktop.

Using normal files allows me to store anything I need, whether it’s webpages as html files saved with SingleFile (FF extension), videos downloaded from YouTube, notes made with emacs orgmode, podcast MP3’s, eBook PDF's, etc.

Folders are deeply nested according to field/topic, and I have a git repo that ignores all non-org and non-html files. This lets me use ripgrep or emacs Helm to immediately search text for whatever I’m looking for. z allows me to traverse the tree without double-clicking through a deep tree of directories or cd'ing and typing crazy amounts.

So tools can be anything - Firefox with extensions, ripgrep, emacs, vim, git, z, or even whatever Python script I write to fill a unique use case that I discover for something that feels tedious. Normal files mean that if I find a cool program that does something useful with files, I can easily integrate that. I'm also working on ways to give me easier/quicker access to the metadata like the most recent files of a subtopic, or even add my own metadata like ratings and tags.

Ideally, something like Memex would provide some sort of api from which I could automatically query for all the browsing/history and text data, so I could potentially add it to my knowledge base in some way. Or maybe if Memex synced automatically to a DB file or some other file(s) that are well-documented that I could easily access, parse and sync.
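
As a rough illustration of the plain-files workflow described in this comment, here is a small Python sketch of a text search that only looks at .org and .html files in a nested folder tree. It is a stand-in for what ripgrep already does much faster; the function name, the suffix list and the demo folder are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

INDEXED_SUFFIXES = {".org", ".html"}  # assumed: the text formats worth searching


def search_notes(root, needle):
    """Yield (path, line_number, line) for every matching line under root."""
    needle = needle.lower()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in INDEXED_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                yield path, number, line.strip()


# Demo on a throwaway folder: one org note and one non-text file.
with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp, "cs", "search", "notes.org")
    note.parent.mkdir(parents=True)
    note.write_text("* Memex\nBush's memex idea\n", encoding="utf-8")
    Path(tmp, "cs", "talk.mp4").write_text("memex", encoding="utf-8")
    matches = [(p.name, n, line) for p, n, line in search_notes(tmp, "memex")]
```

Because everything is an ordinary file, a script like this can be swapped for any other tool without touching the data, which is the property being asked of Memex here.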


Reading your comment, I feel a sense of déjà vu because it's almost the same as the response I would have written. I've gone through a similar path of trying every knowledge base under the sun and settling on a text-based, git-managed custom knowledge base with plain text files (currently managing 10k+ notes this way).

I'm currently building a service that can index and query across text-based knowledge bases. You can find a demo here: http://demo.alphacortex.io

Would love to hear your thoughts and talk further about organizing knowledge :)


Oliver here, from the Memex team

Yes indeed, there will be no tool that can help you organise yourself without you as an individual being at least a bit organised.

We hope we can make it a lot easier to let go of the FOMO and the work of saving everything you MIGHT wanna find again. Because people remember things by the way they interacted with them, we try to add more and more "associative" queries, so you can search with stuff like "things I liked/shared/retweeted" or "sent to a friend on Telegram/Slack/email". This way you at least don't have to manually save everything.

Hope that is the right direction to solve your problems?


Looks like there's huge potential value with a tool like this.

Does it have a way to handle page content that's less relevant? For instance plenty of genuinely useful articles have footers full of clickbait ads & comments - whilst trying to avoid such content, sometimes the article you need is only available with that dross tacked on the end, and yet ideally users wouldn't want it polluting search results (unless they really were looking for "that amazing trick that only grandmothers know"!)


> Does it have a way to handle page content that's less relevant?

It already collects some basic interaction data like visit frequency, stay time and scroll %. This data could be used to clean the db a bit.

Other than that, there is definitely room to improve by cleaning out the terms that are captured but do not really add value. It's a difficult task though, because every page is structured so differently.

What ideas do you have to reduce the number of unhelpful terms? A spontaneous one is to detect the footers from Outbrain et al. and remove them from the HTML before extracting the words to index.
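
To sketch what that spontaneous idea could look like, here is a small Python example using only the standard library's html.parser. It is not how Memex does it; the class-name hints are guesses at what such widgets might use, and the parser assumes reasonably well-formed HTML (an unclosed non-void tag inside a skipped block would throw off the depth count).

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Hypothetical markers for recommendation widgets; real pages will vary.
BLOCKED_CLASS_HINTS = ("outbrain", "taboola", "sponsored")


class IndexableText(HTMLParser):
    """Collect words from a page while skipping scripts, styles and ad widgets."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.words = []
        self._skip_depth = 0  # > 0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").lower()
        blocked = any(hint in classes for hint in BLOCKED_CLASS_HINTS)
        if tag in ("script", "style") or blocked:
            self._skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.lower().split())


def extract_words(html):
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return parser.words


page = ('<article><p>Useful text here.</p>'
        '<div class="OUTBRAIN widget"><a href="#">amazing trick '
        '<img src="x.png"> grandmothers</a></div>'
        '<p>More content</p></article>')
words = extract_words(page)
```

Matching on class names is brittle, so in practice this would probably need a maintained blocklist, much like ad blockers use.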

Right now we are focussing on developing a stable service with better UX and that does the current feature set really well. I'll take up your suggestion so we can think about how we can use them in the upcoming overhaul of the search.

Thanks for your input!


I have been using it for a few months, maybe a year. Maybe I'm doing it wrong, but I can't seem to get any benefit out of it.

Text highlights are a pain to use (so I don't, even though I love highlighting text while reading), the search doesn't find anything despite the keyword being in the title of the page and all the context menu and side bar are doing are unexpectedly hijacking my clicks.

I really hoped for more and still believe that it has the potential to be useful. I hope one day it will be.


Oliver here from the Memex team.

That's a bummer; sorry you've had trouble.

> Text highlights are a pain to use (so I don't, even though I love highlighting text while reading)

Why are they a pain to use for you? What would be 1-2 things we could improve that would make it a lot better?

> the search doesn't find anything despite the keyword being in the title of the page

It might be that you have changed some indexing settings. Usually a page is only indexed after you have been on it for at least 5 seconds, but you can change that setting.

> all the context menu and side bar are doing are unexpectedly hijacking my clicks.

On which pages does this happen to you?

Good news: We finally got a bigger round of funding and are now able to fix those. Until February our main focus is improving release stability through full integration testing of the UI and backend, fixing all major bugs, improving the UI/UX, and releasing the mobile apps and sync.


Highlighting:

I think it used to be worse, with lots of glitches, but I'm glad those now seem to be gone. I'm still annoyed by the multi-step process to highlight something, however. Select something -> Click "annotate", sidebar opens -> Find the annotation you just created -> Save it without typing in any text (remember to save it, because there's no auto-saving for some reason!). It should be done with a single "Click 'highlight'". Since clicking the highlight still expands it in the side bar, no functionality would be lost by modifying the annotation feature to directly just save a highlight (if you still don't know what I would like to see, have a look at Medium's implementation).

Search:

Thanks. After half-heartedly testing it out again, it seems to find things. However, it's just frustrating when you want to find your way back to an obscure site you visited weeks ago, Firefox history is comically useless, and you think "Oh, nice thing I have Memex installed", and then it doesn't find it, even though I remember that being the promise.

Click jacking:

When I want to open a context menu on some text using my finger I have to long press on it, but the degraded accuracy can lead to the click landing on the Memex menu (admittedly this seems hard to fix). Trying to reach far-out page elements can lead to the side bar obstructing them. Both of these cases are very annoying.

I don't generally oppose the popup: it would be pretty convenient for quick highlighting if highlighting was actually quick. But then again, highlighting is not very quick, and the "share highlight" feature (which is very cool) is not only of much rarer use, but also extremely confusingly implemented. Why do you need to be able to share a highlight before it has been created? Why is it not an option for an actually created highlight? Why does it not show me the link, but try to copy it to my clipboard, which doesn't even work on my machine (Manjaro/KDE/X11/Firefox)?

I like that there is a close-button on the popup, but I do find it way too obtrusive. It's not a feature a normal user (or at least I) will want often. That it's shown at the same priority of the other items confuses me hard, same with the options in the side bar. I think I will disable it, since the icon does all the things I need.

Why are there collections AND tags? Don't they do the exact same thing?

Why is there no option for a denser page listing so I can see more than 6 pages, (which would be good while trying to find something, and why else would one open it)? Why can I only have it sorted chronologically, not by number of visits? Can I even see that anywhere? How about length of visit (do you record such things?)? Minor UI addition (hey. seems like the place for that): The opening of the pop-up looks pretty jarring. A more fluid animation would be nice :-)

These are mostly complaints, but I'm glad you're working on this project.

I'm also really happy that it seems to be continually improving. I wish you lots of success going forward!


> These are mostly complaints, but I'm glad you're working on this project.

Wow. Thanks so so so much for taking the time to write this all out. This has been super useful.

I added all your feedback to our UX/UI/bug prioritisation board. Some of the things were already on our radar (like improving the way annotating works) and will be improved very soon.


Great idea, implementation is a bit intrusive. Not a fan of the by default sidebar ribbon or the highlight popup menu. Luckily this can be disabled.

Importing history and indexing that is cool.


Oliver here, from the Memex team.

Thanks for the feedback. Apart from being able to turn it off, how could we make the implementation less intrusive for you?


Installed the extension and added some pages, but it's clearly not indexing content of the pages because there are no hits for words that appear on those pages.


Oliver here, from the Memex team

By default that setting is disabled, but you can change to full indexing in the onboarding or in the settings.


Thanks, I found it.

The label for the setting is unclear.

> [ ] Visited for at least n seconds

Given that the setting above it says "Make title and URL always searchable (recommended)", I think the setting for this option also should be something like:

> [ ] Make pages visited for at least n seconds full-text searchable

It's difficult to connect the grey text that says "Which websites do you want to make full-text searchable?" to the label of the option that controls full-text searching.

context: https://i.imgur.com/2p7OyHJ.png
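
For anyone confused about how the two settings relate, this is my reading of them as a tiny Python sketch. The names are invented and the real extension may well behave differently: title and URL are always made searchable, and the page body is only added when full-text indexing is on and the visit lasted at least n seconds (5 is the default mentioned elsewhere in the thread).

```python
from dataclasses import dataclass


@dataclass
class Visit:
    url: str
    title: str
    body_text: str
    seconds_on_page: float


def fields_to_index(visit, full_text_enabled=True, min_seconds=5.0):
    """Decide which parts of a visited page become searchable."""
    fields = {"url": visit.url, "title": visit.title}  # always searchable
    if full_text_enabled and visit.seconds_on_page >= min_seconds:
        fields["body"] = visit.body_text  # full text only after n seconds
    return fields


quick_glance = Visit("https://example.org/a", "Example A", "long article text", 2.0)
proper_read = Visit("https://example.org/b", "Example B", "long article text", 42.0)

glance_fields = fields_to_index(quick_glance)
read_fields = fields_to_index(proper_read)
```

Seen this way, the checkbox label really is the condition for the body being indexed, which is why wording it as "make pages visited for at least n seconds full-text searchable" would read more clearly.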


Great feedback. Will incorporate it in the upcoming UX/UI updates. Thanks for the effort to make us aware.


> By default that setting is disabled

Why, might I ask? Your tagline is full text search, but I have to turn it on?

"Full-Text Search your Web History & Bookmarks"

I'd reconsider the landing page hero section design. Don't write with black on darkgray background.

https://imgur.com/a/hgZrerp


> Why, might I ask? Your tagline is full text search, but I have to turn it on?

It was a trial we added in the last onboarding overhaul 2 weeks ago. Will already be changed in the next update.

> I'd reconsider the landing page hero section design. Don't write with black on darkgray background.

Yeah, that is a bug from the last (automatic) WordPress update. Will be fixed soon too. Thanks for making us aware. :)


This looks interesting. I wrote something quite similar back around 2000 as a proxy; today, of course, you'd have to implement it as they have, as a browser extension, because of https. It was a fun little project, and for the feature set I included (full text search of your history), it was quite easy to implement. At the time, I even did it with no dependencies that weren't in the Python standard library.
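
For illustration only, here is a minimal sketch of what full-text history search with nothing but the Python standard library could look like. It is not the original proxy's code: the names `TextExtractor`, `HistoryIndex`, `add_page` and `search` are hypothetical, and the design (an in-memory inverted index with AND queries) is an assumption.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def tokenize(text):
    """Lowercase the text and split it into ASCII alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class HistoryIndex:
    """Hypothetical in-memory inverted index: term -> set of URLs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, url, html):
        parser = TextExtractor()
        parser.feed(html)
        parser.close()
        for term in tokenize(" ".join(parser.chunks)):
            self.postings[term].add(url)

    def search(self, query):
        """Return the URLs that contain every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        results = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.postings.get(term, set())
        return results
```

A proxy or extension would call `add_page(url, html)` for each page it sees and `search("some words")` at query time. Persistence and ranking are left out on purpose.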


I was trying to build something similar based on Firefox sync, but it took quite a while to get a client working, and I never got back to it. (I found some bugs in their documentation along the way and helped them fix that, so that's good?)

Maybe I should pick it up again. My intent was that it'd run on a server and be your own personal search engine covering activity on all devices.


I would be interested in helping you on this, if it's in a language I'm familiar with. Having it integrate with Firefox sync solves the problem of not being on any one device enough to make it your full history.


It's in Python 3 (https://github.com/jimktrains/ffsyncsearch). I just pushed some of my last changes, and it'll grab and decrypt all of the stored collections. Like I said, I hadn't gotten it to the point where I saved and indexed everything, but my intent was to start with PostgreSQL's full-text search and go from there. I also wouldn't mind some help cleaning up the code, as it's more of a proof of concept right now.


Thanks. I don't have a ton of time to work on it, but I've cloned it and will look at it.


I've wanted this ever since Firefox 2 or so.


Years ago I was using UltraRecall for this purpose. It has a Firefox plugin that allows for quickly bookmarking a page, and UltraRecall would then build a full-text index. It was also possible to do the same from any other browser, since browser support just means pushing the link to the standalone application, but I never felt the need to invest the time.


The idea is promising. I have some questions though.

1. How do you handle content updates/corrections? Do you update the index on subsequent visits?

2. How do you handle fully dynamic pages like Facebook feed, Reddit, HN etc.?

3. What is the performance and storage overhead of indexing "every word of all websites & PDFs you visited"?

4. What about i18n? Do you have any plans?


Oliver here from the Memex team.

1. Yes, the index is updated every time you visit. It appends the new terms and keeps all the old ones.

2. That is a bit more difficult and not as reliable unfortunately. If there is a lazy load on the page it often fails to capture the content, because it starts indexing the page after the initial page load is finished/successful. These are improvements we want to work on a bit later.

3. For about a year's worth of history (~20k pages), without also capturing screenshots, it needs about 400 MB of storage. Indexing performance is still good with 20-25k pages, but querying gets slower. You shouldn't notice an impact on your system with a reasonably fast computer (recommended: 8 GB of RAM and a dual core with at least 2 GHz). We are about to work on performance improvements to make it fast and scalable well beyond that amount and with fewer resources.

4. Unfortunately we didn't get around to optimising the indexing for CJK characters, but all Latin characters should work fine.
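
The "append new terms, keep old ones" behaviour from point 1 and the storage figure from point 3 can be sketched as below. This is an illustrative Python sketch, not Memex's actual code. `AppendOnlyIndex`, `visit`, `search` and `bytes_per_page` are hypothetical names, whitespace tokenisation is a simplification, and 1 MB is assumed to mean 10^6 bytes.

```python
from collections import defaultdict


class AppendOnlyIndex:
    """Hypothetical per-URL term store where a revisit only ever adds terms."""

    def __init__(self):
        self.terms_by_url = defaultdict(set)

    def visit(self, url, text):
        # Set union: terms from the new visit are added,
        # terms from earlier visits are never removed.
        self.terms_by_url[url] |= set(text.lower().split())

    def search(self, term):
        term = term.lower()
        return sorted(
            url for url, terms in self.terms_by_url.items() if term in terms
        )


def bytes_per_page(total_bytes, pages):
    """Average index size per page."""
    return total_bytes / pages


index = AppendOnlyIndex()
index.visit("https://example.com/post", "first draft wording")
index.visit("https://example.com/post", "corrected final wording")

# The page still matches "draft", although that word is gone from the page.
print(index.search("draft"))

# About 400 MB for ~20k pages works out to roughly 20 KB per page.
print(bytes_per_page(400 * 10**6, 20_000))  # 20000.0
```

One consequence of this design is that corrected or removed text stays searchable, which is why frequently changing pages accumulate terms over time.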


Thank you.

On 2, I mean these (feeds, topic lists etc.) are probably not worth indexing at all, esp. since you keep all old content in the index.

Is your code all home-built, or you're using some FT engine compiled to Wasm?


> On 2, I mean these (feeds, topic lists etc.) are probably not worth indexing at all, esp. since you keep all old content in the index.

Yeah indeed. A better implementation for now would be to let people save single posts, or sync with likes, shares, tweets and retweets, so people can search with those facets.

> Is your code all home-built, or you're using some FT engine compiled to Wasm?

For the search we are using Dexie.js; the rest is home-built. Our storage engine is: https://github.com/worldbrain/storex


I feel like this idea pops up every two years or so; I was definitely running something like it in the early 2000s. How come none of the previous efforts succeeded?

Firefox's built-in history is so, so close to being useful. My impression is it's hampered mostly by limited UI.


Been using this for a week or so and am liking it so far. I’ve used full text search extensions before but this feels less opaque and more usable. Looking forward to continued development!


Sorry to bother you, in Indonesia, MEMEX means female genital.


https://en.m.wikipedia.org/wiki/Memex

There's no bother, since it's a different culture and words are bound to mean something else.


Oliver here from the Memex team.

We have been featured a couple of times on HN. This one comes up every time. Gold. <3


Sorry to bother you, but in Indonesia, "X" is spelled "ks"


The intro panel on the homepage (worldbrain.io) is dark grey text on a dark grey background. Is that intentional or a bug?


This is really nice. Now I can get rid of HistorySearch, which does the same thing but stores everything in the cloud.


Doesn't the search bar already have the option to search your history and bookmarks?


This one also seems to index the content of pages.


Ah. May make sense for some people. For indexing, I personally would prefer to use a bookmark service (self-hosted) that handles this.


Oliver here, from the Memex team.

So far, all data is stored on your own device, or in your personal cloud with a file system integration (like a Dropbox folder).

We are about to release our mobile apps for iOS and Android, as well as syncing between devices. Both the extension and the mobile app will be offline-first, and sync happens with end-to-end encryption via a relay server that deletes all messages once all devices have picked them up. That server will be self-hostable too in the mid-term.
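
The relay behaviour described above (hold each message until every other device has fetched it, then delete it) can be sketched roughly as follows. This is an illustrative sketch only, not the Memex sync protocol. `Relay`, `publish`, `pick_up` and `pending_count` are hypothetical names, the device list is assumed to be fixed, and payloads are treated as opaque bytes that the clients have already encrypted.

```python
import itertools


class Relay:
    """Hypothetical in-memory relay that forgets a message once all peers have it."""

    def __init__(self, device_ids):
        self.devices = set(device_ids)
        self._next_id = itertools.count(1)
        # message id -> (sender, ciphertext, devices that still need the message)
        self._pending = {}

    def publish(self, sender, ciphertext):
        """Queue an already-encrypted payload for every device except the sender."""
        waiting = self.devices - {sender}
        if not waiting:
            return None  # no other device to deliver to, so nothing is stored
        msg_id = next(self._next_id)
        self._pending[msg_id] = (sender, ciphertext, waiting)
        return msg_id

    def pick_up(self, device):
        """Return the payloads this device has not fetched yet."""
        delivered = []
        for msg_id in list(self._pending):
            _sender, ciphertext, waiting = self._pending[msg_id]
            if device in waiting:
                delivered.append(ciphertext)
                waiting.discard(device)
                if not waiting:
                    # Every recipient has the message: delete it from the relay.
                    del self._pending[msg_id]
        return delivered

    def pending_count(self):
        return len(self._pending)
```

With three devices, a message published by the laptop stays on the relay after the phone picks it up and is only removed once the tablet has fetched it as well. The relay never needs the decryption keys, because it only stores and forwards ciphertext.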


Oh, there's a self-hosted one? Any link?


Nice, too bad the organization part is card-based, would have loved a drag/drop interface like with Bookmarkninja [1]

[1] https://www.bookmarkninja.com/images/dashboard3-650.png


Oliver here, from the Memex team.

Are you referring to Memex being card-based? In which part of Memex would you like to have that drag-and-drop ability to organise stuff?


For organizing collections, that would be awesome!


How would you like to organise it? Like in your screenshot? Is it the ordering inside a collection that is important to you? Would it solve it if you were able to order items inside a collection?



I have wanted this for years! Thank you!


How does this implement full-text search?


Safari has this built in. I'm surprised Chrome and Firefox don't.


What do you mean by "built in"?



