Hope it's working better now
Yeah, we know it has not been functioning well over the past few months. It had a lot of bugs.
We finally got a bigger round of funding and are now able to fix those.
Until February our main focus is improving release stability through full integration testing of the UI and backend, fixing all major bugs, improving the UI/UX and releasing the mobile apps and sync.
Too bad this post got upvoted so much right now; a few weeks later things would already have been much, much better :)
Regarding the bugs you mention I would like to know more if you could help me out here:
1. Which browser and system are you working on?
2. When you say the settings changed back to "intrusive & useless", what do you mean by that?
3. When you say "it never worked well", what were you expecting to work that didn't?
Thanks and sorry for the troubles Memex causes.
My browser is Firefox on Windows. To expand on "intrusive and useless": I found myself turning off the "respond when i type" setting several times. The few times I did search for something, it hadn't been "seen" because the indexing was off again, or terms just were not found.
Please don't feel sorry. :) If something does not work then we deserve those things to be said, also publicly. It's what after all helps us to improve and keeps us accountable.
> "respond when i type" setting several times.
I am confused about that setting. Not aware we have that anywhere?
Can you explain a bit more what you mean by that?
> the indexing was off again
That can be, we had some issues with FF permissions for things to be copied to clipboard. We had to manually add that permission to the install process (only for FF). And I missed doing that a few times. (sorry about that).
We are about to automate that process in the next releases so that does not happen anymore.
I assume what happened is that the extension was turned off because on update it required some of the permissions to be granted again.
Your explanation of permissions sounds like a very likely cause for most of the things that annoyed me :) Text search is hard, I know that from mere experiments and demos; adding the "be a browser plugin" to that must make life fun indeed.
Ok so the core issue was that the sidebar was popping up in places you didn't want to? Or was it also that the keyboard shortcuts were firing the sidebar to appear in places you didn't want?
> Text search is hard, I know that from mere experiments and demos; adding the "be a browser plugin" to that must make life fun indeed.
Yes, it was/is quite painful, especially in the browser. Changing APIs, plus websites that constantly change and use different technologies, make browsers a very hostile environment to work in.
We can't even do test releases with the Chrome/Firefox stores. So we have to roll out all updates to all users at once. Crazy.
Before, we focused on building something to show how our vision could work, at the expense of stability.
In a lot of ways that was not a good move, because we probably could have been financially independent with slightly fewer features that were more stable and better worked out. However, our approach also allowed us to show the vision and get bigger amounts of funding. Now is the time to make the current feature set nice to use, so that people feel it is worth supporting financially.
I will give this a spin, on paper it looks like what I need.
Memex is not able to do cross-browser/device yet, but we are close to releasing our mobile apps for iOS and Android, as well as syncing between devices.
The Memex app will be a dedicated app that in its MVP works just like Pocket, except that you can also add notes, lists and favorites when you save content.
In the MVP+ you can also search your knowledge on the go and do annotations on the go.
I believe it shut down shortly after being acquired by Yahoo and appears inop today. There may be a decent alternative out there but my system for bookmarking things in that way died with their service, sadly.
Glad you brought that up :)
After we are finished with the current work on making Memex more stable and user friendly, we are working on shareable collections. With that you can share lists of websites, papers, notes and annotations with your peers, or co-curate those collections together.
Both projects took the inspiration for the name from the same thing though:
It also amazes me this kind of "store everything [locally] so life is easier" mindset isn't more common in today's world of extensions. It's one of the few things that can make the computer work for you instead of you working in a way the computer understands.
Just because something doesn't work well when done right, doesn't mean it should be done wrong. There are ways to gather data necessary to inform product decisions without surveillance, or turning your users into non-consenting test subjects.
(EDIT: Nevertheless, I'm very happy about the values they describe and that they chose to store the data locally. It makes me trust the authors that much more, and conversely, if it wasn't local, I wouldn't even consider using it.)
At the moment analytics are turned off, but in general we only do telemetry-style analytics. So we only track the way you interact with the software, like the buttons you click, to optimise user flows. We never send anything that is user-generated content, like tags, annotations, visited URLs or anything else that can be considered your personal knowledge.
You can read more here: http://worldbrain.io/privacy
Yes indeed we will be a regular SaaS business, but with a strong focus on privacy, data ownership and interoperability.
More can be found here: https://community.worldbrain.io/t/why-worldbrain-io-does-not...
Initially our services like Sync and Backup will only be available for the premium upgrade, but our infrastructure is already built in a way that allows making it self-hostable too. It's just a matter of resources we have available. We very much value your data sovereignty but we first need to make enough money to be able to continue development and have enough time and money to make it self-hostable.
It's all a bit more difficult without raising VC capital (see link above).
They charge for synchronization and backups, with no doubt more to come.
But there are many ways you can do that architecturally.
Here's one off the top of my head that might work:
1. Set up a DNS server on a home server.
2. Run a cron job to automatically wget every URL checked by the DNS server.
3. Set all of your devices to your DNS server.
4. Set up a local installation of one of the thousands of open source search engines and set it to go through all of your pages once an hour.
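A rough sketch of steps 1 and 2 in Python, under some loud assumptions: the DNS server is something like dnsmasq with query logging enabled, and its log lines look like `query[A] example.com from 192.168.1.10` (adjust the regex for whatever your server actually writes). The function names are made up for illustration. One caveat: a DNS server only ever sees hostnames, never full URLs, so this approach can fetch each site's front page but not the specific pages you read.

```python
import re
import urllib.request

# Assumed dnsmasq-style log line: "... query[A] example.com from 192.168.1.10"
QUERY_RE = re.compile(r"query\[(?:A|AAAA)\]\s+(\S+)\s+from\s+")


def hosts_from_dns_log(lines):
    """Return the distinct hostnames that were queried, in first-seen order."""
    seen = []
    for line in lines:
        match = QUERY_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def fetch_front_page(host, timeout=10):
    """Download a host's front page (the DNS log has no path information)."""
    with urllib.request.urlopen(f"https://{host}/", timeout=timeout) as response:
        return response.read()
```

A cron job could call `hosts_from_dns_log` on the latest log file, save the result of `fetch_front_page` for each host into a folder, and leave that folder for the search engine from step 4 to crawl.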
Memex is not able to do cross-browser/device yet, but we are close to releasing our mobile apps for iOS and Android, as well as syncing between devices.
Both the extension and mobile app will be offline first, and sync happens with end-to-end encryption via a relay server that deletes each update message after every device has picked it up.
Because it seems this is yet (another) iteration or attempt on the device Bush described?
Really, hypertext, HTML, the web, browsers, etc - are all a part of such a system, but what is needed to really complete the loop - so to speak - is for everything to be P2P - so anyone can easily create and share information (hypertext "documents" - whatever a "document" may be) and share contextual annotations (highlights and notes) on those pages, with maybe some manner to incorporate them into the main "text" as needed when they become too cumbersome outside the main text, along with easy P2P indexing and searching of the collective corpus of "works".
But this is completely at odds with how the current "web ecosystem" works, and certainly doesn't appear to be how this particular incarnation works (this seems to be a siloed system, if anything). Today's system takes control away from the users for the most part, unless they pay extra (ie - to create and maintain a server or whatnot for their own personal content - or not pay, but pay by giving away other information that the company giving them "free hosting" or whatnot can use). A complete P2P system would upend that model, assuming it could be made to work well (the asymmetry of the broadband infrastructure for consumers doesn't help, either).
I know there are more than a few P2P competitors out there that do much of what I am speaking of, but I don't think any of them do the complete "Bush Memex" system. Then again, maybe that's a good thing - otherwise we might end up with an unholy combo of wikipedia coupled with 4/8chan with a dab of reddit or something like that...?
Thanks for adding so much input to the conversation. :)
Let me address a few points you made:
> A complete P2P system would upend that model, assuming it could be made to work well (the asymmetry of the broadband infrastructure for consumers doesn't help, either).
There have been a few decisions we had to make in the past in order to provide a non-technical user friendly, scalable, affordable and privacy focused product.
One option would be to use newer p2p technologies like IPFS (https://github.com/ipfs/) or Dat (http://datproject.org). Those would provide high decentralisation but have a couple of drawbacks that make them unsuitable for our use case. Indeed, they are difficult to make work well.
1) The technologies are not ready yet. They all still have significant performance and scalability issues which won't be solved in the next 1-2 years (optimistically)
2) They are unsuitable for private data as they shard your very personal knowledge across nodes you don't know. At least that is how they work now. Private networks are planned but there is still a long way to go.
3) They are not suitable for non-technical users yet
The next option is blockchains. Let's not talk about that :P But seriously, who wants to store their personal data, like their history, on an immutable ledger? Nope, not gonna happen.
The next option is p2p sync via WebRTC, which we actually do use. Our servers are only there to offer a relay service that passes your messages along for asynchronous syncing, provides signalling for synchronous syncing, and punches through your firewalls. The sync messages are end-to-end encrypted and deleted from our servers once all devices have picked up the message. This approach can be much cheaper than what's out there because we don't have to store the data in a cloud permanently.
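To make the "deleted once all devices have picked up the message" part concrete, here is a minimal in-memory sketch in Python. It is not our actual implementation; the `Relay` class and its method names are hypothetical, and the payload is assumed to be already encrypted on the sending device, so the relay only ever handles opaque bytes.

```python
class Relay:
    """Toy relay: holds encrypted messages only until every device has them."""

    def __init__(self, device_ids):
        self.devices = set(device_ids)
        self.pending = {}  # message id -> (ciphertext, devices still waiting)
        self._next_id = 0

    def publish(self, sender, ciphertext):
        """Queue a message for every device except the one that sent it."""
        waiting = self.devices - {sender}
        msg_id = self._next_id
        self._next_id += 1
        if waiting:
            self.pending[msg_id] = (ciphertext, waiting)
        return msg_id

    def pick_up(self, device):
        """Hand over undelivered messages; drop each one after its last pickup."""
        delivered = []
        for msg_id in sorted(self.pending):
            ciphertext, waiting = self.pending[msg_id]
            if device in waiting:
                delivered.append(ciphertext)
                waiting.discard(device)
                if not waiting:
                    del self.pending[msg_id]
        return delivered
```

With three devices, a message published by the laptop stays in `pending` after the phone picks it up and is removed as soon as the tablet has fetched it too, which is why the relay never needs long-term storage.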
> Today's system takes control away from the users for the most part, unless they pay extra (ie - to create and maintain a server or whatnot for their own personal content - or not pay, but pay by giving away other information that the company giving them "free hosting" or whatnot can use).
For both p2p or cloud infrastructure, you won't get around servers. Someone needs to pay for that too. (and we are not even including the development costs for the software).
Even if you used a full p2p system like IPFS or Dat, once they become more common there is a need for infrastructure that someone maintains. It's probably your ISP that then starts charging more. There is no free lunch.
In the end it raises the question of whether it is so bad that you have to pay a bit for services that really make your life better. I don't think that's a bad thing. We got so used to getting things for free without valuing how they contribute to our lives.
Unless, of course, you give away your data in (implicit) exchange for those free services, which is not an option for us. We won't rule out that there might be a very consensual relationship between users and us to share data and do some amazing stuff with it, while also letting them share in the fruits. But that is definitely not the default like on most other services. By default your data is always yours.
In our case we will initially offer the syncing service for a premium subscription. The code is all there though, so it can be made self-hostable. We either need a committed group of contributors taking this into their hands, or we need to make the money first to have the capacity to do that. Either way it's unlikely to happen immediately, but we definitely want to see it.
> (this seems to be a siloed system, if anything).
Most importantly, though, I want to express one of our core values: we think optimising for interoperability is far, far more important than decentralisation. We believe that if users can easily move between different providers/silos of a piece of software and take their data and social graph with them, a lot of the trust issues we experience today, which people hope decentralisation will help with, would be solved.
If users can easily migrate to more ethical and privacy-focused services, it would create an incentive for ethical and trustworthy behaviour, and we would still be able to use the many advantages of centralised systems (iteration speed, cost efficiency, development convenience, performance).
To protect your privacy, data ownership and freedom to move, we already invested considerable effort.
1) We focus on the software being offline first. This came with quite a few challenges, among them getting search and storage performant enough in the browser.
2) We are building a database and storage layer that will turn into an interoperable datastore for knowledge data and gives you full control over your data (https://github.com/worldbrain/storex). Memex will turn into a light client so you can copy/fork it, adapt it to your needs and use the same database. So you don't even have to migrate anymore.
3) We have a completely different approach to setting the economic incentives in our company. We don't raise venture capital, so that we 1) don't create incentives to lock you into our service and 2) don't have to provide free services that exploit your privacy for the sake of growth:
To raise capital we use a model called Steward Ownership, which aligns with the incentives for interoperability. We did that because we believe there needs to be an ecosystem of many "memex"-like tools, developed by other people, that work together interoperably. You can read more here: https://community.worldbrain.io/t/why-worldbrain-io-does-not...
It took us almost 2 years to find money from investors because not taking venture capital was so fundamentally important for our vision. So we had to refuse a lot: https://community.worldbrain.io/t/how-worldbrain-io-tries-to...
I hope that answers many of your questions, and likely spurs many more. Happy to answer them :)
Anyway I will give it a spin! Looks like a good piece of work.
I already have an existing knowledge base (that not only consists of webpages, but also org files, videos, pdf's, etc.) that's accessible and synchronized across multiple devices - and while having a complete searchable history of all my browsing would be fantastic, there's no way to integrate it into my system (or any other system) with my own tools.
Yes! That is indeed a major problem with all knowledge management tools. They tend to not be interoperable enough so you can easily integrate them into your existing workflows.
Also they are not built to be adaptive to the individual workflows of people, so you have to wait for the dev's priorities to be high enough to implement your features. You can't do it yourself.
We are also not there yet when it comes to the level of interoperability or flexibility needed.
However we started from a fundamentally different angle by changing our economic model and not taking venture capital money: https://community.worldbrain.io/t/why-worldbrain-io-does-not...
In essence what we want to achieve is that you can copy/fork Memex, adapt it to your needs and still use your old data and social connections. Once that transition is complete you'll be able to even use 2 different Memex tools at the same time, both maybe serving different use cases for you.
May I ask what tools you use and how Memex in your ideal world would integrate them? What is the workflow you'd like to implement?
Using normal files allows me to store anything I need, whether it’s webpages as html files saved with SingleFile (FF extension), videos downloaded from YouTube, notes made with emacs orgmode, podcast MP3’s, eBook PDF's, etc.
Folders are deeply nested according to field/topic, and I have a git repo that ignores all non-org and non-html files. This lets me use ripgrep or emacs Helm to immediately search text for whatever I’m looking for. z allows me to traverse the tree without double-clicking through a deep tree of directories or cd'ing and typing crazy amounts.
So tools can be anything - Firefox with extensions, ripgrep, emacs, vim, git, z, or even whatever Python script I write to fill a unique use case that I discover for something that feels tedious. Normal files mean that if I find a cool program that does something useful with files, I can easily integrate that. I'm also working on ways to give me easier/quicker access to the metadata like the most recent files of a subtopic, or even add my own metadata like ratings and tags.
Ideally, something like Memex would provide some sort of api from which I could automatically query for all the browsing/history and text data, so I could potentially add it to my knowledge base in some way. Or maybe if Memex synced automatically to a DB file or some other file(s) that are well-documented that I could easily access, parse and sync.
currently building a service that can index and query across text based knowledge bases. you can find demo here: http://demo.alphacortex.io
would love to hear your thoughts and talk further about organizing knowledge :)
Yes indeed, there will be no tool that can help you organise yourself without you as an individual being at least a bit organised.
We hope we can make it a lot easier to get rid of the FOMO, and of the work of saving everything you MIGHT wanna find again.
Because people remember things by the way they interacted with them, we try to add more and more "associative" queries so you can search with stuff like "things I liked/shared/retweeted" or "sent to a friend on Telegram/Slack/email". This way you at least don't have to manually save everything.
Hope that is the right direction to solve your problems?
Does it have a way to handle page content that's less relevant? For instance plenty of genuinely useful articles have footers full of clickbait ads & comments - whilst trying to avoid such content, sometimes the article you need is only available with that dross tacked on the end, and yet ideally users wouldn't want it polluting search results (unless they really were looking for "that amazing trick that only grandmothers know"!)
It already collects some basic interaction data like visit frequency, stay time and scroll %. This data could be used to clean the db a bit.
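As a very rough illustration of how that interaction data could drive a cleanup, here is a Python sketch. The `Interaction` record, the function name and both thresholds are hypothetical, chosen only to show the idea, and not how Memex stores or uses this data today.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    url: str
    visits: int            # how often the page was opened
    stay_seconds: float    # total time spent on the page
    scroll_percent: float  # furthest scroll position, 0-100


def low_value_pages(interactions, min_stay=5.0, min_scroll=10.0):
    """Return URLs that were opened once, left quickly and barely scrolled."""
    return [
        item.url
        for item in interactions
        if item.visits <= 1
        and item.stay_seconds < min_stay
        and item.scroll_percent < min_scroll
    ]
```

Pages flagged this way would be candidates for removal from the index, or for ranking lower in results, rather than something to delete automatically.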
Other than that, there is definitely room to improve in cleaning out terms that are captured but don't really add value.
It's a difficult task though, because every page is structured so differently.
What ideas do you have to reduce the number of unhelpful terms?
A spontaneous one is to detect the footers of this Outbrain et al. crap and remove them from the HTML before extracting the words to index.
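Something along these lines, as a minimal Python sketch using only the standard library's `html.parser`. The marker list is a guess at what such widgets put in their `class`/`id` attributes, and the class and function names are made up. It also assumes reasonably well-formed markup, since it tracks nesting by counting start and end tags.

```python
import re
from html.parser import HTMLParser

# Hypothetical markers; a real list would need to be maintained over time.
BLOCKED_MARKERS = ("outbrain", "taboola", "sponsored")
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class IndexableText(HTMLParser):
    """Collects page text while skipping flagged elements and their children."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:  # no end tag, so it must not affect the depth
            return
        if self.skip_depth:
            self.skip_depth += 1
            return
        labels = " ".join(
            value or "" for name, value in attrs if name in ("class", "id")
        ).lower()
        if tag in ("script", "style") or any(m in labels for m in BLOCKED_MARKERS):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def terms_to_index(html):
    """Return lowercase terms from the page, minus the skipped widgets."""
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
```

The hard part is obviously not the stripping but keeping the marker list accurate, since every ad network and site template names things differently.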
Right now we are focussing on developing a stable service with better UX and that does the current feature set really well.
I'll take up your suggestion so we can think about how to use it in the upcoming overhaul of the search.
Thanks for your input!
Texts highlights are a pain to use (so I don't, even though I love highlighting text while reading), the search doesn't find anything despite the keyword being in the title of the page and all the context menu and side bar are doing are unexpectedly hijacking my clicks.
I really hoped for more and still believe that it has the potential to be useful. I hope one day it will be.
Bummer, sorry you're experiencing trouble.
> Texts highlights are a pain to use (so I don't, even though I love highlighting text while reading)
Why are they a pain to use for you? What would be 1-2 things we could improve that would make it a lot better?
> the search doesn't find anything despite the keyword being in the title of the page
It might be that some of your indexing settings were changed. Usually a page is only indexed after you have stayed on it for at least 5 seconds, but you can change that setting.
> all the context menu and side bar are doing are unexpectedly hijacking my clicks.
On which pages does this happen to you?
Good news: We finally got a bigger round of funding and are now able to fix those. Until February our main focus is improving release stability through full integration testing of the UI and backend, fixing all major bugs, improving the UI/UX and releasing the mobile apps and sync.
I think it used to be worse, with lots of glitches, but I'm glad those now seem to be gone. I'm still annoyed by the multi-step process to highlight something, however. Select something -> Click "annotate", sidebar opens -> Find the annotation you just created -> Save it without typing in any text (remember to save it, because there's no auto-saving for some reason!).
It should be done with a single "Click 'highlight'". Since clicking the highlight still expands it in the sidebar, no functionality would be lost by changing the annotation feature to save a highlight directly. (If you still don't know what I would like to see, have a look at Medium's implementation.)
Thanks. After testing it out again half-heartedly, it seems to find things. However, it's just frustrating when you want to find your way back to an obscure site you visited weeks ago, Firefox history is comically useless, you think "Oh, nice thing I have Memex installed", and then it doesn't find it, even though I remember that being the promise.
When I want to open a context menu on some text using my finger I have to long press on it, but the degraded accuracy can lead to the click landing on the Memex menu (admittedly this seems hard to fix). Trying to reach far-out page elements can lead to the sidebar obstructing them. Both of these cases are very annoying.
I don't generally oppose the popup: it would be pretty convenient for quick highlighting if highlighting was actually quick. But then again, highlighting is not very quick, and the "share highlight" feature (which is very cool) is not only used much more rarely, but also extremely confusingly implemented. Why do you need to be able to share a highlight before it has been created? Why is it not an option for an actually created highlight? Why does it not show me the link, but try to copy it to my clipboard, which doesn't even work on my machine (Manjaro/KDE/X11/Firefox)?
I like that there is a close-button on the popup, but I do find it way too obtrusive. It's not a feature a normal user (or at least I) will want often. That it's shown at the same priority of the other items confuses me hard, same with the options in the side bar. I think I will disable it, since the icon does all the things I need.
Why are there collections AND tags? Don't they do the exact same thing?
Why is there no option for a denser page listing so I can see more than 6 pages, (which would be good while trying to find something, and why else would one open it)? Why can I only have it sorted chronologically, not by number of visits? Can I even see that anywhere? How about length of visit (do you record such things?)? Minor UI addition (hey. seems like the place for that): The opening of the pop-up looks pretty jarring. A more fluid animation would be nice :-)
These are mostly complaints, but I'm glad you're working on this project.
I'm also really happy that it seems to be continually improving. I wish you lots of success going forward!
Wow. Thanks so so so much for taking the time to write this all out. This has been super useful.
I added all your feedback to our UX/UI/bug prioritisation board. Some of the things were already on our radar (like improving the way annotating works) and will be improved very soon.
Importing history and indexing that is cool.
Thanks for the feedback. Apart from being able to turn it off, how could we make the implementation less intrusive for you?
By default that setting is disabled, but you can change to full indexing in the onboarding or in the settings.
The label for the setting is unclear.
> [ ] Visited for at least n seconds
Given that the setting above it says "Make title and URL always searchable (recommended)", I think the setting for this option also should be something like:
> [ ] Make pages visited for at least n seconds full-text searchable
It's difficult to connect the grey text that says "Which websites do you want to make full-text searchable?" to the label of the option that controls full-text searching.
Why, might I ask? Your tagline is full text search, but I have to turn it on?
"Full-Text Search your Web History & Bookmarks"
I'd reconsider the landing page hero section design. Don't write with black on darkgray background.
It was a trial we added in the last onboarding overhaul 2 weeks ago. Will already be changed in the next update.
> I'd reconsider the landing page hero section design. Don't write with black on darkgray background.
Yeah, that is a bug from the last (automatic) WordPress update. Will be fixed soon too.
Thanks for making us aware. :)
Maybe I should pick it up again. My intent was that it'd run on a server and be your own personal search engine covering activity on all devices.
1. How do you handle content updates/corrections? Do you update the index on subsequent visits?
2. How do you handle fully dynamic pages like Facebook feed, Reddit, HN etc.?
3. What is performance and storage overhead of indexing "every word of all websites & PDFs you visited"?
4. What about i18n, have some plans?
1. Yes the index is updated every time you visit. It appends the new terms, and keeps all old ones
2. That is a bit more difficult and not as reliable unfortunately. If there is a lazy load on the page it often fails to capture the content, because it starts indexing the page after the initial page load is finished/successful. These are improvements we want to work on a bit later.
3. For about a year's worth of history (~20k pages), without also capturing the screenshots, it needs about 400 MB of storage. Indexing performance is still good with 20-25k pages, but querying gets slower. Up to that amount you shouldn't notice a performance impact on a reasonably fast computer (recommended: 8 GB of RAM and a dual core with at least 2 GHz). We are about to work on performance improvements to make it fast and scalable well beyond that amount and with fewer resources.
4. Unfortunately we didn't get around to optimising the indexing for CJK characters, but all Latin characters should work fine.
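To illustrate point 1, here is a toy inverted index in Python where every visit only adds terms and never removes old ones. This is a simplified sketch of the behaviour described, not the actual Memex code; the `PageIndex` class and the whitespace tokenisation are assumptions.

```python
from collections import defaultdict


class PageIndex:
    """Toy inverted index: each visit appends terms, old terms are kept."""

    def __init__(self):
        self.terms_by_url = defaultdict(set)
        self.urls_by_term = defaultdict(set)

    def record_visit(self, url, text):
        """Add the terms seen on this visit to whatever is already stored."""
        for term in text.lower().split():
            self.terms_by_url[url].add(term)
            self.urls_by_term[term].add(url)

    def search(self, term):
        """Return all URLs that ever contained the term, sorted."""
        return sorted(self.urls_by_term.get(term.lower(), set()))
```

The consequence is that a page whose text was corrected stays findable under both the old and the new wording, which is handy for articles but is also why feeds and other constantly changing pages bloat the index.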
On 2, I mean these (feeds, topic lists etc.) are probably not worth indexing at all, esp. since you keep all old content in the index.
Is your code all home-built, or you're using some FT engine compiled to Wasm?
Yeah indeed. A better implementation for now would be to let people save single posts, or sync with likes, shares, tweets and retweets, so people can search with those facets.
> Is your code all home-built, or you're using some FT engine compiled to Wasm?
For the search we are using dexie.js, the rest is home-built.
Our storage engine is: https://github.com/worldbrain/storex
Firefox's built in history is so, so close to being useful. My impression is it's hampered mostly by limited UI.
There's no bother, since it's a different culture and words are bound to mean something else.
We have been featured a couple of times on HN. This one comes up every time. Gold. <3
Up until now all data is stored on your own device, or in your personal cloud with a file system integration (like a Dropbox folder).
We are about to release our mobile apps for iOS and Android, as well as syncing between devices.
Both the extension and mobile app will be offline first, and sync happens with end-to-end encryption via a relay server that deletes all messages after all devices have picked them up. That server will in the mid-term be self-hostable too.
Are you referring to Memex being card based?
In which part of Memex would you like to have that drag&drop ability to organise stuff?