Show HN: Open-Source Memex – Alternative Approach to Roam/Obsidian (steveliu.co)
188 points by steve1820 on Sept 23, 2020 | 59 comments

This blog is a summary of a fun 1 month adventure I had with Knowledge Management Systems and building a POC that I thought had potential. It was inspired by so many of the products I see on Hacker News.

I’ve open sourced all the code + written down some notes on my insights/ architecture. Just a warning, the code leaves much to be desired as this was a mini project over 2-3 weekends.

At the moment, I have 3 key layers of the application.

The first is a Chrome Extension, which handles data collection. It tracks all sorts of data such as attention, time on page, and scroll/ click/ hover behaviour. It then sends this data to the Electron app.

The Electron app receives this data through an API and saves it locally using PouchDB (the data structure is compatible with Apache CouchDB and thus allows easy cloud storage).

Within the Electron app, I use an Express/ NodeJS web server to expose endpoints for functionality with the front end/ receiving data from Chrome Extension.

For the front end I use ReactJS.
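
To make the hand-off between the layers concrete, here is a minimal TypeScript sketch of how an event from the extension might be validated and shaped into a PouchDB-style document before saving. The `PageEvent` fields, the `toDocument` helper, and the `_id` scheme are assumptions for illustration, not the project's actual schema; the only grounded detail is that PouchDB/CouchDB documents are JSON objects identified by a string `_id`.

```typescript
// Hypothetical event kinds, mirroring the behaviours the extension is said to track.
type EventKind = "click" | "hover" | "scroll" | "attention";

// Hypothetical shape of one event sent by the browser extension.
interface PageEvent {
  url: string;
  kind: EventKind;
  timestampMs: number; // when the event happened (epoch milliseconds)
  durationMs: number;  // how long the behaviour lasted
}

// A storable document: the event plus the string `_id` that PouchDB/CouchDB expect.
interface StoredEvent extends PageEvent {
  _id: string;
}

function isEventKind(value: unknown): value is EventKind {
  return value === "click" || value === "hover" || value === "scroll" || value === "attention";
}

// Validate an untrusted payload and turn it into a document.
// Returns null when the payload is malformed.
function toDocument(payload: unknown): StoredEvent | null {
  if (typeof payload !== "object" || payload === null) return null;
  const fields = payload as { [key: string]: unknown };
  const url = fields["url"];
  const kind = fields["kind"];
  const timestampMs = fields["timestampMs"];
  const durationMs = fields["durationMs"];
  if (typeof url !== "string" || url.length === 0) return null;
  if (!isEventKind(kind)) return null;
  if (typeof timestampMs !== "number" || !isFinite(timestampMs)) return null;
  if (typeof durationMs !== "number" || !isFinite(durationMs) || durationMs < 0) return null;
  // Deterministic id (an assumed scheme): re-sending the same event yields the same id.
  return { _id: `event:${timestampMs}:${kind}:${url}`, url, kind, timestampMs, durationMs };
}
```

In an Express handler, the request body would be passed through `toDocument` and rejected when it returns `null`; that wiring is left out so the sketch has no dependencies.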

If you’re interested, please read the full article! There are lots of videos of the application in action.

Built something like that 5 years ago https://twitter.com/MindDriveCo/status/668518135880916992 Have been working ever since to address many of the issues discussed in this thread and a lot more. On the way to turning Engelbart's Conceptual Framework https://www.dougengelbart.org/content/view/138 for Augmenting Human Intellect into a Practical (meta) Framework. "Go Meta Young Man" and join the effort to build a Kernel for Open Collective Intellect on the Decentralized Web. It is not a commercial venture, but an attempt to contribute to Web 3 something comparable in impact to Ward Cunningham's Wiki, something that has the potential to truly change the world as Ward Cunningham's idea of the Wiki did.

Primarily Enduring, Open, Co-evolvable, Bootstrappable, metadesignable, promoting edge-user autonomy, unenclosable collaboration and much more


I'll have a read...very interesting!

Forgot to add: It also tracks what you click so it can build relationships between articles/ blogs/ anything else you read on the internet.

For example, if I’m reading an article about big data/ ETL pipelines and then I click on a link from within the article to a resource on machine learning, this relationship should be tracked and digested.
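
A rough TypeScript sketch of how that click tracking could be turned into relationships between pages: each followed link becomes a directed, weighted edge from the page being read to the page that was opened. The `LinkClick` shape and both function names are made up for illustration, and this is not taken from the extension's code.

```typescript
// One followed link: the page the user was on and the page they landed on.
interface LinkClick {
  from: string;
  to: string;
}

// Directed graph as an adjacency map: source URL -> (target URL -> click count).
type LinkGraph = Map<string, Map<string, number>>;

function buildLinkGraph(clicks: LinkClick[]): LinkGraph {
  const graph: LinkGraph = new Map<string, Map<string, number>>();
  for (const click of clicks) {
    if (click.from === click.to) continue; // ignore self-links such as in-page anchors
    let targets = graph.get(click.from);
    if (targets === undefined) {
      targets = new Map<string, number>();
      graph.set(click.from, targets);
    }
    targets.set(click.to, (targets.get(click.to) ?? 0) + 1);
  }
  return graph;
}

// Pages reached directly from `url`, most-followed first (ties broken alphabetically).
function relatedPages(graph: LinkGraph, url: string): string[] {
  const targets = graph.get(url);
  if (targets === undefined) return [];
  const entries: Array<[string, number]> = [];
  targets.forEach((count, target) => entries.push([target, count]));
  entries.sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
  return entries.map((entry) => entry[0]);
}

// Example: an ETL article that links out to a machine learning resource.
const exampleGraph = buildLinkGraph([
  { from: "https://example.com/etl-pipelines", to: "https://example.com/machine-learning" },
]);
const exampleRelated = relatedPages(exampleGraph, "https://example.com/etl-pipelines");
```

Counting repeated clicks gives a crude edge weight, which a visualisation could use to draw stronger links between pages that are often read together.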

This is neat. I had a similar idea of tracking what we read online in the browser. In addition to just recording time spent and links followed, I would also archive every article read to automatically build up a personal library of the articles themselves. Additionally, lightweight note-taking on the articles themselves à la hypothes.is

Unfortunately a new baby has drained all my time for such pursuits.

Hope you develop your concept further!

Awesome that we came up with similar ideas!

So right now, the Chrome Extension tracks all link clicks as well so the visualisation actually builds relationships between articles/ blogs/ anything else you read on the internet.

I also had another idea: once we build our Knowledge Maps, we should be able to compare and share them with one another.

For instance, I'd love to see what other software engineers around my age/ experience are reading and the insights they are drawing from articles.

Obviously some sort of security/ privacy mechanism will need to be implemented as well.

Do you then sync to a real CouchDB? Or, do you use PouchDB server on the backend, or otherwise?

I'm interested in understanding how you eventually translate to Graph4j and how you do authentication. Where can I look for this?

This is really, really cool that you open sourced this. Thank you.

No worries! I'm always leeching from the open source community so I thought I'd give back haha.

The visualisation library I'm using is by Ant Design (Alibaba).

You can see some examples below from the documentation.


Re syncing with real CouchDB - I haven't implemented that yet but from what I read, it's definitely possible with PouchDB.

Re I have a bunch of Express APIs that expose PouchDB so I can do basic CRUD operations such as creating resources etc.

It is definitely possible to sync with CouchDB. I've just found that the uncertainty around how to do authz/authn (CouchDB now recommends that you do this in your own proxy layer) and the difficulty of making sure the entire process works make debugging tricky. I've tried a variety of NodeJS+CouchDB and PouchDB-server setups, and none gave me a solution that I don't have to babysit quite a bit.

This is a really cool use of browser history data. I'm building ActivityWatch [1] and have been thinking of building something similar on top of it.

[1]: https://activitywatch.net

This is a great concept, especially the link tracking.

One of my early research papers was about a tab reorganisation UI that tracked the links you clicked to reorganise your tabs to follow your train of thought. In most cases, the flat organisation is the worst, whereas if you follow my pattern of clicking links and moving to different tabs over time, you're halfway to describing (in a way that I can pick back up) my stream of consciousness.

Didn't get too far, for something like this to work it would have to be well integrated with the browser - and with new privacy restrictions, you'd have to end up recompiling a new browser to actually provide enough functionality.

Have you thought about ease of resuming where you left off? Biggest problem for me with Memexes isn't administration (even though it is huge and exponential as you say), it's that the longer it's been since I documented something, the harder it is for me to get back into the same mindset as I did then, with all the pieces still intact and connected.

Seems to me that finding a good representation of the internal mental model will help get over this.

I would be interested in hearing more about your tab reorganisation ideas, because I definitely recognize the pain of not being able to reconnect to a train of thought and have been exploring solutions myself. Is the research paper you mentioned available anywhere?

Unfortunately I ended up shelving the project once it became clear that I couldn't go far without rebuilding the browser, and I think things have only gotten worse in terms of what extensions can do.

Here's what I could find from my old folders - https://drive.google.com/file/d/1W4nxW9GaQXybdX4zKqVsdaE7unI...

It's not much - and I remember getting a little further, but I must've lost the file - but hopefully it eases the discovery of prior art if you end up going down this path.

If you do, would love to hear from you to help or share thoughts!

Thanks! Especially loved the cognitive neuroscience references.

For context, I'm working on a browser and note taking app for the iPad and have been exploring ways to organize browsing activity to go beyond treating tabs as ephemeral state. It's not quite there yet, but it would be great to hear what you think once I have more to share!

Looking forward to it!

Tree Style Tabs browser extension sounds relevant to what you're describing https://github.com/piroor/treestyletab

Interesting points regarding the internal mental model!

I agree, for effective information retrieval we (as humans) need to remember the context/ mindset where/when we consumed the knowledge.

I haven't really thought about this problem. It is definitely something to ponder on.

Could I add a further constraint? These are really interesting ideas; I for one would be prepared to contribute to a project developing them.

Given the amount of investment in these systems, they need to be sufficiently future-proofed to be useful 10 to 20 years from now.

That's why we need first and foremost protocols and ecosystems that prioritize Permanence, and changeability and adaptability without losing continuity. A new Kernel for the Decentralized Web that also guarantees interoperability

> Insight 3: Knowledge discoverability is a problem. Say your KMS is 5-10 years old with thousands of notes/ pieces of information, how do you sort through it?

I think this is one of the biggest problems in the space right now. We have smart inputs but not smart outputs.

Best case scenario, I would like to be recommended knowledge based on the context of what I was working on or trying to achieve at any given moment.

and smart People Centered Architectures emergent HyperMaps of Meaning @TrailMarks

I'm really glad you put a lot of focus into automatically tracking the things the user is likely interested in. I tried using Obsidian, but it felt like I was spending more effort remembering to save information and create proper back-links than I was actually retaining any information.

I've recently started working on something similar as an excuse to learn machine learning, but it's still mostly vaporware outside the Firefox extension I wrote. I think that by saving some basic metadata (when a page was viewed, what browser was used to view it), and using ML to judge how similar the contents of one page are to another, it should be able to automatically create links between related information. Ideally, it'd be able to handle information outside the browser. For example, if a log file is saved, then a web page is viewed with similar contents to the log file, it would be able to detect that the web page is probably a reference for the log file.

Like I said, it's mostly vaporware, but I think that products like these are going to be the future of collaboration tools.

Vaporware is better than going nowhere! (Get it...noware...haha).

Congrats on getting started.

I agree with Obsidian - I think that most people forget the maintenance time it takes to build a lifelong Knowledge Management System.

I like your idea - document similarity is a well known area in ML.

Feel free to take my Chrome Extension and use the parts where it tracks key paragraphs in an article (using a user's click/ hover/ attention behaviour) and use that as the corpus for your ML similarity models.

Intuitively it makes more sense to run document similarity on key points/ paragraphs than the whole web page.
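
As a toy illustration of what "document similarity on key paragraphs" could look like, here is a TypeScript sketch using cosine similarity over plain word counts. This is a deliberately simple baseline swapped in for illustration, not what either project does; a real system would more likely use TF-IDF weighting or learned embeddings. The function names are made up.

```typescript
// Count how often each lowercase word occurs in a piece of text.
function termFrequencies(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const word of words) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between the word-count vectors of two texts.
// Returns a value in [0, 1]; 0 when either text has no words.
function cosineSimilarity(a: string, b: string): number {
  const freqA = termFrequencies(a);
  const freqB = termFrequencies(b);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  freqA.forEach((count, word) => {
    dot += count * (freqB.get(word) ?? 0);
    normA += count * count;
  });
  freqB.forEach((count) => {
    normB += count * count;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Example: two highlighted paragraphs that share two of their three words.
const paragraphSimilarity = cosineSimilarity("big data pipelines", "big data models");
```

Running this on a handful of highlighted paragraphs per page keeps the vectors small, which is one practical argument for comparing key paragraphs instead of whole pages.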

If you want the whole web page though, there's code in the Chrome Extension that uses Mozilla's readability lib (https://github.com/mozilla/readability) to purify the web content.

Thanks for the tip on the readability library. I don't have much experience with webdev, so my extension was just saving a local copy of whatever was returned every time the browser made a request. I should be able to cut down on storage space if I can use the readability library to skip saving things like trackers and images.

I'm super interested in this area, with my own vaporware attempt at building it (fully abandoned, unlike yours).

I'm an ML engineer focused on NLP applications. Contact info in my profile if you ever want to chat, e.g. about different approaches for estimating document similarity.

This is unreadable for me, and I'd welcome any suggestions as to what I can do to my setup to make it readable by default. To see why, here's a section of screenshot:


Unreadably light grey on white.

All suggestions welcome ... TIA.

You can use reader mode on your browser. That should work to improve the contrast. You can customize it to your liking.


My first thought was it's because of adblocking. Turned out it's not.

If you're using Firefox, try View/PageStyle/No Style.

I am using Firefox ... where or what is "View"?

It's the main menu. It's invisible now until you press Alt.

OK, got it. Not ideal, but at least that makes it readable ... thanks!

Apologies for that Colin! I'm using Squarespace for my personal blog.

I'll investigate further to make it more readable.

EDIT: spelling

Thanks for the response ... it might be fine for people with better eyes, or better monitors, but as you can see from the screenshot, the contrast is very low for me.

Love the work, I've saved the link into my system and will be reading it more carefully later. If you'd care to send me an email I'll send you a link to some draft thoughts I've had about knowledge systems. No obligation, obviously.


This is really fantastic. I've played around with Obsidian periodically, but it's difficult to groom and maintain. This approach seems more manageable.

I'd like to start using this, but at the same time you likely won't be maintaining it.

Final point: we really need a memex/knowledge graph ecosystem with easily interoperable components (e.g. browser extensions, book highlighting, etc that can all feed into one of many viewers).

I actually feel the opposite about Obsidian: finally a note app that feels native with powerful keyboard/action shortcuts, vim navigation and great categorization (graphs, references, tags).

We use Notion (Electron) at work and it has been extremely painful to navigate, organize and maintain. I think Electron (OP's project as well) is not the right choice, as it's just awful for native interactions, which are really important for a note-taking app.

I probably won't be maintaining this unfortunately :-(

You have a good point about interoperable components!

I was worried about yet another note taking app post on HN but I have to admit the Chrome Extension/Inbox part is a fresh idea and really intriguing.

Thanks man! I really appreciate it :-)

There is also https://getmemex.com/ which runs as a chrome extension.

My old personal KMS had a feature to load the local chrome history and display my activity on the daily timeline. I was mostly interested in understanding my day, recalling and to revisit.
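
For anyone curious, the daily timeline part is mostly a group-by. A small TypeScript sketch, assuming the history has already been loaded into an array of visits; the `Visit` shape and `groupByDay` are made-up names, and reading Chrome's actual history store is not shown.

```typescript
// One page visit with its time in epoch milliseconds (assumed shape).
interface Visit {
  url: string;
  visitedAtMs: number;
}

// Group visits by calendar day, oldest day first, visits in time order within a day.
// Days are UTC dates ("YYYY-MM-DD"); a real timeline would likely use local time.
function groupByDay(visits: Visit[]): Map<string, Visit[]> {
  const days = new Map<string, Visit[]>();
  const ordered = visits.slice().sort((a, b) => a.visitedAtMs - b.visitedAtMs);
  for (const visit of ordered) {
    const day = new Date(visit.visitedAtMs).toISOString().slice(0, 10);
    const bucket = days.get(day);
    if (bucket === undefined) {
      days.set(day, [visit]);
    } else {
      bucket.push(visit);
    }
  }
  return days;
}
```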

Great experiment you have here, thanks for sharing!

Thanks! I appreciate the encouragement :-)

Have you talked to Andrew Louis about his Memex?


Definitely someone you should reach out to in this space if you're interested in seeing other approaches.

This is interesting. We at OrgPad.com try to do something a bit different but still in the related/ almost same space - we want to make a system usable for normal people, not for mostly highly educated and concentrated hackers/ IT professionals. We don't do any activity tracking, you have to be deliberate, what to write down and what not. We also currently have no way to evaluate information in the units e.g. for some kind of hierarchy or automatic sorting. These features will eventually come, but we are not there yet. As I said, our users are mostly people with expertise in other fields than IT, like teachers, students or managers.

OrgPad.com is a SaaS tool that you can use for free. It tries to do away with as much hassle/ non-sense as possible by really focusing on information and relationships between those. You can connect units as you want, you can put into units what you want. The layout is topologically stable, but the absolute position can change slightly e.g. if you open up a unit which has a large picture or something. This is a completely new algorithm that understands a unit's area and therefore knows if a link is crossing it or not. A killer feature is being able to do "a path through the graph" which is basically a presentation/ slideshow of sorts. The nice thing is, you don't have to transform your knowledge to a different format to be able to present it. We have found that normal users are a lot quicker creating such a presentation compared to e.g. Microsoft PowerPoint. Oh, and of course you can easily collaborate on one OrgPage with multiple people e.g. by sending them a link for editing or adding them to your team. If you want to just have a look and not create a login: have a look at some of the public OrgPages https://orgpad.com/list

Currently, we are writing our own editor, that should be much simpler than the current one and therefore integrate much better with the whole concept. When it is done, editing on mobile will also be possible. Mobile is currently read-only, but you can at least upload photos/ videos and sort them later, when you are at a computer which is a big help for us and our users. The whole thing is developed in Clojure/ClojureScript so the idea of simplicity really was an inspiration.

I believe knowledge discoverability can only be solved with NLP. Parse every article a user reads, extract topics, keywords and key concepts, and store them in whatever your storage is, in a searchable format.
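
A bare-bones TypeScript sketch of that pipeline: pull the most frequent non-trivial words out of each article and keep an inverted index from keyword to URLs. Frequency counting stands in for real NLP here (no topic modelling or concept extraction), and the stop-word list, `extractKeywords`, and `KeywordIndex` are all made up for illustration.

```typescript
// A tiny, illustrative stop-word list; a real one would be much longer.
const STOP_WORDS = new Set<string>(["the", "and", "for", "with", "that", "this", "are", "was"]);

// Return up to `limit` keywords, most frequent first (ties broken alphabetically).
function extractKeywords(text: string, limit: number): string[] {
  const counts = new Map<string, number>();
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const word of words) {
    if (word.length < 3 || STOP_WORDS.has(word)) continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  const ranked: Array<[string, number]> = [];
  counts.forEach((count, word) => ranked.push([word, count]));
  ranked.sort((x, y) => y[1] - x[1] || (x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : 0));
  return ranked.slice(0, limit).map((pair) => pair[0]);
}

// Inverted index: keyword -> URLs of articles where it is a top keyword.
class KeywordIndex {
  private postings = new Map<string, string[]>();

  add(url: string, text: string, limit: number = 5): void {
    for (const keyword of extractKeywords(text, limit)) {
      const urls = this.postings.get(keyword) ?? [];
      if (urls.indexOf(url) === -1) urls.push(url);
      this.postings.set(keyword, urls);
    }
  }

  // Case-insensitive lookup; returns a copy so callers cannot mutate the index.
  search(keyword: string): string[] {
    return (this.postings.get(keyword.toLowerCase()) ?? []).slice();
  }
}
```

The same structure maps naturally onto a document store: each keyword list can be saved alongside the article record and queried later.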

OOT, in Indonesia, if you replace x with k, it's a slang for female genitalia. A very strong word you don't want to say out loud at work.


The memex (originally coined "at random",[1] though sometimes said to be a portmanteau of "memory" and "index"[2]) is the name of the hypothetical proto-hypertext system that Vannevar Bush described in his 1945 The Atlantic Monthly article "As We May Think".

That cat's been out of the bag for at least 75 years.

Indonesians will just have to respond in kind with their own package, as coq was en revanche for bit.

    Il était une bergère
    Et ron, ron, ron, petit patapon
    Qui saisit formulaires
    Du Coquand pour pouvoir
    (Ron, ron)
    Verifier son chaton.

Haha! Talk about learning something new every day!

Can confirm

The 'technologies used' section should've mentioned 'antd'


Very cool! Hope you might consider adding a Firefox extension too

there is another Memex project with a Firefox extension.


Awesome open source project as well!!

Good find.

Hello steve, I like how you are integrating pytorch and other ML models in your project. Looking forward to it and I wish you the best. Do you have a twitter profile or a mailing list? I found only your email and LinkedIn profile on the website. I want to be updated on this project.


Very cool, even if I hate electron. :P

It was my first time playing around with Electron! Overall pretty decent experience.

What makes you hate it if you don't mind me asking?

Here's a pretty good HN thread that highlights some criticisms of Electron: https://news.ycombinator.com/item?id=18733989

Thanks, reading it now. Good points on memory usage and latency.

Not OP, but my main gripe is that it fails to integrate with native environments, and for an efficiency-oriented note-taking app, losing the efficiency edge that native environments provide is just counter-productive.
