I’ve open sourced all the code + written down some notes on my insights/ architecture. Just a warning, the code leaves much to be desired as this was a mini project over 2-3 weekends.
At the moment, I have 3 key layers of the application.
The first is a Chrome Extension that handles data collection. It tracks data such as attention, time on page, and scroll, click and hover behaviour, then sends this data to the Electron app.
The Electron app receives this data through an API and saves it locally using PouchDB (the data structure is compatible with Apache CouchDB, which allows easy cloud storage).
Within the Electron app, I use an Express/NodeJS web server to expose endpoints that serve the front end and receive data from the Chrome Extension.
For the front end I use ReactJS.
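To make the flow above a bit more concrete (extension sends tracking events, the app validates and stores them as documents), here is a minimal TypeScript sketch. It is not taken from the project: `TrackingEvent`, `LocalStore` and `receiveEvent` are hypothetical names, the in-memory array only stands in for PouchDB, and `receiveEvent` only approximates what an Express route handler might do with a request body.

```typescript
// Hypothetical shape of one event sent by the browser extension.
type Kind = "click" | "hover" | "scroll" | "attention";
type TrackingEvent = { url: string; kind: Kind; timestamp: number; text?: string };

// Stored document; the `_id` field mirrors the CouchDB/PouchDB convention.
type StoredEvent = TrackingEvent & { _id: string };

function isKind(x: unknown): x is Kind {
  return x === "click" || x === "hover" || x === "scroll" || x === "attention";
}

// In-memory stand-in for the local database (an assumption, not PouchDB itself).
class LocalStore {
  private docs: StoredEvent[] = [];

  put(doc: StoredEvent): void {
    if (this.docs.some((d) => d._id === doc._id)) {
      throw new Error("conflict: " + doc._id + " already exists");
    }
    this.docs.push(doc);
  }

  byUrl(url: string): StoredEvent[] {
    return this.docs.filter((d) => d.url === url);
  }
}

// Validate an untrusted request body, then save it as a document.
// `seq` is a caller-supplied counter used to keep ids unique.
function receiveEvent(store: LocalStore, body: unknown, seq: number): StoredEvent {
  const e = body as Partial<TrackingEvent> | null;
  if (!e || typeof e.url !== "string" || !isKind(e.kind) || typeof e.timestamp !== "number") {
    throw new Error("invalid event");
  }
  const doc: StoredEvent = {
    _id: e.timestamp + "-" + seq,
    url: e.url,
    kind: e.kind,
    timestamp: e.timestamp,
  };
  if (typeof e.text === "string") doc.text = e.text;
  store.put(doc);
  return doc;
}
```

Keeping validation in one function means the same check can sit behind whatever transport the extension uses to reach the app.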
If you’re interested, please read the full article! There are lots of videos of the application in action.
Enduring, Open, Co-evolvable, Bootstrappable, metadesignable, promoting edge-user autonomy, unenclosable collaboration and much more
For example, if I’m reading an article about big data/ ETL pipelines and then I click on a link from within the article to a resource on machine learning, this relationship should be tracked and digested.
Unfortunately a new baby has drained all my time for such pursuits.
Hope you develop your concept further!
So right now, the Chrome Extension tracks all link clicks as well so the visualisation actually builds relationships between articles/ blogs/ anything else you read on the internet.
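As a rough idea of how tracked link clicks could turn into relationships between pages, here is a small TypeScript sketch. The `LinkClick` and `Edge` shapes and both function names are assumptions made for illustration, not the extension's actual data model.

```typescript
// Hypothetical record of one link click: the page it happened on and the page it led to.
type LinkClick = { from: string; to: string };

// A directed edge between two pages, weighted by how often the link was followed.
type Edge = { from: string; to: string; count: number };

// Collapse repeated clicks into weighted, directed edges.
function buildEdges(clicks: LinkClick[]): Edge[] {
  const edges: Edge[] = [];
  for (const c of clicks) {
    if (c.from === c.to) continue; // ignore self-links such as in-page anchors
    const existing = edges.filter((e) => e.from === c.from && e.to === c.to)[0];
    if (existing) existing.count += 1;
    else edges.push({ from: c.from, to: c.to, count: 1 });
  }
  return edges;
}

// Pages one click away from `url`, in either direction, without duplicates.
function neighbours(edges: Edge[], url: string): string[] {
  const out: string[] = [];
  for (const e of edges) {
    const other = e.from === url ? e.to : e.to === url ? e.from : null;
    if (other !== null && out.indexOf(other) === -1) out.push(other);
  }
  return out;
}
```

The edge weight gives a visualisation something to work with, for example drawing frequently followed links more prominently.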
I also had another idea: once we build our Knowledge Maps, we should be able to compare and share them with one another.
For instance, I'd love to see what other software engineers around my age/experience are reading and the insights they are drawing from articles.
Obviously some sort of security/ privacy mechanism will need to be implemented as well.
I'm interested in understanding how you eventually translate to Graph4j and how you do authentication. Where can I look for this?
This is really, really cool that you open sourced this. Thank you.
The visualisation library I'm using is by Ant Design (Alibaba).
You can see some examples below from the documentation.
Re syncing with real CouchDB - I haven't implemented that yet, but from what I've read, it's definitely possible with PouchDB.
Re I have a bunch of Express APIs that expose PouchDB so I can do basic CRUD operations such as creating resources etc.
One of my early research papers was about a tab reorganisation UI that tracked the links you clicked to reorganise your tabs to follow your train of thought. In most cases, the flat organisation is the worst, whereas if you follow my pattern of clicking links and moving to different tabs over time, you're halfway to describing (in a way that I can pick back up) my stream of consciousness.
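A minimal sketch of the general idea (not the paper's actual method): if each tab remembers which tab's link opened it, the flat tab strip can be turned into an indented outline that follows the click trail. The `Tab` shape and its `openerId` field are hypothetical.

```typescript
// Hypothetical tab record: `openerId` is the tab whose link opened this one,
// or null for a tab that was opened by hand.
type Tab = { id: number; title: string; openerId: number | null };

// Depth-first walk that turns the flat tab list into an indented outline.
function outline(tabs: Tab[]): string[] {
  const ids = tabs.map((t) => t.id);
  // A tab whose opener has since been closed is treated as a top-level tab.
  const parentOf = (t: Tab): number | null =>
    t.openerId !== null && ids.indexOf(t.openerId) !== -1 ? t.openerId : null;

  const lines: string[] = [];
  const visit = (parentId: number | null, depth: number): void => {
    for (const tab of tabs) {
      if (parentOf(tab) === parentId) {
        // Array(n + 1).join(s) repeats s n times: two spaces per nesting level.
        lines.push(Array(depth + 1).join("  ") + tab.title);
        visit(tab.id, depth + 1);
      }
    }
  };
  visit(null, 0);
  return lines;
}
```

Siblings stay in their original tab order, so the outline still reads roughly chronologically within each branch.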
Didn't get too far, for something like this to work it would have to be well integrated with the browser - and with new privacy restrictions, you'd have to end up recompiling a new browser to actually provide enough functionality.
Have you thought about ease of resuming where you left off? Biggest problem for me with Memexes isn't administration (even though it is huge and exponential as you say), it's that the longer it's been since I documented something, the harder it is for me to get back into the same mindset as I did then, with all the pieces still intact and connected.
Seems to me that finding a good representation of the internal mental model will help get over this.
Here's what I could find from my old folders - https://drive.google.com/file/d/1W4nxW9GaQXybdX4zKqVsdaE7unI...
It's not much - and I remember getting a little further, but I must've lost the file - but hopefully it eases the discovery of prior art if you end up going down this path.
If you do, would love to hear from you to help or share thoughts!
For context, I'm working on a browser and note taking app for the iPad and have been exploring ways to organize browsing activity to go beyond treating tabs as ephemeral state. It's not quite there yet, but it would be great to hear what you think once I have more to share!
I agree, for effective information retrieval we (as humans) need to remember the context/ mindset where/when we consumed the knowledge.
I haven't really thought about this problem. It is definitely something to ponder on.
Given the amount of investment in these systems, they need to be sufficiently future-proofed to be useful 10 to 20 years from now.
I think this is one of the biggest problems in the space right now. We have smart inputs but not smart outputs.
Best case scenario, I would like to be recommended knowledge based on the context of what I was working on or trying to achieve at any given moment.
I've recently started working on something similar as an excuse to learn machine learning, but it's still mostly vaporware outside the Firefox extension I wrote. I think that by saving some basic metadata (when a page was viewed, what browser was used to view it), and using ML to judge how similar the contents of one page are to another, it should be able to automatically create links between related information. Ideally, it'd be able to handle information outside the browser. For example, if a log file is saved, then a web page is viewed with similar contents to the log file, it would be able to detect that the web page is probably a reference for the log file.
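For a sense of what "how similar is this page to that one" can look like before any real ML is involved, here is a small TypeScript sketch using cosine similarity over raw word counts. This is a deliberately simple stand-in, not the approach the extension uses; the function names are made up for the example, and a real system would more likely use TF-IDF weighting or learned embeddings.

```typescript
// Lowercase word counts for a piece of text.
function termFrequencies(text: string): Record<string, number> {
  // Object.create(null) avoids clashes with inherited keys such as "constructor".
  const counts: Record<string, number> = Object.create(null);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  for (const w of words) counts[w] = (counts[w] || 0) + 1;
  return counts;
}

// Cosine of the angle between the two word-count vectors:
// 0 when no words are shared, 1 when the word distributions are identical.
function cosineSimilarity(a: string, b: string): number {
  const fa = termFrequencies(a);
  const fb = termFrequencies(b);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const w of Object.keys(fa)) {
    normA += fa[w] * fa[w];
    if (w in fb) dot += fa[w] * fb[w];
  }
  for (const w of Object.keys(fb)) normB += fb[w] * fb[w];
  if (normA === 0 || normB === 0) return 0; // at least one text has no words
  return dot / Math.sqrt(normA * normB);
}
```

With something like this, a saved log line and a web page quoting the same error message score well above an unrelated page, which is the kind of signal that could drive automatic linking once a threshold is chosen.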
Like I said, it's mostly vaporware, but I think that products like these are going to be the future of collaboration tools.
Congrats on getting started.
I agree with Obsidian - I think that most people forget the maintenance time it takes to build a lifelong Knowledge Management System.
I like your idea - document similarity is a well known area in ML.
Feel free to take my Chrome Extension and use the parts where it tracks key paragraphs in an article (using a user's click/ hover/ attention behaviour) and use that as the corpus for your ML similarity models.
Intuitively it makes more sense to run document similarity on key points/ paragraphs than the whole web page.
If you want the whole web page though, there's code in the Chrome Extension that uses Mozilla's readability lib (https://github.com/mozilla/readability) to purify the web content.
I'm an ML engineer focused on NLP applications. Contact info in my profile if you ever want to chat, e.g. about different approaches for estimating document similarity.
Unreadably light grey on white.
All suggestions welcome ... TIA.
If you're using Firefox, try View/PageStyle/No Style.
I'll investigate further to make it more readable.
Love the work, I've saved the link into my system and will be reading it more carefully later. If you'd care to send me an email I'll send you a link to some draft thoughts I've had about knowledge systems. No obligation, obviously.
I'd like to start using this, but at the same time you likely won't be maintaining it.
Final point: we really need a memex/knowledge graph ecosystem with easily interoperable components (e.g. browser extensions, book highlighting, etc that can all feed into one of many viewers).
We use Notion (Electron) at work and it has been extremely painful to navigate, organize and maintain. I think Electron (used by OP's project as well) is not the right choice, as it's just awful for native interactions, which are really important for a note-taking app.
You have a good point about interoperable components!
My old personal KMS had a feature to load the local Chrome history and display my activity on a daily timeline. I was mostly interested in understanding my day, recalling and revisiting.
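A daily timeline like that boils down to grouping visits by calendar day. Here is a minimal TypeScript sketch under stated assumptions: the `Visit` shape and the millisecond timestamps are hypothetical, and it does not attempt to read the browser's actual history database, which has its own schema and timestamp format.

```typescript
// Hypothetical visit record; `visitedAt` is milliseconds since the Unix epoch.
type Visit = { url: string; visitedAt: number };

type Day = { day: string; urls: string[] };

// Calendar day in UTC, formatted as YYYY-MM-DD.
function dayKey(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10);
}

// Sort visits chronologically, then bucket consecutive visits that share a day.
function dailyTimeline(visits: Visit[]): Day[] {
  const sorted = visits.slice().sort((a, b) => a.visitedAt - b.visitedAt);
  const days: Day[] = [];
  for (const v of sorted) {
    const day = dayKey(v.visitedAt);
    const last = days[days.length - 1];
    if (last && last.day === day) last.urls.push(v.url);
    else days.push({ day: day, urls: [v.url] });
  }
  return days;
}
```

Days are cut at UTC midnight here for simplicity; a personal timeline would probably want the user's local time zone instead.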
Great experiment you have here, thanks for sharing!
Definitely someone you should reach out to in this space if you're interested in seeing other approaches.
OrgPad.com is a SaaS tool that you can use for free. It tries to do away with as much hassle/nonsense as possible by really focusing on information and the relationships between pieces of it. You can connect units as you want, and you can put into units what you want. The layout is topologically stable, but the absolute position can change slightly, e.g. if you open up a unit which has a large picture or something. This is a completely new algorithm that understands a unit's area and therefore knows whether a link is crossing it or not. A killer feature is being able to do "a path through the graph", which is basically a presentation/slideshow of sorts. The nice thing is, you don't have to transform your knowledge to a different format to be able to present it. We have found that normal users are a lot quicker creating such a presentation compared to e.g. Microsoft PowerPoint. Oh, and of course you can easily collaborate on one OrgPage with multiple people, e.g. by sending them a link for editing or adding them to your team.
If you want to just have a look and not create a login: have a look at some of the public OrgPages https://orgpad.com/list
Currently, we are writing our own editor, that should be much simpler than the current one and therefore integrate much better with the whole concept. When it is done, editing on mobile will also be possible. Mobile is currently read-only, but you can at least upload photos/ videos and sort them later, when you are at a computer which is a big help for us and our users. The whole thing is developed in Clojure/ClojureScript so the idea of simplicity really was an inspiration.
The memex (originally coined "at random", though sometimes said to be a portmanteau of "memory" and "index") is the name of the hypothetical proto-hypertext system that Vannevar Bush described in his 1945 The Atlantic Monthly article "As We May Think".
That cat's been out of the bag for at least 75 years.
There was a shepherdess
And ron, ron, ron, petit patapon
Who filled in forms
From Coquand so she could
Verify her kitten.
What makes you hate it if you don't mind me asking?