There were 3 main features of the Memex that we just don't have.
Annotation - The ability to mark up hypertext... yeah... HTML claims to do it, but you can't change a document once it's read only. You should be able to layer annotations on top of a document. HTML simply is not a Language for the Markup of Hypertext Documents.
[Edit/Extend] - Annotation is anything you could do with a printout, at minimum. Circle something, highlight a section, black something out, attach links, etc. It's not just tagging a file as a black box.
Trails - the ability to store a trail of annotations, linking up parts of two documents together, or more. Like a stack of bookmarks, except bookmarks can't anchor to part of a document.
Publication - One of the crucial features of the Memex was the ability to output a collection to allow sharing with others. It could spit out a copy of everything linked to a trail, all the files, all in one dump. Copyright holders would have a huge problem with Memex if it were implemented as originally shown.
I think it should be possible to build a proxy that records ALL local http(s) traffic, and thus record the sources of all documents you've seen... and then only keep those that you've tagged or linked to after a few months, purging the rest. A private Memex, but it couldn't even be an open source project, as copyright holders would work to squash it.
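For the plain-HTTP part, a recording forward proxy is surprisingly little code. A minimal Node.js sketch (my own naming throughout; recording HTTPS as well would require MITM certificate generation, which is exactly the legally fraught part, and is not shown here):

  // memex-proxy.js - log every plain-HTTP response that passes through.
  // Point the browser at http://localhost:8080 as its HTTP proxy.
  const http = require('http');
  const fs = require('fs');
  const path = require('path');

  const ARCHIVE_DIR = './memex-archive';          // hypothetical storage location
  fs.mkdirSync(ARCHIVE_DIR, { recursive: true });

  http.createServer((clientReq, clientRes) => {
    // A forward proxy receives the absolute URL in the request line.
    const url = new URL(clientReq.url);
    const upstream = http.request({
      host: url.hostname,
      port: url.port || 80,
      path: url.pathname + url.search,
      method: clientReq.method,
      headers: clientReq.headers,
    }, (upstreamRes) => {
      const chunks = [];
      upstreamRes.on('data', (c) => chunks.push(c));
      upstreamRes.on('end', () => {
        const body = Buffer.concat(chunks);
        // Keep a copy of everything seen; a later job could purge untagged entries.
        const name = Date.now() + '-' + encodeURIComponent(clientReq.url).slice(0, 120);
        fs.writeFile(path.join(ARCHIVE_DIR, name), body, () => {});
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        clientRes.end(body);
      });
    });
    clientReq.pipe(upstream);
  }).listen(8080);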
Oh, we did all that. I even wrote my PhD dissertation on it (2001). Back then, there were quite a few research projects working on what I then termed ‘Web Augmentation’. You could do it server-side, in special proxies, or in the browser through extensions or scripting. The earliest browser-based system I know of that allowed users to insert links and annotations into existing Web pages was DHM/WWW, developed by yours truly in 1997 – we even won best paper for it at Hypertext 1997. This was a specialised case of what was then known as Open Hypermedia, i.e., a hypermedia system that could integrate (more or less) arbitrary documents and systems through links and other structures.
Using a more general architecture, it was possible to add many kinds of structures on top of existing Web pages collaboratively. Navigational or spatial hypermedia, guided tours, etc. Link bases were stored outside of the linked resources, so that you could have different sets of structures applied to a corpus of documents.
There were also quite a few commercial attempts in this space. However, all, commercial and research alike, ultimately failed, and I believe that there were several reasons behind this: Web site owners disliked that others could annotate their pages; shared link bases often became polluted; bookmarks were ‘good enough’ for most users; maintaining link anchor consistency in (increasingly dynamic) Web pages became more and more difficult (despite our best efforts); and while I remain enthused about the notion of the MEMEX scholar carefully annotating their corpus of literature, I don’t think the appeal is all that widespread.
Still, there are remnants in some systems – e.g., I enjoy seeing that others have highlighted the same passage as I have in a Kindle book.
It's weird — we, the builders of HTTP-based APIs, very well understand the concept of "REpresentational State Transfer" ...but I've never seen a web-app server that actually accepts PUT or PATCH or POST of HTML Content-Type against the baked-out server-rendered HTML views, such that if you were to push a modified version of said representational state (the post-edit HTML), it would get "decompiled" on the backend into a delta of the internal state (e.g. Markdown in a CMS), and that change to the internal state would get persisted, reflecting back as a change in the representational state when the page is fetched.
If anyone would just actually implement that on a web-app backend, then you'd be able to "annotate, edit, and extend" HTML-presented content using freakin' NCSA Mosaic from 1993. Or the "edit in FrontPage" feature of IE4.
(No modern equivalents, though — the features that would generate a "native" browser PUT or PATCH against an HTML page have since shrivelled up and died, after it became clear that no server would ever bother to allow them.)
To dodge the chicken-and-egg problem of browser support, you could do this today, with a JavaScript SPA serving the browser role: the frontend app would ensure that it's reflecting all changes to its in-memory client-side internal state into changes to the HTML (even if those changes don't cause any presentation changes on the client side); and then, where a regular SPA would push those changes by calling an API endpoint that accepts a version of its in-memory client-side internal state, this SPA would just push the updated HTML, as a PUT or PATCH to the very same URL that was loaded to deliver the SPA.
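To make that concrete, here is a client-side sketch of the idea, assuming a hypothetical backend that accepts PUT of text/html at the page's own URL and "decompiles" it into internal state (no mainstream server does this today):

  // Push the post-edit DOM back to the URL it was loaded from.
  async function pushRepresentationalState() {
    const html = '<!DOCTYPE html>\n' + document.documentElement.outerHTML;
    const res = await fetch(window.location.href, {
      method: 'PUT',                              // or PATCH with some diff format
      headers: { 'Content-Type': 'text/html' },
      body: html,
    });
    if (!res.ok) throw new Error('server rejected edited representation: ' + res.status);
  }

  // e.g. after edits to contenteditable regions, debounced:
  let timer;
  document.body.addEventListener('input', () => {
    clearTimeout(timer);
    timer = setTimeout(pushRepresentationalState, 1000);
  });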
I'm not sure about others but I'd prefer my extended brain not live on other people's servers. The Memex always seemed to me to be a personal archive and notebook. You would pull in material from outside and store it locally along with your links and annotations and then only publish the work that you wanted to make public.
Someone at NCSA implemented annotations in Mosaic, but the bus number dropped to zero when they left, and the available evidence seems to be that none of us who came after the exodus were familiar with Memex. I think a coworker recommended Vannevar Bush to me around 2006. Boy was that a Tyler Durden moment.
I think what the world has needed basically since then is browser integration for something like annotations, so that web sites and commentary about them can live together. And with less of a direct line to advertising dollars. Perhaps something that could live in the Fediverse. Sometimes I want HN’s opinion and sometimes I want someone else’s.
> I think it should be possible to build a proxy that records ALL local http(s) traffic
There was a Microsoft Research project in the early 00’s called “Stuff I’ve Seen” that did basically this. I don’t know why it never really went anywhere - I’ve wanted one ever since I saw the talk.
You can annotate any upload (arbitrary data, files, or collections of files) with tags and since the data is permanent, your links don't break.
Publication is possible too, and with the Universal Data License, copyright holders can define the terms for their part of the publication right at upload time.
I would think that Apple has the level of integration that would make this possible, but then... where's the potential profit that would make it worth their while?
The need to adapt it for the average user is mentioned. The average person uses a PC (if ever - good luck with this on mobile) mostly for work (MS Office + some ERP) and in some cases for private uses (e.g. news, e-banking, mail, important administrative work). If you go a bit deeper, maybe Reddit and video games. An average user would never want to link around stuff on the web with a hundred arrows and multiple colors. He simply does not care.
The author and the old guy in the video he linked to behave almost cult-like, especially the old guy: he literally claims that this IS the best method for working with documents ever, that the www is a fork of his idea that was "dumbed down in the 70s at Brown University", he does not understand why it has not already taken off, and he thinks it's the most important feature for the human race. Really?
If people in spaces like journalism and academic research really saw potential in this, there would already be big programs out there.
Yes, it is good to have passion about something, and yes, it is good if someone has a real need for this and is delivered a solution, but this will never go mainstream. And in my opinion not even in the segment of technically skilled people like engineers.
This is the typical invention that fits the "I know it is the best thing, I love it and almost pressure people to use it, but it has not taken off in the slightest for decades" case.
The old guy in the video is Ted Nelson, the man who coined the term hypertext, made significant contributions to computer science, inspired two generations of researchers and continues to inspire as his works are being rediscovered.
There have been "big programs" but when the web came, fundamental hypertext research and development on other systems came to a grinding halt. Ted Nelson, and many other researchers, predicted many of the problems that we now face with the Web, notably broken links, copyright and payment as well as usability/user interface issues.
I don't know what an average user is, but what a user typically does or wants to do with a computer is somewhat (pre)determined by its design. Computer systems have, for better or worse, a strong influence on what we consider practical, what we think we need and even what we consider possible. (Programming languages have a similar effect.)
One of the key points of Ted Nelson's research is that much of the writing process is re-arranging, or recombining, individual pieces (text, images, ...) into a bigger whole. In some sense, hypertext provides support for fine-grained modularized writing. It provides mechanisms and structures for combination and recombination. But this requires a "common" hypertext structure that can be easily and conveniently viewed, manipulated and "shared" between applications. Because this form of editing is so fundamental, it should be part of an operating system and an easily accessible "affordance".
The Web is not designed for fine-grained editing and rearranging/recombining content, and it started as a compromise to get work done at CERN. For example, following a link is very easy and almost instantaneous, but creating a link is a whole different story, let alone making a collection of related web pages tied to specific inquiries, or even making a shorter version of a page with some details left out or augmented. Hypertext goes far deeper than this.
Although a bit dated, I recommend reading Ted Nelson's seminal ACM publication, in which he touches on many issues concerning writing, how we can manage different versions and combinations of a text body (or a series of documents), what the problems are and how they can be technically addressed.
> One of the key points of Ted Nelson's research is that much of the writing process is re-arranging, or recombining, individual pieces (text, images, ...) into a bigger whole. In some sense, hypertext provides support for fine-grained modularized writing. It provides mechanisms and structures for combination and recombination. But this requires a "common" hypertext structure that can be easily and conveniently viewed, manipulated and "shared" between applications. Because this form of editing is so fundamental, it should be part of an operating system and an easily accessible "affordance".
Here's where I'm stuck:
Hypertext - whether on the web or just on a local machine - can't solve the UX problem of this on its own, though. People can re-arrange contents in a hypertext doc, recombine pieces of it... but mostly through the same cut-and-paste way they'd do it in Microsoft Word 95.
The web adds an abstraction of "cut and paste just the link or tag that points to an external resource to embed, instead of making a fresh copy of the whole thing", but all that does is add in those new problems of stale links, etc.
So compared to a single-player Word doc, or even an "always copy by value" shared-Google-doc world that reduces the problems of dead external embeds, what does hypertext give me as a way of making rearranging things easier? Collapsible tags? But in a GUI editor, the ability to select and move individual nodes can be implemented regardless of the backend file format anyway.
TLDR: I haven't seen a compelling-to-me-in-2023 demo of how this system should work, doing things that Google Docs today can't while avoiding link-rot problems and such, to convince me that the issue is on the document-format side rather than the user-tools/interface side.
I have to catch some sleep, but I will address your questions as best I can later. In the meantime, you might want to take a look at how Xanadu addresses the problem of stale links, and maybe some of your other questions will be answered.
Also, I highly recommend reading Nelson's 1965 ACM paper I mentioned to better understand the problems hypertext tries to solve and the limitations of classical word processing (which also expands to Google Docs).
Thanks, though I think even in these docs there are some early on concepts I just don't find convincing. Such as in the Xanadu doc, the Software Philosophy short version being a recomposed copy of live text from the long version. If I'm following their goals, they want live cross-editing through their "transpointing windows" - I really really don't, personally. I picture three docs, A, B, and C which is a summary composite of things pulled from A & B - C will still need its own manual curation and updating if A and B are changed to remain legible, flowing, and meaningful, and I'd rather have a stale doc than a garbled one.
The intro of the Nelson paper/Memex discussion is similarly alien to me. I don't think it's human-shaped, at least not for me. The upkeep to use it properly seems like more work than I would get back in value. It's a little too artifact-focused and not process/behavior-focused, IMO?
>I picture three docs, A, B, and C which is a summary composite of things pulled from A & B - C will still need its own manual curation and updating if A and B are changed to remain legible, flowing, and meaningful, and I'd rather have a stale doc than a garbled one.
I think I see what you mean. Garbling, as you mention it, is actually what Xanadu is supposed to prevent. The problem is that it is not explicitly mentioned that a document/version (a collection of pointers to immutable content) should also be an addressable entity (a part of the "grand address space") and must not change once it has been published to the database.
In particular, if a link, e.g., a comment, is made to a text appearing in a document/version, the link must, in addition to the content addresses, also contain the address of that document/version (in Fig. 13 [1] that is clearly not the case, and I think that's a serious flaw).
This way, everything that document C refers to - and presents - is as it was at the time the composition was made. How revisions are managed is an orthogonal (and important) problem, but with the scheme above we lose no information about the provenance of a composition and can use that information for more diligent curation.
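Roughly, the difference looks like this (field names are my own, just for illustration, not Xanadu's actual format): a link that records only content addresses versus one that also records the immutable document/version it was made against.

  // Ambiguous: anchors only to content atoms. If those atoms are transcluded into
  // many versions, we no longer know which composition the comment was about.
  const fragileLink = {
    type: 'comment',
    target: { content: 'atom:42', start: 10, length: 80 },
    body: 'This claim needs a citation.',
  };

  // Better: the link also names the immutable document/version (itself an entry in
  // the grand address space), so the exact composition can always be re-presented.
  const robustLink = {
    type: 'comment',
    target: {
      version: 'version:A-rev3',                  // hypothetical version address
      span: { content: 'atom:42', start: 10, length: 80 },
    },
    body: 'This claim needs a citation.',
  };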
I understand. I think your last objection is very valid and perhaps needs far more consideration.
Anyway, I have limited computer access at the moment, but maybe you will find the following response, which I wrote in the meantime, useful. I'll get back to you.
---
Some remarks that hopefully answer your question:
The Memex was specifically designed as a supplement to memory. As Bush explains in lengthy detail, it is designed to support associative thinking. I think it's best to compare the Memex not to a writing device, but to a loom. A user would use a Memex to weave connections between records, recall findings by interactively following trails, and present them to others. Others understand our conclusions by retracing our thought patterns. In some sense, the Memex is a bit like a memory pantograph. Mathematically, what Bush did was construct a homomorphism to our brain.
I think it is important to keep that in mind when we construct machines like the Memex or try to understand many of the research efforts behind hypertext. Somewhere in Belinda Barnet's historical review, she mentions that hypertext is an attempt to capture the structure of thought.
What differentiates most word processing from a hypertext processor are the underlying structures and operations and, ultimately, how they are reflected in the user interface. The user interfaces and experiences may of course vary greatly in detail, but by and large the augmented (and missing!) core capabilities will be the same. For example, one can use a word processor to emulate hypertext activities via cut and paste, but the supporting structures for "loom-like" workflows are missing (meaning a bad user experience), and there will be a loss of information, because there are no connections, no explicitly recorded structure, that a user can interactively and instantly retrace (speed and effort matter greatly here!), since everything has been collapsed into a single body of text. The same goes for annotations, or side trails, which have no structure to be hung onto and have to be put into margins, footnotes or various other implicit structural encodings.
Hypertext links, at least as Ted Nelson conceptualizes them, are applicative. They are not embedded markup elements as in HTML. In Xanadu, a document (also called a version) is a collection of pointers and nothing more. The actual content (text, images) and the documents, i.e., the pointer collections, are stored in a database called the docuverse. Each piece of content is atomic and has a globally unique address. The solution to broken links is a bit radical: nothing may ever be deleted from the global database.
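As a rough sketch of that model (my own naming and JavaScript for illustration, not Xanadu's actual format): an append-only store of addressed atoms, plus documents that are nothing but ordered lists of spans pointing into it.

  // Append-only "docuverse": every piece of content gets a permanent address.
  const docuverse = new Map();                    // address -> immutable content
  let nextId = 0;
  function publish(content) {
    const address = 'atom:' + nextId++;           // stand-in for a global addressing scheme
    docuverse.set(address, content);              // never deleted, never overwritten
    return address;
  }

  // A document/version is only a collection of pointers (spans), nothing more.
  const a1 = publish('Hypertext links are applicative. ');
  const a2 = publish('They are not embedded markup elements.');
  const versionA = [
    { address: a1, start: 0, length: 33 },
    { address: a2, start: 0, length: 38 },
  ];

  // Presentation is entirely up to the viewer: resolve pointers and compose.
  function render(version) {
    return version
      .map(({ address, start, length }) => docuverse.get(address).slice(start, start + length))
      .join('');
  }
  console.log(render(versionA));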
In other hypertext systems, such as Hyper-G (later called Hyperwave), content may be deleted and link integrity is ensured by a centralized link server. (If I am not mistaken, the centralized nature of Hyper-G and the resulting scalability problems were its downfall when the Web came.)
Today we have plenty of reliable database technologies, and the tools for building reliable distributed systems are much, much better, so I think that a distributed, scalable version of Ted Nelson's docuverse scheme could be built, if there is enough interest.
How a document is composed and presented is entirely up to the viewer. A document is only a collection of pointers and does not contain any instructions for layout or presentation, though one can address this problem by linking to a layout document. However, the important point is that the processing of documents should be uniform. File formats such as PDF (and HTML!) are the exact opposite of this approach. I don't think that different formats for multimedia content can be entirely avoided, but when it comes to the composition of content there should be only one "format" (or at least very, very simple ones).
It's interesting how the author settles on plain old files as the data access paradigm. A file cannot be yanked from under your feet, like a link can. It remains where it is indefinitely until deleted, it does not change (so cannot be tampered with or damaged by the author), it normally is not subject to artificial access restrictions a la logins or DRM. It's an artifact that can actually be archived for the rest of your natural life.
I don't see any reason why the folders in question should necessarily be shared, though. I get the impression that as defined by Bush originally, the memex is first and foremost a memory expander for your private memory, and any sharing would take place infrequently, or on a narrow subset of data contained within.
What do you have available to you when the internet is down? (It was for me last week, for pretty much a whole day.) That's the point (or at least one of them) of personal control.
What do you have available to you after a hard drive failure? That's the point (or at least one of them) of having stuff in the cloud.
How do you balance the two? Local, personal storage backed up in the cloud?
> What do you have available to you when the internet is down?
Everything, because I pretty much started shunning everything I can't reduce to a bunch of files I can use anywhere, and am using the same folder structure on all machines. If an application has a portable version (and doesn't have configuration that may vary between machines, e.g. hardware sensors displayed in the tray), I always use the portable version, install it on one machine, and then sync that to the rest. I love it so much. I can open a Sublime Text project file anywhere and it just magically works, because the paths are the same.
The only reason I sync some things manually instead of using Syncthing for everything is that I don't want to needlessly cause more SSD wear. E.g. Firefox/Thunderbird constantly write and delete stuff in the profile, so if I want to use a profile on another machine, I close it, sync, and open it on the other machine.
My source of truth is completely in those files, including my web stuff. Which, admittedly, is rather simple: just PHP and JavaScript and MariaDB really, or sqlite where that's enough. I thought I liked golang, but after spending an hour yesterday trying to compile something I haven't touched in 5 years, I decided to just rewrite it in PHP - because something that is slow but doesn't require constant hand holding beats anything else for me. I want to be able to pick up where I left, regardless of whether 10 years passed. PHP and JavaScript are unsurpassed in my personal experience when it comes to that. Even the shittiest JS still works fine 20 years later, and when you use PHP with warnings turned on, upgrading to new versions tends to throw very few errors, and they're actually helpful, so making the changes is trivial.
I miss out on a lot of fancy things that way, but I don't miss any of them. I love thinking of my stuff as where it is in my "home cloud" folder structure, and that though they're not actually thin clients, the particular machines really don't matter. Of course I also have backups, but the plan is to never need them because I always have more than one "live" working environment, which reads from and writes to that shared set of files.
> What do you have available to you when the internet is down?
Certainly not everything at this stage, but in a properly run Memex system, IMO, that really should be everything. Assuming enough local storage, there is no reason not to save every page you view while browsing the web in full.
> What do you have available to you after a hard drive failure? That's the point (or at least one of them) of having stuff in the cloud.
The standard answer to that is to have good backups. Bite the bullet and spend some time and money on buying the proper equipment (a commercial NAS, a Raspberry Pi NAS with some USB-connected drives, a small form-factor PC with a couple of SSDs, anything you want) and integrating it into your other machines as a backup destination. Cloud makes it much easier, but also less private, with an ongoing monthly cost, and it adds an additional point of failure that's outside of your control.
As for how I balance it personally? Local storage all the way. Even if I picked an end-to-end encrypted backup service, several terabytes of data is just too expensive to store in the cloud.
We definitely need open source tools. I personally use Logseq and have a lot of notes. I want to use AI to better search and make connections, using something like ChromaDB. Logseq is mainly for text. What My Mind (mymind.com) does for "smart bookmarking" is really great. No folders, collections at most if you really need them. Making the most use of digital knowledge is something that really needs to be focused on. It can accelerate a lot of things. And we must stop doing this with yet another cloud note-taking app with OpenAI integration and a bunch of (personally) useless collaborative features. I want my own mind that works for me and in my interest.
First time I heard of My Mind. Looks really nice, but how is it not yet another cloud note taking app?
To me, the biggest problem of this kind of tool is that if you use it to its fullest potential, it necessarily becomes a crucial part of your daily life, something that will cause very serious problems if you lose it.
And that means anything proprietary or closed SaaS is an absolute no go. My Mind's promise to respect privacy and stay independent of outside influence is nice, but worthless. They can change their mind at any time - or get bought up, or go bankrupt. Being a paid service might reduce the likelihood, but does not eliminate it, especially not over a timespan of decades.
So at an absolute minimum, such a tool needs to be able and willing to export all of my data in an easily processible format. This is far more important than shiny features.
My current solution is a self hosted TiddlyWiki. It may not have all the shiny features, but it is mine, and always will be, even 50 years from now. The worst case scenario is that it stops working on new browsers and nobody maintains it anymore - but I still have the data. Easily processible text files are actually its internal storage format.
I think files are too limited to be the base for a memex. I've been managing a lot of my data recently in my custom-built "memex" and have found them a poor abstraction in many cases. An example is my music collection. I have 2000+ songs, for which I have music metadata, metadata of my own (added date, found source), and the music itself. Bundling this into the mp3 is not ideal. A JSON file per song is also not great - doable, but overwhelming at 2000+. My ideal is one JSON file for all of it, linking to the path of each mp3. The abstraction I've made allows you to pick arbitrary positions in the data "tree" at which to break from a file-tree abstraction to an intra-file data format (JSON, YAML, etc...).
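Concretely, that "one JSON file linking out to the mp3 paths" layout might look something like this (illustrative field names only, not my actual schema):

  // music/library.json - all metadata in one place, audio stays as plain files.
  const library = {
    songs: [
      {
        path: 'music/files/artist - title.mp3',   // link into the ordinary file tree
        tags: { artist: 'Artist', title: 'Title', album: 'Album' },
        personal: { added: '2023-04-01', source: "friend's playlist" },
      },
      // ...2000+ more entries, all searchable in a single document
    ],
  };

  require('fs').writeFileSync('music/library.json', JSON.stringify(library, null, 2));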
I also think the interoperability argument breaks down when you want to do anything off the beaten path. The author mentions how you might have a folder with a presentation or document. But attaching any kind of nonstandard data to the format is effectively impossible, since it requires every client that parses the format to know about it. You're limited by what each format supports and by the total number of available, mature formats with mature parsers/editors.
My system is basically to emphasize easy rebuild of a database. So I have directories that contain a text file that can be easily parsed to build databases or process workflows. I have found this works extremely well even in draconian IT environments.
That sounds nice. I've diverged a bit from actual files on disk (even though I wanted my system to stay there for simplicity) into an abstract tree structure with an fs-like API. Though you can get back to plain files by just using a single FSTree (in my system). I would love to hear more about the structure of those plain text files you use for regeneration.
I think it's very important to have the ability to work directly with files, both for the draconian environments you describe and because it's the simplest, most stable, most portable, least-time-to-_something_ abstraction. It just has to be balanced with more complex functionality that files alone can't handle.
What's overwhelming about the JSON per file? You need to fill in all that data either way, whether you divide it up into a JSON per mp3 or a single mega JSON.
Well, with a single JSON file I can pull it into my text editor (Sublime) and do find-and-replace (and it's a little nicer than find-in-all-files), and I can do things like code block folding, and probably other things I'm forgetting. Also, I'm not sure how to `jq` across 2000 files, but I know how to with one JSON file. I've also heard that the filesystem is not a great place to store large numbers of files - say, 500k records?
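For what it's worth, collapsing a folder of per-song JSON files back into one document is only a few lines of Node (paths here are assumed), so either layout can still be fed to jq or a text editor as a single file:

  const fs = require('fs');
  const path = require('path');

  const dir = 'music/metadata';                   // hypothetical one-JSON-per-song folder
  const merged = fs.readdirSync(dir)
    .filter((name) => name.endsWith('.json'))
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), 'utf8')));

  fs.writeFileSync('music/library.json', JSON.stringify(merged, null, 2));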
My "memex" tries to use well-known abstractions so that I can do certain tasks in other programs until my "memex" can do it. That and because for quite a few operations, I'd like to simply mount the window of a separate program into my "memex" as a sort of window manager (but deeply integrated with the other stuff the "memex" does, like data and tab management) instead of trying to remake such monolithic, decently-built programs
I usually prefer a Google Doc shared among the team to work on a given problem or feature instead of the ever-popular Miro. Diagrams are nice, yes, I agree, but they are not a substitute for actual text. Engineers think that because they draw boxes and lines and add little notes here and there, the design is Done. Product managers usually like this approach too, because “we don’t have time to read a whole page of text”.
Drives me nuts.
"A picture is worth a thousand words" or some such.
That said, I've certainly heard that different people prefer different ways of learning/consuming knowledge. I myself will choose video for some things, illustrations/graphs/images for other things, and text/tables for yet others. I do have a fondness for well done images, though.
Even the style of text can have an impact. Bullet points for providing instructions, instead of a long paragraph of sprawling text.
Another thing I picked up some 15 years ago is borrowed from newspapers: the concept of "above the fold"; get the main points out there near the top, and keep it concise.
A well-done graph can communicate data and the desired message quickly. At the opposite end of the spectrum, we have poorly done or deceptive graphs.
An illustrated magazine will communicate a story differently, compared to a text-only novel. There's nuance to each approach that the other can't quite capture.
A photograph or painting can elicit a different level (breadth, depth) of mood/emotion that words might merely hint at.
Of course, what's useful in one context could be pointless in another. An introduction to photography would do well to include some illustrations, whereas a book on meditation could probably do fine with few, if any, images. Another example is Gray's Anatomy, which would probably be more challenging to consume if there were no illustrations.
Some subjects, such as learning a martial art, can be helped by using pictures/illustrations, but realistically, when used as the only source of knowledge, they fall short of adequately communicating proficiency in said martial art; only in-person instruction would really suffice.
The import from URL feature on Google docs is perfect for importing mermaid.live diagrams to accompany the text. A mermaid.live URL contains the complete diagram state, which allows anyone to follow the original URL, make modifications and update the Google doc.
That's a hard sell for those who work at companies that prohibit dependencies on free 3rd-party web services, since that would mean discussing business-confidential ideas in an insecure way.
The big problem with files is that they are linear. Before computers, the standard interface for people's work was the sheet of paper, a two-dimensional surface. In fact, it was even better, because you can organize papers into books or collections, which gives a third dimension. Files, being linear, make data difficult to parse and analyze, which is always the problem we need to solve with computers. This problem is much easier to solve when we can work in three dimensions and organize data spatially (this is the guiding principle of charts and tables).
I disagree. The linearity is super helpful for scanning and finding quickly, by time, by name, etc. I'd argue that any app that makes the interface "spatial first" will have to add these linear features on top of what they're doing in order to find things fast, at a minimum for full-text search.
I do think spatial organization can help for human consumption of not-easily-orderable datasets of >1000 items, but I think that's a niche case for everything I've interacted with so far.
>I do think spatial organization can help for human consumption of not-easily-orderable datasets of >1000 items, but I think that's a niche case for everything I've interacted with so far.
Well, when the data gets above 1000 items, you have to start filtering, finding, or revisualizing it to be useful. You can no longer keep the whole dataset comfortably in your head, or even on screen.
For revisualizing, it might be comfortable to plot it on a 2D map or to simply graph an interesting scalar.
For revisualizing in a 3D spatial setting... I haven't seen any actually good 3D spatial layout of data. Disregarding "experiences" (like art galleries, which lay out data in a way that promotes interest in a topic and fractal-like amounts of subtopics, but are human-curated, like Saganworks [1]), the only one I can think of is the high-level "portal" to categories of data. Like "Documents", "Downloads", etc., but think 500+ categories of more granular entity types. Everything is non-comparable apples and oranges (except perhaps shared common data like timestamps), and there's no way to quickly scan it (5 or so seconds to find something alphabetically, perhaps), so spatial might help you quickly locate your entity type with muscle memory. I haven't seen another use case, but would love to explore what others are doing in this space!
I meant specifically what organization did you use for the "not easily orderable" data. Or was I wrong in assuming you meant simply a lack of (natural-number-like) order and didn't mean a total lack of clear classifiability?
Ohh, the only one I have found was the one I mention in the last paragraph. Large number of categories. Everything else I've found has at least one good way to order it
I meant an organization scheme, not any particular organization, if that was unclear. That app looks like some kind of 3D world/metaverse to me; I will study it more later.
I'm confused. Do you mean what particular organization algorithm I would use to organize unorganizable data? Not just the type of visualization it would use (i.e., 2D list vs. 3D spatial)?
I was just clarifying and asking whether you understood the word "organization" to mean some particular company or organization, like that Sagan thing, or in my intended sense of an organization scheme/visualization technique in general.
I agree, version control is absolutely necessary when you are copying documents from other sources and modifying them. You need to know if the document has changed upstream since you cloned it and you need to be able to see the diff of the changes you made.
Check the original article on the memex in Life magazine. Something like 25% of the article's real estate is ads; it's interrupted by five full-page ads and followed by another two full-page ads. Prophetic. It's actually difficult to extract the article itself while reading.
The key connecting element of a lot of the examples of "Memex" is the web platform - http, html, css, js, etc. - and that it is both a document and app delivery system. That is the secret that has made it work. All the examples of file sharing still have a http interface to connect to.
As he touches on, the missing, or at least somewhat disjointed, element is local data and connecting to it from other web destinations. My hope, and belief, is that we are breaking down those local <-> server barriers with newer JS APIs for better file-like local storage.
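One of those newer APIs is the File System Access API (currently Chromium-only): a page can ask for a local folder and read and write files in it directly, which is roughly the shape of the local half I mean. A minimal sketch, with an assumed file name:

  async function openLocalArchive() {
    // Prompts the user to pick a directory and grant the page access to it.
    const dir = await window.showDirectoryPicker();

    // Read (or create) a note file in that folder - 'notes.md' is just an example.
    const handle = await dir.getFileHandle('notes.md', { create: true });
    const file = await handle.getFile();
    const existing = await file.text();

    // Append an annotation and write it back to disk.
    const writable = await handle.createWritable();
    await writable.write(existing + '\n- new annotation');
    await writable.close();
  }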
It's just a pity in a way that the web browser and web server were not combined into one element from the start. It's still too hard to make content available on the network from your personal device without a server.
His conclusion about the need to cognitively organise such trails/collections, though, could well change in the coming years - and potentially sooner. It's clear that LLMs are good at organising and summarising text and data. I'm expecting us to be using tools that will ingest all our work activity (documents, meetings, email, calls, browsing) and automatically curate and file it, with a chat-based UI to extract that data. We just desperately need that to be local and not server-based.
At the risk of diverting this into yet another discussion of note taking apps...
I'm using Notion. It's OK, but I'd like to have a way to snapshot documents to incorporate into my notes. And lately I feel like I'm paying for all their new AI stuff that I just don't use.
So help me HN: what's a good open-source, self-hostable replacement with decent desktop/mobile tooling and a practicable migration path from a Notion export/backup?
I'm not sure what you mean by snapshots, and it's not open source -- but if you haven't tried it yet, check out Obsidian. "Vaults" are self-contained, just markdown files in folders, with an extra folder for the Obsidian stuff, i.e. all settings and state and even the plugins. Plays great with SyncThing etc. And there are so many great community plugins, which is where the real magic of it is for me.
I really do like the data model of obsidian. For the most part the data (markdown files) and the meta data (the .obsidian directory) stay out of each other's way.
I tried Obsidian again recently and their editor caught up to Typora's (and it was faster than the last time I used it). I'm trying to switch to it now, as the tree is more "file-y" than Typora (which removes blind spots for me, as I save PDFs and other stuff in my note tree).
I've resorted to writing my own now, though. Notion, Typora, and Obsidian are great for one specific type of management (documents, markdown-like), but I think there are higher-level abstractions here.
In my humble opinion these tools aren't quite there yet, which is not to say they aren't polished, but that they don't disappear from view like good tools should.
There's still a noticeable impedance mismatch due to having to use their editor, for example, which has its quirks and is different enough to a standard text editor to cause a context switch.
The physical desk surface as a clean interface motif hiding all the internal complexity is a big part of the Memex that people tend to skip over in the conversation about how hyperlinks solved this all years ago. The use of windows as an approximation of physical depth perception, allowing something to be 'on top', is still only a very limited implementation of what is digitally possible.
Congrats, you just reinvented the early web and made no improvements. You just have an index of documents with no hyperlinks. Even Notion is a step up from this, and it's nowhere close to having the features of a memex. I'm always amazed that people completely miss the entire point of the trail feature.
> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something. [0]
I’m personally at a stage of disillusionment with the modern web such that I think returning to the early ideas without improvement is an improvement over the status quo.
It may be true that what the author describes is rudimentary, but to me, that’s kind of the point.
This isn’t to say that better tools aren’t needed. But I’m fully on board with what I see as a movement/mindset shift back to the fundamentals of the early web.
If memex becomes a concept of interest in the 2020s, perhaps interested people will invest in building better tools.
A trail (IIRC) charts a journey through a set of documents, or perhaps sections of documents.
Think about a case where you are trying to figure out a problem based on a bunch of error messages. You search the web for a solution to the first message, then the second message, and so on, until you arrive at the root cause and the solution to that. You then save this as a trail (or the memex does this for you automatically, using some fancy ML method).
Now, imagine you encounter a similar or identical issue 6 months down the line. You remember that 6 months ago you searched for a solution, and that you found one. Maybe you forgot to bookmark it. Not a problem, you find the original trail using some simple heuristics (how long ago it was, the keywords from the error message), and then you follow it to the end. Solution found, reasonably quickly.
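A saved trail in that scenario doesn't need to be anything fancier than an ordered list of anchored stops plus enough metadata to find it again later (illustrative structure, example URLs):

  const trail = {
    title: 'ECONNRESET during deploy - root cause hunt',
    created: '2023-06-14',
    keywords: ['ECONNRESET', 'nginx', 'keepalive'],
    stops: [
      { url: 'https://example.com/search?q=ECONNRESET+nginx', note: 'first error message' },
      { url: 'https://example.com/forum/1234#answer-2', note: 'pointed at keepalive timeout' },
      { url: 'https://example.com/docs/nginx#keepalive_timeout', note: 'the actual fix' },
    ],
  };

  // Six months later: cheap heuristics over saved trails instead of redoing the search.
  function findTrails(trails, query) {
    const q = query.toLowerCase();
    return trails.filter((t) => t.keywords.some((k) => q.includes(k.toLowerCase())));
  }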
Yes, and until we have the fancy ML method, one way to do it is to have something like a Google Doc where you document your search by copypasting from the source documents and linking to them.
That was my point with the original article. I'm with you that it would be fantastic to have memex that's described in 'As we may think' (only better), but until we do, I'm doing it the hard/manual way.
I was surprised to recently find out that Google Chrome in fact already has that fancy ML method for chunking history into trails.
Nothing wrong with doing this manually, but I do think that having the Memex do some of the thinking re: how the data is organised would save a lot of time.
> On the other hand, memex is something that will give you an edge
Umm, no. That is the problem really. An "edge" is something defined in societal context. Rightly or wrongly society is largely unaware of memex and its gifts.
Memex, hypertext, wikis, semantic web etc. These are just early visions of plausible ways we can capture, organise and share knowledge at scale, using digital technologies. Alas it is a catalog of paths that the digitally interconnected society did not opt for.
Why this (on the face of it) irrational behavior? The very short answer is that "an edge" is only loosely coupled to superior knowledge management. In the vast majority it is obtained by proximity to existing power structures. The proverbial rolodex rather than the memex.
> The very short answer is that "an edge" is only loosely coupled to superior knowledge management. In the vast majority it is obtained by proximity to existing power structures
I somewhat disagree. A memex is effectively the plain old notebook/wiki, with the potential capability raised to the second or third power.
Does a personal diary, calendar, home budget, or a small library help you get into places of power? For average people, no. But it does make organising life, and figuring out where you are, and charting how to get to where you want to be, much easier than it would be for your peers without such tools.
So yes, it gives you an edge. Whether that edge has anything to do with power is a completely orthogonal concern.
I would readily admit that my analysis of memex's (non-)edge is a bit of a quick extrapolation, weaving together some broader themes as potential explanatory factors.
There is an empirical fact, though, that begs for some explanation: these tools are not used as much as one would think is commensurate with their (potential) added value. There is no virtuous cycle that would fund more development, create sharper "edges", etc.
The connection with "power" might appear remote, but consider, e.g., why PowerPoint (no pun) - a manifestly inferior piece of software - became so successful.
I think it should be possible to build a Chrome/Firefox extension that adds annotation and editing to pages (storing a superset copy of the page somewhere), adds links/trails to pages, shares these deltas with others, etc.
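A bare-bones content-script version of that (WebExtension APIs; the storage shape is my own invention) would keep highlight annotations keyed by page URL, to be synced or shared as deltas later:

  // content-script.js - runs on every page the extension is allowed to see.
  const pageKey = location.origin + location.pathname;

  document.addEventListener('mouseup', async () => {
    const quote = window.getSelection().toString().trim();
    if (!quote) return;

    const { [pageKey]: annotations = [] } = await chrome.storage.local.get(pageKey);
    annotations.push({
      quote,                                      // naive anchor: the selected text itself
      created: new Date().toISOString(),
    });
    await chrome.storage.local.set({ [pageKey]: annotations });
  });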
To be honest, I find JIRA + Confluence (+ let's say GitLab) a strong memex system. You can link everything very easily, and the software is happy to create the links for you as well. Add Gliffy as well (for charts and other uses).
The openness of this approach will allow for great recall potential with GPT models when we eventually get them running locally. Dave Winer has already been experimenting with this (admittedly not locally) based on his large backlog of blog posts and has found it effective[1].
This approach, where even the basics are broken - links require full manual management (what happens when you rename some target file?) - is another 50 years away from the Memex.
Or you use something like Obsidian, Logseq, Notion and so on that have already abstracted this problem away. You rename a file or move it to some subfolder and every single reference to it gets auto-updated without you having to do anything.
I'd argue that this sort of interlinking flexibility (alongside easy-to-use plugins) is what separates the current era of note-taking apps from the alternatives that people seem to (ab)use: a folder of Markdown notes edited via VSCode, a Google Docs folder, or even a static site generator.
After wasting much time on fixing the bookkeeping, you'd get to the next basic issue - content search across devices. Then to tracking changes made by various people. Then... backtracking a couple of steps and dropping the whole thing.
> the reason those shiny new apps don’t stick is interoperability.
Shameless reference to Whatboard.app, where we are working hard to address this issue. Admittedly, we are over-focused on sales and less on intra-document linking. Much work ahead for us, and several enhancements in the pipeline.
www.whatboard.app if you want to check us out. Reach out if you have an interest in getting involved with us.
When I think about memex, I think about being able to share my “trails.” On this matter, I’m often irritated by how little value I get from my carefully developed web browsing history. Why is it so hard to extract personal value from our histories?
It's not limited to your web history. File paths are URIs as well, and so is a path to a specific page in a PDF or to a tab in a spreadsheet. Nothing prevents you from composing both local and web URIs and reopening them automatically at any time. The value you extract depends on the quality and reliability of the content you picked to bookmark.
David Gelernter's tired old plagiarized Lifestream and Mirror Worlds ideas (which he illegitimately patented) were hardly original or unique (which even Leonardo da Vinci and Bob Graham invented long before he did), and now he (and his son Daniel Gelernter) have degenerated into yet another couple of dime-a-dozen foaming at the mouth batshit crazy alt-right extremist Trump boot licking big lie spreading climate change denying misogynistic racist birtherism promoting sociopaths, who are religiously and sexistly against women in the workforce, and angrily think working mothers harm their children and should stay at home.
>He is a former national fellow at the American Enterprise Institute and senior fellow in Jewish thought at the Shalem Center. In 2003, he became a member of the National Council on the Arts.[21] Time magazine profiled Gelernter in 2016, describing him as a "stubbornly independent thinker. A conservative among mostly liberal Ivy League professors, a religious believer among the often disbelieving ranks of computer scientists..."[22]
>Endorsing Donald Trump for president, in October 2016, Gelernter wrote an op-ed in The Wall Street Journal calling Hillary Clinton "as phony as a three-dollar bill", and saying that Barack Obama "has governed like a third-rate tyrant".[23][24] In his capacity as a member of the Trump transition team, Peter Thiel nominated Gelernter for the Science Advisor to the President position; Gelernter did meet with Trump in January 2017 but did not get the job.[25]
>In 2018, he said that the idea that Trump is a racist "is absurd."[26] In October 2020 he joined in signing a letter stating: "Given his astonishing success in his first term, we believe that Donald Trump is the candidate most likely to foster the promise and prosperity of America."[27]
>Gelernter has spoken out against women in the workforce, saying working mothers were harming their children and should stay at home.[13] Gelernter has also argued for the U.S. voting age to be raised, on the basis that 18-year-olds are not sufficiently mature.[28]
Politics of software authors are usually irrelevant. I disagree with Eric S. Raymond on a bunch of things, but that doesn't mean his software should be boycotted by association. I draw the line at the author being a literal murderer, which is why I don't use ReiserFS.