
Show HN: Promnesia – an attempt to fix broken web history - karlicoss
https://beepb00p.xyz/promnesia.html
======
rosstex
As a PhD student, your post reads like a beautiful research paper. Motivation,
prior work, contributions, technical details, example use cases, self-
references, future work, even a system design chart. You've certainly sold me
on the extension, great work!

------
newman314
This sounds like it might be close to meeting my use case.

I have bad memory and hence try to write down everything I can. But often
throughout a single day/week, I do research on a topic and have a bunch of
tabs open that I intend to come back to. Or I read an article and, several
days later, cannot recall where I read it (HN, Twitter, etc.). This
usually leads to a frantic search until I can find what I’m looking for as
well as having a ton of tabs open.

Manually grouping topics together is too hard. What would be great is a tool
that knows where I’ve been, discards bad information (google search result,
followed by near immediate close) and some sort of an attempt at topic
autoclassification (SAP, storage, backup etc.) that gives me the confidence to
close tabs knowing that I can get back to a particular topic at a later date.

~~~
soulofmischief
Bruh I've got tabs open from years ago. Hundreds across multiple VMs. I have
tabs open that I migrated from my last computer. Someone please help.

~~~
karlicoss
I've struggled for a while with this kind of overload and ended up with a
system that makes it manageable:

1\. make it as easy to 'bookmark' stuff as possible -- with a single hotkey

2\. make it as easy to search over bookmarks as possible -- also ideally with
a single hotkey or as quick as you could do google search

My way of achieving this is using org-mode files for 'bookmarks' [0] and using
emacs/ripgrep to search over it [1]. Additional benefit of org-mode is that
it's very easy to add notes, priorities, refile bookmarks, so the most
interesting stuff propagates through my notes, and I don't feel bad about
missing out on information that I don't have time to process because I can
always quickly find it when I need.

[0]
[https://github.com/karlicoss/grasp#readme](https://github.com/karlicoss/grasp#readme)

[1] [https://beepb00p.xyz/pkm-search.html#personal_information](https://beepb00p.xyz/pkm-search.html#personal_information)

~~~
riverlong
I'm curious -- Roam Research appears to have won over a lot of folks with this
kind of need recently. You didn't list Roam among the prior art -- do you
think it's not really relevant? I can certainly see Roam eventually including
cross-platform bookmarking/archiving, etc.

By the way, I think you should also take a look at Archive Box, which is very
much in this direction: [https://archivebox.io/](https://archivebox.io/)

~~~
karlicoss
You mean, I didn't include it as prior art for Promnesia or Grasp project?

For Promnesia, the goals of Roam and Promnesia are pretty different at the
moment (although Roam data can be used with Promnesia, as I mentioned). In
addition, I can't personally bet on a closed source tool.

For Grasp, simply because Roam wasn't known (or possibly didn't even exist!)
when I wrote it. But even now that I've tried Roam, I don't think I can go
back from using plaintext files, it's just so much snappier and more hackable.

Thanks, I used archivebox! Still need to set up proper automatic archiving,
and integrating Promnesia with personal web archives is also in my plans!

------
kemonocode
This sounds like something that would be close to meeting my needs. I, too,
end up leaving far too many tabs open and I feel the need to have something in
between a bookmark I'll never look at again and may have little context as to
why I may have created it to begin with, and a tab just eternally polluting my
browser and that might just end up getting sent to OneTab and thus as a
"lesser" bookmark. I know Firefox (and probably Chrome as well) lets you leave
tags on bookmarks, but these always seem like they're hardly enough. And
that's without even mentioning all the different pseudo-bookmarks scattered
over many different services!

------
scrollaway
Hey karlicoss, I'm in love with your writeup.

You should know about Timeliner; Matt Holt's attempt at solving some of the
data sourcing / data silo problems.
[https://github.com/mholt/timeliner](https://github.com/mholt/timeliner)

He also points out Perkeep: [https://perkeep.org/](https://perkeep.org/)

Anyway. Data liberation is a huge driver for me. Making it a primary goal of
my next app (primarily for bookkeeping/financial data, but I want to allow
users to connect to the third party services I integrate with, eg. Uber,
Amazon, etc, and be able to download their own data / play with it via an
API).

Feel free to email me / tweet me (@Adys) if you ever want to chat.

~~~
karlicoss
Thanks! :)

Oh, nice, I bookmarked Timeliner recently, but haven't tried it yet. Looks
promising, I expect it to integrate well with Promnesia, and vice versa, my
helper HPI package to integrate easily with Timeliner.

For Perkeep -- I tried it briefly, but haven't exactly understood the problem
they are solving. I was planning to try again and write up my experience with
it to spark a discussion!

------
CWuestefeld
I've been frustrated by many of the same things, and have recently been
playing with Memex.

The solution outlined here leaves me with a couple of questions, though.

1\. Since there's a local app acting as a service, it's not clear to me how
this would run on a mobile device.

2\. Once it is running on my mobile device (and home computer, and work
computer, and chromebook, and various other machines I use), how do I
aggregate all of the data? I'd like to be doing work-related research at home
in the evening, and be able to see the fruits of it from the office.

I suspect that the answer to this is the same thing: that rather than a
locally-running server, I could put something on my home server or on a cloud-
based server, and direct all my various devices to communicate with that
rather than localhost?

~~~
karlicoss
Someone was asking that before, perhaps I should add it to the FAQ!
[https://github.com/karlicoss/promnesia/issues/114](https://github.com/karlicoss/promnesia/issues/114)

Yep, you could use a VPS or something and host it behind a reverse proxy,
that's what I've been doing so far.

Also for mobile specifically, on Android it works under Termux (haven't
personally tried yet, but can't see why not, and the person in the issue I
linked claims it works).

For data aggregation: it depends on the data source, but the easiest way seems
to be to make sure your data ends up on a single computer, index it there, and
after that you get an sqlite database which you can simply sync with
Dropbox/Syncthing or anything else you prefer.

------
indentit
Nice description of what you're trying to solve - it certainly resonates with
me so I plan to try it out!

I've recently started trying Shiori[1] to manage my "bookmarks" and preserving
offline copies locally without relying on The Internet Archive, however it
still doesn't really help with private content (i.e. pages only accessible as
an authenticated and authorized user) so it'd be great if Promnesia caters for
that. Plus the whole data silo thing...

I was a little surprised to see no mention of the "tree style tabs" extension
which can help with "where did I get to this link from?" style questions

[1]: [https://github.com/go-shiori/shiori](https://github.com/go-shiori/shiori)

------
idm
You've convinced me to try it out.

My personal knowledge management project, Gthnk (gthnk.com), would appear to
plug in easily as a Source - without any special plugin necessary. I really
like what you've made!

------
spurgu
This might be more suitable as a Github issue but since you're here, I'm
simply getting an error using Brave: "ERROR: Failed to fetch" (shown in the
extension popup when clicking the eye, which is always red)

Another thing: Have you considered adding annotation capability directly into
the extension? This is something I've thought about creating an extension for,
since I don't use anything like Instapaper.

~~~
rosstex
You have to run the local Python server by following the next instructions.

~~~
karlicoss
Yep! I guess I should make the error more clear in the extension and point to
the readme.

In theory, I could make it defensive too and allow using without the local
backend (only with local browser history), but not sure if there is much value
in this.

~~~
rosstex
I think the aspect of knowing where you browsed to a page from, and
visualizing a hierarchy of pages within a site that I've visited, are the most
interesting parts for me, and those certainly apply to the browser history
alone.

~~~
karlicoss
Fair enough! Created an issue
[https://github.com/karlicoss/promnesia/issues/120](https://github.com/karlicoss/promnesia/issues/120)

~~~
spurgu
Cheers guys!

------
ybbond
I am following this post too. I mean, from the first time you published this.
I am using Worldbrain's Memex 2 and when I saw this post reposted, I checked
the "Memex 2" section.

There is an update! Maybe I will look into Promnesia and StorexHub integration
next weekend. Thank you for your effort with Promnesia!

------
contravariant
Regarding the cleaning of URLs, are you aware of the ClearURLs [1] extension?
It seems to achieve much of what you're trying to do.

[1]:
[https://gitlab.com/KevinRoebert/ClearUrls](https://gitlab.com/KevinRoebert/ClearUrls)
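
For illustration, here is a minimal Python sketch of what this kind of URL cleaning can look like. It is not how ClearURLs or Promnesia implement it. The parameter list is a tiny hypothetical sample, and dropping the fragment and lowercasing the host are assumptions that will not suit every site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical, illustrative rule set; real tools maintain much larger lists.
TRACKING_PREFIXES = ("utm_",)
TRACKING_PARAMS = {"fbclid", "gclid"}


def clean_url(url: str) -> str:
    """Drop common tracking query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith(TRACKING_PREFIXES)
    ]
    # Assumption: the host is case-insensitive and the fragment is not needed.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


print(clean_url("https://Example.com/post?id=42&utm_source=hn&fbclid=abc#comments"))
```

Two visits that differ only in tracking parameters then map to the same key, which is what makes it possible to match them against each other.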

~~~
karlicoss
Oh nice, didn't know of it, thanks! Indeed, looks like there is a lot of
opportunity to collaborate with privacy enhancement extensions.

------
m0zg
Also, since the various web archives are getting shut down soon, it'd be great
if such extensions could locally and securely preserve pages much like an
archival crawler does it, or better yet create a distributed archive that's
impossible to shut down or censor. Better yet still if there's local,
language-aware index over such pages so that I could search them easily,
without Google deciding what I should and should not see.

------
m-localhost
Great write up for a problem I'm thinking about myself a lot
([https://marcus-obst.de/wiki/Notetaking](https://marcus-obst.de/wiki/Notetaking))

Thanks also for using the Yak Shaving - for one, I got curious what was first,
the term or the Ren & Stimpy episode illustrating the term and second, I found
a description of most of my modus operandi.

------
j88439h84
Have you thought about using SingleFile/SingleFileZ [1] to download archives
of the pages instead of using links to wayback?

[1]
[https://chrome.google.com/webstore/detail/singlefilez/offkdf...](https://chrome.google.com/webstore/detail/singlefilez/offkdfbbigofcgdokjemgjpdockaafjg?hl=en)

------
hansvm
Haha, I love reading other people's code :)

    
    
      # TODO fuck. why doesn't that work???
    

Seriously though, this project looks great. I've been tossing around building
something similar for a while, and frankly I'm glad somebody else did it first
(and from the looks of it, probably better)

------
infogulch
The motivations and analysis of current problems resonate with me deeply,
thank you for the writeup!

Perkeep is another project that might be interesting to analyze in this
context. [https://perkeep.org/](https://perkeep.org/)

------
StavrosK
I've been thinking about this problem a lot myself too, and I'm currently
rewriting www.historio.us to attack the problem more efficiently. I've been
considering various new features, and this writeup is very useful, thank you.

------
dpacmittal
This is awesome! I've definitely wanted this for as long as you have. I have
this idea noted down exactly as you have described somewhere in evernote. Well
done! Looking forward to contributing to it.

------
an4rchy
This is awesome! I just started using the WorldBrain Memex and was trying to
solve the issue of accessing other data sources, so perfect timing -- thanks!

Looking forward to trying it out.

------
mirimir
It does seem very useful.

And I'm disappointed :( Given the title, I was hoping for a way to fix
Google's broken web history. So it goes.

------
zingermc
Does promnesia server run a local HTTP server? How do you prevent a website
from slurping up the entire database?

~~~
karlicoss
Yep, it's a local HTTP server by default. It's also possible to expose it via
reverse proxy, and you can set basic auth password in the extension's
settings.

What do you mean by slurping here? Security-wise, a random website shouldn't
be able to query localhost because of CORS policies.

~~~
zingermc
Unfortunately, CORS isn't a magic bullet. Suppose a site named evil.example
adds a script tag pointing to
[http://localhost:1234/promnesia.js](http://localhost:1234/promnesia.js) and a
victim loads evil.example. If your JS updates a DOM element with info from the
database, evil.example's JS can read that DOM element and report it back to
the server, without violating CORS.

~~~
karlicoss
Ah I see, thanks! Good point, and I guess basic auth would protect against
this sort of attack. So it seems it makes sense to use a token even if it's
running on localhost. I could add an option, so it doesn't require setting up
a separate proxy.

Either way, I hope I've been fairly reasonable about security so far, but I've
mostly been concentrating on the 'plugging in the data' bit, so it's possible
I've overlooked something (also I'm not a security specialist!). There is an
open issue in case people have any specific concerns or spot something, happy
to receive feedback!
[https://github.com/karlicoss/promnesia/issues/14](https://github.com/karlicoss/promnesia/issues/14)
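
To make the token idea concrete, here is a rough Python sketch of a local HTTP handler that rejects requests without the expected token. This is not Promnesia's actual code: `API_TOKEN`, `is_authorized` and `TokenCheckingHandler` are hypothetical names, and it uses a bearer token in the `Authorization` header rather than the basic auth mentioned above. A plain script tag on another site cannot attach that header.

```python
import hmac
import secrets
from http.server import BaseHTTPRequestHandler

# Hypothetical shared secret: generated once and copied into the extension's settings.
API_TOKEN = secrets.token_urlsafe(32)


def is_authorized(auth_header, expected_token):
    """Return True only for an 'Authorization: Bearer <token>' header with the right token."""
    if not auth_header:
        return False
    scheme, _, token = auth_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(token.encode(), expected_token.encode())


class TokenCheckingHandler(BaseHTTPRequestHandler):
    """Answers GET requests only when the caller presents the shared token."""

    def do_GET(self):
        if not is_authorized(self.headers.get("Authorization"), API_TOKEN):
            self.send_error(401, "Missing or invalid token")
            return
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

The handler could be served with `http.server.HTTPServer(("127.0.0.1", port), TokenCheckingHandler)`, where the port is whatever the extension is configured to use.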

~~~
pvg
I think it's becoming clear that the whole 'local web server to do system
things for a browser extension' approach is probably too fraught and should be
abandoned for a better IPC mechanism that browsers support. I don't think this
is some 'drop everything and rewrite stuff' thing but it's worth reading up on
and planning for.

~~~
karlicoss
Yeah, possibly. Chrome actually has something called "native messaging"
[https://developer.chrome.com/apps/nativeMessaging](https://developer.chrome.com/apps/nativeMessaging)
which seems like a potentially more secure (and faster?) alternative, but I
haven't had time to play with it yet.

~~~
pvg
Yep, that's one of the things I had in mind when mumbling about 'better IPC'.
Safari already only supports that type of model. I think the day is not far
when automated scans/app stores/etc start flagging the local http server thing
as high risk/potential malware vector. It's an architectural dead-end.

On the other hand, some of the other stuff may not be fully baked:

[https://news.ycombinator.com/item?id=23173724](https://news.ycombinator.com/item?id=23173724)

------
pkamb
How about just an option for new tabs to retain a “Back” history to their
parent?

~~~
karlicoss
Chrome actually keeps it in the database. However it only works within a
single browser and breaks as soon as you leave for a native app, a note in
your todo list, etc. So I feel like correlating timestamps is the way to go
here, simple enough and agnostic of specific implementations.
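
A minimal Python sketch of what correlating timestamps could look like, assuming visits and other events (chat messages, notes, etc.) are available as `(timestamp, label)` pairs. The `correlate` function, the five-minute window and the sample data are made up for illustration and are not Promnesia's implementation.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta


def correlate(visits, events, window=timedelta(minutes=5)):
    """For each (timestamp, url) visit, list the event labels within `window` of it."""
    events = sorted(events)  # tuples sort by timestamp first
    times = [ts for ts, _ in events]
    result = {}
    for visited_at, url in visits:
        # Binary search for the slice of events that falls inside the window.
        lo = bisect_left(times, visited_at - window)
        hi = bisect_right(times, visited_at + window)
        result[url] = [label for _, label in events[lo:hi]]
    return result


visits = [(datetime(2020, 5, 14, 10, 0), "https://example.com/article")]
events = [
    (datetime(2020, 5, 14, 9, 58), "chat message"),
    (datetime(2020, 5, 14, 10, 3), "todo note"),
    (datetime(2020, 5, 14, 12, 0), "unrelated"),
]
print(correlate(visits, events))
```

Nothing here depends on which app produced an event, only on when it happened, which is the "agnostic of specific implementations" part.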

~~~
pkamb
I'm talking mostly about normal tabs in normal desktop browsers. When you open
a new tab, it should keep the "Back" history of its parent.

Safari for iOS, too, has a feature where you can temporarily go "Back" from
new tabs. But like you said it breaks if you do anything else. Kind of a hack
for mobile convenience rather than a true feature.

------
owenshen24
Justifications are very well-reasoned; a good read in and of itself.

------
mongojunction
Well done. The write-up really brings together some concepts and creates some
clarity on things I've been feeling for a while.

Is the author aware of my history-based, fully interactive offline archiver?
[https://github.com/dosyago/22120](https://github.com/dosyago/22120)

~~~
karlicoss
Author here, thanks!

Haven't seen your tool in particular, thanks for the link, I'll check it out.
I only used
[https://github.com/pirate/ArchiveBox](https://github.com/pirate/ArchiveBox)
before, but haven't set up an automatic archival pipeline (yet)!

Also, integrating with local web archives is on my Promnesia todolist! I
expect them to be very useful for indirect history retrieval, e.g. "I haven't
visited that page, but it's within one link". Having local web archives makes
it possible to implement such functionality in an efficient way.
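
As a rough Python sketch of the "within one link" idea: assume the local archive has already been parsed into a mapping from each archived page to the set of URLs it links to. The `outgoing_links` mapping and the `linked_from_visited` function are hypothetical and only illustrate the lookup, not an actual Promnesia feature.

```python
def linked_from_visited(url, visited, outgoing_links):
    """Return the visited pages whose archived copy links to `url`."""
    return sorted(page for page in visited if url in outgoing_links.get(page, ()))


# Made-up data standing in for browser history and a parsed local archive.
visited = {"https://example.com/a", "https://example.com/b"}
outgoing_links = {
    "https://example.com/a": {"https://example.com/target", "https://example.com/other"},
    "https://example.com/b": {"https://example.com/other"},
}
print(linked_from_visited("https://example.com/target", visited, outgoing_links))
```

With the links extracted ahead of time, the lookup never touches the network, which is where having the archives locally pays off.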

~~~
mongojunction
You have a really interesting way of thinking about all this stuff and have
synthesized a lot of different ideas that I believe point to a future for the
web. Very cool to come across your work. Do you have a blog?

~~~
karlicoss
My blog is literally the link I posted ;)

Perhaps this page
[https://beepb00p.xyz/blog-graph.html](https://beepb00p.xyz/blog-graph.html)
would be a good start if you want to explore

