Show HN: Archivy – Self-hosted knowledge base embedded into your filesystem (github.com/uzay-g)
219 points by etherio 11 months ago | 58 comments

Worth noting that this is written by a 15 year old.

This looks interesting, nice integration with Elasticsearch as well. However, I have tried tools like this several times in the past (tiddlywiki, org-brain, etc.) and haven't been able to stick with them, always reverting to paper or, more recently, reMarkable tablet notebooks. Is it just me, or does it take quite a bit of motivation and dedication to stick with it?

It's funny because I used to be more obsessive about storing my knowledge when I was younger. I kept track of tidbits of information using various systems (emacs wiki mode or OneNote), but as with you, I always reverted to paper, not so I could "look up" information again, but as a way of sketching out ideas. I might refer to the notes a few weeks later, but beyond that, likely not.

Now that I'm in my 40s, I don't care as much about recording information I encounter (other than for work). If it's worth knowing, I can look it up again. If not, ignorance is bliss.

> Now that I'm in my 40s, I don't care as much about recording information I encounter (other than for work). If it's worth knowing, I can look it up again. If not, ignorance is bliss.

I tend to keep a notes file precisely because it's such a pain to look up some things. If I want to know the exact set of qemu options that I want, for instance, I would have to piece things together from the man page, and/or try things from a half-dozen different blogs, most of which will be incompatible with each other or the version of qemu I'm using (in fairness, qemu networking options have changed a lot over time as they improve things; this might be an extreme example). Or, I could open my notes file, type /qemu<cr>, and grab one of a handful of commands that I already have composed and be done.

Same here, 40 and not much of an information junkie anymore. But it probably has to do with the fact that almost anything is searchable online nowadays, and also that there is so much information (even here on HN) that even if you save it selectively it becomes a sea of information. I use the "one tab" plugin, and rather than closing a tab I save it to the one tab archive. Most of the time I never look at it again.

I noticed this was the case when I used tools that made it hard to search through my archives. But once I switched to tools with better full-text search, I actually find myself using my archive all the time. It turns out just knowing that I have it and that it's easy to search lets me rely on it a lot more when doing research on things I've read about before.

It could be, and you're probably right; I just personally lost interest in preserving too much information. Of course once in a while I come across a piece of info that I do save, but whatever I slap into some text files is enough for me; most of the time I don't need to look it up anyway, or I forget it even exists. And regularly I purge it and that feels good. Maybe I've reached my limits... But I do understand the need: if one's line of work requires it, or there's a need for research, these tools can be very useful.

Same, went through a wiki and onenote stage. Didn't last. It piles up, and becomes a chore.

Paper outlasts everything. Easy to organize spatially, sketch things, recompile notes, put many pages visible at once, decorate the walls... Unimportant papers end up on archive stacks with little effort.

Recently started using a synced folder with plain Word documents, though, alongside the paper. For writing the more elaborate drafts, and mobile note taking.

For me Word documents are the antithesis of plain ;-)

I went through approximately the same stages, and eventually found that the knowledge base is mostly useless. What matters turned out to be my own words: my personal notes on the subjects, my worklog, my emotions etc. These bits are something you can't easily reconstruct without the right context, and keeping the context gets harder and harder as you get older.

I think the key is to make your knowledge base accessible from anywhere.

I have my tiddlywiki hosted on a server that I can reach from any computer ( work, home, phone, etc. )

So the only barrier to use it, is opening a new tab and typing some notes. The bigger the barrier between brain thought and writing something down, the lower the likelihood of usage.

To be fair, sometimes I'm even too lazy to open a new tab and add an entry in my wiki. I haven't solved this problem yet.

I wish I had again the intellectual confidence I had when I was 15...

Yeah. Paper notebooks that could use OCR to export their pages to a digital notebook would be really cool, because digital is more flexible but paper feels more natural.

There's a cheap product called whitelines, which might be interesting to you.

I don't understand the selling point of whitelines. Their app just takes a picture and converts it to a PDF, which is the same thing Adobe Scan, cam2scan, and probably a million other apps do, except those apps don't require you to buy special paper.

Like a Livescribe pen?

How do you like the reMarkable tablet? What does your workflow with it look like?

I enjoy it greatly! Since using it in the spring semester of university I have not used a paper notebook, and it's definitely led me to read more books and papers. There's also a great hacking element to it[0], since you get root access out of the box.

[0] https://github.com/reHackable/awesome-reMarkable

That's really cool. I had no idea the reMarkable was that kind of product.

Shows that user contribution to a platform can be part of its success.

It sure can, but they're not super supportive. https://remarkable.engineering/ is very terse and the linked GitHub org is not very active.

Do you know if there is a way to treat the remarkable as almost a paper screen connected to my computer? I'd like to be able to seamlessly "print" a PDF to it.

There's a CUPS script here that lets you print to their cloud service, which auto-syncs to the tablet: https://github.com/reHackable/awesome-reMarkable#cloud-tools - among other options in that list.

I like their pricing : 399 USD == 399 GBP == 399 EUR

You like that? Personally I prefer 399 USD == 304.50 GBP

It's irony. I also "like" how their screen on photo is whiter than it actually is.

Judging the marketing here, not the product. Never used it.

Ooh, I have been eyeing reMarkable. Do you have any comments on your experience?

After trying dozens of tools, I settled on MyBase for work related notes.

A much more complete version of this is tiddlywiki:


I saw this, and it seemed interesting but I had a few problems with it:

A) doesn't embrace the same idea of building your own digital garden that holds not only your notes but also automatically syncs with different services like hn, pocket etc to save your digital presence locally.

B) Flexibility of search. Archivy uses elastic search with a neat nlp pipeline to process data and allow it to be searched with accuracy. This is all configured in elastic-search.json and the user can configure it to his needs as he pleases.

C) Ease of use and minimal interface. Archivy has a simple direct UI and its goal is NOT to become a note taking app nor does it pretend to.

You can directly search at the top and then you have a tree view of your data organised in folders.

> A) doesn't embrace the same idea of building your own digital garden that holds not only your notes but also automatically syncs with different services like hn, pocket etc to save your digital presence locally.

How does something sync with HN? Running tiddlywiki as a service means it's always "in sync". It is easily extensible with browser addons if you want. It most certainly embraces the idea of building your own garden. People have done incredible things with tiddlywiki.

> B) Flexibility of search. Archivy uses elastic search with a neat nlp pipeline to process data and allow it to be searched with accuracy. This is all configured in elastic-search.json and the user can configure it to his needs as he pleases.

Elasticsearch seems like overkill for a personal wiki/note taking app, even for incredibly large wikis. With proper tagging (and even without) you will be surprised how fast tiddlywiki search is. Also, for a personal wiki, I would want the fewest possible dependencies. I really don't want to have to set up Elasticsearch and keep it up to date.

> C) Ease of use and minimal interface. Archivy has a simple direct UI and its goal is NOT to become a note taking app nor does it pretend to.

"Archivy is a self-hosted knowledge repository". That's the same goal as tiddlywiki. They both have a simple and minimal UI, although I agree Archivy looks more minimal. But I think that's because Archivy offers far fewer features.

I'd be curious to hear more, your goals seem to be very similar to mine with ArchiveBox. I'd love to chat with you and learn more about your project/approach to archiving!

I'd love to! Where can we chat?

Not parent but here's his Github:


I accomplish this with Emacs + Org-roam. And it's a 'forever' file system, all text. https://www.orgroam.com/

Nice idea, it would be interesting if you could integrate the wayback machine so it archives the webpage you add. That way even if the original page gets deleted, a snapshot will be permanently archived.

a) it does better

> If you add bookmarks, their webpages contents' will be saved to ensure that you will always have access to it

b) It's not in line with the idea of archiving things you care about. If you don't control it you can't rely on it. You definitely can't rely on the wayback machine to always exist. Someone's got to keep paying for those servers and it's not a huge profit center, and there have been questions regarding its survival before because of lack of money.

> a) it does better

that's not very interesting because when I find an interesting read I rarely bookmark it. For me bookmarks are links that I visit often.

> b)

that is a fair point, however I think the wayback way is still not too shabby an idea. It ensures there's a copy snapshotted just in case, and yes it might not last forever, but it's done a fantastic job so far, so the chance that it won't survive sounds like a minor risk imho.

it's not "browser" bookmark. It's "add a bookmark" into the app

see https://github.com/Uzay-G/archivy/blob/master/main/templates...

it's just a form where you paste the URL of the site you're interested in.

so? I don't understand why this is a meaningful statement.

Lots of us use Pinboard.in or similar "bookmark" services. They aren't "Browser bookmarks" they're just forms in some separate app too. We find them more useful than browser bookmarks. The pasting can easily be worked around with a simple bookmarklet.

I'm not sure what point you're trying to make.

clashmeifyoucan was saying "For me bookmarks are links that I visit often" which I interpreted as "browser bookmark".

When I read the description of Archivy, I had the same impression (it somehow captures the action of bookmarking).

So I looked at the code to see how it was done, and in fact, as you said, it's more like a local pinboard/similar "bookmark" services.

No real point to make, just clarify what 'bookmark' it was.

I greatly appreciate that it saves it locally, which I consider essential, but saving it to the wayback machine serves a different purpose: in case of a dispute with another party, you can resolve the claim about the state of a particular webpage using the wayback machine.

if you're concerned about legal proof, then again, relying on a third party service with questionable financials is a terrible idea.

This is plaintext markdown files. You can easily integrate git with it which will provide a timestamp and cryptographic proof that the contents are unchanged.
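To make the "cryptographic proof" part concrete: git identifies each file version by a hash of its contents, so any change to a note produces a different ID. Here is a minimal sketch, assuming a repository in git's default SHA-1 object format; the function name and the sample notes are made up for illustration and are not part of Archivy.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID git (SHA-1 object format) assigns to a file with these bytes."""
    # git hashes a small header ("blob <size>\0") followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical note contents, before and after an edit.
note_v1 = b"---\ntitle: saved page\n---\nSome saved page text.\n"
note_v2 = note_v1 + b"An edit made later.\n"

print(git_blob_id(note_v1))
print(git_blob_id(note_v1) == git_blob_id(note_v2))  # False: the content changed
```

Note that the hash only covers the contents; the timestamp lives in the commit metadata, which is recorded by whoever makes the commit.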

I just use one of many article parsers and save things in Markdown files, providing me with a local snapshot.

Yeah I recommend this approach, it's what ArchiveBox does as well.

Cool! I really like the (upcoming) idea of watching your own online profiles on HN and such for upvotes and saving those pages. It’s essentially what I already use those upvotes for in my head, but not what they are in reality.

Yeah I'm really excited about that idea of building a digital garden based on the stuff you enjoy online AND your notes.

I wonder if there’s a reasonable way to extend this to structured quantified self data... Fitbit steps, calorie logs, chat logs, etc.

That stuff won’t fit the “markdown with a header” format too well, unfortunately. Would probably need to add a SQLite or DuckDB [0] storage engine.

[0] https://duckdb.org/
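As a rough illustration of what such a storage engine could look like for step counts, here is a sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are all invented for the example; nothing here reflects how Archivy actually stores data.

```python
import sqlite3

# In-memory database for the sketch; a real setup would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_steps (day TEXT PRIMARY KEY, steps INTEGER NOT NULL)"
)
# Made-up sample data, one row per day in ISO date format.
conn.executemany(
    "INSERT INTO daily_steps (day, steps) VALUES (?, ?)",
    [("2021-03-01", 8200), ("2021-03-02", 10400), ("2021-03-03", 6100)],
)


def average_steps(start: str, end: str) -> float:
    """Average step count for days between start and end, inclusive."""
    (avg,) = conn.execute(
        "SELECT AVG(steps) FROM daily_steps WHERE day BETWEEN ? AND ?",
        (start, end),
    ).fetchone()
    return avg


print(average_steps("2021-03-01", "2021-03-02"))
```

ISO dates sort correctly as plain text, which is why the range query works on a TEXT column.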

Pretty cool.

Instead of Elasticsearch you could also use SQLite with its FTS5 full-text search support.
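For anyone curious what that looks like, here is a small sketch using Python's built-in sqlite3 module. It assumes the SQLite library bundled with your Python was compiled with the FTS5 extension (common, but not guaranteed); the table layout and sample notes are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# An FTS5 virtual table indexes every listed column for full-text search.
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(title, body)")
conn.executemany(
    "INSERT INTO notes (title, body) VALUES (?, ?)",
    [
        ("qemu networking", "User-mode networking options for a test VM."),
        ("reMarkable printing", "Send PDFs to the tablet through a CUPS script."),
        ("reading list", "Articles about archiving and personal wikis."),
    ],
)


def search(query):
    """Return titles of matching notes, best match first."""
    rows = conn.execute(
        "SELECT title FROM notes WHERE notes MATCH ? ORDER BY rank", (query,)
    )
    return [title for (title,) in rows]


print(search("networking"))
print(search("archiving OR wikis"))
```

The whole index lives in one file (or in memory, as here), so there is no separate service to install or keep running.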

That + ripgrep is how we're planning to do full-text search in ArchiveBox.io, seems much more appealing than running a full ElasticSearch cluster.

A grisly death is more appealing than running a full Elasticsearch cluster :)

Can't wait to see it! Nice combo.

First, I love that this exists. I was playing around with exactly the same concept (text files w/structured front matter as a personal knowledge base) a few months ago, but didn’t get as far.
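For readers who haven't seen the format, a text file with front matter is just a metadata block between two `---` lines, followed by the note body. Below is a deliberately simplified sketch of reading one; it handles only flat `key: value` pairs rather than full YAML, and the function name and sample note are hypothetical, not taken from Archivy's code.

```python
def split_front_matter(text):
    """Split a note into (metadata dict, body) using '---' delimiter lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter block at all
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # opening delimiter without a closing one


note = (
    "---\n"
    "title: qemu notes\n"
    "type: note\n"
    "url: https://example.com/a\n"
    "---\n"
    "Body line one.\n"
    "Body line two.\n"
)
meta, body = split_front_matter(note)
print(meta["title"], "|", meta["url"])
print(body)
```

Because the files stay plain text, the same notes remain usable with grep, git, or any editor.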

I think the Elasticsearch dependency is overkill, specifically because it precludes deployment on a corporately-managed computer where you can run Python but not Elasticsearch.

It would be great to have an option to use a pure Python search tool like Whoosh [0]. This would trade away some search power for significant portability gains. I might do a fork for this!

[0] http://whoosh.readthedocs.io/en/latest/intro.html

I'm actually thinking of making this an option here: https://github.com/Uzay-G/archivy/issues/13.

It'd be nice to have those two choices.

I was absolutely coding exactly the same pet project. This one is so much better than mine!

Kudos. Very good job.

I love to see the web-page archiving. Curious about how you see your tool versus Obsidian. Have you checked out Obsidian? Similarly uses markdown and front matter on the local filesystem.

can't speak for the author but:

* obsidian is a proprietary piece of software so you can't rely on it for long term archival. There's zero guarantee the company that makes it will exist next year and keep updating it to work with new operating systems.

* obsidian doesn't download page contents

* obsidian doesn't handle bookmarks in any notable way.

While they both use markdown, and can store notes the similarity pretty much ends there.

It does use flat files for storage, so there's not much lockin there.

And the flat files are IIRC markdown, so it works just fine for long term archival.

not true. If you buy into obsidian's features and then leave, you lose the visualized graph, a functional task list, all your [[internal links]] break etc...

sure, you've got your core data, but now you've got to recreate obsidian if you want it to work the way you've become accustomed to, and presumably like since you would have used it long enough for this to be a concern.

exporting your data is great, but it's only the beginning. For example: I can export my data from twitter too. It doesn't mean I still have a functional way to share short text thoughts, or the history of them, with their responses linked to other users' accounts, or a functional way to show all my tweets with a tag or or or ...
