Managing my personal knowledge base (tkainrad.dev)
348 points by tkainrad on Jan 9, 2020 | 186 comments

I use Microsoft's OneNote. You can do everything described in the blog post, plus audiovisual captures. All text is searchable.

This video covers some of the things you can do with it. The setting is academic, but the examples are relatable. https://m.youtube.com/watch?v=JQD5c8A_D2g

0:33 - Math equations

1:45 - Replay text

2:45 - Ink to text

3:20 - Research tools

4:39 - Immersive reader

6:01 - Web clipper browser extension

7:12 - Save emails to OneNote

NB: https://www.microsoft.com/en-us/microsoft-365/blog/2015/11/1...

I disagree completely. I wouldn't trust anything important to OneNote, and it's hard to get things out of it again, too. Your notes are basically locked in. You can sort of print to PDF, but it uses different pagination to the original PDFs you annotate! PDFs are inserted as bitmap images instead of vector format, and text is only sometimes searchable.

OneNote has a really scary number of obvious bugs, with pen input especially. Writing disappears immediately or jumps left and right, the document randomly stops updating the screen, the program gets stuck in a high-CPU loop making huge input lag and needs restarting, selecting writing doesn't select what you draw around but a smaller area inside that, etc. It's incredibly frustrating and difficult compared to the Android pen-based notes app I use, Squid (which has a much narrower focus, but is much better at it).

My biggest issue with OneNote is that you can no longer use local notebooks with the new versions. I'm stuck on OneNote 2016 or all my local notebooks become read-only.

OneDrive is blocked at work, so OneNote is effectively useless to me now.

For now, I've started moving my notes to a static site using Hugo. It's a bit less convenient, but until I settle on a better permanent solution at least I have text files I can move around and manipulate in bulk if needed.


It blows my mind that there is no good migration strategy from PC-based OneNote databases to cloud-based ones.

Yes, lock-in is a big problem we have with OneNote in my company, since Microsoft decided not to offer offline notes anymore. Lots of people have their notes locked into that format and we cannot store that data on a third-party server.

> ...it's hard to get things out of it again, too.

Export is really the weakest area of OneNote. As far as I can tell, there is no way of batch exporting that also exports embedded attachments (for example, if you embed a PDF or Word document into a page rather than "printing" to the page).

>Export is really the weakest area of OneNote

I'll give you two guesses as to why that is.

I would love to get a Surface and use OneNote -- especially because it has excellent support for hand-written notes (and I much prefer hand-writing to typing). However, I'm hit in the face with two problems:

1. Uncertainty about data export. I don't want to be stuck depending on a third-party proprietary solution for something so important to me. (That said, Microsoft is typically quite good about supporting products for a long time)

2. I actually find Windows unpleasant to use, and much prefer Linux. (Arch Linux + i3/KDE happens to be my preferred setup).

Re 1. I would really not recommend using OneNote. I recently closed an Office 365 Business account and tried to move my notes on similar topics, e.g. pros/cons and how-tos of using different tools, from a group OneNote page to my own and it was very difficult to do so.

The documentation isn't very good (it's unclear which version it covers) and I ended up having to install the local version on my Mac and manually copy/paste the pages I wanted because the copy notebook dialog didn't work with notebooks that have many pages. The worst part is not being able to access the raw data and simply copy that from one place to another.

I've switched to markdown files on Sublime with the Markdown Editing package for now, because they are easier to read and edit, portable and I can convert my notes to wiki pages, blog posts or documentation very easily. I'm still not 100% happy because of the different flavours and how pandoc converts files differently sometimes (especially lists).

At that point, Emacs Org mode must be optimal for the use case.

I'm not sure if learning a new system is worth it, given the simplicity of markdown files - do you have any good resources?

My goal for now is to have a simple way to record (the files) and manage (the folders) notes, collaborate (I currently use gitlab repositories) and potentially use them later, e.g. by copying relevant parts to make posts, documentation etc, or by writing a script that finds relationships based on the content (something I've been thinking about for links I save in Safari or Firefox too - a script that scrapes the pages and builds a tree to organize links automatically).

Maybe a tool that could navigate the files, extract headlines and content to figure out relationships out of the files would be ideal to help me compose new documents and posts.

Org mode is not very different from Markdown (most of the rules are extremely similar) and it can do much more than you would expect.


I am sure that there are things OneNote is even better at than Notion.

However, some of the most important concepts covered in my post are databases, relations between those databases, and editing workflows (markdown, slash commands). As far as I can tell, OneNote does not even attempt to do these things.

I’ve tried tons over the years and just keep coming back to OneNote. It’s great, very underrated.

I kept having sync problems with OneNote so I had to ditch it and started using Evernote instead.

Why would you store all of this valuable information in a proprietary tool? What happens when Notion shuts down, jacks up prices, makes feature changes you don't like, etc.? Make some attempt to migrate all this data? It's going to be hard to do this in an automated way, if they even have an (export) API for this. I'm using (vim)wiki/gollum and I feel much safer about my data.

Obviously, you have a point. I am indeed a little bit worried.

Some things that ease my mind:

Notion's export and backup features are quite good. You can export everything in various formats, markdown being the most relevant for me. This only becomes messy with complex databases that use formulas and are related to each other. You would not have these with local solutions in the first place.

Notion is already profitable and growing rapidly, I do not see them shutting down in the foreseeable future. However, the other concerns are relevant.

The API is supposed to be released soon. I intend to either build a backup workflow myself or use other tools that will get developed then.

Have you actually done an export of your Notion workspace?

I looked at their options before using Notion and was happy they had a full export available - but then I actually went ahead and used it because they are putting zero focus into improving the performance of the apps - and the export format is a horrific mess!

They split out code blocks from your notes in separate notes, the filenames are a mess and images aren't saved inline (as they could be with something like base64).
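To make the base64 idea concrete, here is a minimal Python sketch of what "inline" could mean for a Markdown export: the image bytes go into a data URI inside the image link. This is only an illustration, not anything Notion does. The function name is made up, and whether a given Markdown renderer displays data URIs is an assumption you would need to check.

```python
import base64
import mimetypes


def inline_image_markdown(path, alt_text=""):
    """Return a Markdown image link whose target is a base64 data URI."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None:
        mime = "application/octet-stream"  # fallback when the type is unknown
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return f"![{alt_text}](data:{mime};base64,{encoded})"
```

The trade-off is that the exported file becomes self-contained, but the encoded blob is not readable by a human and is roughly a third larger than the original image.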

For workspaces of any reasonable size, you're going to be doing a lot of work to make the export actually useful outside the context of Notion - which should be obvious given how they treat things as blocks (and all the abstractions that come with that make it into their underlying data structures)

They've also been saying the "API is coming soon" for about a year now.

Never used Notion, but personally I'd prefer a machine-readable and raw export; at least you can assemble your stuff back in any way you prefer. It's more annoying when you have to scrape the data you're interested in from HTML files.

But yeah, lack of API is a major problem, and main reason I haven't even considered trying it yet.

I think Markdown does qualify as machine-readable. There are very good libraries to do all kinds of things with it.

Edit: I guess you meant human-readable as opposed to things like base64, and not as a critique against Markdown.

By now, there are at least several unofficial APIs. For read access they are apparently quite good, though I have not used one myself yet. E.g. https://github.com/kjk/notionapi

I have not yet done a complete export of my workspace. I do however often export specific databases, notes, etc.

My post describes how I use it to draft my blog posts, where I also use the Markdown export feature. The code blocks are fine and within the same file, so I am not sure what you mean there.

Images are not ideal, but I don't know how it could be done much better. Everything is exported into a .zip and the images are also there.

Admittedly, my long-term hope for a reliable and automated way of backing up everything relies on the API. Their promise that it will be released soon is a little awkward by now, but I am sure that it will come eventually. I don't think it's true that it has already been a year since they first said it would be here soon.

Why save images inline? The src wouldn’t be human readable if it’s base 64 encoded.

Fully agree. I was turned off Notion for the same reason and reverted to self-hosting DokuWiki. A much more limited feature set, to be sure, but at least I'll still have my notes 10 years from now.

If there was an open source Notion, where would you prefer storage to take place? BYOFS?

WebDAV sync would be great. There are plenty of options like NextCloud, which is self-hostable. Sync it to plain text files.

Does anyone else use plain text notes for everything?

I've tried to optimize things so that while working, I can get things jotted down with close to zero resistance. If the barrier of entry is too high then I don't bother doing it.

That means often being able to spawn a new terminal, write a command and be done with it. Or to be able to pipe something from my clipboard into an auto-dated file.

I ended up putting together this ~15 line Bash script which seems to do the trick: https://github.com/nickjj/notes

I've been using plain text notes since 2001, although I only recently created this script. So far so good.

I'm experimenting with a single 500K+ line-long flat text file for EVERYTHING, from dream journalling to work email drafts to blog posts etc. -- I've even copied math from my home whiteboard in latex -- screw rendering, I'll transcribe them back when I need them.

At one point I was trying to do a Luhmann's Zettelkasten with one note per txt file, internal linking, etc. I concatenated every piece of txt content I had. It's much simpler.

I have no system anymore. I try to create internal "hashtags" that are ctrl-F searchable. The goal is having something that's always open and at hand. I got plugins for autosaving when the window loses focus and autoloading when the file changes on disk, and it syncs to Dropbox.
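For anyone who wants the hashtag lookup outside the editor, a minimal Python sketch follows. It assumes tags look like `#word`, which is a guess at the convention, and the function name is made up for the example.

```python
import re

# Assumed tag syntax: '#' followed by letters, digits or underscores.
TAG = re.compile(r"#(\w+)")


def lines_with_tag(text, tag):
    """Return (line_number, line) pairs whose hashtags include `tag`."""
    wanted = tag.lstrip("#").lower()
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        tags = {found.lower() for found in TAG.findall(line)}
        if wanted in tags:
            hits.append((number, line))
    return hits
```

Unlike a plain ctrl-F, this matches whole tags only, so searching for `#dream` does not also return `#dreams`.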

I cobbled together in Notepad++ (which has a GUI for this) a small syntax coloring definition that sort of matches my spontaneous habits from old salvaged text. I intend to let this evolve too. E.g. I'm using {braces} to have a little collapse button that hides pieces of text.

I came to the same conclusion: a massive flat text file in emacs. I was using orgmode for a bit, but it was too convoluted. I have a very simple markup to identify TODO items with [ ], and I nest contiguous items with { }, making it easy to navigate chunks with emacs. I then have 3 keyboard macros to mark/unmark and move the TODO items into a DONE section. It's simple enough I can recreate the system from scratch in a few minutes when I'm at a new terminal.

I have used the same method for years: a big single .txt file, with hashtags that are CTRL+F searchable, exactly the same!

Among all the organization/notetaking methods I've tried, this is really the best. No fancy app in the browser, but really effective and fast.

I've written a small article about it here:


> Cons: no sync with a mobile device

How do you access your notes on your phone? How do you edit?

What about using syncthing? There are decent text editors for mobile.

Tangentially related, but after years of trying out (and struggling with) hundreds of calendar and schedule apps, in the end what was best for me turned out to be a simple text file that I keep updating and look at each day. A line on it represents a day and begins with (dd/mm), and events for the day are plaintext separated by commas. Each day gone by gets erased. Absolutely zero friction and I can see an entire month of stuff at a glance so I'm always aware of what to do. Can't believe it took me so long to go back to such bare basics, and now I strive to reach this level of brutal basic frictionless simplicity in my own toy projects.
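A rough Python sketch of reading such a file, in case someone wants to script around it. It assumes each line looks like `(dd/mm) event, event`, that everything falls within one calendar year, and the function names are made up for the example.

```python
import re

# Assumed line format: "(dd/mm) first event, second event"
LINE = re.compile(r"^\((\d{2})/(\d{2})\)\s*(.*)$")


def parse_day(line):
    """Split a line into ((day, month), [events]); None if it is not a day line."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    day, month, rest = match.groups()
    events = [event.strip() for event in rest.split(",") if event.strip()]
    return (int(day), int(month)), events


def drop_past(lines, today):
    """Erase day lines earlier than `today`, given as a (day, month) tuple."""
    kept = []
    for line in lines:
        parsed = parse_day(line)
        if parsed is None:
            kept.append(line)  # leave anything that is not a day line alone
            continue
        (day, month), _ = parsed
        if (month, day) >= (today[1], today[0]):
            kept.append(line)
    return kept
```

Since the lines carry no year, the comparison breaks around New Year; a real version would need some rule for that.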

Brutal basic frictionless simplicity is a good way to put it haha.

That's basically what this script does except it uses separate files. You run `notes something really cool and important` and it creates a YYYY-MM-DD.txt file for you in a configured directory and appends the arguments to the file on a new line. So every day you get a new file.

Or, if you just run `notes` it opens your configured $EDITOR for that specific day so you can go free form.
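The behaviour described above can be sketched in a few lines of Python. This is not the linked Bash script, just an illustration of the same idea. The function names and the fallback editor are assumptions.

```python
import datetime
import os
import subprocess
from pathlib import Path


def note_path(notes_dir, day):
    """Return the per-day file, named YYYY-MM-DD.txt."""
    return Path(notes_dir) / f"{day.isoformat()}.txt"


def add_note(notes_dir, words, day=None):
    """Append the words as one new line to the day's file, creating it if needed."""
    day = day or datetime.date.today()
    Path(notes_dir).mkdir(parents=True, exist_ok=True)
    path = note_path(notes_dir, day)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(" ".join(words) + "\n")
    return path


def open_in_editor(notes_dir, day=None):
    """With no arguments, open the day's file in $EDITOR (vi is an assumed fallback)."""
    day = day or datetime.date.today()
    Path(notes_dir).mkdir(parents=True, exist_ok=True)
    editor = os.environ.get("EDITOR", "vi")
    subprocess.run([editor, str(note_path(notes_dir, day))], check=False)
```

A small wrapper could call `add_note` when command-line arguments are present and `open_in_editor` otherwise.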

This is how I use Sublime Text, because I don't have to actually save new files; it saves them for me. It loads instantly, so it's perfect. I paste in code snippets and what have you.

I don't use ST3 for proper dev as much, because I use JetBrains IDEs more.

I still would prefer something better for my more permanent notes.

Yep, zero resistance: plaintext, with context, metadata, keywords, minimal formatting, grep and a few scripts. The most important thing to remember is that I don't remember everything, so write it down, and use sensible filenames/hierarchy/tags/metadata so it can be found easily.

The less temptation to play with CSS, formatting and markup the better. One day it'll all go in a self-contained minimal wiki. Maybe.

On my android phone I use an ancient app called "unote" (installed from an .apk which I keep, it's long since disappeared from Play). Simple, hierarchical plaintext notes, kept in a local sqlite file. Sadly no search, but that just means I have to be disciplined about what, where and pruning.

I use Emacs Org mode, which stores everything in a text file. The text file is fully human-readable even without Emacs. It has a few rules that make parts of the text file collapsible. One of the main rules is that any line that starts with an asterisk followed by a space ("* ") is interpreted as a heading, and the whole heading can be collapsed.
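To show how little machinery the heading rule needs, here is a minimal Python sketch that pulls the outline out of an Org file without Emacs. In Org, deeper levels use more asterisks, so the sketch counts them. The function names are made up for the example, and it ignores everything else Org can do.

```python
import re

# A heading is one or more asterisks at the start of a line, then a space.
HEADING = re.compile(r"^(\*+) (.*)$")


def outline(text):
    """Return (level, title) for each heading; level is the number of asterisks."""
    result = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            result.append((len(match.group(1)), match.group(2).strip()))
    return result


def collapsed_view(text):
    """Keep only the heading lines, like a fully collapsed buffer."""
    return "\n".join(line for line in text.splitlines() if HEADING.match(line))
```

A line such as `*bold* text` is not treated as a heading, because the asterisk is not followed by a space.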

Yes; I've been using Notational Velocity for years. It gives just enough automatic structure (it's almost like a box filled with index cards, though the size of the note is unlimited) without having to actually fiddle with files. It's effortless to create new entries and incredibly easy to find things again later.

I started with something very similar but it got a bit fat and it's now ~60 lines of Bash :)


I use orgmode and emacs for everything plain text!

I've got a hybrid system.

Academic papers, standards documents, white papers etc... get downloaded as PDF and renamed according to a (human-readable) standard scheme that I use (year, title, [author(s)], [paper-type]), and stored in a 'meaningful' filesystem hierarchy. Podcasts and some videos are also stored in the same hierarchy.

Documentation for the software libraries and APIs that I use is downloaded from readthedocs (where possible) and stored in a parallel system that takes account of versioning. (So I can concurrently store differently versioned copies of the documentation for a single library).

I have a simple python script that iterates through my directory hierarchy and produces a sqlite database and a couple of xlsx files with a row-per document (one spreadsheet for reporting document-management metadata and another that allows me to assign labels and write precis notes). The script also extracts the content of the PDFs as plain-text and feeds some NLP tools that I'm playing with.

I use the spreadsheets to keep notes on the files, and the act of manually renaming and sorting the PDFs into the 'right' place in the hierarchy helps me to understand what's in them and remember what I've got. (I'm constantly reorganising the hierarchy as my understanding develops and evolves. My python script keeps everything -- notes and documents and other metadata - in sync).

So far this has scaled OK to around 26,000 PDFs.
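For readers who want to try something similar, a minimal Python sketch of the "walk the hierarchy, write one row per document" step is shown here. It is not my actual script: the table layout and function name are made up for the example, and it leaves out the spreadsheets and the text extraction.

```python
import sqlite3
from pathlib import Path


def index_pdfs(root, db_path):
    """Walk `root` and record one row per PDF; returns the number of rows written."""
    conn = sqlite3.connect(str(db_path))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "path TEXT PRIMARY KEY, name TEXT, size_bytes INTEGER)"
    )
    count = 0
    for path in sorted(Path(root).rglob("*.pdf")):
        if not path.is_file():
            continue
        # Store the path relative to the root so the tree can be moved as a whole.
        conn.execute(
            "INSERT OR REPLACE INTO documents VALUES (?, ?, ?)",
            (str(path.relative_to(root)), path.name, path.stat().st_size),
        )
        count += 1
    conn.commit()
    conn.close()
    return count
```

Re-running it after reorganising the folders updates rows for files that still exist, but it does not remove rows for files that have moved; a fuller version would have to handle that.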

Sounds good. How do you handle cases where the same document fits multiple categories (i.e. belongs in multiple locations of your hierarchy at once)?

PS: Another question: can you tell which NLP libs are best for this purpose in your experience (and how do you eventually search the generated index)?

I can have multiple copies of the same document in the system. My notes are associated with documents via md5 hash, so it'll link them with all copies that are present. At some point I'll get the script to automatically hardlink or symlink duplicates. To be honest though, trying to decide which location best fits a document is quite a good way to engage with the document content -- so even though decisions are often suboptimal, it isn't really the final hierarchy which is the product here, it's what goes on in my brain during the process of trying to organize the papers.

As far as the NLP libs question is concerned -- so far this is just an excuse for me to play with spaCy... I'm still at quite an early stage and haven't made much progress (my background is more machine vision than NLP - which I haven't touched since I was an undergrad 20 years ago).
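On the md5 linking mentioned above, a minimal Python sketch of the idea: hash every file, group paths by hash, and look notes up by hash so every copy finds the same note. The function names and the shape of the notes mapping are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_by_hash(root):
    """Map each content hash to every path under `root` with that content."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[md5_of(path)].append(path)
    return dict(groups)


def notes_for(path, notes_by_hash):
    """Look up the note for a file by its content, regardless of where it lives."""
    return notes_by_hash.get(md5_of(path))
```

Any group with more than one path is a set of duplicates, which is also the list a later hardlink or symlink step would work from.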

Very interesting, thanks for your answer. Good point too about using the rigidity as a forcing function for yourself.

I have quite a big system myself, using BibDesk as my interface to the filesystem, and searchability would be very nice indeed. At the moment I only use default system tools like Spotlight (macOS) or mdfind. Your post inspires me to think harder about a more custom NLP solution.

What software do you use to rename PDF files and extract their metadata automatically? I have found this difficult to accomplish, despite trying multiple different tools. PDFs from Arxiv are commonly a problem.

So, renaming is done manually (it forces me to look at the PDF). I use Preview on OS X, or any viewer other than Acrobat Reader on Windows and Linux (Acrobat Reader locks the file, preventing renames). Several Python libraries exist for extracting metadata. I'm trying a couple of different approaches. At the moment I just take the title of the PDF and do a search on Google Scholar using the scholarly Python library -- but this is really very suboptimal and I want to replace it with something faster and more robust.

I’d like to look into the features of Notion more, but I can’t help but wonder why it isn’t more common to see people using a local SQLite database, or any other self-hosted database, to serve as a personal knowledge base management system. A lot of the paid solutions I’ve seen seem to bend over backwards to offer a limited subset of the features that are trivially available in an actual database.

Has anyone tried out using a personal database like this?
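To make the question concrete, this is roughly the kind of thing I have in mind, as a minimal Python sketch on top of the standard `sqlite3` module. The two-table schema and the function names are invented for the example, not taken from any existing tool.

```python
import sqlite3


def open_kb(path=":memory:"):
    """Open (or create) a tiny knowledge base with notes and tags tables."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS notes (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            body TEXT NOT NULL DEFAULT ''
        );
        CREATE TABLE IF NOT EXISTS tags (
            note_id INTEGER NOT NULL REFERENCES notes(id),
            tag TEXT NOT NULL,
            PRIMARY KEY (note_id, tag)
        );
        """
    )
    return conn


def insert_note(conn, title, body, tags=()):
    """Store a note and its tags; returns the new note id."""
    cursor = conn.execute(
        "INSERT INTO notes (title, body) VALUES (?, ?)", (title, body)
    )
    note_id = cursor.lastrowid
    conn.executemany(
        "INSERT INTO tags (note_id, tag) VALUES (?, ?)",
        [(note_id, tag) for tag in tags],
    )
    conn.commit()
    return note_id


def titles_tagged(conn, tag):
    """Return the titles of all notes carrying `tag`, sorted by title."""
    rows = conn.execute(
        "SELECT n.title FROM notes n JOIN tags t ON t.note_id = n.id "
        "WHERE t.tag = ? ORDER BY n.title",
        (tag,),
    )
    return [row[0] for row in rows]
```

The storage side really is this small; what it lacks is everything around it, such as an editor, sync and a way to browse.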

This reminds me of SQLite's own Fossil-SCM product [0]. It's a source control manager with wiki and built-in web interface all stored within one SQLite file. They use it to manage the source of SQLite itself.

I played with it briefly a few years ago, and just tried it again now. It's definitely not geared towards note taking the way other apps in this thread are, but it could have value as a self-contained and portable wiki/kb. I'd be interested to hear of people's experience with this, either for its main purpose or even just as a note taking system.

0: https://www.fossil-scm.org/home/doc/trunk/www/index.wiki

Yes, but there is no good tool for it. In my case I need to store a lot of structured data and files, so I need a document manager / database. It's hard...

I started with Excel many years ago, migrated to Access, and now I'm trying nixoxdb for Android (the only other acceptable alternative was mobidb).

The main problem is ease of use: you need to create a table for everything you want to store, and nothing supports hybrid free-form AND structured data (while also being able to store files), so the solutions with good text editing are terrible at structured data and vice versa.

Other problems are the ability to sync or even open the database on more than a couple of platforms, and the ability to access the files embedded in the database from other applications (there are others, but these are the ones that bother me the most).

I'm also evaluating Notion, but I really don't want a cloud service, since I store basically everything inside my main database and the ability to access it years from now is a MUST.

Also, I still do not have a good solution for my email, which is stored mostly in Thunderbird; only some messages are "exported" to my main database.

I'm thinking of building something myself based on SQLite, .NET, Dokan/FUSE and something similar to Syncthing. But it's a big project and I don't have a lot of time.

edit: couple of fixes

I did. I created a simple html form to input/edit/enter queries, and I had an html index holding common database queries. It worked extremely well as far as that goes, and it only took 90 minutes or so to get it working, so really a trivial setup cost.

There were two things I didn't like about it. I realized that browsability is really important - an SQL database works well if you know what to query, but it's just easier to have an old Yahoo-style index. I went with a wiki approach instead. The other thing is that it's easy to use version control with a pile of text files, but not so much with a database.

These days I do have some things in a "database" but that is a big text file that I query using Linux tools like grep and sed. Does everything I need and plays well with version control.

During my first weeks of Notion usage, I was repeatedly baffled that I never thought more about personal usage of relational databases, as they make so much sense for organizing notes, and many other things.

Having Notion, and also other alternatives, this does not seem viable anymore to me. For example, when I want to edit the "Post" column of my bookmarks database, where I relate blog posts to bookmarks, I just start to type and Notion will suggest matching posts. I can not think of any SQLite frontend that offers such usability features, which is not surprising, as they are clearly not built for using them as personal knowledge base GUIs.

Having a database is easy; the hard bit is having tooling to manipulate it.

Twinkle Notes (currently in beta) is close to what you just described. It has a local encrypted sqlite3 database. You need a host (paid) to sync across devices, otherwise it's free to use on any device: https://twinkle.app Please reach out to us at support@twinkle.app if you need a free token to try syncing.

It might be an unpopular opinion, but I use Notion and think that its user interface is quite bad. I doubt they will go in the right direction for what I need. I moved to Notion from Evernote (spammy, terrible, no innovation for years) hoping for a better experience, but I am still unsatisfied. I will soon be looking for another solution.

I have a postgres instance running somewhere that has a history of stuff that I've recorded.

The reason why one might choose not to do this is that a) it takes a bit of work to get syncing across devices, and b) it takes a bit of work to get markdown, images, search, videos etc working.

Any chance you could share your process?

Devonthink[1] seems like a good personal database, but it's Apple only. I'm in a situation where I'm trying to coordinate a few academic projects with people from multiple institutions and I have yet to find a tool that supports our need. If Basecamp had better integration with Google Docs and Google Sheets, it might work. However, the need to have multiple people review and comment on MS/Google Docs leaves the team with project artifacts scattered all over everywhere. I'll have to take a look at Notion.


I use airtable now when I need to have structured data. I use tiddly for note taking and enhanced bookmark management.

For managing a personal knowledge base, I highly recommend Tiddly Wiki[1]

It's a self-contained Wiki/Notebook/Journal with tags, it works instantly, and all of it is in a single HTML file with magic JS in it. Or you can use it with a server.


I use a tool called Zim Wiki. It's less of a wiki and more of a notes app with a tree sidebar for navigation and linking between pages. Everything is stored as txt files, which works awesome.

I would love an alternative that is free and/or possible to self-host.

However, I do not see how this compares to Notion. How do I edit this HTML file on my phone? How do I add bookmarks to it via a hotkey? What is the equivalent of Notion's databases? Does it have Slash commands? This list of questions could go on for much longer. I really don't want to sound like a Notion shill, especially since I also use other tools (e.g. GitLab) for my knowledge base, but many of the suggested alternatives lack basically all the features that I describe in my post.

I am not sure you understood how Tiddly Wiki works.

>How do I edit this HTML file on my phone?

Open https://tiddlywiki.com/, click the pencil icon on any of the entries. Edit away. The edits can persist with a self-hosted solution. Or you can just download the edited HTML file.

>Does it have Slash commands?

You probably mean "keyboard shortcuts". Yes, plenty. Get started here:


This doesn't cover all of it; there are shortcuts in the editor (Ctrl+B, for example, to make text bold), and they are fully customizable. Everything can be customized.

>How do I add bookmarks to it via a hotkey?

TiddlyWiki is tag-based. You can add tags with a keyboard. Tags define all the structure (table of contents and search).

>What is the equivalent of Notion's databases?

I am not familiar enough with Notion to answer this. If you ask "can I do ____ with TW", I can tell you.

>but many of the suggested alternatives lack basically all the features that I describe in my post

Sure, because they are different products. Feature-by-feature comparison doesn't make sense; if you want Notion, you need Notion.

If you want to organize your knowledge, Tiddly Wiki is one very fine tool to do that.

TW is not a CRM, it is not a Calendar or Google Sheets alternative (as Notion claims to be). It is not an all-in-one tool.

What it is, it's a personal knowledge base tool - and that's what the title of your post says. At that, it can do better than Notion. Or worse. User's call.

>This list of questions could go on for much longer.

The same can be true going from the other side. Here is one:

How do you access your knowledge in Notion 10 years after Notion-the-company goes the way of the dodo?

With slash commands, I do not mean keyboard shortcuts. The lack of customized keyboard shortcuts is one of my issues with Notion, but their slash commands are great.

It basically means that you can access all kinds of features by just typing ahead after a '/' without knowing a precise shortcut. E.g. if I want my text to appear blue I would type /blue. I would also get there with /color or even by just starting to type either of those words and Notion will make suggestions. Slack, Confluence, and others do it in a similar way.
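A toy Python sketch of the matching idea, to make it concrete. The command table and function name are invented for illustration; this is not Notion's actual command list or implementation.

```python
# Hypothetical commands, each with the keywords that should lead to it.
COMMANDS = {
    "Blue text": ["blue", "color"],
    "Red text": ["red", "color"],
    "Heading 1": ["h1", "heading"],
    "To-do list": ["todo", "checkbox"],
}


def suggest(typed, commands=COMMANDS):
    """Return the commands with a keyword that starts with the text after '/'."""
    query = typed.lstrip("/").lower()
    if not query:
        return sorted(commands)  # a bare '/' lists everything
    return sorted(
        name
        for name, keywords in commands.items()
        if any(keyword.startswith(query) for keyword in keywords)
    )
```

With this table, typing `/blue` narrows to one command, while `/col` still offers both colors, which is the "no need to know a precise shortcut" part.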

I think I do understand how Tiddly Wiki works, as I have quite a bit of experience with wiki systems, though I haven't used TW in particular.

Some of the features you put aside as not needed for personal knowledge management are actually very nice for personal knowledge management. You might want to look a bit into my post to see what those databases can do ;)

> How do you access your knowledge in Notion 10 years after Notion-the-company goes the way of the dodo?

This is a very valid concern, that was also raised by some other commenters. In my opinion, Notion's exporting capabilities are quite good and I use them regularly. However, I do hope that we will get more automated and sophisticated third-party backup solutions when they have an official API.

How do you recommend running it? The install guide lists a half-dozen different possibilities including PHP, Node, Ruby which all look to be vastly different solutions.

You just download the HTML file and open it to 'run' it. Then, once you've made changes, use your browser's "save as" feature to save a copy. Everything is in the browser.

Other options do things like public hosting.

Personally, I just use the HTML file with the Timimi Firefox plugin installed for autosaving the current file.

> use your browsers "save as" feature to save a copy

> Don't attempt to use the browser File/Save menu option to save changes (it doesn't work)

It looks like TiddlyWiki doesn't recommend that exact method anymore.

I use TiddlyDesktop and use it to open the HTML file that I downloaded from the site. The desktop app takes care of saving the file when you finish editing a post and save it.

Sorry for the confusion. There's a button you press in Tiddly Wiki that generates a new HTML.

You "Save As.." that.

I would recommend just downloading the HTML file and using that.

To save changes, click the red checkmark, and "save as..." the updated HTML over the old copy.

Switch to other possibilities if you use it enough to justify that.

That's the save method I've been using for a year now. It perfectly fits my needs as a single user of this wiki. No infrastructure to set up; it works out of the box. As a former ikiwiki user, I won't go back.

How well does Tiddlywiki scale? Would this single file support a large knowledge base accumulated over decades?

Jeremy Ruston, the creator, says a 100MB TiddlyWiki works fine on a 2013-era laptop, and that browsers are some of the best optimized VMs available to work on (in the video I linked in another comment).

Most knowledge is text, maybe with a picture here and there.

I.e., a lifetime's worth of knowledge in the reasonable HTML of Tiddly Wiki would take less of a toll on your PC than the current CNN front page.

You can use TiddlyWiki with a Node backend.


If you want to host on an ARM device: https://github.com/djmaze/tiddlywiki-docker/issues/15

I strongly support this recommendation. Tiddly Wiki has enough features and extensibility that I wasn’t tempted to write my own bibliography management system after discovering it. Yes, I know there is dedicated software for that but they didn’t fit how I want to structure my notes.

This, together with the GitHub plugin. You edit/create a post, press save, and the whole website is committed back to GitHub. In a single HTML file. Painless.

Joe Armstrong of Erlang fame was a fan of Tiddly Wiki, and there's a video of the Tiddly Wiki creator and him enthusing about it here https://www.youtube.com/watch?v=Uv1UfLPK7_Q

I use tiddlywiki with noteself on a self-hosted server.

I've been doing the same for a couple of months now, and I love it. Can access it from various devices through the browser and it syncs well.

Big thumbs up from me so far.

The hardest part was setting up a CouchDB instance for the backend, but I found some instructions and fiddled with it until it worked.

Link: https://noteself.github.io/

how do you secure it, is it a long obfuscated URL?

My setup requires a username and password to be entered, which I think is the intended authentication method, since it uses a database at the backend.

I'm a huge fan of Org-mode and so far it's been the most persistent way to keep track of my knowledge base.

I have a single file called "engineering-notebook" and I store pretty much everything in it:

- bookmarks to resources

- personal notes

- personal tutorials/step-by-steps

- cheatsheets

It works fantastically for me

Agreed, I use org-mode to capture notes/TODOs.

For links/bookmarks I use Google Bookmarks - https://www.google.com/bookmarks/ - which surprisingly has not been killed and looks like it might be left alone by Google. Other bookmarking services (delicious.com, Posterous, etc.) all died in due time.

By using browser extensions and a third-party mobile app, I am able to sync/store all the links. I would love to be able to sync bookmarks to my org-mode notes, though. I'm looking at org-capture now to see if that could possibly work (https://github.com/alphapapa/org-protocol-capture-html).

Chrome Extension - https://play.google.com/store/apps/details?id=com.amlegate.g...

Firefox extension - https://addons.mozilla.org/ru/firefox/addon/fess-google-book...

Android App for Google Bookmarks (3rd Party) https://play.google.com/store/apps/details?id=com.amlegate.g...

It’s all I need, I have tens of thousands of lines of notes at this point, can search them (text, regexes, tags, whatever) essentially instantly, can export to html or slides or pdf or $format, will never go down or close down or disappear behind a paywall, it’s perfect

Accessible mostly everywhere, but really exploring, manipulating content across mobile devices and operating systems gets pretty frustrating. Love it in theory, but DynaList is just so simple to use anywhere without a second thought.

Dynalist hits the right spot for me to manage my current work (tasks, quick notes…). Simple but flexible, extra features not overwhelming you if you don't need them. But their mobile app isn't great.

I also love that since it's plain text, I can still access it anywhere I want to without having to use Emacs (despite the limited functionality without it).

Why do you use one file? The ability to insert hyperlinks is one of the nice features of org-mode.

A single file is fairly manageable even when it gets huge. I feel like that's a big advantage of org-mode.

I use a separate file for non-technical stuff.

I also love org-mode, but I have to admit that AsciiDoc and Asciidoctor have been growing on me, especially for anything I want others to contribute to.

TODO alone is a game changer imho.

I was jumping from one note taking app to another before realizing I have to develop my own to be fully satisfied: https://github.com/zadam/trilium

This looks amazing. Any thoughts on what it would take to turn it into a minimally usable PWA so that I could carry things I consider important with me even in offline mode?

No PWA, but there's an Electron desktop build which can work fully offline (and then sync).

Looks great, I really like the note map feature.

That looks like awesome software, and the docs are top notch. I'll try it during the next few weeks.

This is seriously an amazing piece of software! Thanks for that.

For this I use Notable[0] with its data directory located in a GitLab repo. A combination of Markdown and Git with Notable's impressive UI. You can also use any cloud drive, Cryptomator vault, etc. for the storage backend.

Notable's categories (tags) for me include things like manpage snippets, code snippets, workspaces/scratchpads, thoughts, etc.

Of note, a lot of the things in my head that I need to jot down on my mobile, even at length, are usually stored in Simplenote[1] then transferred to Notable if they're important enough, via Simplenote's desktop app.

[0] https://notable.md [1] https://simplenote.com

What I would love is a full personal CMS type system, where I can have a subject like "cybersecurity" or "cooking" and I can add bookmarks, notes, pdfs, and they're all treated equally and as first class objects in the system.

I would love a system that treats all media the same as just a source of knowledge instead of being hung up on the source type i.e. is it a video, or a book mark or a pdf....

I have been using a text file (actually an org mode file) since 2009. Before that I used a plain text file.

It doesn't handle video/bookmark/pdf. If I have to save those, I put them in a separate directory and make a note of it in the text file.

It's pretty well organized, thanks to org mode. It has a section that is by date, like journal entries, and a section that's by topic. Since it's just a single file, it's very easily searchable. It will never "go down". I don't need to run a database (I tried using wiki software for the same purpose). I can even "link" different sections, by having plain text labels. For example, I can refer to "ZFS Setup 2009-01-01" which is another place in the document I can search for.

I can "jump" to sections by using the orgmode annotation. Searching for "* Computers" will go to that top level section.
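The "search for the heading to jump there" trick can be sketched in a few lines of Python. This is only an illustration under assumptions: it treats the notebook as plain text whose top-level sections start with `* `, and the name `find_heading` and the sample text are made up here, not taken from the setup described above.

```python
def find_heading(org_text, title):
    """Return the 1-based line number of the top-level heading `title`, or None."""
    target = "* " + title
    for number, line in enumerate(org_text.splitlines(), start=1):
        # Top-level org headings start with exactly one star and a space,
        # so "** Laptop" will not match a search for "Laptop".
        if line.rstrip() == target:
            return number
    return None


# Hypothetical notebook contents, for illustration only.
sample = """* Journal
** 2009-01-01 ZFS Setup
Notes on the pool layout.
* Computers
** Laptop
"""

print(find_heading(sample, "Computers"))
```

An exact match keeps the sketch simple; a heading that carries tags or a TODO keyword would need a looser comparison.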

I liked this system so much that I also use it at work. People are very impressed that I can find things in my "notebook" so quickly. It's just text search.

Websites disappear. Web apps disappear. Apps disappear. My text file, does not.

> It doesn't handle video/bookmark/pdf. If I have to save those, I put them in a separate directory and make a note of it in the text file...

I've used a similar system in the past (though not based on org mode), but it was too brittle if/when "links" broke or were altered. As an oversimplified example, if a reference to a video in your master text file breaks because the name of the video's containing folder has changed even slightly, that becomes annoying. For a small number of references or media types it may not be an issue, but past some point it can become too much maintenance/annoyance. Is this an issue that you encounter? If so, do you have mitigations for avoiding/resolving the broken links? Just curious.

But really working on the go with Android or iOS is painful, no?

As someone with a similar setup, I have a Google Doc named "Weekly Scratchpad" with a shortcut on my Android home screen. At the end of the week, I copy anything worth saving into my plain text file(s) and delete the rest.

Not exactly a CMS, but I'm working on a browser extension along a similar line of thought. It unifies all bookmarks/notes/chat messages etc. you have for a certain URL, treating them as first-class: https://github.com/karlicoss/promnesia#demo

P.S. It's also relevant to bookmark management as well: you get to see not only notes on the bookmark, but also on the 'child' URLs, e.g. if you open someone's twitter profile you'd be able to see their tweets you've favorited. Or if you open some blog, you'd see posts from that blog you saved on Reddit.

If you didn't try Notion before, you should probably not judge it solely by my post. It is very versatile and you could probably use it in a way that comes close to what you describe.

There is not much support for PDFs or videos though.


I've been using Zim [1] as a knowledge database for around 2 years. I don't like depending on online solutions which may suddenly disappear.

Together with some plugins for managing tasks, git and some script for synchronizing repositories, it has been working great.

There are some limitations on this approach (ie, two persons editing the same page concurrently is a no-no), but for my use-case it works perfectly.

I find it much more intuitive than org-mode, and the fact that it auto-generates a "global" task list based on items spread across the whole notebook makes it much easier to prioritize things.

[1] https://zim-wiki.org/

I like Zim. For my Batcave I switched from Tiddlywiki to Tomboy to Zim to Notecase Pro and finally switched from that over to files and folders using the Geany editor and a sync service. So far this is the combination that seems to survive software, OS, and environment changes best. Even on Android it's not perfect, but it makes me happier than e.g. trying to use Zim on Android :-)

The bookmark workflow is an impressive demonstration that browsers have a lot of room for improvement in their bookmarking tools.

I'm hoping (and optimistic) that the distributed web brings with it some epiphanies about how to do better local knowledge management!

Firefox sync with tags and knowing how to use the FF search bar to filter tags comes close for me

I didn't know we could search in the bookmarks from the address bar directly. It is indeed a nice feature. Here is the documentation for reference https://support.mozilla.org/en-US/kb/address-bar-autocomplet...

More specifically in the section Changing results on the fly

My note app hopping days ended the moment I first encountered Joplin, which I now use for almost all my writing.

The killer feature is the encrypted sync via more or less any kind of storage back end (presently I use Box via WebDAV, but that's easily switched). Seamless work across all my devices, all data in nice and portable Markdown.

It fits my needs so well that I have taken the extraordinary step of allowing the desktop version on my PCs, even though it is Electron based.


I have a single God ReadMe file in which I try to record all the facts that I should know and 'recipes' that I might need to follow again. Such as that shutting down windows does not dispose of kernel state whereas restart does. Or how to do a certain task on the income tax site or how to set up an odbc datasource in Windows. I've been keeping this file for years and I periodically read through it as much to remind myself of what's in it as anything else. It's saved me a lot of frustration when finding myself having to do things again that I know it took me ages to figure out the previous time but I have since forgotten.

I like the Plain Text Project's philosophy. https://plaintextproject.online/

A tool is great for collaboration, but how much do you really need for a personal KB? And how much effort do you want to spend migrating?

Then there's vendor lock-in, learning curves, etc.

A well-organized directory structure with plain text files gives you a data store that won't become obsolete and can be used on any OS. I use CLI tools to search (e.g., grep) and that's all the features I need right there.
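For anyone who wants the same grep-style search from a script, here is a minimal Python sketch. It assumes the notes are `.txt` files under one root directory; the function name `search_notes` and the case-insensitive matching are choices made for this example, not part of the setup described above.

```python
import os
import re


def search_notes(root, pattern):
    """Yield (path, line_number, line) for each matching line in .txt files under root."""
    regex = re.compile(pattern, re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".txt"):
                continue  # only plain text notes are searched in this sketch
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")
```

Calling `list(search_notes("notes", r"zfs"))` on a hypothetical `notes` directory would return every matching line together with its file and line number, roughly what `grep -rin` prints.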

I've mentioned this before, but I'd love to have a note taking app as convenient as EverNote, but that lets me insert code cells a la Jupyter notebook. I can barely think without Jupyter any more. Maybe that's not a good thing, but it's sure handy.

Try Joplin. It's a direct replacement for Evernote, can import its database, and is under active development to add more similar functions. The only one that I'm missing right now is OCR; not sure if it's coming, though. I'm using Dropbox to sync my phone with laptop clients.


+1 for Joplin. I've been using it for over a year now and I love how I maintain control over my notes because in the end, they're just Markdown stored on Dropbox. And you can change backend at any time.

What do you use Jupyter for? Feel like I'm missing out. :)

I'm a physicist by day, and work for a company that makes measurement equipment. A lot of my work is quantitative, so I'm frequently "thinking" with numbers, equations, graphs, etc. While I'm not employed as a programmer per se, I use scientific programming quite heavily.

Jupyter has helped me take notes when I need to "think" quantitatively, and be able to look at answers quickly, plus maintain a record that I can read later on.

Not OP but I also use jupyter for almost everything.

Jupyter notebooks are great for capturing thought process and showing intermediate state of whatever it is you're doing, which means coming back to them even years later it's a lot easier to recall your own thoughts and not have to reverse engineer a monolithic block of code.

Hmm, what is it you don't like about code blocks in Notion? Don't think it can get much more convenient. The only things I miss is auto-format and maybe automated language detection. I don't know of any web-based tool that can do those things though, except some that do only those things.

This has been a workflow that served me well for the last couple of years -

Most of my stuff is on dropbox/google drive for accessing them across different devices. I prefer tools that let me control where the data is and not the other way around.

For all academic papers and documents, I use Zotero. You can use your favorite PDF reader to annotate or take notes on PDFs and Zotero will sync these. I also love the feature where Zotero can automatically extract all annotations from the PDF. (I sometimes save these as an org file.)

If I am reading a longform web article or a blog post that I really think is useful and helpful, I also save these to Zotero. The push to kindle extension from fivefilters is an awesome tool that converts webpages to pdf.

I'm currently testing the memex extension from worldbrain to annotate and organize my browsing history.

For all notes, journals, random thoughts, ideas (both work and personal), I use org-mode. (I recently switched from ZimWiki.) It's been amazing so far. So many things are easier to do in org-mode, although the learning curve for Emacs is pretty steep in the beginning.

On mobile, I use the Orgzly app for accessing and taking notes! It's by far the best Android app I've used so far.

practices >> tools!

A little surprised that after 64 comments I've only seen 2 instances of Evernote mentioned. I recently went back to using it and have been making use of the clipper fairly often. https://twitter.com/dalevross/status/1214257684863692806 https://www.evernote.com/l/AVVHBrGz_9BPMbh_Oxgp8qpibzVtX0tRn...

I also found https://dabble.me/ the other day which seemed like a nice slot in for quick thoughts with the ability to email to your journal. I've heard that journaling manually works better but I really dislike writing unless necessary.

One of my biggest gripes is with how many places I have my data in currently, and as a dev, I should have done something already but that's another story.

I'll definitely have a look at Notion and org-mode having read the feedback here along with a few other mentions elsewhere.

I was a paying EverNote user for years, but they stopped evolving while other NoteTaking apps didn't. I forget which feature they removed that was the last straw, but the recipe front-end was hugely important to me.

I've worked with Evernote a lot, but handling source code snippets is really painful. Got me switching to Notion too.

Evernote was useful 10 years ago.

Would you like to elaborate on what changed over the last 10 years to make it no longer useful?

The problem with bookmarks is you're never going to have time to reread the whole thing. I make a summary and put the source as a hyperlink so that if I summarized wrong I can go read it again. Summaries are better because you can organize summaries and have everything you know about enums together in a few pages, rather than 20 bookmarks.

You don't need to re-read all of your bookmarks. They're basically a hash table keyed by some concept or some problem that you either list elsewhere or remember in your head, that you can quickly use to pull up the full detail content for the given key. Much like the actual physical bookmarks that are just quick jump references to content you already know the summary of.

Don't use bookmarks as a 'I might come back and read this later'. Use them as 'I will need this later'.
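Taken literally, the hash table analogy looks like the Python sketch below. The keys, URLs, and the helper name `lookup` are all hypothetical and only illustrate the shape of the idea: you remember the key, the table holds the detail.

```python
# Hypothetical bookmark store: each key is a concept or problem you remember,
# each value is the list of links that hold the full detail.
bookmarks = {
    "enums": ["https://example.com/enum-guide"],
    "zfs setup": ["https://example.com/zfs-howto", "https://example.com/zfs-tuning"],
}


def lookup(concept):
    """Return the saved links for a concept, or an empty list if nothing was saved."""
    return bookmarks.get(concept.lower(), [])


print(lookup("ZFS setup"))
```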

I fully agree with this sentiment. It is very important to not have the feeling that you need to revisit all your bookmarks. In that case, you will eventually feel overwhelmed and a bookmarks library will do more harm than good.

I'm a bit surprised nobody seems to have mentioned Bear Notes.

I've used OneNote and Org Mode in Emacs, but I am very happy to have discovered Bear Notes (https://bear.app/) last year. It has really changed how much I document, work with text, store info etc.

Bear handles images, lists etc inline. It allows for very easy tagging (creating hierarchies) etc. And it looks really nice. Even with good fonts, Org Mode never really looked pleasing to me [0]. And I found that aesthetics is really important for me to actually use the tool.

Bear uses SQLite for storage, which can be accessed with any SQLite tool (I've tested it). And it can also do batch export to a number of formats. The exported files look good and can easily be imported into other tools, for example Org Mode (I've tested it). Bear can also export to PDF with styles. I actually use this to export my notes as reports to the company board, customers etc. And I've gotten compliments on how good the reports look. ;-)

Finally Bear syncs easily. I can now work with my notes on the phone, iPad and laptop.

If only I could get the tool to use a solid, non-blinking caret, life would be bliss. Go Bear!

I'm looking forward to trying Bear once they have a web client. I need access from Windows/Linux.

See also Roam (https://roamresearch.com)

Just tried to sign-up on Roam. The sign-up page redirects to some tracking site (https://trackcmp.net/redir?actid=66602...), which is blocked by my pi-hole.

Not looking good.

Bummer. I signed up a few weeks ago (free), and have been enjoying it tremendously. To the point where I'm glad they'll take my money bc it's more than worth it, and I want to feel confident I can rely on it as a customer.

It's going to cost around 30 usd/mo so it's something for people that depend on it.

Interesting. Does anybody know more about the project and the organisation? How do they keep their lights on?

They’re going to a monthly subscription model when they get out of beta.


“Still working out specifics

Right now looking like it'll be $30/month - cheaper for annual - cheaper for students/non-profits/unemployed

I personally think unemployed autodidacts should always get student benefits - scam that they don't”

We have built Emvi[1] to solve many of the problems that come with managing a long-lived personal or team knowledge base. The idea is, just as you describe in your blog post with Notion's tables, that if you can remember a fraction of the information you're looking for, you can find it using text search, filters and sorting. A fixed structure won't cut it when you have to maintain it for a long time, and I believe it's better to add metadata (like tags) to make it searchable. Emvi has a free tier with way higher limits than Notion, so that should work for most personal knowledge bases.

[1] https://emvi.com

I like plain text backed up in Google Drive. I use The Archive [0], but Notational Velocity [1] is popular as well. I don't have a bookmark system though. This book is an excellent explanation of a plain text knowledge base system [2].

[0] https://zettelkasten.de/the-archive/

[1] https://brettterpstra.com/projects/nvalt/

[2] https://www.amazon.com/dp/1542866502/

Any macOS users should check out DevonThink.

I have been perpetually sitting on the fence about Devon.

My use case: before Google+ folded I saved all my history there, because 99% of this was links to pages on stuff I found interesting/fun/relevant.

So now I have a large amount of urls (some will probably be defunct by now but it is not relevant). What I would like to do is to feed these urls to something (DevonThink?) that could access the article, and index it so that if - as an example - I want to prepare a RPG campaign on modern day pirates I can just write in the search box ["pirates" "shipping" "modern"] and hopefully get a list of web articles that are relevant.

Now, I know that Devon has some sort of automatic indexing/clustering facility for documents you have on your HD, but it is not really clear to me if it works also with stuff you only have an URL to.

(If anyone has some alternatives to suggest I will be very interested - I toyed with the idea of putting together an ElasticSearch VM for this but it remained on the backburner for years).
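To make the kind of search being asked for concrete, here is a toy inverted index in Python. It is a sketch under heavy assumptions: the article text is assumed to be already fetched and held in memory, the URLs and texts are invented, and the helper names `build_index` and `search` are made up. It says nothing about how DevonThink or ElasticSearch work internally.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, *terms):
    """Return the sorted URLs that contain every one of the given terms."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical saved articles: URL -> already-downloaded text.
documents = {
    "https://example.com/pirates": "Modern pirates attack shipping lanes",
    "https://example.com/ships": "Container shipping in the modern era",
}
index = build_index(documents)
print(search(index, "pirates", "shipping", "modern"))
```

A real setup would also need the fetching step and some handling of dead links, which this sketch leaves out.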

Here is my original request here, btw: https://news.ycombinator.com/item?id=18882167

Thanks a lot, I will check these which are totally new to me.

+1 for DevonThink, both on mac and ios.

Also on iOS.

It's too much for me; I couldn't be productive when using several services to store notes. Too many distractions.

For my personal needs I use https://github.com/vimwiki/vimwiki for notes, ideas, todos, articles etc. I used Buku for a few years, but then I realised that I'm not using any bookmarks at all, so I moved to native Safari bookmarks. :)

In the end I'm using Git to have a complete archive of each change. :)

I love writing and blogging. Started reading into org-mode recently. Wish I had a cleaner setup with my own website. I am pretty satisfied with Firefox's bookmarks export though. The JSON it generates is easy to work with and carries lots of useful metadata. I built a little processor to make my bookmarks easily readable and organized here: https://l-o-o-s-e-d.net/bookmarks
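A processor for that kind of JSON export can be quite small. The Python sketch below is not the processor linked above; it assumes a nested tree where folders carry a `children` list and bookmarks carry `title` and `uri` keys, and the sample data and the name `flatten_bookmarks` are invented for illustration.

```python
import json


def flatten_bookmarks(node, path=()):
    """Recursively collect (folder_path, title, uri) rows from a bookmark tree."""
    rows = []
    if "uri" in node:
        # Fall back to the URL when a bookmark has no title.
        rows.append(("/".join(path), node.get("title") or node["uri"], node["uri"]))
    if "children" in node:
        title = node.get("title")
        child_path = path + (title,) if title else path
        for child in node["children"]:
            rows.extend(flatten_bookmarks(child, child_path))
    return rows


# Hypothetical export, shaped like the assumed format described above.
export = json.loads("""
{"title": "", "children": [
  {"title": "toolbar", "children": [
    {"title": "Org mode", "uri": "https://orgmode.org/"},
    {"title": "Notes", "children": [
      {"title": "", "uri": "https://zim-wiki.org/"}
    ]}
  ]}
]}
""")

for folder, title, uri in flatten_bookmarks(export):
    print(folder, "|", title, "|", uri)
```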

You should check out https://histre.com/ I've been working on it and imho it is a better solution. With tools like Evernote, Notion etc which are designed to do everything, you end up with a useless write-only "knowledge base" that will make a hoarder blush.

I use SuperMemo (http://www.supermemo.com).

I'm surprised no one has mentioned SuperMemo and incremental reading as a means of managing a knowledge base.

Ahh, incremental reading in SuperMemo. It makes my Wikipedia addiction so much more manageable.

For personal/private notes, I built https://www.build.my/logbook

For public notes, I use GitBook https://www.aizatto.com/why-gitbook

Great writeup!

I've got quite similar system in spirit and aims, just instead of lower Notion layer, using org-mode. It ends up searchable, synchronised with all devices, available offline and with great tooling for organizing and processing information.

> While documenting my configuration, especially my command-line workflows, I identified some shortcomings

Can't agree more! I've considerably simplified so many things while writing up on my setups and publishing code -- I guess it's easier to sneak in unnecessary complexity when it's all in your head.

> They are, however, not well suited for keeping extensive bookmarks libraries.

Agreed, the state of browser bookmarking is a bit of a shame; interfaces are restricted and bloated with non-functional features. At first I switched to Pinboard [0], but after a while I realized even that was not enough for me and wrote Grasp [1], a browser extension to capture stuff directly in org-mode.

> Examples of things that should not be Chrome bookmarks

Yep, eventually reached exactly the same conclusion! I often want to add private notes, more context etc and it's just not compatible with standard bookmark solutions.

> Taking Notes

Also big fan of notetaking, I basically write down any remotely meaningful thought if I don't have time to exercise it at the moment (via org-capture). Also using same file for everything and just processing it now and then. Same for ideas for new projects, blog updates, etc -- these are just entries in org-mode.

> ..it requires some discipline. I tag and annotate new entries, that I added via the web clipper, about twice a month

Yep, doing same, going through clipped links in org-mode, tagging and putting priorities. Then I can sort by priority and start reading/refiling etc. Eventually non-important stuff just sinks down as I don't have time to process everything, but at least it's searchable.

> For example, it is quite easy to search through questions you have answered on Stack Overflow. It is also not a problem to go back to your Hacker News posts or search through projects you starred on GitHub.

I actually find that even though in theory it's easy, in practice sometimes you don't remember where exactly you need to search for something, so you end up going through 'check reddit saves', 'check HN saves', 'check twitter faves' cycles, etc. So I'm automatically converting these into org-mode and they are searchable as any other org-mode file. I describe it in more detail here [2].
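The conversion step can be pictured with a small Python sketch. It is not the code behind the setup described here; it assumes each saved item has already been fetched into a dict with `source`, `title` and `url` keys (all hypothetical names), and renders it as an org heading tagged with its source plus an org link.

```python
def to_org(entries):
    """Render saved items as org-mode headings with a source tag and a link."""
    lines = []
    for entry in entries:
        # Heading with the source as an org tag, e.g. ":reddit:".
        lines.append("* {title} :{source}:".format(**entry))
        # Org link syntax is [[target][description]].
        lines.append("  [[{url}][{title}]]".format(**entry))
    return "\n".join(lines) + "\n"


# Hypothetical saved item, for illustration only.
entries = [
    {"source": "reddit", "title": "ZFS tuning tips", "url": "https://example.com/zfs"},
]
print(to_org(entries), end="")
```

Once everything is in one format, a single text search covers all the sources at once.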

P.S. Great and clean design, especially the sticky navigation on the right! I might borrow the idea :) By the way, I tried it in responsive mode and it seems to disappear altogether; perhaps you could display it on top if the screen is too small?

0. https://pinboard.in

1. https://beepb00p.xyz/grasp.html

2. https://beepb00p.xyz/pkm-search.html#other

Thank you for the nice feedback, much appreciated!

> Ad using native bookmark sources, such as HN/Twitter/...

You are right, this is not ideal and it also happens to me that I am no longer sure where to look first. On the plus side, I do find it eventually, even if it's only the second native source I check. Atm, it seems to me that this is still better than duplicating everything in my other bookmarking layers. I will read your posts to find out more about your system, honestly don't know much about org-mode yet.

PS: Very glad to hear that! Especially, since I don't do much frontend work usually. I will think of a way to include a ToC also on small screens. Please, feel free to steal any idea you like. Then I don't have to feel as bad for looking into some of yours. I really like the pilcrows next to the headings and the dotted lines for highlighting sections on your site.

alphapapa wrote a nice way to capture browser content that is browser agnostic, and uses org-capture: https://github.com/alphapapa/org-protocol-capture-html

(as an alternative to grasp, because I'm on MacOS/Safari).

Thanks for the post. Will check out Notion for sure. Although I'm a little worried about the vendor lock-in - as others have mentioned as well. Although I think it is ok to surrender some robustness for the ease of use.

I also use Notion, but I don't keep my bookmarks in a database for very long; it's only for the ones I don't have time to categorize, and I try to add them to a neatly organized tree structure.

A Todoist app, Mac Notes or OneNote, and Pocket to bookmark pages. Why overcomplicate things!? And if you want a framework to memorize things, try Polar Bookshelf and Anki flashcards.

What is the difference between this and Bear on iOS and macOS, which I use every day now?

Yes, it depends on a server and a paid subscription, but it has tags and syncs across multiple devices.

I just use Atom with enhanced-markdown-preview and ascii-tree.

It's flexible enough for re-organisation and I can convert to lots of formats with pandoc.

Off topic sorry but why bother with the cookie agreement on the overlay footer like this? You are not giving users the ability to opt out so this just annoys people without making the website GDPR compliant.

I will soon disable this for non-EU visitors. A more sophisticated cookie agreement banner is not so trivial to do with static sites. As long as everybody is doing it this way, even big corporations, I will keep this approach for EU visitors.

It might as well not be there if there's no way to opt out of the cookie. It's not GDPR compliant either way.

I use folders full of Omni Outliner documents. I like it, but to me it's missing some features.

I think OneNote is the best option. I have been trying alternatives but nothing seems to come close.

Glad to see a shoutout to Workona here, I've really appreciated the ability to save bookmarks in workspaces and better separate and compartmentalize my browsing by project.

Their support team is great too, had an issue with an update using Brave recently and they were very helpful with fixing the error.

Google Keep is pretty awesome for notes. Has a good search, supports tagging, can pin, manually order notes, create calendar reminders from them, attach images, colors. Supports plain text notes and checklists. Also, there's not only a web app but desktop and phone ones too.

And it’s all gone, when they shut it down.

True. There is an export option for notes in Google Keep, and I don't think they will just shut the service down instantly without warning. I guess there are no silver bullets; every option sucks in a way. If you go old school with a text document or directory structure, you lose out on usability; if you use any of the services, there's a risk of them being shut down; or you host some service yourself and take on the costs of doing so.

Has anyone used Pocket for personal knowledge base?

Anyone used WikiJS?

Don't bookmark. Instead: Print to PDF.

I have every interesting article or web reference I've ever read since 1997, sitting in a PDF file alongside tens of thousands of others.

It is so convenient to be able to "ls -l | grep subject" and find all the pages related to that subject, and then to mine the data out of the PDFs for further reference.
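The same filename lookup can be written as a small Python helper. This is a sketch, not the workflow described above: the name `find_by_filename` is made up, and unlike a plain `grep` it matches case-insensitively and only looks at `.pdf` files.

```python
import os


def find_by_filename(directory, subject):
    """Return the sorted PDF filenames in `directory` whose name contains `subject`."""
    needle = subject.lower()
    return sorted(
        name
        for name in os.listdir(directory)
        if name.lower().endswith(".pdf") and needle in name.lower()
    )
```

For example, `find_by_filename("PDFArchive", "zfs")` would list every PDF in a hypothetical `PDFArchive` directory with "zfs" in its name.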

Or, I just open the PDF and gain the knowledge again.

This works very well and doesn't require an active Internet connection. Every year I spend a few weeks up in a mountain retreat, going through the collection and getting an idea for the variety of topics I've read about for the year.

It's pretty neat to see, also, the changes in my interests over the years - and it's pretty wild to go back to the sites after a while and realise I've got the only copy of the site, because it's gone now.

I'd prefer to see more tools for manipulating PDFs become mainstream in the future. I can't recommend highly enough the productivity I've gained by being able to organise things this way. It's like having my own private Internet of 80,000+ pages, tailored to all the things I'm interested in ..

I've always disliked PDFs - they just seem too, I don't know, obscure or scrambly/messy, maybe? However, I have recently seriously considered beginning to archive instead of bookmarking. I would prefer archiving as plain text (for easier searching and compressing, etc.)...but enough of the content that I consume (and wish to archive) contains media (mostly imagery, not really much video), so I guess printing to PDF is more convenient for the "capture" phase. The one concern I have is storage...I mean, how much storage would bloated PDFs use up, I wonder (as opposed to, say, leaner plain text)? Legitimate concern, not trying to be facetious. I've also heard there's some sort of web archive file format - .warc or something? If so, I hope that becomes an established thing - maybe the best of all worlds...at least for this archiving use case.

> Don't bookmark. Instead: Print to PDF.

I "print" to markdown instead, even though pdfgrep sort of works I much prefer text files.

I'm also interested in how you go about this. Evernote and Notion have good clippers, but I'd rather not rely on them. I tried to use markdownifier[0] for a while, but images are hotlinked and will break if the site goes down. I should really just write my own clipper that does this, but I haven't found the time yet.

[0]: http://heckyesmarkdown.com/

That’s a great idea! How do you go about it?

I use markdown-clipper for Firefox, it does an OK job but sometimes requires a bit of fixing up. It does not save images, so I have to do this manually; I use VSCode for Markdown, which has an extension for copy/pasting images to Markdown, it inserts a link and saves the image in a subfolder.

A little friction, but if that's too much, then the article is not worth keeping. But yes, someone with the time on their hands could contribute to markdown-clipper, saving images locally.
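The "save images locally" step could look roughly like the Python sketch below. It is not part of markdown-clipper; it only rewrites remote image links in Markdown text to point at a local folder and returns the list of files that would still have to be downloaded. The function name `localize_images` and the `images` folder are assumptions, and filename collisions are not handled.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches Markdown images with an http(s) target: ![alt](https://host/path.png)
IMAGE = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def localize_images(markdown, folder="images"):
    """Return (rewritten_markdown, downloads) where downloads is [(url, local_path), ...]."""
    downloads = []

    def replace(match):
        alt, url = match.group(1), match.group(2)
        name = posixpath.basename(urlparse(url).path) or "image"
        local = posixpath.join(folder, name)
        downloads.append((url, local))
        return "![{}]({})".format(alt, local)

    return IMAGE.sub(replace, markdown), downloads


text = "Intro ![diagram](https://example.com/img/a.png) end"
print(localize_images(text))
```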

I've always thought of PDF as an opaque format. How do you search and/or browse your collection? Does the subject show up in grep [e: without being diligent with filenames]?

I (*nix user) use a script that basically does:

    # "-" sends the extracted text to stdout instead of writing a .txt file
    pdftotext -layout -eol unix -nopgbrk "$PDF" - | egrep ...

Many PDFs have compressed content streams; plain text utilities only see metadata in that case. Cached, compressed text-only output is usually tiny, and can be zgrep-ed.

pdfinfo shows document metadata (title, subject, keywords and more), but it's quite uncommon for these to be useful (Adobe and LᴬTᴇX-sourced PDFs tend to have this data).

Both come with xpdf.
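The "cache the text, compress it, grep it later" idea can be sketched in Python as well. This is an illustration, not the script mentioned above: it assumes `pdftotext` is installed and on the PATH, and the function names and cache layout are invented for the example.

```python
import gzip
import re
import subprocess


def extract_text(pdf_path):
    """Run pdftotext and return the extracted text ("-" means write to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", "-eol", "unix", "-nopgbrk", pdf_path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def write_cache(text, cache_path):
    """Store extracted text as a gzip-compressed file."""
    with gzip.open(cache_path, "wt", encoding="utf-8") as handle:
        handle.write(text)


def grep_cache(cache_path, pattern):
    """Return the lines of a compressed cache file that match the regex pattern."""
    regex = re.compile(pattern)
    with gzip.open(cache_path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if regex.search(line)]
```

A typical flow would be `write_cache(extract_text("paper.pdf"), "paper.txt.gz")` once, then `grep_cache("paper.txt.gz", "ZFS")` as often as needed; the file names here are placeholders.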

This great; thanks for sharing!

I'd be interested in knowing this too. It sounds like a good idea.

That sounds brilliant, and yet I can't remember having heard or read this idea before!

It kind of amazes me that you've been doing this for two decades already.

I always wonder why nobody else does it, but it's a very pleasing experience to have an archive like this - and it has proven very valuable in production crunches and peaceful moments of downtime alike.

Another thing it allows me to do is generalise my reading in real time - I know I can always come back to the subject of the PDF in a few days' time, or whenever really, without needing to know where to find the details: I just grep my file tree, and mine the data that way.

Key thing is, though, that when saving the PDF, I always check that it is named after a decent subject, derived from the <title> tags .. if there isn't one, I add it myself. This is the only place I pay attention to 'tagging an article' - in naming the file of a poorly-named page, but if the page title fits the subject, I don't change a thing. And it means I have a huge set of data also in the filenames, not just the contents themselves .. Periodically, I also use 'detox' on my PDFArchive/ directory .. this helps with consistent naming for regexes, and so on.
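To illustrate the naming step, here is a small sketch that turns a page title into a regex-friendly filename. The function name `title_to_filename` is hypothetical, and this is only a rough approximation of the kind of cleanup detox does, not its actual rules.

```shell
# Turn a page title into a grep- and regex-friendly filename stem.
# Every character outside A-Z, a-z, 0-9, '.', '_' and '-' becomes '_';
# runs of '_' are collapsed and leading/trailing '_' are dropped.
title_to_filename() {
    printf '%s' "$1" \
        | tr -c 'A-Za-z0-9._-' '_' \
        | sed -e 's/__*/_/g' -e 's/^_//' -e 's/_$//'
}
```

For example, a freshly saved `page.pdf` could be renamed with `mv page.pdf "$(title_to_filename "$title").pdf"`, where `$title` holds the page's title.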

I keep getting the itch to put all this into a DB with proper indexing or whatever, but .. actually using plain ol' shell tools is turning out to be just as productive. It's a big set of data, but plain ol' files and pipes are proving to be all I really need to get the data mined ..
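As an illustration of how far plain files get you, here is a sketch of a filename search over such an archive using only find. The `ARCHIVE_DIR` default and the function name `find_by_name` are assumptions for the example, not the setup described above.

```shell
# Hypothetical archive location; override via the environment if needed.
ARCHIVE_DIR="${ARCHIVE_DIR:-$HOME/PDFArchive}"

# Print archived PDFs whose filename contains every given keyword,
# ignoring case. With no keywords, all PDFs are listed.
find_by_name() {
    # Rewrite the argument list in place: each keyword is replaced
    # by the find test  -iname '*keyword*'.
    for kw in "$@"; do
        set -- "$@" -iname "*${kw}*"
        shift
    done
    find "$ARCHIVE_DIR" -type f -iname '*.pdf' "$@"
}
```

Something like `find_by_name rust async` would then list every PDF with both words in its name, and the result can be piped into further tools as usual.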

Where is source code for this blog?

You can get the general theme here:


My blog itself is not open source. I thought about this for a while, but I think the pressure to keep everything cleaned up, documented, and organized is not worth the benefits in this case. If you have some specific requests, feel free to get in touch and I might be able to help you.

I mean yeah, people might want you to do that, but you don't have to.

It's your life, and your code. You can post it, and keep it messy if you want to.
