Designing a Personal Knowledgebase (acuriousmix.com)
246 points by ingve on Sept 4, 2014 | 117 comments

I used to think I had this problem, too. I developed an elaborate categorizing and indexing scheme. I tried to apply it outside of my personal knowledge, creating a crawler/indexer for research and web sites in one of my areas of interest. I thought "if only we organize things better we can change the world!"

I realized over time that the collection wasn't the hard part. It was the categorizing and simplifying. The author hits on it a few times:

"It is the extraction and organization of the information that takes time... I know full automation is not feasible, since the imposition of meaning onto the raw information is something that I must do, not the computer."

In my experience, the simplifying is really where you get all the gain.

What is simplifying?

It's distilling a complex research paper into a few key data points. It's naming files well so that you can search them with Spotlight. It's learning to write more clearly, and so on.

That last one -- clear writing -- should have been obvious. Good writers manage to convey so much information in so little space. How do they do it?

The parallel to programming should be obvious. When you name things well, they become very easy to find and use.

We can expand this to a bigger point: if you really want to get smarter, you need to be learning to simplify because your brain can only hold so much at once.

It's like learning the law of gravity rather than cataloging every time an apple falls from a tree.

The following unintuitive conclusion arises: you should be looking to make your "Personal Knowledgebase" more difficult to grow because it forces you to go through the simplifying process sooner.

I stole this idea from someone on HN: anytime I find an idea or quote that I think is important, I clip it into a Word document and print it out. I keep these in a binder. I have gone back to these notes so many more times than anything that I have in any digital form. More significantly, these bits of information have influenced my life more often and more deeply.

This could be viewed as layered ontologies, moving from the largest dataset to the most abstracted. Our categories are necessarily personal, but they eventually overlap with generic ones. Presently we don't have good tools to associate personal categories with those of larger datasets, because categories span apps, platforms, orgs & license regimes.

  Classified data
  Corporate data
  Licensed data
  Open internet
  Personal notes/kb
  Human brain

Writing is the best and hardest way to think. "Drop by drop," as I've heard it said.

I love the binder practice. I got a printer about a month ago, printed a bunch of files (notes, incomplete songs, etc), deleted them, then reduced the mess down to a few pages. That was a productive day.

I submit that we have a false equivalency—nay, a false superiority of digital files over paper. After all, that's what we're all about here, right?

As you say, the constraints of paper are a feature, not a bug. I would have made this comment shorter, but I didn't have time.

EDIT Here you go, from right now: https://news.ycombinator.com/item?id=8272394

I feel so opposite to this entire thread. I have everything on my computer neatly organized, indexed, and tagged properly. For quick input of ideas, I learned cursive and I learned to write it quickly and legibly, and I use this system, which for me personally is flawless: http://bulletjournal.com/

I have a few filled-up notebooks, and anytime I need some information I look it up in the index and flip to it. I even write down URLs and descriptions of videos I liked. If I'm watching a video, I write down the timestamp, URL, and title; that's how I index media. If the contents are important, like dialogue or lyrics to a song, I'll jot down an excerpt.

It's really cool looking back on old notebooks. Can't wait to virtually go through every day of my life on paper when I get older, show my kids too.

I have a master list document on my computer that has every entry in the table of contents categorized as well, so that makes finding stuff easier too.

THANK YOU! It really looks great!

Coppola interview on his binder for Godfather: https://www.youtube.com/watch?v=awce_j2myQw

> anytime I find an idea or quote that I think is important, I clip it into a Word document and print it out. I keep these in a binder. I have gone back to these notes so many more times than anything that I have in any digital form.

I do the same, but I keep it in digital form because I just hate paper. There is a browser plug-in called 'scrapbook' that works very well for me. I've tied it to a hotkey (shift-ctrl-B) to capture whatever is highlighted on the screen without further confirmation or other interaction. I periodically dump the scrapbook and distill it.

I like it because it keeps the data locally rather than on some cloud service.

I wholeheartedly agree.

We had these things called RedBooks at MetaDesign in SF. A huge library of notes, sketches, diagrams, final work, etc. from each project. There is something about the tangibility of a physical book when you want to retrieve information that's just much more intuitive than having some complex structure on some hard disk somewhere. It's simply the wrong way to use computers.

The strength of digitizing information is the ability to retrieve contextual information from a fuzzy collection of knowledge.

Exactly, I've posted similar remarks on HN: I keep little quotes and principles in my main "notes" file so that they reach my eyes frequently. I've also recently made the move to formalize my personal project data a little (Fossil repo, add tickets to build a todo list). I still do everything digitally, but it's increasingly with more of a process, one that itself aims to be as small and transparent as possible.

Is it impossible that a technological solution exists? There has been a lot of research on automatically categorizing documents, or reducing them to a few dimensions.

Even if it works well, I'm not sure how much utility this information would provide.

I'm a bit late to the party, but I agree with your sentiment that distilling (or synthesizing) is key to a long-term personal knowledgebase. I don't use paper, but I often refer to my digital notes.

I use an open-source mindmapping program called Freeplane to record most of my knowledge and thoughts: concepts I've encountered, products I've evaluated, snippets and webpages I've enjoyed, plans I've made, pieces of my writing. I have mindmaps like "Personal", "Ideas", "Projects", "Coding", "Devops", and "Ruby on Rails". I learned about Freeplane from a friend a few years ago and it's been amazing. The keyboard shortcuts are very fluid (use them!) and the program is very fast since it's text-based.

How it looks (at top level): http://imgur.com/QRkJ9vh

I make sure I summarize the takeaway from each link or snippet I put into my mindmaps - what I want is information that my brain can directly use when I refer to it in the future. If I want more detail, I can refer to the source snippet or page.

See the reviews for Freeplane, they're almost all 5 stars: http://sourceforge.net/projects/freeplane/reviews

Quoting myself from a year ago:


Regarding software, I use Freeplane to record the majority of the thoughts and information I want to preserve, and I think it works well for me. It doesn't have the sheer freedom of pen on paper - which I still use for mapping out thoughts that I'm not quite sure about or have complex relationships - but it serves me well in 1) categorizing random scribbles and steps into meaningful subcategories, and 2) crystallizing the final, synthesized thoughts I have on a matter.

Plus I keep my map files in my Dropbox so they're synced to all my computers (although I haven't set up the software to view them on my phone). Scalable, easily reorganized and expanded, cross-connectable, very fast to input, synced and backed up, free - what's not to like about software?

I agree with you on the usefulness of mapping for testing. It has definitely helped me a lot in problem-solving. I'll throw out a few nodes that I think I need to investigate, explore each one a bit, write more items to consider, and whittle down or branch out as necessary. So after a while I resolve all the branchy, bushy sub-issues and have a reasonable game plan. Sometimes I dive into code halfway, but switch back to the map to record where I am and add new issues that come up that I need to resolve.

I also record most of the coding methods I find while working on tasks, in general form. So my mindmaps are also a web of how-to notes or a gigantic cheatsheet that details how to achieve any effect that I've previously worked through: from comparatively minor ones like the syntax for Rails migrations or opening a new window in JS, to larger ones like how to set up a Rails+postgres+nginx stack on Ubuntu, recording every action taken and issue encountered along the way. Comparisons of tools and databases and frameworks, mysterious bugs that I've run across, Sublime Text shortcuts - they all go into the maps.

I'm not sure I need to record every thought that goes through my head like the author suggests, but I think there's a lot to be said for keeping a comprehensive, organized knowledgebase.


edit: I should also mention that I use Anki for language learning. However, I don't find it necessary to use it to memorize other types of knowledge, because I don't actually need to memorize much knowledge - I only need to categorize and store it for easy retrieval.

For language learning, Anki also becomes less useful as you get better in a language - at a point, extensive reading becomes more important, and it acts as a "natural SRS". (This conclusion is from discussions on Chinese-Forums.com.)

I have to add docear[1], which is based on Freeplane and can also sync highlights from PDFs.

[1] http://www.docear.org

Simplification helps only in the way you start looking at things later. But then when you communicate with another person, they still need all the data to reach the same level of simplification you achieved. Being able to organize the way your brain looks at it at some point in time, and later reconfiguring as things evolve, is what may be needed.

I'd very much appreciate a system like the one described as well. And I've got some further ideas. When tagging, suggested tags should be very smart - like Amazon/Netflix recommendation system smart. It should be backed with a Bayesian categorizer or similar. This sort of categorization engine could also be used to recommend hyperlink targets between sources when a term is selected.
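To make the "Bayesian categorizer" idea concrete, here is a minimal sketch of a multinomial naive Bayes tag suggester in Python. It is only an illustration under simple assumptions (one tag per training note, plain word counts, Laplace smoothing); the class and function names are hypothetical, and a recommendation-engine-quality system would need far more than this.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TagSuggester:
    """Hypothetical naive Bayes tag suggester (illustration only)."""

    def __init__(self):
        self.tag_counts = Counter()              # notes seen per tag
        self.word_counts = defaultdict(Counter)  # word frequencies per tag
        self.vocabulary = set()                  # every word seen in training

    def train(self, text, tag):
        """Record one already-tagged note."""
        words = tokenize(text)
        self.tag_counts[tag] += 1
        self.word_counts[tag].update(words)
        self.vocabulary.update(words)

    def suggest(self, text, top_n=3):
        """Return up to top_n known tags, most probable first."""
        words = tokenize(text)
        total_notes = sum(self.tag_counts.values())
        scores = {}
        for tag, count in self.tag_counts.items():
            # Log prior plus Laplace-smoothed log likelihood of each word.
            score = math.log(count / total_notes)
            denom = sum(self.word_counts[tag].values()) + len(self.vocabulary)
            for word in words:
                score += math.log((self.word_counts[tag][word] + 1) / denom)
            scores[tag] = score
        return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Training it on notes you have already tagged and calling `suggest()` on a new extract gives a ranked list to show in the tagging UI; the same scores could rank candidate link targets.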

It definitely needs to integrate with the web browser. Highlighting a web page extract and saving it should both save the extract to the knowledgebase and also mirror the entire page in the KB so that later you can hit a 'context' button and be given the original source it came from. Trust as little as possible that the content will remain available and unchanged.

I'd also like there to be a 'bulk' area where things like entire books kept as future reference material could be saved and still be reachable through the search functionality.

An automatic chronological index would also be helpful. If I save a page about fractal image compression and from there move to wavelet compression, being able to see the history of what order the topics were visited in would be beneficial.

Integration with learning assistance systems would be nice as well, things like the various apps that use intelligently timed repetition of information in order to help with memory.
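As a rough illustration of what "intelligently timed repetition" means, here is a deliberately simplified scheduler in Python: each successful review doubles the gap before the next one, and a miss resets it to one day. The `Card` structure and function names are hypothetical, and real spaced-repetition apps use more elaborate scheduling rules than this doubling scheme.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Card:
    """One fact to remember (hypothetical structure for this sketch)."""
    prompt: str
    interval_days: int
    due: date


def review(card, remembered, today):
    """Return the rescheduled card: double the gap on success, reset on a miss."""
    interval = card.interval_days * 2 if remembered else 1
    return Card(card.prompt, interval, today + timedelta(days=interval))


def due_cards(cards, today):
    """Cards whose due date has arrived, oldest first."""
    return sorted((c for c in cards if c.due <= today), key=lambda c: c.due)
```

A knowledgebase could create a card from any saved extract and call `due_cards()` once a day to decide what to resurface.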

I do think that building from a wiki could be a possible start. My biggest concern with using a wiki or other web based system is the terrible layout options forced upon us by HTML and CSS. They are designed such that arbitrary layout is quite difficult, and arbitrary layout would be fairly necessary for this.

This is a great text; saving to re-read another time.

Personally, I use Dropbox + org-mode based personal wiki. I store each important topic in its own file, and the whole thing gets synced between all my machines. It basically looks like this:


I heavily use links between documents, to outside sources, and back-up anything important by downloading PDFs/pictures and saving plaintext in an org file.

My current pain points:

- internal linking - I try to avoid linking from one .org file to another, because I expect that if I move a piece of text to another file (or even to another subheading, for heading-relative links), I won't be able to find all the links that broke in the other files. So I created a "dispatcher" file that maps links to destinations, so when I move something, I only have one place to check. Unfortunately, this requires two clicks to reach the destination, and is basically annoying. So I'm looking for another solution for broken incoming links.

- mobile viewing - I don't use the org-agenda based flow that is apparently required for org-mode on mobile, and Android is too stupid to let Dropbox open my org files (without workarounds that require a lot of tapping on the screen), because there's nothing registered to open files with the .org extension (sigh). I have yet to find a good solution for this (even just a simple text editor that could register itself as an app for .org files would be immensely helpful).

- web viewing - sometimes I'd like to browse my personal wiki from someone else's machine, or maybe even link to a particular page; for that I'd love to have a web-viewable version of my wiki. I want it to work seamlessly, i.e. without having to manually regenerate or commit everything. This could be doable with Dropbox and a bit of org-export scripting, but I have yet to get around to doing it.

- web writing - sometimes I'd like to dump some notes into my wiki from someone else's machine; I haven't figured out how to make it work.
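For the internal-linking pain point above, one possible alternative to a dispatcher file is a script that scans the wiki and reports dead links after a move. The sketch below is an assumption-laden illustration: it only understands `[[file:target]]` and `[[file:target][description]]` links with paths relative to the linking file, and it only checks that the target file exists, not that a `::*Heading` inside it still does. The function name is hypothetical.

```python
import re
from pathlib import Path

# Captures the target of [[file:target]] and [[file:target][description]] links.
FILE_LINK = re.compile(r"\[\[file:([^\]\[]+)\]")


def broken_file_links(wiki_dir):
    """Return (source file name, link target) pairs whose target file is missing."""
    broken = []
    for org_file in sorted(Path(wiki_dir).rglob("*.org")):
        text = org_file.read_text(encoding="utf-8")
        for target in FILE_LINK.findall(text):
            # Drop an optional "::search" suffix such as "::*Heading".
            path_part = target.split("::", 1)[0]
            # Assumes link paths are relative to the file that contains them.
            if not (org_file.parent / path_part).exists():
                broken.append((org_file.name, target))
    return broken
```

Running this after reorganizing files gives one list of links to repair, at the cost of not catching moved subheadings.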

That's a very extensive list of org files you have there :)

I bypass org-agenda for mobile viewing by scheduling a job to call org-html-export-to-html on each .org file. Dropbox mobile works well for viewing the exported HTML. There is zero hassle once the script is running.

Here's the rake script I use: https://gist.github.com/shoover/d75a58074be9894bfc54. The core emacs batch command for html export is simple enough. The rake-isms are there to find all the .org files in the given directories and check timestamps. There is optional elisp to deal with loading my latest org-mode and not the built-in one. The INDEX business was added to deal with MobileOrg but I no longer use that. Read-only HTML in Dropbox is good enough for me.
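The linked script is a rake file; the same idea (find the .org files, compare timestamps, run Emacs in batch mode) might look roughly like this in Python. This is a sketch and not the contents of the gist: the function names are hypothetical, and the `emacs --batch FILE -f org-html-export-to-html` invocation assumes an Emacs whose Org version provides that command and writes the .html file next to the source.

```python
import subprocess
from pathlib import Path


def stale_org_files(directories):
    """Yield .org files whose .html export is missing or older than the source."""
    for directory in directories:
        for org_file in sorted(Path(directory).rglob("*.org")):
            html_file = org_file.with_suffix(".html")
            if (not html_file.exists()
                    or html_file.stat().st_mtime < org_file.stat().st_mtime):
                yield org_file


def export_command(org_file, emacs="emacs"):
    """Build the batch command that exports one file to HTML (assumed invocation)."""
    return [emacs, "--batch", str(org_file), "-f", "org-html-export-to-html"]


def export_all(directories):
    """Re-export every stale file; intended to be run from a scheduled job."""
    for org_file in stale_org_files(directories):
        subprocess.run(export_command(org_file), check=True)
```

Calling `export_all()` with the wiki directory from cron or a similar scheduler would keep the read-only HTML in Dropbox current without touching files that have not changed.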

Agreed that internal linking is a pain point. I just use `C-c l` and suffer the dead links.

Agreed regarding remote editing. Gmail drafts "transport" are what I end up using. It's not terrible but could be better.

> That's a very extensive list of org files you have there :)

Oh well... ;). Though it might seem like it, it doesn't give me any problems. I usually remember more or less how I named a file with a particular thought I need, and if I can't, there's always grep to the rescue :).

Thanks for the script, I'll try to adapt it to suit my needs :).

> Gmail drafts "transport" are what I end up using. It's not terrible but could be better.

Yeah, I usually send e-mails to myself.

I'm thinking, maybe I'll eventually get around to spinning up a Dropbox daemon on a VPS and making it sync only the wiki/ directory, and then put up some simple "save snippet" CGI script to have a browser capture endpoint.

> Oh well... ;). Though it might seem like it, it doesn't give me any problems. I usually remember more or less how I named a file with a particular thought I need, and if I can't, there's always grep to the rescue :).

Makes sense. ido-mode helps. Despite the fact that this is an anti-agenda thread, I should report that I also just discovered how easy it is to search across agenda files by word or regexp. It's like org aware grep. You would need some elisp to load all your .org files into org-agenda-files.

I've always used a handful of very large files, which probably doesn't have any particular advantage other than less buffer switching and facilitating isearch over grep. However, outline levels quickly get deep in large files.

> I'm thinking, maybe I'll eventually get around to spinning up a Dropbox daemon on a VPS and making it sync only the wiki/ directory, and then put up some simple "save snippet" CGI script to have a browser capture endpoint.

Sounds good.

> ido-mode helps

Especially with flx-ido (fuzzy matching for names) and ido-vertical-mode; now it's just fun to use :).

> this is an anti-agenda thread

I used agenda for doing GTD for over a year some time ago; it was awesome for aggregating calendars and TODO lists from multiple files, but I never got around to learning other interesting features. Maybe it's time to try again :).

> However, outline levels quickly get deep in large files.

Yup. I still have a lot of deeply-nested long .org files - those are usually project-specific, and contain ideas, design notes, TODO lists, etc. But for personal wiki, it feels more natural for me to keep every topic in a separate file (mind you, some of them are quite long; my notes from "Pragmatic Thinking and Learning" are 750+ lines of 4-level deep nesting, and I'm nowhere near finished :)). I fear even thinking about coalescing them into a single file; when your file hits 10k lines (like one of my project notebooks at work), you find out you need to keep it fully expanded, because navigating with top headers folded gets kind of slow.

Since my edit window is long gone, I just want to add that thanks to rokhayakebe ( https://news.ycombinator.com/item?id=8273862 ) I just figured out a way to use GMail + IFTTT to send notes straight to my personal wiki :).

Nice one. This line of thinking could open up some possibilities.

For your web viewing needs, perhaps something like org-page[0] might help, though it requires the use of git.

[0]: https://github.com/kelvinh/org-page

After years of being a paying Evernote customer, I stopped using it several months ago. I am actively looking for alternatives. I found that I was spending too much time curating information in Evernote for the value I got from looking up material for reference.

One thing that I have been experimenting with is keeping a few top level subject directories in Dropbox and making notes in plain text markdown files. If I use a .txt file extension, then available iOS and Android editors that work with Dropbox give me coverage across tablet, phone, and laptop. Spotlight search on my laptop helps me find notes but finding the right information on mobile devices is a problem.

I also write notes in Google Drive and store potentially useful PDFs and purchased books in GD, and rely on GD search to find things.

I have tried org-mode but I no longer "live" in Emacs so org-mode is not ready at hand as it would have been ten or twenty years ago when I used Emacs for everything.

We have had very similar experiences with Evernote, then. I also used Google Notebook before the (in my opinion very efficient) product was discontinued. Google Docs on the other hand forces you to open entire documents. This doesn't work well when the most frequent use case is a quick-and-dirty append.

This may sound like a "solve everything with git" type comment, but honestly, I have found that keeping a set of markdown files in a git repository (frequently pushed to github) works well for me. I can have a quick script that appends a default journal file for thoughts I don't want to spend time organizing immediately. For everything else, I open up the relevant .md file and add a bullet point.
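A "quick append" script of the kind described can be very small. The following Python sketch is one possible shape, not the commenter's actual script: the function name, the bullet format, and the timestamp layout are all assumptions. Committing and pushing would be a separate step.

```python
from datetime import datetime
from pathlib import Path


def append_journal(text, journal_path, now=None):
    """Append one timestamped bullet to the journal file, creating it if needed."""
    now = now or datetime.now()
    path = Path(journal_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    line = f"- {now:%Y-%m-%d %H:%M} {text.strip()}\n"
    with path.open("a", encoding="utf-8") as journal:
        journal.write(line)
    return line
```

Because the file is only ever opened in append mode, a thought can be captured without opening or reorganizing the rest of the notes, which is the quick-and-dirty use case mentioned above.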

So far it's working better than anything I've experimented with over the last ten years.

I do the same; only with vimwiki (so today's diary is just two key strokes away). https://github.com/vimwiki/vimwiki

I recently found a blogpost describing a way to add 'commit hooks' to vimwiki so that my repository is automatically synchronised when I make edits. You can still fall back on command line git for fixing merges / anything else.


Have you looked at http://www.notebooksapp.com/? It stores data in plain text, supports markdown, will sync with Dropbox & WebDav, and does full text search & tagging. Available for OSX, iOS & Windows.

I've been using Arena for the last year or two: http://are.na.

It's conceptually based on blocks and channels, where a block is text, pdf, video, audio, image, etc. and a channel is an ordered set of blocks and other channels. Channels can be public, private, or private with collaborators. It's web-based and has a good API(1), which I use as a lightweight CMS for my own website. They plan(2) to eventually charge for a pro version, but for now they are accepting rewarded donations.

(1) http://dev.are.na/documentation/channels (2) http://future.are.na/

Are.na is blocked on my work network as malicious software... Are you sure it is clean?

I have also struggled for a long time to get what I want out of existing personal knowledgebase solutions, despite loving many individual elements of the products that are already out there like Evernote, Workflowy and Dropbox/.txt files etc. This pain was particularly acute in college while drinking from the firehose of information, and not having a tight solution, such that years on so much value was simply lost and forgotten. I have been building the product I wished I had - Naked Knowledge. It would be great to get interested parties' feedback. You can follow us and sign up for the beta here if you like: https://nakedknowledge.com

Agree with you. I have over 5000 notes in Evernote, but I only use its shortcuts to write to a handful of them (about 10 notes). Now I am trying Weavi, but uploading files is not as convenient as with Dropbox. Fortunately, I find I have fewer and fewer files as I use more and more mobile devices. So what is needed is pure content management, not file management. If Weavi supported private records and an extension like Evernote Clipper, it would be my best choice.

If what you have is mainly text, I highly recommend Workflowy. I have a top-level header called "Brain Dump" where I put any thoughts I have during the day, then I move them into subcategories when I have time.

The combination of Workflowy (with tagging), Pinboard, and Dropbox feels "right" to me.

Combinations of products are often hard and heavy to use. I would rather choose a product built on a simple concept. You may try Weavi (https://weavi.com/@/2739). I use Weavi+Workflowy to organize knowledge.

God this is hard. I've been trying to get at a solution to this exact problem for a very long time as well. It needs to be there when I'm walking down the street, sitting in a busy meeting, or at home in bed. A hybrid of mind-mapping, notes, and a document database. I'd also like to be able to publish and organize documents for others to view.

I wish I had a solution.

Org-mode is incredible. I use it to capture everything from vocabulary (which is automatically put into org-drill for spaced repetition), to my ideas, todos, shopping lists, EVERYTHING. Capture can happen on phone or laptop and syncs seamlessly.

Agenda for captured todos syncs with gcal. Publishing / printing to pdf / web / latex no problem. Index or search? No problem. Inline images and video, figured out. Snippet capture from web browser? Done.

Plus the incredible power to modify all this using emacs lisp. It's not python but it's powerful and every other emacs mode uses it.

There are zillions of modes and tweaks to the emacs environment that add to this.

Org-mode and emacs are the single most powerful organizational tools I've ever encountered.

The workflow proposed in the first link looks quite interesting: Emacs + org-mode + python in reproducible research; SciPy 2013


And here are some more reviews, opinions, advice and experiences I just stumbled upon:



What transport is used for mobile sync? Is a browser plugin used for snippet capture?

I'm not sure if you're being sarcastic to make a point or actually ignorant :)

org-mode, while amazing, is emacs-only. There is no browser interface, no cloud (unless you store the files on Dropbox or GDrive or whatever) and no mobile app.

And that's its greatest failing in today's mobile/cloud world. Gee, it's almost like there's an opportunity here... ;)

Semi-ignorant :) I knew of org-mode as emacs only, but the grandparent post mentioned mobile & browser snippets. Just found two apps which edit & sync org-mode files, and templates for bookmarks.

Android app uses ssh, https://play.google.com/store/apps/details?id=com.matburt.mo...

iOS app uses dropbox, https://itunes.apple.com/us/app/mobileorg/id634225528?mt=8

Blog post on replacing other svcs/apps with org-mode, http://www.swaroopch.com/2013/01/16/orgmode/

Web bookmark sync on mobile, http://karl-voit.at/2014/08/10/bookmarks-with-orgmode/

A browser interface for reading can easily be done by exporting to HTML. There is an option to output in a texinfo-like interface, with prev, next, and up navigation. LaTeX equations, charts, images and graphs (if you want to learn a little R) can all be integrated easily. Bibliographies can be handled using BibTeX.

There is a learning curve to be sure, but the payoff is incredible since you can then directly build research papers and books that output to LaTeX/PDF.

And a browser interface for recording and linking web pages in your Org files can be found here (although there are other ways to do it): http://orgmode.org/worg/org-contrib/org-annotation-helper.ht...

Also, because Org files are at heart just simply formatted plain text, it would be easy enough to write Org entries in any simple text editor and add the file to your main repository of Org files at a later time, possibly by using the 'refile' functionality to have each entry in the "scratch" file routed to the correct spot in the permanent files.

I think Org-mode is probably one of the best knowledgebase solutions, but only for a limited range of users, who must be pretty savvy power users and also willing to devote a bit of time to getting it set up to their liking so they can then leverage it to their benefit. It's pretty hard to wrap your head around all the different things that Org-mode can do, and then a bit harder to create a workflow that's easy and fast to use. But many have done it, some for the knowledgebase scenario and some for the wide variety of other things the very flexible Org-mode can do.

Not entirely true about no mobile app: there is always https://github.com/MobileOrg/mobileorg, but it hasn't been updated since February and there doesn't seem to be anyone actively involved with the project anymore.

I have one, but it's not public -- at least, not yet. One of the hardest things to get right here is search. It's easy to drop in a full-text search solution, quite another thing to design search that works well.

My system is a tiny app that automatically grabs pretty much everything I do on the Internet and indexes it. Github stars, HN posts, Facebook likes, bookmarks, you name it.

I can optionally tag or add my own sticky-notes to anything, and there's some other neat tricks that make fuzzy searching a lot easier. Human memories are very story-based, and so if I only remember that I looked at something important about ducks when I was at a Meetup, I can actually search for that.

I'm considering productizing this -- email's in the profile if you'd like to chat. :)

Sounds like a neat project. I started putting something like that together but it was more text focused. I like the idea of automatically gathering data from other sources.

On a side note, I use mizuno for a project[1] and I've never had any problems with it. Thanks!

[1]: https://github.com/docverter/docverter

I designed historian https://github.com/smcnally/historian to extend browser history to include annotation and tags. Browser history is the majority of what you do online; Chrome search is pretty decent.

FWIW - I’ve always thought these guys were close to something (but not there yet) - https://gingkoapp.com/p/future-of-text - the idea of a quick dive-down/up through knowledge with potentially tagging/linking thrown in would seem pretty stellar from a knowledge management system.

I don't want to say this is a rip-off, but it does bear a remarkable resemblance to Ted Nelson's pre-Berners-Lee hypertext Project Xanadu (http://en.wikipedia.org/wiki/Project_Xanadu). Here's a recent screenshot: http://www.open-hypervideo.org/img/xanadu.jpg.

That man has been trying to get traction on it since 1960. He's quite remarkable.

Actually, reading their blurb is like reading his work from then. If you know the story of Nelson and Xanadu, the lack of credit to him is a little unforgivable.

A bit like the visions of Nelson & Engelbart, http://thoughtvectors.net/

Inkling is working on related challenges with ebook authoring, https://www.inkling.com/blog/2014/06/problem-of-structured-a... .

InfinitePDF on iOS allows non-linear navigation paths to be overlaid on an existing PDF, http://www.148apps.com/reviews/infinite-pdf-review/

The challenge for all of these solutions is migration/import/rewrite of existing content.

Thanks for pointing out Gingko - it looks like it could have the right mix of features that I miss in Workflowy, i.e. going beyond lists/outlines and having richer content in leaf nodes. Excited to try it out!

For daily work on outlines, I use Workflowy, which is easy to edit; and for systematizing knowledge or other material, I use Weavi, which is a decent product.

Thanks for the pointer. Tried it out - it's similar to Gingko but seems much less polished, and I couldn't find the keyboard shortcuts! I guess the advantage is that it's free. I really hope this kind of text manipulation will get developed further.

Agreed that this form of data capture seems powerful.

Where does evernote fall short for you?

OP said he wanted an open source solution. Other than that, I can find no faults in using Evernote for this purpose. I use it for my personal stuff.

Where Evernote falls down, and this is by design according to CEO Phil Libin, is in sharing across many people. Libin says Evernote is not social, it's anti-social. As such, it doesn't work when you need to collaborate on notes. But see Evernote Business which seems to contradict this assertion.

OP was talking about a personal solution so the social aspect doesn't apply.

Working within the editor. As a save-and-reference tool Evernote is great, but for working within the document it's awful. Just try to add/remove columns in a table or rearrange photos within a note.

How about the new OneNote for OSX?

Doesn't work for confidential data.

What do you use today?

I use a hybrid of Omnifocus, mediawiki, and pen and paper.

Semi-automated organization is difficult, not because we don't have the algorithms to classify things into categories, but because everyone has a different idea about 1) which things belong where and 2) how much of it should be automated.

This is why everybody else's organization scheme feels alien to us, and why our own organization scheme, no matter how much sense we think it makes, will inevitably feel alien to everybody else.

If you build a tool to automate part of the process, and then release it to the whole wild world, people will complain that it files things in all the wrong places. If someone else builds a similar tool, and you use it, you will complain that it files things in all the wrong places. Because each of us not only has a different ontology, but also has different ideas about how we should approach ontology in the first place.

You could make your tool highly flexible so that it can accommodate all sorts of different ontologies, but then everyone will complain that there are too many options. It really sucks.

Oh well, I guess that's why we've ended up with a gazillion incompatible tools, each of which is probably good enough for whoever built it, but none of which really suits anybody else's needs.

Gene Smith's book has a good overview of tagging in early "social bookmarking" projects, http://epublishersweekly.blogspot.com/2008/09/tagging-people...

Random is an iOS app/svc attempting to infer "associative ontologies" from navigation patterns, http://www.datascienceweekly.org/data-scientist-interviews/r...

"The app also allows you to connect things freely thus letting you express both your rational and irrational self. There are no universal categories or connections between different things - rather it's about an individual's own "ontology" that's created through usage. The "associative ontology" evolves continuously both through the actions of the individual and other people using the system. "

I don't think 'mistaken' associations need to be considered a bad thing, in fact I think they are a positive. Those things might stimulate thoughts and ideas not previously considered.

OneNote isn't a bad choice, but it doesn't hit all the marks here. It does shine in a few places for me:

- Search: Search is very very fast and also does OCR of text in images

- Cross-platform: Works on Windows desktop and on Android/iOS phones

- Online: Can also share OneNote pages as a link to those people who don't have OneNote

I'd be surprised if the format isn't proprietary, and there's no CLI. But if those things aren't important to you, then basic usage of the application is awesome.

(My minor nit with it is that navigating Sections can be difficult, and "finding where you are" in a large notebook is also harder than it should be).

I started with a very elaborate mediawiki solution, but over time I've moved to org-mode.

It has two distinct advantages: the filesystem is the database (plus a plain-text one!), and everything is scriptable using Emacs Lisp.

What I would like is an open source browser that autorecords all my non incognito browsing history, including videos, etc within a transparent database / file format.

Even with Google, I often have problems finding interesting things that I've read in the past.

How it would work: For each page, any time the DOM changes, that is saved as a revision of the document. This way, I can get back to any state of a website without javascript complexity by browsing the revision list.
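
A minimal sketch of that revision idea, assuming the browser hands over a serialized DOM string every time the page changes. The `RevisionStore` class and its method names are hypothetical, and it only keeps things in memory; the point is just that hashing each snapshot lets you skip states that did not actually change.

```python
import hashlib
import time
from collections import defaultdict


class RevisionStore:
    """Hypothetical in-memory store: one revision list per URL."""

    def __init__(self):
        # url -> list of (timestamp, sha256 hex digest, html) tuples
        self._revisions = defaultdict(list)

    def record(self, url, html, timestamp=None):
        """Save a snapshot; return True only if it differs from the latest one."""
        digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
        history = self._revisions[url]
        if history and history[-1][1] == digest:
            return False  # DOM unchanged since the last stored revision
        when = timestamp if timestamp is not None else time.time()
        history.append((when, digest, html))
        return True

    def revisions(self, url):
        """Return all stored snapshots for a URL, oldest first."""
        return [html for _, _, html in self._revisions.get(url, [])]
```

A real version would probably persist to disk and store diffs rather than full copies, but the revision list per page is the part that makes "get back to any state" possible.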

Bookmarks / Favorites would still be useful for saving notable stuff, and ideally there would be powerful data management facilities for search (e.g. Solr), deletion and merging of datasets. A future project, perhaps...

I've wanted a similar thing for a very long time. In effect, an infinite web browser that remembers everything. Rather than 'surfing' the web it would be more like mapping it out and exploring it, with more than just your memory to return to. I like the idea of a vertical tab bar on the side, and I think it could work if the user interface were done right: one where the tabs of things which interest you never actually 'close' (closing a tab would be equivalent to dismissing something as irrelevant or uninteresting) but the list of tabs just continues growing, scrolling off the top of the screen but recallable (and searchable) at a moment's notice. It would be a major rethinking of how a browser works and what it does, and it would possibly get confusing with things like web apps (shoehorning applications into a document presentation system bites us again). I imagine lots of people would regard a proposal like this, keeping everything you ever browse or at least a very significant subset of it, as slightly insane... but that just makes it more alluring to me.

Would you be willing to run a dedicated home server+storage for this purpose, e.g. Dell T20 or Lenovo TS140 ($300) + 4-6 disks in RAID1 config? It could run ESXi / KVM / Xen, e.g. FreeNAS in one VM, knowledge base in another.

For UI prototyping, some options are Qt/Webkit, a browser extension, or http://breach.cc/

Yes, this.

When I say I regularly accumulate close to 1000 open tabs on Firefox (I use the Tree Style Tab plugin to put them on a column to the left) people think I'm crazy, but I just think it's a better way to recall what I was doing and go back and position myself in time with regards to what I was reading and what I will read next.

In my mind, I should only close tabs I want nothing to do with, and I keep open the stuff I'd like to revisit at some point. The fact that Firefox keeps open pages in memory is incidental and I don't let it affect my flow. Every day I crash my Firefox and reopen it with the 1000 tabs again (that's the only way to make Firefox understand I want the tabs loaded, not the content) to keep on browsing as keeping it open for long stresses the GC (I assume) making it sluggish to use.

So yes, I think you and I see "surfing the web" in a different way. I've tried organizing myself with Bookmarks (but then you get lost in folders and categorizing), Pinboard (but I never go there to search for things), Evernote (I never search there either) and so on. Those solutions are not natural to me. If something is not in front of my face, I'll forget it. That's why I want to have a running collection of all the tabs/pages/notes I care about and it should be there at all times, organizable like the plugin I mentioned, but without taking RAM.

Here's an interesting article by a philosopher I enjoy. He argues you and I are "horizontal organizers": http://www.structuredprocrastination.com/light/organization_...

I don't know about all the other items on the wishlist, but you might be able to find out what you were doing/watching and when: https://github.com/gurgeh/selfspy

Wasn't WebMynd (now trigger.io) trying to do this? A Tivo for the web.



I would just like to dump tons of stuff into it. Scraps of thought, sketches, images, archived web pages, pdfs etc... then have it organise itself, finding links and connections between things, perhaps with some way of shaping and guiding it. I just don't have the patience to tag things or organise things into a hierarchy. I inevitably end up with a huge blob of 'uncategorised' stuff.

I wouldn't mind if the algorithm for making connections made some mistakes - the important thing would be helping you connect things together in novel and unexpected ways, based on your curated collection of stuff. Serendipity is super important.
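
To make the "let it find connections" wish concrete, here is a rough sketch that assumes notes are plain text and uses Jaccard similarity over word sets as the linking signal. The function names and the 0.2 threshold are made up for illustration; a real tool would likely want something smarter, and the occasional wrong link is expected.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased word set, ignoring words of three letters or fewer."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def suggest_links(notes, threshold=0.2):
    """Return (title_a, title_b, score) for note pairs with overlapping vocabulary."""
    bags = {title: tokens(body) for title, body in notes.items()}
    links = []
    for a, b in combinations(sorted(bags), 2):
        union = bags[a] | bags[b]
        if not union:
            continue  # both notes are empty after filtering
        score = len(bags[a] & bags[b]) / len(union)  # Jaccard similarity
        if score >= threshold:
            links.append((a, b, round(score, 2)))
    # Strongest suggested connections first
    return sorted(links, key=lambda link: -link[2])
```

Lowering the threshold trades precision for serendipity, which seems to be the direction this comment is asking for.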

I'm surprised he doesn't talk about Zotero. https://www.zotero.org/ Especially for a grad student, it seems like a better option than any of the other things he's tried/discussed.

While you're at it, please add a spaced repetition learning feature à la Anki. Our brains are the best PKB, we just need the right tools to efficiently upload data to it (anki)

Org-mode has all of this. Non-proprietary? Easy capture? Indexing? Search? Agenda? Mobile-capture and sync? Org-drill for spaced repetition? Integration with snippet capture in browser?

Every single one checked.

I used MobileOrg for a while but couldn't figure out a seamless workflow to eliminate manually pushing and pulling on the desktop. Automatic HTML export to dropbox works fine for mobile viewing, though.

How do you do the indexing part?

Is it possible to record audio with org-mode (and then categorize/annotate)? And do quick drawings with a stylus on the screen?

> Is it possible to record audio with org-mode (and then categorize/annotate)?

I haven't seen any plugin to do audio recording from Emacs, but it should be easy (if you're willing to dig into elisp a bit; it's not hard) to invoke any kind of CLI tool for that, and then a/ move the output recording to a proper place, and b/ capture the link for org-mode.
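
Steps a/ and b/ can be sketched outside Emacs too. The snippet below is Python rather than elisp, and `file_recording`, the directory layout and the heading format are all assumptions for illustration; the only Org-specific part is the `[[file:path][description]]` link syntax.

```python
import shutil
from pathlib import Path


def file_recording(recording, attachments_dir, notes_file, description):
    """Move a finished recording and append an Org link to it in notes_file."""
    attachments = Path(attachments_dir)
    attachments.mkdir(parents=True, exist_ok=True)
    target = attachments / Path(recording).name
    # a/ move the recording to its permanent place
    shutil.move(str(recording), str(target))
    # b/ capture an Org link under a new top-level heading
    link = "[[file:%s][%s]]" % (target.as_posix(), description)
    with open(notes_file, "a", encoding="utf-8") as notes:
        notes.write("* %s\n  %s\n" % (description, link))
    return link
```

The recording itself would still come from whatever CLI tool you prefer; this only covers what happens once the file exists.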

> And do quick drawings with a stylus on the screen?

There are picture-mode and artist-mode in Emacs for that, but they draw in text mode, and I haven't tested either of them with a stylus and a touchscreen.

Images and video?

Sure, Emacs can display images inline if you want it to, or just open it in another buffer ("tab") or with an external application. You just store the image in a separate file and make a link in your org-mode text file.

As for videos, I don't recall Emacs being able to play them directly, but you can easily invoke an external player.

images are important, that's how I learn countries (point on map) and birds

Agree on the brains comment for lowest latency cache, http://artofmemory.com/ - are you using Anki for language learning? Desktop or mobile? Does each user create their own cards or are these available as pre-existing content (free or paid)?

I'm not learning a new language atm but I've heard it works great for that.

There are shared decks [1] and you can create your own very easily. Whenever I stumble over a new fact throughout the day, I (try to) add it to my deck. To test the system, I learned all European countries with capitals and all US states in 4 hours total (spread over weeks, just 1-2 minutes spent every other day). It's really easy and fun.

I use both the desktop and the mobile apps, they keep in sync.

[1] https://ankiweb.net/shared/decks/

Unbelievable list of decks & add-ons. Why doesn't someone give Anki $$$$$? This is how humans will compete & interoperate with robots/algos. The format should be at least as important as ePub.

I don't see any licensing or copyright terms in the list of shared decks, are those usually specified within each deck? A web search turned up this discussion, http://lesswrong.com/lw/k7a/proposal_community_curated_anki_...

The deck sharing web app could use some serious polish (if not a rewrite as a modern web app). Something like fork/pull requests and diff would be awesome. That would encourage people to contribute back and fix bugs in the decks.

I don't know about licensing/copyright of the decks.

If there was a cross-platform, open database that could be used by commercial or open-source apps, then we could have a competitive market on "organizing UX" without worrying that a policy change at our favorite indie developer or global conglomerate would orphan the data. E.g. WebDAV, CalDAV, OPML, a dublin core metadata file which accompanies artifact files (photo, pdf, warc), sqlite db, camlistore, git-annex, ..? Currently using a combination of these tools for cloud-neutral kb:

  Mind mapping:   iThoughts HD on iOS, export to PDF
  Idea capture:  OmniFocus on iOS, export to XML
  Contextual & tag search with preview snippets:  xapian + recoll on Linux
  Web clipping:  Print To PDF (wkhtmltopdf) on Firefox or Epub
  File tagging:  Calibre on Windows, plus SumatraPDF for viewing epub, pdf, djvu
  OCR:  Abbyy on Windows
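
As a rough illustration of the "Dublin Core metadata file which accompanies artifact files" idea, here is a sketch that writes a JSON sidecar next to an artifact. The field names come from the Dublin Core element set; the `.meta.json` suffix, the JSON encoding and the `write_sidecar` helper are assumptions, not an existing standard or tool.

```python
import json
from pathlib import Path


def write_sidecar(artifact_path, title, creator, date, subjects):
    """Write <artifact>.meta.json next to the artifact and return its path."""
    artifact = Path(artifact_path)
    record = {
        "dc:title": title,
        "dc:creator": creator,
        "dc:date": date,  # ISO 8601 date string, e.g. "2014-09-04"
        "dc:subject": sorted(subjects),
        "dc:identifier": artifact.name,
    }
    # Hypothetical naming convention: paper.pdf -> paper.pdf.meta.json
    sidecar = artifact.with_name(artifact.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Because the metadata lives in a plain file beside the PDF or WARC, any indexer or sync tool can pick it up without knowing about the app that wrote it.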

Nepomuk was/is kind of a solution like that, though mostly targeted at the desktop. It would store all the data in an RDF database (with layers to speak other protocols, like IMAP) using standard formats, and applications would interact with it instead of having their own storage layer.

Supposedly RDF caused poor performance and it was replaced by https://community.kde.org/Baloo which uses sqlite+xapian. The nepomuk ontologies live on in the Digital.ME research project, which seems to be a research prototype in Java for a "cross-device social semantic desktop", but there's no active dev community.


http://www.dime-project.eu/ & http://dime-project.github.io/

Edit: recent article on Baloo: http://xmodulo.com/2014/07/kde-semantic-desktop-nepomuk-balo...

Maybe the semantic desktop should be a semantic server? Run it on a dedicated PC that also serves as NAS. It would be a relatively small price to pay for a secure pkb with cross-platform clients.

As for the research part, I think one major issue is that the workflow is too personal and not (necessarily) compatible with other people.

In our research group we ended up keeping an annotated bibliography in a common repo, using BibTeX format and git. This way the kbr grows over time with distributed effort.

There is a common shared structure. You have to provide a well-written BibTeX reference and a paragraph or two of annotations. The first paragraph has a strict structure where essentially you reply to a few questions. Other paragraphs can be added freely by each person (with your initials).
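
A shared structure like this can be checked mechanically before a commit. The sketch below assumes annotations live in the standard BibTeX `annote` field, which is a guess about this group's setup, and it uses a deliberately naive regex split rather than a real BibTeX parser.

```python
import re

# Matches the start of an entry such as "@article{smith2014,"
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")


def entries_missing_annotation(bibtex_text, field="annote"):
    """Return citation keys of entries that lack the given field."""
    starts = list(ENTRY_START.finditer(bibtex_text))
    missing = []
    for i, match in enumerate(starts):
        # An entry's body runs until the next entry starts (or end of text)
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bibtex_text)
        body = bibtex_text[match.end():end]
        if not re.search(r"\b%s\s*=" % re.escape(field), body, re.IGNORECASE):
            missing.append(match.group(2))
    return missing
```

Run from a git hook, something like this would reject references that arrive without their annotation paragraph; checking the strict first-paragraph questions would need to know what those questions are.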

Anyway, this is just specific to research and a subset of the kbr, but I'd also think about ways in which what you collect can be useful to others and, vice versa, others can add to your kbr without generating a mess.

IMHO I don't think we lack knowledge capture and retrieval tools as much as we lack integration between those tools. I use Zim as a desktop wiki and todo list tool, Freemind as a mind mapper, TaskJuggler 3 as a project planning tool and Zotero as a reference capture tool - and they are all very good at what they do but they just don't work together very well. A high level of integration between tools like this would be the killer app in this space. I have hopes that Camlistore would be part of this solution.

There is a real opportunity here for Evernote and even Google. What he needs is a personalized Wikipedia, a "facebook" of a googleable archive of his selected internet content.

If facebook/myspace are the social networks for people's offline life, how about a social network for people's brain in online life.

It is much more than a personal knowledgebase per se. This may very well be a knowledge graph at Google's scale but with personalized context and relativity.

looks like tiddlywiki got a refresh from the classic version, so that it now is able to run on node.js

totally. the author needs tw5 http://tiddlywiki.com/

The Scholars extension has potential for a shared note-collection / brief summary of research papers.

TW5-Scholars direct link: http://tw5.scholars.tiddlyspot.com/

I use the Outline View in MS Word to organize my bookmarks. I use 4 levels: Heading - Subheading -- Link --- Text. (Note that each level is a separate line in MS Word Outline View.)

An example:

Education - Assessment -- http://www.nytimes.com/2014/09/07/magazine/why-flunking-exam... --- test and test often to focus the learners' minds

Business - Insights --http://thecodist.com/article/lessons_from_a_lifetime_of_bein... --- find dumb people with lots of money

I find this to be a simple, easy system to organize what I find on the internet.

Hi all,

I'm Alex, the author of the blog post. I'm so honored that my post made it here to HN. :) Obviously the itch I'm trying to scratch is one that a lot of people share.

The suggestions here are very helpful. I think all of us have learned to cobble together a suite of tools to do what we want, but it seems that few are totally satisfied. I don't think what I've proposed, or what others have proposed, is insurmountable - we do far more complex things with computers. But given the idiosyncrasies of our individual workflows, maybe it's unrealistic to think we'll find a solution that everyone likes.

I'm going to compile a lot of the suggestions from this thread and around the web in a follow up post on the blog.

I want to quickly address some of the comments on Anki. I've been using Anki for a few years now, mostly to handle the massive amount of knowledge in medical school. I've written about it here. Anki has caught on in med school significantly.


There is so much interest in Anki for knowledge management (and retention) that I'm working on an eBook for med students about it - http://www.learningmedicinebook.com/

I used to think that everything I came across should live in my brain. And so, I went a little crazy with Anki in the beginning, capturing EVERYTHING I read. That quickly wore me down, and I had to become more discriminating. A lot of novice Anki users fall into a similar trap. I realized that not everything is worth occupying my headspace. I've been trying to come up with criteria for what should and shouldn't live in my head, but regardless, I've come to the point where I want to offload most of the heavy lifting to my PKB. The really high-yield bits from my PKB, however, will become Anki cards, so that I remember the most important things and they serve as 'crumbs' back to the details in my PKB. I'm going to flesh this out further in future posts. I love the enthusiasm here for spaced repetition. It's very powerful.

Last thing, regarding the comments on a better collaborative environment for Anki decks. I completely agree! I love Anki as much as the next guy, but there are some serious deficiencies, collaboration being one of them. I still think Anki is the best in class right now, but there is a new tool on the horizon that I'm excited about, and I think it will overtake Anki eventually.

It's called Memorang: https://www.memorangapp.com/. I'm enthusiastic about this app, and while it doesn't have everything I need yet (I think the scheduling could be better), the collaborative environment is excellent. It's worth checking it out. Perhaps one day I can integrate with my PKB.

Anyway, thanks again for checking out my post. The discussion this post has spurred is very fruitful.

Thanks for the post Alex. It's very informative and I'm looking forward to the follow up blog post. I agree with your number one use case, but it's not the most important use case there is.

The biggest reward of a PKB for me has been that sitting down to write a summary of what I read ingrains the new information in my mind. It's the writing that is the learning. I tend to reproduce the same thoughts I wrote later, even if I don't go back to read what I wrote. A PKB with little ability to organize is just as useful for me.

Building a PKB isn't even the most important problem in my life. Few people on their deathbed will say: I wish my PKB had feature x. A more important problem is you'll wish you had changed yourself to have been strong enough to make different decisions throughout your life.

So I'm going to go out on a limb here and suggest something slightly preposterous about building a personal knowledgebase: you don't need one. What you need is what people fear the most: change.

I've been in grad school too, and used a PKB religiously, and came out the other side realizing I made a big mistake: I shouldn't have gone through grad school. Not only did having a PKB not help me with that, but it also distracted me from facing the real problem that I was avoiding. Organizing a PKB better was rounding error compared to that bad decision.

As mentioned by several in these comments, Org-mode is a powerful and flexible knowledgebase tool. So I wasn't surprised to find that it includes an extension that does Anki-style note drills, while allowing these notes to coexist in the same knowledgebase, or even the same file or main heading, as notes that have nothing to do with the Anki-style stuff: http://orgmode.org/worg/org-contrib/org-drill.html

> Org-mode is a powerful and flexible knowledgebase tool.

Well, this is Emacs, it has everything, kitchen sink, coffee brewing protocol client, built-in psychiatrist and a modeline cat. One just has to get used to it ;).

From my experience, org-drill is a bit rough around the edges, and it takes time to figure out how to use it properly. I ended up eventually using Anki for spaced repetition, but I'm thinking about moving back to org-mode for that, as I start using Emacs much more than before.

I've had some success using Confluence hosted from Atlassian. The front-end is well-polished and the document management features work well (mostly drag and drop). There's also a back-end API to extract JSON and XML data. Paying $10/month for the hosted service avoids having to spend time on maintenance. Confluence clearly doesn't meet all of the author's requirements, but it's a nice trade-off for someone interested in a simple pkb.

Suggest people take a look at CoLearnr (http://www.colearnr.com), a project I have been working on for the last 15 months. I'm linking a mindmap to a pinboard. The pinboard supports all kinds of media, links, discussions and annotations. Don't want to hijack this thread anymore, but will be happy to have a chat prabhu at colearnr dot com

The frustrating thing is that organising information like this is what the web was designed for. I think two things would make it realisable: - Browsers that supported page-editing of the page you were looking at in some sane way - A linkback/trackback solution that let you see what (in your space) linked to the page you were on.
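
The linkback half is cheap to sketch once you have a map of which of your own pages link where. The `build_backlinks` function and the input shape are hypothetical; it just inverts the outgoing-link map so each page can show what points at it.

```python
from collections import defaultdict


def build_backlinks(outgoing):
    """Invert a {page: [linked pages]} map into {page: sorted pages linking to it}."""
    incoming = defaultdict(set)
    for source, targets in outgoing.items():
        for target in targets:
            if target != source:  # ignore self-links
                incoming[target].add(source)
    return {page: sorted(sources) for page, sources in incoming.items()}
```

For example, `build_backlinks({"notes/a": ["http://x"], "notes/b": ["http://x"]})` reports that both notes link to `http://x`, which is the "what in my space links to this page" view.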

How about a unique email account, mynameknowledge@gmail.com. Email it everything you need to know/remember. There.

.... this is not a bad idea!

I actually just went and combined that with IFTTT and Dropbox, resulting in a very crude way to file org-mode notes from Gmail straight to my personal wiki.

http://imgur.com/rfEDgHn + http://imgur.com/ACekwNf.

Not perfect, but just good enough to be useful.

I use Evernote for this, as well as a couple of journals which when they're filled live on a bookcase. I tried a few more complicated systems and found that I spent more time filing the information than I was using it at times and just went for a pretty basic one.

Between Google Sheets (using a bookmarklet to dump data), WorkFlowy, Gmail and the combination of Dropbox/OneDrive, I am okay. But there is certainly a need for an open-source, unified solution.

A client would be OK, if not open-source. We may continue to use it on our own PC/Phone.

I used to use MediaWiki running on my home PC for this, but now I'm migrating to a home-built markdown-based wiki.

Quake-console accessible irl organization

I am working on this in the nyc area if anyone is interested. Definitely need a cofounder

Care to share a bit about your approach, e.g. opensource/proprietary/hybrid, local/cloud, native/web?

I want to allow rich open data formats but provide grossly superior tools for their creation, viewing, and editing in a single slick interface. People don't want a personal wiki. Or to use org mode or whatever. They want the computer from the Star Trek Enterprise, tailored to them. Simple, clean, interactive, adaptive.

I have a unique (so far) webstack that enables a lot of magic, but some of the tech is new so there's a lot of exploring atm.

I'm in NYC. Would love to talk it over in person. My email is on my profile page.

In my humble opinion, Weavi (weavi.com) would be the best choice for him.

Agree. Weavi is a neat product and good at collecting knowledge into a systematic view. I have been using WorkFlowy for private outlines and Weavi for a public knowledgebase. And I hear that Weavi is going to support private weavis. Not sure whether that will be for premium accounts only.

I use vimwiki for some lightweight knowledge.

Pen and paper

I have a solution for this: Anki. Over the past few months I've created decks for topics ranging from electronics to mechanics to convex optimization.

They're simple Q&A format and have tags, making them searchable. Best part? Export it all to the Android app. So when I'm on the bus reading a paper and I forget how to transform a second order cone program into a semidefinite program, it's a 5 second process: search "socp,sdp,transform" boom, done.
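
The tag lookup itself is simple to picture. This is not Anki's real search syntax or storage format, just a sketch that assumes each card is a dict with a `tags` list and that a query is a comma-separated list of tags that must all be present.

```python
def search_cards(cards, query):
    """Return cards carrying every comma-separated tag in the query."""
    wanted = {tag.strip().lower() for tag in query.split(",") if tag.strip()}
    # A card matches when the wanted tags are a subset of its own tags
    return [
        card for card in cards
        if wanted <= {tag.lower() for tag in card["tags"]}
    ]
```

Calling `search_cards(deck, "socp,sdp,transform")` would then return only the cards tagged with all three, which is the five-second lookup described above.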

The catch, of course, is you need to take the extraordinary amount of time to sit and digitize it all. And I'm barely 10% done with what I wanted to do. Still, a bit every day pays dividends.

I'm going to throw my weight behind Anki as well. Apart from being a knowledge database, it actively works on increasing the cache for recalling knowledge, that is to say, your memory. People usually have requirements like needs to have a mobile app, needs to be open source, etc., but I'd argue that spaced repetition should also be a requirement for everyone. What's the point of accumulating knowledge if you aren't actively working on it and internalizing it?

Like the parent said, Anki takes a lot of time to digitize everything, but I'd argue that that's where the learning happens. The process of distilling knowledge into a series of flash cards is extremely personal and involved.

My workflow:

- Out in the wild, when I come across a word I don't know or trivia, I write it down or put it in my phone via a text editor, or note taking app like Google Keep.

- I'll write notes down in a notebook if I'm learning about something new, since writing helps facilitate the acquisition of knowledge.

- When I get home, or if I have time with my laptop around, I'll open up Anki, and convert everything I've learned into corresponding Decks, subdecks and cards.

- I usually do this at the end of the day, and start with "small" knowledge, like new words or trivia. After I've put them into Anki, I wipe Google Keep and whatever else is on my phone. Similarly, I then go through my notes, and since everything is still fresh, with the help of the context behind the notes, I put everything into Anki.

...and my favourite part...

- The studying phase. Every now and then, when you get a free moment, open up Anki on your Desktop or phone, and do your reviews for that day. Anki will keep statistics and even show you what time of day you're better at recalling knowledge (I seem to be a noon kind of guy). Some people like setting aside time from their day to do reviews, but I like doing them on my phone on the go.

(And since this is HN, Anki lets you customize every aspect of the spaced repetition process, for those of you that like tinkering. Although, I'd recommend reading a bit about the process itself before changing anything.)
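
For anyone curious what there is to tinker with, here is a simplified step in the style of the SM-2 algorithm that Anki's scheduling is derived from. It is a sketch of the published SM-2 rules, not Anki's actual scheduler; the function name and argument order are made up.

```python
def sm2_review(quality, repetitions, interval_days, ease):
    """One SM-2-style step. quality is 0-5; returns (repetitions, interval_days, ease)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the repetition sequence, keep the ease factor
        return 0, 1, ease
    if repetitions == 0:
        interval_days = 1
    elif repetitions == 1:
        interval_days = 6
    else:
        # Later reviews: stretch the previous interval by the ease factor
        interval_days = round(interval_days * ease)
    # Easy answers raise the ease factor, hard ones lower it, floored at 1.3
    ease = max(1.3, ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return repetitions + 1, interval_days, ease
```

Starting from an ease of 2.5, a card answered perfectly each time comes back after 1 day, then 6 days, then roughly 6 times the ease factor, which is why well-known cards quickly stop costing review time.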

For the lazy: http://ankisrs.net/
