How to annotate everything (beepb00p.xyz)
315 points by pcr910303 on Nov 27, 2019 | 76 comments

One thing I hate about annotation in general is everything is siloed.

For example, I have an app on my iPad called PDF Viewer (I think) which is pretty good and has lots of annotating features. I've often used it to highlight pieces of text I'm interested in, or references I might want to follow up on.

The follow-ups never happen, though, because the annotations are locked in this app. I can't, say, look on my Mac to see what I've 'clipped' recently from the app on my iPad, or most other apps for that matter.

Ideally what I'd like is some sort of central place where all my 'clippings' go in a seamless manner, be it from PDFs, photos, websites, whatever.

Hypothes.is is actually a big champion of open standards. Here's their post https://web.hypothes.is/blog/annotation-is-now-a-web-standar... on the W3C Annotation standard, and you can read the standard to see how to peer your own data.

I was a little confused about the PDF annotation part of the discussion since this works for me. The author does say that he doesn't have a mac and this might be the issue. It works across all of my devices and all of my apps.

Annotation itself is part of the PDF standard. It is defined well enough that all of the apps I use preserve annotation.

The parts that are missing are both provided by Evernote.

Evernote inserts the major annotations at the start of the file. You can then read a summary of annotations in any PDF app.

Evernote was also the first that automatically converted image based pdfs to "searchable pdfs". When this feature first came out it was awesome. Now I see it everywhere. My scanner automatically converts to searchable pdfs and I have an OCR converter that does the same.

I guess I focussed too much on the PDF aspect, my desire is some sort of system where if I annotate something, be it a PDF, website, image, text file or whatever, I'd like the contents of that annotation (with maybe some context/metadata around where that annotation was from) to go to some central place.

It's tricky though because a lot of apps are siloed, and the interaction ideally would be almost seamless. Like taking a screenshot.

Yep, this is what I use Evernote for. When I was working I had to consume and organize a lot of proprietary information and Evernote was the only thing that worked. I use the Evernote clipper in my browser and forward from email. You can put additional information in the note. The generic "web aggregators" don't work for me because they don't work behind a firewall.

This is a really good idea. It means there needs to be a standard way to specify the relevant location for a note inside each media format.

There has been active work on this:


It says web annotation but it was born out of an open annotation initiative. The fragment selector supports the most common media types
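
For anyone who hasn't looked at the spec, here is a rough sketch of what a Web Annotation with a fragment selector looks like when built by hand. The `@context`, `type`, `body`, `target`, `selector`, `conformsTo` and `value` keys come from the W3C Web Annotation Data Model; the helper name `make_annotation` and the example.org URLs are made up for illustration, and this is not how any particular tool serialises its data.

```python
import json
import uuid

# "conformsTo" values listed for FragmentSelector in the W3C data model.
PDF_FRAGMENTS = "http://tools.ietf.org/rfc/rfc3778"
MEDIA_FRAGMENTS = "http://www.w3.org/TR/media-frags/"


def make_annotation(source_url, note, fragment, conforms_to, annotation_id=None):
    """Build a minimal Web Annotation as a plain dict (hypothetical helper)."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        # Every annotation needs an IRI; a random URN is used when none is given.
        "id": annotation_id or uuid.uuid4().urn,
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": note, "format": "text/plain"},
        "target": {
            "source": source_url,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": conforms_to,
                "value": fragment,
            },
        },
    }


# Same structure for two media types; only the fragment syntax differs.
pdf_note = make_annotation(
    "https://example.org/paper.pdf", "Follow up on this reference",
    "page=10", PDF_FRAGMENTS,
)
video_note = make_annotation(
    "https://example.org/talk.mp4", "End of act I",
    "t=30,60", MEDIA_FRAGMENTS,
)

if __name__ == "__main__":
    print(json.dumps(pdf_note, indent=2))
```

The point of `conformsTo` is that the selector itself stays generic: the media-specific part is delegated to whichever fragment syntax the format already defines.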


I'm actually partly there with that workflow. I directly annotate PDFs and then let a server side script extract relevant annotations that can be used to place the file in the right folder.

Similarly, I extract out highlighted text, but right now I don't know what to do with it, because I don't know how this workflow should function, especially with regard to updated highlights.

Microsoft Courier was designed to solve this ten years ago, by being essentially a digital version of a notebook[1]. You'd fill your notebook with digital clippings from any media and freely annotate them.

[1] https://www.youtube.com/watch?v=UmIgNfp-MdI

Yes, this siloing stops me from using the annotation features "native" to each app and format as well.

Instead, I prefer to exfiltrate information from the silos(apps, formats, etc) and put them into my note taking system. Then I can do highlights, annotations, etc. on my own terms and also get the benefits of centralization such as searching and linking(the OP has another post describing their own system, which is pretty cool[1]). Currently I'm using Notion, which is also a silo of its own, but it's one that gives me a lot of control over how I lay my information out(and an escape plan from).

There are a lot of perspectives on this issue with data silos and walled gardens. But I'm of the opinion that it's a fairly bad state for all of us "end users". Computers to me are about infinite flexibility and malleability, but ironically the tools we have for annotation and remixing are in practice worse than what we have in the physical world. Reading a book in the physical world, I can converse with the author simply by jotting marginalia with my pencil. It's fluid, intuitive, and the medium of paper encourages it (in fact it can't help but be mutated by my use!: pages get bent, stained, torn, etc.). If I want to go further I can add post-it notes to mark interesting passages, I can xerox some pages and create subsections, if it's a magazine I can just tear them all out! That kind of flexibility just isn't available on a computer.

I think it's worth thinking really hard why we're in this state, especially since computing pioneers were actually very optimistic that data and computing would be way more personally malleable than it is now(I've been working on a small comic on this theme myself[2]). For example, check out this short demo[3] of Smalltalk where Alan Kay hooks up a single frame from an animation of a bouncing ball to a painting program, to modify that one frame while also monitoring the loop. Smarter than paper, but way more flexible.

My own thinking lately has been that developers need to think more about how their apps, like your PDF viewer, could cooperate with other apps to achieve our goals. All sorts of deep questions spring forth from here: "what is the best inter-communication system for them to cooperate with?", "how do you design them to be intuitive?", "how can you make the UX as good as 'packaged apps'?". And looking at the history of personal computing, these are fairly old questions. The Unix Philosophy provides us with some clues, and its success, even in the smaller world of developer-oriented computing, gives us some hope.

Personally, I'm excited to one day live in a world where my desktop and smartphone and other devices -my computing spaces- feel less like a collection of walled gardens that refuse to intermingle, and more like one big beautiful garden, an ecosystem: lots of small, useful programs, chatting and cooperating, data freely flowing between them, each new program multiplying their collective potential and creating a new ecology, that I can adapt to my psyche and my needs, helping me be a better human.

[1] https://beepb00p.xyz/pkm-search.html

[2] https://twitter.com/yoshikischmitz/status/118845556004515840...

[3] https://youtu.be/AnrlSqtpOkw?t=607

Hey, author here! That's exciting, you basically mirror my thoughts here :)

I agree with what you're saying about silos. That's especially sad considering that having all this stuff unified and interacting is not some sort of mad science fiction; it's totally possible with technology that we have. It's just tedious for various reasons (one of which is that demand from users isn't high in the first place).

I'm working on a browser extension that unifies annotations and highlights from different sources like pocket, instapaper, hypothesis, or even plaintext notes: https://github.com/karlicoss/promnesia . I've been using it for more than a year, hope to release it soon (few things are specific to my setup, so I need to make them simpler/clearer for other people to use).

Hey that tool looks awesome, I'm excited to see where it goes!

Can I do similar magic as shown in the demo [0] today?

I mean .. is there a system/tool/platform which allows that kind of interaction?

I know about Smalltalk instances like Squeak[1], Etoys[2], Pharo[3] .. Are these capable of what is shown in the demo?

[0] https://youtu.be/AnrlSqtpOkw?t=607

[1] https://squeak.org

[2] http://www.squeakland.org

[3] https://pharo.org

I'm not familiar enough with any of those systems to definitively say if that demo could be reproduced in them(though I'm very interested in getting acquainted with them). My impression is that all of those environments are quite powerful in different ways though.

Some other systems worth learning more about:

- https://github.com/kenperlin/chalktalk

- https://dynamicland.org/

Currently I'm using Diigo for inter-platform annotation. It annotates web pages as well as PDFs, with a solid free plan.

Microsoft OneNote should be somewhere on that list. You can annotate PDF, Powerpoint or images easily. There is an extension that lets you annotate a complete webpage or a portion as well. You can annotate with your keyboard and your pen. You can also share to have collaboration with many people as well as use it across devices (Android, Mac, Windows).

Can you save your notebook in a portable format for import/export now?

I think that is a basic feature of a notebook and always felt that it was intentionally missing to suck people into the MS cloud.

Not saying a lot of other apps don't make the same mistake.

Yep, this 100%.

I 2nd OneNote. I used it extensively in college. You can migrate in chapters of a PDF textbook, OCR the text, and then highlight and annotate to your heart's content. It supports hyperlinking, which I used to make an interlinked glossary that spanned textbooks. You can 'print' any document/webpage to OneNote as well as take screencaps. I still use win+shift+s to clip portions of the screen.

OneNote would be ideal but I wish I could keep the PDF file in my local directory rather than in the cloud. This is because I use Zotero where I link my PDF files stored in a local directory.

Edit: just played with OneNote once again. I realised it's also lacking bookmarking and thumbnail views.

OneNote is awesome. I have 2 problems with it though:

1 - When you link another page, that link can be broken if you move that page to another collection/folder/whatever

2 - No way to add tags to pages =/

Hey, author here. I don't have any windows computers, so haven't had chance to use Microsoft products much. I'll mention onenote anyway to prompt people, thanks!

How to annotate code? I've once in a while wanted some tool to annotate code as I explain it to new developers, or as I explore code I'm unfamiliar with, without having to write comments directly in the source code. The closest I've found to do this is bookmarks in IntelliJ using the name of the bookmark as the annotation.

Sounds like a job for Literate Programming.[1]

This could also be done, for instance, in org-mode, with embedded source-code blocks.

Normally, the source in org's source-code blocks lives directly in the org document itself, but there's no reason it couldn't pull in the source from external documents instead. I wouldn't be surprised if there's already an extension or org feature that does this.

[1] - https://en.wikipedia.org/wiki/Literate_programming

Why not just actually use comments? If it's worth explaining to new developers it'll likely be useful later when you've simply forgotten why something is the way it is.

Seriously. Especially when something seems like a bizarre choice in the code, I'll provide a link if there is a corresponding GitHub issue or code regarding it.

I've never had an issue with GitHub issues regarding linkrot, but it happens quite frequently when linking actual source code, especially code not in your control. Not sure what a better solution is for that, unfortunately.

I generally justify it by the anecdotal observation that the code is generally not being used anymore by the time the link dies out, but that is far from an ideal justification.

Assuming you use GitHub for source control:

- Go to GitHub.com repo and find the file (You can use 't' keyboard shortcut to find it)

- Press 'y' on keyboard to replace 'master' with its SHA1 in the URL

- Click on the interesting line of code

- You have a forever-valid link (assuming no push --force) to the exact line of code
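
If you'd rather generate these links from a script than press 'y' in the browser, the steps above boil down to a URL pattern. A minimal sketch, assuming you already know the full commit SHA (e.g. from `git rev-parse HEAD`); the function name and the example repo are hypothetical:

```python
import re
from urllib.parse import quote

# A permalink needs a full commit SHA-1; a branch name such as "master" would
# keep moving as new commits land.
SHA1_RE = re.compile(r"^[0-9a-f]{40}$")


def github_permalink(owner, repo, commit_sha, path, line, end_line=None):
    """Return a link to a line (or line range) pinned to one commit."""
    if not SHA1_RE.match(commit_sha):
        raise ValueError("expected a full 40-character commit SHA, not a branch name")
    anchor = f"#L{line}" if end_line is None else f"#L{line}-L{end_line}"
    # quote() keeps "/" intact and escapes spaces and other unsafe characters.
    safe_path = quote(path.lstrip("/"))
    return f"https://github.com/{owner}/{repo}/blob/{commit_sha}/{safe_path}{anchor}"


if __name__ == "__main__":
    print(github_permalink(
        "example-org", "example-repo",
        "0123456789abcdef0123456789abcdef01234567",
        "src/app.py", 42,
    ))
```

The same caveat applies as for the manual version: the link only stays valid as long as the commit itself is still reachable on the server.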

I thought a push --force would change the hash, anyway.

I meant that push --force may make the given commit disappear if it's not referenced by any branch and the repo gets garbage collected on the backend, then the URL will 404.

Because such annotations are for you to help you comprehend, not something you want to actually leave in the codebase.

Not trying to be a smartarse, but isn't that what comments are for? If they aid your comprehension, there's a reasonable chance they'll also help someone else in the future.

There's overlap, but annotations are more throwaway and task-oriented. Once done with your code study, you wouldn't want to commit to the main repo all the little yellow highlights, the "!!!" notes, the hand-drawn arrows connecting code lines, the "does X, but ask Fred" remarks, etc.

Also, separating these out would let one design tooling that's more suitable for reading and annotating than editing. When studying code, I'd prefer to do it on a tablet, with an experience closer to a PDF reader than an IDE.

That makes sense, thanks.

I second it. And not just for teaching someone, but for understanding unfamiliar codebases. I'd like an IDE/editor-like tool that would open code read-only, but allow me to annotate it: attach notes and crosslinks to other locations in code, highlight and cross out code fragments, draw on top of it, etc. All of that on top of the navigation utilities you'd expect from a code editor: text search, jump to definition, etc. A tool designed for reading, not for writing.

The closest thing I know to that experience, which I occasionally use, is printing code from editor/IDE into PDF, and using a quality PDF viewer. While useful for a couple of files I want to focus on, this doesn't help explore the codebase.

Hey, author here! You want to search for "literate programming". I'm a big fan of it, and working on a post overviewing available options for it. Stay tuned :)

Some editors allow you to create regions of text which you can collapse/expand. You can do this in Vim, for example. So you could use these regions for comments which are not visible when you're coding.

Are there tools good for aggregating and querying annotations across services for less technical users? Is there a market for better usage of annotations?

I've been avid about annotating key tools I use that offer that functionality, and inevitably creating a separate spreadsheet log across things. I'd love something that could be a hub for all my annotations.

I'm actually working on a tool that aggregates annotations across different services https://github.com/karlicoss/promnesia (screenshot is a bit old, e.g. currently it's capable of displaying inline highlights as well). I've been using it for almost a year now, but it needs a few final touches before I can release it.

The 'less technical users' bit is a tricky one. I am trying to make it as effortless as possible, but there is some inevitable overhead for running export scripts, configuring etc. I'm also working on a post exploring this overhead and thinking of ways to make it better :)

The main issue I have with native PDF annotation is that it mutates the file, which changes the hash. I've been thinking about building a collaborative PDF reader (using native PDF toolkits, not pdf.js) that fingerprints the file locally and pulls/pushes annotations to a central repository (either self-hosted by individuals or research teams, or global). Would love to see annotations on popular research papers or textbooks.
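
To make the fingerprinting idea a bit more concrete, here is a minimal local sketch: hash the untouched file, and key the annotations by that hash in a separate store so the PDF itself is never modified. Everything here is an assumption for illustration (SHA-256 as the fingerprint, a directory of JSON files standing in for the "central repository", the `AnnotationStore` name); a real service would need sync, identity and conflict handling on top.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class AnnotationStore:
    """Keeps annotations outside the document, keyed by content hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _file_for(self, doc_hash):
        return self.root / f"{doc_hash}.json"

    def load(self, doc_hash):
        """Return the stored annotations for a document, or an empty list."""
        target = self._file_for(doc_hash)
        if not target.exists():
            return []
        return json.loads(target.read_text(encoding="utf-8"))

    def add(self, doc_hash, page, text):
        """Append one annotation and persist the updated list."""
        notes = self.load(doc_hash)
        notes.append({"page": page, "text": text})
        self._file_for(doc_hash).write_text(
            json.dumps(notes, indent=2), encoding="utf-8"
        )
        return notes
```

Because the key is the content hash rather than the path, two byte-identical copies of a paper on different machines resolve to the same annotations, which only works as long as no tool rewrites the file.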

What you're suggesting would be amazing.

For what it's worth, the popular "Skim" PDF reader on OS X stores annotations in filesystem extended attributes, so the hash of the raw file data doesn't change (but of course this comes with the drawback of more limited interoperability with other tools).

There's Fermat's Library [1] which allows for collaborative annotation on pdfs. The only issue I have with them is that they lack highlighting and aren't open source.

With PDF annotation, why not separate the content from the content representation? Have an editor for the original content, and export to pdf. If people want to annotate, give the original content or a copy. Easier to version control this way as well.

[1] https://fermatslibrary.com/

Hey, author here! Polar Bookshelf does exactly what you want, I recommend trying it out. It's mentioned in the article here https://beepb00p.xyz/annotating.html#polar

At least in principle it is possible to annotate a PDF file while leaving the original file as a prefix of the end result. The directory of objects is at the end of the file, so the annotation software can build a new tree of objects, referencing any unchanged parts of the original, and append its own directory which the reader software will find. I don't know of any PDF editors that do this, though.

This really depends on the annotation app. ISO-conformant PDF viewers support XFDF annotations, which can either be burned into the PDF or stored separately. The behaviour is entirely viewer-specific.
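
As a rough illustration of the "stored separately" case: XFDF is plain XML, so a sidecar file can be read with nothing but the standard library. The sample document below is hand-written and heavily simplified (real exports carry more attributes, such as coordinates and dates, and I haven't validated it against the schema); `list_annotations` is a hypothetical helper, not part of any viewer.

```python
import xml.etree.ElementTree as ET

# XFDF elements live in this XML namespace.
XFDF_NS = {"x": "http://ns.adobe.com/xfdf/"}

# Hand-written, simplified sample for illustration only.
SAMPLE_XFDF = """<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
  <annots>
    <highlight page="0" rect="72,700,300,712" color="#FFFF00" title="reader">
      <contents>Key definition</contents>
    </highlight>
    <text page="2" rect="50,50,70,70" title="reader">
      <contents>Ask about this step</contents>
    </text>
  </annots>
</xfdf>"""


def list_annotations(xfdf_text):
    """Return (kind, zero-based page, note) tuples from an XFDF string."""
    root = ET.fromstring(xfdf_text)
    results = []
    for annots in root.findall("x:annots", XFDF_NS):
        for annot in annots:
            # Tags look like "{namespace}highlight"; keep only the local name.
            kind = annot.tag.split("}", 1)[-1]
            contents = annot.find("x:contents", XFDF_NS)
            note = (contents.text or "").strip() if contents is not None else ""
            results.append((kind, int(annot.get("page", "0")), note))
    return results


if __name__ == "__main__":
    for kind, page, note in list_annotations(SAMPLE_XFDF):
        print(f"page {page}: [{kind}] {note}")
```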

hypothes.is does exactly this, except it uses pdf.js

It doesn't matter where you load a PDF from (locally or any online source) the public annotations are overlaid.

I'd like you to consider doodledocs.com, which is my side project. I recently submitted it as a show HN [1].

It has a web annotation feature in which you can draw over a website with your stylus. I haven't seen another app that combines a pressure-sensitive stylus, web annotation, and collaborative doodling.

It's still very much an MVP though, so the collaborative aspect doesn't always work. The annotation aspect should work a lot better.

I wish I could pursue this project as a small company (I have quite a lot of ideas in this area), but I'm having a hard time finding the audience who might enjoy this (currently that says more about my marketing skills than anything else). So side project it is.

Edit: Bluebeam, Good Notes and Apple Books are missing [2]. It seems that there is no consideration for stylus-based annotation.

[1] https://news.ycombinator.com/item?id=21399910

[2] As an aside, I recommend Good Notes over Apple Books, but it's close. Bluebeam is a good application if you have a Microsoft Surface.

My company, Gold Fig(https://goldfiglabs.com), is working on something just adjacent to annotating webpages: we enable you to annotate changes you make on the web. Since so much of our work is powered by online platforms, we wanted a way to keep track of changes we were making in the tools we were using. Similar to how you would annotate a code change with a commit message, you should be able to annotate a CMS change, a platform settings change, or anything else you might be tweaking as you go about your work.

We'd love feedback from anyone who is interested in adding annotations, and especially those working on teams who could benefit from being able to share annotations.

We definitely understand that a lot of data is currently siloed, so to that end we're also interested in the annotation formats that we should be looking at. We currently export our own json format, but we would love to work well with existing tools.

Will your system work with Netflix in the browser?

Hmm, maybe. Can you provide more details about what specifically you are trying to accomplish?

Annotate movies. The introduction of the protagonist, the end of act I, document great lighting or an exceptional score.

I've created Quoter[0] to address this point in particular. I read a fair amount of material on a daily basis (books, blogs, discussions) and sometimes I want to save a sentence. With Quoter it's just one click away.

To get reminded about the quotes (or annotations) that I've saved, I've configured it to show a random quote every 3 hours.

This quote was shown when I was writing this comment :)

>Every time you create something – whether it's a website, a client app, a blog post, a powerpoint presentation, or an email – ask yourself, "so what?" If you can't answer that question convincingly, reformulate and try again.

It also has the ability to export saved quotes; I have a script that processes the output into a page on my website[1].

[0] https://getquoter.app/

[1] https://mhasbini.com/highlights.html

Cool. It would be nice if it could also show a link to the source of the quote in the export.

Thanks, I'm working on adding this.

I think you could add the browser add-on SingleFile [1] to the list. It allows you to annotate archives of web pages saved as HTML and is 100% free.

[1] https://github.com/gildas-lormeau/SingleFile

I've been pleased with Memex WorldBrain, which was recently demonstrated here on HN: you can annotate or star your pages, and every page starred/bookmarked, or visited for some seconds is searchable.

If you do a web search on DDG, your Memex results also pop up at the same time.

For videos, I have found this: https://ant.umn.edu/ , but haven't used it yet. It seems a bit limited in its annotation abilities, but nevertheless something that could be useful

Seems kind of funny and sad given how much fanfare and money was associated with them, that in both TFA and this entire discussion, no one has mentioned genius.com (http://nymag.com/daily/intelligencer/2014/12/genius-minus-th...).

any word on what a16z has this investment marked at?

Windows applications are conspicuously missing from the list.

Others mentioned OneNote. For PDF annotations, Edge browser isn't all that bad. Foxit had some features too, AFAIR (and is cross-platform). Personally, I currently use Drawboard PDF - it's UWP, but a very nice and stylus-friendly viewer with a lot of annotation capabilities and a circular menu that's extra convenient when using a stylus.

Hey, author here! Reason is I don't have any windows computers, so have no idea what's going on in windows world! (should have mentioned that) Thanks for tools listed in your comment, I'll mention these in the article.

One of the first features I built for https://histre.com/ is annotating websites. I find tagging, grouping and sharing those notes to be incredibly useful. My girlfriend was trying to shortlist Airbnbs and places to visit for a recent trip and it was satisfying to see how Histre made that really easy.

Coincidentally we're about to embark on adding annotations & workflow management to our automated visual sitemaps platform: https://visualsitemaps.com/annotations/

Does anyone know of a good guide or reference of what to write in annotations or notes? Every notetaking guide I’ve seen seems to talk exclusively about mechanics and not content, which seems like the harder part to me.

How about trying to "explain like I'm five", or writing an explanation of what you just read from memory until you hit a wall? The wall signifies something you should go back and try to understand better.

I was hoping for something more along the lines of a book, blog, or academic paper that goes into the subject in some detail.

One thing that seems to be missing from the list is org-capture, which is lightly documented but can duplicate quite a bit of the hypothes.is functionality, though the tweaking required is quite high.

Okular was my favorite PDF annotator until I got my Surface Pro a couple of days ago. I still haven't figured out what's the best PDF reader with pen input, any recommendations?

Could have used this list while doing research for my Masters! However, I don't think I was as well-versed in emacs back then either, ha ha.

If you're interested in annotation, you may also enjoy these standards/communities, which aren't referenced in the article:

- Open Annotation Data Model[1]

- Web Annotation Vocabulary[2]: I built a thing to convert kindle highlights/notes into web annotation vocabulary jsonld[3]

- Web Annotation Protocol[4]

Great to see https://hypothes.is mentioned. Great people on that team, and a great example of a dedicated tech nonprofit.

Just this week I have been fulfilling our first customer order to provide hosting for hypothesis annotation server on permanent.cloud[5].

One area of needed development, however, is being able to use the browser extension[6], which is great, with your own installation and not just the main https://hypothes.is domain. I'll probably contribute that sooner or later, or else continue work on this other simpler browser extension I started[7].

Lastly, there are some interesting tricky CS/NLP problems of 'annotating everything', e.g. how to render annotations on the right 'target' of what was annotated, especially if that target has changed a little bit, e.g. fixing a typo or something, which is common in web publishing / news. In 2014 I worked for a company that made an annotation-as-a-service product, and I wrote a bit about how Locality-sensitive Hashing can be applied[8]

[1]: http://www.openannotation.org/spec/core/

[2]: https://www.w3.org/TR/annotation-vocab/

[3]: https://github.com/gobengo/kindle-web-annotations

[4]: https://www.w3.org/TR/annotation-protocol/

[5]: https://permanent.cloud/apps/hypothesis

[6]: https://github.com/hypothesis/browser-extension

[7]: https://github.com/gobengo/web-annotation-extension

[8]: https://gobengo.tumblr.com/day/2014/04/22
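
To give a feel for the re-anchoring problem, here is a toy sketch that finds where a saved quote went after the page was lightly edited. It compares character shingles with exact Jaccard similarity, which is the quantity that MinHash-style locality-sensitive hashing approximates when there are too many candidates to compare one by one. This is not the method from the linked post; the function names, the shingle size of 5 and the 0.5 threshold are arbitrary assumptions.

```python
def shingles(text, k=5):
    """Return the set of overlapping k-character shingles of normalised text."""
    normalised = " ".join(text.lower().split())
    if len(normalised) <= k:
        return {normalised}
    return {normalised[i:i + k] for i in range(len(normalised) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def reanchor(quote, paragraphs, threshold=0.5):
    """Return the index of the paragraph most similar to the quote.

    Returns None when no paragraph reaches the threshold, i.e. the
    annotation should be treated as orphaned rather than misplaced.
    """
    target = shingles(quote)
    best_index, best_score = None, 0.0
    for index, paragraph in enumerate(paragraphs):
        score = jaccard(target, shingles(paragraph))
        if score > best_score:
            best_index, best_score = index, score
    return best_index if best_score >= threshold else None


if __name__ == "__main__":
    page = [
        "Completely unrelated introduction about gardening tools.",
        "Annotations should survive small edits to the web page.",
        "Footer text.",
    ]
    # The quote was saved before "web" was inserted into the sentence.
    print(reanchor("Annotations should survive small edits to the page.", page))
```

A small edit only disturbs the few shingles that overlap it, so the similarity stays high, whereas an exact string match would fail outright.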

Are there any active projects building around Web Annotation? It seemed interesting, but when it was released it also seemed like a lot of talk with not even experimental implementations (to the point where it seemed not enough to actually be released as a standard). Hypothesis was loudly praising it, all the while keeping their product on their own protocols.

What about annotating e-mails? Any tools that can aggregate such annotations with other types (e.g. web)?

I suggest looking into org-mode in emacs to take notes on emails. However, I have not yet figured out an annotation method that works. Would be good to have one, but given that the emails are not long, may be a little unnecessary.

I would suggest "Article Reader" on Android. I have been using it for a while now.

Helpful list, thanks for sharing.

I really appreciate this resource, though it largely underlines the fact that there really aren't any good comprehensive solutions.

Zotero appears conspicuously missing. It's an articles / references management tool. I've not found it especially useful myself, though it has its fans.

My own previous explorations, somewhat disorganised:

"Sources and tools for references, particularly document scanning and conversion"


"Organising and planning research activities"


Regarding digital formats generally:

- Most formats are themselves opaque / resistant to annotations, in some way, shape, or form.

- Tools that enable annotations of one type of reference ... often don't support others. As several comments note, this results in various application or service-based silos of notes.

- Index cards remain remarkably useful. I use a modified POIC / Zettelkasten method:


Mostly, I've been looking to a generalised system for organising and working with materials under the working titles of KFC (Krell Functional/Fine Context), docFS (as in "Document Filesystem"), and webFS (as in Web filesystem, see: "What if the Web was fileystem accessible" https://old.reddit.com/r/dredmorbius/comments/6bgowu/what_if...).

The idea of creating not an application, but an OS-level (and ultimately, hopefully, OS-agnostic) system for integrating metadata and relationships amongst documents, is at its heart.

This does seem like a surprisingly common problem, and one with numerous partial solutions, but no generally-available, general-purpose system seems to exist. Given that this notion dates back to (or before!) Vannevar Bush's Memex discussion, this is ... both surprising and disappointing.

I'm about 98.2331% certain that copyright has a major share of the blame, as an effective system for dealing with digitised documents would all but certainly have to involve duplicating and reverse-engineering them in multiple regards. My solution to the HTML / PDF / ePub, etc., document formats is to recompose them as some minimally sufficient document format (often Markdown, occasionally more advanced formats, with LaTeX being nearly always sufficient). This has resulted in a significant detour through questions concerning typography and just what a document is, though in a huge fraction of cases, there's little reason to go beyond paragraphs, the occasional italic/bold emphasis, and section or chapter markings.

In trying to decompose Web content, it's almost always simplest to simply dump the document to plain ASCII, then re-introduce any needed markup. Parsing HTML itself to a normalised form is a fool's errand. (As a fool, I've been on that errand many times.)
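
A minimal sketch of the "dump to plain text first" step, using only Python's standard library. It keeps paragraph breaks and throws away everything else, which is the point: markup gets re-added by hand afterwards. Note that it emits Unicode text rather than strict ASCII, and the set of tags treated as paragraph boundaries is my own assumption, not a complete list.

```python
from html.parser import HTMLParser


class TextDumper(HTMLParser):
    """Collects visible text, inserting paragraph breaks at block-level tags."""

    BLOCK_TAGS = {"p", "div", "li", "tr", "blockquote",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n\n")

    def handle_data(self, data):
        # Ignore the contents of <script> and <style> elements.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        raw = "".join(self._chunks)
        # Collapse whitespace inside each paragraph and drop empty ones.
        paragraphs = [" ".join(part.split()) for part in raw.split("\n\n")]
        return "\n\n".join(part for part in paragraphs if part)


def html_to_text(html):
    """Return the visible text of an HTML string as blank-line-separated paragraphs."""
    dumper = TextDumper()
    dumper.feed(html)
    dumper.close()
    return dumper.text()
```

Inline emphasis is deliberately lost here; for most prose it is quicker to restore the occasional italic by hand than to map every site's markup to a normalised form.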

It's possible to start from raw ASCII text for a book-length work and re-introduce Markdown sufficient to create HTML, PDF, and ePub endpoints in an hour or so, for a fictional work with no significant typographic concerns. I've even resorted to hand-typing works on occasion. Excessive of itself, but with the added bonus of being an effective active-reading technique.

That said: none of my methods, nor those listed here, satisfy me.

Though Wallabag deserves a closer look.
