For example, I have an app on my iPad called PDF Viewer (I think), which is pretty good and has lots of annotation features. I've often used it to highlight pieces of text I'm interested in, or references I might want to follow up on.
The follow-ups never happen, though, because the annotations are locked in this app. I can't say "look on my Mac to see what I've 'clipped' recently from the app on my iPad", and the same goes for most other apps.
Ideally what I'd like is some sort of central place where all my 'clippings' go in a seamless manner, be it from PDFs, photos, websites, whatever.
Annotation itself is part of the PDF standard. It is defined well enough that all of the apps I use preserve annotation.
Both of the missing parts are provided by Evernote.
Evernote inserts the major annotations at the start of the file. You can then read a summary of annotations in any PDF app.
Evernote was also the first to automatically convert image-based PDFs to "searchable PDFs". When this feature first came out it was awesome. Now I see it everywhere: my scanner automatically converts to searchable PDFs, and I have an OCR converter that does the same.
It's tricky, though, because a lot of apps are siloed, and ideally the interaction would be almost seamless, like taking a screenshot.
It says "web annotation", but it was born out of an open annotation initiative. The fragment selector supports the most common media types.
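For readers unfamiliar with the fragment selector: in the W3C Web Annotation Data Model, a `FragmentSelector` pairs a fragment string with a `conformsTo` URI naming the spec that defines that fragment syntax, which is how one selector type can cover HTML, PDF, plain text, media and SVG. The sketch below builds a minimal annotation on page 10 of a PDF. The helper name, the note text and the example URL are made up, and the media-type table is a partial list as I recall it from the spec, so double-check it before relying on it.

```python
import json

# Media type -> URI of the spec that the FragmentSelector "value" conforms to
# (partial list, recalled from the W3C Web Annotation Data Model).
FRAGMENT_SPECS = {
    "text/html": "http://tools.ietf.org/rfc/rfc3236",
    "text/plain": "http://tools.ietf.org/rfc/rfc5147",
    "application/pdf": "http://tools.ietf.org/rfc/rfc3778",
    "video/mp4": "http://www.w3.org/TR/media-frags/",
    "image/svg+xml": "http://www.w3.org/TR/SVG/",
}

def make_annotation(source: str, media_type: str, fragment: str, note: str) -> dict:
    """Build a minimal Web Annotation whose target is one fragment of `source`."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": note, "format": "text/plain"},
        "target": {
            "source": source,
            "selector": {
                "type": "FragmentSelector",
                "conformsTo": FRAGMENT_SPECS[media_type],
                "value": fragment,
            },
        },
    }

# Hypothetical example: a note attached to page 10 of a PDF.
anno = make_annotation("https://example.org/paper.pdf", "application/pdf",
                       "page=10", "Follow up on reference [12]")
print(json.dumps(anno, indent=2))
```

Because the fragment syntax is delegated to per-format specs, the same annotation structure works whether the target is a PDF page, a time range in a video, or a region of an SVG.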
Similarly, I extract highlighted text, but right now I don't know what to do with it, because I don't know how this workflow should work, especially with regard to updated highlights.
Instead, I prefer to exfiltrate information from the silos (apps, formats, etc.) and put it into my note-taking system. Then I can do highlights, annotations, etc. on my own terms and also get the benefits of centralization, such as searching and linking (the OP has another post describing their own system, which is pretty cool). Currently I'm using Notion, which is also a silo of its own, but it's one that gives me a lot of control over how I lay out my information (and an escape plan from it).
There are a lot of perspectives on this issue of data silos and walled gardens. But I'm of the opinion that it's a fairly bad state for all of us "end users". Computers to me are about infinite flexibility and malleability, but ironically the tools we have for annotation and remixing are in practice worse than what we have in the physical world. Reading a book in the physical world, I can converse with the author simply by jotting marginalia with my pencil. It's fluid, intuitive, and the medium of paper encourages it (in fact, paper can't help but be mutated by my use: pages get bent, stained, torn, etc.). If I want to go further, I can add Post-it notes to mark interesting passages, I can xerox some pages and create subsections, and if it's a magazine I can just tear them all out! That kind of flexibility just isn't available on a computer.
I think it's worth thinking really hard about why we're in this state, especially since computing pioneers were actually very optimistic that data and computing would be far more personally malleable than they are now (I've been working on a small comic on this theme myself). For example, check out this short demo of Smalltalk where Alan Kay hooks up a single frame from an animation of a bouncing ball to a painting program, to modify that one frame while also monitoring the loop. Smarter than paper, but way more flexible.
My own thinking lately has been that developers need to think more about how their apps, like your PDF viewer, could cooperate with other apps to achieve our goals. All sorts of deep questions spring forth from here: "what is the best inter-communication system for them to cooperate with?", "how do you design them to be intuitive?", "how can you make the UX as good as 'packaged apps'?". And looking at the history of personal computing, these are fairly old questions. The Unix Philosophy provides us with some clues, and its success, even in the smaller world of developer-oriented computing, gives us some hope.
Personally, I'm excited to one day live in a world where my desktop, smartphone, and other devices (my computing spaces) feel less like a collection of walled gardens that refuse to intermingle, and more like one big beautiful garden, an ecosystem: lots of small, useful programs chatting and cooperating, data flowing freely between them, each new program multiplying their collective potential and creating a new ecology that I can adapt to my psyche and my needs, helping me be a better human.
I agree with what you're saying about silos. That's especially sad considering that having all this stuff unified and interacting is not some sort of mad science fiction; it's totally possible with the technology we have. It's just tedious for various reasons (one of which is that demand from users isn't high in the first place).
I'm working on a browser extension that unifies annotations and highlights from different sources like Pocket, Instapaper, Hypothesis, or even plaintext notes: https://github.com/karlicoss/promnesia . I've been using it for more than a year and hope to release it soon (a few things are specific to my setup, so I need to make them simpler/clearer for other people to use).
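The general idea of unifying highlights from several services can be sketched as normalizing each item into one record type and grouping by a normalized URL. This is a hypothetical illustration, not Promnesia's actual code or data model; the `Highlight` class and the URL-normalization rules (ignore scheme, `www.`, fragment and trailing slash) are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlsplit

@dataclass(frozen=True)
class Highlight:
    url: str
    text: str
    source: str  # e.g. "pocket", "instapaper", "hypothesis", "notes"

def normalize_url(url: str) -> str:
    """Map URL variants of the same page to one key (assumed rules, see above)."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")
    key = host + parts.path.rstrip("/")
    # Scheme and fragment are dropped; the query string is kept.
    return f"{key}?{parts.query}" if parts.query else key

def index_by_page(highlights):
    """Group highlights from every source under the normalized URL of their page."""
    index = defaultdict(list)
    for h in highlights:
        index[normalize_url(h.url)].append(h)
    return index

highlights = [
    Highlight("https://www.example.com/post/", "first quote", "pocket"),
    Highlight("http://example.com/post#intro", "second quote", "hypothesis"),
    Highlight("https://example.com/other", "unrelated", "instapaper"),
]
index = index_by_page(highlights)
for h in index["example.com/post"]:
    print(h.source, h.text)
```

Where the normalization rules are too aggressive (for example, sites where the fragment selects different content), they would need per-site exceptions.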
I mean... is there a system/tool/platform that allows that kind of interaction?
I know about Smalltalk implementations like Squeak, Etoys, and Pharo... Are these capable of what is shown in the demo?
Some other systems worth learning more about:
I think that is a basic feature of a notebook, and I always felt it was intentionally left out to suck people into the MS cloud.
Not saying a lot of other apps don't make the same mistake.
Edit: just played with OneNote once again. I realised it's also lacking bookmarking and thumbnail views.
1 - When you link another page, that link can be broken if you move that page to another collection/folder/whatever
2 - No way to add tags to pages =/
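Both complaints have a fairly standard fix in how a notes app could be structured: give every page a stable ID, make links point at the ID rather than at a folder path, and keep tags as page metadata. The sketch below is hypothetical (the `Notebook` and `Page` names and the `page:` link syntax are invented for illustration) and says nothing about how OneNote works internally.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Page:
    title: str
    folder: str
    tags: set = field(default_factory=set)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

class Notebook:
    """Hypothetical notebook: pages keyed by stable IDs, links point at IDs, not paths."""

    def __init__(self):
        self.pages = {}

    def add(self, page: Page) -> Page:
        self.pages[page.id] = page
        return page

    @staticmethod
    def link_to(page: Page) -> str:
        return f"page:{page.id}"

    def move(self, page_id: str, new_folder: str) -> None:
        # Moving only changes metadata; the ID, and therefore every link, is unchanged.
        self.pages[page_id].folder = new_folder

    def resolve(self, link: str) -> str:
        # The current location is looked up at display time.
        page = self.pages[link.removeprefix("page:")]
        return f"{page.folder}/{page.title}"

    def tagged(self, tag: str) -> list:
        return [p.title for p in self.pages.values() if tag in p.tags]

nb = Notebook()
target = nb.add(Page("Reading list", folder="Inbox", tags={"todo"}))
link = nb.link_to(target)
nb.move(target.id, "Archive/2020")
print(nb.resolve(link))   # Archive/2020/Reading list
print(nb.tagged("todo"))  # ['Reading list']
```

The trade-off is that links are opaque in raw form, so the app has to render them with the target's current title.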
This could also be done, for instance, in org-mode, with embedded source-code blocks.
Normally, the source in org's source code blocks lives directly in the org document itself, but there's no reason it couldn't be pulled in from external documents instead. I wouldn't be surprised if there's already an extension or org feature that does this.
 - https://en.wikipedia.org/wiki/Literate_programming
The closest thing I know to that experience, which I occasionally use, is printing code from editor/IDE into PDF, and using a quality PDF viewer. While useful for a couple of files I want to focus on, this doesn't help explore the codebase.
I've never had an issue with link rot in GitHub issues, but it happens quite frequently when linking to actual source code, especially code not in your control. Not sure what a better solution is for that, unfortunately.
I usually justify it with the anecdotal observation that the code is no longer being used by the time the link dies, but that is far from an ideal justification.
- Go to the repo on GitHub.com and find the file (you can use the 't' keyboard shortcut to find it)
- Press 'y' to replace 'master' in the URL with the commit's SHA-1
- Click on the interesting line of code
- You have a forever-valid link (assuming no push --force) to the exact line of code
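The same rewrite can be done outside the browser when you already know the commit, e.g. from `git rev-parse HEAD`. This is a hypothetical helper, not a GitHub API; it assumes the simple `https://github.com/OWNER/REPO/blob/REF/PATH` URL shape and will mis-handle branch names that contain a slash.

```python
import re

def github_permalink(blob_url: str, commit_sha: str) -> str:
    """Swap the branch segment of a GitHub blob URL for a fixed commit SHA.

    Assumes REF is a single path segment (e.g. "master"); any "#L42" line
    anchor is preserved because it is part of the trailing path group.
    """
    m = re.match(r"^(https://github\.com/[^/]+/[^/]+/blob/)([^/]+)(/.*)$", blob_url)
    if not m:
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    prefix, _ref, rest = m.groups()
    return f"{prefix}{commit_sha}{rest}"

url = "https://github.com/owner/repo/blob/master/src/main.c#L42"
sha = "0123456789abcdef0123456789abcdef01234567"
print(github_permalink(url, sha))
```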
Also, separating these out would let one design tooling that's better suited to reading and annotating than to editing. When studying code, I'd prefer to do it on a tablet, with an experience closer to a PDF reader than an IDE.
I've been avid about annotating in the key tools I use that offer that functionality, and I inevitably end up creating a separate spreadsheet log across them. I'd love something that could be a hub for all my annotations.
The 'less technical users' bit is a tricky one. I am trying to make it as effortless as possible, but there is some inevitable overhead in running export scripts, configuring things, etc. I'm also working on a post exploring this overhead and thinking of ways to make it better :)
For what it's worth, the popular "Skim" PDF reader on OS X stores annotations in filesystem extended attributes, so the hash of the raw file data doesn't change (but of course this comes with the drawback of more limited interoperability with other tools).
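The property being described, that annotations kept outside the file's bytes leave its hash unchanged, is easy to demonstrate. Python's `os.setxattr` is only available on Linux, so this sketch stores the annotations in a sidecar file instead of extended attributes; the `.annotations.json` naming is an assumption, and this is not how Skim itself stores data.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def save_annotations_sidecar(pdf: Path, annotations: list) -> Path:
    """Write annotations next to the PDF (hypothetical naming), never into it."""
    sidecar = pdf.with_name(pdf.name + ".annotations.json")
    sidecar.write_text(json.dumps(annotations))
    return sidecar

with tempfile.TemporaryDirectory() as tmp:
    pdf = Path(tmp) / "paper.pdf"
    pdf.write_bytes(b"%PDF-1.4 stand-in bytes")
    before = sha256_of(pdf)
    sidecar = save_annotations_sidecar(pdf, [{"page": 3, "note": "check this"}])
    stored = json.loads(sidecar.read_text())
    after = sha256_of(pdf)

print(before == after)  # True
```

The interoperability drawback mentioned above applies equally here: any tool that copies or syncs only the PDF itself leaves the annotations behind.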
With PDF annotation, why not separate the content from its representation? Have an editor for the original content, and export to PDF. If people want to annotate, give them the original content or a copy. It's easier to version-control this way as well.
It doesn't matter where you load a PDF from (locally or any online source) the public annotations are overlaid.
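One plausible way for a tool to show the same public annotations no matter where the PDF was loaded from is to key annotations by a fingerprint of the document instead of its URL. The sketch below assumes a SHA-256 of the raw bytes and an in-memory store, both chosen for illustration; a real service may use a different identifier (PDFs can carry their own document ID), and a byte hash changes as soon as an app writes annotations into the file itself.

```python
import hashlib
from collections import defaultdict

class AnnotationStore:
    """Hypothetical store keyed by a fingerprint of the document bytes, not its URL."""

    def __init__(self):
        self._by_fingerprint = defaultdict(list)

    @staticmethod
    def fingerprint(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def add(self, data: bytes, note: str) -> None:
        self._by_fingerprint[self.fingerprint(data)].append(note)

    def for_document(self, data: bytes) -> list:
        # .get() avoids creating empty entries for unknown documents.
        return list(self._by_fingerprint.get(self.fingerprint(data), []))

pdf_bytes = b"%PDF-1.4 same file"  # imagine these bytes were read from local disk...
store = AnnotationStore()
store.add(pdf_bytes, "Public note on section 2")
# ...and later the identical file is downloaded from a different URL.
print(store.for_document(pdf_bytes))
```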
It has a web annotation feature that lets you draw over a website with your stylus. I haven't seen any other app that combines pressure-sensitive stylus input, web annotation, and collaborative doodling.
It's still very much an MVP though, so the collaborative aspect doesn't always work. The annotation aspect should work a lot better.
I wish I could pursue this project as a small company (I have quite a lot of ideas in this area), but I'm having a hard time finding the audience who might enjoy this (currently that says more about my marketing skills than anything else). So side project it is.
Edit: Bluebeam, GoodNotes and Apple Books are missing. It seems that there is no consideration for stylus-based annotation.
As an aside, I recommend GoodNotes over Apple Books, but it's close. Bluebeam is a good application if you have a Microsoft Surface.
We'd love feedback from anyone who is interested in adding annotations, and especially those working on teams who could benefit from being able to share annotations.
We definitely understand that a lot of data is currently siloed, so to that end we're also interested in the annotation formats that we should be looking at. We currently export our own json format, but we would love to work well with existing tools.
To get reminded about the quotes (or annotations) that I've saved, I've configured it to show a random quote every 3 hours.
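The tool isn't named, so this is only a guess at how "a random quote every 3 hours" could be implemented: derive a bucket number from the current time, seed a private random generator with it, and the pick stays stable for the whole window. The function name and sample quotes are hypothetical.

```python
import random
from datetime import datetime, timezone

def quote_for(quotes: list, now: datetime, period_hours: int = 3) -> str:
    """Return a 'random' quote that stays the same for the whole period_hours window."""
    # Every timestamp in the same window maps to the same bucket number...
    bucket = int(now.timestamp()) // (period_hours * 3600)
    # ...and seeding a private RNG with it makes the choice repeatable within the window.
    return random.Random(bucket).choice(quotes)

saved = [
    "So what?",
    "Index cards remain remarkably useful.",
    "Annotation itself is part of the PDF standard.",
]
print(quote_for(saved, datetime.now(timezone.utc)))
```

Using a private `random.Random` instance keeps the global random state untouched, and no scheduler or stored state is needed to keep the quote fixed for three hours.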
This quote was shown when I was writing this comment :)
>Every time you create something – whether it's a website, a client app, a blog post, a powerpoint presentation, or an email – ask yourself, "so what?" If you can't answer that question convincingly, reformulate and try again.
It also has the ability to export saved quotes; I use a script to turn the output into a page on my website.
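The export format isn't described, so this sketch assumes a JSON list of objects with a `text` field and an optional `source` field, and renders them as escaped HTML blockquotes, roughly the kind of processing script mentioned above. Everything here (`quotes_to_html`, the field names, the markup) is hypothetical.

```python
import html
import json

def quotes_to_html(export_json: str) -> str:
    """Render an exported quote list (assumed: [{"text": ..., "source": ...}]) as HTML."""
    items = []
    for q in json.loads(export_json):
        text = html.escape(q["text"])  # quotes may contain <, > or &
        source = html.escape(q.get("source", ""))
        cite = f"<cite>{source}</cite>" if source else ""
        items.append(f"<blockquote><p>{text}</p>{cite}</blockquote>")
    return "\n".join(items)

export = json.dumps([
    {"text": "So what?", "source": "a blog post"},
    {"text": "Use <kbd> & friends"},
])
page = quotes_to_html(export)
print(page)
```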
If you do a web search on DDG, your Memex results also pop up at the same time.
Others mentioned OneNote. For PDF annotations, Edge browser isn't all that bad. Foxit had some features too, AFAIR (and is cross-platform). Personally, I currently use Drawboard PDF - it's UWP, but a very nice and stylus-friendly viewer with a lot of annotation capabilities and a circular menu that's extra convenient when using a stylus.
- Open Annotation Data Model
- Web Annotation Vocabulary: I built a thing to convert kindle highlights/notes into web annotation vocabulary jsonld
- Web Annotation Protocol
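To make the vocabulary concrete, here is a rough sketch of a Kindle-to-Web-Annotation conversion. It is not the tool mentioned above. It assumes the usual layout of a Kindle `My Clippings.txt` file (title line, metadata line, blank line, text, then a row of `=` characters), which varies between devices and languages, and it invents a `urn:example:book:` URI for the target because the clippings file carries no real book identifier. Each highlight becomes a body-less annotation with `motivation: highlighting` and a `TextQuoteSelector`.

```python
import json
import re

SEPARATOR = "=========="

def parse_clippings(raw: str):
    """Yield (title, metadata, text) for each entry of an assumed 'My Clippings.txt' layout."""
    for entry in raw.lstrip("\ufeff").split(SEPARATOR):  # files often start with a BOM
        lines = [line.strip() for line in entry.strip().splitlines()]
        if len(lines) < 3:
            continue  # skips empty trailing chunks and malformed entries
        text = " ".join(line for line in lines[2:] if line)
        if text:
            yield lines[0], lines[1], text

def book_uri_for(title: str) -> str:
    # Hypothetical URN scheme, for illustration only.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"urn:example:book:{slug}"

def to_web_annotation(title: str, text: str) -> dict:
    """Map one highlight to a body-less Web Annotation with a TextQuoteSelector."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "highlighting",
        "target": {
            "source": book_uri_for(title),
            "selector": {"type": "TextQuoteSelector", "exact": text},
        },
    }

raw = """Meditations (Marcus Aurelius)
- Your Highlight on Location 120-121 | Added on Sunday, May 3, 2020 10:00:00 AM

The impediment to action advances action.
==========
"""
annotations = [to_web_annotation(title, text) for title, _meta, text in parse_clippings(raw)]
print(json.dumps(annotations, indent=2))
```

A fuller converter would also keep the location from the metadata line (for example in a second selector) and the creation date.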
Great to see https://hypothes.is mentioned. Great people on that team, and a great example of a dedicated tech nonprofit.
Just this week I have been fulfilling our first customer order to provide hosting for hypothesis annotation server on permanent.cloud.
One area still needing development, however, is being able to use the browser extension, which is great, with your own installation rather than just the main https://hypothes.is domain. I'll probably contribute that sooner or later, or else continue work on this other, simpler browser extension I started.
Lastly, there are some interesting, tricky CS/NLP problems in 'annotating everything', e.g. how to render annotations on the right 'target' of what was annotated, especially if that target has changed a little (say, a typo fix), which is common in web publishing / news. In 2014 I worked for a company that made an annotation-as-a-service product, and I wrote a bit about how locality-sensitive hashing can be applied.
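The re-anchoring problem can be illustrated with MinHash, one common family of locality-sensitive hashes: the annotated text and each candidate passage are turned into character shingles, each gets a short signature, and the fraction of matching signature slots estimates their Jaccard similarity, so a typo fix barely moves the score. This is a generic sketch, not the approach from the write-up mentioned above. It compares the target against every passage directly, whereas a real LSH index would bucket signature bands so that only likely matches get compared. The shingle size and number of hash functions are arbitrary choices.

```python
import hashlib

def shingles(text: str, k: int = 4) -> set:
    """Overlapping k-character substrings of the whitespace-normalized, lowercased text."""
    text = " ".join(text.lower().split())
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def minhash(text: str, num_hashes: int = 64) -> list:
    """For each seeded hash function, keep the minimum hash value over all shingles."""
    sh = shingles(text)
    return [
        min(int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in sh)
        for seed in range(num_hashes)
    ]

def similarity(a: list, b: list) -> float:
    """Fraction of agreeing slots, an estimate of the Jaccard similarity of the shingle sets."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def reanchor(original_target: str, candidate_passages: list) -> int:
    """Index of the passage most similar to the text that was originally annotated."""
    target_sig = minhash(original_target)
    scores = [similarity(target_sig, minhash(p)) for p in candidate_passages]
    return max(range(len(scores)), key=scores.__getitem__)

annotated = "The quick brown fox jumsp over the lazy dog."
updated_page = [
    "Breaking news: markets were calm today.",
    "The quick brown fox jumps over the lazy dog.",  # typo fixed after publication
    "Weather: sunny with a light breeze.",
]
print(reanchor(annotated, updated_page))  # 1
```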
Zotero appears conspicuously missing. It's an articles / references management tool. I've not found it especially useful myself, though it has its fans.
My own previous explorations, somewhat disorganised:
"Sources and tools for references, particularly document scanning and conversion"
"Organising and planning research activities"
Regarding digital formats generally:
- Most formats are themselves opaque / resistant to annotation, in some way, shape, or form.
- Tools that enable annotations of one type of reference ... often don't support others. As several comments note, this results in various application or service-based silos of notes.
- Index cards remain remarkably useful. I use a modified POIC / Zettelkasten method:
Mostly, I've been looking to a generalised system for organising and working with materials under the working titles of KFC (Krell Functional/Fine Context), docFS (as in "Document Filesystem"), and webFS (as in Web filesystem, see: "What if the Web was filesystem accessible" https://old.reddit.com/r/dredmorbius/comments/6bgowu/what_if...).
The idea of creating not an application, but an OS-level (and ultimately, hopefully, OS-agnostic) system for integrating metadata and relationships amongst documents, is at its heart.
This does seem like a surprisingly common problem, and one with numerous partial solutions, but no generally-available, general-purpose system seems to exist. Given that this notion dates back to (or before!) Vannevar Bush's Memex discussion, this is ... both surprising and disappointing.
I'm about 98.2331% certain that copyright has a major share of the blame, as an effective system for dealing with digitised documents would all but certainly have to involve duplicating and reverse-engineering them in multiple regards. My solution to the HTML / PDF / ePub, etc., document formats is to recompose them as some minimally sufficient document format (often Markdown, occasionally more advanced formats, with LaTeX being nearly always sufficient). This has resulted in a significant detour through questions concerning typography and just what a document is, though in a huge fraction of cases, there's little reason to go beyond paragraphs, the occasional italic/bold emphasis, and section or chapter markings.
In trying to decompose Web content, it's almost always simplest to simply dump the document to plain ASCII, then re-introduce any needed markup. Parsing HTML itself to a normalised form is a fool's errand. (As a fool, I've been on that errand many times.)
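A minimal version of "dump to plain text first" using only Python's standard-library `HTMLParser`: all tags are thrown away, `script`/`style` contents are skipped, and block-level tags only leave behind paragraph breaks, so markup can be re-added by hand afterwards. The tag lists are assumptions and far from exhaustive, and the output is plain Unicode text rather than strict ASCII.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "br", "li", "blockquote", "tr",
              "h1", "h2", "h3", "h4", "h5", "h6"}
SKIP_TAGS = {"script", "style"}
BREAK = "\x00"  # internal marker for a paragraph boundary

class TextDumper(HTMLParser):
    """Discard all markup, keeping only text plus paragraph boundaries."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.chunks.append(BREAK)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.chunks.append(BREAK)

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)

def html_to_text(markup: str) -> str:
    parser = TextDumper()
    parser.feed(markup)
    parser.close()
    # Collapse whitespace inside each paragraph and drop empty ones.
    paragraphs = (" ".join(p.split()) for p in "".join(parser.chunks).split(BREAK))
    return "\n\n".join(p for p in paragraphs if p)

page = """<html><head><style>p{color:red}</style></head>
<body><h1>Title</h1><p>First <em>emphasised</em> paragraph,
wrapped.</p><script>track()</script><p>Second &amp; last.</p></body></html>"""
print(html_to_text(page))
```

Inline emphasis is deliberately lost here; it is the kind of markup that gets re-introduced by hand in the next step.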
For a work of fiction with no significant typographic concerns, it's possible to start from raw ASCII text for a book-length work and re-introduce enough Markdown to create HTML, PDF, and ePub endpoints in an hour or so. I've even resorted to hand-typing works on occasion: excessive in itself, but with the added bonus of being an effective active-reading technique.
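Part of re-introducing Markdown can be automated with a few heuristics before the hand-editing pass. The rules below are hypothetical and tuned to a simple fiction layout: blank lines separate paragraphs, hard-wrapped lines are joined, and a block that consists only of something like `CHAPTER IV` becomes a level-2 heading. Underscore italics, as used in many plain-text e-texts, already happen to be valid Markdown. The result could then be passed to a converter such as Pandoc for the HTML, PDF and ePub outputs.

```python
import re

# A block counts as a heading only if it is exactly "Chapter/Book/Part" plus a number.
HEADING = re.compile(r"^(?:chapter|book|part)\s+(?:[ivxlcdm]+|\d+)\.?$", re.IGNORECASE)

def ascii_to_markdown(raw: str, title: str) -> str:
    """Hypothetical heuristics: blank-line paragraphs, unwrapped lines, chapter headings."""
    blocks = [b for b in re.split(r"\n\s*\n", raw.strip()) if b.strip()]
    out = [f"# {title}"]
    for block in blocks:
        text = " ".join(block.split())  # undo hard wrapping inside the paragraph
        out.append(f"## {text}" if HEADING.match(text) else text)
    return "\n\n".join(out) + "\n"

raw = """CHAPTER I

It was a dark and
stormy night.

She read _slowly_.

CHAPTER II

Morning came."""
markdown = ascii_to_markdown(raw, "A Story")
print(markdown)
```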
That said: none of my methods, nor those listed here, satisfy me.
Though Wallabag deserves a closer look.