Annotation is now a web standard (hypothes.is)
627 points by kawera on Feb 26, 2017 | hide | past | favorite | 157 comments

Is anyone using these? The blog post is put out by a company I've never heard of, and the credits listed on it include nobody who works on a browser, no major websites, and no comment-widget company.

Historically, web standards where a committee gets together and decides how a feature is going to look without the buy-in of users or browser vendors have a very poor track record of adoption. The way actually-successful web features get standardized is that users start clamoring for it, which leads someone to build a hacked-up JS implementation of it, which leads to a company founded around that hacked-up JS implementation, which leads to competition, which leads to browser vendors building it into the browser, which leads to an open standard.

Trying to skip steps doesn't seem to work. If you build the feature without users who want it, nobody will use it. If you build the company without the prototype, you won't get a working implementation. If you build it into the browser when there's a dominant monopoly company, people will continue to use the company rather than the browser's version (this is the story of Google vs. IE+Bing & Facebook vs. RSS & semantic web). If you standardize it before it's been adopted by multiple browsers, people will ignore the standard (this is the story of RDF, the semantic web, and countless other W3C features that have fallen into the dustbin of history).

And if any one of those parties is not at the table when the standard is written, they'll ignore the standard anyway.

> The way actually-successful web features get standardized is that users start clamoring for it, which leads someone to build a hacked-up JS implementation of it, which leads to a company founded around that hacked-up JS implementation, which leads to competition, which leads to browser vendors building it into the browser, which leads to an open standard.

I'm more pessimistic. I'm wondering: Has there actually been a case where the "company founded around that hacked-up JS implementation" for a non-trivial feature did not happen to be one of the browser vendors themselves?

I used to lurk on the WHATWG mailing list, and while the amount of improvement they made to the web cannot be overstated, their decision-making process always seemed opaque, to put it mildly: features brought up by users were often debated back and forth without any clear goal until the spec editor basically said "we'll add it" or "we won't add it" with no clear reasons given. Other features "hit the ground running": they were apparently decided elsewhere and were only presented on the mailing list to discuss implementation details.

Sounds familiar to me. I started the Selenium project (original version was written in JS), started a company around it (Sauce Labs). Browser vendors implemented the Selenium WebDriver protocol, and now it's a W3C standard for browser automation. So, yeah.

(Note: Massive generalization of the project history though, including the part where I and other key project members worked at the browser vendors at various times. Having insiders at the browser vendors does massively help move a standard forward.)

> Is anyone using these?

Annotations? Definitely. At least one member of the Hypothesis team came from Genius (formerly Rap Genius) which is the largest annotation service on the web.

Largest on the Web sure. But have you ever seen them on a non-scummy site? I haven't.

Some sites integrate the Genius JS code directly but they don't have to. As a user, you can freely annotate (nearly) any page on the web with the bookmarklet. I made numerous annotations on the famous Bloomberg "Code" article which you can see on the page at [1] or in summary from the page at [2].

For [1] to work, you need to either run the bookmarklet or Chrome extension. This isn't the norm though; most annotated pages can be viewed directly from the genius link, and you can even link directly to a specific annotation, for example [3].

[1]: https://www.bloomberg.com/graphics/2015-paul-ford-what-is-co...

[2]: https://genius.com/summary/www.bloomberg.com%2Fgraphics%2F20...

[3]: http://genius.it/7163911/www.theverge.com/2015/6/24/8836087/...

Genius is a non-scummy site, turned itself around ages ago.

Yes, either the NYT or WaPo uses them.

I've seen genius annotations of a Trump speech on WaPo.

Why do you classify genius as scummy?

For one, the Genius proxy server violates copyright and does not provide an opt-out.

How does a proxy server violate copyright? Maybe I don't understand what exactly Genius is doing, but how is this different from say blocking ad servers in my hosts file?

Derivative works of course being protected by fair use.

And if Genius is hosting the page that's annotated, that's one thing. If they're simply proxying it on demand and adding annotations, then I don't see how it violates copyright unless every proxy server violates copyright.

Derivative works are what are banned by copyright. They are the things that are not covered by fair use. Transformative works that really change a work aren't covered by copyright because they don't really count as a copy anymore. Fair use is this other thing https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors

The key difference is that proxy servers don't copy content. They reflect it. You can put any URL on the internet behind a proxy server-- that does not mean that the proxy server has already copied the entire internet. Nor does it mean that once a URL has been proxied by a proxy server does the content remain cached or copied-- since in our case at Hypothesis, and we presume at Genius, it does not.

This is a key distinction that separates proxies from caches and archives.

This does more than proxy: the Genius proxy alters not just the underlying code that produces the content, it alters the way the content is intended to be consumed.

I'm on mobile, and all I'm getting are AMP links (ugh), but there are a lot of results when you google "Rap Genius controversy".

They tried dirty SEO tactics that Google ultimately punished them for, and their staff have made off-color remarks in the past. There's a lot more if you search for it.

And they promptly fired Mahbod.


Things at Genius have been pretty controversy-free since 2014.

> Is anyone using these? The blog post is put out by a company I've never heard of

I spent time trying to figure out if hypothes.is even uses the protocol and data model in their own software. I'm not confident enough in my analysis to say they absolutely aren't, but I'm also not convinced they are really using it. They would probably counter that it wasn't even a standard yet, but I'm not even sure how close their current model is to this.

They did have some of their employees in the W3C working group. Several of the others seemed to be in the "library" field.

They definitely implement it; they're one of the implementations used to show implementation experience of the Web Annotations Model per the implementation report linked to from the W3C Recommendation.

As I mentioned in another comment, these tests can be trivially passed. You just paste in some JSON data. To pass "all" the tests you actually only need to implement a small handful of the fields. The rest of the tests are qualified by something like "if present" -- so you pass those by just not having anything present in your test case.

When I use the hypothes.is extension, I see queries in my browser's developer console that do not look anything at all like they are querying annotation collections as defined in the spec.

The data returned from these queries is close to the spec but includes many fields that aren't in it. These extra fields seem to be conveying important information for their application.

Edit: I ran one of their annotations through the W3C tests. Five tests failed, including one for one of the most important fields:

          "name": "1:2 Implements **Annotation _id_ key** which has a **single value** that is a **string of format uri** - [model 3.1](https://www.w3.org/TR/annotation-model/#annotations)",
          "status": "FAIL",
          "message": "assert_true: ERROR: Annotation is missing id key or its value is not a single string of format uri.; expected true got false"
This is why I said I'm not confident they are actually using it.

As far as I'm aware, there's nothing in the spec that prohibits extensions to the set of fields, provided they use the required fields in their required way.

Of course, you run the risk that you end up with de-facto standards and everyone having to implement one implementation's extensions (oh, hi WebKit-prefixed properties!).

> there's nothing in the spec that prohibits extensions to the set of fields

Correct. But since it has all this "JSON-LD" stuff baked in, you need to also carefully define and publish the extra fields.
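
For illustration, here's a minimal sketch (Python, just to build the JSON) of what publishing an extension field could look like: the standard anno.jsonld context is listed first, followed by a custom context that maps the extra field to an IRI so JSON-LD consumers can still interpret it. The `clientVersion` field and the example.com namespace are invented for the example.

```python
import json

# Hypothetical extension field: the standard context plus a custom context
# that maps "clientVersion" to an IRI. The example.com namespace is invented.
annotation = {
    "@context": [
        "http://www.w3.org/ns/anno.jsonld",
        {"ex": "https://example.com/ns/anno-ext#",
         "clientVersion": "ex:clientVersion"},
    ],
    "id": "https://example.com/annotations/1",
    "type": "Annotation",
    "target": "https://example.com/page",
    "clientVersion": "1.2.0",  # the extension field itself
}

serialized = json.dumps(annotation, indent=2)
```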

Only place I've seen them used has been to annotate the notes of a deep learning course at Stanford. http://cs231n.github.io/

I use it because I like to highlight text, whether it's a news article, online textbook, or HN comment. Then the next time I reference the material I already have the content I need.

First, this would require that the browser vendors adopt the standard, right? How long does that usually take? Is it a few months for Chrome/Firefox and a few years for Safari/IE, or how would that usually play out?

I wouldn't be so pessimistic. Safari has a one-year release cycle for major versions. IE isn't getting new features, but Edge is updated much more frequently than IE ever was.

It varies rather extremely; there's no simple answer to this one. Generally, it correlates well with the interests of the browser vendor.

IE adopt a standard? Now I've heard everything. EDIT: To whomever downvoted this - I have one word: CSS. EDIT 2: To whomever downvoted after I edited the first time - I have to say, you're not making a case for your side of the argument.

> The blog post is put out by a company I've never heard of, and the credits listed on it include nobody who works on a browser…

By the way, the Genius team is backed by Marc Andreessen. While Genius and Hypothesis aren't formally related, the two annotation products are very similar just for different audiences. Tim Berners-Lee is even featured in the video at the end of the post.

This is kind of a crazy comment to have to make when the two biggest creators of the web browser support annotations. People who've worked on browsers absolutely have a vested interest in the web annotation standard.


I spent about 40 hours in December and January implementing a browser extension for Chrome and a server that speak the web annotation protocol and use the web annotation data model in this specification.

It was very easy to pass the tests the W3C working group used to verify that they had two working implementations of the data model and protocol. Most of the tests default to passing if the specified tag is not present. Basically, it's not clear whether a serious, real attempt to use this has been made. I'm unconvinced that the specification is robust enough to be useful without ending up with a lot of vendor lock-in.

The toy extension played around with using these annotations to alert publishers, and potentially other users, to typos in their articles and pages. It would be nice to have a side channel to report typos other than just using the comment section or trying to find an email address. Will the "meta web" ever catch on?

I never published it but I still might add a page about my experience on my website. I have posted about the idea there before.

I'm not sure where else to complain so I will just leave this example here. I was frustrated reading the specification because it contradicts itself.

The data model has a required field called `id` which is an IRI (like a URI) that is basically a globally unique identifier.

The protocol allows an annotation to be transmitted without the `id` field attached.

Why? Is the field required or not?

In my toy implementation I had my browser client attach a v4 UUID as the `id` field before sending it to my server. But it would have still been valid without it.
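
For what it's worth, that client-side minting step is tiny. A sketch, assuming a plain dict as the annotation and a `urn:uuid:` IRI as the id (the model only requires that `id` be a single IRI):

```python
import uuid

def ensure_annotation_id(annotation: dict) -> dict:
    """Mint a urn:uuid IRI for the `id` field if the client didn't set one."""
    if "id" not in annotation:
        annotation["id"] = f"urn:uuid:{uuid.uuid4()}"
    return annotation

anno = ensure_annotation_id({
    "type": "Annotation",
    "target": "https://example.com/page",
})
```

Calling it again on the same annotation is a no-op, so it's safe to run just before every POST.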

> I'm not sure where else to complain so I will just leave this example here.

All of the specs, in their "Status of the Document" section, say:

> This document was published by the Web Annotation Working Group as a Recommendation. If you wish to make comments regarding this document, please send them to public-annotation@w3.org (subscribe, archives). All comments are welcome.

I'd guess, therefore, you should complain there (I have no idea if you have; I haven't looked through the archives!). Of course, there are plenty of W3C groups where specs have become pretty much totally abandoned as soon as they've reached REC, so it's totally plausible nothing will happen. :( (This tends to come about because groups are chartered to work on specs and bring them along the REC-track till they reach REC; unless a further version is being worked on, there isn't necessarily any group with maintenance of the spec in scope.)

I tried emailing two of the names listed at the time, for two different purposes. I got a little bit of a response from one, nothing from the other.

I read the "rules" for W3C groups and in order to actually participate you need to be a member of a big organization and all this other stuff. I'm not sure anyone would have listened.

Almost all W3C groups nowadays do almost all their work in public on public mailing lists; actually being a member of the WG is rarely a requirement for participation.

And if it was sent between the spec becoming a Candidate Recommendation and going to Proposed Recommendation, it must (in theory) have been addressed. If not, something's gone wrong process-wise with how the group was operating (and from poking around a bit, it seems likely it did). Le sigh. :\

Thank you for the explanation. This has been my only experience with the W3C system and as an outsider it was very intimidating. I'm certainly willing to accept the idea that I wasn't going about things the right way or understanding what I was reading!

Many of the smaller, newer groups, with fewer people who have a background in the W3C, end up being somewhat dysfunctional, with odd processes, and that almost certainly makes it feel harder to participate than it should be.

From prodding around a bit (notably [0], which sadly is in Member-only space, but plenty of administrivia is there, and in principle no technical work for almost all groups), it seems like every issue reported to the Working Group (regardless of where) should have ended up with a GitHub issue, with [1] being meant to have been all issues while the specs were in CR.

Pointing in the specs to a mailing list to report issues, and then relying on someone to copy them into GitHub, seems doomed to fail: it's far, far too easy for one thing to not get copied. Really the "Status of the Document" should've pointed to GitHub for new issues being filed (possibly with a fallback to the mailing list for those unable to use GitHub for organisational or other reasons).

[0]: https://lists.w3.org/Archives/Member/chairs/2016OctDec/0143....

[1]: https://github.com/w3c/web-annotation/milestone/3?closed=1

I've had this exact same urge for years now and have had a couple of false starts at tackling the issue. Glad to see 1) I wasn't the only one wanting to make those simple copy edits and 2) someone followed it through and did something about it.

I know Dan Whaley, the author of the post, personally. This is not about promoting a company. This is about allowing people with knowledge to combat fake news. He has been working to make annotation a web standard for years. The fake news that he, in particular, is worried about is climate change denial. The pages of the WSJ and much of the Web are riddled with BS. This annotation enablement will allow, for example, climate scientists to set up channels that annotate the falsehoods and point to credible sources and facts.

Thank you for saying so.

The climate change effort using Hypothesis is located at http://climatefeedback.org/

Also, it may not be readily apparent, but Hypothesis is a non-profit, particularly because we think that if this technology is ever widely deployed and integrated, it should be done in a way that aligns with the interests of citizens over the long term.

How will these annotations be seen by anyone not already on board with the message? Maybe that's all that it's for? If not, how do you keep it from being overloaded by trolls?

It would be interesting to have some kind of wikipedia system together with the annotation stuff. Community edited fact checking. "Wikifacts"

This is the first I've heard of this annotation initiative, so maybe I'm misunderstanding... Annotations are tied to a particular location within the content but maintained independently of the content and publisher?

What happens if the content changes? Random example: Someone highlights a picture of salad and notes "my favorite food!" and then the publisher changes the image to show roadkill instead of salad.

There is no built-in detection of content changes in the standard, but it is designed to be potentially robust to changes.

It depends what type of specifier you use. The data model provides a number of specifier types. A "text position selector" would lock in the annotation at a certain point in the text like the 142nd character. An "xpath selector" would use a DOM-like notation to place the annotation.

If your annotation is a "highlight" then you would need to use these selectors within a "range selector" with a specified start and end point.

If you want your annotations to be robust to content changes, you will probably need to use multiple specifier types. This is allowed by the specification but it felt very clumsy to me to implement.

If the target is available at multiple URLs, there is something available to handle it, but the hosts of the content need to add links to the canonical URL so your annotation software can use that.
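
To make the selector discussion concrete, here's a sketch of an annotation whose target carries two selector types at once, as the data model allows: a TextPositionSelector pinned to character offsets, plus a TextQuoteSelector that lets a consumer re-anchor by matching the quoted text if the offsets drift. The URLs and text values are invented.

```python
# One target, two selectors: offsets for precision, a quote for resilience
# against content changes. The selector type names come from the data model;
# the URLs and strings are made up.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://example.com/annotations/42",
    "type": "Annotation",
    "bodyValue": "my favorite food!",
    "target": {
        "source": "https://example.com/article",
        "selector": [
            {"type": "TextPositionSelector", "start": 142, "end": 159},
            {"type": "TextQuoteSelector",
             "exact": "a picture of salad",
             "prefix": "highlights ",
             "suffix": " and notes"},
        ],
    },
}
```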

Our implementation at Hypothesis uses a robust anchoring approach we call Fuzzy Anchoring. https://hypothes.is/blog/fuzzy-anchoring/

Federated comments/annotations sound really cool from a developer's point of view, but also seem like a nightmare for publishers. If you cede administrative control over the comments on your site, how do you control trolls/attacks/spam/etc.?

Services like hypothes.is can do some filtering automatically, but this is missing the level above that - editorial privileges on comments on your own domain.

In terms of controlling for trolls, spam, etc., annotation service providers will need to use the same combination of automatic filtering and editorial or community moderation that is relied on widely elsewhere. The success of any individual service will presumably at least partly depend on how good a job they do.

Ultimately annotation is something that readers choose to enable by adding to their browser, and by choosing the service they prefer-- just as they may elect to have a discussion about the page on reddit, twitter, facebook, etc.

With regard to whether sites should be able to have editorial privileges on their own domain, or to what degree should page owners have consent over annotation, we've blogged about that here: https://hypothes.is/blog/involving-page-owners-in-annotation...

There's a wide diversity of opinion. Some suggested a flag where site owners could register their preference not to be annotated. Overriding that flag would invite additional scrutiny from moderators. Others have suggested that such a flag not be able to be overridden.

We also facilitated a panel discussion on this topic at our last I Annotate conference in 2016. https://www.youtube.com/watch?v=i2yFnu_pCGI&list=PLmuJEyeapl...

Perhaps the filtering would also be done by a third party. This isn't entirely foreign now: if you write an article online, you can't control the comments on reddit, HN or 4chan. However, users have some control over which comments they see, in that they can select which sources of comments they want, and each source has different moderation policies. The roles of publishing content and moderating comments are totally de-coupled from each other.

This is a moderator's worst nightmare. I don't think anybody on this team has ever had to deal with spam or brigading before.

It just decouples commenting communities from publishers of the content being commented on; it doesn't change (increase or decrease) the work required or tools available for moderation. In effect, it just provides a standard structure for consumer-driven discovery of the off-site discussions that already happen (e.g., via sharing and discussing the source on social media).

It increases the discoverability significantly. The negative information or spam that's not on your site is suddenly...on your site.

It's like making it mandatory for restaurants to have a live updating Yelp! review display scrolling at the front door.

(assuming I understand the proposal correctly...I didn't see much control there for website owners)

Certainly if browsers integrate annotation natively, it would increase the discoverability of conversation considerably which is of course the point.

I personally think if that is done it should require the user to purposely choose an annotation service as well. The actual act of inviting the conversation to pages should involve a user choice.

See further thoughts on the question of page owner consent-to-be-annotated in another of my replies above.

> It's like making it mandatory for restaurants to have a live updating Yelp! review display scrolling at the front door.

It's more like AR technology existing and allowing consumers to have any review source they choose accessible, popping up live reviews as they walk by businesses (actually, even without the AR part, mobile virtual assistants like Google Now provide that today based on geolocation, so it's really something that already exists in physical space extending into cyberspace). But yes: you control your content, not what other people say about your content, and not how other people find and share what other people have said about your content.

I'm assuming browsers will implement this natively, with a default provider, much like they do search engines.

I'm not saying it's necessarily bad.

It is, though, more discoverable than anything preceding it in this space, assuming browsers leverage it. It would likely create a fair amount of churn.

It's more about preventing offensive remarks than honest critical reviews.

I guess you'll have to choose your annotation sources wisely... If you see too much spam on one source, just unsubscribe. As said earlier, it's not so different from reddit/HN discussions about content. The publisher may not even be aware of the discussion.

Again, it's about discoverability. If browsers support this natively, visiting a website would put up a virtual "link" of some sort to see the discussions for every url, as you visit them. There's a difference between discussion exists somewhere, and discussion is tied to your website/url.

I think it's more like customers always being able to display Yelp reviews for the restaurant they're currently standing in front of on their phones or AR glasses or whatever.

In a world where AR glasses are ubiquitous, and each pair does this by default, sure.

A lot of startups/websites exist solely, or at least to a large extent, because there is no standard annotation capability on the web: StumbleUpon, Reddit, Hacker News, Medium highlights, comment sections on any page (even NY Times articles!). Sites not having direct control over which comments stay and surface to the top, and which ones go, is going to be huge.

In my opinion this will open up the web immensely and make the web much more democratic, will be interesting to see how major players react.

I would argue that the value provided by Hacker News and Reddit (and possibly StumbleUpon? I'm not familiar with it) is content discovery AND comments. A huge part of the value of both sites is in the comments, but being able to annotate web pages alone doesn't cover the discoverability provided by the sites.

I think it's mainly because search engines so far have not had access to annotations; therefore discovery has had to rely primarily on the consensus from other properties (to a large extent property owners) about what's important, as opposed to what other annotators think is important. So a new and relatively simple search engine could solve that problem.

Or maybe 4chan will have fun with it for a while, and then everyone else will decide it's a cesspit and turn it off? I don't see anything in the proposal about how comments get moderated.

In the meantime, we can keep using Hacker News.

Maybe moderation should be outsourced. E.g., AdBlock lists are crowdsourced, but there are different lists you can subscribe to. Wikipedia, Reddit, HN, FB, YouTube, and Twitter each have their own moderation policy; one could simply opt in to a policy of their choosing, not necessarily maintained by a single party or organization, though it could be if desired.

Should the author have the privilege of moderating their comments? They could shave off all the comments that threaten the author's argument. Suddenly it's become a one-sided debate.

Debates don't have to happen on the original website. We have debates about all sorts of web pages on Hacker News (and Reddit, and Twitter, and so on), and it works out fine. Everyone knows how this works.

Authors can moderate what appears on the same web page (their own site's comments, if any) but not what appears in other places (Twitter, Reddit, Hacker News, and so on).

Putting comments from a different community on the same page as the original article is a mashup of sorts; you're mixing content from different places. This can create confusion.

It seems like if such mashups exist, they should have their own URL? Suppose that when following a link from Hacker News, you get a web page with Hacker News annotations.

What URL should that be? Maybe it should go on ycombinator.com, sort of like when you embed a YouTube video.

It's a complex subject, isn't it? I can see that possibly being an issue, and some might abuse it, just as others may use annotations in an abusive fashion as well. I can sympathize with sites wanting some level of moderation, at least to remove abuse. How one does that is a challenge.

As said in other answers, there will be multiple annotation sources, so each one will have its own moderation to do, and users will have the choice to subscribe to any or all of them.

Therefore the moderation will not have anything to do with the content owner. It's comparable to reddit or HN: would you complain that content owners have no control over discussion about their content on those sites?

Congrats to all the hypothes.is folks and everyone who worked on this!

I have been working with the hypothes.is folks for almost 2 years and have been using hypothes.is for manual tagging and automated annotation so I'm a bit biased. I have seen criticism that the standardization process was premature but given how hard it is to get browser vendors to implement things I think this could make a difference. That said, the way Microsoft did their annotation in Edge was just to take pictures of sites.

One of my hopes is that things like annotation can pull us back from the brink of the javascript apocalypse since it is very hard to annotate arbitrary states of a running program.

I first met someone working on hypothes.is at a party about, I don't know, eight years ago? Ten years ago? They pitched me more or less this exact idea. It seemed interesting at the time.

Is this really how long it takes to realize something like this? Sort of boggles my mind.

Given how many years Ted Nelson spent on (the admittedly more ambitious) Xanadu concept, it doesn't overly surprise me.

I've been wanting to provide a high-fidelity many-to-many commenting system inside of a text editor or browser since I was in college. My thought was that if you could annotate something as complex as Shakespeare:


then you could annotate legal documents, code, and other high-density texts as well.

I've long felt that existing solutions fall down in a few ways:

1. UX -- this is a HARD UX problem because you are potentially managing a lot of information on screen at once. Anybody staring at a blizzard of comments in Word or Acrobat knows how bad this can get.

2. One-to-one -- Most existing exegesis solutions like genius.com only let you mark off one portion of text for threaded commentary, which is not ideal because complex text like the above example can have multiple patterns working in it at the same time:

http://imgur.com/x6BKKQW (a crude attempt to map assonance and consonance)

Really, what a robust commentary system needs is to map many comments to many units of text, so that the same portion of text can be annotated multiply (as this solution attempts) but also so that the same comment can be used to describe multiple portions of text as well.

3. Relationships between comments -- It's great that this solution gives threaded comments as a first-class feature, but you also want to be able to group comments together in arbitrary ways and be able to show and hide them. In my examples above, there are two systems at work: the ideational similarities between words, and the patterns of assonance / consonance. You could also add additional systems on top of this: glossing what words or phrases mean (and in Shakespeare, these are often multiple), or providing meta-commentary on textual content relative to other content, or even social commentary on the commentaries. You need a way to manage hierarchies or groups of content to do this effectively. No existing solution that I am aware of attempts this.

I literally just hired somebody yesterday to start work on a text editor that attempts to resolve some of this, but it's an exceedingly hard problem to solve with technology.

> Really, what a robust commentary system needs is to map many comments to many units of text

This is actually built into this specification. From the Web Annotation Data Model [0]:

  - Annotations have 0 or more Bodies.
  - Annotations have 1 or more Targets.
So one "Annotation" object can have multiple bodies (descriptions) attached to multiple targets.

> 3. Relationships between comments

This sounds more like an implementation detail of a client than part of the protocol or data model put forth by the W3C group.

However, I believe this can kind of be done server-side with the Web Annotation Protocol [1]'s idea of Annotation Containers. Your server can map a single annotation to multiple containers. So perhaps you have an endpoint like `http://example.com/a/` and you want to arrange a hierarchy of comments. You could provide a filtered set of the annotations at `http://example.com/a/romeo/consonance/`, and similar endpoints.
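
A rough server-side sketch of that idea. The in-memory store and the tag-based filtering are my own assumptions (the protocol doesn't define how a server decides what goes in a container); the container shape loosely follows the protocol's examples.

```python
# Invented in-memory store; "tags" is an application-side field, not spec.
STORE = [
    {"id": "urn:uuid:1", "tags": ["romeo", "consonance"]},
    {"id": "urn:uuid:2", "tags": ["romeo"]},
    {"id": "urn:uuid:3", "tags": ["gloss"]},
]

def container(tags):
    """Expose a filtered view of the store as one Annotation Container."""
    items = [a for a in STORE if all(t in a["tags"] for t in tags)]
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": ["BasicContainer", "AnnotationContainer"],
        "total": len(items),
        "items": items,
    }

# e.g. this result could be served at http://example.com/a/romeo/consonance/
romeo_consonance = container(["romeo", "consonance"])
```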

So basically what I'm saying is it seems like the protocol here isn't going to get in your way, it's just incentive to use this particular model for storing and transferring your data.

[0]: https://www.w3.org/TR/annotation-model/#web-annotation-princ...

[1]: https://www.w3.org/TR/annotation-protocol/#container-retriev...

If the protocol supports these features, then that's great and I'd love to see it adopted.

This is pretty interesting. So, personal story:

* About a year and a half ago, I thought about getting into this field. I built http://lederboard.com as a result - it works pretty well, actually (plenty of bugs behind the curtain) but the idea was to try and open it up as a standard.

* If I do pick lederboard up again, I will likely convert it to use this open standard.

* My goal was always to have the 'features' of lederboard not be in the annotations themselves, but in the moderator controls, the ability to follow sites and specific users, etc., and to basically act like reddit-enhancement-suite for an internet-wide commenting system.

* However, I realized this was a truly tremendous mountain to climb. Like, crazily huge. So I wound up going in a different direction.

In any event, I think that the guys at Genius should take note of this and consider it very seriously. They raised a whole lot of money and, as far as I can tell, this is a direct shot across their bow and it has the backing of W3C, which is huge. I am pretty happy I didn't wind up in the middle of that fight. Though maybe I might get back into it at some point.

In the meantime, I am focusing on easy-to-use encryption: http://gibber.it . I think that is probably a little more important right now. For background, I am a practicing attorney with a pretty substantial practice in software, startups, corporate finance and information law.

Comments you can't turn off and can't moderate and from which you can't ban misbehaving users seem to me like they will turn immediately into a cesspool of hate, bullying and stupidity.

You'd think we'd have learned our lesson by now. Free speech, by awful people, is overrated and can result in disasters.

It's a double-edged sword. I'm actually working on building an anonymous posting board for Facebook where users can switch between their news feed and the anonymous board. Creating an anonymous environment that isn't hateful and troll-filled is a mighty challenge, but I don't believe it's impossible, and that's part of why I'm attempting to take this on. There are benefits and detriments to every annotation/commenting system out there. Moderation is a high priority, but I haven't been able to come up with a solution. This article also failed to mention how they imagine moderation fitting in. Is it possible to crowdsource moderation, rather than centralize it? And can it be done with anonymity?

Crowdsourced moderation, like downvote buttons on certain websites, hasn't worked particularly well. Wikipedia probably has the most appropriate moderation model I've come across. I'd mostly be concerned that annotation could become a stalker tool unless services are vetted for moderation.

Crowdsourced moderation always results in groupthink, without exception. If you don't forcibly specify and ruthlessly enforce the groupthink as something respectful, considerate, reasoned and elevated, then you will get Nazis, and you will turn into Reddit. This is an iron law of the internet.

But that's generally not how annotator.js is ever implemented. It has user authentication backends (several last I checked). And it is usually used within small teams in apps like http://www.annotationstudio.org/

I wonder how Genius feels about this news. But they were the ones rooting for this future, so I think they'll probably be happy.

Genius has done a lot of work on annotation pinning. I remember hearing Tom Lehman discuss it in their dev meetup series when they expanded to annotating the whole web. Still, I'm not sure that it actually took off on the greater web as lately it seems they're doubling down on rap content and industry connections.

I haven't dug into how the standard addresses all of its edge cases yet, but I hope that it handles pinning well, i.e., cases where the underlying text is edited or deleted and it's unclear whether the annotation should persist.

I really enjoyed the WaPo article that was a Trump speech with genius annotations. What that means for genius, no idea.

I like the idea of an inherent ability for annotations to exist, but I think the glue will still be annotation (read: comment) vendors. My head hurts trying to conceive of existing commenting platforms facilitating this - especially since they exist in large part due to the ease of integration their walled central storage provides. That said, the door for disruption is much more open than before.

Imagine if you could share annotations on a page between a certain facebook group you have, so instead of seeing a public comment annotation section you see only annotations from your group. Doesn't have to be facebook either, I'm guessing a browser plugin will come along that allows groups of people to share annotations, maybe you could even pick and choose whose annotations you want to see.

At least that's how I've imagined what they are trying to get at, haven't honestly looked too hard into the concept.

The online book Real World Haskell[1] has used a more integrated form of annotation via comments directly under each passage. It's pretty fun to observe how the comments play into the content...sometimes they're wonderfully constructive and other times they derail more useful discussion. It'll be exciting to see how these social norms evolve with the technology.

[1] sample chapter, http://book.realworldhaskell.org/read/programming-with-monad...

So who manages Annotation Central? Disqus? Google? Facebook? The State Administration of Press, Publication, Radio, Film and Television of the People's Republic of China?

Part of the goal of making it a standard, and one in particular where annotation clients could listen to multiple annotation service providers, is precisely to avoid "Annotation Central". Our client does not currently operate this way, but our intention is to move that direction this year. Many underlying architectural changes are being made now to support that change.

Okay, to make this proposal work in practice initially, we do need some annotation service provider(s).

Who is both (a) eager to be one and do the integration before there is a market/large community and (b) is trustworthy enough to be included by default in the largest likely annotation clients such as the major web browsers?

To state the obvious-- Hypothesis is eager, and is running one. If we can be so bold, we think we are also trustworthy. We knew that this could not work unless we also created a service.

We welcome all others.

What do you need a central entity for?

Both the sending and receiving parties must subscribe to the same annotation service, or the annotation won't be visible. This assumes annotation is not being controlled by the site hosting the web page.

Third-party annotation has been tried, and web site operators hated it. Look up "Third Voice".

I'd say not having a central annotation service that everybody uses, but being able to pick and choose (RSS-feed like?) is quite an important part of it, if you don't want total overload.

There's a network effect. You have to be on the same annotation service as the people whose annotations you want to see. This leads to centralization.

I think it's more akin to clustering. There might be an HN annotation service that lawyers would rarely use and vice versa.

I don't really feel so - as long as there's overlap in reputation, the network effects would favour the "HN annotation service" and the "lawyer annotation service" being run by the same provider; different people would simply annotate different websites, since they each visit a different subset of them.

Kind of like the Stackexchange network of sites - many different communities, but with significant overlap as many users are active in multiple domains.

I think this is great news. As someone who blogs irregularly, I don't want to spend a ton of time integrating each discussion area into my site. This seems like it could lead to a very elegant way to automatically get that integration. It would be great to also get notified as discussion is happening in the various sites, without reading the spec, it's not clear to me if that's part of the standard.

> It would be great to also get notified as discussion is happening in the various sites, without reading the spec, it's not clear to me if that's part of the standard.

Not really.

"The Publisher receives a notification for any annotations made on their site which are explicitly published publicly. There are two proposed mechanisms for this:

1) a client-side event which informs the page of the annotation's publication URLs;

2) a server-side notification sent to the publisher's “well-known” contact address.

Either or both of these mechanisms may be used, each with its own use cases." https://www.w3.org/annotation/diagrams/annotation-architectu...

Read the W3C Recommendations for the protocol and data model. Neither has any mention whatsoever of a notification or alert:



Any type of notification would be an extra feature performed by annotation software, but is not a part of the specification.

Your annotation client could query the annotation server at a known endpoint for new annotations. The results should be ordered and paginated, but the order and pagination used seem to be at the discretion of individual implementations.
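A polling client along those lines might walk the container's pages like this. The `items`/`next` paging fields follow the paged-container shape in the Protocol; the URLs and the in-memory fetch stub are made up:

```python
# Sketch of a client collecting annotations by paging through a container.
# Real clients would fetch each page over HTTP; here fetching is stubbed
# out with an in-memory dict keyed by URL.
pages = {
    "http://example.com/a/?page=0": {
        "items": [{"id": "anno1"}, {"id": "anno2"}],
        "next": "http://example.com/a/?page=1",
    },
    "http://example.com/a/?page=1": {
        "items": [{"id": "anno3"}],
    },
}

def walk(url, fetch):
    """Follow 'next' links, yielding every annotation in page order."""
    while url:
        page = fetch(url)
        yield from page.get("items", [])
        url = page.get("next")

ids = [a["id"] for a in walk("http://example.com/a/?page=0", pages.get)]
print(ids)  # → ['anno1', 'anno2', 'anno3']
```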

Hm... that would be really sad.

For notification of comments on other pages you might be interested in the Webmention spec (which is a lot simpler and has some active use already), which is a modernized take on pingbacks by the community around indieweb.org: https://www.w3.org/TR/webmention (There even are tools already to get notifications about twitter mentions etc. via it)

The Webmention standard can be integrated with static HTML through webmention.io and brid.gy; it's great for no-nonsense comment integration.
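For the curious, sending a Webmention boils down to a form-encoded POST of `source` and `target` to the target page's advertised endpoint. A rough sketch of building that request; the URLs are placeholders, and real clients first discover the endpoint via a `Link` header or a `<link rel="webmention">` tag:

```python
import urllib.parse

def build_webmention(endpoint, source, target):
    """Return the (endpoint, form-encoded body) for a Webmention POST."""
    body = urllib.parse.urlencode({"source": source, "target": target})
    return endpoint, body

# All three URLs are hypothetical.
endpoint, body = build_webmention(
    "https://example.com/webmention",
    "https://myblog.example/reply-1",   # my page that mentions theirs
    "https://example.com/post")         # the page being mentioned
print(body)
```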

Cool. I'll check that out.

Annotation itself is great, but there are other (unsolved?) problems. I just recently came across this very implementation thanks to an HN comment. After trying it out, it suffers from not being tied to revisions of pages: install the plugin and go to the home page of any major news outlet and there are comments from years ago. That works fine for a news article, but not on a page that changes every day, if not hourly. Also, to get rid of comment widgets on pages you need to be able to subscribe, which I don't see any way of doing.

Does it do anything to prevent spam?

Standard annotation providers will have the same spam-prevention challenges as existing comment/annotation providers; the standard is just a way to link annotations to the annotated material when the former is maintained by a party separate from the latter.

That would be up to the endpoints hosting annotations. They would need to build anti-spam or reputation into their server system. Not part of the specification.

If this is going to be used they will have to figure out how to deal with spam, vandalism, and harassment.

I'm especially worried about harassment, honestly. The fact that your domain can be marked up in a browser with no moderation capabilities is just going to lead to worse Gamergate-style brigading and harassment.


We agree that this is an issue that deserves attention. We've discussed it here.


This sounds like a great way to reduce spam and trolling. It would give you the choice to see discussion by friends_and_family group, your professional_colleagues group, your casual_social_friends group or whatever instead of by the random_youtube_comment_trolls group. A possible downside would be that the filter bubble and confirmation bias would be web-wide if a user only selects groups that they agree with (as many would be likely to do).

It'd be good for there to be a way for each site to suggest recommended annotation services.

Reputation management companies are going to love this.

Currently, negative information (even if true) isn't as easily discoverable. This ties it all to every one of your pages. With, as far as I can tell, no direct control over moderation.

I suspect many website owners are more concerned about legit complaints that aren't easily discovered than they are about spam.

And, once reputation mgmt creeps in, that good (but negative) information will be buried with astroturfed annotations.

Why care?

All your points could also be made about Twitter.

Avoiding astroturfing is a serious problem, but one that extends way beyond imaginations for this web standard.

If this plays out as outlined, browsers would natively tie the comments to your site. That's the difference.

Mentioned above, like making restaurants display live scrolling Yelp reviews at the front door.

If not, then, perhaps more like Twitter.

Only if you opt-in to a widely astroturfed (likely general-interest) public annotations provider. I know I wouldn't. I don't think there's a browser vendor who'd go for this experience as default.

Perhaps I'm not understanding. Website owners will attempt to astroturf whatever annotation system is associated with negative comments on their site.

If I leave an annotation with this standard must that annotation be public? Are there options for private annotations?

How much private data about my browser and my host am I leaving when an annotation is created?

Is there a practical way to delete these both from the page and the public record, or would they be stored in perpetuity?

> If I leave an annotation with this standard must that annotation be public? Are there options for private annotations?

The protocol specifies that annotations are shared over HTTP; you can put them behind whatever kind of locks you want.

> How much private data about my browser and my host am I leaving when an annotation is created?

The specification doesn't require storing anything like this. But since everything happens over HTTP, the client or server you use may include things like that. Since it's a standard, you should be able to use whatever programs you wish.

There are fields specified to include things like author name, creation date, etc. It will depend on your client how they are used. They aren't required.

It looks like the model has interoperability for both shared and private, but there is only one reference I can find to private annotations:

Source: https://www.w3.org/TR/annotation-model/

Hypothesis supports private "only me" annotations, as well as the ability to create ad-hoc private groups. Only about 24% of annotations made through our service are public.

This is such a great project. The potential here is just immense. I wish dwhly and everyone at hypothes.is the best of luck.

P.S. This url - https://hypothes.is/register - accessible from most pages by clicking "sign up" in the top-right corner, presents an error and doesn't redirect anywhere. https://hypothes.is/signup works fine, however.

The Hypothesis team provides only dev install instructions for their product; there is only an old Docker recipe, the offline install seemed to go through their website for authentication, and when I asked on their IRC for proper installation instructions, they said it's on their TODO (last year).

I think that proper installation instructions, perhaps with Docker Compose, are more important than blog posts about annotation importance.

This is interesting. I hope it doesn't slow things down too much or become another spam vector.

Aside: That interactive SVG slide-show is pretty impressive in itself.

> That interactive SVG slide-show is pretty impressive in itself.

That is @shepazu's work.

I'm curious to see how the legal front proceeds when this becomes more popular.

Somebody will post a slanderous comment on a company's website, the company will be very unhappy, and sue the comment provider for blending the comment into the company's website.

Is that free speech? Or is the comment not protected, because it's shown on the company's website, and thus should be under the company's control?

To be honest, this sounds horrid:


An easy prediction: with wide usage of this, any page that generates a non-trivial amount of traffic will be in such a state as to make reading the annotations pointless at best.

I disagree. Dealing with large volumes of annotations is a UX challenge, but a very solvable one. Certainly current implementations don't handle pages w/ 10s of thousands or millions of annotations (think: the bible), but neither do traditional comment widgets.

Happy to have a more thoughtful discussion if you're interested.

Man, normally I hate it when people on HN talk about an article's layout or font kerning rather than its content.

However, this thing is just completely illegible without reading glasses and 150% zoom... and it's still uncomfortable even then.

I would be surprised if this company has anyone age 40 or up who actually looks at their own website on a regular basis.

I've been wanting this for ages. Didn't have time to get down to implement it myself. Hopefully somebody can finally implement it well. Guess one difficulty is commercial model. But just as Pocket and Instapaper were acquired for their data, hopefully this company (or anybody out there) can do a similar thing.

From Vannevar Bush's celebrated 1945 article, "As We May Think"[1], imagining the "memex" that is recognized as the conceptual forebear of hypertext and the web:

First, the core concept of associative indexing:

Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass... The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain.

Introducing the memex:

Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.

Associating one item with another is the essence of the memex:

This is the essential feature of the memex. The process of tying two items together is the important thing.

When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined, projected onto adjacent viewing positions. At the bottom of each there are a number of blank code spaces, and a pointer is set to indicate one of these on each item.

Adding one's own annotations and links, and then sharing them to colleagues, is the vision:

First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items. Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item. When it becomes evident that the elastic properties of available materials had a great deal to do with the bow, he branches off on a side trail which takes him through textbooks on elasticity and tables of physical constants. He inserts a page of longhand analysis of his own. Thus he builds a trail of his interest through the maze of materials available to him.

And his trails do not fade. Several years later, his talk with a friend turns to the queer ways in which a people resist innovations, even of vital interest. He has an example, in the fact that the outraged Europeans still failed to adopt the Turkish bow. In fact he has a trail on it. A touch brings up the code book. Tapping a few keys projects the head of the trail. A lever runs through it at will, stopping at interesting items, going off on side excursions. It is an interesting trail, pertinent to the discussion. So he sets a reproducer in action, photographs the whole trail out, and passes it to his friend for insertion in his own memex, there to be linked into the more general trail.

Arguably we still do not have a satisfactory realization of the memex. The Web is not quite it; nor the personal Wiki, nor the personal mind-mapper, though each comes close. Perhaps the web with annotations will realize the dream? Though note that Tim Berners-Lee recognized in 1995 that even with a Memex, we might fail to organize our larger technical and social structures: "We have access to information: but have we been solving problems?"

[1] https://www.theatlantic.com/magazine/archive/1945/07/as-we-m...

[2] https://www.w3.org/Talks/9510_Bush/Talk.html

"Arguably we still do not have a satisfactory realization of the memex."


A related issue: Ted Nelson's original idea for hyperlinks had them working both ways. When one document linked to a second document, the second document would automatically get a link back to the first. His idea also had what he called "transclusion" -- sort of like block-quoting someone else's text, but with the feature that when the quoted text was updated, any document which had transcluded it would also automatically get updated.

Of course, there are some practical issues there, not the least of which is (as several others have mentioned) vulnerability to spam.

I'm using a self-made web page annotation extension on Chrome every day. It lets you mark the important information on any webpage, which is very useful for docs that you will come back to frequently.

this will fundamentally change the internet

This could be great news for platform interoperability and the open web.

fundamentally change how?

My guess is that it's going to improve the transparency and correctness of ideas. Every website will be like some sort of Reddit, but more transparent, because the publisher will not be able to censor you. Also, cross-site identity: people will start having one public profile for every website. I can be Demi on Hacker News, Reddit, YouTube, etc.

I like your thought, except for having a single identity. While it's convenient, it could also be detrimental to the user. China is rolling out strict real-name registration policies, for instance, where you could become accountable for everything you say online.

That would be a really wonderful thing, although I'm not as much a fan of one universal profile.

It can be a good thing. Imagine Donald Trump saying that Obama wasn't born in USA. Obama or anyone could create an annotation with a witty response on that particular sentence or video time frame and a link to the PDF with his birth certificate. Trump could also defend himself inside the "fake news" by commenting directly and exposing the malicious attempt of manipulation.

You just described twitter.

(if it's used)

This sort of reminds me of Google Sidewiki: https://en.wikipedia.org/wiki/Google_Sidewiki

This is fantastic! hypothes.is is such an inspiring project, thank you!

Anyone know of a way to annotate online the way Microsoft Word does? Where it highlights the content and points an arrow to its annotation kept on the right side of the page?

If this is going to be as fundamental and integral a layer of the web as claimed, "annotation" seems like a peripheral term that undersells whatever this ends up becoming.

I profess I don't know much about the company, but this effort is a continuation or an application of the W3C Linked Data Platform [1] initiative that are attempts to put Tim Berners-Lee's ideas [2] about the Semantic Web into practice, with renewed vigor and buy-in from many interested parties, and not speccing for its own sake.

Adoption is always the question that matters most to the public; arguably TBL's mid-2000s vision for the web as a Giant Global Graph [3] has been neatly cloned and co-opted by Facebook's concrete, incompatible, and inward-flowing Hotel California implementation [4], but if a new wave of startups and bigcorps can create a rich ecosystem using community-designed standards, the outcome may be different this time. Or maybe not, but I applaud and support them in trying, and I will evangelize the same.

What's different from the mid-2000s, you ask? For one, the ideas behind REST, despite often being imperfectly or incompletely applied, have nonetheless entered community consciousness. Hard-fail-if-invalid attitudes have been replaced by a tolerance for imperfections, both in the community's rejection of XML-derived data formats, and an acceptance of the web's often haphazard, something-is-better-than-nothing nature. APIs implemented using HTTP over the Web are a mainstay instead of experimental integrations, and a new wave of commercial players is eager to exploit whatever competitive advantage it can gain against the incumbents.

The big content gardens have all pushed incompatible "protocols" (we call them APIs, but they behave like protocols) [5], which gives them network effects but also locks them (deliberately) out of the open web (i.e. a Facebook comment on a Facebook post that was spawned by sharing a web link is not a comment on the link; it's a comment on that Facebook post). Meanwhile, systems that can build on top of these standards to implement two-way data flow -- both inward and out -- can present richer experiences, while not precluding the usual business models and monetization schemes that are in use today. And even if commercially this all flops, we'll have nice specs and vocabularies to use where metadata is paramount: science, research, government, and the like.

[1] https://www.w3.org/TR/ldp/

[2] https://www.w3.org/DesignIssues/LinkedData.html

[3] http://dig.csail.mit.edu/breadcrumbs/node/215

[4] https://developers.facebook.com/docs/graph-api/overview/

[5] https://news.ycombinator.com/item?id=12893852

Where will the annotation data be stored?

One of the specs is "Web Annotation Protocol—describes the transport mechanisms for creating and managing annotations in a method that is consistent with the Web Architecture and REST best practices."


So I guess they imagine you get an account with some service somewhere and then store your annotations there?
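In that model, creating an annotation would roughly be a POST of JSON-LD to your chosen service's container URL. A hedged sketch of building that request; the container URL is hypothetical, while the media type and body shape come from the Protocol and Data Model:

```python
import json
import urllib.request

# Sketch of creating an annotation per the Web Annotation Protocol:
# POST the JSON-LD body to a service's Annotation Container.
# The container URL is a placeholder.
anno = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "My note"},
    "target": "http://example.org/page",
}

req = urllib.request.Request(
    "http://example.com/annotations/",
    data=json.dumps(anno).encode(),
    headers={"Content-Type":
             'application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"'},
    method="POST")
# urllib.request.urlopen(req)  # a conforming server would reply
#                              # 201 Created with a Location header
```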

Well, I think that is the entire problem.

An open protocol gives at least a shot for compatible self-hosted/locally running solutions, assuming client makers actually implement it.

Yes, ok, you're right.

I think, however, that doing that for all the things in the world each person would like to store is too complicated to be true.

I only say that because I was an enthusiast of things like the remoteStorage protocol -- not what it has become actually, but the idea behind it. I would prefer something like remoteStorage to be standardized instead of a different protocol for each thing.

Perhaps Urbit is in the same space as remoteStorage, only much more complicated.

The ghost of Third Voice awakes...

A double-edged sword: it could be good for trolls and also good to fight trolls (if moderators use it).

The annotation system of the web is reddit.
