Historically, web standards where a committee gets together and decides how a feature is going to look without the buy-in of users or browser vendors have a very poor track record of adoption. The way actually-successful web features get standardized is that users start clamoring for it, which leads someone to build a hacked-up JS implementation of it, which leads to a company founded around that hacked-up JS implementation, which leads to competition, which leads to browser vendors building it into the browser, which leads to an open standard.
Trying to skip steps doesn't seem to work. If you build the feature without users who want it, nobody will use it. If you build the company without the prototype, you won't get a working implementation. If you build it into the browser when there's a dominant monopoly company, people will continue to use the company rather than the browser's version (this is the story of Google vs. IE+Bing & Facebook vs. RSS & semantic web). If you standardize it before it's been adopted by multiple browsers, people will ignore the standard (this is the story of RDF, the semantic web, and countless other W3C features that have fallen into the dustbin of history).
And if any one of those parties is not at the table when the standard is written, they'll ignore the standard anyway.
I'm more pessimistic. I'm wondering: Has there actually been a case where the "company founded around that hacked-up JS implementation" for a non-trivial feature did not happen to be one of the browser vendors themselves?
I used to lurk on the WHATWG mailing list, and while the improvements they made to the web cannot be overstated, their decision-making process always seemed opaque, to put it mildly: features brought up by users were often debated back and forth without any clear goal until the spec editor basically said "we'll add it" or "we won't add it" with no clear reasons given. Other features "hit the ground running": they were apparently decided elsewhere and were only presented on the mailing list to discuss implementation details.
(Note: Massive generalization of the project history though, including the part where I and other key project members worked at the browser vendors at various times. Having insiders at the browser vendors does massively help move a standard forward.)
Annotations? Definitely. At least one member of the Hypothesis team came from Genius (formerly Rap Genius) which is the largest annotation service on the web.
For  to work, you need to either run the bookmarklet or Chrome extension. This isn't the norm though; most annotated pages can be viewed directly from the genius link, and you can even link directly to a specific annotation, for example .
And if Genius is hosting the page that's annotated, that's one thing. If they're simply proxying it on demand and adding annotations, then I don't see how it violates copyright unless every proxy server violates copyright.
This is a key distinction that separates proxies from caches and archives.
They tried dirty SEO tactics that Google ultimately punished them for, and their staff have made off-color remarks in the past. There's a lot more if you search for it.
Things at Genius have been pretty controversy-free since 2014.
I spent time trying to figure out if hypothes.is even uses the protocol and data model in their own software. I'm not confident enough in my analysis to say they absolutely aren't, but I'm also not convinced they are really using it. They would probably counter that it wasn't even a standard yet, but I'm not even sure how close their current model is to this.
They did have some of their employees in the W3C working group. Several of the others seemed to be in the "library" field.
When I use the hypothes.is extension, I see queries in my browser's developer console that do not look anything at all like they are querying annotation collections as defined in the spec.
The data returned from these queries is close to the spec but includes many fields that aren't in it. These extra fields seem to be conveying important information for their application.
Edit: I ran one of their annotations through the W3C tests. Five failed tests. Including one of the most important fields:
"name": "1:2 Implements **Annotation _id_ key** which has a **single value** that is a **string of format uri** - [model 3.1](https://www.w3.org/TR/annotation-model/#annotations)",
"message": "assert_true: ERROR: Annotation is missing id key or its value is not a single string of format uri.; expected true got false"
Of course, you run the risk that you end up with de-facto standards and everyone having to implement one implementation's extensions (oh, hi WebKit-prefixed properties!).
Correct. But since it has all this "JSON-LD" stuff baked in, you need to also carefully define and publish the extra fields.
By the way, the Genius team is backed by Marc Andreessen. While Genius and Hypothesis aren't formally related, the two annotation products are very similar just for different audiences. Tim Berners-Lee is even featured in the video at the end of the post.
This is kind of a crazy comment to have to make when the two biggest creators of the web browser support annotations. People who've worked on browsers absolutely have a vested interest in the web annotation standard.
It was very easy to pass the tests the W3C working group used to verify that they had two working implementations of the data model and protocol. Most of the tests default to passing if the specified tag is not present. Basically, it's not clear whether a serious, real attempt to use this has been made. I'm unconvinced that the specification is robust enough to be useful without ending up with a lot of vendor lock-in.
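To make the "defaults to passing" point concrete, here is a minimal sketch of the difference between a lenient check and a strict one. This is not the W3C test suite's code; the function names are made up, and `created` is only used as an example of an optional key.

```python
def check_created_lenient(annotation: dict) -> bool:
    """Validate `created` only if it is present; an absent key passes."""
    if "created" not in annotation:
        return True  # vacuous pass: nothing about the implementation was exercised
    return isinstance(annotation["created"], str)


def check_created_strict(annotation: dict) -> bool:
    """Pass only when `created` is present and is a string."""
    return isinstance(annotation.get("created"), str)


# An annotation that carries almost nothing still "passes" the lenient check.
bare_annotation = {"type": "Annotation", "target": "http://example.com/page"}
lenient_result = check_created_lenient(bare_annotation)
strict_result = check_created_strict(bare_annotation)
```

With checks of the first kind, an implementation that emits very little can collect a long list of passes, which is why a pass count alone says little about real-world use.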
The toy extension was playing around with using these annotations to alert publishers and potentially other users of typos in their articles and pages. It would be nice to have a side channel to report typos other than just using the comment section or trying to find an email address. Will the "meta web" ever catch on?
I never published it but I still might add a page about my experience on my website. I have posted about the idea there before.
The data model has a required field called `id` which is an IRI (like a URI) that is basically a globally unique identifier.
The protocol allows an annotation to be transmitted without the `id` field attached.
Why? Is the field required or not?
In my toy implementation I had my browser client attach a v4 UUID as the `id` field before sending it to my server. But it would have still been valid without it.
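Roughly what that client-side step looked like, as a sketch rather than my actual code. The `urn:uuid:` form and the helper name `with_client_id` are assumptions for illustration; the example annotation and URL are made up.

```python
import uuid


def with_client_id(annotation: dict) -> dict:
    """Return a copy of the annotation, adding a urn:uuid `id` if it has none."""
    if "id" in annotation:
        return dict(annotation)
    # A v4 UUID wrapped as a URN, so the value is a globally unique IRI.
    return {"id": f"urn:uuid:{uuid.uuid4()}", **annotation}


draft = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "typo: 'teh' should be 'the'"},
    "target": "http://example.com/article",
}

to_send = with_client_id(draft)
```

The draft without an `id` could have been sent as-is, which is exactly the part that confused me.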
All of the specs, in their "Status of the Document" section, say:
> This document was published by the Web Annotation Working Group as a Recommendation. If you wish to make comments regarding this document, please send them to firstname.lastname@example.org (subscribe, archives). All comments are welcome.
I'd guess, therefore, you should complain there (I have no idea if you have; I haven't looked through the archives!). Of course, there are plenty of W3C groups where specs have become pretty much totally abandoned as soon as they've reached REC, so it's totally plausible nothing will happen. :( (This tends to come about because groups are chartered to work on specs and bring them along the REC-track till they reach REC; unless a further version is being worked on, there isn't necessarily any group that actually has maintenance of the spec in scope.)
I read the "rules" for W3C groups and in order to actually participate you need to be a member of a big organization and all this other stuff. I'm not sure anyone would have listened.
And if it was sent between the spec becoming a Candidate Recommendation and going to Proposed Recommendation, it must (in theory) have been addressed. If not, something's gone wrong process-wise with how the group was operating (and from poking around a bit, it seems likely it did). Le sigh. :\
From prodding around a bit (notably , which sadly is in Member-only space, but plenty of administrivia is there, and in principle no technical work for almost all groups), it seems like every issue reported to the Working Group (regardless of where) should have ended up with a GitHub issue, with  being meant to have been all issues while the specs were in CR.
Pointing in the specs to a mailing list to report issues, and then relying on someone to copy them into GitHub, seems doomed to fail: it's far, far too easy for one thing to not get copied. Really the "Status of the Document" should've pointed to GitHub for new issues being filed (possibly with a fallback to the mailing list for those unable to use GitHub for organisational or other reasons).
The climate change effort using Hypothesis is located at http://climatefeedback.org/
Also, it may not be readily apparent, but Hypothesis is a non-profit, particularly because we think this technology if ever widely deployed and integrated should be done in a way that aligns with the interests of citizens over the long term.
What happens if the content changes? Random example: Someone highlights a picture of salad and notes "my favorite food!" and then the publisher changes the image to show roadkill instead of salad.
It depends on what type of selector you use. The data model provides a number of selector types. A "text position selector" would lock in the annotation at a certain point in the text like the 142nd character. An "xpath selector" would use a DOM-like notation to place the annotation.
If your annotation is a "highlight" then you would need to use these selectors within a "range selector" with a specified start and end point.
If you want your annotations to be robust to content changes, you will probably need to use multiple selector types. This is allowed by the specification but it felt very clumsy to me to implement.
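A sketch of what combining selectors can look like: try the text position first, check it against the quoted text, and fall back to searching for the quote. The selector types and keys (`TextPositionSelector` with `start`/`end`, `TextQuoteSelector` with `exact`/`prefix`/`suffix`) come from the data model; the `locate` function and its fallback order are my own assumption, not something the spec prescribes.

```python
from typing import Dict, List, Optional, Tuple


def locate(text: str, selectors: List[Dict]) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the annotated span in `text`, or None."""
    quote = next((s for s in selectors if s["type"] == "TextQuoteSelector"), None)
    position = next((s for s in selectors if s["type"] == "TextPositionSelector"), None)

    # 1. Trust the stored offsets only if they still cover the quoted text.
    if position is not None:
        start, end = position["start"], position["end"]
        if quote is None or text[start:end] == quote["exact"]:
            return start, end

    if quote is not None:
        prefix = quote.get("prefix", "")
        # 2. Search for the quote together with its surrounding context.
        found = text.find(prefix + quote["exact"] + quote.get("suffix", ""))
        if found != -1:
            start = found + len(prefix)
            return start, start + len(quote["exact"])
        # 3. Last resort: the quoted text alone.
        found = text.find(quote["exact"])
        if found != -1:
            return found, found + len(quote["exact"])

    return None  # the annotated text is gone; the annotation is orphaned


selectors = [
    {"type": "TextPositionSelector", "start": 8, "end": 13},
    {"type": "TextQuoteSelector", "exact": "salad",
     "prefix": "I had a ", "suffix": " for lunch"},
]

original = "I had a salad for lunch. It was my favorite food!"
edited = "Yesterday I had a salad for lunch."
replaced = "I had roadkill for lunch."
```

In this sketch the annotation follows the text when a word is inserted before it, and `locate` returns `None` once the quoted text has been replaced, so the client can decide whether to hide the annotation or show it as orphaned.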
If the target is available at multiple URLs, there is something available to handle it, but the hosts of the content need to add links to the canonical URL so your annotation software can use that.
Services like hypothes.is can do some filtering automatically, but this is missing the level above that - editorial privileges on comments on your own domain.
Ultimately annotation is something that readers choose to enable by adding to their browser, and by choosing the service they prefer-- just as they may elect to have a discussion about the page on reddit, twitter, facebook, etc.
With regard to whether sites should be able to have editorial privileges on their own domain, or to what degree should page owners have consent over annotation, we've blogged about that here: https://hypothes.is/blog/involving-page-owners-in-annotation...
There's a wide diversity of opinion. Some suggested a flag where site owners could register their preference not to be annotated. Overriding that flag would invite additional scrutiny from moderators. Others have suggested that such a flag not be able to be overridden.
We also facilitated a panel discussion on this discussion at our last I Annotate conference in 2016. https://www.youtube.com/watch?v=i2yFnu_pCGI&list=PLmuJEyeapl...
It's like making it mandatory for restaurants to have a live updating Yelp! review display scrolling at the front door.
(assuming I understand the proposal correctly...I didn't see much control there for website owners)
I personally think if that is done it should require the user to purposely choose an annotation service as well. The actual act of inviting the conversation to pages should involve a user choice.
See further thoughts on the question of page owner consent-to-be-annotated in another of my replies above.
It's more like AR technology existing and allowing consumers to have any review source they choose accessible, popping up live reviews as they walk by businesses (actually, even without the AR part, mobile virtual assistants like Google Now provide that today based on geolocation, so it's really something that already exists in physical space extending into cyberspace). But, yes, you control your content, but not what other people say about your content, and not how other people find and share what other people have said about your content.
I'm not saying it's necessarily bad.
It is, though, more discoverable than anything preceding it in this space, assuming browsers leverage it. It would likely create a fair amount of churn.
In my opinion this will open up the web immensely and make it much more democratic. It will be interesting to see how major players react.
In the meantime, we can keep using Hacker News.
Authors can moderate what appears on the same web page (their own site's comments, if any) but not what appears in other places (Twitter, Reddit, Hacker News, and so on).
Putting comments from a different community on the same page as the original article is a mashup of sorts; you're mixing content from different places. This can create confusion.
It seems like if such mashups exist, they should have their own URL? Suppose that when following a link from Hacker News, you get a web page with Hacker News annotations.
What URL should that be? Maybe it should go on ycombinator.com, sort of like when you embed a YouTube video.
Therefore the moderation will not have anything to do with the content owner. It's comparable to reddit or HN: would you complain that content owners have no control over discussion about their content on those sites?
I have been working with the hypothes.is folks for almost 2 years and have been using hypothes.is for manual tagging and automated annotation so I'm a bit biased. I have seen criticism that the standardization process was premature but given how hard it is to get browser vendors to implement things I think this could make a difference. That said, the way Microsoft did their annotation in Edge was just to take pictures of sites.
Is this really how long it takes to realize something like this? Sort of boggles my mind.
then you could annotate legal documents, code, and other high-density texts as well.
I've long felt that existing solutions fall down in a few ways:
1. UX -- this is a HARD UX problem because you are potentially managing a lot of information on screen at once. Anybody staring at a blizzard of comments in Word or Acrobat knows how bad this can get.
2. One-to-one -- Most existing exegesis solutions like genius.com only let you mark off one portion of text for threaded commentary, which is not ideal because complex text like the above example can have multiple patterns working in it at the same time:
(a crude attempt to map assonance and consonance)
Really, what a robust commentary system needs is to map many comments to many units of text, so that the same portion of text can be annotated multiply (as this solution attempts) but also so that the same comment can be used to describe multiple portions of text as well.
3. Relationships between comments -- It's great that this solution gives threaded comments as a first-class feature, but you also want to be able to group comments together in arbitrary ways and be able to show and hide them. In my examples above, there are two systems at work: the ideational similarities between words, and the patterns of assonance / consonance. You could also add additional systems on top of this: glossing what words or phrases mean (and in Shakespeare, these are often multiple), or providing meta-commentary on textual content relative to other content, or even social commentary on the commentaries. You need a way to manage hierarchies or groups of content to do this effectively. No existing solution that I am aware of attempts this.
I literally just hired somebody yesterday to start work on a text editor that attempts to resolve some of this, but it's an exceedingly hard problem to solve with technology.
This is actually built into this specification. From the Web Annotation Data Model:
- Annotations have 0 or more Bodies.
- Annotations have 1 or more Targets.
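A small sketch of how that gives you many-to-many: one annotation can point at several targets, and several annotations can share a target. The `id` values, the fragment URLs, and the `index_by_target` helper are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def targets_of(annotation: Dict) -> List:
    """`target` may be a single value or a list; always return a list."""
    target = annotation["target"]
    return target if isinstance(target, list) else [target]


def index_by_target(annotations: List[Dict]) -> Dict[str, List[str]]:
    """Map each target IRI to the ids of the annotations that point at it."""
    index = defaultdict(list)
    for annotation in annotations:
        for target in targets_of(annotation):
            # A target is either a plain IRI or an object with a `source` IRI.
            key = target if isinstance(target, str) else target["source"]
            index[key].append(annotation["id"])
    return dict(index)


annotations = [
    {   # one comment describing two portions of text
        "id": "urn:example:anno-1",
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": "assonance on the long 'o'"},
        "target": ["http://example.com/romeo#line-1",
                   "http://example.com/romeo#line-3"],
    },
    {   # a second comment on a portion that is already annotated
        "id": "urn:example:anno-2",
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": "gloss: an archaic usage"},
        "target": "http://example.com/romeo#line-1",
    },
]

by_target = index_by_target(annotations)
```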
> 3. Relationships between comments
This sounds more like an implementation detail of a client than part of the protocol or data model put forth by the W3C group.
However, I believe this can kind of be done server-side with the Web Annotation Protocol's idea of Annotation Containers. Your server can map a single annotation to multiple containers. So perhaps you have an endpoint like `http://example.com/a/` and you want to arrange a hierarchy of comments. You could provide a filtered set of the annotations at `http://example.com/a/romeo/consonance/`, and similar endpoints.
So basically what I'm saying is it seems like the protocol here isn't going to get in your way, it's just incentive to use this particular model for storing and transferring your data.
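As a rough sketch of that server-side mapping, assuming the server keeps its own set of labels per annotation (the `labels` field and the `container_members` helper are hypothetical, not part of the data model or protocol), a container path relative to `/a/` could be treated as a filter:

```python
from typing import Dict, List


def container_members(path: str, annotations: List[Dict]) -> List[str]:
    """Return ids of annotations whose labels include every segment of `path`."""
    wanted = [segment for segment in path.strip("/").split("/") if segment]
    return [a["id"] for a in annotations
            if all(segment in a["labels"] for segment in wanted)]


# Hypothetical server-side records: the same annotation can show up in
# several containers because membership is computed from its labels.
stored = [
    {"id": "a1", "labels": {"romeo", "consonance"}},
    {"id": "a2", "labels": {"romeo", "gloss"}},
    {"id": "a3", "labels": {"hamlet", "consonance"}},
]

romeo_consonance = container_members("romeo/consonance/", stored)
romeo_all = container_members("romeo/", stored)
everything = container_members("", stored)
```

Showing and hiding groups of comments then becomes a matter of which containers the client chooses to query.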
* About a year and a half ago, I thought about getting into this field. I built http://lederboard.com as a result - it works pretty well, actually (plenty of bugs behind the curtain) but the idea was to try and open it up as a standard.
* If I do pick lederboard up again, I will likely convert it to use this open standard.
* My goal was always to have the 'features' of lederboard not be in the annotations themselves, but in the moderator controls, the ability to follow sites and specific users, etc., and to basically act like reddit-enhancement-suite for an internet-wide commenting system.
* However, I realized this was a truly tremendous mountain to climb. Like, crazily huge. So I wound up going in a different direction.
In any event, I think that the guys at Genius should take note of this and consider it very seriously. They raised a whole lot of money and, as far as I can tell, this is a direct shot across their bow and it has the backing of W3C, which is huge. I am pretty happy I didn't wind up in the middle of that fight. Though I might get back into it at some point.
In the meantime, I am focusing on easy-to-use encryption: http://gibber.it . I think that is probably a little more important right now. For background, I am a practicing attorney with a pretty substantial practice in software, startups, corporate finance and information law.
You'd think we'd have learned our lesson by now. Free speech, by awful people, is overrated and can result in disasters.
I haven't dug into how the standard addresses all of its edge cases yet, but I hope that it handles pinning well, i.e., what happens if the underlying text is edited or deleted and it's unclear whether the annotation should persist.
At least that's how I've imagined what they are trying to get at, haven't honestly looked too hard into the concept.
 sample chapter, http://book.realworldhaskell.org/read/programming-with-monad...
Who is both (a) eager to be one and do the integration before there is a market/large community and (b) is trustworthy enough to be included by default in the largest likely annotation clients such as the major web browsers?
We welcome all others.
Third-party annotation has been tried, and web site operators hated it. Look up "Third Voice".
Kind of like the Stackexchange network of sites - many different communities, but with significant overlap as many users are active in multiple domains.
1) a client-side event which informs the page of the annotation's publication URLs;
2) a server-side notification sent to the publisher's “well-known” contact address.
Either or both of these mechanisms may be used, each with its own use cases." https://www.w3.org/annotation/diagrams/annotation-architectu...
Any type of notification would be an extra feature performed by annotation software, but is not a part of the specification.
Your annotation client could query the annotation server at a known endpoint for new annotations. The results should be ordered and paginated, but the order and pagination used seem to be at the discretion of individual implementations.
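A sketch of what that polling could look like on the client, assuming the container comes back as a collection with a `first` page and each page carries `items` and an optional `next` link, as in the protocol's paging model. The `fetch` callable stands in for an HTTP GET that returns parsed JSON; it and the fake responses below are assumptions so the example runs without a server.

```python
from typing import Callable, Dict, Iterator


def iter_annotations(container_url: str,
                     fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every item in a container by following `first` and `next` links."""
    collection = fetch(container_url)
    page = collection.get("first")
    while page is not None:
        if isinstance(page, str):  # page referenced by IRI: dereference it
            page = fetch(page)
        yield from page.get("items", [])
        page = page.get("next")


# Fake responses keyed by URL, standing in for a real annotation server.
fake_server = {
    "http://example.com/a/": {
        "type": "AnnotationCollection",
        "total": 3,
        "first": "http://example.com/a/?page=0",
    },
    "http://example.com/a/?page=0": {
        "type": "AnnotationPage",
        "items": [{"id": "http://example.com/a/1"}, {"id": "http://example.com/a/2"}],
        "next": "http://example.com/a/?page=1",
    },
    "http://example.com/a/?page=1": {
        "type": "AnnotationPage",
        "items": [{"id": "http://example.com/a/3"}],
    },
}

ids = [a["id"] for a in iter_annotations("http://example.com/a/", fake_server.__getitem__)]
```

Since ordering and page size are up to the server, a client that wants "new annotations only" still has to remember which ids it has already seen.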
Currently, negative information (even if true) isn't as easily discoverable. This ties it all to every one of your pages. With, as far as I can tell, no direct control over moderation.
I suspect many website owners are more concerned about legit complaints that aren't easily discovered than they are about spam.
And, once reputation mgmt creeps in, that good (but negative) information will be buried with astroturfed annotations.
All your points could also be made about Twitter.
Avoiding astroturfing is a serious problem, but one that extends way beyond imaginations for this web standard.
Mentioned above, like making restaurants display live scrolling Yelp reviews at the front door.
If not, then, perhaps more like Twitter.
How much private data about my browser and my host am I leaving when an annotation is created?
Is there a practical way to delete these both from the page and the public record, or would they be stored in perpetuity?
The protocol specifies that annotations are shared over HTTP, you can have them behind whatever kind of locks you want.
> How much private data about my browser and my host am I leaving when an annotation is created?
The specification doesn't store anything like this. But since everything happens over HTTP, the client/server you use may include things like that. Since it's a standard, you should be able to use whatever programs you wish.
There are fields specified to include things like author name, creation date, etc. It will depend on your client how they are used. They aren't required.
P.S. This url - https://hypothes.is/register - accessible from most pages by clicking "sign up" in the top-right corner, presents an error and doesn't redirect anywhere. https://hypothes.is/signup works fine, however.
I think that proper installation instructions, perhaps with docker compose, are more important than blog posts about annotation importance.
An easy prediction: with wide usage of this, any page that generates a non-trivial amount of traffic will be in such a state as to make reading the annotations pointless at best.
Happy to have a more thoughtful discussion if you're interested.
Somebody will post a slanderous comment on a company's website, the company will be very unhappy, and sue the comment provider for blending the comment into the company's website.
Is that free speech? Or is the comment not protected, because it's shown on the company's website, and thus should be under the company's control?
Aside: That interactive SVG slide-show is pretty impressive in itself.
That is @shepazu's work.
However, this thing is just completely illegible without reading glasses and 150% zoom... and it's still uncomfortable even then.
I would be surprised if this company has anyone age 40 or up who actually looks at their own website on a regular basis.
First, the core concept of associative indexing:
Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass... The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain.
Introducing the memex:
Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and, to coin one at random, "memex" will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.
Associating one item with another is the essence of the memex:
This is the essential feature of the memex. The process of tying two items together is the important thing.
When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined, projected onto adjacent viewing positions. At the bottom of each there are a number of blank code spaces, and a pointer is set to indicate one of these on each item.
Adding one's own annotations and links, and then sharing them to colleagues, is the vision:
First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items. Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item. When it becomes evident that the elastic properties of available materials had a great deal to do with the bow, he branches off on a side trail which takes him through textbooks on elasticity and tables of physical constants. He inserts a page of longhand analysis of his own. Thus he builds a trail of his interest through the maze of materials available to him.
And his trails do not fade. Several years later, his talk with a friend turns to the queer ways in which a people resist innovations, even of vital interest. He has an example, in the fact that the outraged Europeans still failed to adopt the Turkish bow. In fact he has a trail on it. A touch brings up the code book. Tapping a few keys projects the head of the trail. A lever runs through it at will, stopping at interesting items, going off on side excursions. It is an interesting trail, pertinent to the discussion. So he sets a reproducer in action, photographs the whole trail out, and passes it to his friend for insertion in his own memex, there to be linked into the more general trail.
Arguably we still do not have a satisfactory realization of the memex. The Web is not quite it; nor the personal Wiki, nor the personal mind-mapper, though each comes close. Perhaps the web with annotations will realize the dream? Though note that Tim Berners-Lee recognized in 1995 that even with a Memex, we might fail to organize our larger technical and social structures: "We have access to information: but have we been solving problems?"
A related issue: Ted Nelson's original idea for hyperlinks had them working both ways. When one document linked to a second document, the second document would automatically get a link back to the first. His idea also had what he called "transclusion" -- sort of like block-quoting someone else's text, but with the feature that when the quoted text was updated, any document which had transcluded it would also automatically get updated.
Of course, there are some practical issues there, not the least of which is (as several others have mentioned) vulnerability to spam.
Adoption is always the question that matters most to the public; arguably TBL's mid-2000s vision for the web as a Giant Global Graph has been neatly cloned and co-opted by Facebook's concrete, incompatible, and inward-flowing Hotel California implementation, but if a new wave of startups and bigcorps can create a rich ecosystem using community-designed standards, the outcome may be different this time. Or maybe not, but I applaud and support them in trying, and I will evangelize the same.
What's different from the mid-2000s, you ask? For one, the ideas behind REST, despite often being imperfectly or incompletely applied, have nonetheless entered community consciousness. Hard-fail-if-invalid attitudes have been replaced by a tolerance for imperfections, both in the community's rejection of XML-derived data formats, and an acceptance of the web's often haphazard, something-is-better-than-nothing nature. APIs implemented using HTTP over the Web are a mainstay instead of experimental integrations, and a new wave of commercial players is eager to exploit whatever competitive advantage they can find against the incumbents.
The big content gardens have all pushed incompatible "protocols" (we call them APIs, but they behave like protocols), which gives them network effects but also locks them (deliberately) out of the open web (i.e. a Facebook comment on a Facebook post that was spawned by sharing a web link is not a comment on the link; it's a comment on that Facebook post). Meanwhile, systems that can build on top of these standards to implement two-way data flow -- both inward and out -- can present richer experiences, while not precluding the usual business models and monetization schemes that are in use today. And even if commercially this all flops, we'll have nice specs and vocabularies to use where metadata is paramount: science, research, government, and the like.
So I guess they imagine you get an account with some service somewhere and then store your annotations there?
I think, however, that doing that for all the things in the world that each person would like to store is too complicated to be true.
I only say that because I was an enthusiast of things like the remoteStorage protocol -- not what it has become actually, but the idea behind it. I would prefer something like remoteStorage to be standardized instead of a different protocol for each thing.
Perhaps Urbit is in the same space as remoteStorage, only much more complicated.