
Ask HN: Does the web need a `license` tag? - DarkWiiPlayer
Okay, it doesn't have to be a tag, nor be called license. The real, somewhat longer question would be: Do HTML and similar technologies need a mechanism to encode licensing / copyright information in a uniform, machine-readable way? This could take the shape of a tag in the head of a document, an attribute that can be used with any or just a few tags, etc.

The web has shifted from people hosting their own content on a website or blog, with a copyright notice in the footer, to people submitting it to larger platforms, be it imgur and similar, social media, or sites that focus more heavily on text as their content, including Hacker News itself.

How should a crawler know, if it may or may not take some random text from a website and display it somewhere else? How can it know, who to credit, or whether to credit someone at all?

Searching Google hasn't brought up any way to encode any of this in HTML; maybe I'm missing something? If there really isn't any such mechanism, should there be one? What should it look like?
======
mftrhu
There actually is a microformat specification for this, rel="license" [1],
which you can use like this:

    <!-- In the footer -->
    [...] <a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a> [...]

Alternatively, as the microformats page also points out, you could use
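`<meta>` or `<link>` tags [2] with a well-known ontology like Dublin Core [3]:

    <!-- In the head: declare the DCTERMS prefix, then use it -->
    <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
    <link rel="DCTERMS.license" href="https://creativecommons.org/licenses/by-sa/4.0/" />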

I'm fairly sure this would be recognized and correctly extracted by scrapers
like Zotero's.

[1] [http://microformats.org/wiki/rel-license](http://microformats.org/wiki/rel-license)

[2] [http://www.dublincore.org/specifications/dublin-core/dcq-html/](http://www.dublincore.org/specifications/dublin-core/dcq-html/)

[3] [http://purl.org/dc/terms/](http://purl.org/dc/terms/)
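To make the crawler side concrete, here is a minimal sketch, using only Python's standard `html.parser`, of how a scraper might pull license URLs out of markup like the two examples above. The class and function names are hypothetical, and the sketch matches the `DCTERMS.` prefix literally instead of resolving the `schema.DCTERMS` declaration, which a real Dublin Core consumer would do.

```python
from html.parser import HTMLParser

# Link types that point at a license document: the microformats
# rel-license value and the Dublin Core "DCTERMS.license" property.
LICENSE_RELS = {"license", "dcterms.license"}


class LicenseExtractor(HTMLParser):
    """Collects license URLs from <a> and <link> elements."""

    def __init__(self):
        super().__init__()
        self.licenses = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link ... /> are routed here too,
        # via HTMLParser's default handle_startendtag.
        if tag not in ("a", "link"):
            return
        attrs = dict(attrs)
        # rel holds space-separated link types; compare them case-insensitively.
        rels = {r.lower() for r in (attrs.get("rel") or "").split()}
        href = attrs.get("href")
        if href and rels & LICENSE_RELS and href not in self.licenses:
            self.licenses.append(href)


def find_licenses(html):
    """Return the distinct license URLs declared in an HTML document."""
    parser = LicenseExtractor()
    parser.feed(html)
    parser.close()
    return parser.licenses


page = """<html><head>
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />
<link rel="DCTERMS.license" href="https://creativecommons.org/licenses/by-sa/4.0/" />
</head><body>
<footer><a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a></footer>
</body></html>"""

print(find_licenses(page))
# ['https://creativecommons.org/licenses/by-sa/4.0/']
```

Both declarations point at the same license, so the extractor reports it once; a page that used only one of the two conventions would still be picked up.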

------
clintonb
What problem do you think a licensing tag would solve?

> How should a crawler know, if it may or may not take some random text from a
> website and display it somewhere else?

The recommendation I follow is to use robots.txt to disable crawling of
specific pages or paths. If I really want information protected I either (a)
put it behind authentication or (b) don't put it on a public website.
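For comparison, the robots.txt approach is easy for a crawler to honour with Python's standard `urllib.robotparser`. This sketch parses a hypothetical robots.txt from a string (no network access) and checks two made-up URLs. Note that robots.txt can only say whether a path may be fetched; it has no standard field for licensing terms.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /drafts/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
"""


def load_rules(robots_txt):
    """Build a RobotFileParser from robots.txt text instead of a URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch("ExampleBot", "https://example.com/drafts/post.html"))  # False
print(rules.can_fetch("ExampleBot", "https://example.com/blog/post.html"))    # True
```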

> How can it know, who to credit, or whether to credit someone at all?

Why does the content need credit at the search engine level? If the reader
cares, the reader will navigate to the page and determine the author.

~~~
DarkWiiPlayer
> What problem do you think a licensing tag would solve?

Crawlers could, with more granularity, decide what they may embed and how.

> Why does the content need credit at the search engine level?

It's not just about search engines though.

> The recommendation I follow is to use robots.txt to disable crawling of
> specific pages or paths.

That way you're just locking bots out. It's a completely different thing. This
is not about protection; it's about communication.

> If I really want information protected I either (a) put it behind
> authentication or (b) don't put it on a public website.

If you want to avoid people stealing things they shouldn't, yeah, that's your
only option. If all you want is to communicate whether or not they _may_ take
your content, you don't really have any tools to do so. Again: Communication;
not protection.

> If the reader cares, the reader will navigate to the page and determine the
> author.

No. That's not how it works. You can't expect users to click through several
links to find the author of a resource. Unless you link directly to the
resource on the author's site, the author should in many cases be credited
directly next to where the resource is embedded.
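As a sketch of what crediting "directly next to where the resource is embedded" could look like once a crawler has extracted the metadata, the hypothetical helper below builds an HTML credit line roughly in the Title / Author / Source / License order that Creative Commons suggests for attribution. The function name and parameters are illustrative assumptions; all inputs are escaped because they come from third-party pages.

```python
from html import escape
from typing import Optional


def credit_line(title: str, source_url: str,
                author: Optional[str] = None,
                license_name: Optional[str] = None,
                license_url: Optional[str] = None) -> str:
    """Return an HTML snippet crediting an embedded resource."""
    # Title linked to the original source.
    parts = [f'<a href="{escape(source_url)}">{escape(title)}</a>']
    if author:
        parts.append(f"by {escape(author)}")
    if license_name and license_url:
        # Re-emit rel="license" so the embedding page stays machine-readable.
        parts.append(
            f'is licensed under <a rel="license" href="{escape(license_url)}">'
            f"{escape(license_name)}</a>"
        )
    elif license_name:
        parts.append(f"is licensed under {escape(license_name)}")
    return " ".join(parts)


print(credit_line("Sunset", "https://example.com/sunset", "Jane Doe",
                  "CC BY-SA 4.0", "https://creativecommons.org/licenses/by-sa/4.0/"))
```

Re-emitting the license with `rel="license"` means a second crawler reading the embedding page can still recover the original terms.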

------
Adamantcheese
Kinda like robots.txt, there's nothing preventing a crawler from ignoring it
or omitting it and just doing whatever it wants. Slapping in a massive header
comment might work.

------
krapp
That's what meta tags are already useful for:

    <meta name="license" content="..." />
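On the consuming side, a crawler looking for this tag would read the `content` attribute (a `value` attribute is not what `<meta>` uses, so it would be missed). A minimal sketch with Python's `html.parser` follows; the class and function names are hypothetical, and it simply returns whatever string the page put there, leaving interpretation of that string to the caller.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaLicenseReader(HTMLParser):
    """Records the content of the first <meta name="license"> element."""

    def __init__(self):
        super().__init__()
        self.license: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.license is not None:
            return
        attrs = dict(attrs)
        # Meta names are compared case-insensitively; the value lives in "content".
        if (attrs.get("name") or "").lower() == "license":
            self.license = attrs.get("content")


def meta_license(html: str) -> Optional[str]:
    """Return the declared license string, or None if there is none."""
    reader = MetaLicenseReader()
    reader.feed(html)
    reader.close()
    return reader.license


print(meta_license('<head><meta name="license" content="CC-BY-4.0" /></head>'))
```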

------
Spooky23
If you need fine-grained rights management, it doesn’t belong on the public
web.

~~~
gtsteve
I don't believe OP is talking about rights management like DRM, but rather
about an automated way to declare what license something is published under.
We have human-readable methods of doing this, but I don't believe there is a
standard for an automated system to identify this.

