Aberdeenshire memorial inscriptions records removed from Ancestry (scottishgenes.blogspot.com)
58 points by ilamont 14 days ago | 26 comments



The headline seems a bit sensationalized.

From what I can tell, Ancestry has a web crawler that respects robots.txt files and crawls publicly available webpages, and they offer search engine results that link to the original page.

This seems exactly like what Google and every other search engine does?


Exactly. These sites are kind of specialized search engines that work with lots of public records. Using the word "stealing" for what's already public information doesn't seem right. It's not even like we're talking about copyrightable text. These are facts.


After reading the article, I can only conclude that Ancestry is being a perfectly good actor on the web? They're respecting robots.txt files, they're properly summarizing with source citation instead of verbatim copying, etc. and they make that information available for free. That's literally doing what every search engine on the web has ever done. Better, honestly, because not all search engines respect robots.txt in the slightest.

If anything, both the Isle of Man and Aberdeenshire memorial folks seem to misunderstand what the web is, and what putting data on the web means.

(And from a legal perspective, I am amazed they can lay any sort of claim to the genealogical facts; you can certainly own the copyright for "an entire work", but you can't own the fact that person X and person Y begot person Z)


I'm not a huge fan of these types of databases (for profit), but I think the title is a bit "clickbaity."

The original title is "Aberdeenshire memorial inscriptions records removed from Ancestry"


Ancestry seems to be moving toward tiered and segmented offerings.

Ancestry bought Newspapers.com and rolled it into their data, but Ancestry members need to pay extra (or extra x3) to access that data.

Existing NPcom subscribers clip articles and make them available. However, I often see those clips get downsampled to the point where they're ~unreadable. It's so common I suspect Ancestry is purposefully crapping on people's good will.

But then yesterday, I followed the 'free trial' breadcrumb bait (card req) from a result and unexpectedly found an OCR of the clipping I needed. The text was NPcom's usual globawful machine-reading gibberish - but I could suss out the info I needed.

All that said, building out a free tier seems to be part of Ancestry's strategy - to bait as many people as possible to the subscriber door. There are more subscriber doors behind it but prospective users won't know that.

Meanwhile FamilySearch is free, but it's one-community-tree. I (like many users) make a point of working out difficult/confusing families and recording them there.

FS records are worth the cost of admission though. The existing profiles are no worse than you'll find on Ancestry (generally better).

source: 25k profiles on Ancestry, 15k on FS.


It’s not stealing to make and use copies of data that is published to the entire web for anyone to see and download.

By that damaged logic, Google is “stealing” the entire web’s data by making a copy of it.


Seriously, this is what the internet is all about. Ancestry did nothing wrong. Why do these other website owners have to make the world a worse place for no tangible benefit and cause a stink? Just to feel powerful and spite a "big guy" for being big?


Totally agree. If Ancestry were republishing these resources, or if they weren't freely available, then it would be copyright infringement/IP theft.

But indexing the content so those records can be searched and linked to on Ancestry's platform? Totally fair game. If you don't want people indexing your content, put it behind a log-in. You don't even have to charge for it.


I think there's a fair debate to be had around Google's use of third-party content, especially in an age of AMP links and rich results.

Though I agree Ancestry.com is following pretty standard practices here (they're not charging, robots.txt is followed, sources are attributed and linked).


If the material is copyrighted then reuse should be licensed. Otherwise I believe you’re in breach of copyright — whether that technically falls into the category of ‘theft’, I’m not sure, but it’s not legal (outside of ‘fair use’)


American here - agree that copyright is important and has been trampled in the digital age. However, no one is innocent in this case; rather, it appears to be a dog pile of commercial interests in public records. Ancestry dot com are crass commercial managers who took advantage of the commercialization of the web.

source: the 90s purchase of the rootsweb mailing list assets by Ancestry dot com, their transaction records and subsequent paywall garden with lots of extras


Just because something can be found on the internet does not mean you can sell it without permission.


As the article says, Ancestry wasn’t selling this data and was linking back to where it had come from. Ancestry also says they will honor robots.txt with instructions not to scrape data.


You are right. I was just reacting to the idea that anything you find on a webpage is yours to use without restriction. Think about pictures of my kids on social media!

Thank you for clarifying.


The OP wrote “make and use copies of data” — seems a reasonable response


Per the article they are not selling anything:

> Access to web records is free. No one needs to subscribe or register with Ancestry to view these records.

> Web records are attributed to the content publishers.

> They're easily available. Prominent links make it easy to access the source website.

> We follow web standards for restricting crawling (robots.txt files). If a website has a robots.txt file that prohibits crawling the genealogical records, we don't search those records. If records from your website are included and you'd like them removed, please send a request to websearch@ancestry.com.
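
For anyone unsure what "a robots.txt file that prohibits crawling" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleGenealogyBot` user agent, and the example.org URLs are all made up for illustration; I don't know what user agent Ancestry's crawler actually sends or how it evaluates these rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a records site. The bot name and paths
# are assumptions for illustration, not taken from any real site.
ROBOTS_TXT = """\
User-agent: ExampleGenealogyBot
Disallow: /records/

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named bot is kept out of /records/ but may fetch other pages.
    print(allowed("ExampleGenealogyBot", "https://example.org/records/smith.html"))
    print(allowed("ExampleGenealogyBot", "https://example.org/about.html"))
    # Any other crawler falls through to the "*" entry, which allows everything.
    print(allowed("OtherBot", "https://example.org/records/smith.html"))
```

A well-behaved crawler runs a check like this before each fetch. Nothing enforces it, though: robots.txt is a convention that a crawler chooses to honor, not an access control.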


Actually, it does. Facts can’t be copyrighted. If I find facts on the public internet, it’s legal to sell and use in most jurisdictions around the world.

If I don’t want the world to see material, I don’t give it away for free to everyone.


While an individual fact can't be copyrighted, a collection of facts does have database right protections in the EU & UK: https://en.wikipedia.org/wiki/Database_right


Yes that's true, but copyright is not the only restriction about the use of data you find online. There's trademark and privacy to consider. I don't think either thing is likely to apply to dead people :)


Trademark also doesn’t apply to facts. And if data is published on the open internet then privacy is most likely not an issue. The genealogy is already published in the open.

So I don’t think those elements are relevant to the conversation without more context.


I think this is the key point that lots of people need to understand.

Your photo of a headstone? You own the copyright.

Grandfather died on 9 June 1938 & is buried at X? Not copyrightable.


Seems like Ancestry's policy here is pretty in line with how the web has been treated for decades.

It's largely newcomers who get surprised that when they add content to the world's network of knowledge, that content gets accessed, indexed, and recontextualised for maximum convenience.

It's been this way since web rings gave way to automated crawling and search.


Is anyone scraping Ancestry.com?


Clearly lots of people are trying but they have very strict anti-crawler measures that often trigger for me (a human) if I browse too quickly.


I like that you clarified that you are a human. Just to be safe. AGI is coming or something along those lines.


Well that's the huge irony. Ancestry paywalls almost everything useful, so you need to set up a monthly subscription to see the stuff.



