Ask HN: Relationship between HN and Scribd?
49 points by nkurz on Feb 14, 2010 | 39 comments
I recently submitted a link to a PDF paper. It first appeared as I submitted it, but after some time the title had been changed to include a Scribd link. I found this off-putting: it implied either that I was encouraging the use of Scribd, or that the author had already chosen to use Scribd as a publisher.

What is the relationship between Hacker News and Scribd? Are all PDF links modified to link to Scribd automatically, or is this done by hand on a case-by-case basis? And why is the use of Scribd officially encouraged, while primary non-framed links are preferable for HTML content?

I think Scribd is a Y Combinator-funded startup (hence the Hacker News connection), and some of the Hacker News admins edit other people's posts by hand. I am very against copying PDFs into a walled garden like Scribd, but this isn't my website.

If true, that seems downright unethical to me. I've actually met (and possibly drank with) some of the Scribd founders and they seem like really nice, smart fellows, but this practice really shouldn't be encouraged (unless there is a good reason behind it other than to promote Scribd).

It's certainly there to promote Scribd. I don't believe that HN has ever claimed to be an independent, impartial site. It's on a subdomain of ycombinator.com, and it is heavily frequented by people who are either involved in YC or want to be. Any story that is linked to a YC company is going to get a disproportionate amount of attention here, and I doubt that YC will hesitate to integrate links/widgets from YC companies as they've done with Scribd, CO2stats, and others.

I'm pretty sure that one of the purposes of the site is to attract people to YC. If you want to know why people like myself (who are just here for the hacker news/discussion) stick around, it's because it has some of the most interesting and civil discussion of any online community, and if you know about the YC bias then it's not a big deal to filter it out in your mind when browsing/reading the site.

Maybe "unethical" is inaccurate, but I did mean to make a strong point. How about "annoying and unnecessary"? One reason why I enjoy HN is, as you said, the intelligent and civil discussion. Despite it being sponsored by YC, I really appreciate the distinct lack of advertisements or blatant links to YC-funded apps. It truly feels like an inclusive environment for all hackers, not just YC-affiliated ones. Your comments are judged by their content, not (usually) by whether the author is part of the YC clique. Yes, I understand YC doesn't owe anyone anything, and if they wanted to, they have the right to turn HN into a billboard for YC companies. But that's not why I come here. That's why I think HN should nip these trends in the bud, so it stays as inclusive and impartial as possible.

Initially (or at least a couple of years ago) if you submitted a PDF, the link would only be available through Scribd. It was a topic of much discussion. See here for example: http://news.ycombinator.com/item?id=195431

PG wants Scribd to do well, and so HN auto-adds Scribd links (I believe this is how it works).

Yeah, it's no different than the Green Stats thing at the bottom of the page. He's just helping people out by using their stuff. Can't blame him for that.

It's very different than the Green Stats.

One is simply using a service, the other involves rehosting content without the request or permission of the author or content owner.

If the author doesn't want their content mirrored, they're free to throw up a robots.txt file, assuming Scribd respects that. The internet requires middlemen to make multiple copies of content for each request to successfully complete, so I'd call an HTTP 200 status code implied consent.

Every request from a different place is ultimately fulfilled by the original host, barring some extreme caching (which you can also use HTTP headers to instruct against). This is completely different from the case of it being taken from the original host and put up on Scribd, which could easily be a copyright violation. An HTTP 200 response is implied consent for an end-user (or even a robot) to view it, not to redistribute it. AFAIK scribd does not crawl for content (with them then hosting that would be blatantly illegal in the US) so robots.txt is not really applicable.
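
As a rough illustration of the point about HTTP headers: a host can ask shared caches not to keep a copy by sending `Cache-Control` directives such as `no-store` or `private`. The sketch below is a hypothetical helper (the name `shared_cache_may_store` and the plain-dict header format are assumptions made for illustration, not anything HN or Scribd is known to run), and it deliberately looks at only those two directives.

```python
def shared_cache_may_store(headers):
    """Return False if Cache-Control asks shared caches not to store the response.

    `headers` is assumed to be a plain dict keyed by the exact string
    "Cache-Control"; real HTTP header names are case-insensitive.
    """
    raw = headers.get("Cache-Control", "")
    # Split "private, max-age=0" into directive names: {"private", "max-age"}
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in raw.split(",")
        if part.strip()
    }
    # Only these two directives are checked in this sketch.
    return not ({"no-store", "private"} & directives)
```

With this helper, a response carrying `Cache-Control: no-store` would be treated as "do not keep a copy", while a response with no such header would not be. Whether the absence of the header amounts to consent to redistribute is exactly what is being argued in this thread.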

This interpretation makes caching illegal and puts routing in a gray area. If not instructing against caching is implied consent to redistribute the content, then you're essentially agreeing with me.

robots.txt is indeed intended for crawling, but if it's there and you redistribute someone's content anyway, I'd consider it less defensible.
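
For anyone unsure what "respecting robots.txt" would look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URLs, and the user-agent string are all made up for illustration; nothing here reflects how Scribd or HN actually behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an author might publish to keep robots out of /papers/
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /papers/
"""

def allowed_to_fetch(robots_txt, user_agent, url):
    """Check a URL against robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Here `allowed_to_fetch(EXAMPLE_ROBOTS, "ExampleBot", "http://example.com/papers/a.pdf")` would be refused, while a URL outside `/papers/` would be allowed. This only tells a well-behaved robot what the author asked for; it does not settle the redistribution question.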

AFAIK scribd does not crawl for content (with them then hosting that would be blatantly illegal in the US)

As far as I know, they don't do that, but if they did, how would it be different from Google cache?

Google cache doesn't beat your main site in the rankings, and it clearly cites you as the original source of the data.

That said, it's irrelevant because they don't crawl for content. The HN admins are way out of line with their practices on this one. They're making illegal copies to help their friends at Scribd, and doing so without the requisite consent.

Search rankings have nothing to do with copyright law, which seems to be the basis of your argument. You have yet to give a decent argument that materially distinguishes Scribd from Google's cache, caching proxies, or even the basic routing required for any request on the internet, which copies content by definition.

See also: http://docs.google.com/viewer

I was just responding to Zak's comment about differences, giving a few. It wasn't meant as a complete argument of any sort.

That said, two points:

1) doing so is not just immoral, it's against Scribd's TOS.

2) your claim that re-hosting content without permission is indistinguishable from transport makes clear that your beliefs are so different than mine that I cannot possibly find a way to communicate with you.

The basic concept of copyright law is that an author is the only one who is allowed to make copies of her work, and only she can give others permission to do so as well. At a technical level, sending a response to a request involves telling another machine to pass a message along for you, so there is at least implied consent to copy it and send it to another user. However, what are the bounds of this consent? Can a router store the data it has been passed? For how long? Can it serve it to others besides the IP address the response was intended for?

There may be case law and/or actual laws that clarify these points, but I am not a lawyer. I presume you aren't either. If you think the concept of IP law has a straightforward and indisputable application to the internet that clears all questions about what routers can and can't do with the data they are passed, feel free to explain.

It seems like you're operating within the "lots of people do it, so it must be legal somehow" school of thought when it comes to routing. This isn't necessarily a problem as long as it's applied consistently. Several other commenters and I effectively made the same argument in saying that Google does almost the same thing Scribd does in terms of copying and redistributing content. You can't call that argument invalid and then rely on it as proof of routing's obvious legality.

Scribd is legally indistinguishable from a caching proxy. Feel free to let Opera and all the other caching proxy operators know.


... Scribd is legally indistinguishable from a caching proxy.

You already said you weren't a lawyer. No need to prove it dramatically.

Feel free to prove me wrong. It's generally more productive than taunting.

You have a valid point about ethics, but I was talking about copyright law. The two are not closely related.

I've found clicking a PDF link on anything but Mac OS to be rather unpleasant. Many people (me included) consider the scribd link a more pleasant experience.

Contrast this with my experience: whenever I happen upon an article on Scribd I immediately skip over it. I really don't like the user interface. (I'm on a Mac, by the way.)

Try Foxit Reader over the Adobe junk :-) I switched a while ago and it's such an improved experience.

That doesn't run well on my platform of choice. Evince does though, and is also fast and lightweight. Still, I find PDF links a bit annoying, and am glad to have the scribd option.

I just put Evince on all my Windows machines and I am very happy about it. I was able to remove all the Adobe Air bonus arterial sclerosis.

I tried Foxit and was not very impressed. I thought it was interesting that the Evince Windows port has been made since the last time I looked for an Acrobat replacement about a year ago.

As long as they don't change the link to the original PDF, it seems fair to me (if you accept Scribd as fair in principle; they simply gobble up all the PDFs they can get, I suppose). In other words, I think it is just a helper that adds a viewer alongside a link to a PDF.

Yeah, Scribd is YC funded. I always figured HN automatically scribd-ed PDFs, but I guess admins might do it manually (though I'd guess there'd have to be something special anyway, since normally the whole title links to the story link).

I think the rationale is that some people don't like PDFs. They don't want to download something when they're just going to skim it now and probably never look at it again. I don't really get it, but I guess it would make sense for very long PDFs so that you don't have to wait to download the whole thing when you might take one look at it and decide you don't care.

Now I remember what Scribd reminds me of: Steven Brill's Brill's Content (the web site, not the magazine).

Scribd is definitely a YC-funded startup. When I interviewed there, the founder mentioned it.

I don't like Scribd, but I also don't like it when I click on a link and a file appears on my filesystem, which is what happens when I click PDF links in NetNewsWire.

The consensus here seems to be that it's an effort to promote Scribd, but I had always assumed that it was an attempt to be helpful to those who prefer to remain within a browser, and to those who hate opening PDFs. At least, it serves that use case for me.

I think it's both.

How about we submit PDFs inside Google's PDF viewer?


Just mention the real domain in the post title.

Submitting the original PDF link is best for pointing to the real source and giving people the option of how to deal with it. Those who like Google viewer links can always fix that themselves. Details in this sub-thread: http://news.ycombinator.com/item?id=1115897

If anything, HN Scribd posts seem to turn into straight PDFs. I think the admins search for posts with [scribd] in the title and link to a PDF version instead.

No. I say this with absolute certainty, having just submitted a link to a PDF, to which the [scribd] portion was added, either by the system or by an HN administrator.

If you click on the title, it still goes to the original URL, which in my case was a straight PDF. If you click on the [scribd] portion, you are taken to a copy at Scribd.

Oh, from the original post I took it that it replaced your link with a scribd link (rather than adding a second link). I still think it is a questionable practice, but find it much less objectionable than I did at first.

IIRC, when I joined (circa mid-2008) it did replace your link with one to Scribd. There was some questioning about whether this was right, people don't like the Scribd layout, blah blah, and I believe the decision was made to add a Scribd link but keep the original.

Posting it violated the Scribd TOS sections 8.1 and 8.8.

Oh, I see. Whenever I saw that, I always assumed that somebody had submitted a Scribd link and an admin had modified it to go to the straight PDF. Thanks for the correction.
