I'm one of the founders of Scribd. You might be surprised to hear this, but I applaud the PDFy service and am glad someone has built it.
Scribd is not designed to be a simple, lightweight way to host a PDF file. Yes, this was the original idea of Scribd 8 years ago, but we've long since left that path. We see that market as having been made irrelevant by a combination of Google Docs, Dropbox/Box.net/etc., and better PDF readers now built into browsers like Chrome. There may be room for someone to build an imgur-like service for PDFs too, but that's not what we're doing.
Scribd is really good for two things:
1) Scribd is a subscription reading service ("Netflix for books") where you can read over 400,000 professionally published books for $9 / month, including thousands of new releases and best-sellers. It doesn't include many programming books unfortunately (yet!), but if you like to read other things, it's a good deal.
2) Scribd is good for serious authors and publishers who want to publish a lot of content and organize it well. For example, the World Bank uploads thousands of research reports to Scribd and organizes them into collections. And many serious authors publish books and other writings with us.
We're sorry that we haven't done a good enough job of explaining who we are as it's changed over the years. And we're sorry if you've been frustrated trying to use Scribd for something it's not particularly good at.
To joepie91_ - I think it's cool that you've started this. We have some experience building document hosting services, and I can see you are already encountering some issues we've worked on, like DMCA and copyright. If you'd like to talk, we'd be more than happy to help you out.
So far I've seen Scribd as a very annoying PDF host, and most often I decide not to read the content at all when given a scribd link ("just send me the pdf, damn it!").
Up until now my encounters with Scribd have generally followed this scenario: I'm reading a news article and notice a link to some of the source documents for the article. I click on the link and then am sent to a scribd page that displays the document along with nice little download buttons that purport to let me download a copy of the document. Of course clicking any download button gives me a modal telling me I either need to "Login with Facebook" or create a scribd account. Back buttons are pressed, tabs are closed.
In the example links above these documents are not books, they are not part of a curated collection put up by the "serious author" or publisher of the document. It looks much more like a publisher looking for an easily linkable or embeddable document viewer was snookered into believing that by uploading the document to scribd it would be easily accessible and "available" to the world.
There are countless scribd accounts where I imagine the author really intended their upload to be freely available, not used as bait for scribd to suck people into account creation. For example, the U.S. Naval Research Laboratory, various government officials or agencies, and in fact entire scribd categories seem to be documents which are neither authored by the uploader nor copyrighted at all, such as public court filings.
I think your "Netflix for books" is a great idea, and might even be something I would go for, except for the really bad taste the above interactions leave in my mouth. These documents don't fit into the two categories you say scribd is really good for and you mention you are sorry you haven't done a good enough job explaining who you are as it has changed over the years. A great place and way to explain this would be right next to a download button for this type of content that doesn't require someone to "Login with Facebook" or create an account. Instead of getting the feeling I've gotten suckered by clicking on the link, I might think it is great you are hosting and making available this type of content and I should explore some more about this "Netflix for books" thing you are talking about.
 For example: http://arstechnica.com/tech-policy/2014/05/in-18-months-feds...
 e.g.: http://www.scribd.com/doc/204954147/Lolli-v-BF-Labs-Journal-...
Now that's a fantastic example of building a product around a real problem. If I even see domains like Scribd anymore I won't even give it a click. No, I don't want to sign up. I'd rather just do site:domain"thedoc.pdf" or some other way.
I hope your product takes off and everyone uses it!
They must be using a different imgur than I use, because it's definitely commercial and has ads. I can't see pdfy surviving on donations indefinitely.
As for running off donations, I've addressed that here: https://news.ycombinator.com/item?id=8034529
It's incredibly misleading and awful. Also, try uploading an image - you might notice that the "direct link to image" textbox on the right bar no longer exists, and hasn't existed in a very long time.
Check this out in a terminal:
$ curl -I -H 'Referer: http://twitter.com/' 'http://i.imgur.com/ZKfUroW.png'
If you don't get way ahead of these kinds of users, you'll end up with an untenable drain on your resources that will make it easier to shut down than to sustain.
I have no intention of acting as a 'shield' against DMCA requests for this particular service (if they're valid, they'll be followed up on, as described on the TOS page), so ideally there shouldn't really be a problem. We'll see how it goes.
This is really great. Thanks for posting this.
I've gotten pretty good over the years at running stuff on a shoestring budget (my current hosting expenses are around 100 euro a month for everything together at a large number of hosts, and I have plenty of resources to spare), and as far as I can predict PDFy won't be running into any issues any time soon.
I should add that it definitely helps to custom-develop everything - generic solutions tend to come with a large amount of (resource) overhead, which make it harder to run it on a small budget. By doing just about everything custom, the overhead is minimal. Traffic is dirt cheap nowadays if you know where to look, so that's not really a concern anymore either.
In either case, you probably want to get very acquainted w/ the DMCA and register an agent; hopefully you know all about this, but it's worth running the requirements to stay w/in the DMCA past a lawyer.
Are you talking about where to go find traffic for your website? Do you mind sharing?
For this particular VPS, I'm paying $9.30 a month, and it includes 2TB of monthly traffic. Cheaper offers exist, and there are always providers like OVH that genuinely offer unmetered traffic on the cheap (as long as it's used for a legitimate purpose, e.g. serving hosted files).
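To put those numbers in perspective, here is a rough back-of-the-envelope calculation in Python. The price and traffic allowance come from the comment above; the average PDF size is purely an assumption for illustration, not a measured PDFy figure.

```python
MONTHLY_PRICE_USD = 9.30        # from the comment above
MONTHLY_TRAFFIC_GB = 2 * 1000   # 2 TB, taking 1 TB = 1000 GB
AVERAGE_PDF_MB = 2.0            # assumption, not a measured figure

# Cost of each gigabyte if the whole allowance is used.
cost_per_gb = MONTHLY_PRICE_USD / MONTHLY_TRAFFIC_GB

# How many downloads of an average-sized PDF fit in the allowance.
downloads_per_month = MONTHLY_TRAFFIC_GB * 1000 / AVERAGE_PDF_MB

print(f"cost per GB: ${cost_per_gb:.5f}")
print(f"downloads per month at {AVERAGE_PDF_MB} MB each: {downloads_per_month:,.0f}")
```

Under that assumed file size, the allowance covers on the order of a million downloads a month for well under a cent per gigabyte.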
What browser are you using?
Perhaps I should just remove the pdf.js 'download' button, seeing as there's one elsewhere in the UI anyway.
Anyhow, thanks for putting together such a great site!
That said, I really need to write some code to extract the document metadata from the PDF files and display it on page; right now, the only thing that search engines (and the site itself) have to go off is the filename, which is far from optimal.
In fact, the same file could be uploaded multiple times with different metadata. There's room here for experimentation, e.g. publishing content hashes and linking "duplicates" that have different metadata.
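A minimal sketch of the content-hash idea, in Python. The upload tuples, slugs and metadata fields are made up for illustration; nothing here reflects how PDFy actually stores files.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def group_duplicates(uploads):
    """Group uploads by content hash.

    `uploads` is an iterable of (slug, data, metadata) tuples; this shape is
    hypothetical and only serves the illustration.
    """
    groups = defaultdict(list)
    for slug, data, metadata in uploads:
        groups[content_hash(data)].append({"slug": slug, "metadata": metadata})
    # Keep only hashes that were uploaded more than once.
    return {h: items for h, items in groups.items() if len(items) > 1}


uploads = [
    ("abc123", b"%PDF-1.4 same bytes", {"title": "Washing machine manual"}),
    ("def456", b"%PDF-1.4 same bytes", {"title": "manual_final_v2.pdf"}),
    ("ghi789", b"%PDF-1.4 other bytes", {"title": "Court filing"}),
]

duplicates = group_duplicates(uploads)
for digest, items in duplicates.items():
    print(digest[:12], [item["slug"] for item in items])
```

Each group could then be shown as "other uploads of this same file", with the differing metadata side by side.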
If PDF metadata can be published in a structured format, it should then be possible for Calibre or Docear / Zotero to import the PDF + JSON metadata directly into the document database.
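As a rough illustration of what such a structured record might look like, here is a Python sketch that builds a JSON document for one file. The field names are hypothetical; they are not an existing PDFy, Calibre, Docear or Zotero schema, and a real importer would need a mapping to whatever those tools expect.

```python
import hashlib
import json


def metadata_record(filename, data, title=None, author=None):
    """Build a JSON-serialisable record for one hosted PDF.

    The field names are made up for this example.
    """
    return {
        "filename": filename,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "title": title,
        "author": author,
    }


record = metadata_record(
    "manual.pdf",
    b"%PDF-1.4 example",
    title="Washing machine manual",
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Publishing the content hash alongside the descriptive fields lets a client check that the metadata it downloaded belongs to the PDF it has.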
Google would do the rest.
Perhaps crowdsourcing metadata might be an idea - but that involves quite a bit more complexity, implementation-wise.
Can you tell me what the name of the offending cookie is?
I have vague recollections about Chromium and/or Firefox confounding localStorage with cookies when it comes to allowing or denying.
I have a solid track record of running non-commercial services :)
EDIT: Here's a ticket: https://github.com/joepie91/pdfy/issues/13
I know this implies complexity, but I think it'd be nice if there was a comment feature for the PDFs, perhaps even something à la Soundcloud, i.e. per section comments.
I'd love to find things such as my washing machine's manual improved by user experience through comments etc. :-D
Right now I'm quite swamped with stuff to do, but once I have some more free time I'll definitely be looking into this and some other enhancements that are currently sitting on the issue tracker.
We're focusing on annotating documents within teams, schools, etc., but public annotations to improve documents (à la rapgenius) are something we are thinking about.
(We actually have the ability to share files with a link too)
Unless you don't use either service, which is totally valid.
Will PDFy submit lists of content to search engines or be easily crawlable?
EDIT: And both are more involved than just dragging and dropping a PDF.
But good luck anyway.
1. Adds drag-and-drop to choose the file to upload
2. Cleaner permissions structure
3. The button is much bigger, and there are fewer options to do other things
4. Adds galleries to look at collections of PDF files
5. Exact same number (6) of mouse clicks to copy a link to a hosted pdf to clipboard
6. Removes log-in requirement (though to be fair, it's really easy to stay logged in to Google services)
It seems to me to be a much smaller difference than the UI difference with Dropbox - the automatic mirroring and context menu actions are huge.
Not particularly easy for the three-in-four Internet users who don't have any form of Google account, though...
Not to mention that editing of PDFs can be a somewhat painful experience :)
1. NotablePDF (https://web.notablepdf.com)
2. PDFZen (http://pdfzen.com)
3. PDF Escape (https://www.pdfescape.com/)
4. PDF Buddy (http://pdfbuddy.com)
I thought HN was objective. Maybe that's changing.
No, I don't think so, I just think a lot of people here disagree with you in your judgement of Scribd.
Hopefully this service can have the utility of Scribd as you've described and keep its lovely usability.
I wouldn't quite call this "tribalism," but perhaps it's one step short. In the old days of HN, a valid point wouldn't simply be dismissed without a response. The fact that no one is stepping up to point out why I'm mistaken is suggestive; if it continues to happen, then it's indicative of a trend. And over the last year I've seen this happen to others somewhat frequently.
No, your reasoning is specious, you're assuming "Scribd has content nobody else has, therefore if there was no Scribd, there would be no content". This is simply broken logic.
Scribd is like the cup I used to drink water from this morning; if that particular cup hadn't been made, would I die of thirst?
Scribd having a lot of unique content isn't a feature; it's an observed state. There is no reason why Scribd would exclusively be able to offer that, and any other PDF host wouldn't. It's simply a consequence of a lot of people using a service.
Scribd having amassed so much data due to their sheer size is not a point in favour of Scribd; if anything, it makes the walled-garden approach of Scribd even more frustrating.
And if it's true that the world is better off for Scribd having the paper, then it must be true that Scribd is beneficial to the world. The fact that people don't like it is irrelevant.
(I created a new account not to dodge downvotes, but because HN wouldn't let me continue submitting replies to this thread with my other one.)
Your argument is equivalent to saying: the registrar of google.com is MarkMonitor, therefore if MarkMonitor didn't exist, we wouldn't have Google.
It's really not, though. Scribd is terrible in terms of its user experience, and frankly its interface is horrid in my opinion, but it has content that isn't found anywhere else (which is a plus). Those two are not mutually exclusive, you're drawing a false dichotomy. *shrugs*
Usually when the "reply" link isn't shown, you can click on the comment's "link" link and reply from there.