
PDFy – Instant PDF Host - ridgewell
https://pdf.yt/
======
snowmaker
Hello,

I'm one of the founders of Scribd. You might be surprised to hear this, but I
applaud the PDFy service and am glad someone has built it.

Scribd is not designed to be a simple, lightweight way to host a PDF file.
Yes, this was the original idea of Scribd 8 years ago, but we've long since
left that path. We see that market as having been made irrelevant by a
combination of Google Docs, Dropbox/Box.net/etc., and better PDF readers now
built into browsers like Chrome. There may be room for someone to build an
imgur-like service for PDFs too, but that's not what we're doing.

Scribd is really good for two things:

1) Scribd is a subscription reading service ("Netflix for books") where you
can read over 400,000 professionally published books for $9 / month, including
thousands of new releases and best-sellers. It doesn't include many
programming books unfortunately (yet!), but if you like to read other things,
it's a good deal.

2) Scribd is good for serious authors and publishers who want to publish a lot
of content and organize it well. For example, the World Bank uploads thousands
of research reports to Scribd and organizes them into collections. And many
serious authors publish books and other writings with us.

We're sorry that we haven't done a good enough job of explaining who we are as
it's changed over the years. And we're sorry if you've been frustrated trying
to use Scribd for something it's not particularly good at.

To joepie91_ - I think it's cool that you've started this. We have some
experience building document hosting services, and I can see you are already
encountering some issues we've worked on, like DMCA and copyright. If you'd
like to talk, we'd be more than happy to help you out.

~~~
multiplier
Scribd was synonymous with copyright violation for years. Nice to see you're
finally making an honest business out of it.

~~~
nnnnni
...and now pdf.yt has stepped in to take the reins!

------
BorisMelnik
>"I got sick of documents getting locked up behind login walls of services
like Scribd."

Now that's a fantastic example of building a product around a real problem. If
I even see domains like Scribd anymore I won't even give it a click. No, I
don't want to sign up. I'd rather just do site:domain"thedoc.pdf" or some
other way.

I hope your product takes off and everyone uses it!

------
mgkimsal
"...much like Imgur does for images. PDFy is free, ad-free, and non-
commercial."

They must be using a different imgur than I use, because it's definitely
commercial and has ads. I can't see pdfy surviving on donations indefinitely.

~~~
joepie91_
I probably should've worded that better, but that text will be
changed/moved/removed in the near future anyway. The Imgur comparison really
only refers to the upload/sharing process, not the non-commercial bit.

As for running off donations, I've addressed that here:
[https://news.ycombinator.com/item?id=8034529](https://news.ycombinator.com/item?id=8034529)

~~~
dylz
imo the worst part is that imgur doesn't even permit direct linking in many
cases: they will 3xx-force redirect direct links to their ad filled pages

~~~
BorisMelnik
yikes I did not know that. I generally link directly to the imgur image itself
and bypass the image page. will be on the lookout for this.

~~~
dylz
Linking to the direct imgur image itself will trigger 3xx to the image page if
a few conditions are met - referrer (is popular site), enough hits to the
image (popular images trigger 3xxs), some weird cookies.

It's incredibly misleading and awful. Also, try uploading an image - you might
notice that the "direct link to image" textbox on the right bar no longer
exists, and hasn't existed in a very long time.

Check this out in term:

    
    
      $ curl -I -H 'Referer: http://twitter.com/' 'http://i.imgur.com/ZKfUroW.png'
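
The same check can be scripted. A minimal Python sketch, assuming only the standard library: it sends a HEAD request with a Referer header (like `curl -I -H` above) and reports the status code and any Location header, without following the redirect. The function name `head_with_referer` is made up for illustration, and what Imgur actually returns may depend on the conditions described above.

```python
import http.client
from urllib.parse import urlsplit


def head_with_referer(url, referer, timeout=10.0):
    """Send a HEAD request with a Referer header.

    Returns (status_code, location_header_or_None). http.client never
    follows redirects on its own, so a 3xx response is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path, headers={"Referer": referer})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    # URL and referer taken from the curl command above.
    status, location = head_with_referer(
        "http://i.imgur.com/ZKfUroW.png", "http://twitter.com/"
    )
    print(status, location)
```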

~~~
vertex-four
I just uploaded a random pic, and I definitely see the "direct link to image"
box.

~~~
dylz
Are you logged in?
[http://i.imgur.com/7k3si8M.png](http://i.imgur.com/7k3si8M.png)

~~~
vertex-four
Yes, I am logged in.

------
codezero
This looks great, but I hope you have a solid system in place to deal with the
huge number of DMCA requests you'll get. Free PDF hosting services (free file
hosting services) end up being a target of pirates, but not just that, of
automated systems that index things specific to pirated text in order to get
clicks.

If you don't get way ahead of these kinds of users, you'll end up with an
untenable drain on your resources that will make it easier to shut down than
to sustain.

~~~
joepie91_
I'm just going to wait and see how things go. I generally improvise; there's
not really much prior work (let alone documentation) on the way I run
projects, so it's mostly just a matter of figuring stuff out as I go along.
One thing that's certain is that I have no intention of shutting down the
service. I've run stuff that attracted more abuse :)

I have no intention of acting as a 'shield' against DMCA requests for this
particular service (if they're valid, they'll be followed up on, as described
on the TOS page), so ideally there shouldn't really be a problem. We'll see
how it goes.

~~~
codezero
Yep, all I am suggesting is that you be prepared to deal with things in as
automated a way as possible, like blacklisting naughty IP ranges, automating
DMCA take-downs, fingerprinting known bad content and dropping it into a black
hole etc...
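
A rough sketch of what two of those automated checks might look like, in Python with only the standard library. Everything here is hypothetical (the names `BLOCKED_NETWORKS`, `BLOCKED_SHA256`, `should_reject`, and the example range) and has nothing to do with PDFy's actual code. Note that an exact SHA-256 match only catches byte-identical re-uploads; a modified copy would need a fuzzier fingerprint.

```python
import hashlib
import ipaddress

# Hypothetical blocklists; 203.0.113.0/24 is a reserved documentation range.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
BLOCKED_SHA256 = set()


def fingerprint(data: bytes) -> str:
    """Exact-content fingerprint: the SHA-256 hex digest of the file."""
    return hashlib.sha256(data).hexdigest()


def should_reject(client_ip: str, data: bytes) -> bool:
    """Reject uploads from blocked ranges or with a known-bad fingerprint."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    return fingerprint(data) in BLOCKED_SHA256
```

After a valid takedown, adding `fingerprint(file_bytes)` to `BLOCKED_SHA256` would stop the identical file from being uploaded again.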

------
digitalengineer
Great idea. Just one little tip: I would not use the combination 'Instant' and
'PDF' in your communication. Instant PDF is a well known product by Enfocus.
It's basically a check-app for Print-optimized PDF files. As such it's a
world-wide standard designers are forced to use. Create PDF, run it through
Instant PDF, if approved it will attach a flightcheck report to said PDF and
newspapers, magazines, printers can process the file. The flightcheck searches
for common mistakes (non-embedded fonts, RGB colors, low-res photos, etc.).
Recently the Instant PDF name got absorbed by Connect, but the brand Instant
PDF is quite powerful, really.

------
alialkhatib
I'm skeptical about the longevity of a site that operates entirely on
donations, but seeing that you offer the source code for free (and it runs on
PHP, which is arguably pretty well-supported), _and_ your license is
reasonable (if crass, but who cares?)...

This is really great. Thanks for posting this.

~~~
joepie91_
Hi, PDFy owner here. I've been running a number of services for 3 years now,
without running into financial problems (see
[http://cryto.net/](http://cryto.net/),
[http://cryto.net/~joepie91](http://cryto.net/~joepie91),
[http://cryto.net/~joepie91/projectlist](http://cryto.net/~joepie91/projectlist)).
My biggest issue has actually been lack of time, rather than money :)

I've gotten pretty good over the years at running stuff on a shoestring budget
(my current hosting expenses are around 100 euro a month for everything
together at a large number of hosts, and I have plenty of resources to spare),
and as far as I can predict PDFy won't be running into any issues any time
soon.

I should add that it definitely helps to custom-develop everything - generic
solutions tend to come with a large amount of (resource) overhead, which makes
them harder to run on a small budget. By doing just about everything custom,
the overhead is minimal. Traffic is dirt cheap nowadays if you know where to
look, so that's not really a concern anymore either.

~~~
boldpanda
What do you mean by "Traffic is dirt cheap if you know where to look"?

Are you talking about where to go find traffic for your website? Do you mind
sharing?

~~~
joepie91_
As MitchellRobert already pointed out, I'm referring to traffic in the sense of
data traffic (commonly called bandwidth). A very common remark I've gotten is
"but what about the bandwidth usage?!", but nowadays it's not hard to get a
few terabytes of monthly traffic allowance for under $10.

For this particular VPS, I'm paying $9.30 a month, and it includes 2TB of
monthly traffic. Cheaper offers exist, and there are always providers like OVH
that genuinely offer unmetered traffic on the cheap (as long as it's used for
a legitimate purpose, eg. serving hosted files).
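
To put those numbers in perspective, here is a small back-of-the-envelope calculation in Python. It assumes decimal units (1 TB = 1000 GB) and a 30-day month; the function names are just for illustration.

```python
def cost_per_gb(monthly_price_usd: float, allowance_tb: float) -> float:
    """Price per GB of included traffic, assuming 1 TB = 1000 GB."""
    return monthly_price_usd / (allowance_tb * 1000)


def average_mbit_per_s(allowance_tb: float, days: int = 30) -> float:
    """Sustained rate needed to use the whole allowance evenly over a month."""
    total_bits = allowance_tb * 1e12 * 8
    return total_bits / (days * 86_400) / 1e6


if __name__ == "__main__":
    print(cost_per_gb(9.30, 2))
    print(average_mbit_per_s(2))
```

With the figures quoted above ($9.30 for 2TB), that works out to a bit under half a cent per GB, and using the full allowance evenly would mean roughly 6 Mbit/s sustained.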

------
lnanek2
Wow, love this so much more than Scribd. Already clicked on the latest uploads
and found something cool, and it just worked and didn't require login.
Amazing.

------
nkw
This is really great. Scribd can suck it. Also props for mirroring to the
Internet archive.

------
mhd
This really needs NSFW tagging. Right now, the "latest public documents" is
full of hentai. Doesn't mean that people will actually do that, but at least
giving uploaders an option wouldn't hurt.

~~~
juan_venter
Yeah, I just ran into the same problem! Opened the site at work and had to
close it ASAP!

------
jon4988
I love it. But instead of "document.pdf" as the file name on download, what if
you changed it to "[pdf title].pdf"? Several downloads in a row gets confusing

~~~
joepie91_
It should give you the original filename upon download, unless your browser
ignores/mis-parses the Content-Disposition header:
[https://github.com/joepie91/pdfy/blob/master/public_html/mod...](https://github.com/joepie91/pdfy/blob/master/public_html/modules/download.php#L108)

What browser are you using?
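
For readers unfamiliar with the header: the server suggests a download filename via Content-Disposition. Below is a minimal Python sketch of building such a header value with both a plain ASCII `filename` and an RFC 6266 style `filename*` parameter. This is an illustration only, not a port of the PHP code linked above; the function name `content_disposition` is hypothetical.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an 'attachment' Content-Disposition value for a filename.

    filename  : ASCII-only fallback for older clients; quotes, backslashes,
                control characters and non-ASCII characters become "_".
    filename* : percent-encoded UTF-8 form for clients that support it.
    """
    fallback = "".join(
        c if (32 <= ord(c) < 127 and c not in '"\\') else "_"
        for c in filename
    )
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

A browser that honours the header should then save the file under the original name rather than something generic like document.pdf.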

~~~
Hengjie
Were you letting users download it via PDF.js before which is why it was
document.pdf? It makes sense to download it via PDF.js since the file is
actually already loaded once the user renders it.

~~~
joepie91_
Oh, that might actually be it. The 'download file' button in the pdf.js menu
is a different button from the button at the bottom/right hand side of the
page (with different code behind it - the pdf.js 'download' button comes stock
with pdf.js). I haven't really tested its behaviour much, but it's quite
possible that it's calling things document.pdf.

Perhaps I should just remove the pdf.js 'download' button, seeing as there's
one elsewhere in the UI anyway.

------
fiatjaf
Well, someone could scrape all these PDF links and index them in a search
engine, then Scribd could suck it. Or will Google do it automatically?

~~~
joepie91_
The gallery is intentionally plain HTML; Google appears to be indexing all
public documents on PDFy correctly (both viewer pages and actual PDF files).
Unlisted documents get a noindex tag.

That said, I really need to write some code to extract the document metadata
from the PDF files and display it on page; right now, the only thing that
search engines (and the site itself) have to go off is the filename, which is
far from optimal.

~~~
walterbell
Ideally, metadata extraction would be done on upload and presented to the user
for optional manual correction. This would be a major contribution to
findability, because PDFs often have incorrect metadata (try searching for
anything on archive.org) or the person uploading may have metadata relevant to
a use case that is unforeseen by the document creator.

In fact, the same file could be uploaded multiple times with different
metadata. There's room here for experimentation, e.g. publishing content
hashes and linking "duplicates" that have different metadata.

If PDF metadata can be published in a structured format, it should then be
possible for Calibre or Docear / Zotero to import the PDF + JSON metadata
directly into the document database.
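
A minimal sketch of the content-hash idea, in Python: uploads with identical bytes share a SHA-256 digest, so their differing metadata records can be linked under one key and published as JSON. The record layout and function names are assumptions for illustration; no particular import format for Calibre, Docear or Zotero is implied.

```python
import hashlib
import json
from collections import defaultdict


def content_hash(pdf_bytes: bytes) -> str:
    """SHA-256 hex digest of the raw file contents."""
    return hashlib.sha256(pdf_bytes).hexdigest()


def group_by_content(uploads):
    """Group metadata dicts by the hash of the file they were uploaded with.

    uploads: iterable of (pdf_bytes, metadata_dict) pairs.
    Returns {sha256_hex: [metadata_dict, ...]}.
    """
    groups = defaultdict(list)
    for data, metadata in uploads:
        groups[content_hash(data)].append(metadata)
    return dict(groups)


def to_json(groups) -> str:
    """Serialize the groups as a JSON list, one entry per distinct file."""
    records = [
        {"sha256": digest, "metadata": metas}
        for digest, metas in sorted(groups.items())
    ]
    return json.dumps(records, indent=2)
```

Two uploads of the same file with different titles would then appear as a single entry with two metadata records.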

~~~
joepie91_
The problem is that I don't want to add any more roadblocks to the upload
process - it should be as 'instant' as possible. Even the 'public' vs.
'unlisted' selection took quite some consideration.

Perhaps crowdsourcing metadata might be an idea - but that involves quite a
bit more complexity, implementation-wise.

~~~
walterbell
How about an "edit metadata" button linked to the submitter's session cookie,
which is only active for a few minutes, similar to HN post editing? For those
who don't care, upload process stays the same. Those who want to edit have the
option, within a few mins.
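
Roughly, the check behind such a button could look like this Python sketch. The five-minute window, the session-token comparison and the name `can_edit` are all assumptions for illustration, not anything PDFy or HN is known to do.

```python
import hmac
import time

EDIT_WINDOW_SECONDS = 5 * 60  # assumed window; "a few minutes"


def can_edit(uploader_session: str, uploaded_at: float,
             request_session: str, now=None) -> bool:
    """Allow edits only from the uploading session, and only for a short time.

    uploaded_at and now are Unix timestamps in seconds.
    """
    if now is None:
        now = time.time()
    # Constant-time comparison of the two session tokens.
    same_session = hmac.compare_digest(
        uploader_session.encode("utf-8"), request_session.encode("utf-8")
    )
    return same_session and 0 <= now - uploaded_at <= EDIT_WINDOW_SECONDS
```

Uploaders who never click the button see no change to the upload flow, which keeps it as 'instant' as before.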

~~~
ratpik
Why not build a reCAPTCHA-type service around crowdsourcing PDF metadata?
Read a PDF while you wait for something to happen.

Too idealistic?

------
Pistos2
@joepie91_: Just FYI, documents do not load until you accept the site's
cookies. I don't know of a technical reason why this should be, off the top of
my head, so if there isn't one, you may consider removing that
restriction/requirement.

~~~
joepie91_
I suspect that might be caused by pdf.js. While PDFy itself will attempt to
send you a cookie (even if you just try to download the PDF), it should still
work even if the cookie is rejected, as it's not dependent on it for serving
the file.

Can you tell me what the name of the offending cookie is?

~~~
aw3c2
Works fine without accepting cookies in Opera 12. Needs JS and iframes
enabled.

~~~
Pistos2
In my case, Chromium 35.0.1916.153 .

------
jonpaul
This is awesome! But I can't see it surviving on donations. Commercialize it
please so that it survives. Tasteful ads aren't so bad like what Reddit does
or Carbon[something]... I'm just concerned you won't survive on donations.

~~~
joepie91_
I absolutely don't need to commercialize it to keep it running. In fact,
commercialization would come with an entire set of issues (and cost factors)
of its own.

I have a solid track record of running non-commercial services :)

------
bluthru
Could you please move the top toolbar to the side to create one side panel?
It's taking up a lot of real estate on laptop screens.

~~~
joepie91_
On smaller screens, it should automatically move the entire sidebar to be a
(relatively thin) footer bar instead. I should probably make the top bar
shrink in height at that point as well. I suspect you might be just slightly
above the cut-off for the small-screen layout.

EDIT: Here's a ticket:
[https://github.com/joepie91/pdfy/issues/13](https://github.com/joepie91/pdfy/issues/13)

~~~
bluthru
What I'm describing is getting rid of the top bar and putting the logo in the
side bar. Vertical screen real estate is precious.

------
Nux
Very nice initiative! Good luck with it!

I know this implies complexity, but I think it'd be nice if there was a
comment feature for the PDFs, perhaps even something à la Soundcloud, i.e. per
section comments.

I'd love to find things such as my washing machine's manual improved by user
experience through comments etc. :-D

~~~
joepie91_
I've actually been considering this. I'm not yet sure how to implement it,
though - and while I'd love to have positional comments (perhaps 'annotations'
would be a better term?), it'd require some significant modifications to the
pdf.js viewer.

Right now I'm quite swamped with stuff to do, but once I have some more free
time I'll definitely be looking into this and some other enhancements that are
currently sitting on the issue tracker.

~~~
walterbell
Contact the author of this presentation, he wrote epub.js and could provide
helpful advice.

[http://www.w3.org/2014/04/annotation/slides/Hartnell.pdf](http://www.w3.org/2014/04/annotation/slides/Hartnell.pdf)

[http://www.youtube.com/watch?v=Xtj4LYBzRiw](http://www.youtube.com/watch?v=Xtj4LYBzRiw)

Related:
[http://www.w3.org/2014/04/annotation/](http://www.w3.org/2014/04/annotation/)

------
hansy
This is really sweet, but if I may, what's wrong with using existing solutions
like Dropbox or Google Docs? Neither shows a paywall or login screen when
accessing the shared links.

Unless you don't use either service, which is totally valid.

~~~
thomas4019
Dropbox and Google Docs rarely if ever show up in Google search results.

Will PDFy submit lists of content to search engines or be easily crawlable?

~~~
joepie91_
The gallery is plain HTML; all public documents are correctly (and quite
rapidly, <1 day) indexed by Google. Unlisted documents get a noindex tag.

------
heeton
I also made NextPrev.it a little while ago with a friend. It gives you option
to host a PDF and control which page the viewer sees, perfect for
presentations or looking through contracts etc.

------
maximumoverload
I don't want to sound negative, but when I look at the "latest public
documents", most of them seem to be copyright infringement.

But good luck anyway.

------
namespace
Great work @joepie91_! I wish your service the very best and hope that it will
scale against the DMCA brigade.

------
seesomesense
Lots of copyrighted material there already. O'Reilly, Prentice Hall etc. Get
ready for a DMCA deluge.

------
ThrustVectoring
This looks like duplicated effort to me. Upload the pdf to Google Drive, set
the permissions to "people with link can view", and share the link it gives
you. Alternatively, there's a publish to web option - they're identical in
this use case. I already do something similar on my portfolio page - I have a
link that downloads a pdf version of my resume on Google Drive.

~~~
sillysaurus3
This comment is almost identical to one posted when Dropbox first premiered on
HN. They were saying Dropbox was duplicated effort. Interface matters a lot.

~~~
ThrustVectoring
That's a great point. The exact differences between the work flows:

1\. Adds drag-and-drop to choose the file to upload

2\. Cleaner permissions structure

3\. The button is much bigger, and there's fewer options to do other things

4\. Adds galleries to look at collections of PDF files

5\. Exact same number (6) of mouse clicks to copy a link to a hosted pdf to
clipboard

6\. Removes log-in requirement (though to be fair, it's really easy to stay
logged in to Google services)

It seems to me to be a much smaller difference than the UI difference Dropbox
made - the automatic mirroring and context menu actions are huge.

~~~
dingaling
> (though to be fair, it's really easy to stay logged in to Google services)

Not particularly easy for the three-in-four Internet users who don't have any
form of Google account, though...

------
seqizz
A login might be useful. You know.. to check (or delete) what I've uploaded
and see the stats later.

------
McGuffin
Whoa, opened the site to a bunch of obvious nsfw stuff. Maybe there should be
an option to filter this?

------
diegolo
Add comments on each pdf and it could become an important tool for
(anonymously) reviewing papers

------
anonbanker
Wait, is this the joepie91 that used to hang out on #anonops?

------
ourjs
I saw some sexy contents, will you delete them?

~~~
sarciszewski
Delete them? Hah

------
shire
Would be cool if we could edit PDFs.

~~~
joepie91_
Editing of PDFs really is outside the scope of PDFy. There are quite a few PDF
editors available elsewhere already, all of which will likely do the job
better than PDFy ever could.

Not to mention that editing of PDFs can be a somewhat painful experience :)

~~~
Hengjie
To build on that, here's a list of web PDF editors that are available:

1\. NotablePDF ([https://web.notablepdf.com](https://web.notablepdf.com))

2\. PDFZen ([http://pdfzen.com](http://pdfzen.com))

3\. PDF Escape ([https://www.pdfescape.com/](https://www.pdfescape.com/))

4\. PDF Buddy ([http://pdfbuddy.com](http://pdfbuddy.com))

------
sillysaurus3
Could we please stop hating on Scribd? In addition to being rude, it's also
mistaken: I've found PDFs on Scribd that simply aren't available anywhere
else, because the original links on other sites became broken. This is
somewhat common for academic papers, for example. Yes, I had to upload a PDF
in order to download the PDF from Scribd, but that's a fair trade; I chose to
upload an academic paper that perhaps might be useful to someone else.

~~~
Camillo
When you have a horrible, user-hostile service, being the only place to find
something only tends to make people hate you more.

~~~
sillysaurus3
Which is better: not being able to find something at all, or having to jump
through a hoop to download it?

I thought HN was objective. Maybe that's changing.

~~~
girvo
> I thought HN was objective. Maybe that's changing.

No, I don't think so, I just think a lot of people here disagree with you in
your judgement of Scribd.

Hopefully this service can have the utility of Scribd as you've described and
keep its lovely usability.

~~~
joepie91_
(There's a maximum nesting limit for replies?)

Scribd having a lot of unique content isn't a feature; it's an observed state.
There is no reason why Scribd would exclusively be able to offer that, and any
other PDF host wouldn't. It's simply a consequence of a lot of people using a
service.

Scribd having amassed so much data due to their sheer size is not a point in
favour of Scribd; if anything, it makes the walled-garden approach of Scribd
even _more_ frustrating.

~~~
sillysaurus5
Are you sure? If an academic paper were to become lost from the internet
because Internet Archive hadn't archived it and no other site mirrored it,
then the world wouldn't be worse off as a result? That seems dubious.

And if it's true that the world is better off for Scribd having the paper,
then it must be true that Scribd is beneficial to the world. The fact that
people don't like it is irrelevant.

(I created a new account not to dodge downvotes, but because HN wouldn't let
me continue submitting replies to this thread with my other one.)

~~~
icebraining
But the alternative to "Scribd having the paper" is not necessarily "no other
site having it". If someone uploaded it to Scribd, why wouldn't they upload it
to some other site instead?

Your argument is equivalent to saying: the registrar of google.com is
MarkMonitor, therefore if MarkMonitor didn't exist, we wouldn't have Google.

