
Show HN: ThePDFApi, a Chrome Based PDF Generation API Hosted on AWS - marcTPA
Hello HN!

It kept surprising me how much of a hassle it is to generate a PDF with decent HTML5 rendering from my SaaS apps. I tried several free libs and APIs but often ended up with botched rendering. So I set out to simplify this chore by creating an AWS-hosted, Chrome-based HTML-to-PDF conversion API. It lets a dev send HTML to the API and get a PDF back, without having to run and manage Chrome somewhere in their own infra.

I just finished the first version of my landing page and hosted PDF generation API (https://thePdfApi.com) and would love some feedback on the following points:

- Is it clear upon viewing the landing page what the product is about?
- Are there any questions you have that are not answered on the page? I'm thinking about adding an FAQ once the first questions pop up.
- Would this product provide value to you if your startup needed to generate PDFs? If not, what would you use instead?

Thanks for helping a fellow hacker out.
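The "send HTML, get a PDF back" flow can be sketched from the client side. The endpoint URL, the `html` JSON field, and the bearer-token header below are hypothetical placeholders, since the post doesn't document the real request format; the sketch only builds the request and leaves sending it to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Hypothetical endpoint: the real API's URL and field names are not documented here.
PDF_ENDPOINT = "https://api.example.com/v1/pdf"

def build_pdf_request(html: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw HTML as JSON."""
    body = json.dumps({"html": html}).encode("utf-8")
    return urllib.request.Request(
        PDF_ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
            "Accept": "application/pdf",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_pdf_request("<h1>Invoice #42</h1>", api_key="YOUR_KEY")
    print(req.get_method(), req.full_url)
    # Sending it requires network access:
    # with urllib.request.urlopen(req) as resp:
    #     pdf_bytes = resp.read()
```

Sending HTML rather than a URL means the document never has to be publicly reachable, which matters for invoices and contracts.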
======
joewils
Generating PDFs from HTML is a huge PITA. I really like what you've done.

re: "is it clear upon viewing the landing page what the product is about?"
Yes. I think your landing page copy reads well, but could use a bit of polish.
Emphasize how well PdfApi solves common margin and background rendering issues
compared to other alternatives.

re: "are there any questions that you have that are not answered on the page"
I'm not sure you need to answer all of a user's questions, but considering you
are targeting startups and developers, I'd focus on building out your API
documentation.

re: "would this product provide value to you if your startup needed to
generate PDFs" Potentially, but considering your audience
(developers/startups) I suspect most of us would tackle the issue locally with
a Chrome headless setup.

re: "what would you use instead" I've used the following to generate PDFs
from HTML:

* Chrome Headless
* WKHTMLtoPDF
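Both local options above can be driven from the command line. A minimal sketch that builds the argument lists; it assumes the Chrome binary is on `PATH` as `google-chrome` (on many systems it's `chromium` or `chromium-browser` instead) and that `wkhtmltopdf` is installed. `--headless` and `--print-to-pdf` are real Chrome switches; the function names are illustrative.

```python
import subprocess

def chrome_pdf_command(source: str, output: str, chrome: str = "google-chrome") -> list[str]:
    """Argument list for printing a URL or local file to PDF with headless Chrome."""
    return [chrome, "--headless", "--disable-gpu", f"--print-to-pdf={output}", source]

def wkhtmltopdf_command(source: str, output: str) -> list[str]:
    """Argument list for wkhtmltopdf, which takes the input first, then the output path."""
    return ["wkhtmltopdf", source, output]

def render(cmd: list[str]) -> None:
    # Raises CalledProcessError if the renderer exits with a non-zero status.
    subprocess.run(cmd, check=True)
```

Usage: `render(chrome_pdf_command("https://example.com", "out.pdf"))`. Keeping the argument list separate from the call makes it easy to log or test without launching a browser.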

This is good work. I like what you've done. Build out strong API documentation
with multiple code snippets and examples to improve your developer marketing.

------
ertand
I don't know much about the complexities of this task so excuse my question if
it's too obvious.

How is it different from/better than using puppeteer? If it's better, maybe
side-by-side (SxS) comparisons of generated PDFs could be a good selling point.

~~~
marcTPA
Thanks for leaving a comment, really appreciate it. The idea of creating a
comparison between PDFs rendered with different solutions is genius.
Definitely gonna add this.

Puppeteer would indeed come close in rendering quality. Improvements of using
my solution over puppeteer are:

1) I tweaked Chrome headless to have the fonts available to ensure that
typography renders as it should. Even emojis work!

2) you don't need to worry about installing and maintaining puppeteer and
Chrome headless into your own infra

3) I didn't really make this very clear on my landing page so far, but I'll
provide support to clients that have issues getting a certain document to
render exactly like they want.

4) Not really a benefit yet, since I wanted to launch with the MVP, but soon
I'll offer several options in the API that puppeteer itself doesn't offer, such
as multi-document PDFs, automatically generated clickable tables of contents
for longer documents, etc.

~~~
ertand
You might also want to take a look at
[https://github.com/GoogleChrome/puppeteer/issues/557#issueco...](https://github.com/GoogleChrome/puppeteer/issues/557#issuecomment-344418024)

------
andreareina
A warning from my own experience: do _not_ use web technologies for any
printing (including rendering to pdf) where positioning is critical (e.g. for
filling out pre-printed forms). Fiddling with `@media print {…}` and
`position: absolute` will work… until there's some minor change in the
rendering engine that will throw all that careful work into disarray and leave
you asking questions like, "why is this right-aligned bit of text in an
8.5-inch wide container being printed down the middle of my page?" (the
preview looked great btw). Oh, and the vertical scale was only _slightly_
short, so I couldn't just scale the page. The right answer in this case was a
package that actually spoke pdf and would flow text into a fixed-size box at a
fixed location. Oh, and once you've got the file, don't let the browser print
it either -- somehow both Firefox and Chrome wouldn't render it to the printer
correctly, and of course they would mess up in different ways.

Browsers are good at laying things out on the screen. On paper, not so much.
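The fixed-box approach boils down to doing the layout math yourself in PDF units (1 pt = 1/72 inch) instead of letting a browser decide. A minimal sketch of that math, assuming a monospaced font such as Courier (whose glyphs are 0.6 em wide) and US Letter paper (11 inches tall); a PDF library would then draw each line at the returned coordinates. `Box` and `layout_in_box` are hypothetical names.

```python
import textwrap
from dataclasses import dataclass

POINTS_PER_INCH = 72.0  # PDF user-space unit: 1 pt = 1/72 inch

@dataclass
class Box:
    """A pre-printed box, measured in inches from the page's top-left corner."""
    x_in: float
    y_in: float
    width_in: float
    height_in: float

def layout_in_box(text: str, box: Box, font_size: float = 10.0,
                  char_width_em: float = 0.6, leading: float = 1.2,
                  page_height_in: float = 11.0) -> list[tuple[float, float, str]]:
    """Wrap text to the box and return (x, baseline_y, line) in PDF points.

    PDF coordinates start at the bottom-left of the page, so y is flipped.
    Lines that do not fit vertically are dropped rather than overflowing.
    """
    chars_per_line = max(1, int(box.width_in * POINTS_PER_INCH // (font_size * char_width_em)))
    line_height = font_size * leading
    max_lines = int(box.height_in * POINTS_PER_INCH // line_height)
    lines = textwrap.wrap(text, width=chars_per_line)[:max_lines]

    x = box.x_in * POINTS_PER_INCH
    top = (page_height_in - box.y_in) * POINTS_PER_INCH
    # Approximate the first baseline as one font size below the box's top edge.
    return [(x, top - font_size - i * line_height, line) for i, line in enumerate(lines)]
```

Because every position is computed from fixed page dimensions, a rendering-engine update cannot shift the text the way it can with CSS layout.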

~~~
seanwilson
Could you go into more detail about why printing a web page doesn't do what
you want? You mean for example different versions of Chrome would screw up
your previously working layout? How about between the same browser on
different operating systems?

I've looked into JS libraries that will directly generate PDFs you can print
but each library seems to come with a lot of caveats.

~~~
andreareina
Yes, different versions would produce different layouts. Element positions and
bounding boxes would change, more so in the horizontal dimension than the
vertical. The text itself would be rendered at the proper size though. FWIW
Chrome was better-behaved in this regard than Firefox, but even a 10% shift is
too much when you need to get text into a specific box that's already been
printed.

I didn't bother to check whether the same browser version on different
operating systems would produce the same results.

------
citrablue
Have you considered offering this as a browser extension? It would greatly
increase your market size, and I have worked with (non-technical) people who
would love to be able to activate an extension, enter an email address, and
email out a PDF (of an invoice) to a colleague.

This workflow is a common one, and really frustrating: "Print -> Save as PDF
-> choose location on disk/google drive/dropbox-> Save -> switch to email ->
compose email -> enter email address -> enter subject -> add attachment ->
navigate to saved location (if I can remember it) -> Send".
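Once the PDF bytes are in hand, the manual save/compose/attach steps collapse into building one message. A minimal sketch using Python's standard `email.message.EmailMessage`; the addresses and filename are placeholders, and delivery (for example via `smtplib.SMTP.send_message`) is left out because it needs a mail server.

```python
from email.message import EmailMessage

def build_pdf_email(pdf_bytes: bytes, sender: str, recipient: str,
                    subject: str, filename: str = "invoice.pdf") -> EmailMessage:
    """Compose an email with a PDF attachment, ready to hand to an SMTP client."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("Please find the attached PDF.")
    # Binary attachments need an explicit MIME type; add_attachment base64-encodes them.
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf", filename=filename)
    return msg
```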

You could even add a premium feature that would hit a URL on a schedule, to
automate report sending to managers (e.g. of Yahoo Ads or any other platform
with similarly terrible reporting).

My manager and I at my old place of work used to spend 1-3 hours/month, times
however many people had access to his credit card for their subscriptions.

~~~
marcTPA
Interesting idea, I've been planning to build a save as PDF extension around
it as a case study.

Combining the PDF functionality together with an email function sure is
interesting, gonna think this over a bit more. Thanks!

------
smhg
Simple layout and fast loading time: big plus. I think the copy could be
better. Put more focus on the strengths. Don't use the link-blue color if it
isn't a link.

About the PDF results: I get mobile versions of websites a lot, but I guess in
normal use cases you won't even request those.

~~~
marcTPA
Thanks a lot for the constructive feedback. Noting them down on my priority
list.

You're right, the most common use case would be that a client sends HTML
instead of a URL to the endpoint. This way a PDF can be created from data
that's not publicly exposed on the internet (think invoices, contracts, etc.)

I also don't store any of the data you send to the API, so as not to further
contribute to your GDPR nightmares.

------
marcTPA
A little addendum: I'm also contemplating creating a manual service where you
could send me a document in any format (JPG, PSD, Microsoft Word) and I'd
create a REST endpoint that you can call with the data that you want to have
inserted into the document. The output of this REST endpoint would be the
binary PDF data.
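One way to insert request data into a fixed document is to convert the document into an HTML template with named placeholders and fill them per request before rendering. A minimal sketch with the standard library's `string.Template`; the invoice template and field names are hypothetical, and every value is HTML-escaped so customer data can't inject markup into the PDF.

```python
from html import escape
from string import Template
from typing import Mapping

# Hypothetical template; $-placeholders mark where per-request data goes.
INVOICE_TEMPLATE = Template(
    "<html><body>"
    "<h1>Invoice $number</h1>"
    "<p>Bill to: $customer</p>"
    "<p>Total: $total</p>"
    "</body></html>"
)

def render_invoice(data: Mapping[str, object]) -> str:
    """Fill the template; raises KeyError if a placeholder has no value."""
    safe = {key: escape(str(value)) for key, value in data.items()}
    return INVOICE_TEMPLATE.substitute(safe)
```

The resulting HTML string is what an HTML-to-PDF endpoint would then render. Using `substitute` rather than `safe_substitute` makes a missing field fail loudly instead of leaving a stray `$customer` in the PDF.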

This way your dev team would not have to invest any time at all in the
creation of PDF documents. Shoot me a mail at the email address in my profile
if you want to know more.

~~~
vageli
How would you insert data at arbitrary places in the document? Or is this more
like a form-filling API or something else?

------
jexah
Since there is no contact form on the site, I figure I'll ask here. How well
does it manage different margins on each page? E.g., a cover page with no
margins, but the rest of the document with margins. Does it support a table of
contents and page numbers? What about other features like JS running in the
page header/footer components?

Edit: Holy hell, I was just reading some more comments mentioning the price and
then had a second look. $79 a month for 10 PDFs. Yeah, I think I'm gonna
go with spending 10 minutes writing a WebAPI to access my PDF generation API.
For reference, it took me two days to land on Puppeteer, a day to configure it
how I wanted, and costs me $5 a month to do about 50 PDFs per day (not upper
limit, that's just how many we need in a typical business day, I don't know
what the box is capable of).
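Taking the comment's figures at face value (the thread doesn't say what period or unit the "10 PDFs" on the $79 plan refers to) and assuming about 22 business days a month for the self-hosted box, the per-PDF comparison works out as below. All constants are assumptions lifted from the comment.

```python
# Figures quoted in the comment above; BUSINESS_DAYS_PER_MONTH is an assumption.
HOSTED_USD_PER_MONTH = 79.0
HOSTED_PDFS_PER_MONTH = 10
SELF_HOSTED_USD_PER_MONTH = 5.0
PDFS_PER_BUSINESS_DAY = 50
BUSINESS_DAYS_PER_MONTH = 22

def cost_per_pdf(usd_per_month: float, pdfs_per_month: int) -> float:
    """Average cost of one PDF given a flat monthly price and monthly volume."""
    return usd_per_month / pdfs_per_month

hosted = cost_per_pdf(HOSTED_USD_PER_MONTH, HOSTED_PDFS_PER_MONTH)
self_hosted = cost_per_pdf(SELF_HOSTED_USD_PER_MONTH,
                           PDFS_PER_BUSINESS_DAY * BUSINESS_DAYS_PER_MONTH)
print(f"hosted: ${hosted:.2f}/PDF, self-hosted: ${self_hosted:.4f}/PDF")
```

This prints `hosted: $7.90/PDF, self-hosted: $0.0045/PDF`. The self-hosted figure leaves out the roughly three days of setup work mentioned in the comment.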

------
eschutte2
I like it! I'm about to release a related service soon so I'm interested in
this space.

It's very fast! Are you caching at all?

Since people probably want to use this with private data, would they usually
be sending HTML strings to you, vs URLs?

Why's the "i" in API lower case?

~~~
marcTPA
Thanks a lot for the feedback.

I'm not caching at all since I do not want to store any potential confidential
data on my servers. The main reason that it's fast is that there are several
instances running Chrome headless behind a load balancer.
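The comment doesn't say how the load balancer picks an instance; round-robin is one common strategy. A minimal sketch of that choice with hypothetical backend addresses (9222 is Chrome's usual remote-debugging port); a managed balancer would normally do this for you, and the sketch ignores health checks.

```python
from itertools import cycle
from typing import Iterable

class RoundRobin:
    """Hand out renderer instances in turn so requests spread evenly."""

    def __init__(self, backends: Iterable[str]) -> None:
        backends = list(backends)
        if not backends:
            raise ValueError("need at least one backend")
        self._next = cycle(backends)

    def pick(self) -> str:
        return next(self._next)

# Hypothetical headless Chrome instances.
pool = RoundRobin(["10.0.0.11:9222", "10.0.0.12:9222", "10.0.0.13:9222"])
```

Since nothing is cached, each request is independent and any instance can serve it, which is what makes such a simple rotation workable.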

The main use case would indeed be to send HTML to the API instead of a URL. I
just didn't add this use case to the landing page API tester, but it's
definitely supported.

The i in APi is lowercased because I thought it looked cute :)

------
schappim
I literally just built this for rendering invoices etc yesterday.

I ended up using wkhtmltopdf on AWS Lambda. wkhtmltopdf isn’t nearly as nice as
PrinceXML or headless Chrome, but for the most part it gets the job done.

What did you end up using for your stack?

~~~
marcTPA
That's a coincidence :) I used wkhtmltopdf in the past as well but it
struggled when dealing with more complex layouts.

My stack is a tweaked Chrome headless on Linux in a docker container, exposed
by a Node API.

------
starptech
Generating PDFs from HTML and web technologies is nothing new. There are tons
of PHP and Node.js libs, e.g., I could host a service in minutes with the help of
[https://github.com/GoogleChrome/puppeteer](https://github.com/GoogleChrome/puppeteer)
in only a few lines of code. Hosting that service on DigitalOcean would reduce
the cost to ~$5/month. Cheers!

~~~
marcTPA
Thanks for your feedback.

You are right that generating PDFs from HTML and Web Technologies is nothing
new. Most existing PHP and node libs sadly don't provide great rendering once
you have a document that consists of modern CSS and HTML or modern image
formats such as SVG.

Puppeteer would indeed provide the same rendering quality but then you're
responsible for the maintenance of the running instances. I'm hoping to make
my users' lives easier by taking this task out of their hands.

------
taneltahepold
As this is a developer-oriented product then I would like to see API
documentation. Also, in my opinion, the pricing is a bit hidden at the moment.

I have a similar API product for generating PDFs. We saw that the generation
part is easy; it gets crazy when your customers start asking for
customizations of their labels, invoices, packing slips, contacts, etc.

------
koliber
Ha! I recently launched an analogous service for converting CSVs into Excel
files so that web sites can offer users rich spreadsheet downloads instead of
crummy CSV files -- goodgrids.com. I posted it to Show HN and just found this!
This is great.

~~~
marcTPA
Just checked your landing page, great job at making the advantages of your
service crystal clear. And of course congrats on launching, always scary to
show your baby to the world.

------
richjdsmith
Looks good! Pricing is a bit steeper than I'd expected, but I've also never
tried dealing with the pains of trying to generate PDFs. What stack is it
built on?

~~~
marcTPA
Thank you. Pricing is still experimental, I'm hoping that the easy API
interface and the lack of maintenance that my clients would need to do provide
a lot more value than the monthly cost.

The stack is Linux on Docker, running a tweaked version of headless Chrome
with an API created in Node.

------
kierenj
This is probably a general headless-Chrome question, but - with something like
this, how would you go about specifying margins, page breaks, etc?

~~~
marcTPA
For the margins there is a global setting in the API that will set the
specified margins on the entire document. You can specify additional margins
on top of this by setting a margin in css.

Page breaks are controlled by the CSS properties page-break-after,
page-break-before and page-break-inside.
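The thread doesn't show the API's global margin parameter, so this sketch uses the standard CSS `@page` margin as a stand-in and generates the page-break rules named above. `print_css` and the selectors are hypothetical. Current CSS also spells these properties `break-before`, `break-after`, and `break-inside`, with the `page-break-*` names kept as legacy aliases.

```python
def print_css(margin_mm: float, break_before: list[str], keep_together: list[str]) -> str:
    """Build a print stylesheet: page margin plus page-break rules."""
    rules = [f"@page {{ margin: {margin_mm}mm; }}"]
    for selector in break_before:
        # Start each matching element on a fresh page.
        rules.append(f"{selector} {{ page-break-before: always; }}")
    for selector in keep_together:
        # Avoid splitting the element across two pages.
        rules.append(f"{selector} {{ page-break-inside: avoid; }}")
    return "\n".join(rules)

css = print_css(15, [".chapter"], ["table", "figure"])
html = f"<style>{css}</style><section class='chapter'>Chapter 1</section>"
```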

------
ernststavrob
How about not allowing file:// URLs?

------
stevekemp
Do you have a security contact address?

_Edit_: Emailed your contact@brainhashed address.

~~~
stevekemp
Now that it has been fixed, I'll say the site previously allowed you to enter
URLs of the form `file:///etc/passwd`, which were then rendered in the PDF
output.

In short: arbitrary local file inclusion.
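A minimal sketch of the kind of check ernststavrob suggested: validate the scheme before handing a URL to Chrome. The helper name is hypothetical, and it's only a first line of defense. It does not stop `http://` requests to internal addresses (such as the AWS instance metadata endpoint at 169.254.169.254), nor `file://` references inside submitted HTML, so the renderer itself still needs to be sandboxed.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}

def is_allowed_source_url(url: str) -> bool:
    """Accept only http(s) URLs that name a host; reject file://, data:, etc."""
    parts = urlsplit(url.strip())  # urlsplit lower-cases the scheme
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.hostname)
```

An allow-list is used rather than a block-list of `file`, because browsers accept other local schemes as well and a block-list would have to anticipate every one of them.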

------
swyx
yes, it looks very nice. i dont have a need to generate pdf's unfortunately so
i'm not your target market. good luck. great to productize your side effects.

~~~
marcTPA
Thanks for the kind words!

------
teddyqwerty
I tried to generate a PDF on the landing page and got a 422 error.

~~~
marcTPA
What URL did you try to generate a PDF from? The PDF APi replies with a 422
when it cannot connect to the requested URL.

This could have a few causes; the most common one would be that the site you
want to generate a PDF for blocks connections from AWS IP ranges.

