Show HN: An API for filling out PDF forms (formapi.io)
209 points by nathan_f77 on Oct 8, 2017 | 85 comments

Hi HN! I started FormAPI because I've been living in Southeast Asia for the last few years, and I've had to fill in a lot of visa application and extension forms. I want to build a service where you can fill in visa forms online, so I built FormAPI as the first step towards that project.

I also used to work at Gusto (a payroll company), and we had to fill out a lot of tax forms, which took a lot of work. I know this is something that many financial and legal startups have to do, so hopefully FormAPI can make this process a lot easier.

Let me know if you have any questions or feedback!

This is absolutely great and it will enable A LOT of very useful applications.

One question though: I assume that the automatic detection of form fields does not always work, especially if the PDF does not exist in a "fillable" format. Is that interpretation right?

Hi, at the moment we can only import fillable PDF forms (AcroForms or XFA.)

In the future we might implement OCR and machine learning to automatically detect fields in scanned or photographed PDFs. I think it could also be useful to build an index of fillable PDF forms (e.g. from the IRS and USCIS), and then use those original fillable PDFs whenever we detect a matching scanned form. We might also build a public database of fillable forms.

But I don't want to build any of those features unless I have a customer who actually needs them. (Please send an email to hello@formapi.io if this is something you would use!)

The website seems to suggest that non-fillable/scanned forms can be programmed/mapped with fields.

Back in 2000-2001, I worked up a prototype of something similar using Adobe's FDF SDK [1] for PDF. We had a PDF-centric application that would render to PDF using an HTML -> PDF library and it was hugely problematic, so the prototype definitely made things a lot easier and, had the business not run out of money at about that time, it would have been the direction we headed with the product.

So, how similar is this to FDF? Does it use FDF under the hood? If not, what are the pros and cons of this over FDF?

[1] http://www.adobe.com/devnet/acrobat/fdftoolkit.html

I decided not to use FDF or AcroForms / XFA to create the output PDFs. I just thought it would be easier to render the data as plain text, shapes, and images. However, I do parse form fields from AcroForms / XFA forms.

I'm not sure about the pros and cons. If a customer needed to generate PDFs with data in the FDF format, then that's something that I would be happy to build.
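To make the "plain text, shapes, and images" approach concrete, here is a minimal sketch of drawing values onto a page with ordinary PDF text operators instead of setting AcroForm field values. The field layout, the `/F1` font resource name, and the helper names are all assumptions for illustration; this is not FormAPI's code, and a real implementation would also need to register the font and merge the stream into the page.

```python
def escape_pdf_text(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_overlay(fields, values, font="/F1", size=10):
    """Build a content-stream fragment that draws each value at its field position.

    `fields` maps a field name to (x, y) in PDF points; `values` maps a field
    name to the string to draw. Both structures are hypothetical.
    """
    ops = []
    for name, (x, y) in fields.items():
        if name not in values:
            continue  # leave unfilled fields blank
        ops.append("BT")                 # begin text object
        ops.append(f"{font} {size} Tf")  # select font resource and size
        ops.append(f"{x} {y} Td")        # move to the field origin
        ops.append(f"({escape_pdf_text(values[name])}) Tj")  # show the string
        ops.append("ET")                 # end text object
    return "\n".join(ops)


# Hypothetical field positions and data.
fields = {"full_name": (72, 700), "passport_no": (72, 680)}
values = {"full_name": "Nguyen (Test) User", "passport_no": "X1234567"}
stream = text_overlay(fields, values)
print(stream)
```

Because the output is just drawing instructions, the result is not editable, which matches the trade-off described in this thread.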

IMO integrating this into a system that already knows these things about a user would be super powerful.

Obviously you can start with the most basic things like name, address, birthdate, etc, but to the extent that you can capture data every time a user fills out a form it becomes akin to a field linking problem that you can generalize across users.

Obviously the privacy folks here will hate this idea, but a system that could fill out forms for you because it already knows most of the structured data about you would be amazing.

It would be useful more as a tool to help migrate away from e-paper workflows, i.e. emailing PDF scans of legacy paper forms, and towards smarter architectures.

For example: suppose a customer has a workflow where they fill out a PDF form and email the completed PDF to an externality (e.g. a supplier, or another department whose workflow they can't control). So you build a modern UX that uses the PDF API to fill out the PDF with the data from the UI, and then emails the PDF to the externality, just like before, so that they're on board with the new UX (since it doesn't affect them at all).

Once you build the new UX, you can then sit with your externality and figure out what they need in order to hook into the modern system. After all, they benefit from a single queryable source of truth, just like anybody else, and the customer is generating that source of truth when they put their info into the UI which ultimately emails the PDF. Once you build the UX for the externality, there's no need to retrain your original customers - they can keep using the modern UX you built for them before. All you have to do is update internally to make sure you're updating the sources the externality's UX expects, stop using the PDF API and emailing the results, and your migration is finished in a pretty simple way that is transparent to the customer.

This is what SeamlessDocs [1] are doing for government forms, and I've been thinking about doing something similar for immigration.

[1] https://www.seamlessdocs.com/

I think there are much greater business opportunities beyond immigration.

I mean, please do it for immigration, that's a good deed.

But I think the money is in using this technology in order to build "fill-in wizards" for businesses, especially law firms and other "conservative", paper-driven industries.

If this is done without an API, I feel like this is worse, because now everyone is locked into using a specific web app which has no real incentive to improve. At least everyone can generate PDFs.

That's what I would like to do for the immigration service that I'm planning to build on top of FormAPI.

For example, every time you visit Vietnam, you have to fill out this visa application form: https://app.formapi.io/templates/c59fc7c0cc4df684a13cf4d04e6... (That's a working example.)

I would love to be able to save my details and automatically fill out most of the fields in these forms.

Very cool! I had a similar idea, taking a smartphone hovering over a form, automatically detecting form type using CNNs, performing OCR and filling in some example data showing how the form has to be correctly filled. There seems to be quite a lot of potential in business for this ;-)

As someone who has spent tireless hours doing the same work (gov forms PDF generation), I love seeing this tool. I’ve been stuck using wkhtmltopdf for too long. Unfortunately, I cannot use FormAPI for my projects at work, but I will give the service a try to give my feedback!

Thanks for your comment! If it helps, we can offer licenses for on-premise hosting, and white-labeled solutions. (Please contact enterprise@formapi.io for more information.)

I look forward to hearing your feedback!

Very cool! Do you use the pdf standard form controls or is that too limited for this?

My lawyer seems to have a really good system (probably white-labeled), but the PDFs it generates have issues with annotations. I have to print them using Acrobat, otherwise checkboxes are sometimes missing and hidden annotations are visible.

I'm parsing the fields from PDF forms, but I don't use them in the generated PDFs. In the end it was just easier to output plain text, shapes, and checks, etc, instead of trying to work with AcroForms and XFA.

(But if someone needs to generate a PDF that uses editable forms, then I can add support for that.)

Your lawyer's system sounds interesting, and I can relate to those rendering issues. PDFs are incredibly complex - the PDF reference [1] has 1310 pages, and I've had to read a lot to understand things like fillable forms and transformation matrices. I ended up fixing a bug in a PDF library where all the PDFs from Google Chrome were flipped upside down - https://github.com/prawnpdf/prawn-templates/pull/20

[1] http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdf...
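For anyone wondering how a transformation matrix can flip a page upside down: PDF matrices are written as six numbers `[a b c d e f]`, and a point maps to `(a*x + c*y + e, b*x + d*y + f)`. The sketch below applies a vertical flip of that form. The page height and the specific matrix are assumptions for illustration, not the actual values involved in the prawn-templates bug.

```python
def apply_matrix(matrix, point):
    """Apply a PDF transformation matrix [a b c d e f] to a point (x, y)."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def vertical_flip(page_height):
    """Matrix that mirrors the y axis: a y-down coordinate becomes y-up."""
    return (1, 0, 0, -1, 0, page_height)


page_height = 792  # US Letter height in points (assumed for the example)
flip = vertical_flip(page_height)

# A point 100 pt below the top edge in a y-down system...
flipped = apply_matrix(flip, (72, 100))
print(flipped)  # ...lands 692 pt above the bottom edge in PDF's y-up system.
```

If a library ignores a matrix like this when importing a page, anything it draws on top ends up mirrored relative to the original content, which is one way an "upside down" result can arise.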

I read that you don't output fillable forms, but you can. May I ask what you have in mind? I was working on a similar project but I could not figure out a way. Ended up making an XDP file...

Hi, yes we can output fillable PDF forms that include the fields you have configured. And yes, you were on the right track with XDP. We would generate an XDP form and store the XML under AcroForm > XFA.

That's actually a great idea. If anyone is looking for a cheaper alternative to Adobe LiveCycle Designer, you could use FormAPI to create fillable PDF forms. Please send us an email at hello@formapi.io if that's something you would use. This feature would be available when you subscribe to our Starter plan, and you could create unlimited fillable PDFs.

how long did this project take to build?

I've been working on this for about 2 months, although I feel like I should have launched an MVP much sooner.

I was tempted to wait even longer and redesign everything, but I want to make sure I'm building something that people need.

wow, that is fast, you might be one of those rare 10x developers! congrats and wish you success!

From the terms:

FormAPI is not PCI DSS Certified or HIPAA compliant.

You understand that the technical processing and transmission of the Service, including your Content, may be transferred unencrypted and involve (a) transmissions over various networks; and (b) changes to conform and adapt to technical requirements of connecting networks or devices.

Unfortunately, this is a service that sees the data being entered on the form. If it processed a blank PDF form and sent back something you ran locally to generate a filled-in form, that would be great. But as a service, it sees too much user data.

Hi, thanks for the feedback! I am planning to move our hosting to Aptible [1], and we will become PCI certified and HIPAA compliant.

If anyone is interested in FormAPI, but requires PCI certification and HIPAA compliance, please send an email to compliance@formapi.io, and we'll let you know when we are ready. You can also send an email to enterprise@formapi.io to inquire about on-site hosting.

[1] https://www.aptible.com/

agree... this looks amazing, but no way I could ever use it... can't send user data to a service like this... DOA...

Thanks for the feedback! I mentioned this in another comment, but we have licenses for on-site hosting, and can provide Docker or virtual machine images. In the future, we are planning to move our hosting to Aptible, and will become PCI certified and HIPAA compliant.

What's with the 3 ellipses? A few of my friends text that way and it feels awkward.

Just pointing it out so please don't shoot the messenger, but I've noticed that a lot of people from China will text like this. FWIW I don't think the ellipses have the same awkward meaning that's conveyed in Western conversations.

It’s standard for unexplicit continuation of a thought...

I honestly can't read this without visualizing someone rolling their eyes after every thought while finishing their sentence sarcastically with a vocal fry lol.

Downvotes? Seriously..?

To the OP: Can I ask what your process was? A project like this obviously takes a lot of time. How certain were you that you had a product people wanted before embarking on the dev?

I mentioned this in another comment, but I've had to fill in a lot of visa application and extension forms while living in Southeast Asia. I wanted to make that process easier, so this was the first step towards that goal.

I've also worked at companies where we built something similar in-house. So while this is a very niche product, I know there are at least a few developers out there who may find it useful.

Building something like this is also a great way to find new problems to solve. I think even if it's a niche idea or something you're not sure about, it's better to start building something than to wait for the perfect idea.

Thanks for sharing that. I think using one's job to identify potential markets is a great idea.

This api has inspired this idea:

Ever notice most new health care provider visits still require a ton of paper work?

Use a mobile app to take a picture of their arbitrary form, use OpenCV to interpret it as a real data structure, autofill it from a personal database, print it out on one of those tiny portable printers, and hand it back to them.

To try and make it viral: When used to process a form, the app also generates a web site for that office. Then anyone can search for dr. lowtech’s office and use the site to fill out the forms and print them at home before coming in.

Each printed form would have a header/footer asking the provider to register on the site, at which point it could become an official entry point into the system and leverage other efficiencies. Some HIPAA stuff applies but nothing insurmountable.

I guess everyone else here also has 10 ideas a day pop into their head for a startup. So on to the next one...

@nathan_f77: That's an unusually large plans and pricing page:


I'm not saying that it's bad, but I think it's strongly worth your while A/B testing that idea because it's quite granular.

You may find that with just 3 plans you would leave less money on the table over time.

(Of course you would also have a "call for enterprise options" link.)

Another option (and I did this with pluggio) is to let, say, 50 people in for a very low special price and then observe exactly how many API calls they make.

Then, based on that data, work out your usage bands and magic levers.

Also, I note your only magic lever is usage but I bet there are other levers you can use to get people from one plan to another.

Anyway, I hope this is useful feedback!

Great job on the site :)

Thanks for the great feedback! We definitely need to test various pricing and sign up pages. And yes, we will be watching how our customers use the service, and have already discovered a lot of things that we need to improve.

Hey guys,

Not trying to plug our co but it's totally relevant here... we built a field detection algorithm to find the fields in any form. So you can drag n drop your doc and start filling it in and sign it within a couple of seconds. Feel free to check it out. Thanks guys! https://www.paperjet.com/

This is great! How do you detect the fields? Are you using image processing?

Most forms contain personally identifiable information. Some contain extremely sensitive information; for example, the kinds of forms that you describe (visas, etc.) contain national ID numbers and so forth. The idea of providing this to a third-party web site scares the daylights out of me. Is it possible for me to obtain the source code on GitHub and host it myself?

I totally agree, these forms can contain very sensitive information. Actually, there is some information that is too sensitive: we are not currently PCI DSS Certified or HIPAA compliant, so we cannot handle credit card details or protected health information.

We are very serious about protecting PII. One of the ways we achieve that is by using battle-tested frameworks and libraries with the default settings (Rails and devise), and not writing our own code for crypto and security.

By default, we delete generated PDFs and any associated data after 7 days. This can be configured in the template settings [1], so you can make the retention period much shorter. You can also immediately delete any submitted data by making an API request [2]. (Disclaimer: Data may be present in our automated database backups for up to 2 weeks.)

Finally, FormAPI is not open source, but we can provide a license for a self-hosted installation. For enterprise plans and on-site hosting, please contact: enterprise@formapi.io

[1] https://formapi.io/docs/template_editor/settings.html#expire...

[2] https://formapi.io/docs/api/expire_submission.html
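The retention rules above come down to simple date arithmetic. This is a rough sketch using the numbers quoted in this comment (7-day default, backups for up to 2 weeks); the function names are made up and this is not FormAPI's implementation.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_RETENTION = timedelta(days=7)  # default retention stated above
BACKUP_WINDOW = timedelta(weeks=2)     # backups may hold data this long


def expires_at(created_at, retention=DEFAULT_RETENTION):
    """When a generated PDF and its data become eligible for deletion."""
    return created_at + retention


def is_expired(created_at, now, retention=DEFAULT_RETENTION):
    """True once the retention period has fully elapsed."""
    return now >= expires_at(created_at, retention)


def latest_backup_purge(deleted_at):
    """Upper bound on when deleted data could still be present in backups."""
    return deleted_at + BACKUP_WINDOW


created = datetime(2017, 10, 8, tzinfo=timezone.utc)
deleted = expires_at(created)
print(deleted.date(), latest_backup_purge(deleted).date())
```

With the defaults, data could exist somewhere for up to three weeks after submission; shortening the retention period or expiring a submission via the API only moves the first date.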

> We are very serious about protecting PII.

Problem is, everyone (from Equifax to Yahoo) says that. If I can't trust a huge multi-billion-dollar corporation, I would certainly be nervous about trusting a very new startup with much more than my email address.

Yes, that is very true.

I am planning to move my hosting to Aptible [1], and will become PCI certified and HIPAA compliant.

If anyone is interested in FormAPI, but requires PCI certification and HIPAA compliance, please send an email to compliance@formapi.io. We'll let you know when we are ready. You can also send an email to enterprise@formapi.io to inquire about on-site hosting.

[1] https://www.aptible.com/

Congratulations on your launch!

On the one hand, I understand the concerns about PII in your app.

On the other hand, I'd be willing to bet there are a ton of line-of-business apps that don't handle PII that could benefit from this (purchase orders, B2B shipping forms, etc.).

Unless you have investors, I would suggest waiting until you get >100 paying customers to find out whether you need to pay the premium for PCI/HIPAA hosting.

Thanks for the feedback @runako, yes PCI and HIPAA compliance is very expensive. I will need to hear from more customers who need this level of security and compliance before I can afford to make the jump.

This looks really good and the demo is really effective. Best of luck to you!

Was DocRaptor a big influence on how you priced it?

Shameless plug: I've also recently been working with PDFs - a PDF template creator and API [0] - but I have a different use case in mind.

[0] https://fetchpdf.com

I've been working on a similar tool for a while now. Editor + API to create PDF and HTML documents [0]. Our main difference is that you can integrate the editor with your SaaS and allow your users to design the templates and generate documents via API.


Hi I like this.

Are you aware of https://www.docmosis.com/ ?

How do you compare to them?

I wasn't aware of them, no. They're definitely more established and have more features.

The goal of FetchPDF was to let web applications give their users a way to create and edit PDF templates in the browser. e.g. to customise system templates for invoices, certificates etc.

I still have a lot of work to do, it would seem.

Don't let it discourage you. They have their own set of problems, the first one being that they use LibreOffice for templates, a piece of software that not everybody likes or is familiar with.

I've been a happy customer of docmosis, but I'm still interested in using your service if it provides easier templating.

Please feel free to sign up for the trial.

Send through an email to the support email address and I'll upgrade the trial and set an API key without you needing to put in billing details.

But be warned - it's still in MVP stage and may disappoint! (It also means that it's very receptive to feedback :)

All right, I signed up :)

The big selling point for me is that the template generation is right there. It is much easier to explain to people than saying: All right, so you download LibreOffice, create a document, upload it here, name it like this or that...

Hi, I might not find time for this at the moment, but I'll remember your service next time we need a new document.

Hi, thanks very much! Yes, I really liked DocRaptor's sign-up page and pricing.

FetchPDF looks great! Best of luck to you as well!

Hey Nathan! Great to see another product in the PDF space. I see a few similarities to PDF Otter[0] and was wondering if you drew any inspiration from it.

[0] https://www.pdfotter.com

Hey, I did come across PDF Otter a few weeks ago when I found your Show HN post [1]. There was some great feedback in the comments. I started building FormAPI a few months ago, and should have done more market research!

[1] https://news.ycombinator.com/item?id=14805750

No worries! Feel free to email me if you want to collaborate: Mariusz at PDF Otter dot com

Looks good. I think some real life use cases would help on the sales site.

So the use case is you have a company that only accepts PDFs as an input and you have a digital data set that you need to get each individual row into one PDF?

How do you handle signatures?

Not OP but one use case is charitable grants. I made an app for a homeless agency my girlfriend used to work at for automatically filling out PDFs. They spent an inordinate amount of time filling out grant application PDFs, most of which requested essentially the same info. We made a simple WPF app that let them tag each field on the PDF as a particular info type (e.g. address, or address line 1, or statement of organization's purpose) then click a button to autofill. They keep all the data in a single underlying spreadsheet for convenience and just update that, rather than treating each of their hundreds of applications as bespoke.
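The tagging idea described above can be sketched in a few lines: each PDF field is tagged with an info type, and the values come from a single shared table. The original was a WPF app, so this Python version is only an illustration, and the field names, info types, and sample data are all invented.

```python
import csv
import io

# Hypothetical organization data, kept in one spreadsheet-like table.
ORG_DATA_CSV = """info_type,value
org_name,Example Shelter
address_line_1,123 Main St
purpose,Provide emergency housing
"""


def load_org_data(csv_text):
    """Read the shared table into a dict of info type -> value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["info_type"]: row["value"] for row in reader}


def autofill(field_tags, org_data):
    """Fill every tagged field that has data; report the ones that do not."""
    filled, missing = {}, []
    for field_name, info_type in field_tags.items():
        if info_type in org_data:
            filled[field_name] = org_data[info_type]
        else:
            missing.append(field_name)
    return filled, missing


# Tags for one hypothetical grant application form.
grant_form_tags = {
    "Applicant Name": "org_name",
    "Street": "address_line_1",
    "Mission Statement": "purpose",
    "EIN": "tax_id",  # no value in the table yet
}

org_data = load_org_data(ORG_DATA_CSV)
filled, missing = autofill(grant_form_tags, org_data)
print(filled)
print(missing)
```

Updating one row in the table then updates every form that tags a field with that info type, which is the convenience the comment describes.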

Also I work at an accounting firm and we have a home grown solution that does precisely this for all of the standard IRS forms we fill out. For a larger firm like mine, we can afford to have someone code the solution in house, but for smaller firms or for in-house accounting departments, this could be worthwhile, depending on the price point.

Thanks! Totally agree, there's a lot of room for improvement on the landing page.

That use case sounds right. I've mainly been thinking about tax and immigration forms, or filling out financial and legal documents, etc.

There's no special field type for signatures at the moment, but you can upload a signature image, or use a text field with a handwriting font (Dancing Script [1]).

I will probably add the signature_pad [2] library to the online forms in the near future, and will add a special "signature" type. There might also be a new flow where you can request a signature via the API, and we'll send an email to the person who needs to sign the document.

(I actually just need to get a few more customers first, so that I'm building something that people need.)

[1] https://fonts.google.com/specimen/Dancing+Script

[2] https://github.com/szimek/signature_pad

There is also php-pdftk (https://github.com/mikehaertl/php-pdftk). Great software for filling PDF forms.

I like this service as well: https://www.platoforms.com.

Would be interested to hear a bit about the internals. Which PDF libraries you tried, which one you ended up using, why, etc.

I built the site using Ruby on Rails, and ended up trying all the PDF libraries I could find. In the end I had to use six different libraries and tools to solve various problems with parsing or transforming PDFs.

The libraries I use are: pdf-reader [1], prawn [2], prawn-templates [3], origami [4], qpdf [5], and pdftk [6].

There's a new PDF library for Ruby called hexapdf [7], and it looks really promising. I think it might even be able to replace all six of these libraries, but it's released under the AGPL license, and commercial licenses are not available yet.

[1] https://github.com/yob/pdf-reader

[2] https://github.com/prawnpdf/prawn

[3] https://github.com/prawnpdf/prawn-templates

[4] https://github.com/gdelugre/origami

[5] http://qpdf.sourceforge.net/

[6] https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/

[7] https://hexapdf.gettalong.org/

Will there be an API sometime in the future? Also will it be possible to whitelabel the template editor and/or make it embeddable?

this is an interesting idea. i like how it is declarative.

acrobat FDF[1] and Apache PDFBox[2] provide something similar but are not declarative. i'd be interested in seeing an approach with mozilla/pdf.js ... though i'm not sure how usable it would be.

[1] http://www.adobe.com/devnet/acrobat/fdftoolkit.html

[2] https://pdfbox.apache.org/

Is this web form to pdf or pre existing fillable pdf to json?

FormAPI converts any existing PDF into a template that can be filled out via an API call, or via an online form.

If you upload a scanned PDF (e.g. plain images), then you can add form fields in the template editor. If you upload a PDF with a fillable form, then we will automatically import all of the existing fields (and the imported fields can then be modified or removed).

I'm really confused. Is this intended as a programmer's tool? Or as an end user tool?

If I understand it, create an HTML form, submit to server, spit back PDF. This has been built into ColdFusion for about a decade, and it is trivial. [and I assume other server side software can do the same thing].

Aside from that, the bulk of the PDF Forms I download these days can already be filled out with Acrobat. I'm not sure what benefit you're offering me in time savings.

This is a tool for programmers who need to fill out a lot of PDFs automatically. For example, a freelancing service could use FormAPI to fill out W-9 forms for contractors.

Or if I built a service for end-users to fill out and sign PDFs, I would create a separate website and use FormAPI to generate the PDFs. (I don't think I will do this, though.)

You're right that if you just need to fill in a single form, then there are easier ways to do that. (Even just using the Preview app on Mac).

If I download a W9 from the IRS web site I can fill it out directly using Acrobat. I'm pretty sure the free reader is all you need.

I guess I'm not understanding the niche you fill.

Good luck with it, though.

Why exactly is this a web API? Don't get me wrong, it's great work, but it seems like something that would be much better suited to a library.

He can make money off of a web API. A library would be hard to keep closed source.

I think even if you built something like FormAPI internally, it's great to run it as a self-contained service with a job queue that can retry failed jobs. For example, if an image download fails, we will keep retrying the job, and then will post a webhook once the PDF is processed. You also don't need to worry about installing a myriad of PDF and image processing libraries, and you just make a single API call to generate your PDF.
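As a rough sketch of the retry-then-webhook flow described above: run the job, back off and retry on failure, and report the outcome through a callback once it finishes. The attempt limit, the backoff schedule, and the `on_done` callback (standing in for an HTTP webhook POST) are assumptions, not details of FormAPI's queue.

```python
import time


def run_with_retries(job, on_done, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run job() until it succeeds or attempts run out, then report via on_done."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = job()
        except Exception as exc:
            if attempt == max_attempts:
                on_done({"status": "failed", "attempts": attempt, "error": str(exc)})
                return None
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        else:
            on_done({"status": "processed", "attempts": attempt, "result": result})
            return result


# Usage with a fake image download that fails twice before succeeding.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("image download failed")
    return "image-bytes"


events = []  # collects what would be sent to the webhook
run_with_retries(flaky_download, events.append, sleep=lambda seconds: None)
print(events)
```

In a real service the loop would live in a background worker rather than block the request, which is part of the argument for running this as a separate service instead of a library call.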

Fantastic work! The UI looks really slick! How long have you been working on this project?

Thanks! I've been working on this for about 2 months.

I've got formy.io if you want it. Was building something similar way back.

Cool project!

Shameless plug: if you need the opposite, i.e. getting text/structured data out of filled PDF forms (or any kind of PDF document), feel free to contact me.

Emailed you Nat (when you get a chance).

How does this handle user signatures?

There is currently no signature support on the online forms, but that's something I would like to add soon.

When generating a PDF via an API request, you can add an image field, and upload your user's signature as an image. You could also create a faux signature by using a text field with a handwriting font (The Dancing Script [1] font is available.)

This is a tool for developers to fill out PDFs, so in your own application you could use something like the signature_pad library [2], and then pass the signature images to FormAPI.

[1] https://fonts.google.com/specimen/Dancing+Script

[2] https://github.com/szimek/signature_pad

after you fill out the form, does this return back a PDF to you with fields filled out?

Yes, when you submit the form, we fill out the fields, and then you can download the generated PDF.

will the generated PDF be in the original format of the PDF?

Yes, we return the original PDF with the fields filled out. However, for PDFs with "fillable forms", we remove the forms, so the PDF is no longer editable.
