
Zapier Email Parser: Extract Text From Automated Emails - bryanh
http://parser.zapier.com/?welcome=back
======
sunsu
This is a killer feature for Zapier and I'm really excited. Zapier has been a
huge time saver for us at BetterVoice.com.

How many times do you get approached by customers or other vendors about "when
are you going to integrate with product XYZ?" If you just integrate ONE time
with Zapier, you're immediately connected with over 200 other online services.
Then your answer to those customers/vendors can be: "We're integrated with
Zapier, so you guys should integrate with them too and you'll get all the
other benefits as well". With that answer you accomplish 2 things: (1) you
shift the burden to them, and (2) you can use Zapier as a "testing ground" to see
how popular an integration with a particular service is. If one particular
Zapier connection starts taking off, then you can consider a direct
integration. If only a few people use a particular service connection, then
you haven't lost anything.

~~~
userbinator
For some reason, this comment really read like a marketing testimonial.

~~~
ceejayoz
If you want infuriating marketing, try Googling for an integration between two
obscure web services.

The Zapier "Zapbook" automatically makes a page for each possible combination
of the services they hook into, giving useless bullshit like
[https://zapier.com/zapbook/ducksboard/aim/](https://zapier.com/zapbook/ducksboard/aim/)
(which, by the way, isn't the same as
[https://zapier.com/zapbook/aim/ducksboard/](https://zapier.com/zapbook/aim/ducksboard/)
either).

~~~
userbinator
Zapier lets you connect apps together in interesting ways, whether or not it
really makes sense to. Maybe someone out there will find it useful to be able
to manage their dashboards with instant messages.

~~~
bryanh
Indeed, we actually have users doing both of those examples! (None have opted
to share their zaps, so the parent's links remain barren.)

------
bryanh
Happy to answer any questions! This tool has been in use for many months by
some select Zapier users and we decided to finally release it. I definitely
want to open source the core extractor bits and document the REST API that
powers the Zapier integration.

We have more information on using it in Zapier (the main use case at the
moment) here: [https://zapier.com/zapbook/updates/308/introducing-zapier-em...](https://zapier.com/zapbook/updates/308/introducing-zapier-email-parser/)

~~~
nswanberg
Very nice! This looks very useful and easy to use.

How much boilerplate text do you need on either side of a token in order to
identify it? Put another way, how much can the template emails vary? If the
template format changes is there any sort of notification? Did you use the
simplest implementation that could work, or is this much more complicated than
it looks?

~~~
bryanh
It can actually get pretty complex; the technique we're using is a wacky hacky
hodgepodge of Google diff-match-patch that works _surprisingly_ well! If you
run into any that don't work, just let us know and we can add it to the test
suite and figure it out.

~~~
funkiee
We've got some particularly complicated HTML regexes for email parsing at our
company. We manually write new ones for new email layouts as we get them. I'd
be interested in any information on how you're solving the issue, as I love
how you've tackled it at least on the UI end.

~~~
bryanh
Sure! In very broad strokes:

First, download yourself a copy of Google's diff-match-patch.

Second, make a template for the email you have (think "Your shipment will be
delivered {{date}}. Thank you!" vs. the original raw email "Your shipment will
be delivered 2014-04-04. Thank you!").

Third, run it through diff-match-patch.

Fourth, walk over the change tree and record the insertion (1), deletion (-1),
and equality (0) transformations (one side as keys, the other as values).

(There are a _lot_ of edge cases to handle between the fourth and fifth steps,
but test cases make those pretty obvious, if not very frustrating.)

Fifth, collate the keys/values into a dictionary and do some last minute
cleanups.
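The steps above can be sketched in Python. This is a minimal illustration, not Zapier's implementation: it swaps in the standard library's `difflib.SequenceMatcher` for Google's diff-match-patch (so it has no dependencies), which labels changes with the tags `equal`, `replace`, `delete`, and `insert` instead of 0/-1/1, and it assumes the `{{field}}` placeholder syntax from the example. Rather than collating raw insertions and deletions, it maps each placeholder's boundaries through the matched (`equal`) runs, so stray character matches inside a placeholder don't break the value, and it simply skips fields whose boundaries land in changed text (one of the many edge cases a real parser has to handle).

```python
import difflib
import re

# Assumed placeholder syntax, taken from the "{{date}}" example above.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def map_boundary(opcodes, b, template_len, raw_len):
    """Map a boundary (a gap between characters) in the template to the raw email.

    Returns None when the boundary falls strictly inside changed text.
    """
    if b == 0:
        return 0
    if b == template_len:
        return raw_len
    for tag, i1, i2, j1, _j2 in opcodes:
        # Within (or at either edge of) a matched run, offsets carry over 1:1.
        if tag == "equal" and i1 <= b <= i2:
            return j1 + (b - i1)
    return None


def extract(template, raw):
    """Return {field: value} by diffing a template against a raw email."""
    # Character-level diff; autojunk=False turns off difflib's
    # "popular character" heuristic, which can misfire on long emails.
    matcher = difflib.SequenceMatcher(None, template, raw, autojunk=False)
    opcodes = matcher.get_opcodes()
    fields = {}
    for m in PLACEHOLDER.finditer(template):
        start = map_boundary(opcodes, m.start(), len(template), len(raw))
        end = map_boundary(opcodes, m.end(), len(template), len(raw))
        if start is None or end is None or end < start:
            continue  # an edge case this sketch does not attempt to handle
        # Last-minute cleanup: trim surrounding whitespace from the value.
        fields[m.group(1)] = raw[start:end].strip()
    return fields


if __name__ == "__main__":
    print(extract("Your shipment will be delivered {{date}}. Thank you!",
                  "Your shipment will be delivered 2014-04-04. Thank you!"))
    # prints {'date': '2014-04-04'}
```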

We will be documenting a REST API so you can use parser.zapier.com directly,
and it is pretty easy to forward emails automatically to our robot (so you can
conceivably avoid writing anything at all and just use the app).

------
azinman2
Good luck Zapier. I've spent many an hour on this exact problem.

If the templates don't change at all (no ads, nothing contextual or optional)
then this is possible, but in my experience emails have a surprising amount of
variation when trying to do stuff like this. Paid with an online check this
time versus a credit card? Billing address, last 4 digits, etc. might now be
gone, which totally messes up the extraction.

Hopefully though people will be using this for more niche tools than Amazon
receipts.

~~~
ma2rten
It's difficult, but not impossible. My university has a spin-off company,
which does parsing of documents. One thing that they do is parsing of CVs. You
give it a pdf file of any CV in any format and it will convert that into a
machine readable format. Obviously that requires domain knowledge, but it's
possible.

I don't know how Zapier works, but it is possible that they do some kind of
fuzzy matching that is robust to those things.

------
nostromo
Can someone share a use case?

It seems cool for... _something_.

edit: thanks all, very interesting.

~~~
bdunn
Sure, so whenever someone cancels their Planscope account I send myself an
email with their name, email address, LTV, plan, and cancelation reason (this
is also stored in my database.)

With this, I could simply CC that email to Zapier, and they'll yank out all that
info and shove it into a Google Spreadsheet, which will let me do certain
things that would be a pain to program myself.

~~~
bratsche
I don't understand. If you send automated emails to yourself in a human-friendly
format, why not just use a Google API to push the same data into the
spreadsheet at the same time?

Zapier may be really cool, but it still seems like you're taking raw data,
converting it to human-readable form, then sending that to Zapier to attempt
to extract the raw data again. Or am I misunderstanding?

~~~
jxf
The difference is that you have to do new work every time the destination of
the data changes. Using an interface to say "here are the relevant parts" is a
lot nicer than having to code it yourself.

~~~
stephenson
Or just push your entire object to google spreadsheet:
[https://github.com/firmafon/to_google_spreadsheet](https://github.com/firmafon/to_google_spreadsheet)

------
chezmo
Congratulations, Bryan, on launching it!

We are doing the same at [http://mailparser.io](http://mailparser.io) and I
can confirm that there is a real need for a solution like that. A lot of our
existing customers use mailparser.io in combination with Zapier. Those
customers can now use Zapier's parser directly, which sucks for us but is
surely great for the customers and Zapier ... :-)

One caveat though, a lot of use-cases are not the static "contact form" email
where nothing moves except the values. We get a lot of requests for parsing
lists, tables, etc. Curious how the chosen approach works on these kinds of
parsing jobs.

~~~
carlosrt
How would you parse lists and tables from a website? e.g. parsing an online
schedule in order to use the data in a responsive mobile or native app?

Random site example: peoplewhodance.net/events/NYE/schedule.html

~~~
bryanh
You might look into [http://www.kimonolabs.com/](http://www.kimonolabs.com/).

------
eo3x0
This seems really useful but I can't think of an example usage. Can someone
share an example where this is useful?

~~~
patio11
"Set up receipts@example.com as Google hosted email, automatically parse all
Amazon/SaaS subscription/etc emails and update the bookkeeping system" would
already save me (or, well, my bookkeeper) 5+ hours a year.

~~~
aymeric
Shouldn't these kinds of notifications be handled with the PayPal / Amazon API
directly for more reliability?

~~~
patio11
These are business expenses rather than things I am selling. As such, they're
on my corporate credit cards. Regrettably, those don't have API access, and
the best way to get information from them is currently "Log in once a month,
dump a PDF file, and email it to the bookkeeper."

~~~
davidw
As a tech guy, I always feel extremely guilty about it, but my Italian
accountant is cool with me dumping a big pile of receipts and papers in his
office every few months. Even US accountants are often more or less ok with
this, even if it makes me a bit sick to my stomach.

~~~
nthj
> As a tech guy, I always feel extremely guilty about it

As a tech guy, business people routinely dump big piles of raw data in my
office every few days. I mutter a bit about how this is why we invented
databases and then laugh all the way to the bank.

And I guarantee you the business people don't feel at all guilty about it.

------
dsugarman
We have been using the feature for a while, and it has become an integral part of
our system. The Zapier team is great and always willing to help out with any
issues we have, above and beyond what you would expect.

------
fenguin
This is incredibly clever. Zapier has already built a fast-expanding platform
to relate structured business data, and now is taking steps to grow it even
further by structuring unstructured data. SaaS mainstays, watch out.

------
fasteddie31003
If you think this feature is cool, I would encourage you to take a look at my
company: Taskflow.io

It's basically a visual drag-and-drop tool to automate business processes.

You can drag-and-drop many different actions together (including email) to
automate and track many processes your organization might have.

I would love to hear your feedback as we are just getting off the ground.

------
nsxwolf
I just noticed Google started doing something like this. I got Google search
results about purchases I'd made from Apple. They parsed a receipt from my
Gmail account and displayed it in a card in my search results.

I was a bit taken aback by that.

~~~
est
IIRC, this is not _parsing_; there is a Gmail API that was mentioned at
Google I/O 2013:

[https://developers.google.com/schemas/tutorials/google-now-c...](https://developers.google.com/schemas/tutorials/google-now-cards)

[https://developers.google.com/gmail/actions/getting-started](https://developers.google.com/gmail/actions/getting-started)

~~~
nl
They do parsing as well.

I forward all my work flight itineraries to Gmail (so I get Google Now
notifications) after cut & pasting them out of the Word document I get them
in.

Gmail parses them and displays them using the "Flight details" format, even
though there is none of the information required for the "Actions" thing
linked above.

~~~
recmend
Gmail doesn't do any parsing; Gmail provides the SDK described here:
[https://developers.google.com/schemas/tutorials/google-now-c...](https://developers.google.com/schemas/tutorials/google-now-cards)

When developers/brands create emails following the Gmail quick action
guidelines, and you have Google Now enabled, then you will see Google Now cards.

~~~
nl
Did you read my post?

I cut and paste plain text out of a word document and send it, and that
triggers the Google Now cards.

It's parsing.

------
gedrap
It seems like an awesome feature. I just don't like the landing page... If
patio11 hadn't shared it on Twitter saying how awesome it is, I would have
closed it without caring too much - just one more thing on the Internet I simply
am not bothered with.

The explanation in the first paragraph doesn't really give a clear idea of
what it does and what the benefit is for me, and IMHO a random user is not
really motivated to try to figure it out.

I believe a simple nice diagram

[email] -> [zapier] -> [internet]

with some example integrated in it would do a much better job.

------
saganus
Could anyone more versed in being a sysadmin comment on using this for
logging?

I know there are things like logstash, but this one seems like something that
could be complementary. What about feeding it log files for known events and, I
don't know, having a pretty dashboard with all the stats or something? What
other uses for this feature plus logging can you think of?

Disclaimer: I'm currently thinking of a way of doing logging and security
audits to some CA server I'm about to configure, so right now everything looks
like a nail I guess...

------
schrodinger
I think you've got a typo of "chose" vs "choose" in the step that asks the
user to "chose" where the data should be sent.

~~~
bryanh
Ah, thanks! Fix committed and pushing.

------
baghali
Is it possible to extract repeating patterns? example would be extracting repo
name, number of stars, language and project description from GitHub Archive
daily digests.

sample mail: [http://us5.campaign-archive2.com/?u=439aa16a39e4b10e0b65ff2e...](http://us5.campaign-archive2.com/?u=439aa16a39e4b10e0b65ff2ef&id=65070f9ad1&e=d43ad64ef4)

~~~
bryanh
It is not possible to extract something (today) that would be generated inside
a for loop. I'd be curious if anyone has ideas on how you might do that.
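One possible idea for the "for loop" case, sketched in Python purely as an illustration (it is not something the parser supports): if each repeated item can be written as its own small template, compile that template into a regular expression with one named group per `{{field}}` and collect every match with `re.finditer`. The item template and sample text below are made up and are not the real GitHub Archive digest layout; the sketch assumes each item sits on a single line, that field names are unique, and that every field except a trailing one is followed by literal text.

```python
import re

# Assumed placeholder syntax, as in the "{{date}}" example earlier in the thread.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def template_to_regex(item_template):
    """Compile a one-item template like "{{a}} ({{b}})" into a regex with named groups."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(item_template):
        parts.append(re.escape(item_template[pos:m.start()]))  # literal text
        if m.end() == len(item_template):
            field = r"[^\n]+"  # a trailing field runs to the end of the line
        else:
            field = r".+?"  # otherwise stop at the next literal text
        parts.append(f"(?P<{m.group(1)}>{field})")
        pos = m.end()
    parts.append(re.escape(item_template[pos:]))
    return re.compile("".join(parts))


def extract_all(item_template, text):
    """Return one {field: value} dict for every repetition of the item in text."""
    regex = template_to_regex(item_template)
    return [m.groupdict() for m in regex.finditer(text)]


if __name__ == "__main__":
    # Hypothetical sample digest, invented for this example.
    digest = "Today's picks:\nfoo/bar (Python): 120 stars\nbaz/qux (Go): 45 stars\n"
    for item in extract_all("{{repo}} ({{language}}): {{stars}} stars", digest):
        print(item)
```

Because `.` does not match a newline, a lazy field can never run past the end of its line, which is what keeps one item from bleeding into the next.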

If you need support for more complex emails and extraction rules, you might
look into using [http://mailparser.io/](http://mailparser.io/).

------
lugg
Would love to see an open source tool like this for parsing general text, CSV,
fixed-width, or HTML in a similar way.

Very nifty concept.

~~~
nathanathan
I made an open source tool that does something like this using a modified
version of the Levenshtein distance algorithm:
[https://github.com/nathanathan/fuzzyTemplateMatcher](https://github.com/nathanathan/fuzzyTemplateMatcher)

Here's a demo you can play around with:
[http://nathanathan.com/fuzzyTemplateMatcher/](http://nathanathan.com/fuzzyTemplateMatcher/)
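For reference, here is the textbook Levenshtein (edit) distance in Python, computed with the usual two-row dynamic-programming table. It is only the unmodified algorithm; it is not the modified version used in fuzzyTemplateMatcher, whose changes aren't described in this thread.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # prints 3
```

A fuzzy template matcher can use a distance like this to accept literal text that differs slightly from the template instead of requiring an exact match.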

~~~
notlisted
It has some problems, e.g. "take out the dog", "call my mother" (it is entirely
possible I missed the point of the code).

------
rfelix2121
We’ve been doing this for a couple of years over at
[http://getdispatch.com](http://getdispatch.com), with a particular focus on
sales leads.

------
mrmondo
A host-it-yourself version of this would be nice.

------
rtpg
Can someone here explain the difference between IFTTT and Zapier? At least
from a macro perspective, they seem the same.

~~~
corkill
Zapier has more business integrations, IFTTT more consumer ones.

------
n1ghtmare_
Man, that's some cool software! I love it when I see software that's simple
yet high quality. Good job.

------
earlyriser
I would like this for PDFs. Do you know an app that can handle that?

~~~
rahimnathwani
If the text in your PDFs has enough structure (i.e. fixed field names or
punctuation) then it should be possible to convert the PDF to text and then
pass it to Zapier's parser.

~~~
earlyriser
Great to know!

------
johnpmayer
Make an enterprise offering. Ops teams would kill for this.

