Hacker News
Show HN: NLP and NER-powered beautiful bank transaction feed (herondata.io)
35 points by ferradas on Oct 28, 2021 | 13 comments


We recently shipped a huge improvement to our merchant extraction API (which you can try in the link above), and I wanted to share it and get your thoughts.

Our new approach consists of using custom Natural Language Processing and Named Entity Recognition models to predict a) what substring of a bank transaction string represents a merchant e.g. "FACEBK" and b) which canonical merchant e.g. "Facebook" this substring corresponds to in our merchant DB.
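
To make the two steps concrete, here is a minimal sketch. It is not the production system: the token filter is a toy rule-based stand-in for the NER model, and `NOISE_TOKENS`, `MERCHANT_DB`, and all function names are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical noise words and merchant table, for illustration only.
NOISE_TOKENS = {"POS", "DEBIT", "CARD", "PURCHASE"}
MERCHANT_DB = {"FACEBK": "Facebook", "AMZN MKTP": "Amazon"}


def extract_merchant_span(raw: str) -> Optional[str]:
    """Step (a): toy stand-in for the NER model.

    Keeps tokens that are not noise words, do not start with '*',
    and contain no digits, then joins what is left.
    """
    kept = [
        tok
        for tok in raw.upper().split()
        if tok not in NOISE_TOKENS
        and not tok.startswith("*")
        and not any(ch.isdigit() for ch in tok)
    ]
    return " ".join(kept) or None


def canonical_merchant(span: Optional[str]) -> Optional[str]:
    """Step (b): map the extracted substring to a canonical merchant, if known."""
    return MERCHANT_DB.get(span) if span else None


def enrich(raw: str) -> Tuple[Optional[str], Optional[str]]:
    span = extract_merchant_span(raw)
    return span, canonical_merchant(span)


print(enrich("POS DEBIT FACEBK *AB12CD 6505"))  # ('FACEBK', 'Facebook')
```

A real model would predict the span from context rather than from a fixed noise list, but the split into "find the substring" and "resolve it against the DB" is the same.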

The toughest steps were to manually label thousands of bank transactions very carefully to use as training data, and to deploy the models in a production env where we need API response times to be under 200ms (usually what's required in order to incorporate this API in a payment auth flow).

We always optimize for accuracy (>99% currently), because we never want to return an incorrect merchant, but with this new approach our coverage is now at 80% of all bank transactions we see through our system.
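
For readers unfamiliar with the trade-off, a small sketch of how the two numbers can be measured. This assumes "coverage" means the share of transactions for which any merchant is returned, and "accuracy" means the share of returned merchants that are correct; those definitions and the function name are my assumptions, not taken from the product docs.

```python
from typing import Optional, Sequence, Tuple


def coverage_and_accuracy(
    predictions: Sequence[Optional[str]], labels: Sequence[str]
) -> Tuple[float, float]:
    """Return (coverage, accuracy) where None means 'no merchant returned'."""
    # Only predictions that actually name a merchant count towards accuracy.
    returned = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    coverage = len(returned) / len(predictions) if predictions else 0.0
    accuracy = sum(p == y for p, y in returned) / len(returned) if returned else 0.0
    return coverage, accuracy


preds = ["Facebook", None, "Google", "Amazon", None]
truth = ["Facebook", "Uber", "Google", "Apple", "Lyft"]
coverage, accuracy = coverage_and_accuracy(preds, truth)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Abstaining (returning None) lowers coverage but leaves accuracy untouched, which is why a system can favour one over the other.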

We would love any feedback and comments, and we're also happy to answer any questions about the product or how we productionized it!


What models are you using? Your problem seems simpler than general NER (just identifying a subset of entities). I would wager a good ol' LSTM or GRU can do it just fine.


Hey! I'm the lead ML engineer behind the solution.

Rather than using a general NER model, we trained a custom NER model to detect merchants, using millions of transactions as training data.

We tested a few general NER models, but they weren't detecting merchants properly. We haven't tried an LSTM/GRU yet, but that's a good suggestion!


He was asking what algorithm you're using. Are you using a MaxEnt classifier, CRF, something else?


Sorry for not getting this right away, but are you solely matching the string to a merchant and then returning all of the (already-stored) information for that merchant? Or are you joining that to some other public record of transactions to get additional information?

In other words, in the example on the homepage, is all of the returned data except the dollar amount static data from a DB once you have matched the transaction to the merchant (Google in this case)?

And then sort of unrelated, what are your long-term plans? If step 1 is perfectly hydrating a merchant to a transaction, what is step 2 (if there is one)?


Great questions.

Yes, we match the raw string to a merchant with extra static fields (URL, icon, logo, Merchant Category Code).

In our mind, merchant enrichment unlocks many possibilities. The most obvious one is to have a nice and modern bank transaction feed. But our customers also use us to lock company cards to particular merchants, group by merchants to see where people are spending the most, and more. Would love to hear if you had other "step 2"s in mind!
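
As a rough illustration of those two uses, here is a sketch that groups spend by merchant and checks a card against an allowlist. The transaction shape (`merchant`, `amount_cents`) and both function names are hypothetical, not the API's actual response format.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def spend_by_merchant(transactions: Iterable[Dict]) -> List[Tuple[str, int]]:
    """Total spend per canonical merchant, largest first (amounts in cents)."""
    totals: Dict[str, int] = defaultdict(int)
    for tx in transactions:
        totals[tx["merchant"]] += tx["amount_cents"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def is_allowed(merchant: Optional[str], allowlist: Set[str]) -> bool:
    """Card lock: approve only when the merchant was recognised and is allowlisted."""
    return merchant is not None and merchant in allowlist


txs = [
    {"merchant": "Google", "amount_cents": 1200},
    {"merchant": "Facebook", "amount_cents": 5000},
    {"merchant": "Google", "amount_cents": 800},
]
print(spend_by_merchant(txs))  # [('Facebook', 5000), ('Google', 2000)]
print(is_allowed("Google", {"Google"}))  # True
```

Both only work reliably once the raw strings have been resolved to one canonical name per merchant.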


This is awesome work! I have a similar problem in a different domain and I'm curious how you match against a "canonical merchant". Do you apply some fuzzy matching / string similarity against the result of the NER? Or do you have an E2E ML model doing this task (a classifier)?


Thank you! We require an exact match to our merchant DB. We experimented with fuzzy matching in the past, but due to the sensitivity towards false positives in this domain, it didn't work for us.
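
A small sketch of why fuzzy matching can hurt here, using Python's standard `difflib`. The merchant table, the 0.6 cutoff (difflib's default), and the function names are illustrative assumptions, not the actual implementation.

```python
import difflib
from typing import Optional

# Hypothetical merchant table keyed by normalised name.
MERCHANT_DB = {"apple": "Apple", "google": "Google"}


def exact_match(span: str) -> Optional[str]:
    """Return a merchant only when the normalised span is a key in the DB."""
    return MERCHANT_DB.get(span.strip().lower())


def fuzzy_match(span: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest merchant whose similarity ratio is at least `cutoff`."""
    hits = difflib.get_close_matches(
        span.strip().lower(), list(MERCHANT_DB), n=1, cutoff=cutoff
    )
    return MERCHANT_DB[hits[0]] if hits else None


print(exact_match("Applebees"))  # None: abstains, costing coverage
print(fuzzy_match("Applebees"))  # Apple: a false positive
```

Raising the cutoff reduces such false positives but never rules them out, whereas an exact lookup can only abstain.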


Looks nice! How many transactions can the model manage at a time?


Thank you!

The average response time for this endpoint is ~200ms, and we guarantee this latency for up to 100 requests per minute.

In production we currently handle ~1 million transactions a day with no problem.
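
For a sense of scale, a back-of-the-envelope conversion of that daily volume into average rates. It assumes traffic is spread evenly over the day, which real traffic is not, so peaks will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(transactions_per_day: int) -> tuple:
    """Return (per_second, per_minute) averages for a daily volume."""
    per_second = transactions_per_day / SECONDS_PER_DAY
    return per_second, per_second * 60


per_second, per_minute = average_rates(1_000_000)
print(f"{per_second:.1f} tx/s, {per_minute:.0f} tx/min")  # 11.6 tx/s, 694 tx/min
```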

Does this answer your question?


Awesome! Does this work only in the US? Would have loved it in India.


As Ahmed said, the model is trained on UK and North American data but feel free to test it out on Indian merchants too. It will match the bigger / more international ones and still clean up the transaction strings.


Hey Debdut,

This currently supports mostly merchants present in the US, UK, and Canada!



