Base64.ai – Extract text, data, photos and more from all types of docs (base64.ai)
90 points by alierkurt on Feb 10, 2021 | 52 comments



Somewhat confused by the naming choice here. Naming your company after something as fundamental as base64 encoding seems to inevitably lead to confusion down the line.


Sorry you find it confusing. Our vision is to provide AI services for everything in base64 format; images, videos, sounds, etc.


Doesn't matter. Facebook will buy them for $2B in another year and a half, and they will get folded into the mix.


I looked into OCR a while ago for some hundreds of thousands of pages of PDF. All hosted offerings would end up costing quite a bit.

After looking at the options and running a few tests, I figured I'd use https://github.com/jbarlow83/OCRmyPDF. It converts the PDF to an image for Tesseract and then recreates the PDF with the text copy-able.

It won't identify the address part of a driver's license, but that wasn't necessary for this project.
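
A minimal sketch of running that approach over a whole folder, assuming the `ocrmypdf` command-line tool is installed and on the PATH. The helper names and directory layout are hypothetical, chosen only for illustration.

```python
import subprocess
from pathlib import Path


def ocrmypdf_cmd(src: str, dst: str, language: str = "eng") -> list:
    """Build the ocrmypdf command line for one input/output pair."""
    return ["ocrmypdf", "--language", language, src, dst]


def ocr_directory(in_dir: str, out_dir: str) -> list:
    """OCR every PDF in in_dir and write searchable copies to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        target = out / pdf.name
        # check=True raises if ocrmypdf exits with an error for this file.
        subprocess.run(ocrmypdf_cmd(str(pdf), str(target)), check=True)
        written.append(target)
    return written
```

For hundreds of thousands of pages you would probably want to catch per-file failures and run several files in parallel, which this sketch leaves out.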


A few years back I worked with someone to build an Android OCR app.

At the time there were not many apps out there, and we partnered with a third-party service that did the OCR off the app, so our quality of conversion was (at the time) close to state of the art for mobile once people got comfortable with this method (which of course not everyone did).

We made some decent money as a side project from it but I also started to appreciate the sheer complexity of OCR.

We spent a lot of time fine-tuning the pre-processing before hitting the OCR engine (e.g. orientation, shading); small changes here made a huge impact on performance. We also built various prompts to guide the user on how to take the photo. Managing expectations was something we were very conscious of, and it was tough.

The unexpected (but rewarding) use case was when we found that people who were blind had started to use the app to help with their daily lives. There were only a few, but it was making a real impact for them, so we prioritized a few features for this segment, knowing we were drifting away from maximizing revenue. We were cool with this, as it was not a primary income source.

In the end we all moved on to other things, more apps and services came on the market, and Google Lens became a thing, so we decided to sunset the product and did our best to manage customers through this process.

A rewarding experience overall. Lots of lessons were learned that I have used elsewhere in my life since, and I ticked 'Build an app that made thousands of $' off my bucket list (which, yeah, I should probably review!).


Interesting!

I've been thinking of running OCR on video frames. I'd also like to do speech-to-text extraction for searching my archives later (I have about 4TB of video to trawl through, and want text-based search capabilities). It's an interesting space to explore, but everything's been moving to web services with cost-prohibitive pricing.


Should be able to use ffmpeg[0] to extract a single frame each second/keyframe (doubtful it's worth doing every single frame) and then pass it to tesseract.

For speech-to-text, if it's English, try Mozilla's DeepSpeech? https://github.com/mozilla/DeepSpeech

Might be fun to try.

[0] https://stackoverflow.com/questions/27568254/how-to-extract-...
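
A minimal sketch of that pipeline, assuming both `ffmpeg` and the `tesseract` CLI are installed and on the PATH. The helper names, the output filename pattern, and the one-frame-per-second default are illustrative assumptions, not taken from the linked answer.

```python
import subprocess
from pathlib import Path


def ffmpeg_frame_cmd(video: str, out_dir: str, fps: int = 1) -> list:
    """Build an ffmpeg command that writes `fps` frames per second as PNGs."""
    pattern = str(Path(out_dir) / "frame_%06d.png")
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", pattern]


def tesseract_cmd(image: str) -> list:
    """Build a tesseract command that prints the recognized text to stdout."""
    return ["tesseract", image, "stdout"]


def ocr_video(video: str, out_dir: str, fps: int = 1) -> dict:
    """Extract frames, OCR each one, and return {frame filename: text}."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_frame_cmd(video, out_dir, fps), check=True)
    results = {}
    for frame in sorted(Path(out_dir).glob("frame_*.png")):
        proc = subprocess.run(tesseract_cmd(str(frame)), check=True,
                              capture_output=True, text=True)
        results[frame.name] = proc.stdout.strip()
    return results
```

With one frame per second the frame number doubles as a rough timestamp in seconds, which is handy when the goal is a searchable index into the video.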


Yup, was planning to use ffmpeg (or, more likely, OpenCV), and a subset of the frames.

Thanks so much for the tip on DeepSpeech!


@Darkphibre, we are happy to provide you with an AI that takes in a video and outputs OCR and speech-to-text. With Base64.ai, you don't have to worry about the implementation details and can focus on your projects. Let's have a meeting to discuss more? https://base64.ai/meeting


For speech-to-text extraction you can try Silero [1].

Free software (AGPL-3.0 License), fast, highly accurate and extremely simple to deploy (I have no affiliation with them).

[1] https://github.com/snakers4/silero-models


Thanks for the heads up! Will definitely check it out.


If you’re looking to index/process video, maybe we can help. Check out Vidrovr (https://vidrovr.com)

Full disclosure: I'm one of the founders.


It's not really working. Tried 2 English PDF invoices. Normal format. One came back empty, the other only had the amount right.

I'm assuming they only trained on some specific documents (passport of country X, etc) and all others don't work.

If someone processes the same kind of document all the time, then my invoice2data project may work better, and it is open source. It's based on regex rather than machine learning: https://github.com/invoice-x/invoice2data
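
To illustrate the regex-template idea in general terms (this is not invoice2data's actual template format or API), here is a minimal sketch. The field names, patterns, and sample text are made up for illustration.

```python
import re

# Hypothetical template: one regex per field, each with one capture group.
TEMPLATE = {
    "invoice_number": r"Invoice\s+No\.?\s*:?\s*(\w+)",
    "date": r"Date\s*:\s*(\d{4}-\d{2}-\d{2})",
    "amount": r"Total\s*:\s*\$?([\d,]+\.\d{2})",
}

# Made-up text, standing in for what a PDF-to-text step would produce.
SAMPLE = """ACME Corp
Invoice No: A1234
Date: 2021-02-10
Total: $1,250.00
"""


def extract_fields(text, template):
    """Return the first capture group for each field, or None if no match."""
    fields = {}
    for name, pattern in template.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        fields[name] = match.group(1) if match else None
    return fields
```

Calling `extract_fields(SAMPLE, TEMPLATE)` pulls out the three fields. The trade-off is the one described above: it works well for a fixed layout and returns nothing useful for a layout the template was not written for.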


Stay tuned for my new biotech startup, http.ai.

What was the process resulting in this name?


http.ai is a cool name too! Our vision is to provide AI services for everything in base64 format; images, videos, sounds, etc.


What about the liabilities of sharing data with a third party? You are sending all kinds of data to a third-party processor.

Edit: I am not being critical, I am really asking.


All I can find on this is

> Base64.ai SOC 2 compliance certifies our bank-level security standards. Our API does not store your data to prevent possible data breaches. All API traffic must be authenticated and encrypted over HTTPS.

Sounds... Good enough? I mean, for what it is, it sounds like it's at least trying.


In Europe that falls far short of the requirements of GDPR law for personal data.


that's not a question with an answer, it's a negative point for any third party processor such as this one.


@robarr that's a fair question to ask. Briefly, Base64.ai stores neither the images you send nor their extracted data. We provide the power and extensibility of the cloud without the risks of a data breach. Base64.ai complies with GDPR requirements too. Our SOC-2 compliance report details the extent of the security measures we take for your data. Happy to share the report for your review under MNDA.


Does your solution have any unique features or benefits in comparison to existing solutions like Acuant, MicroBlink or Regula? Those already classify various documents and extract the data pretty well.

https://www.acuant.com/idscan-data-capture-software/

https://microblink.com/products/blinkid

https://api.regulaforensics.com/


They are good too, but we have products and services that match their offerings at a fraction of their cost.

We also offer products that they don't provide. Our AI is capable of analyzing sound data (speech to text). It is extensible to add your custom forms and document types. We provide a cloud API and RPA components for UiPath, Bardeen and other RPA providers. We built Base64.ai so that you won't need a new vendor for new document types and platforms.

Happy to meet over Zoom if you want to learn more https://base64.ai/meeting



A dollar per page? :O


Yeah, this will never take off unless they can get pricing below 1c per call.

Manual data entry for an _entire page_ of text is about 15c, or 10c at volume.
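
To make the comparison concrete, here is a small sketch of the arithmetic using the per-page figures quoted in this thread ($1 per page, 15c and 10c for manual entry, and the 1c target). The 100,000-page volume used in the note below is an assumed example, not a number from the pricing page.

```python
# Per-page prices in cents, as quoted in the thread.
PRICES_CENTS = {
    "api_list_price": 100,
    "manual_entry": 15,
    "manual_entry_at_volume": 10,
    "suggested_target": 1,
}


def total_cost_dollars(pages: int, cents_per_page: int) -> float:
    """Total cost in dollars for `pages` pages at a flat per-page price."""
    return pages * cents_per_page / 100


def compare(pages: int) -> dict:
    """Map each pricing option to its total cost in dollars."""
    return {name: total_cost_dollars(pages, cents)
            for name, cents in PRICES_CENTS.items()}
```

For 100,000 pages, `compare(100_000)` gives $100,000 at the list price versus $15,000 (or $10,000 at volume) for manual entry and $1,000 at the 1c target.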


Maybe the plan is to compete on latency? Say someone wants to regularly extract content from the same kind of document, and wants it fast, like 500 milliseconds.


perhaps it's cheaper than paying for 401k/benefits etc


generally people doing these sorts of tasks aren't full-time employees, but rather contractors


We have startup plans that start free and run at 10 cents/page after volume discounts. We also offer prices in local currencies. Happy to work on a deal that works for you.

We are a pure AI company, i.e. there is no human-in-the-loop. We are and strive to be more accurate than manual labor, and our processing time is 1 second rather than minutes-to-hours. Also our AI is naturally unbiased and does not discriminate.


> We are a pure AI company, i.e. there is no human-in-the-loop.

In that case, shouldn't it be a fraction of a penny rather than a whole dollar? Automation is supposed to mean lower costs.


That’s more expensive than manual data entry!


And has a 1 second response. That is worth something.


But not a dollar.


Appreciate the feedback. How much do you think would be fair?


The pricing makes me wonder if it's an AAI (Artificial Artificial Intelligence) service?


with this pricing model I'd expect they're just reselling something like the GCP OCR APIs, most likely with some domain specific value adds


yes but we have to cover the costs as well :) we're flexible on pricing based on the volumes. what you see there differs based on the needs but we always aim to find a common price point for all parties.


Base64.ai has nailed time to value for customers. It’s pretty straightforward to integrate with, and their extensive list of models makes it really easy to process a wide variety of document types. We used it at Bardeen.ai and couldn’t have been happier. Kudos for a great service!


Base64.ai is a cloud API that can extract data, photos, and signatures from all types of documents. We have prebuilt models for IDs, driver licenses, passports, visas, invoices, and many more document types. The integration is only a single API call.
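
The thread does not show the request format, so the following is only a sketch of the base64 step that a single-call upload of this kind usually involves: the file bytes are encoded into a data URI and placed in a JSON body. The `document` field name and the helper functions are hypothetical and not taken from any API documentation.

```python
import base64
import json


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def from_data_uri(uri: str) -> bytes:
    """Decode the payload of a base64 data URI back into raw bytes."""
    _header, _, encoded = uri.partition(",")
    return base64.b64decode(encoded)


def build_payload(data: bytes, mime_type: str) -> str:
    """Build a JSON request body; "document" is a hypothetical field name."""
    return json.dumps({"document": to_data_uri(data, mime_type)})
```

Base64 inflates the payload by roughly a third, which matters for large inputs such as video.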


Ali, you may want to add some "About us" page.

Sending such sensitive data "into the cloud" is no joke, for any company.


Thanks for the feedback. As replied earlier, Base64.ai stores neither the images you send nor the extracted data. We provide the power and extensibility of the cloud without the risks of a data breach. Base64.ai complies with GDPR requirements too. Our SOC-2 compliance report details the extent of the security measures we take for your data. Happy to share the report for your review under MNDA.


We've been working with Google Cloud on a very difficult data extraction problem for about 6 months now. Seeing very impressive results with their DocumentAI service. One of my teammates is planning to try this out on some of our data this afternoon though!


Thank you! We're here to help. Please pick a time in our calendar https://base64.ai/meeting


Really confusing name


Sad that it doesn't seem to work for HTML! Maybe I will try taking a screenshot... Otherwise cool though, looks very promising.


just tried the Android app. very slow. didn't return any result for simple basic text card (black text on white paper).


another text. some warranty info of some product in multiple languages was recognized as "drivers license":

"First Name": "400 MHz ~2433,5MH:"

"Issuing authority": "0MHz~2833.5MH:"

maybe the requirements for the documents the system can accurately recognize need to be explained in the app.


If you want our AI to learn warranty info documents, we're happy to work together. Let's meet http://base64.ai/meeting


Maybe a kind of artificial AI with lots of manual verification and templates? Hence the price.


It says that it does not store the submitted data. If true, then it's essentially just a trained model that we get to invoke for a dollar per API call to get the output from.


This is accurate. We train every document model upfront and make it available via the API. We believe well-trained, high-quality models don't need retraining, just like humans don't need to re-learn reading every day.


The demos are not working for me. They never finish processing.



