Somewhat confused by the naming choice here. Naming your company after something as fundamental as base64 encoding seems to inevitably lead to confusion down the line.
I looked into OCR a while ago for some hundreds of thousands of pages of PDF. All hosted offerings would end up costing quite a bit.
After looking at the options and running a few tests, I settled on https://github.com/jbarlow83/OCRmyPDF
It rasterizes each page to an image for Tesseract and then rebuilds the PDF with a selectable, copyable text layer.
It won't identify the address part of a driver's license, but that wasn't necessary for this project.
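For anyone wanting to try the same route on a big pile of PDFs, here is a rough sketch of a batch run, not my exact setup. It assumes the `ocrmypdf` command-line tool is installed and on PATH; the folder names and the helper names `build_ocrmypdf_cmd` and `ocr_folder` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_cmd(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build the ocrmypdf command line for a single PDF (hypothetical helper)."""
    return [
        "ocrmypdf",
        "--skip-text",   # leave pages that already contain text untouched
        "-l", language,  # Tesseract language to use for recognition
        str(src),
        str(dst),
    ]


def ocr_folder(in_dir: Path, out_dir: Path) -> None:
    """Run ocrmypdf over every PDF found directly inside in_dir."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.pdf")):
        # check=True stops the batch on the first file ocrmypdf rejects
        subprocess.run(build_ocrmypdf_cmd(src, out_dir / src.name), check=True)
```

Calling `ocr_folder(Path("scans"), Path("searchable"))` would write one searchable PDF per input file. For hundreds of thousands of pages you would probably want to catch failures per file rather than abort the whole run.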
A few years back I worked with someone to build an Android OCR app.
At the time there were not many apps out there, and we partnered with a third-party service that did the OCR off the device. That put our conversion quality close to the state of the art for mobile (at the time), once people got comfortable with this method (which of course not everyone did).
We made some decent money as a side project from it but I also started to appreciate the sheer complexity of OCR.
We spent a lot of time fine-tuning the pre-processing before hitting the OCR engine (e.g. orientation, shading); small changes here made a huge impact on performance. We also built various prompts to guide the user on how to take the photo. Managing expectations was something we were very conscious of, and it was tough.
The unexpected (but rewarding) use case was when we found that people who were blind had started to use the app to help with their daily lives. There were only a few, but it was making a real impact for them, so we prioritized a few features for this segment, knowing we were drifting away from maximizing revenue. We were cool with this, as it was not a primary income source.
In the end we all moved on to other things, more apps and services came on the market, and Google Lens became a thing, so we decided to sunset the product and did our best to manage customers through the process.
A rewarding experience overall: lots of lessons were learned that I have used elsewhere in my life since, and I ticked 'Build an app that made thousands of $' off my bucket list (which, yeah, I should probably review!).
I've been thinking of running OCR on video frames. I'd also like to do speech-to-text extraction for searching my archives later (I have about 4TB of video to trawl through and want text-based search). It's an interesting space to explore, but everything's been moving to web services with cost-prohibitive pricing.
Should be able to use ffmpeg[0] to extract a single frame each second/keyframe (doubtful it's worth doing every single frame) and then pass it to tesseract.
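A rough sketch of that pipeline, assuming both the `ffmpeg` and `tesseract` command-line tools are installed. It samples one frame per second with ffmpeg's `fps` filter rather than picking keyframes, and the file layout and function names are invented for illustration. The timestamps are approximate, derived only from the frame number and the sampling rate.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video: Path, frame_dir: Path, fps: float = 1.0) -> list[str]:
    """ffmpeg command that writes `fps` frames per second as numbered PNGs."""
    return [
        "ffmpeg",
        "-i", str(video),
        "-vf", f"fps={fps}",                # resample the video to the chosen rate
        str(frame_dir / "frame_%06d.png"),  # frame_000001.png, frame_000002.png, ...
    ]


def build_tesseract_cmd(image: Path, language: str = "eng") -> list[str]:
    """tesseract command that prints the recognized text to stdout."""
    return ["tesseract", str(image), "stdout", "-l", language]


def frame_time_seconds(frame_number: int, fps: float = 1.0) -> float:
    """Approximate timestamp of a 1-based frame number at the sampling rate."""
    return (frame_number - 1) / fps


def ocr_video(video: Path, frame_dir: Path, fps: float = 1.0) -> list[tuple[float, str]]:
    """Extract frames, OCR each one, and return (seconds, text) for non-empty hits."""
    frame_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_cmd(video, frame_dir, fps), check=True)
    results = []
    for image in sorted(frame_dir.glob("frame_*.png")):
        number = int(image.stem.split("_")[1])
        proc = subprocess.run(
            build_tesseract_cmd(image), check=True, capture_output=True, text=True
        )
        text = proc.stdout.strip()
        if text:
            results.append((frame_time_seconds(number, fps), text))
    return results
```

The `(seconds, text)` pairs could then be dumped into whatever search index you like. With 4TB of footage, lowering `fps` below 1 is probably the first knob to turn.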
@Darkphibre, we are happy to provide you an AI that takes in a video and outputs OCR and speech-to-text. With Base64.ai, you don't have to worry about the implementation details and can focus on your projects. Let's have a meeting to discuss more? https://base64.ai/meeting
It's not really working. Tried 2 English PDF invoices. Normal format. One came back empty, the other only had the amount right.
I'm assuming they only trained on some specific documents (passport of country X, etc) and all others don't work.
If someone processes the same kind of document all the time, then my invoice2data project may work better, and it is open source. It's based on regex rather than machine learning: https://github.com/invoice-x/invoice2data
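To show why regex is enough for a fixed layout, here is a toy sketch using only Python's `re` module. It is not the invoice2data template format or its API; the field names, patterns, and sample text are all made up, and it assumes the invoice has already been converted to plain text.

```python
import re

# One pattern per field, written for a single known invoice layout (illustrative only).
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "amount": re.compile(r"Total\s*:?\s*[$€£]?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}

# Made-up sample of what the extracted text of an invoice might look like.
SAMPLE = """ACME Corp
Invoice No: INV-2020-001
Date: 2020-03-14
Total: $1,234.50
"""


def extract_fields(text: str) -> dict:
    """Return the first match for each field, or None when a pattern finds nothing."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


fields = extract_fields(SAMPLE)
```

The trade-off is the obvious one: a new layout needs new patterns, but for the same supplier month after month the output is predictable and easy to debug.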
> Base64.ai SOC 2 compliance certifies our bank-level security standards. Our API does not store your data to prevent possible data breaches. All API traffic must be authenticated and encrypted over HTTPS.
Sounds... Good enough? I mean, for what it is, it sounds like it's at least trying.
@robarr that's a fair question to ask. Briefly, Base64.ai stores neither the images you send nor their extracted data. We provide the power and extensibility of the cloud without the risks of a data breach. Base64.ai complies with GDPR requirements too. Our SOC-2 compliance report details the extent of the security measures we take for your data. Happy to share the report for your review under MNDA.
Does your solution have any unique features or benefits in comparison to existing solutions like Acuant, MicroBlink or Regula? Those already classify various documents and extract the data pretty well.
They are good too, but we have products and services that match their offerings at a fraction of their cost.
We also offer products that they don't provide. Our AI is capable of analyzing sound data (speech to text). It is extensible to add your custom forms and document types. We provide a cloud API and RPA components for UiPath, Bardeen and other RPA providers. We built Base64.ai so that you won't need a new vendor for new document types and platforms.
Maybe the plan is to compete on latency? Say someone wants to regularly extract content from the same kind of document, and wants it fast, like 500 milliseconds.
We have startup plans that start free and run at 10 cents/page after volume discounts. We also offer prices in local currencies. Happy to work on a deal that works for you.
We are a pure AI company, i.e. there is no human in the loop. We are more accurate than manual labor and strive to stay that way, and our processing time is 1 second rather than minutes to hours. Also, our AI is naturally unbiased and does not discriminate.
Yes, but we have to cover the costs as well :) We're flexible on pricing based on volume. What you see there varies with your needs, but we always aim to find a price point that works for all parties.
Base64.ai has nailed time to value for customers. It’s pretty straightforward to integrate with, and their extensive list of models makes it really easy to process a wide variety of document types. We used it at Bardeen.ai and couldn’t have been happier. Kudos for a great service!
Base64.ai is a cloud API that can extract data, photos, and signatures from all types of documents. We have prebuilt models for IDs, driver licenses, passports, visas, invoices, and many more document types. The integration is only a single API call.
Thanks for the feedback. As replied earlier, Base64.ai stores neither the images you send nor the extracted data. We provide the power and extensibility of the cloud without the risks of a data breach. Base64.ai complies with GDPR requirements too. Our SOC-2 compliance report details the extent of the security measures we take for your data. Happy to share the report for your review under MNDA.
We've been working with Google Cloud on a very difficult data extraction problem for about 6 months now. Seeing very impressive results with their DocumentAI service. One of my teammates is planning to try this out on some of our data this afternoon though!
It says that it does not store the submitted data. If true, then it's essentially just a trained model that we get to invoke for a dollar per API call to get the output from.
This is accurate. We train every document model upfront and make it available via the API. We believe well-trained, high-quality models don't need retraining, just like humans don't need to re-learn reading every day.