
Shouldn't it be easy to generate a lot of OCR data? Generate HTML, randomize, generate image, apply noise and let it train on it.
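To make that pipeline concrete, here is a minimal sketch of the "generate, render, add noise" loop using only the Python standard library. Everything in it is an assumption for illustration: the tiny 3x5 digit font (`GLYPHS`) stands in for real HTML rendering, and `random_label`, `render`, `salt_and_pepper`, and `make_sample` are hypothetical helper names, not part of any OCR library.

```python
import random

# Hypothetical 3x5 bitmap font for digits only; a real pipeline would render
# randomized HTML in a browser or rasterize real fonts instead.
GLYPHS = {
    "0": ["111", "101", "101", "101", "111"],
    "1": ["010", "110", "010", "010", "111"],
    "2": ["111", "001", "111", "100", "111"],
    "3": ["111", "001", "111", "001", "111"],
    "4": ["101", "101", "111", "001", "001"],
    "5": ["111", "100", "111", "001", "111"],
    "6": ["111", "100", "111", "101", "111"],
    "7": ["111", "001", "001", "001", "001"],
    "8": ["111", "101", "111", "101", "111"],
    "9": ["111", "101", "111", "001", "111"],
}


def random_label(rng, length=6):
    """Pick a random digit string; this is the ground-truth label."""
    return "".join(rng.choice("0123456789") for _ in range(length))


def render(text, pad=1):
    """Render text to a 2D list of pixels (1 = ink, 0 = background)."""
    height = 5 + 2 * pad
    width = 2 * pad + 4 * len(text) - 1  # 3 px per glyph plus a 1 px gap
    img = [[0] * width for _ in range(height)]
    for i, ch in enumerate(text):
        x0 = pad + 4 * i
        for r, row in enumerate(GLYPHS[ch]):
            for c, bit in enumerate(row):
                if bit == "1":
                    img[pad + r][x0 + c] = 1
    return img


def salt_and_pepper(img, rng, flip_prob=0.05):
    """Flip each pixel independently with probability flip_prob."""
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in img
    ]


def make_sample(rng, flip_prob=0.05):
    """Produce one (noisy_image, label) training pair."""
    label = random_label(rng)
    return salt_and_pepper(render(label), rng, flip_prob), label


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are reproducible
    image, label = make_sample(rng)
    print(label)
    for row in image:
        print("".join("#" if px else "." for px in row))
```

Calling `make_sample` in a loop yields as many labeled pairs as you want. Salt-and-pepper flips are only one simple kind of noise, chosen here because they are easy to implement.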





Yes, but if you aren't careful you will end up with a model carefully tuned for the ways that you add noise, not for all the types of noise found in the real world. But stuff like this can be very useful for some base training, especially if you add many real-world examples afterwards.


