The first row of the neural net's output is a special "no character" class, which effectively gives you the character segmentation. You can distinguish "aa" from "a" because the former shows up as "(no)a(no)a(no)" whereas the latter is "(no)a(no)". You can read more about this in the Ocropus paper: http://www.helsinki.fi/~mpsilfve/ocr_course/materials/2008-b...
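
To make the "aa" vs "a" point concrete, here is a rough sketch of how a per-frame label sequence could be collapsed into text. It is not taken from the Ocropus code: it assumes the "no character" class sits at index 0 and that you have already picked the best class per time step; `greedy_decode` and the tiny `alphabet` are made-up names for illustration. The rule (merge repeated labels, then drop the "no character" ones) is the one commonly used for this kind of output.

```python
from itertools import groupby

BLANK = 0  # assumption: row 0 is the "no character" class, as described above


def greedy_decode(frame_labels, alphabet):
    """Collapse a per-frame label sequence into text.

    frame_labels: the best class index for each time step.
    alphabet: maps class index -> character; index BLANK is unused.
    """
    # Merge runs of the same label, then drop the "no character" entries.
    collapsed = [label for label, _ in groupby(frame_labels)]
    return "".join(alphabet[label] for label in collapsed if label != BLANK)


alphabet = ["", "a", "b"]  # hypothetical two-letter alphabet, blank at index 0

# (no) a a (no) a (no): two runs of "a" separated by (no)
print(greedy_decode([0, 1, 1, 0, 1, 0], alphabet))  # aa
# (no) a a a (no): a single run of "a"
print(greedy_decode([0, 1, 1, 1, 0], alphabet))  # a
```

Without the (no) frame in between, the two runs of "a" in the first example would merge into one, which is exactly why the extra class is needed.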