C++ - Tesseract False Space Recognition
I'm using Tesseract to recognize serial numbers. This works acceptably, although the common problems exist, such as false recognition of 0 and "o", 6 and 5, or m and h. Besides that, Tesseract adds spaces to the recognized words where there is no space in the image. The following image is recognized as "hi 3h".
This image results in " fbkhj 1r1".
So Tesseract added a space, although there isn't a space in the image. Is there any possibility to parametrize the spacing behavior of Tesseract?
Edit
I'm sorry, I forgot to add that I also have serial numbers which include spaces, so I cannot simply delete the spaces inside a recognized serial number.
For example, the following image, which contains a space in the serial number, results after Tesseract recognition in: j4 f1583bb. Although the recognition of the characters is wrong, the space is recognized correctly in this image.
My current parameters for Tesseract are:
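    #include <string>
    #include <tesseract/baseapi.h>

    tesseract::TessBaseAPI tess;
    tess.Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY);
    tess.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
    tess.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyz012345789");
    char* out = tess.GetUTF8Text();
    std::string text = std::string(out);
    delete[] out;  // GetUTF8Text() returns a buffer the caller must free with delete[]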
Edit
As noted in the existing answers, the space between "j" and "i", for example, seems to be a little larger than between other characters. The font I have chosen is a monospaced font, because I thought that would help Tesseract with character recognition. The drawback of such a monospaced font is that every character cell has the same width, so the kerning (the space between characters) varies. See the example image from the following source.
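To make the gap effect concrete, here is a minimal sketch of how the visible gaps between glyphs can differ inside equally wide cells. Everything in it is an assumption made for illustration: the `GlyphBox` type, the function names, the pixel values, and above all the median-based rule, which is a naive stand-in and not Tesseract's actual space detection.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Horizontal extent of one glyph's ink in pixels (hypothetical input,
// e.g. bounding boxes measured on the rendered image).
struct GlyphBox {
    int left;
    int right;
};

// Returns the ink-to-ink gap between each pair of neighbouring glyphs.
std::vector<int> inkGaps(const std::vector<GlyphBox>& boxes) {
    std::vector<int> gaps;
    for (std::size_t i = 1; i < boxes.size(); ++i) {
        gaps.push_back(boxes[i].left - boxes[i - 1].right);
    }
    return gaps;
}

// Flags every gap wider than `factor` times the (upper) median gap.
// Naive illustration only, NOT the algorithm Tesseract uses.
std::vector<bool> looksLikeSpace(const std::vector<int>& gaps, double factor) {
    assert(factor > 0.0);
    std::vector<bool> flags(gaps.size(), false);
    if (gaps.empty()) return flags;
    std::vector<int> sorted = gaps;
    std::sort(sorted.begin(), sorted.end());
    const int median = sorted[sorted.size() / 2];
    for (std::size_t i = 0; i < gaps.size(); ++i) {
        flags[i] = gaps[i] > factor * median;
    }
    return flags;
}
```

With made-up boxes for four glyphs in 20-pixel cells, `{2, 18}`, `{26, 34}`, `{48, 52}`, `{63, 77}`, the gaps come out as 8, 14 and 11 pixels, and with a factor of 1.2 only the middle gap, between the two narrow glyphs, is flagged.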
Which font type do you think would achieve better recognition results?
Adjusting the parameter tosp_min_sane_kn_sp may help. I solved my problem by doing that.
If that doesn't help, you may try the other tosp_* parameters, or work around the spacing in the source file "tospace.cpp".
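As a rough sketch of the first suggestion: the parameter can be set like any other Tesseract variable through `SetVariable`, after `Init`. The helper name `setMinSaneKernSpace` is made up for illustration and is not part of Tesseract. It is written as a template only so that it compiles without the Tesseract headers; `Api` is assumed to offer `bool SetVariable(const char*, const char*)`, which `tesseract::TessBaseAPI` does. As far as I understand it, the value is a multiple of the estimated kerning gap below which a gap is not trusted as a space, so larger values should produce fewer spaces, but a suitable value has to be found by experiment.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper, not part of Tesseract. Api is any type offering
// bool SetVariable(const char*, const char*), e.g. tesseract::TessBaseAPI.
template <typename Api>
bool setMinSaneKernSpace(Api& api, double factor) {
    assert(factor > 0.0);
    std::ostringstream value;
    value.imbue(std::locale::classic());  // always write '.' as decimal point
    value << factor;
    // Returns whatever SetVariable reports; false means the name was rejected.
    return api.SetVariable("tosp_min_sane_kn_sp", value.str().c_str());
}
```

With the `tess` object from the question, the call would be `setMinSaneKernSpace(tess, 2.5);`, placed after the init call and before the text is fetched. The value 2.5 is only a placeholder to start tuning from.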