c++ - Tesseract False Space Recognition -

- August 15, 2015

I'm using Tesseract to recognize serial numbers. This works acceptably, although the common problems exist: false recognition of 0 and "o", 6 and 5, or m and h. Besides this, Tesseract adds spaces to the recognized words where there is no space in the image. The following image is recognized as "hi 3h".

example image 1

This image results in " fbkhj 1r1".

example image 2

So Tesseract added a space although there isn't one in the image. Is there any possibility to parametrize the spacing behavior of Tesseract?

Edit

I'm sorry, I forgot to add that I also have serial numbers that include spaces, so I cannot simply delete the spaces inside the recognized serial number.

For example, the following image, which contains a space in the serial number, results after Tesseract recognition in: j4 f1583bb. Although the recognition of some characters is wrong, the space is recognized correctly for this image.

example image 3

My current parameters for Tesseract are:

    tesseract::TessBaseAPI tess;
    tess.Init(NULL, "eng", tesseract::OEM_TESSERACT_ONLY);
    tess.SetPageSegMode(tesseract::PSM_SINGLE_BLOCK);
    // Whitelist as posted; note that the digit list contains no '6'.
    tess.SetVariable("tessedit_char_whitelist", "abcdefghijklmnopqrstuvwxyz012345789");

    char* out = tess.GetUTF8Text();
    string text = string(out);
    delete[] out;  // the caller owns the buffer returned by GetUTF8Text()

Edit

As the existing answers note, the space between "j" and "i" in the example seems a little larger than between the other characters. The font I have chosen is a monospace font, because I thought this would help Tesseract with character recognition. The drawback of such a monospace font is that, although every character occupies the same width, the kerning (the space between characters) varies. See the following example image:

proportional vs. monospace

Which font type do you think would achieve better recognition results?

Adjusting the parameter tosp_min_sane_kn_sp may help. I solved my problem by doing that.
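As a rough illustration of how that parameter could be set from code, here is a minimal sketch. The helper name setMinSaneKernSpace is made up for this example, and it is templated only so that it compiles without the Tesseract headers; it relies on nothing but SetVariable(const char*, const char*), the same call the question already uses for the whitelist. The answer does not say which value worked, so any number passed in is something to tune by experiment.

```cpp
#include <string>

// Hypothetical helper: sets tosp_min_sane_kn_sp on any object that exposes
// bool SetVariable(const char* name, const char* value), as
// tesseract::TessBaseAPI does. Returns whatever SetVariable returns,
// i.e. false if the variable name was not accepted.
template <typename Api>
bool setMinSaneKernSpace(Api& api, double factor) {
    // Tesseract variables are passed as strings; std::to_string formats the
    // double with six decimals, e.g. 3.0 becomes "3.000000".
    const std::string value = std::to_string(factor);
    return api.SetVariable("tosp_min_sane_kn_sp", value.c_str());
}
```

With the setup from the question this would be called as setMinSaneKernSpace(tess, 3.0) after tess.Init(...) and before the text is read; the 3.0 is only a placeholder, not a recommended value.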

If that doesn't help, you may try the other tosp_* parameters, or work around the space handling in the source file "tospace.cpp".
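A workaround that stays outside the Tesseract source is also conceivable for monospace serial numbers: ignore the spaces Tesseract reports and rebuild them from the character boxes. The sketch below is not Tesseract's algorithm, just a heuristic under stated assumptions. It assumes you already have one box per recognized symbol, sorted left to right (for example collected with the result iterator at symbol level), that the font is monospaced, and that most neighbouring characters are not separated by a space. CharBox, rebuildWithSpaces and spaceFactor are names invented for this example. It compares center-to-center distances instead of edge-to-edge gaps, because in a monospace font the centers stay roughly one cell apart even when a narrow glyph leaves wide gaps on both sides.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one recognized symbol and its horizontal extent
// in pixels. Not a Tesseract type.
struct CharBox {
    char ch;
    int left;
    int right;
};

// Rebuilds a line of monospaced text from symbol boxes sorted left to right.
// A space is inserted only where the distance between neighbouring symbol
// centers exceeds spaceFactor times the estimated cell pitch.
// spaceFactor is an assumed tuning knob, not a Tesseract parameter.
std::string rebuildWithSpaces(const std::vector<CharBox>& boxes,
                              double spaceFactor = 1.5) {
    if (boxes.empty()) return "";

    // Center-to-center distance for each pair of neighbours.
    std::vector<double> steps;
    for (std::size_t i = 1; i < boxes.size(); ++i) {
        const double prev = (boxes[i - 1].left + boxes[i - 1].right) / 2.0;
        const double cur = (boxes[i].left + boxes[i].right) / 2.0;
        steps.push_back(cur - prev);
    }

    // The median step approximates the cell pitch as long as spaces are rare.
    double pitch = 0.0;
    if (!steps.empty()) {
        std::vector<double> sorted = steps;
        const std::size_t mid = sorted.size() / 2;
        std::nth_element(sorted.begin(), sorted.begin() + mid, sorted.end());
        pitch = sorted[mid];
    }

    std::string out(1, boxes[0].ch);
    for (std::size_t i = 1; i < boxes.size(); ++i) {
        if (steps[i - 1] > spaceFactor * pitch) out += ' ';
        out += boxes[i].ch;
    }
    return out;
}
```

Whether a factor of 1.5 separates real spaces from wide gaps depends on the font and the image resolution, so it would need checking against serial numbers with and without spaces.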
