Paper #38

S. Marinai, E. Marino, G. Soda, "Word retrieval in document images without OCR"

Keywords: digital libraries, document image analysis, artificial neural networks, string matching

We describe a method for efficient indexing and retrieval of words in collections of document images. During indexing, a self-organizing map is trained to cluster similar symbols in a subset of the documents to be stored. Using the trained map, each word in the collection is stored as a fixed-length description; these descriptions can be easily compared to score the words most similar to a user query. The system can be adapted to different languages and font styles.
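
The pipeline summarized above (cluster symbols with a self-organizing map, describe each word with a fixed-length vector, rank words by similarity to a query) can be pictured with the minimal sketch below. It is not the authors' implementation. The symbol feature vectors, the map size, the choice to resample each word to a fixed number of slots, the Euclidean ranking, and all function names (`train_som`, `best_unit`, `word_descriptor`, `rank_words`) are assumptions made purely for illustration.

```python
import numpy as np


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small self-organizing map on symbol feature vectors (one per row)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows * cols, data.shape[1]))
    # Grid coordinates of every map unit, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3     # neighbourhood radius shrinks
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights, grid


def best_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))


def word_descriptor(symbols, weights, grid, slots=8):
    """Fixed-length word description (assumed scheme, not taken from the paper).

    The word's symbol sequence is resampled to `slots` positions and the grid
    coordinates of each symbol's best-matching unit are concatenated.
    """
    if len(symbols) == 0:
        raise ValueError("a word needs at least one symbol")
    units = [best_unit(weights, s) for s in symbols]
    idx = np.linspace(0, len(units) - 1, slots).round().astype(int)
    return grid[[units[i] for i in idx]].ravel()


def rank_words(query_desc, index):
    """Indices of indexed words, most similar first (Euclidean distance)."""
    dists = np.linalg.norm(index - query_desc, axis=1)
    return np.argsort(dists)


if __name__ == "__main__":
    # Synthetic stand-in data: 5 symbol prototypes with 6 features each.
    rng = np.random.default_rng(1)
    protos = rng.random((5, 6))
    training = protos[rng.integers(0, 5, 200)] + 0.01 * rng.standard_normal((200, 6))
    weights, grid = train_som(training)

    # Four "words" of different lengths, each a sequence of symbol vectors.
    words = [protos[rng.integers(0, 5, n)] for n in (3, 5, 7, 4)]
    index = np.stack([word_descriptor(w, weights, grid) for w in words])

    # Query with the third word and print the ranking of indexed words.
    print(rank_words(index[2], index))
```

Because every descriptor has the same length regardless of how many symbols the word contains, the whole collection can be held in one array and a query compared against it in a single vectorized operation. Storing grid coordinates rather than raw unit indices is a design choice of this sketch: symbols mapped to neighbouring units then yield nearby descriptors.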