Paper #38

S. Marinai, E. Marino, G. Soda, "Word retrieval in document images without OCR"

Keywords: digital libraries, document image analysis, artificial neural networks, string matching

We describe a method for efficient indexing and retrieval of words in collections of document images. During indexing, a self-organizing map is trained to cluster similar symbols in a subset of the documents to be stored. Using the trained map, each word in the collection is represented by a fixed-length description; these descriptions can be compared easily to rank the words most similar to a user query. The system can be adapted to different languages and font styles.
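
As a rough illustration only, and not the authors' implementation, the pipeline outlined in the abstract could be sketched as follows. Everything specific here is an assumption made for the example: symbols are taken to be already segmented and turned into small numeric feature vectors, the fixed-length description is built by sampling a fixed number of slots along the word and recording the map coordinates of the symbol under each slot, and descriptions are compared by Euclidean distance. The function names (`train_som`, `describe_word`, `rank_words`) and the toy glyph features are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_unit(weights, x):
    """Grid position of the map unit whose weight vector is closest to x."""
    return min(weights, key=lambda pos: euclidean(weights[pos], x))


def train_som(samples, rows, cols, epochs=50, seed=0):
    """Train a small rows x cols self-organizing map on symbol feature vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # One weight vector per map unit, keyed by its (row, col) grid position.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(samples, len(samples)):  # shuffled pass over the data
            frac = step / total_steps
            lr = 0.5 * (1.0 - frac)                              # decaying learning rate
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking neighborhood
            bmu = best_unit(weights, x)
            for pos, w in weights.items():
                grid_dist2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2.0 * radius * radius))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull unit toward the sample
            step += 1
    return weights


def describe_word(weights, symbols, n_slots=8):
    """Fixed-length description (assumed scheme): for each of n_slots positions
    along the word, store the map coordinates of the symbol under that slot.
    Expects a non-empty list of symbol feature vectors."""
    desc = []
    for slot in range(n_slots):
        sym = symbols[slot * len(symbols) // n_slots]
        desc.extend(best_unit(weights, sym))
    return desc  # always 2 * n_slots numbers, whatever the word length


def rank_words(weights, index, query_symbols, n_slots=8):
    """Return indexed words sorted from most to least similar to the query."""
    q = describe_word(weights, query_symbols, n_slots)
    return sorted(index, key=lambda word: euclidean(index[word], q))


# Toy data: made-up 3-D features standing in for segmented symbol images.
GLYPHS = {"a": [0.9, 0.1, 0.2], "c": [0.8, 0.2, 0.9], "d": [0.1, 0.9, 0.3],
          "g": [0.2, 0.8, 0.8], "o": [0.5, 0.5, 0.1], "t": [0.1, 0.1, 0.9]}


def to_symbols(text):
    return [GLYPHS[ch] for ch in text]


weights = train_som(list(GLYPHS.values()), rows=3, cols=3)
index = {w: describe_word(weights, to_symbols(w)) for w in ["dog", "cot", "cat"]}
print(rank_words(weights, index, to_symbols("cat")))
```

Because every description has the same length, ranking reduces to one distance computation per indexed word. Adapting such a sketch to another language or font style would amount to retraining the map on symbols from the new collection.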