Arabic Handwritten Text Recognition using Advanced CNN-RNN Architecture


Authors

Mohamed Elsayed (Egypt-Japan University of Science and Technology)*; Ahmed Ali (Egypt-Japan University of Science and Technology); Mohamed Abdeen (Egypt-Japan University of Science and Technology); Abdelrhman Wahdan (Egypt-Japan University of Science and Technology); Walid Gomaa (Egypt-Japan University of Science and Technology)
mohamed.mohammed@ejust.edu.eg*; ahmed.ali@ejust.edu.eg; mohamed.abdeen@ejust.edu.eg; abdelrhman.wahdan@ejust.edu.eg; walid.gomaa@ejust.edu.eg

Abstract

Optical character recognition (OCR) has attracted the interest of researchers in recent years. Nevertheless, the development of learning architectures for Arabic handwritten text lags behind that for other languages, owing to the cursive nature of Arabic script and the many complex shapes each letter can take. In this paper, we evaluate the performance of CNN-RNN-based models for Arabic handwritten text recognition, exploring variants of EfficientNet combined with BiLSTMs, with and without an attention mechanism. We propose using EfficientNetB3 to extract features from input images, followed by three BiLSTMs that process the sequential features; performance is further improved by two attention heads, which allow the model to focus on different parts of the input sequence simultaneously. Lastly, a Connectionist Temporal Classification (CTC) layer aligns the input sequence with the output sequence. The model, which is efficient compared to other CNN-RNN-based and transformer-based models, is evaluated on KHATT and AHAWP, achieving state-of-the-art character error rates (CER) of 17.26% and 0.28%, respectively. To investigate its generalization ability, we also ran the model on both datasets combined, achieving a CER of 16.8% on the KHATT test set and 0.2% on that of AHAWP.
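The pipeline described in the abstract (CNN feature extractor, stacked BiLSTMs, multi-head attention, CTC output) can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the authors' implementation: the small convolutional backbone stands in for EfficientNetB3, and all layer sizes and the class count are assumed for the example.

```python
import torch
import torch.nn as nn

class CRNNSketch(nn.Module):
    """Illustrative CNN-BiLSTM-attention-CTC pipeline (sizes are assumptions)."""

    def __init__(self, num_classes=100, feat_dim=256, hidden=256):
        super().__init__()
        # Stand-in CNN backbone; the paper uses EfficientNetB3 here.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse height -> width-wise sequence
        )
        # Three stacked bidirectional LSTMs over the feature sequence.
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=3,
                           bidirectional=True, batch_first=True)
        # Two attention heads over the BiLSTM outputs, so the model can
        # attend to different parts of the sequence simultaneously.
        self.attn = nn.MultiheadAttention(2 * hidden, num_heads=2,
                                          batch_first=True)
        self.fc = nn.Linear(2 * hidden, num_classes + 1)  # +1 for the CTC blank

    def forward(self, x):                    # x: (B, 1, H, W) grayscale line image
        f = self.backbone(x)                 # (B, C, 1, W')
        seq = f.squeeze(2).transpose(1, 2)   # (B, W', C)
        out, _ = self.rnn(seq)               # (B, W', 2*hidden)
        out, _ = self.attn(out, out, out)    # self-attention over time steps
        return self.fc(out).log_softmax(-1)  # per-step class log-probabilities

model = CRNNSketch()
log_probs = model(torch.randn(2, 1, 64, 256))  # (batch, time, classes)
# A CTC loss (nn.CTCLoss) would then align these per-step outputs
# with variable-length target transcriptions during training.
```

The CTC layer removes the need for character-level segmentation: it marginalizes over all alignments between the per-time-step predictions and the target text.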