Development of an automatic extraction model for Yoruba text.
Date
2023
Authors
Ademusire, A.J.
Publisher
Department of Computer Science, Faculty of Technology, Obafemi Awolowo University.
Abstract
This research collected and annotated Yoruba textual data. It then formulated, implemented, and evaluated a machine learning model for Yoruba text. The aim was to develop a model for automatic event extraction from Yoruba text.
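The abstract does not describe the annotation format. As an illustration only, token-level training data for a Bi-LSTM trigger and entity tagger is often encoded with the BIO scheme: B marks the first token of a labelled span, I marks a continuation token, and O marks a token outside any span. The tag names ENTITY and TRIGGER, the sentence-level event_type field, and the example sentence "Ìjàpá lọ sí ọjà" ("Tortoise went to the market") are all assumptions. They do not describe the thesis's actual scheme.

```python
from typing import List, Tuple

# (start token, end token exclusive, label). Tag names are illustrative assumptions.
Span = Tuple[int, int, str]


def spans_to_bio(num_tokens: int, spans: List[Span]) -> List[str]:
    """Convert labelled token spans into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


# Hypothetical annotated record; the layout is not taken from the thesis.
record = {
    "tokens": ["Ìjàpá", "lọ", "sí", "ọjà"],
    "spans": [(0, 1, "ENTITY"), (1, 2, "TRIGGER"), (3, 4, "ENTITY")],
    "event_type": "MOVEMENT",  # sentence-level target for an event type classifier
}

tags = spans_to_bio(len(record["tokens"]), record["spans"])
for token, tag in zip(record["tokens"], tags):
    print(f"{token}\t{tag}")
```

Keeping triggers and entities as spans, and deriving BIO tags only when building training data, makes it easy to regenerate tags if the tag set changes.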
Data were collected manually: Yoruba folktales were typed from print into digital form and then cleaned. The data were preprocessed using the Python programming language. A machine learning model combining Bidirectional Long Short-Term Memory (Bi-LSTM) and Convolutional Neural Network (CNN) architectures was then formulated, with hyperparameters tuned for Yoruba text. The model was implemented in Python and evaluated on more than 100 unique Yoruba folktale sentences using standard metrics: accuracy, precision, recall, and F1 score.
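The abstract states only that preprocessing was done in Python. The sketch below shows typical steps, each of which is an assumption not confirmed by the source:

- Unicode NFC normalization
- lowercasing
- splitting off punctuation
- building a frequency-ordered vocabulary with reserved padding and unknown indices
- fixed-length padding

Normalization matters for Yoruba because the same tone-marked letter can be stored either precomposed or as a base letter followed by combining marks. Unless both forms are mapped to one encoding, the same word ends up with different vocabulary entries.

```python
import re
import unicodedata
from collections import Counter
from typing import Dict, List

# Words are runs of non-space, non-punctuation characters, so combining tone
# marks stay attached to their letters. A \w-based pattern would split them off,
# because Python's \w does not match combining marks such as U+0300.
# The punctuation set is an assumed, minimal one.
TOKEN_RE = re.compile(r"[^\s.,!?;:\"'()“”‘’]+|[.,!?;:\"'()“”‘’]")


def normalize_text(text: str) -> str:
    """Lowercase, then apply NFC so accented letters have a single encoding."""
    return unicodedata.normalize("NFC", text.lower())


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(normalize_text(text))


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    counts = Counter(tok for sent in sentences for tok in sent)
    vocab = {"<pad>": 0, "<unk>": 1}
    # Most frequent tokens first; ties broken alphabetically for reproducibility.
    for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        vocab[tok] = len(vocab)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, and right-pad with <pad>."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


raw = [
    "I\u0300ja\u0300pa\u0301 lọ sí ọjà.",  # decomposed: base letters + combining accents
    "Ìjàpá jẹun.",                         # precomposed characters
]
sentences = [tokenize(s) for s in raw]
vocab = build_vocab(sentences)
batch = [encode(s, vocab, max_len=6) for s in sentences]
print(sentences)
print(batch)
```

After normalization, both spellings of "Ìjàpá" map to the same vocabulary id. The padded id matrix is the kind of input an embedding layer in front of a Bi-LSTM or CNN would consume.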
The Bi-LSTM model for trigger and entity identification achieved an accuracy of 87.00%, precision of 91.72%, recall of 68.54%, and F1 score of 76.67%. The CNN model for event type classification yielded an accuracy of 47.55%, precision of 52.07%, recall of 49.90%, and F1 score of 48.22%. These findings indicate that the developed model, especially its Bi-LSTM component, is effective at capturing event triggers in Yoruba text. Event type classification with the CNN remains considerably less accurate.

Beyond advancing natural language processing (NLP), this research contributes to the preservation of the Yoruba language and culture by providing a well-labelled dataset for benchmarking event extraction in Yoruba. The study concludes that advanced NLP techniques can be applied to linguistically diverse languages, and it underscores the importance of linguistic diversity in a globalized world. It sets the stage for future research on event extraction from underrepresented languages, paving the way for broader applications in information retrieval, story generation, and cultural preservation.
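The abstract reports accuracy, precision, recall, and F1, but it does not say how scores were averaged across labels. It also does not say whether trigger scores were computed per token or per span. The sketch below assumes flat label sequences (one label per token or per sentence) and unweighted macro averaging, and the event-type labels are hypothetical. In practice, a library routine such as scikit-learn's precision_recall_fscore_support would normally be used instead.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def per_class_scores(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float, float]]:
    """One-vs-rest precision, recall and F1 for every label that occurs."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # the true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


def macro_report(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus unweighted (macro) averages of per-label P, R and F1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    scores = per_class_scores(y_true, y_pred)
    n = len(scores)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true),
        "precision": sum(s[0] for s in scores.values()) / n,
        "recall": sum(s[1] for s in scores.values()) / n,
        "f1": sum(s[2] for s in scores.values()) / n,
    }


# Hypothetical event-type labels for six sentences.
y_true = ["MOVE", "MOVE", "MOVE", "MOVE", "SAY", "SAY"]
y_pred = ["MOVE", "SAY", "SAY", "SAY", "SAY", "SAY"]
report = macro_report(y_true, y_pred)
p, r = report["precision"], report["recall"]
print(report)
print("F1 of averaged P and R:", 2 * p * r / (p + r))
```

The harmonic mean is concave, so a macro-averaged F1 can never exceed the F1 computed from macro-averaged precision and recall. In this example the two values are about 0.486 and 0.660. The abstract does not say whether its figures are macro averages. If they are, this would explain why the reported Bi-LSTM F1 of 76.67% sits below 2 × 91.72 × 68.54 / (91.72 + 68.54) ≈ 78.45%.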
Description
xiv, 138p
Citation
Ademusire, A.J. (2023). Development of an automatic extraction model for Yoruba text. Department of Computer Science, Faculty of Technology, Obafemi Awolowo University.