Development of a model for mining unstructured software repositories

Chabi, Clavers Chaffa Raoul (2019)

i-xvi, 1-124 pp.

Thesis

This study collected and analysed data from software repositories, formulated a computational model, and designed, implemented and evaluated the performance of that model. The aim was to handle the complex nature of repository data, which often makes software developers waste time locating useful data during software development projects. The research employed a quantitative approach. PyDriller was used to collect the data from GitHub; specifically, one thousand (1,000) projects were extracted. Natural language processing (NLP) and BigQuery were also used. A computational model was formulated using an Artificial Neural Network (ANN). The model was designed using the Visual Paradigm Unified Modeling Language (UML) tool and implemented in the Python programming language. The developed model was evaluated using recall, precision and execution time as parameters. The results showed that the system returned a recommended list of requirements documents upon programmers' requests. Of the one thousand (1,000) repositories in the raw dataset gathered from GitHub, more than seven hundred (700) were well structured. The system achieved a performance of 75% in structuring data in the repository and 84% in data recommendation, with an execution time of two (2) seconds. These results indicate that the model helped programmers locate useful requirements documents more effectively than existing tools. The study concluded that the designed and implemented model structured the data in the repository and provided users with a recommended list of requirements documents.
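The abstract reports recall, precision and execution time as the evaluation parameters for the recommended lists. The sketch below shows one conventional way those three measures could be computed for a single request. It assumes that each requirements document has an identifier and that a set of documents relevant to the request is known. The `keyword_recommender` function, the document IDs and their texts are all hypothetical. The ranker is a simple word-overlap stand-in, not the ANN model described in the thesis.

```python
import time


def precision_recall(recommended, relevant):
    """Return (precision, recall) of a recommended list against the relevant items."""
    recommended_set = set(recommended)
    relevant_set = set(relevant)
    hits = len(recommended_set & relevant_set)  # recommended items that are relevant
    precision = hits / len(recommended_set) if recommended_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def keyword_recommender(query, documents, top_k=3):
    """Hypothetical stand-in ranker: order documents by word overlap with the query.

    `documents` maps a document ID to its text. This is NOT the thesis's ANN model.
    """
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken alphabetically by ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical requirements documents, for illustration only.
docs = {
    "req-01": "user login authentication requirement",
    "req-02": "export report as pdf",
    "req-03": "password reset via email authentication",
    "req-04": "dashboard chart colours",
}

recommended, seconds = timed(keyword_recommender, "authentication login", docs)
precision, recall = precision_recall(recommended, ["req-01", "req-03", "req-05"])
print(recommended, precision, round(recall, 3), f"{seconds:.6f}s")
```

With the toy data, the query returns `["req-01", "req-03"]`. Both recommended documents are relevant, so precision is 1.0. One relevant document (`req-05`) is missed, so recall is 2/3. Averaging these per-request values over many requests is one common way to obtain system-level figures, although the abstract does not state how its percentages were aggregated.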