Development of a validated dataset and a framework to mitigate bias in facial image processing
Date
2025
Authors
Amarachi, M. U.
Publisher
Department of Computer Science and Engineering, Faculty of Technology, Obafemi Awolowo University, Ile-Ife.
Abstract
This study demonstrated the levels of bias in facial image processing arising from a dataset, built a facial image dataset representing the biased population, and formulated expression and gender recognition models to validate the dataset. It also described a framework showing the representation of certain demographic groups needed to mitigate bias in facial image processing. The performance of the dataset and the models was also evaluated, all with a view to developing a validated dataset and a framework to mitigate bias in facial image processing. A comprehensive review of 40 publicly accessible facial image datasets was conducted. t-distributed Stochastic Neighbor Embedding (t-SNE) was employed to visualize the racial distribution of the datasets. Oriented FAST and Rotated BRIEF (ORB) was used for feature extraction, followed by k-means clustering to group racial features and Principal Component Analysis (PCA) to assess the geo-diversity and bias levels of the datasets. A 64 MP camera was used to capture facial images in a controlled environment, while questionnaires were used to gather the ground truths. A standard labeling convention was employed in labeling the dataset, such that each participant was assigned a unique identifier: a ten-character string such as 0001DMY30C. Expression and gender recognition models were developed using a convolutional neural network architecture in conjunction with transfer learning. The UTK (University of Tennessee, Knoxville) dataset was used to train machine learning models to establish a framework to mitigate dataset bias. The models were evaluated using accuracy, precision, and sensitivity, while fairness metrics such as demographic parity and equalized odds were used to assess and quantify biases in the framework. From the results obtained, the PCA and k-means algorithms successfully identified the degree of bias in the facial image datasets used in the analysis. PCA also gave a visual representation of the bias levels in the form of scatter plots and biplots, in which the facial image datasets were distinguished by their bias levels. A total of 3,500 facial expression images were collected and used to develop the gender and expression recognition models. The gender recognition model achieved an accuracy, precision, and sensitivity of 94%, 94%, and 94%, respectively, while the expression recognition model achieved 96%, 90%, and 90%, respectively. The accuracy for each ethnic group (Black, White, Latino, Asian, Indian, and Others) was 98%, 92%, 88%, 89%, 88%, and 84%, respectively. The study concluded that the developed validated dataset and the framework were adequate and could be used to mitigate dataset bias in facial image processing. The framework effectively used a class-weight formula to combat bias.
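The bias-assessment step described above (ORB feature extraction, k-means clustering of racial features, and PCA projection) can be illustrated with the minimal sketch below. It uses standard OpenCV and scikit-learn calls; the pooling strategy, number of clusters, and number of components are illustrative assumptions rather than values taken from the study.

```python
# Minimal sketch: pooled ORB features -> k-means groups -> PCA projection,
# with cluster proportions as a rough indicator of demographic representation.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def orb_descriptor(path, n_features=128):
    """Extract ORB keypoint descriptors and mean-pool them into one 32-D vector per image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=n_features)
    _, desc = orb.detectAndCompute(img, None)
    if desc is None:                      # no keypoints found in this image
        return np.zeros(32)
    return desc.mean(axis=0)

def assess_bias(image_paths, n_groups=6):
    """Cluster pooled ORB features and project them to 2-D for visual inspection."""
    features = np.vstack([orb_descriptor(p) for p in image_paths])
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(features)
    projected = PCA(n_components=2).fit_transform(features)   # for scatter plots / biplots
    sizes = np.bincount(labels, minlength=n_groups)
    return projected, labels, sizes / sizes.sum()              # uneven proportions suggest bias
```

The returned proportions give a simple, dataset-level view of how evenly the clustered groups are represented, which is the quantity the PCA and k-means analysis in the study is concerned with.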
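The class-weight idea mentioned at the end of the abstract can likewise be sketched briefly. The "balanced" formula n_samples / (n_classes * n_c) shown here is the standard scikit-learn convention and is an assumption about the kind of formula the framework uses, not a verbatim reproduction of it; the demographic-parity check is a simple illustration of one of the fairness metrics named above.

```python
# Minimal sketch: inverse-frequency class weights and a demographic-parity gap.
import numpy as np

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n / (k * n_c)."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def demographic_parity_gap(y_pred, groups):
    """Largest difference in positive-prediction rate between any two demographic groups."""
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

# Example: a dataset dominated by one group gets a small weight for that group
# and a larger weight for the under-represented group.
labels = np.array([0] * 900 + [1] * 100)
print(balanced_class_weights(labels))   # {0: ~0.56, 1: 5.0}
```

Such weights can be passed to a CNN training routine (for example, the class_weight argument of Keras model.fit) so that under-represented demographic groups contribute proportionally more to the loss.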
Description
xix, 212p.
Keywords
Citation
Amarachi, M. U. (2025). Development of a validated dataset and a framework to mitigate bias in facial image processing. Department of Computer Science and Engineering, Faculty of Technology, Obafemi Awolowo University, Ile-Ife.