Research Article | Peer-Reviewed

Predicting Depression in Women Using Deep Learning Techniques

Received: 23 February 2026     Accepted: 3 March 2026     Published: 12 May 2026
Abstract

Depression is a significant global health issue with a notably higher prevalence in women. However, many predictive models using artificial intelligence (AI) overlook gender-specific symptom patterns, limiting their sensitivity and effectiveness for female populations. This study addresses this gap by developing and evaluating a multimodal, gender-specific deep learning framework designed to predict depression exclusively in women. Leveraging the female subset of the Distress Analysis Interview Corpus (DAIC-WOZ) dataset, the study utilizes a late-fusion architecture that integrates four distinct data streams: textual transcripts, acoustic features, visual facial cues, and tabular clinical data (PHQ-8 scores). The model employs specialized neural network branches for each modality: a Transformer (DistilBERT) for text, a Bidirectional LSTM (BiLSTM) for audio, a Temporal CNN for visual sequences, and a Multi-Layer Perceptron (MLP) for tabular data, before concatenating their embeddings for a final prediction. The results demonstrate the superior performance of the multimodal approach, achieving an F1-score of 0.89 and an ROC-AUC of 0.92, significantly outperforming unimodal baselines. Ablation studies revealed that textual data was the most influential modality, with its removal causing a performance degradation of over 15% in the F1-score. Acoustic features were identified as the second most critical predictor, underscoring the importance of both linguistic content and vocal prosody.

Published in Science Futures (Volume 2, Issue 3)
DOI 10.11648/j.scif.20260203.11
Page(s) 189-195
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2026. Published by Science Publishing Group

Keywords

Depression, Gender-Specific, Healthcare, Prediction, Women, Machine Learning, Mental Health Disorder

1. Introduction
Depression is a complex mental health disorder characterized by persistent feelings of sadness, loss of interest, and impaired daily functioning. Depression has been shown to affect women at a higher rate than men, with several studies providing evidence for this gender disparity: the DEPRES Study, covering six European countries, found marked gender differences in the six-month prevalence rate for major depression, with women being more affected than men. This gender difference persisted across all age groups. Similarly, the Canadian Community Health Survey 1.2 reported a female-to-male ratio of major depressive disorder prevalence of 1.64:1. AI-based approaches are increasingly being utilized to predict and manage depression, offering promising opportunities to address growing global mental health needs. Artificial intelligence (AI) and deep learning (DL) models can be developed to predict mood scores for depressed individuals using multimodal data, including Ecological Momentary Assessments, lifestyle data from wearables, and neurocognitive assessments. These models can achieve high predictive accuracy, with errors as low as 6% for some participants. AI technologies such as machine learning and predictive analytics enable more precise, data-driven decision-making in healthcare, including mental health assessment and treatment planning. Research shows women are statistically more prone to depression than men due to hormonal, social, and psychological factors. Gender-specific risk factors for depression in women encompass biological factors, such as hormonal fluctuations associated with the reproductive lifecycle (for example during pregnancy, postpartum, and menopause); genetic susceptibility and neurochemical differences also play a role. Psychological factors include a tendency towards ruminative coping strategies and increased sensitivity to relationship issues. Women are also more likely to experience anxiety and atypical depression symptoms, while social factors include gender-specific societal roles, socioeconomic disparities, and higher rates of physical and sexual abuse.
The World Health Organisation (WHO) highlights how important mental health is to overall wellbeing and acknowledges that it is a basic human right, ranging from extreme suffering to optimal well-being. Globally, anxiety and depression are common, and suicide is a major cause of death, especially among young people. Furthermore, serious mental health issues frequently result in avoidable physical disorders that cause premature death. Despite these challenges, global mental health systems still confront significant information, research, governance, resource, and treatment shortages and disparities. This study addressed these problems by conducting a literature review in accordance with Okoli's structured eight-step approach, ensuring scientific rigor throughout. A comprehensive psychiatric interview is necessary to diagnose mental health disorders; this interview typically covers the suspected symptoms, psychiatric history, and physical testing. Psychiatric symptoms can also be identified with the use of psychological examinations and evaluation instruments. Depression disproportionately affects women, with biological, psychological, and sociocultural factors contributing to its higher prevalence. Traditional diagnostic methods are often limited by subjectivity, delayed recognition, and underreporting. Machine learning (ML) presents a promising avenue for improving early detection, personalized risk assessment, and intervention strategies for depression in women.
Depression symptoms in women can be captured through various data types (e.g., survey scores, text, social media posts). Predictive analytics processes this data using CNN models that learn patterns associated with depressive states. These models help in detecting depression early, allowing for informed clinical decisions or interventions. The empirical framework evaluates prior studies that applied machine learning and deep learning techniques for predicting or managing mental health conditions. The studies vary in scope, population, methods, and application domains but share a common goal: improving mental health prediction and intervention using data-driven models. One study predicted emergency department revisits among children aged 4–17 using Graph Neural Networks and Recurrent Neural Networks; it recorded improved recall and prediction performance, although it lacked comprehensive demographic data such as gender. Another employed decision trees to assess emotional health in elementary school children using the SEHS-S questionnaire and K-Fold cross-validation, noting the limited availability of mental health tools in local settings. A supervised learning model for early depression detection from textual data, however, sacrificed contextual understanding by ignoring sentence and paragraph structures. A hybrid of long short-term memory (LSTM) and traditional machine learning (ML) models for predicting depression in older adults from longitudinal health data was effective, but faced limited generalizability and a dated scope.
Studies investigating depression detection using social media data and machine learning classifiers raised ethical concerns around privacy and generalizability. Recent studies have further diversified the empirical landscape of machine learning and deep learning applications in mental health prediction. One explored speech emotion recognition by comparing convolutional neural network (CNN) and random forest (RF) models on combined public emotion datasets; although the RF model achieved a moderate accuracy of 69%, the research emphasized the untapped potential of robust audio features, yet left questions about real-world applicability unresolved. Another integrated a hybrid SVM and Naïve Bayes classifier for early detection of depression from Twitter posts; the hybrid model improved performance over single models, but the limited sample size (111 users) raises concerns about its scalability to other platforms. A further study leveraged transfer learning using AlexNet for text and audio feature extraction, combined with a CNN-TL, an IBi-LSTM, and an attention mechanism; the multimodal architecture surpassed other deep learning baselines, demonstrating high accuracy, although the authors called for more transparent interpretability, especially for clinical adoption. Similarly focusing on speech, work targeting depression detection among Bahasa Malaysia female speakers using an attention-based CNN-RNN hybrid (AttCRNN) achieved 91% accuracy, underscoring the effectiveness of combining different speech types but leaving feature interpretability open for further investigation.
This study focuses on the relationships between these factors, predictive analytics, and the application of deep learning (DL), primarily through Convolutional Neural Networks (CNNs) as the chosen methodological tool. The framework identifies the interplay between depression as a psychological and physiological condition, the gender-specific factors that influence its presentation and progression, and the technological methodologies employed to analyze and predict its occurrence. Predictive analytics helps identify individuals at risk of depression early enough for timely intervention. The aim is to shift mental healthcare from a reactive to a proactive model, detecting and addressing issues before they escalate. In depression research, especially when focused on women, predictive analytics is instrumental in recognizing subtle patterns that may precede clinically diagnosable episodes; these may include changes in sleep patterns, heart rate variability, shifts in mood reported via digital diaries, or even sentiment expressed on social media platforms. When these diverse data inputs are processed using predictive models, it becomes possible to issue early warnings, personalize care plans, and deploy interventions more efficiently.
2. Methods
Figure 1. Multimodal Depression Classification Model.
The development of a predictive analytics model for depression in women using deep learning uses a late-fusion multimodal neural network combining four branches of specialized models, one for each type of input. These include a Transformer for text, a recurrent neural network (BiLSTM) for audio, a convolutional neural network (1D CNN) for visual sequences, and dense layers (MLP) for tabular features. In short, the study utilizes a multimodal deep neural network with Transformer + CNN + RNN + MLP fusion. This choice was motivated by the need to adapt flexibly to the evolving nature of the data and modeling process, particularly due to the complexity of combining multimodal inputs (images, textual survey responses, and structured clinical data). The methodology combined traditional CRISP-DM stages with a lightweight deep learning pipeline that facilitated rapid prototyping, performance benchmarking, and validation across multiple neural architectures. This hybrid approach allowed for both theoretical rigor and practical implementation.
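To make the late-fusion design concrete, the sketch below shows how the four branches could be wired together in PyTorch. The hidden sizes, input dimensions, and attribute names (e.g., text_encoder, fusion) are illustrative assumptions; the paper specifies only the branch types and the late-fusion strategy.

```python
import torch
import torch.nn as nn
from transformers import DistilBertModel

class LateFusionDepressionNet(nn.Module):
    """Minimal sketch of the four-branch late-fusion network.

    Hidden sizes and the fusion head are illustrative assumptions; the
    paper specifies the branch types (Transformer, BiLSTM, 1D CNN, MLP)
    and the late-fusion strategy, not the exact layer configuration.
    """
    def __init__(self, audio_dim=74, visual_dim=80, tabular_dim=10, hidden=128):
        super().__init__()
        # Text branch: DistilBERT encoder, first-token embedding projected down.
        self.text_encoder = DistilBertModel.from_pretrained("distilbert-base-uncased")
        self.text_proj = nn.Linear(768, hidden)
        # Audio branch: bidirectional LSTM over COVAREP frame sequences.
        self.audio_lstm = nn.LSTM(audio_dim, hidden, batch_first=True, bidirectional=True)
        # Visual branch: temporal 1D CNN over Action Unit sequences.
        self.visual_cnn = nn.Sequential(
            nn.Conv1d(visual_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Tabular branch: small MLP over normalized clinical features.
        self.tabular_mlp = nn.Sequential(nn.Linear(tabular_dim, hidden), nn.ReLU())
        # Fusion head: concatenated branch embeddings -> single binary logit.
        self.fusion = nn.Sequential(
            nn.Linear(hidden + 2 * hidden + hidden + hidden, hidden),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden, 1),
        )

    def forward(self, input_ids, attention_mask, audio, visual, tabular):
        text = self.text_encoder(input_ids=input_ids, attention_mask=attention_mask)
        text_emb = self.text_proj(text.last_hidden_state[:, 0])            # (B, hidden)
        _, (h, _) = self.audio_lstm(audio)                                 # h: (2, B, hidden)
        audio_emb = torch.cat([h[0], h[1]], dim=-1)                        # (B, 2*hidden)
        visual_emb = self.visual_cnn(visual.transpose(1, 2)).squeeze(-1)   # (B, hidden)
        tab_emb = self.tabular_mlp(tabular)                                # (B, hidden)
        fused = torch.cat([text_emb, audio_emb, visual_emb, tab_emb], dim=-1)
        return self.fusion(fused).squeeze(-1)                              # binary logit
```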
2.1. Datasets Acquisition
Data acquisition is a critical step in building any data-driven predictive model, as the quality and structure of the input data determine the overall performance and reliability of the system. For this study, the chosen dataset is the DAIC-WOZ (Distress Analysis Interview Corpus – Wizard of Oz) dataset, which forms part of the larger Depression AVEC 2017 challenge corpus. This dataset is specifically designed for automated depression detection using multiple modalities, including text, audio, visual, and structured data. Its multimodal nature makes it suitable for training deep learning models capable of capturing complex behavioral and linguistic patterns associated with depression. The dataset consists of recorded interviews with participants, where responses were collected as text transcripts, acoustic feature representations, visual behavioral markers, and structured meta-information such as PHQ-8 depression scores. The following subsections outline the source of the dataset, the preprocessing and feature extraction techniques applied, and the data splitting strategy employed for training, validation, and testing of the predictive model.
2.2. Feature Extraction
Textual Data (XXX_TRANSCRIPT.csv): Transcripts were cleaned by removing timestamps and non-verbal filler tokens. All participant utterances were concatenated into a single text block and then tokenized using a DistilBERT tokenizer to create contextual word embeddings.
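A minimal sketch of this transcript preparation step is shown below, assuming the tab-separated DAIC-WOZ transcript layout with speaker and value columns; the filler-token regular expression and the maximum sequence length are illustrative assumptions.

```python
import re
import pandas as pd
from transformers import DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")

def prepare_transcript(path, max_length=512):
    """Clean one transcript file and tokenize it for DistilBERT.

    Assumes a tab-separated file with 'speaker' and 'value' columns,
    as in the DAIC-WOZ release; the filler pattern is illustrative.
    """
    df = pd.read_csv(path, sep="\t")
    # Keep only participant utterances; timestamps are simply not used.
    utterances = df.loc[df["speaker"].str.lower() == "participant", "value"].astype(str)
    text = " ".join(utterances)
    # Remove bracketed non-verbal filler tokens such as <laughter> or [sigh].
    text = re.sub(r"<[^>]*>|\[[^\]]*\]", " ", text).strip()
    # Tokenize into fixed-length input IDs with an attention mask.
    return tokenizer(text, truncation=True, padding="max_length",
                     max_length=max_length, return_tensors="pt")
```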
Audio Data (XXX_COVAREP.csv): Pre-extracted COVAREP audio features were used. To handle variable lengths, sequences were padded or truncated to a uniform length. Features were normalized to ensure consistent scaling across all participants.
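The padding, truncation, and normalization described here could be implemented as in the following sketch; the target sequence length is an illustrative assumption rather than a value reported in the study.

```python
import numpy as np

def prepare_covarep(features, target_len=3000):
    """Pad or truncate a COVAREP feature sequence to a fixed length,
    then z-score normalize each feature dimension.

    `features` is a (frames, dims) array; `target_len` is illustrative.
    """
    features = np.asarray(features, dtype=np.float32)
    # Truncate long sequences, zero-pad short ones.
    if features.shape[0] >= target_len:
        features = features[:target_len]
    else:
        pad = np.zeros((target_len - features.shape[0], features.shape[1]), dtype=np.float32)
        features = np.vstack([features, pad])
    # Normalize each feature column (constant columns left untouched).
    mean, std = features.mean(axis=0), features.std(axis=0)
    std[std == 0] = 1.0
    return (features - mean) / std
```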
Visual Data (XXX_CLNF_AUs.csv): Visual features were derived from OpenFace Action Unit (AU) outputs. To convert the time-series data into a fixed-size feature vector, summary statistics (mean, standard deviation, max, and 25th percentile) were calculated for each AU intensity column across all frames. This resulted in a fixed-size vector summarizing the participant's facial expressions over the interview.
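A possible implementation of this summarization step is sketched below with pandas; the assumption that AU intensity columns carry the OpenFace "_r" suffix is ours, not stated in the paper.

```python
import pandas as pd

def summarize_action_units(path):
    """Collapse an Action Unit time series into a fixed-size vector.

    For each AU intensity column (assumed to end with '_r'), compute the
    mean, standard deviation, max, and 25th percentile across all frames.
    """
    df = pd.read_csv(path)
    df.columns = df.columns.str.strip()
    au_cols = [c for c in df.columns if c.endswith("_r")]  # AU intensity columns
    stats = df[au_cols].agg(["mean", "std", "max", lambda s: s.quantile(0.25)])
    return stats.to_numpy().flatten()  # shape: (4 * number_of_AUs,)
```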
Tabular Data (train_split_Depression_AVEC2017.csv): PHQ-8 scores and demographic data were normalized. Categorical features like gender were one-hot encoded. The dataset was then divided into 70% training, 15% validation, and 15% testing using stratified sampling to preserve label distribution. Speaker independence was enforced to avoid data leakage.
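One way to realize this split with scikit-learn is sketched below: splitting on participant IDs (one interview per participant) keeps each speaker's data in a single partition, which is what enforces speaker independence. The random seed is an illustrative assumption.

```python
from sklearn.model_selection import train_test_split

def split_participants(participant_ids, labels, seed=42):
    """70/15/15 stratified split at the participant level.

    Splitting on participant IDs keeps all data from a speaker in a
    single partition; proportions follow the paper, the seed does not.
    """
    train_ids, temp_ids, train_y, temp_y = train_test_split(
        participant_ids, labels, test_size=0.30, stratify=labels, random_state=seed)
    val_ids, test_ids, _, _ = train_test_split(
        temp_ids, temp_y, test_size=0.50, stratify=temp_y, random_state=seed)
    return train_ids, val_ids, test_ids
```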
The model development process focused on designing and implementing a multimodal deep learning framework capable of predicting depression levels in women based on audio, visual, textual, and tabular inputs. The architecture was constructed using a late fusion strategy, where features from different modalities are processed independently through specialized branches and later combined at the fusion layer for classification. This design ensures that modality-specific representations are fully captured before integration.
The following hyperparameters were used during training: a learning rate of 2e-5 for DistilBERT and 1e-3 for the other branches and fusion layers, a dropout rate of 0.3 for the individual branches and 0.5 in the fusion layer, and a batch size of 16.
Table 1. Hyperparameter Summary.

Parameter        Value
Learning Rate    2e-5 (text branch), 1e-3 (other branches and fusion)
Dropout          0.3 (branches), 0.5 (fusion layer)
Batch size       16
Optimizer        AdamW
Epochs           15
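The per-branch learning rates in Table 1 can be expressed as AdamW parameter groups, as in the sketch below; the text_encoder attribute name is an assumption about how the model exposes its DistilBERT branch.

```python
import torch

def build_optimizer(model):
    """AdamW with per-branch learning rates, following Table 1.

    Assumes the model exposes its DistilBERT branch as `model.text_encoder`
    (an assumption); all remaining parameters use the 1e-3 learning rate.
    """
    text_params = list(model.text_encoder.parameters())
    text_param_ids = {id(p) for p in text_params}
    other_params = [p for p in model.parameters() if id(p) not in text_param_ids]
    return torch.optim.AdamW([
        {"params": text_params, "lr": 2e-5},   # DistilBERT branch
        {"params": other_params, "lr": 1e-3},  # audio, visual, tabular, fusion
    ])
```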

3. Results
Training was performed on Google Colab. Model checkpoints were saved to Google Drive after each epoch. The system leveraged early stopping to avoid overfitting and used a stratified validation set for model selection. Training/validation loss and accuracy curves were logged and plotted with Matplotlib.
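A simplified sketch of the per-epoch checkpointing and early-stopping loop is shown below; train_fn, eval_fn, the patience value, and the checkpoint path are placeholders, not details reported in the paper.

```python
import copy
import torch

def train_with_early_stopping(model, train_fn, eval_fn, epochs=15, patience=3,
                              ckpt_path="checkpoint.pt"):
    """Checkpoint after every epoch; stop when validation loss stops improving.

    `train_fn` runs one training epoch and `eval_fn` returns the validation
    loss; both are placeholders, as is the patience value.
    """
    best_loss, best_state, stale = float("inf"), None, 0
    for epoch in range(epochs):
        train_fn(model)
        val_loss = eval_fn(model)
        torch.save(model.state_dict(), ckpt_path)  # per-epoch checkpoint
        if val_loss < best_loss:
            best_loss, best_state, stale = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stopping
    if best_state is not None:
        model.load_state_dict(best_state)  # restore best model for evaluation
    return model
```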
The implementation was carried out in Python 3.10 using PyTorch for deep learning model construction, Transformers (Hugging Face) for DistilBERT integration, scikit-learn for data splitting and evaluation metrics, Pandas and NumPy for data handling, and Matplotlib and Seaborn for visualization. The system was organized into the directories /data/ (raw and processed), /models/ (checkpoints), and /results/ (evaluation outputs).
The evaluation phase focused on determining the effectiveness of the multimodal depression prediction model and comparing its performance against baseline models and ablation experiments. We adopted a comprehensive evaluation strategy using both classification metrics and graphical analyses to ensure the reliability and robustness of the model. To evaluate the predictive capability of the system, we applied the trained model on the test set, which had been held out during training and validation to guarantee unbiased performance estimation. We measured the ability of the model to correctly classify participants as depressed or not depressed using several well-established metrics, including Accuracy, Precision, Recall, F1-score, and ROC-AUC (Receiver Operating Characteristic – Area Under the Curve). The choice of these metrics was motivated by the nature of the problem. Since depression prediction often involves an imbalanced dataset, where the number of non-depressed participants significantly exceeds that of depressed participants, accuracy alone does not provide a sufficient measure of performance. Therefore, we prioritized Precision, Recall, and F1-score, as they give deeper insights into the model’s behavior on minority classes. The ROC-AUC metric was also considered essential because it illustrates the trade-off between sensitivity and specificity across different classification thresholds.
Accuracy was calculated to determine the proportion of correct predictions out of the total predictions. Precision was used to assess how many participants predicted as depressed were truly depressed, while Recall quantified how many actual depressed participants were successfully identified. The F1-score, which is the harmonic mean of Precision and Recall, was emphasized as the most reliable metric due to the data imbalance. Lastly, ROC-AUC provided a threshold-independent evaluation, which is useful for understanding the classifier’s discriminative ability.
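These metrics can be computed directly with scikit-learn, as in the sketch below; the 0.5 decision threshold is an illustrative assumption.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the reported metrics from predicted probabilities.

    The threshold converts probabilities into class labels for the
    threshold-dependent metrics; ROC-AUC uses the raw probabilities.
    """
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "roc_auc": roc_auc_score(y_true, y_prob),
    }
```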
Figure 2. Confusion Matrix of the Multimodal Model on the Test Set.
To complement the numerical metrics, we generated graphical representations of the model's performance. The Confusion Matrix (Figure 2) provided an immediate view of misclassification patterns, while the ROC Curve (Figure 3) illustrated the trade-off between the True Positive Rate and False Positive Rate at different thresholds. In addition, we plotted the Precision-Recall Curve, which is particularly informative for imbalanced datasets. The curve revealed that the multimodal system maintained high precision even as recall increased, indicating strong robustness against false positives.
Figure 3. ROC Curve for Multimodal Model and Precision-Recall Curve for the Multimodal Model.
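The three graphical analyses can be reproduced with scikit-learn and Matplotlib as sketched below; the figure layout and styling are illustrative choices.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, precision_recall_curve, ConfusionMatrixDisplay

def plot_evaluation_curves(y_true, y_prob, y_pred):
    """Plot the confusion matrix, ROC curve, and Precision-Recall curve."""
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    # Confusion matrix of hard predictions.
    ConfusionMatrixDisplay.from_predictions(y_true, y_pred, ax=axes[0])
    axes[0].set_title("Confusion Matrix")

    # ROC curve: true positive rate vs. false positive rate.
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    axes[1].plot(fpr, tpr)
    axes[1].plot([0, 1], [0, 1], linestyle="--")  # chance level
    axes[1].set(title="ROC Curve", xlabel="False Positive Rate", ylabel="True Positive Rate")

    # Precision-Recall curve, informative under class imbalance.
    prec, rec, _ = precision_recall_curve(y_true, y_prob)
    axes[2].plot(rec, prec)
    axes[2].set(title="Precision-Recall Curve", xlabel="Recall", ylabel="Precision")

    fig.tight_layout()
    return fig
```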
The performance of the multimodal model was compared with baseline models and ablation experiments to assess the contribution of each modality. Baseline models such as Logistic Regression on tabular data performed significantly worse, particularly in terms of recall and F1-score, confirming the need for deep multimodal fusion.
Ablation results revealed that removing textual features led to the largest drop in performance, followed by audio features. This indicates that language and vocal characteristics are the strongest predictors of depressive tendencies in this dataset, while visual and tabular cues provide supplementary but valuable information.
The evaluation results validate the effectiveness of the proposed multimodal approach. By leveraging multiple data sources, the model achieved superior discriminative performance compared to unimodal and baseline methods. This finding reinforces the hypothesis that combining textual, acoustic, visual, and demographic information leads to more accurate and robust depression detection systems. However, it is important to acknowledge potential limitations. While the ROC-AUC and F1-score were strong, the model still exhibited occasional misclassification of borderline cases, likely due to overlapping feature patterns between mild depressive symptoms and neutral states. This challenge highlights the need for future improvements, such as incorporating more context-aware temporal models and attention mechanisms.
The findings were analyzed both quantitatively and qualitatively to demonstrate the model’s performance, its advantages over unimodal and baseline approaches, and the implications of the outcomes for real-world applications.
The multimodal deep learning model successfully processed textual, audio, visual, and tabular inputs, combining them through a late fusion architecture. After training and evaluating the model on the test set, the following results were obtained:
Table 2. Performance Evaluation.

Evaluation Metric    Value
Accuracy             88.5%
F1-score             0.89
ROC-AUC              0.92

When we examined the results from ablation studies, the absence of the textual modality caused the most significant performance degradation, reducing the F1-score by over 15%. This finding emphasizes the critical role of linguistic patterns in identifying depressive symptoms. The audio modality ranked second in importance, reflecting how speech prosody and vocal tone provide strong emotional cues. Visual features contributed positively, although their impact was less pronounced compared to text and audio. Tabular features alone were insufficient for strong predictive performance but offered incremental benefits when combined with other modalities.
The primary objective of this study was to develop a deep learning–based system capable of predicting depression severity in women using a combination of text, audio, visual, and tabular data. The results confirm that this objective was successfully achieved. The multimodal architecture delivered superior performance compared to unimodal and baseline methods, validating the hypothesis that integrating multiple modalities provides richer feature representations and leads to improved classification accuracy.
Additionally, the implementation demonstrated the practicality of using publicly available datasets and open-source frameworks to build clinically relevant tools. Although the system is intended for research purposes and not as a diagnostic solution, the experimental outcomes suggest that such models could assist mental health professionals by providing early indications of depressive symptoms.
4. Conclusion
The multimodal deep learning approach is a highly effective method for predicting depression in women from behavioral data. The integration of diverse data streams allows the model to capture a holistic and nuanced picture of an individual's mental state, leading to more accurate and reliable predictions than is possible with unimodal approaches. This research makes several key contributions to the growing field of mental health informatics and predictive analytics. Primarily, it addresses a well-documented gap in the literature by presenting an end-to-end implementation of a predictive model for depression tailored specifically to women. By focusing on a single gender, this study provides a framework for creating more sensitive and specific models that account for known gender disparities in symptom presentation. Furthermore, the study provides valuable empirical evidence on the relative importance of different behavioral modalities in this context. The ablation studies conclusively demonstrate that linguistic content serves as the most powerful predictor, followed closely by vocal prosody, with visual and tabular data providing supplementary value. This finding helps prioritize feature engineering and data collection efforts for future research. Finally, from a methodological standpoint, this work offers a practical and effective late-fusion deep learning architecture (Transformer + BiLSTM + Temporal CNN + MLP) that is specifically tailored to the complex, multimodal nature of the DAIC-WOZ dataset, serving as a robust baseline for future studies using similar data.
While this study provides a solid foundation, its findings also illuminate several promising avenues for further investigation. A critical next step involves training and validating the model on larger and more demographically diverse datasets. This is essential to enhance its generalizability across different cultural backgrounds, languages, and socioeconomic contexts. Alongside this, exploring more sophisticated architectures, particularly those incorporating attention mechanisms, could not only improve predictive accuracy but also offer deeper, more interpretable insights into which specific features within each modality are most indicative of depression.
Beyond enhancing the model itself, future work should also focus on its application. Applying this framework to longitudinal data would be a significant advancement, potentially shifting the paradigm from static classification to the dynamic monitoring of mental health over time, which could be instrumental in predicting the onset of depressive episodes. Ultimately, the most crucial future direction is the pursuit of real-world clinical validation. This would involve a formal clinical trial to rigorously assess the model's utility, safety, and effectiveness in a live healthcare setting, which is the final and most important step in translating this research into a tool that can tangibly benefit patient care.
Abbreviations

WHO: World Health Organisation
BiLSTM: Bidirectional Long Short-Term Memory
DAIC-WOZ: Distress Analysis Interview Corpus – Wizard of Oz
CNN: Convolutional Neural Network
ML: Machine Learning
LSTM: Long Short-Term Memory
RF: Random Forest
DL: Deep Learning
AU: Action Unit
MLP: Multilayer Perceptron
RNN: Recurrent Neural Network
CRISP-DM: Cross-Industry Standard Process for Data Mining

Author Contributions
Chidi Ukamaka Betrand: Conceptualization, Software
Chinwe Gilean Onukwugha: Data curation, Investigation
Douglas Allswell Kelechi: Data curation, Methodology, Visualization
Mercy Eberechi Benson-Emenike: Formal analysis, Validation
Nneka Martina Oragba: Supervision, Writing – original draft
Toochi Chima Ewunonu: Project administration, Supervision
Conflicts of Interest
The authors declare no conflicts of interest.
References
[1] Angst, J., Mendlewicz, J., Pine, J.-P., Gastpar, M., Gamma, A., & Tylee, A. (2002). Gender differences in depression. Epidemiological findings from the European DEPRES I and II studies. European Archives of Psychiatry and Clinical Neuroscience, 252(5), 201–209.
[2] Romans, S. E., Tyas, J., Cohen, M. M., & Silverstone, T. (2007). Gender Differences in the Symptoms of Major Depressive Disorder. Journal of Nervous & Mental Disease, 195(11), 905–911.
[3] Chatterjee, S., Sundram, F., Mishra, J., & Roop, P. (2023). Towards Personalised Mood Prediction and Explanation for Depression from Biophysical Data. Sensors, 24(1), 164.
[4] Dakanalis, A., Riva, G., & Wiederhold, B. K. (2024). Artificial Intelligence: A Game-Changer for Mental Health Care. Cyberpsychology, Behavior, and Social Networking, 27(2), 100–104.
[5] Thomas, J. (2025). Artificial intelligence in nursing research: A narrative review of transforming clinical practice, enhancing patient outcomes, and shaping future care. Journal of Nursing Reports in Clinical Practice, 3(4), 368–374.
[6] Saggu, S., Daneshvar, H., Samavi, R., Pires, P., Sassi, R. B., Doyle, T. E., Zhao, J., Mauluddin, A., & Duncan, L. (2024). Prediction of emergency department revisits among child and youth mental health outpatients using deep learning techniques. BMC Medical Informatics and Decision Making, 24(1).
[7] Dennerstein, L., & Soares, C. N. (2008). The unique challenges of managing depression in mid‐life women. World Psychiatry, 7(3), 137–142.
[8] Grigoriadis, S., & Robinson, G. E. (2007). Gender Issues in Depression. Annals of Clinical Psychiatry, 19(4), 247–255.
[9] Accortt, E. E., Freeman, M. P., & Allen, J. J. B. (2008). Women and Major Depressive Disorder: Clinical Perspectives on Causal Pathways. Journal of Women’s Health, 17(10), 1583–1590.
[10] World Health Organization. (2017). WHO strategic communications framework for effective communications.
[11] Okoli, C. (2015). A guide to conducting a standalone systematic literature review. Communications of the Association for Information Systems, 37, 43.
[12] Jencks, S. F. (1985). Recognition of mental distress and diagnosis of mental disorder in primary care. JAMA, 253, 1903–1907.
[13] Liputo, S., & Tupamahu, F. (2024). Prediction of Mental Health of Elementary School (SD) Students using the Decision Tree Algorithm with K-Fold CV testing in Bone Bolango Regency, Gorontalo Province. Journal La Multiapp, 5(1).
[14] Burdisso, S. G., Errecalde, M., & Montes-y-Gómez, M. (2019). A Text Classification Framework for Simple and Effective Early Depression Detection Over Social Media Streams.
[15] Su, D., Zhang, X., He, K., & Chen, Y. (2021). Use of machine learning approach to predict depression in the elderly in China: A longitudinal study. Journal of Affective Disorders, 282, 289–298.
[16] Shah, F. M., Ahmed, F., Saha Joy, S. K., Ahmed, S., Sadek, S., Shil, R., & Kabir, M. H. (2020). Early Depression Detection from Social Network Using Deep Learning Techniques. 2020 IEEE Region 10 Symposium, TENSYMP 2020, 823–826.
[17] Katchapakirin, K., Wongpatikaseree, K., Yomaboot, P., & Kaewpitakkun, Y. (2018). EasyChair Preprint Facebook Social Media for Depression Detection in the Thai community Facebook Social Media for Depression Detection in the Thai Community.
[18] S, S., & S. Raj, J. (2021). Analysis of Deep Learning Techniques for Early Detection of Depression on Social Media Network - A Comparative Study. Journal of Trends in Computer Science and Smart Technology, 3(1), 24–39.
[19] Lilhore, U. K., Dalal, S., Varshney, N., Sharma, Y. K., Rao, K. B. V. B., Rao, V. V. R. M., Alroobaea, R., Simaiya, S., Margala, M., & Chakrabarti, P. (2024). Prevalence and risk factors analysis of postpartum depression at early stage using hybrid deep learning model. Scientific Reports, 14(1).
[20] Al-Ezzi Ahmed Ezzi, M., Nur Wahidah Nik Hashim, N., Ahmad Basri, N., & Fauziah Toha, S. (2021). Speech-Based Depression Detection for Bahasa Malaysia Female Speakers Using Deep Learning. 20(3), 1–6.
[21] Jamalirad, H., & Jajroudi, M. (2023). Prediction of Mental Health Support of Employee Perceiving by Using Machine Learning Methods. Studies in Health Technology and Informatics, 302.