Generative AI–Based Multilingual Multimodal Framework for Depression Detection

Authors

  • Uswa Ashraf, Department of Computer Science, University of Southern Punjab, Pakistan
  • Hamid Ghous, Department of Computer Science, University of Southern Punjab, Pakistan
  • Mubasher H. Malik, Department of Computer Science, University of Southern Punjab, Pakistan
  • Majid Khawar, Department of Computer Science, University of Southern Punjab, Pakistan

Keywords

Depression Detection, Artificial Intelligence (AI), Generative AI, Multimodal Models, Multilingual Analysis

Abstract

Depression is a prevalent psychological condition that is difficult to detect at an early stage because its clinical manifestations are varied and its diagnosis is subjective. Recent advances in generative artificial intelligence, deep learning, and machine learning have made it feasible to identify depression digitally from how the illness manifests in speech, facial expressions, text, and physiological and behavioral characteristics. This review examines the use of large language models, multimodal models, and unimodal models for digital depression identification in multilingual contexts. According to the reviewed results, multimodal systems consistently outperform unimodal systems, and transformer and cross-attention architectures are particularly effective at capturing cross-modal relationships, a crucial component of clinical decision support. Large language models show potential for few-shot learning, multilingual analysis, transcript-based severity estimation, data generation for simulation, and transparent clinical decision support systems. The review aims to provide a systematic overview of current methodologies and to identify areas for future research; however, much work remains before such applications are widely available, culture-independent, and able to reflect ordinal levels of severity.

Published

2026-04-19