
KR-20260065474-A - Method and apparatus for multimodal artificial intelligence-based depression detection and mental health analysis

KR 20260065474 A

Abstract

A method and apparatus for diagnosing depression and analyzing mental health based on multimodal artificial intelligence are provided. An apparatus for diagnosing depression and analyzing mental health according to one embodiment of the present invention comprises: a multimodal data collection unit that collects multimodal data including a user's voice data, text data recognized from the user's speech, and heart rate data; a depression state diagnosis unit that outputs depression state data indicating the user's depression state based on the collected multimodal data; a medical data and biometric information linkage unit that collects physical health indicators by linking the user's biometric information; a mental health analysis unit that generates mental health analysis results from the multimodal data, the depression state data, and the physical health indicators using a large-scale language model; and an analysis result generation unit that visually provides the mental health analysis results to the user.

Inventors

  • 이아람
  • 문세환
  • 전은경
  • 김정은

Assignees

  • 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)

Dates

Publication Date
2026-05-08
Application Date
2025-04-15
Priority Date
2024-10-31

Claims (20)

  1. A depression diagnosis and mental health analysis apparatus comprising: a multimodal data collection unit that collects multimodal data including a user's voice data, text data recognized from the user's speech, and heart rate data; a depression state determination unit that outputs depression state data indicating the user's depression state using an artificial intelligence model based on the collected multimodal data; a medical data and biometric information linkage unit that collects physical health indicators by linking biometric information including the user's medical data, blood pressure, body composition measurements, and arrhythmia measurement results; a mental health analysis unit that generates mental health analysis results from the multimodal data, the depression state data, and the physical health indicators using a large-scale language model; and an analysis result generation unit that visually provides the mental health analysis results generated by the mental health analysis unit to the user.
  2. The depression diagnosis and mental health analysis apparatus of claim 1, wherein the multimodal data collection unit comprises a voice feature extraction unit for extracting voice feature data from the voice data, and a heart rate feature extraction unit for extracting heart rate feature data from the heart rate data.
  3. The depression diagnosis and mental health analysis apparatus of claim 2, wherein the heart rate feature data includes heart rate variability.
  4. The depression diagnosis and mental health analysis apparatus of claim 2, wherein the depression state determination unit comprises: a first neural network for processing the voice feature data; a second neural network for processing the text data; a third neural network for processing the heart rate feature data; and a multimodal processing unit that calculates the depression state data using output values from the first to third neural networks.
  5. The depression diagnosis and mental health analysis apparatus of claim 2, wherein the depression state determination unit outputs a number indicating the severity of depression as the depression state data.
  6. The depression diagnosis and mental health analysis apparatus of claim 1, wherein the mental health analysis unit generates prompts using prompt optimization and inputs the generated prompts into the large-scale language model to generate personalized health analysis results.
  7. The depression diagnosis and mental health analysis apparatus of claim 1, wherein the analysis result generation unit comprehensively analyzes two or more data items through few-shot Chain-of-Thought (CoT)-based prompt optimization, identifies key information from the analysis results, and generates it in HTML format.
  8. The depression diagnosis and mental health analysis apparatus of any one of claims 1 to 7, wherein the text collected by the multimodal data collection unit includes text written by the user on social network services or questionnaires.
  9. The depression diagnosis and mental health analysis apparatus of any one of claims 1 to 7, wherein the data collected by the multimodal data collection unit includes activity data, and the medical data and biometric information linkage unit outputs a visualized image after circadian fitting of the activity data.
  10. A mental health analysis apparatus comprising: a multimodal data collection unit that collects multimodal data including a user's voice data, text data recognized from the user's speech, and heart rate data; a medical data and biometric information linkage unit that collects physical health indicators by linking biometric information including the user's medical data, blood pressure, body composition measurements, and arrhythmia measurement results; a mental health analysis unit that generates mental health analysis results from the multimodal data and the physical health indicators using a large-scale language model; and an analysis result generation unit that visually provides the mental health analysis results generated by the mental health analysis unit to the user.
  11. The mental health analysis apparatus of claim 10, wherein the multimodal data collection unit comprises a voice feature extraction unit for extracting voice feature data from the voice data, and a heart rate feature extraction unit for extracting heart rate feature data from the heart rate data.
  12. The mental health analysis apparatus of claim 10, wherein the mental health analysis unit generates prompts using prompt optimization and inputs the generated prompts into the large-scale language model to generate personalized health analysis results.
  13. The mental health analysis apparatus of claim 10, wherein the analysis result generation unit comprehensively analyzes two or more data items through few-shot Chain-of-Thought (CoT)-based prompt optimization, identifies key information from the analysis results, and generates it in HTML format.
  14. The mental health analysis apparatus of any one of claims 10 to 13, wherein the text collected by the multimodal data collection unit includes text written by the user on social network services or questionnaires.
  15. The mental health analysis apparatus of any one of claims 10 to 13, wherein the data collected by the multimodal data collection unit includes activity data, and the medical data and biometric information linkage unit outputs a visualized image after circadian fitting of the activity data.
  16. A depression diagnosis and mental health analysis method comprising: a multimodal data collection step of collecting multimodal data including a user's voice data, text data recognized from the user's speech, and heart rate data; a depression state determination step of outputting depression state data indicating the user's depression state using an artificial intelligence model based on the collected multimodal data; a medical data and biometric information linkage step of collecting physical health indicators by linking biometric information including the user's medical data, blood pressure, body composition measurements, and arrhythmia measurement results; a mental health analysis step of generating mental health analysis results from the multimodal data, the depression state data, and the physical health indicators using a large-scale language model; and an analysis result generation step of visually providing the mental health analysis results generated in the mental health analysis step to the user.
  17. The depression diagnosis and mental health analysis method of claim 16, wherein the multimodal data collection step comprises a voice feature extraction step of extracting voice feature data from the voice data, and a heart rate feature extraction step of extracting heart rate feature data from the heart rate data.
  18. The depression diagnosis and mental health analysis method of claim 17, wherein the depression state determination step comprises: processing the voice feature data using a first neural network; processing the text data using a second neural network; processing the heart rate feature data using a third neural network; and a multimodal processing step of calculating the depression state data using output values from the first to third neural networks.
  19. The depression diagnosis and mental health analysis method of claim 16, wherein the mental health analysis step comprises: generating a prompt using prompt optimization; and inputting the generated prompt into the large-scale language model to generate personalized health analysis results.
  20. The depression diagnosis and mental health analysis method of claim 16, wherein the analysis result generation step comprises: comprehensively analyzing two or more data items through few-shot Chain-of-Thought (CoT)-based prompt optimization; and identifying key information from the analysis results and generating it in HTML format.
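Claims 4 and 5 describe three per-modality neural networks whose outputs a multimodal processing unit combines into a single depression-severity number. As a rough illustration only, that late-fusion pattern might look like the sketch below; the feature dimensions, layer sizes, and weights are all hypothetical (random and untrained), since the claims do not disclose concrete architectures:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    # one hidden layer with ReLU activation
    h = np.maximum(0.0, x @ w1)
    return h @ w2

# hypothetical per-modality feature vectors (dimensions are assumptions)
voice_feats = rng.normal(size=8)    # e.g. tone, tremor, band intensities
text_emb    = rng.normal(size=16)   # e.g. a text embedding
hr_feats    = rng.normal(size=4)    # e.g. HRV statistics

# first to third neural networks (random, untrained weights)
v_out = mlp(voice_feats, rng.normal(size=(8, 6)),  rng.normal(size=(6, 3)))
t_out = mlp(text_emb,    rng.normal(size=(16, 6)), rng.normal(size=(6, 3)))
h_out = mlp(hr_feats,    rng.normal(size=(4, 6)),  rng.normal(size=(6, 3)))

# multimodal processing unit: concatenate outputs, map to a severity score
fused = np.concatenate([v_out, t_out, h_out])       # shape (9,)
w_fuse = rng.normal(size=9)
severity = 1.0 / (1.0 + np.exp(-(fused @ w_fuse)))  # sigmoid -> (0, 1)
```

In a real system the three networks and the fusion layer would be trained jointly on labeled data; only the overall late-fusion shape is taken from the claims.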

Description

Method and apparatus for multimodal artificial intelligence-based depression detection and mental health analysis

The present invention relates to a method and apparatus for diagnosing depression and analyzing mental health based on multimodal artificial intelligence. Conventional mental health monitoring technologies have limitations in performing comprehensive mental health analysis because they rely primarily on single-modality data, such as text or voice, to identify emotional states. Furthermore, conventional methods struggle to accurately detect emotional changes or states in real time because they depend on user self-reports. Meanwhile, there have recently been increasing attempts to comprehensively evaluate emotional and physical health status by analyzing various biometric data, such as heart rate variability (HRV) and activity levels. A system is therefore needed that efficiently integrates such multimodal data to accurately analyze an individual's health status and automatically generate analysis results.

FIG. 1 is a block diagram showing the configuration of a multimodal artificial intelligence-based depression diagnosis and mental health analysis apparatus according to one embodiment of the present invention. FIG. 2 shows several examples of feature data extracted from voice data. FIG. 3 is a graph showing the time difference between consecutive heartbeats used to calculate heart rate variability. FIG. 4 shows an example of calculating depression state data in the depression state determination unit using data collected from the multimodal data collection unit. FIG. 5 shows an example of data input into the mental health analysis unit. FIG. 6 shows an example of linking medical data and biometric information in the medical data and biometric information linkage unit. FIG. 7 shows an example of a prompt generated by the mental health analysis unit. FIG. 8 shows an example of an analysis result output from the analysis result generation unit.

The aforementioned objectives of the present invention, as well as other objectives, advantages, and features, and the methods for achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and can be implemented in various different forms; the following embodiments are provided merely to fully inform those skilled in the art of the purpose, structure, and effects of the invention, and the scope of the present invention is defined by the claims. The terms used in this specification are for describing the embodiments and are not intended to limit the invention. In this specification, the singular form includes the plural form unless specifically stated otherwise. As used in this specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more other components, steps, operations, and/or elements.

Referring to FIG. 1, the multimodal artificial intelligence-based depression detection and mental health analysis apparatus comprises a multimodal data collection unit (110), a depression state determination unit (120), a medical data and biometric information linkage unit (130), a prompt engineering and large-scale language model (LLM)-based mental health analysis unit (140), and an analysis result generation unit (150). The multimodal data collection unit (110) collects the user's voice data, text recognized from the user's speech, heart rate (ECG), activity data, and the like.
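Among the heart rate features mentioned above, heart rate variability is computed from RR intervals, the time differences between consecutive heartbeats shown in FIG. 3. As a minimal sketch, two standard HRV statistics, SDNN and RMSSD, can be derived from a list of RR intervals as below; the patent does not specify which HRV measures the heart rate feature extraction unit actually uses, so these serve as illustrative examples:

```python
import math

def hrv_features(rr_ms):
    """Basic HRV statistics from RR intervals (milliseconds):
    SDNN  - standard deviation of the RR intervals (sample std)
    RMSSD - root mean square of successive RR differences"""
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    sdnn = math.sqrt(sum((r - mean_rr) ** 2 for r in rr_ms) / (n - 1))
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd}

# example: four RR intervals around 800 ms (hypothetical values)
f = hrv_features([800.0, 810.0, 790.0, 805.0])
```

Lower SDNN and RMSSD values generally indicate reduced variability, which is one of the physiological signals such a system could feed into the depression state determination.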
In one embodiment, voice and text data are used to analyze emotional information, including memories of happiness and unhappiness. In one embodiment, the text data may include posts uploaded by the user to social network services (SNS) or text written on a questionnaire. The multimodal data collection unit (110) includes a voice feature extraction unit (111) for extracting voice feature data from voice data, as shown in FIG. 4, and a heart rate feature extraction unit (112) for extracting heart rate feature data from heart rate data.

FIG. 2 shows several examples of feature data extracted from voice data. In the case of voice, as shown in the example of FIG. 2, several dozen to several thousand features are extracted from the recorded digital voice data. Representative features include voice tone, tremor, and intensity by frequency band. The trends of these features according to emotional state are learned through machine learning. For more detailed frequency analysis, the frequency spectrum can be visualized and then learned through a Convolutional Neural Network (CNN). In
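As an illustration of the kind of per-frame voice features described above, the sketch below computes two simple stand-ins, short-time energy and zero-crossing rate, over a synthetic tone; the actual feature set (tone, tremor, per-band intensity) is not detailed here, so these examples are assumptions:

```python
import math

def frame_features(signal, frame_len=256, hop=128):
    """Short-time energy and zero-crossing rate per frame - two simple
    stand-ins for the dozens-to-thousands of voice features mentioned."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # count sign changes between adjacent samples
        zcr = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / (frame_len - 1)
        feats.append((energy, zcr))
    return feats

# synthetic 440 Hz tone sampled at 8 kHz (a placeholder for recorded voice)
sig = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(2048)]
feats = frame_features(sig)
```

For real speech, such frame-level features (or a visualized spectrogram, as the text notes) would be fed to the learning model rather than a pure tone.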