KR-20260065515-A - METHOD AND SYSTEM FOR EVALUATING EXERCISE COMPLIANCE USING VISION LANGUAGE MODEL
Abstract
The present invention relates to a method and system for evaluating exercise performance using a vision language model. The method for evaluating exercise performance using a vision language model according to the present invention may include the steps of: requesting an electronic device to perform an exercise for a specific exercise movement; receiving, from the electronic device, a user's exercise video related to the specific exercise movement; extracting, from the exercise video, user motion information regarding the user's exercise movement corresponding to the specific exercise movement; generating, using the user motion information and reference motion information corresponding to the specific exercise movement, a prompt for analyzing whether the user's exercise movement satisfies a preset performance condition; inputting the generated prompt and the exercise video into a pre-trained vision language model to obtain performance evaluation information regarding the user's exercise movement from the pre-trained vision language model; and evaluating the user's exercise performance rate based on the performance evaluation information.
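The prompt-generation step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `MotionInfo` fields, the `build_prompt` function, and the prompt wording are hypothetical and are not taken from the specification, which does not fix a data format or a prompt template.

```python
from dataclasses import dataclass


@dataclass
class MotionInfo:
    """Hypothetical container for motion information about one target body part."""
    body_part: str          # e.g. "knee" (illustrative)
    peak_angle_deg: float   # illustrative measurement, not from the source
    repetitions: int        # number of repetitions observed or prescribed


def build_prompt(exercise: str, user: MotionInfo, reference: MotionInfo) -> str:
    """Combine user and reference motion information into one text prompt.

    The layout and the requested PASS/FAIL answer format are assumptions.
    """
    return "\n".join([
        f"Exercise: {exercise}",
        f"Target body part: {reference.body_part}",
        f"Reference: {reference.repetitions} repetitions, "
        f"peak angle {reference.peak_angle_deg:.0f} degrees",
        f"Observed: {user.repetitions} repetitions, "
        f"peak angle {user.peak_angle_deg:.0f} degrees",
        "For each repetition in the attached video, answer PASS or FAIL: "
        "does the movement satisfy the performance condition?",
    ])
```

In this sketch the resulting string would be passed to the vision language model together with the exercise video; how the model is invoked is left out because the source does not specify it.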
Inventors
- 윤찬
- 정형진
Assignees
- 에버엑스 주식회사
Dates
- Publication Date: 20260508
- Application Date: 20251014
- Priority Date: 20241030
Claims (12)
- A method for evaluating exercise performance using a vision language model, the method comprising: requesting an electronic device to perform an exercise for a specific exercise movement; receiving, from the electronic device, a user's exercise video related to the specific exercise movement; extracting, from the exercise video, user motion information regarding the user's exercise movement corresponding to the specific exercise movement; generating, using the user motion information and reference motion information corresponding to the specific exercise movement, a prompt for analyzing whether the user's exercise movement satisfies a preset performance condition; inputting the generated prompt and the exercise video into a pre-trained vision language model to obtain, from the pre-trained vision language model, performance evaluation information regarding the user's exercise movement; and evaluating the user's exercise performance rate based on the performance evaluation information.
- The method of claim 1, wherein a target body part is defined for the specific exercise movement according to a main movement of the specific exercise movement, and the target body part includes at least one of a main target body part and an auxiliary target body part, depending on the degree of correlation with the main movement.
- The method of claim 2, wherein extracting the user motion information comprises: identifying, in the exercise video, the target body part predefined for the specific exercise movement; and analyzing the movement of the target body part in the exercise video to extract the user motion information for the specific exercise movement.
- The method of claim 3, wherein extracting the user motion information comprises: determining whether the main target body part is identifiable in the exercise video; and determining, according to the result of that determination, whether to extract user motion information for at least one of the main target body part and the auxiliary target body part.
- The method of claim 4, wherein determining whether the main target body part is identifiable further comprises, if the main target body part is not identified in the exercise video, requesting the electronic device to include the main target body part in the exercise video, and wherein the requesting comprises outputting, to the electronic device, a guidance message that guides adjustment of at least one of a video recording position and a posture so that the main target body part is included in the exercise video.
- The method of claim 2, wherein obtaining the performance evaluation information regarding the user's exercise movement comprises: analyzing, using the pre-trained vision language model, whether the performance condition set for the target body part is satisfied over a preset number of performances of the specific exercise movement; and obtaining, from the pre-trained vision language model, the performance evaluation information regarding the user's exercise movement based on the result of the analysis.
- The method of claim 6, wherein evaluating the exercise performance rate comprises: calculating the ratio of the number of performances that satisfy the performance condition set for the target body part to the preset number of performances; and evaluating the user's exercise performance rate as at least one of a performance score and a percentile according to the calculated ratio.
- The method of claim 1, further comprising requesting, based on the exercise performance rate calculated for the user's exercise movement, that the specific exercise movement corresponding to the user's exercise movement be performed again, wherein requesting re-performance of the specific exercise movement comprises: providing, to the electronic device, a request message requesting that the specific exercise movement be performed again, based on the exercise performance rate not satisfying a preset reference condition; receiving, from the electronic device, a user input on a request icon displayed in a portion of the request message; and re-evaluating the exercise performance rate for the user's exercise movement corresponding to the specific exercise movement, based on an activation event for the request icon occurring in response to the user input.
- The method of claim 1, further comprising updating an exercise program set in a user account of the user based on the exercise performance rate calculated for the user's exercise movement, wherein updating the exercise program comprises: generating feedback information related to the user's exercise movement; generating, using the feedback information, a feedback prompt for updating the exercise program; and processing the feedback prompt as input to the vision language model and updating the exercise program by means of the vision language model.
- The method of claim 9, wherein generating the feedback information comprises: receiving voice data corresponding to voice received through a microphone provided in the electronic device; receiving survey response data for at least one user survey provided to the electronic device; and generating the feedback information using at least one of the voice data and the survey response data.
- A system for evaluating exercise performance using a vision language model, the system comprising: a communication unit that receives, from an electronic device, a user's exercise video related to a specific exercise movement; and a control unit that requests the electronic device to perform an exercise for the specific exercise movement, wherein the control unit: extracts, from the exercise video, user motion information regarding the user's exercise movement corresponding to the specific exercise movement; generates, using the user motion information and reference motion information corresponding to the specific exercise movement, a prompt for analyzing whether the user's exercise movement satisfies a preset performance condition; inputs the generated prompt and the exercise video into a pre-trained vision language model to obtain, from the pre-trained vision language model, performance evaluation information regarding the user's exercise movement; and evaluates the user's exercise performance rate based on the performance evaluation information.
- A program executed by one or more processes in an electronic device and stored on a computer-readable recording medium, the program comprising instructions for performing: requesting an electronic device to perform an exercise for a specific exercise movement; receiving, from the electronic device, a user's exercise video related to the specific exercise movement; extracting, from the exercise video, user motion information regarding the user's exercise movement corresponding to the specific exercise movement; generating, using the user motion information and reference motion information corresponding to the specific exercise movement, a prompt for analyzing whether the user's exercise movement satisfies a preset performance condition; inputting the generated prompt and the exercise video into a pre-trained vision language model to obtain, from the pre-trained vision language model, performance evaluation information regarding the user's exercise movement; and evaluating the user's exercise performance rate based on the performance evaluation information.
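The ratio-based evaluation recited in the claims above (counting the performances that satisfy the performance condition against the preset number of performances, then comparing the result with a reference condition) can be illustrated with a minimal sketch. The function names, the 0 to 100 score scale, the treatment of extra repetitions, and the 0.8 threshold are all assumptions for illustration; the claims do not specify them.

```python
from __future__ import annotations


def performance_rate(rep_results: list[bool], preset_count: int) -> dict[str, float]:
    """Return the ratio of satisfying performances to the preset count.

    rep_results holds one pass/fail flag per performance, as might be parsed
    from the model's performance evaluation information (hypothetical format).
    """
    if preset_count <= 0:
        raise ValueError("preset_count must be positive")
    # Assumption: performances beyond the preset count are ignored.
    satisfied = sum(rep_results[:preset_count])
    ratio = satisfied / preset_count
    # Assumption: the performance score is the ratio scaled to 0-100.
    return {"ratio": ratio, "score": round(ratio * 100, 1)}


def needs_retry(ratio: float, threshold: float = 0.8) -> bool:
    """Hypothetical reference condition: request re-performance below threshold."""
    return ratio < threshold
```

For example, with three satisfying performances out of a preset count of five, this sketch yields a ratio of 0.6, which falls below the assumed threshold and would trigger a re-performance request.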
Description
Method and System for Evaluating Exercise Compliance Using Vision Language Model
The present invention relates to a method and system for evaluating the exercise performance rate of a user's exercise movements using a vision language model.
With the recent rapid advancement of artificial intelligence (AI) technology, vision language models (VLMs), which can understand and process visual information such as images and videos together with human language, are attracting attention. In particular, by jointly analyzing natural language and visual information, VLMs go beyond existing text-based question-answering large language models: they can understand a user's actual behavior or actions and generate customized responses.
Such AI technology is now used beyond simple conversational question answering, in the medical industry and related fields such as healthcare, sports, fitness, everyday exercise, and rehabilitation management. In particular, there is growing demand for AI technology that provides personalized healthcare to users based on exercise videos captured through various electronic devices, such as smart devices, mobile cameras, and wearable sensors.
Recently, the use of digital therapeutics, along with AI technology, has been gaining attention in the medical industry. For instance, for continuous positive airway pressure (CPAP) machines used to treat sleep apnea, insurance coverage applies only if the user consistently uses the device for a certain period of time; if usage falls below this standard, the patient is responsible for the cost. This system is designed to ensure treatment effectiveness and compliance, with the patient's actual usage serving as a crucial criterion for determining eligibility for treatment support.
Similarly, for rehabilitation exercises or healthcare services that use digital therapeutic devices, whether the user has actually performed the exercises can serve as a key indicator for evaluating treatment effectiveness. However, existing video-based verification methods can only determine whether a user has watched a video, which makes it difficult to verify whether the user has actually performed the exercise. For example, a user may evade verification through perfunctory viewing, such as playing an exercise video without actually performing the workout.
To address this issue, there is a need to use a vision language model to evaluate, based on the user's exercise video, whether the user performs the exercise movements according to the performance conditions, and to calculate the user's exercise performance rate accordingly.
FIG. 1 is a conceptual diagram illustrating a system for evaluating the performance rate of exercise movements using a vision language model according to the present invention.
FIG. 2 is a flowchart illustrating a method for evaluating exercise performance using a vision language model according to the present invention.
FIG. 3 is a flowchart illustrating the process of collecting an exercise video related to a user's exercise movements from an electronic device according to the present invention.
FIGS. 4a and 4b are conceptual diagrams for explaining the process of extracting user motion information from an exercise video according to the present invention.
FIGS. 5a and 5b are conceptual diagrams for explaining the process of generating a prompt according to the present invention.
FIG. 6 is a conceptual diagram illustrating the process of evaluating the exercise performance rate of a user's exercise movements using a vision language model according to the present invention.
FIGS. 7a to 7c are conceptual diagrams for explaining an embodiment, according to the present invention, of requesting re-performance of an exercise movement according to the exercise performance rate and updating an exercise program set in a user account using user feedback information.
FIG. 8 is a block diagram illustrating a computing system in which the present invention can be implemented.
FIGS. 9 and 10 are block diagrams illustrating an embodiment of a computing device according to the present invention.
Hereinafter, embodiments disclosed in this specification will be described in detail with reference to the attached drawings. Identical or similar components are assigned the same reference numerals regardless of the figure number, and redundant descriptions thereof will be omitted. The suffixes "module" and "unit" used for components in the following description are assigned or used interchangeably solely for ease of drafting the specification and do not have distinct meanings or roles in themselves. Furthermore, in describing the embodiments disclosed in this specification, if it is determined that a detailed description of related prior art could obscure the essence of the embodiments disclosed in this