KR-20260065495-A - METHOD AND SYSTEM FOR PROVIDING EXERCISE MOTION FEEDBACK USING VISION LANGUAGE MODEL


Abstract

The present invention relates to a method and system for providing exercise motion feedback using a vision language model. The method for providing exercise motion feedback using a vision language model according to the present invention may include the steps of: receiving an exercise video of a user from an electronic device; collecting user information related to the user; generating a prompt that enables analysis of the exercise video based on the user information using the user information and the exercise video; inputting the prompt and the exercise video into a pre-trained vision language model to perform an analysis of the user's exercise motion through the vision language model; obtaining feedback information regarding the user's exercise motion generated from the vision language model based on the analysis result of the user's exercise motion; and providing the feedback information to the electronic device.
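
The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: `UserInfo`, `build_prompt`, `provide_feedback`, and the `exercise_name` argument (standing in for whatever is derived from the exercise video) are hypothetical names introduced here, and the vision language model is represented as an arbitrary callable rather than any real library API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class UserInfo:
    """Hypothetical container for the 'user information' of the abstract."""
    name: str
    medical_notes: List[str] = field(default_factory=list)
    exercise_history: List[str] = field(default_factory=list)


def build_prompt(user: UserInfo, exercise_name: str) -> str:
    """Compose a text prompt asking for analysis in light of the user information."""
    lines = [
        f"Analyze the attached video of the user performing: {exercise_name}.",
        "Take the following user information into account.",
    ]
    if user.medical_notes:
        lines.append("Medical information: " + "; ".join(user.medical_notes))
    if user.exercise_history:
        lines.append("Exercise history: " + "; ".join(user.exercise_history))
    lines.append("Return feedback on the user's exercise motion.")
    return "\n".join(lines)


# Assumption: the model is any callable taking (prompt, video bytes) and returning text.
VisionLanguageModel = Callable[[str, bytes], str]


def provide_feedback(video: bytes, user: UserInfo, exercise_name: str,
                     vlm: VisionLanguageModel) -> str:
    """Receive video and user info, build the prompt, query the model, return feedback."""
    prompt = build_prompt(user, exercise_name)
    return vlm(prompt, video)
```

Passing the model in as a callable keeps the sketch independent of any particular model or serving interface, which the source does not specify.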

Inventors

윤찬
이상희

Assignees

에버엑스 주식회사

Dates

Publication Date: 20260508
Application Date: 20250819
Priority Date: 20241030

Claims (12)

1. A method for providing exercise motion feedback using a vision language model, the method comprising: a step of receiving a user's exercise video from an electronic device; a step of collecting user information related to the user; a step of generating a prompt that enables analysis of the exercise video based on the user information, using the user information and the exercise video; a step of inputting the prompt and the exercise video into a pre-trained vision language model, and performing an analysis of the user's exercise motion through the vision language model; a step of obtaining feedback information regarding the user's exercise motion, generated from the vision language model based on the analysis result of the user's exercise motion; and a step of providing the feedback information to the electronic device.
2. The method of claim 1, wherein the step of generating the prompt includes: a step of evaluating the user's exercise expectation level based on the user information; and a step of generating a first prompt that enables analysis of the user's exercise motion to be performed according to the exercise expectation level.
3. The method of claim 2, wherein, in the step of evaluating the user's exercise expectation level, the user's exercise expectation level is evaluated for at least one of a plurality of body parts based on at least one of medical information and exercise history information included in the user information.
4. The method of claim 3, wherein the step of generating the first prompt includes: a step of specifying, among the plurality of body parts, at least one body part that satisfies a preset condition, based on the user's exercise expectation level determined for at least one of the plurality of body parts; a step of setting a weight for the specified at least one body part; and a step of generating a first prompt for analyzing the user's exercise motion related to any one of the plurality of body parts based on the set weight.
5. The method of claim 4, wherein the step of performing the analysis of the user's exercise motion includes: a step of extracting, using the pre-trained vision language model, key points corresponding to each of a plurality of preset joint points of a specific object corresponding to the user included in the exercise video; and a step of analyzing the relative positional relationship between the key points and, based on the analysis of the positional relationship, performing an analysis of the user's exercise motion related to any one of the plurality of body parts.
6. The method of claim 5, wherein the step of performing the analysis of the user's exercise motion includes: a step of calculating a performance score for the user's exercise motion based on the analysis result of the positional relationship; a step of comparing a performance score for a past exercise motion with the performance score for the user's exercise motion, using an exercise history analysis result, included in the exercise history information, for a past exercise motion previously performed in relation to the user's exercise motion; and a step of generating a trend analysis result related to the user's exercise motion based on the performance score.
7. The method of claim 6, wherein the step of generating the prompt further includes a step of generating a second prompt requesting that feedback information regarding the user's exercise motion be generated according to the analysis result of the exercise motion, and wherein, in the step of generating the second prompt, the second prompt is generated so that the feedback information is generated based on the analysis result including the trend analysis result.
8. The method of claim 7, wherein the feedback information includes text feedback information containing a feedback message regarding the analysis result and voice feedback information corresponding to the text feedback information, and wherein the feedback message includes, based on the analysis result, at least one of a correction instruction for the exercise motion, a performance score for the exercise motion, comparison information with the exercise history analysis result, and alternative motion information related to the exercise motion.
9. The method of claim 7, wherein the step of calculating the performance score includes: a step of calculating the similarity between the relative positional relationship between the key points according to the user's exercise motion and a positional relationship corresponding to a preset correct posture; and a step of calculating the performance score based on the cumulative time during which the similarity satisfies a preset condition.
10. The method of claim 1, further comprising a step of updating an exercise program performed by the user using the vision language model, wherein the updating step includes: a step of receiving voice data corresponding to voice received through a microphone provided in the electronic device; a step of receiving survey response data for at least one user survey provided to the electronic device; a step of generating user response information using at least one of the voice data and the survey response data; a step of generating, using the user response information, a third prompt for updating the exercise program assigned to the user account; and a step of inputting the third prompt into the vision language model and updating the exercise program through the vision language model.
11. A system for providing exercise motion feedback using a vision language model, the system comprising: a communication unit that receives a user's exercise video from an electronic device; and a control unit that collects user information related to the user, wherein the control unit: generates, using the user information and the exercise video, a prompt that enables analysis of the exercise video based on the user information; inputs the prompt and the exercise video into a pre-trained vision language model and performs an analysis of the user's exercise motion through the vision language model; obtains, from the vision language model, feedback information regarding the user's exercise motion generated based on the analysis result of the user's exercise motion; and provides the feedback information to the electronic device.
12. A program executed by one or more processes in an electronic device and stored on a computer-readable recording medium, the program including instructions for performing: a step of receiving a user's exercise video from an electronic device; a step of collecting user information related to the user; a step of generating a prompt that enables analysis of the exercise video based on the user information, using the user information and the exercise video; a step of inputting the prompt and the exercise video into a pre-trained vision language model, and performing an analysis of the user's exercise motion through the vision language model; a step of obtaining feedback information regarding the user's exercise motion, generated from the vision language model based on the analysis result of the user's exercise motion; and a step of providing the feedback information to the electronic device.
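
The claims that recite key points, a similarity to a preset correct posture, a score based on cumulative time, and a trend analysis can be illustrated with the following Python sketch. The claims do not fix any of the concrete choices made here: cosine similarity, the 0.95 threshold, the normalization of cumulative time to a 0-100 score, the mean-based trend rule, and all function names are assumptions introduced for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Pose = Dict[str, Point]  # joint name -> (x, y) key point


def relative_vectors(pose: Pose, pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Flatten the offsets between selected joint pairs into one vector."""
    vec: List[float] = []
    for a, b in pairs:
        vec.append(pose[b][0] - pose[a][0])
        vec.append(pose[b][1] - pose[a][1])
    return vec


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def performance_score(frames: Sequence[Pose], reference: Pose,
                      pairs: Sequence[Tuple[str, str]],
                      frame_dt: float, threshold: float = 0.95) -> float:
    """Share of total time (0-100) in which similarity meets the threshold."""
    if not frames or frame_dt <= 0:
        return 0.0
    ref_vec = relative_vectors(reference, pairs)
    # Cumulative time during which the pose is close enough to the reference.
    good_time = sum(
        frame_dt for pose in frames
        if cosine_similarity(relative_vectors(pose, pairs), ref_vec) >= threshold
    )
    return 100.0 * good_time / (frame_dt * len(frames))


def trend(past_scores: Sequence[float], current: float) -> str:
    """Compare the current score with the mean of past scores."""
    if not past_scores:
        return "no history"
    baseline = sum(past_scores) / len(past_scores)
    if current > baseline:
        return "improving"
    if current < baseline:
        return "declining"
    return "stable"
```

In this sketch each frame contributes a fixed duration `frame_dt`, so the cumulative time is simply the number of qualifying frames multiplied by that duration.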

Description

Method and System for Providing Exercise Motion Feedback Using Vision Language Model

The present invention relates to a method and system for providing feedback on a user's exercise motion using a vision language model.

With the recent rapid advancement of artificial intelligence (AI) technology, generative AI based on Vision-Language Models (VLMs), which can understand and process visual information such as images and videos together with human language, is attracting attention. In particular, by analyzing natural language and visual information in an integrated manner, Vision-Language Models go beyond existing text-based question-and-answer large language models: they can understand users' actual behaviors or actions and generate customized responses.

Beyond simple interactive applications, these technologies are being used in healthcare-related fields such as sports, fitness, daily exercise, and rehabilitation management. In particular, there is a growing demand for technologies that provide personalized healthcare to users from exercise videos captured through various electronic devices, such as smart devices, mobile cameras, and wearable sensors.

For example, vision language models can be used effectively for the management and rehabilitation of musculoskeletal disorders. Musculoskeletal disorders refer to pain or injury occurring in the musculoskeletal system, including muscles, nerves, tendons, ligaments, bones, and surrounding tissues. In principle, the treatment of musculoskeletal disorders should begin with less invasive procedures: non-pharmacological conservative treatments (e.g., exercise therapy and education, cognitive therapy, or relaxation therapy) should be implemented first, followed by pharmacological treatment and then surgical treatment.
Treatment guidelines strongly recommend non-pharmacological conservative treatment for musculoskeletal disorders, and research on methods for implementing such treatment is actively under way, primarily in the United States and Europe. However, because continuous treatment and rehabilitation are crucial to non-pharmacological conservative treatment, the need for frequent hospital visits places a significant burden on patients. To address this, there is a need to remotely provide a personalized feedback service on users' exercise motions using a vision language model.

FIG. 1 is a conceptual diagram illustrating a system for providing exercise motion feedback using a vision language model according to the present invention.
FIG. 2 is a flowchart illustrating a method for providing exercise motion feedback using a vision language model according to the present invention.
FIG. 3a is a flowchart illustrating the process of generating a first prompt that enables analysis of an exercise video using user information and an exercise video according to the present invention.
FIG. 3b is a flowchart illustrating the process of generating different first prompts according to user types according to the present invention.
FIGS. 4a to 4c are conceptual diagrams for explaining the process of analyzing a user's exercise motion according to the present invention.
FIGS. 5 and 6 are conceptual diagrams for explaining the process of generating feedback information and providing it to an electronic device according to the present invention.
FIGS. 7a and 7b are conceptual diagrams illustrating the process of updating an exercise program based on a user's response according to the present invention.
FIG. 8 is a block diagram illustrating a computing system in which the present invention can be implemented.
FIGS. 9 and 10 are block diagrams illustrating an embodiment of a computing device according to the present invention.
The present invention relates to a method and system for providing exercise motion feedback using a vision language model. More specifically, the present invention relates to a method and system for providing customized feedback information regarding a user's exercise motion using a vision language model capable of natural language understanding and image analysis.

The vision language model according to the present invention may refer to an intelligent system, based on a generative artificial intelligence model, that can autonomously perform specific tasks by understanding and processing visual information and linguistic information in combination without human intervention. Specifically, the vision language model is a multimodal generative model capable of processing visual inputs such as images and videos together with linguistic inputs such as text and voice; it can analyze video information related to a user's exercise motion and, based on the results, generate natural language feedback information to provide to an electronic device.

The “feedback information” according to the present invention may refer to various information provided to an el