KR-20260064832-A - METHOD FOR INFERRING COACHING FOR EXERCISE VIDEOS USING LARGE LANGUAGE MODEL

Abstract

The present disclosure relates to a method for inferring coaching for exercise videos using a large language model (LLM). A method according to one embodiment of the present disclosure may include: extracting a skeleton image representing a user's body for each frame of an exercise video capturing the user's exercise movement; extracting one or more representative frames for a first posture of the exercise movement from the exercise video based on the extracted skeleton images; obtaining a first evaluation of the first posture using the LLM for each of the one or more representative frames; obtaining a second evaluation of the first posture using the LLM for each of one or more skeleton images corresponding to the one or more representative frames; obtaining a prompt by combining the first evaluation and the second evaluation for each of the one or more representative frames; obtaining a third evaluation of the first posture using the prompt and the LLM for each of the one or more representative frames; and obtaining a final evaluation of the first posture based on the third evaluation for each of the one or more representative frames.
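
The evaluation flow summarized above can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the `LLM` callable signature, the `FrameEvaluation` record, the prompt wording in `build_prompt`, and the `confidence` function are all hypothetical names introduced here. Selecting the final evaluation by highest confidence is one option named in the claims; the abstract itself only says the final evaluation is based on the third evaluations.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence

# Hypothetical model interface: (text prompt, optional image) -> text answer.
LLM = Callable[[str, Optional[Any]], str]


@dataclass
class FrameEvaluation:
    frame_index: int  # position within the list of representative frames
    first: str        # evaluation obtained from the representative frame
    second: str       # evaluation obtained from the matching skeleton image
    third: str        # evaluation obtained from the combined prompt


def build_prompt(posture: str, first: str, second: str) -> str:
    """Combine the first and second evaluations (wording is illustrative)."""
    return (
        f"Posture: {posture}\n"
        f"Evaluation from the video frame: {first}\n"
        f"Evaluation from the skeleton image: {second}\n"
        "Reconcile the two evaluations and give one coaching evaluation."
    )


def evaluate_posture(
    posture: str,
    frames: Sequence[Any],
    skeleton_images: Sequence[Any],
    llm: LLM,
) -> List[FrameEvaluation]:
    """Run the first, second, and third evaluations for each representative frame."""
    if len(frames) != len(skeleton_images):
        raise ValueError("each representative frame needs one skeleton image")
    results = []
    for i, (frame, skeleton) in enumerate(zip(frames, skeleton_images)):
        first = llm(f"Evaluate the {posture} posture shown in this frame.", frame)
        second = llm(
            f"Evaluate the {posture} posture shown in this skeleton image.", skeleton
        )
        # The third evaluation uses only the combined prompt, so no image is passed.
        third = llm(build_prompt(posture, first, second), None)
        results.append(FrameEvaluation(i, first, second, third))
    return results


def final_evaluation(
    results: Sequence[FrameEvaluation],
    confidence: Callable[[FrameEvaluation], float],
) -> str:
    """Assumed selection rule: keep the third evaluation with the highest confidence."""
    return max(results, key=confidence).third
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM service, and lets the flow be exercised with a stub function.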

Inventors

김원균

Assignees

주식회사 에이아이핏

Dates

Publication Date: 2026-05-08
Application Date: 2024-10-29

Claims (10)

A method comprising: extracting a skeleton image representing a user's body for each frame of an exercise video capturing the user's exercise movement; extracting one or more representative frames for a first posture of the exercise movement from the exercise video based on the extracted skeleton images; obtaining, for each of the one or more representative frames, a first evaluation of the first posture using a large language model (LLM); obtaining, for each of one or more skeleton images corresponding to the one or more representative frames, a second evaluation of the first posture using the LLM; obtaining, for each of the one or more representative frames, a prompt by combining the first evaluation and the second evaluation; obtaining, for each of the one or more representative frames, a third evaluation of the first posture using the prompt and the LLM; and obtaining a final evaluation of the first posture based on the third evaluation for each of the one or more representative frames.
The method of claim 1, further comprising: obtaining an evaluation of the exercise movement based on the final evaluation of the first posture; and providing the evaluation of the exercise movement to the user.
The method of claim 1, wherein extracting the one or more representative frames comprises: extracting a reference skeleton image from a reference image representing the first posture; calculating, for each of the skeleton images extracted from the exercise video, a similarity with the reference skeleton image; and selecting, as the one or more representative frames, one or more frames corresponding to the one or more skeleton images having the highest similarity among the calculated similarities.
The method of claim 3, further comprising obtaining the reference image from a database.
The method of claim 3, wherein calculating the similarity with the reference skeleton image comprises: calculating a first angle between a first joint and a second joint included in a first skeleton image extracted from the exercise video; calculating a second angle between a third joint of the reference skeleton image corresponding to the first joint and a fourth joint of the reference skeleton image corresponding to the second joint; calculating a difference between the first angle and the second angle; and calculating the similarity between the first skeleton image and the reference skeleton image based on the difference between the first angle and the second angle.
The method of claim 1, wherein obtaining, for each of the one or more representative frames, the third evaluation of the first posture using the prompt and the LLM comprises: calculating, for each of the one or more representative frames, a confidence score for the third evaluation of that frame using the LLM; and selecting, as the final evaluation of the first posture, the third evaluation of the representative frame having the highest confidence score among the calculated confidence scores.
The method of claim 1, further comprising: extracting one or more representative frames for a second posture of the exercise movement from the exercise video based on the extracted skeleton images; obtaining, for each of the one or more representative frames for the second posture, a first evaluation of the second posture using the LLM; obtaining, for each of one or more skeleton images corresponding to the one or more representative frames for the second posture, a second evaluation of the second posture using the LLM; obtaining, for each of the one or more representative frames for the second posture, a prompt by combining the first evaluation of that representative frame and the second evaluation of that representative frame; obtaining, for each of the one or more representative frames for the second posture, a third evaluation of the second posture using the prompt of that frame and the LLM; and obtaining a final evaluation of the second posture based on the third evaluation for each of the one or more representative frames for the second posture.
The method of claim 7, further comprising: obtaining an evaluation of the exercise movement by combining the final evaluation of the first posture and the final evaluation of the second posture; and providing the evaluation of the exercise movement to the user.
The method of claim 1, wherein obtaining, for each of the one or more representative frames, the first evaluation of the first posture using the LLM comprises: obtaining a first answer by inputting a first representative frame among the one or more representative frames and a first question into the LLM; obtaining a second answer by inputting the first representative frame and a second question into the LLM; and obtaining the first evaluation of the first posture for the first representative frame from the first answer and the second answer based on a predefined weight for the first question and a predefined weight for the second question.
The method of claim 1, wherein the LLM is trained using coaching data for the exercise movement.
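
The representative-frame selection described in the third and fifth claims above (joint-angle comparison against a reference skeleton, then picking the most similar frames) can be sketched as follows. The claims do not fix a skeleton representation or a similarity formula, so several things here are assumptions made for illustration: a skeleton is a dictionary from joint name to 2D coordinates, the "angle between two joints" is read as the orientation of the segment connecting them, and the angle difference is mapped linearly onto a similarity in [0, 1]. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Skeleton = Dict[str, Point]  # assumed representation: joint name -> (x, y)


def segment_angle(skeleton: Skeleton, joint_a: str, joint_b: str) -> float:
    """Orientation in degrees of the segment from joint_a to joint_b."""
    ax, ay = skeleton[joint_a]
    bx, by = skeleton[joint_b]
    return math.degrees(math.atan2(by - ay, bx - ax))


def angle_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in [0, 180] degrees."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def similarity(
    skeleton: Skeleton,
    reference: Skeleton,
    joint_pairs: Sequence[Tuple[str, str]],
) -> float:
    """Assumed mapping: 1.0 when all angles match, 0.0 when all are opposite."""
    diffs = [
        angle_difference(
            segment_angle(skeleton, a, b), segment_angle(reference, a, b)
        )
        for a, b in joint_pairs
    ]
    return 1.0 - (sum(diffs) / len(diffs)) / 180.0


def select_representative_frames(
    skeletons: Sequence[Skeleton],
    reference: Skeleton,
    joint_pairs: Sequence[Tuple[str, str]],
    k: int = 1,
) -> List[int]:
    """Return the indices of the k frames most similar to the reference."""
    scores = [similarity(s, reference, joint_pairs) for s in skeletons]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

Comparing segment orientations rather than raw coordinates makes the score insensitive to where the user stands in the frame and to body size, although it remains sensitive to camera rotation.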

Description

Method for Inferring Coaching for Exercise Videos Using a Large Language Model

The present disclosure relates to a method for inferring coaching for exercise videos using a large language model (LLM), and more specifically, to a method that allows consumers to easily record their own exercise videos without installing or carrying a separate device, view them on a smart device, and verify the correctness of their exercise posture through AI- and ICT-based services.

Generally, those wishing to engage in weight training can use fitness centers equipped with various exercise machines. Previously, consumers who wanted to check their exercise form on video had to rely on the assistance of others or go through the trouble of setting up a camera to record. This is not only cumbersome but also makes it difficult to verify form accurately because of inappropriate filming angles or positions. Furthermore, even with assistance from others, accurate feedback may be hard to obtain because those individuals may lack professional knowledge of exercise.

There are also limits to determining whether one's own exercise form is correct. It is difficult to check one's posture in real time while exercising, and even when recorded videos are reviewed later, accurate analysis is challenging. Because of these issues, continuing to exercise with incorrect form increases the risk of injury and can reduce the effectiveness of the workout. Therefore, for more efficient and safer exercise, a method is needed to accurately check and correct one's posture.

FIG. 1 illustrates an exemplary block diagram of a system for inferring coaching for exercise videos using a large language model according to one embodiment of the present disclosure.
FIG. 2a illustrates an exemplary flowchart of a method for inferring coaching for exercise videos using a large language model according to one embodiment of the present disclosure.
FIG. 2b illustrates an exemplary flowchart of a method for inferring coaching for exercise videos using a large language model according to one embodiment of the present disclosure.
FIG. 3a illustrates an exemplary UI (310), including an exercise video of a user (10) captured by a camera device (120), displayed on the display of the camera device (120).
FIG. 3b illustrates an exemplary UI (320), including means for identifying the user (10), displayed on the display of the camera device (120).
FIG. 3c illustrates an exemplary UI (330) for uploading an exercise video, displayed on the display of the camera device (120).
FIG. 4a illustrates an exemplary UI (410) for providing a list of exercise videos, displayed on the display of a user terminal (130).
FIG. 4b illustrates an exemplary UI (420) for providing a list of exercise videos, displayed on the display of the user terminal (130).
FIG. 5 illustrates an exemplary flowchart of a method for inferring coaching for exercise videos using a large language model.
FIG. 6 illustrates an exemplary image including a skeleton extracted from one frame of an exercise video according to one embodiment of the present disclosure.
FIG. 7 illustrates an exemplary block diagram of a server (110) according to one embodiment of the present disclosure.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. The embodiments are described clearly and in detail so that a person skilled in the art can easily practice the present disclosure. However, the scope of the rights is not limited or restricted by these embodiments. Identical or similar reference numerals are used for similar components in each drawing, and redundant descriptions of identical or similar components are omitted.

The terms used in the following description have been selected as common and universal in the relevant technical field, but other terms may exist depending on technological development and/or changes, conventions, the preferences of the skilled technician, and so on. Therefore, the terms used in the following description should not be understood as limiting the technical concept, but as illustrative terms for explaining the embodiments. In addition, in specific cases there are terms arbitrarily selected by the applicant, and their detailed meanings are described in the relevant explanatory section. The terms used in the description below should therefore be understood not merely by their names, but based on their meanings and the content throughout the specification.

Singular expressions may include plural expressions unless the context clearly indicates otherwise. Terms used herein, including technical or scientific terms, may have the same meaning as generally understood by those skilled in the art as described in this specification. Additionally, terms including ordinal numbers, such as "f