EP-4213713-B1 - VISION-BASED MOTION CAPTURE SYSTEM FOR REHABILITATION TRAINING

Inventors

LIN, SHIH-YAO
YANG, TAO
HUANG, CHAO
QIAN, ZHEN
FAN, WEI

Dates

Publication Date: 20260506
Application Date: 20211216

Claims (11)

A method for video-based motion capture, executable by at least one processor, the method comprising: obtaining video data including at least one body part of a person; in the video data, selecting keypoints of the at least one body part based on a predetermined rehabilitation category; extracting a motion feature of the at least one body part from the video data; determining a score of the motion feature based on the predetermined rehabilitation category; and generating a display illustrating the motion feature and said score of the motion feature, wherein said selecting the keypoints of the at least one body part based on the predetermined rehabilitation category comprises predicting the predetermined rehabilitation category by a deep neural network, DNN, configured to predict N possible regions representing possible locations of the keypoints with respect to the at least one body part, N is a positive integer, said predicting the predetermined rehabilitation category by the DNN comprises comparing the video data including at least one body part of a person to a plurality of anchor poses, the plurality of anchor poses each comprise poses of ones of predetermined rehabilitation categories, including the predetermined rehabilitation category, the extracting a motion feature of the at least one body part from the video data is carried-out by means of a keypoint distance extractor and a keypoint angle extractor, wherein the keypoint distance extractor calculates a Euclidean distance between two specific keypoints of the at least one body part, and the angle extractor calculates an angle between two limbs between upper and lower portions of an arm separated with an elbow therebetween of the at least one body part, wherein the two specific keypoints are selected from the keypoints selected based on the predetermined rehabilitation category.
The method according to claim 1, further comprising: scaling the at least one body part of the person to a predetermined size based on a height of the person, wherein said determining a score of the motion feature based on the predetermined rehabilitation category comprises said scoring after said scaling of the at least one body part of the person.
The method according to claim 2, further comprising: applying one or more Gaussian filters to the motion feature after the scaling the at least one body part of the person to a predetermined size based on a height of the person.
The method according to any of claims 1 to 3, wherein said generating the display illustrating the motion feature and said scoring of the motion feature comprises plotting the motion feature in the video.
The method according to claim 1, wherein said predicting the predetermined rehabilitation category by the DNN comprises ranking N*K pose regions, and wherein K is an integer indicating a number of predetermined keypoints of a human body.
The method according to claim 1, wherein said generating the display illustrating the motion feature and said scoring of the motion feature comprises generating the display such that at least one of the anchor poses is illustrated as overlayed on the at least one body part of the person.
The method according to any of claims 1 to 6, wherein the video data of the at least one body part of the person comprises a red-green-blue, RGB, image of the at least one body part of the person obtained by a monocular camera.
An apparatus for video coding comprising: at least one memory configured to store computer program code; at least one processor configured to access the computer program code and operate as instructed by the computer program code, the computer program code including: obtaining code configured to cause the at least one processor to obtain video data including at least one body part of a person; selecting code configured to cause the at least one processor to select keypoints of the at least one body part based on a predetermined rehabilitation category; extracting code configured to cause the at least one processor to extract a motion feature of the at least one body part from the video data; scoring code configured to cause the at least one processor to determine a score of the motion feature based on the predetermined rehabilitation category; and generating code configured to cause the at least one processor to generate a display illustrating the motion feature and said scoring of the motion feature, wherein the computer program code further comprises said selecting the keypoints of the at least one body part based on the predetermined rehabilitation category comprises predicting the predetermined rehabilitation category by a deep neural network, DNN, configured to predict N possible regions representing possible locations of the keypoints with respect to the at least one body part, N is a positive integer, said predicting the predetermined rehabilitation category by the DNN comprises comparing the video data including at least one body part of a person to a plurality of anchor poses, the plurality of anchor poses each comprise poses of ones of predetermined rehabilitation categories, including the predetermined rehabilitation category, the extracting a motion feature of the at least one body part from the video data is carried-out by means of a keypoint distance extractor and a keypoint angle extractor, wherein the keypoint distance extractor calculates a Euclidean distance between two specific keypoints of the at least one body part, and the angle extractor calculates an angle between two limbs between upper and lower portions of an arm separated with an elbow therebetween of the at least one body part, wherein the two specific keypoints are selected from the keypoints selected based on the predetermined rehabilitation category.
The apparatus according to claim 8, wherein the computer code further includes scaling code configured to cause the at least one processor to scale the at least one body part of the person to a predetermined size based on a height of the person, and wherein said determining a score of the motion feature based on the predetermined rehabilitation category comprises said scoring after said scaling of the at least one body part of the person.
The apparatus according to claim 9, wherein the computer code further includes application code configured to cause the at least one processor to apply one or more Gaussian filters to the motion feature after the scaling the at least one body part of the person to a predetermined size based on a height of the person.
A non-transitory computer readable medium storing a program causing a computer to execute a process, the process comprising the steps in the method of any of claims 1 to 7.
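
The feature-extraction steps recited in the claims above (a Euclidean distance between two keypoints, an angle between the upper and lower arm at the elbow, scaling based on the person's height, and Gaussian filtering of the motion feature) can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the 2D pixel coordinates, the keypoint names, the target height of 1.0, and the value of sigma are all assumptions made for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # assumed 2D image coordinates (x, y)


def keypoint_distance(a: Point, b: Point) -> float:
    """Euclidean distance between two keypoints."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def elbow_angle(shoulder: Point, elbow: Point, wrist: Point) -> float:
    """Angle in degrees at the elbow between the upper arm and the forearm."""
    ux, uy = shoulder[0] - elbow[0], shoulder[1] - elbow[1]
    lx, ly = wrist[0] - elbow[0], wrist[1] - elbow[1]
    norm = math.hypot(ux, uy) * math.hypot(lx, ly)
    if norm == 0.0:
        raise ValueError("degenerate limb: coincident keypoints")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (ux * lx + uy * ly) / norm))
    return math.degrees(math.acos(cos_theta))


def scale_to_height(keypoints: Dict[str, Point], person_height: float,
                    target_height: float = 1.0) -> Dict[str, Point]:
    """Rescale keypoints so the person's height maps to target_height (assumed)."""
    if person_height <= 0:
        raise ValueError("person_height must be positive")
    s = target_height / person_height
    return {name: (x * s, y * s) for name, (x, y) in keypoints.items()}


def gaussian_smooth(values: Sequence[float], sigma: float = 1.0) -> List[float]:
    """Smooth a 1D feature series with a truncated, renormalized Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        acc, weight_sum = 0.0, 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):  # ignore samples outside the series
                acc += w * values[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed


# Hypothetical per-frame keypoints (pixels) for one arm; the values are made up.
frames = [
    {"shoulder": (0.0, 0.0), "elbow": (0.0, 40.0), "wrist": (30.0 + 5.0 * t, 40.0 - 4.0 * t)}
    for t in range(6)
]
scaled = [scale_to_height(f, person_height=170.0) for f in frames]
angles = [elbow_angle(f["shoulder"], f["elbow"], f["wrist"]) for f in scaled]
smoothed_angles = gaussian_smooth(angles, sigma=1.0)
```

In this sketch the scaling is applied before the features are computed and the Gaussian filter is applied to the resulting feature series, mirroring the order recited in claims 2 and 3. Angles are unaffected by uniform scaling, whereas keypoint distances are.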

Description

BACKGROUND

1. Field

The present disclosure is directed to technical solutions with respect to a vision-based motion capture system (VMCS) for rehabilitation training.

2. Description of Related Art

A physiotherapist may offer physical rehabilitation, such as for patients, including the elderly, suffering from motor dysfunction-related diseases and/or injuries. However, such rehabilitation training requires direct supervision by a professional physiotherapist, and when patients are performing rehabilitation exercises there is also a need for continuous guidance by a rehabilitation specialist. As such, the training and exercising require arduous monitoring and attention of such patients by those professionals and specialists, which demands substantial time and effort in assisting the rehabilitation efforts. Additionally, without the professionals and specialists, it is expected that patients training themselves, such as at home rather than in a clinic, will lack the specialized rehabilitation experience and guidance offered by the physiotherapist, which will lead to poor rehabilitation performance and may even cause further physical injuries.

Attempts to automate assessment, such as by sensor-based approaches (e.g., wearables, infrared cameras), are complicated to use, expensive, and inadequately scalable. For example, such a sensor-based motion capture system (SMCS) is technically inadequate, at least in requiring specific hardware, specialized programming to obtain and process data, complicated requirements to capture human body motion, and high costs for such software and hardware, thereby limiting practical application with respect to rehabilitation training.
US 2020/085348 A1 concerns simulation and evaluation of human physiological measurement, analysis, and diagnosis, and a method and system for simulation of physiological functions for monitoring and evaluation of bodily strength and flexibility, and automated biomechanical analysis of bodily strength and flexibility. Additionally, a method and system for simulation and evaluation of biomechanical functions for predicting, measuring, and diagnosing anterior cruciate ligament (ACL) symptoms, as well as other bodily joints or locations, for example, the lower back, shoulder, and elbow, is disclosed.

DA GAMA ALANA ET AL: "Motor Rehabilitation Using Kinect: A Systematic Review", GAMES FOR HEALTH JOURNAL, vol. 4, no. 2, 1 April 2015 (2015-04-01), pages 123-135, XP093109407, concerns a systematic review of articles that involve interactive, evaluative, and technical advances related to motor rehabilitation.

LEI QING ET AL: "A Survey of Vision-Based Human Action Evaluation Methods", SENSORS, vol. 19, no. 19, 24 September 2019 (2019-09-24), page 4129, XP093109409, concerns a comprehensive survey of approaches and techniques in action evaluation research, including motion detection and preprocessing using skeleton data, handcrafted feature representation methods, and deep learning-based feature representation methods.

SUMMARY

The invention is defined by the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Further features, nature, and various advantages of the disclosed subject matter will be more apparent from the following detailed description and the accompanying drawings in which:

Fig. 1 is a simplified illustration of a schematic diagram in accordance with embodiments.
Fig. 2 is a simplified illustration of a schematic diagram in accordance with embodiments.
Fig. 3 is a simplified illustration of a diagram in accordance with embodiments.
Fig. 4 is a simplified illustration of a flow diagram in accordance with embodiments.
Fig. 5 is a simplified illustration of a diagram in accordance with embodiments.
Fig. 6 is a simplified illustration of a diagram in accordance with embodiments.
Fig. 7 is a simplified illustration of a diagram in accordance with embodiments.
Fig. 8 is a simplified illustration of a diagram in accordance with embodiments.
Fig. 9 is a simplified illustration of a diagram in accordance with embodiments.
Fig. 10 is a simplified illustration of a diagram in accordance with embodiments.
Fig. 11 is a simplified illustration of a flowchart in accordance with embodiments.

DETAILED DESCRIPTION

The proposed features discussed below may be used separately or combined in any order. Further, the embodiments may be implemented by processing circuitry (e.g., one or more processors or one or more integrated circuits). In one example, the one or more processors execute a program that is stored in a non-transitory computer-readable medium.

Fig. 1 illustrates a simplified block diagram of a communication system 100 according to an embodiment of the present disclosure. The communication system 100 may include at least two terminals 102 and 103 interconnected via a network 105. For unidirectional transmission of data, a first terminal 103 may code video data at a local location for transmission to the other terminal 102 via the network 105. The second termi