EP-3965007-B1 - ACTION RECOGNITION APPARATUS, LEARNING APPARATUS, AND ACTION RECOGNITION METHOD

EP 3965007 B1

Inventors

  • NEO, ATSUSHI
  • OGIHARA, Yukiko

Dates

Publication Date
2026-05-06
Application Date
2021-07-02

Claims (11)

  1. A system comprising: a learning apparatus, comprising: a processor executing programs; and a storage device storing the programs; and an action recognition apparatus, comprising: a processor executing programs; and a storage device storing the programs, wherein the processor of the learning apparatus generates a group of action classification models by performing, according to a plurality of conditions: an acquisition process of acquiring training data including body frame information of an object to be learned, the body frame information comprising positions of a plurality of body frame points; a removal control process of removing a part of the body frame information of the object to be learned; a dimension reduction process of generating one or more components by dimension reduction for generating statistical components in multivariate analysis, on the basis of the partially removed body frame information of the object to be learned that was obtained by the removal control process; a control process of controlling an ordinal number indicating each dimension of components in an ascending order starting with the first variable among said one or more components, on the basis of an allowable calculation amount by sequentially acquiring calculation amounts associated with each component in ascending order starting with the first component and setting a maximum ordinal number to be one less than the ordinal number for when the allowable calculation amount is initially exceeded; an action learning process of learning actions of the object to be learned, generating an action classification model for classifying actions of the object to be learned and associating the action classification model with removal information regarding the partially removed body frame information on the basis of the component group starting with the first variable up to a component of the maximum ordinal number indicating the dimension controlled by the control process, and actions of the 
object to be learned; wherein the action recognition apparatus can access the group of action classification models each learned for a component group, using component groups in an ascending order starting with a first variable attained from body frame information of an object to be learned through dimension reduction for generating statistical components in multivariate analysis and actions of the object to be learned; and wherein the processor of the action recognition apparatus performs: a detection process of detecting body frame information for an object to be recognized from to-be-analyzed data; a determination process of determining a body frame point that cannot be acquired and of determining partial body frame information for the object to be recognized; a dimension reduction process of generating one or more components and respective contribution ratios of said components through the dimension reduction, on the basis of the partial body frame information of the object to be recognized that was determined by the determination process; a determination process of determining an ordinal number indicating each dimension of the components in an ascending order starting with the first variable among said one or more components, on the basis of the respective contribution ratios; a selection process of selecting, among the group of action classification models, a specific action classification model in which learning was performed using a combination of the same partially removed body frame information as the partially removed body frame information of the object to be recognized and the same component group as a specific component group from the first variable up to a component of the ordinal number indicating the dimension determined by the determination process; and an action recognition process of inputting the specific component group into the specific action classification model selected by the selection process, thereby outputting recognition results
indicating actions of the object to be recognized.
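The control process recited in claim 1 can be illustrated with a short sketch (an illustrative interpretation, not part of the claims). It assumes the per-component calculation amounts accumulate as components are acquired in ascending order, and that the maximum ordinal number is one less than the ordinal at which the allowable calculation amount is first exceeded; the function name and the cumulative reading are assumptions.

```python
def max_ordinal_by_budget(calc_amounts, allowable):
    """Return the largest ordinal number n such that the cumulative
    calculation amount of components 1..n does not exceed the
    allowable calculation amount (the budget).

    calc_amounts: per-component calculation amounts, first component first.
    allowable: the allowable calculation amount.
    """
    total = 0.0
    max_ordinal = 0
    for ordinal, amount in enumerate(calc_amounts, start=1):
        total += amount
        if total > allowable:
            # Budget first exceeded at `ordinal`: keep one less, per the claim.
            return ordinal - 1
        max_ordinal = ordinal
    return max_ordinal
```

For example, with per-component amounts [3, 3, 3, 3] and a budget of 10, the cumulative amount first exceeds the budget at the fourth component, so the maximum ordinal number is 3.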
  2. The system according to claim 1, wherein the processor of the learning apparatus performs a calculation process of calculating a plurality of vertex angles constituting joint angles of the object to be learned on the basis of the body frame information for the object to be learned, and wherein, in the dimension reduction process, the processor generates said one or more components on the basis of the body frame information of the object to be learned and the vertex angles of the object to be learned that were calculated by the calculation process; and wherein the processor of the action recognition apparatus performs a calculation process of calculating a plurality of vertex angles constituting the joint angles of the object to be recognized on the basis of the body frame information for the object to be recognized, and wherein, in the dimension reduction process, the processor of the action recognition apparatus generates said one or more components and the contribution ratios on the basis of the body frame information of the object to be recognized and the vertex angles of the object to be recognized that were calculated by the calculation process.
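The vertex angles of claim 2 can be sketched as the angle at a joint formed by three body frame points, computed from the vectors to the two neighboring points (a standard construction; the function name and 2D/3D genericity are assumptions, not taken from the claim):

```python
import numpy as np

def vertex_angle(a, b, c):
    """Angle in radians at vertex b, formed by body frame points a, b, c."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))
```

Applied to, e.g., shoulder, elbow, and wrist positions, this yields the elbow joint angle used as an additional input feature for the dimension reduction.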
  3. The system according to claim 1, wherein the processor of the learning apparatus performs a calculation process of calculating an amount of movement of the object to be learned on the basis of a plurality of body frame information for the object to be learned taken at different timings, and wherein, in the dimension reduction process, the processor of the learning apparatus generates said one or more components on the basis of the body frame information of the object to be learned and the amount of movement of the object to be learned that was calculated by the calculation process; and wherein the processor of the action recognition apparatus performs a calculation process of calculating an amount of movement of the object to be recognized on the basis of a plurality of body frame information of the object to be recognized taken at different timings, and wherein, in the dimension reduction process, the processor generates said one or more components and the contribution ratios on the basis of the body frame information of the object to be recognized and the amount of movement of the object to be recognized that was calculated in the calculation process.
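The amount of movement of claim 3 can be sketched as the per-point displacement between two body frame snapshots taken at different timings (one plausible reading; Euclidean distance per point is an assumption):

```python
import numpy as np

def movement_amount(frame_prev, frame_curr):
    """Per-point Euclidean displacement between two body frame snapshots.

    Each frame is an (n_points, dims) array of body frame point positions
    taken at different timings.
    """
    p = np.asarray(frame_prev, float)
    c = np.asarray(frame_curr, float)
    return np.linalg.norm(c - p, axis=1)
```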
  4. The system according to claim 1, wherein the processor of the learning apparatus performs a first normalization process of normalizing a size of the body frame information of the object to be learned, and wherein, in the dimension reduction process, the processor of the learning apparatus generates said one or more components on the basis of the body frame information for the object to be learned that has undergone first normalization by the first normalization process; and wherein the processor of the action recognition apparatus performs a first normalization process of normalizing a size of body frame information for the object to be recognized, and wherein, in the dimension reduction process, the processor of the action recognition apparatus generates said one or more components and said contribution ratios on the basis of the body frame information of the object to be recognized that has undergone first normalization by the first normalization process.
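One way to realize the first normalization of claim 4 is to translate the body frame points to a root joint and rescale by a reference distance, so that skeletons of different sizes become comparable. The choice of root joint, reference joint, and unit scale are illustrative assumptions; the claim only requires that the size be normalized.

```python
import numpy as np

def normalize_size(points, root_idx=0, ref_idx=1):
    """Translate body frame points so the root joint sits at the origin,
    then scale so the root-to-reference distance is 1.

    root_idx / ref_idx are hypothetical joint indices for illustration.
    """
    p = np.asarray(points, float)
    p = p - p[root_idx]
    scale = np.linalg.norm(p[ref_idx])
    return p / scale if scale > 0 else p
```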
  5. The system according to claim 4, wherein the processor of the learning apparatus performs a second normalization process of normalizing a possible value range of the body frame information and vertex angles of the object to be learned, and wherein, in the dimension reduction process, the processor of the learning apparatus generates said one or more components on the basis of the body frame information and vertex angles of the object to be learned that have undergone second normalization by the second normalization process; and wherein the processor of the action recognition apparatus performs a second normalization process of normalizing a possible value range of the body frame information and vertex angles of the object to be recognized, and wherein, in the dimension reduction process, the processor of the action recognition apparatus generates said one or more components and said contribution ratios on the basis of the body frame information of the object to be recognized and the vertex angles that have undergone second normalization by the second normalization process.
  6. The system according to claim 1, wherein, in the determination process, the processor of the action recognition apparatus determines an ordinal number indicating each dimension of components in an ascending order starting with the first variable that is necessary for the contribution ratios to exceed a threshold value.
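The determination of claim 6 can be sketched as choosing the smallest ordinal number whose cumulative contribution ratio exceeds the threshold (the cumulative reading of "contribution ratios" and the strict comparison are interpretations; the 0.9 default is purely illustrative):

```python
def dimension_count(contribution_ratios, threshold=0.9):
    """Smallest ordinal number n such that the cumulative contribution
    ratio of components 1..n exceeds the threshold; falls back to all
    components if the threshold is never exceeded."""
    total = 0.0
    for n, ratio in enumerate(contribution_ratios, start=1):
        total += ratio
        if total > threshold:
            return n
    return len(contribution_ratios)
```

With contribution ratios [0.5, 0.3, 0.15, 0.05] and a 0.9 threshold, the cumulative ratio reaches 0.95 at the third component, so three dimensions are kept.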
  7. The system according to claim 1, wherein the processor of the action recognition apparatus performs an interpolation process of interpolating the position of a body frame point that cannot be acquired for the object to be recognized, if any, and wherein, in the dimension reduction process, the processor of the action recognition apparatus generates said one or more components and said contribution ratios on the basis of the body frame information of the object to be recognized that was interpolated by the interpolation process.
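The interpolation process of claim 7 could, for instance, fill in a body frame point that cannot be acquired in some frames by interpolating its track over time. Linear interpolation between the nearest valid frames, with NaN marking missing coordinates, is an assumed concrete choice:

```python
import numpy as np

def interpolate_missing(track):
    """Fill NaN coordinates of one body frame point tracked over time
    (shape: n_frames x dims) by linear interpolation between the
    nearest frames where the point was acquired."""
    t = np.asarray(track, float)
    for d in range(t.shape[1]):
        col = t[:, d]                       # view: writes modify t
        bad = np.isnan(col)
        col[bad] = np.interp(np.flatnonzero(bad),
                             np.flatnonzero(~bad),
                             col[~bad])
    return t
```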
  8. The system according to claim 1, wherein: the dimension reduction process of generating one or more components comprises a principal component analysis process of generating one or more principal components by principal component analysis for generating statistical principal components in multivariate analysis.
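The principal component analysis of claim 8 can be sketched via the SVD of centered data, which yields both component scores and the per-component contribution ratios (explained variance ratios). This is the textbook construction, not the patent's specific implementation:

```python
import numpy as np

def principal_components(X):
    """PCA of samples X (n_samples x n_features): returns component
    scores and per-component contribution ratios, in descending
    order of explained variance."""
    X = np.asarray(X, float)
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T
    var = s ** 2
    ratios = var / var.sum()
    return scores, ratios
```

The returned ratios are what the determination process of claim 6 would accumulate when deciding how many principal components to keep.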
  9. The system according to claim 8, wherein the processor of the action recognition apparatus performs an interpolation process of interpolating the position of a body frame point that cannot be acquired for the object to be recognized, if any, and wherein, in the principal component analysis process, the processor of the action recognition apparatus generates said one or more principal components and said contribution ratios on the basis of the body frame information for the object to be recognized that was interpolated by the interpolation process.
  10. An action recognition method performed by a learning apparatus that includes: a processor executing programs; and a storage device storing the programs and an action recognition apparatus that includes: a processor executing programs; and a storage device storing the programs, wherein the learning apparatus generates a group of action classification models by performing, according to a plurality of conditions, via a learning method comprising: an acquisition process of acquiring training data including body frame information of an object to be learned, the body frame information comprising positions of a plurality of body frame points; a removal control process of removing a part of the body frame information of the object to be learned; a dimension reduction process of generating one or more components by dimension reduction for generating statistical components in multivariate analysis, on the basis of the partially removed body frame information of the object to be learned that was obtained by the removal control process; a control process of controlling an ordinal number indicating each dimension of components in an ascending order starting with the first variable among said one or more components, on the basis of an allowable calculation amount by sequentially acquiring calculation amounts associated with each component in ascending order starting with the first component and setting a maximum ordinal number to be one less than the ordinal number for when the allowable calculation amount is initially exceeded; an action learning process of learning actions of the object to be learned, generating an action classification model for classifying actions of the object to be learned and associating the action classification model with removal information regarding the partially removed body frame information on the basis of the component group starting with the first variable up to a component of the maximum ordinal number indicating the dimension controlled 
by the control process, and actions of the object to be learned; and wherein the action recognition apparatus can access the group of action classification models each learned for a component group, using component groups in an ascending order starting with a first variable attained from body frame information of an object to be learned through dimension reduction for generating statistical components in multivariate analysis and actions of the object to be learned; and wherein the action recognition apparatus performs an action recognition method comprising: a detection process of detecting body frame information for an object to be recognized from to-be-analyzed data; a determination process of determining a body frame point that cannot be acquired and of determining partial body frame information for the object to be recognized; a dimension reduction process of generating one or more components and respective contribution ratios of said components through the dimension reduction, on the basis of the partial body frame information of the object to be recognized that was determined by the determination process; a determination process of determining an ordinal number indicating each dimension of the components in an ascending order starting with the first variable among said one or more components, on the basis of the respective contribution ratios; a selection process of selecting, among the group of action classification models, a specific action classification model in which learning was performed using a combination of the same partially removed body frame information as the partially removed body frame information of the object to be recognized and the same component group as a specific component group from the first variable up to a component of the ordinal number indicating the dimension determined by the determination process; and an action recognition process of inputting the specific component group into the specific action classification model selected
by the selection process, thereby outputting recognition results indicating actions of the object to be recognized.
  11. The method according to claim 10, wherein: the dimension reduction process of generating one or more components comprises a principal component analysis process of generating one or more principal components by principal component analysis for generating statistical principal components in multivariate analysis.

Description

CLAIM OF PRIORITY

The present application claims priority from Japanese patent application JP 2020-148759 filed on September 4, 2020 and Japanese patent application JP 2021-037260 filed on March 9, 2021.

BACKGROUND

The present invention relates to an action recognition apparatus, a learning apparatus, and an action recognition method. As background art in the relevant technical field, Japanese Patent Application Laid-Open Publication No. 2012-101284 discloses an intention estimation apparatus that identifies whether a given action by a person is intended, without relying on a biological signal such as the surface myoelectric potential. This intention estimation apparatus acquires action information using a measurement method for the position of the person engaging in the action and the angle of the action, so as to restrict the actions of the person to a range achievable by the person; extracts the joint angles of the person engaging in the action as well as position information of the tip position of the acting body part; applies multivariate analysis; and uses a threshold to identify whether the action performed by the person was intended, thereby making this identification without relying on biological signals such as the surface myoelectric potential. In the technique disclosed in Japanese Patent Application Laid-Open Publication No. 2012-101284, the intention of the person's action is estimated as a binary determination of whether the action was intended. It is therefore not possible to classify the intention of complex actions of a plurality of types, with the risk that the accuracy in estimating the intention of the action is markedly decreased.
Yao et al., Proc. IEEE International Conference on Multimedia and Expo 2017, discloses 3D human action recognition based on the spatial-temporal moving skeleton descriptor.

SUMMARY

An object of the present invention is to recognize, at a high accuracy, a plurality of types of actions of an object subjected to recognition. An aspect of an action recognition apparatus disclosed in this application is as set forth in claim 1. An aspect of an action recognition method disclosed in this application is as set forth in claim 10. According to a representative embodiment of the present invention, it is possible to recognize, at high accuracy, a plurality of types of actions of an object subjected to recognition. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a descriptive view showing a system configuration example for an action recognition system of Embodiment 1.
FIG. 2 is a block diagram for illustrating a hardware configuration example of each of the computers.
FIG. 3 is a descriptive view showing an example of the learning data.
FIG. 4 is a block diagram showing a functional configuration example of the action recognition system according to Embodiment 1.
FIG. 5 is a block diagram showing a detailed functional configuration example of the body frame information processing units.
FIG. 6 is a descriptive view indicating a detailed calculation method for the joint angles executed by the joint angle calculation unit.
FIG. 7 is a descriptive view showing an example of a detailed calculation method for the amount of movement between frames executed by the movement amount calculation unit.
FIG. 8 is a descriptive view indicating a detailed normalization method for the body frame information executed by the normalization unit.
FIG. 9 is a descriptive view showing a detailed example of a training signal retained by the training signal DB.
FIG. 10 is a descriptive view showing an example in which the principal components generated by the principal component analysis unit with the training signal as input data are plotted on a principal component space.
FIG. 11 is a descriptive view showing a detailed method in which the action learning unit learns actions and the action recognition unit classifies the actions.
FIG. 12 is a graph that indicates the progression of the cumulative contribution ratio used by the dimension count decision unit in determining the dimension count.
FIG. 13 is a flowchart showing an example of detailed process steps of a learning process performed by the server (learning apparatus) according to Embodiment 1.
FIG. 14 is a flowchart showing an example of detailed process steps of a body frame information process according to Embodiment 1.
FIG. 15 is a flowchart showing an example of action recognition process steps performed by the client (action recognition apparatus) according to Embodiment 1.
FIG. 16 is a block diagram showing a functional configuration example of the action recognition system according to Embodiment 2.
FIG. 17 is a flowchart showing an example of detailed process steps of a learning process performed by the server (learning apparatus) a