
US-12626530-B2 - Joint detection apparatus, learning-model generation apparatus, joint detection method, learning-model generation method, and computer readable recording medium

US 12626530 B2

Abstract

A learning-model generation apparatus 10 includes: an all-feature-amount-outputting unit that outputs, from image data of an object and for each joint of the object, a feature amount representing the joint; a feature-amount-generating unit that generates, from the feature amounts of the individual joints of the object and as training feature amounts, feature amounts for a case in which the feature amount of a certain joint is missing; and a learning-model-generating unit that, by using training data including the generated training feature amounts, generates a machine learning model by machine-learning positional relationships between the other joints in the case in which the feature amount of the certain joint is missing.
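The training-data generation described in the abstract can be illustrated with a minimal sketch. This code is not part of the patent; it assumes per-joint heat maps stored as a numpy array of shape (J, H, W) and simulates the "missing joint" case by zeroing out one joint's heat map at a time.

```python
import numpy as np

def make_training_samples(heatmaps):
    """Given per-joint heat maps of shape (J, H, W), build one training
    sample per joint in which that joint's heat map is zeroed out,
    simulating the case in which its feature amount is missing."""
    samples = []
    for j in range(heatmaps.shape[0]):
        masked = heatmaps.copy()
        masked[j] = 0.0  # the "missing" joint's feature amount
        samples.append(masked)
    return samples
```

A model trained on such samples, with the original (unmasked) heat maps as targets, can learn to infer a missing joint's position from the positional relationships between the other joints.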

Inventors

  • Asuka ISHII

Assignees

  • NEC CORPORATION

Dates

Publication Date
2026-05-12
Application Date
2022-02-01
Priority Date
2021-02-26

Claims (9)

  1. A joint detection apparatus comprising: at least one memory storing instructions; and at least one processor configured to execute the instructions to: output, from image data of an object and for each joint of the object, a first feature amount representing the joint; and receive input of the first feature amounts of the individual joints of the object, and, for each joint of the object, output a second feature amount representing the joint using a machine learning model that has performed machine learning of positional relationships between other joints in cases in which a feature amount of a certain joint is missing, wherein the machine learning model is constructed using a convolutional neural network, and wherein each of the first feature amount and the second feature amount includes a heat map representing a likelihood that a joint is present in the image.
  2. The joint detection apparatus according to claim 1, wherein the at least one processor is further configured to execute the instructions to: detect coordinates of the joints of the object using the second feature amounts of the individual joints of the object.
  3. The joint detection apparatus according to claim 1, wherein the at least one processor is further configured to execute the instructions to: receive input of the first feature amounts of the individual joints of the object, and, for each joint of the object, output a second feature amount representing the joint using a machine learning model that has performed, for each of a plurality of the certain joints, machine learning of positional relationships between other joints in a case in which a feature amount of the certain joint is missing.
  4. A joint detection method comprising: outputting, from image data of an object and for each joint of the object, a first feature amount representing the joint; and receiving input of the first feature amounts of the individual joints of the object, and, for each joint of the object, outputting a second feature amount representing the joint using a machine learning model that has performed machine learning of positional relationships between other joints in cases in which a feature amount of a certain joint is missing, wherein the machine learning model is constructed using a convolutional neural network, and wherein each of the first feature amount and the second feature amount includes a heat map representing a likelihood that a joint is present in the image.
  5. The joint detection method according to claim 4, further comprising: detecting coordinates of the joints of the object using the second feature amounts of the individual joints of the object.
  6. The joint detection method according to claim 4, wherein, in the outputting of the partial feature amounts, input of the first feature amounts of the individual joints of the object is received, and, for each joint of the object, a second feature amount representing the joint is output using a machine learning model that has performed, for each of a plurality of the certain joints, machine learning of positional relationships between other joints in a case in which a feature amount of the certain joint is missing.
  7. A non-transitory computer readable recording medium that includes a program recorded thereon, the program including instructions that cause a computer to: output, from image data of an object and for each joint of the object, a first feature amount representing the joint; and receive input of the first feature amounts of the individual joints of the object, and, for each joint of the object, output a second feature amount representing the joint using a machine learning model that has performed machine learning of positional relationships between other joints in cases in which a feature amount of a certain joint is missing, wherein the machine learning model is constructed using a convolutional neural network, and wherein each of the first feature amount and the second feature amount includes a heat map representing a likelihood that a joint is present in the image.
  8. The non-transitory computer readable recording medium according to claim 7, wherein the program further includes instructions that cause the computer to detect coordinates of the joints of the object using the second feature amounts of the individual joints of the object.
  9. The non-transitory computer readable recording medium according to claim 7, wherein, in the outputting of the partial feature amounts, input of the first feature amounts of the individual joints of the object is received, and, for each joint of the object, a second feature amount representing the joint is output using a machine learning model that has performed, for each of a plurality of the certain joints, machine learning of positional relationships between other joints in a case in which a feature amount of the certain joint is missing.
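Claims 2, 5, and 8 recite detecting joint coordinates from the second feature amounts (heat maps). A minimal sketch of one common way to do this, not taken from the patent itself, is to take the argmax of each heat map, assuming heat maps of shape (J, H, W):

```python
import numpy as np

def joints_from_heatmaps(heatmaps):
    """Detect joint coordinates as the peak of each heat map.
    `heatmaps` has shape (J, H, W); returns a list of (row, col) pairs,
    one per joint."""
    coords = []
    for hm in heatmaps:
        # unravel_index converts the flat argmax into (row, col)
        coords.append(np.unravel_index(np.argmax(hm), hm.shape))
    return coords
```

Sub-pixel refinements (e.g. a weighted centroid around the peak) are possible, but the argmax captures the likelihood-map reading given in claim 1.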

Description

This application is a National Stage Entry of PCT/JP2022/003766 filed on Feb. 1, 2022, which claims priority from Japanese Patent Application 2021-029411 filed on Feb. 26, 2021, the contents of all of which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

The present invention relates to a joint detection apparatus and a joint detection method for detecting joints of a living body from an image, and further relates to a computer readable recording medium that includes a program recorded thereon for realizing the joint detection apparatus and the joint detection method. Furthermore, the invention relates to a learning-model generation apparatus and a learning-model generation method for generating a learning model for detecting joints of a living body from an image, and further relates to a computer readable recording medium that includes a program recorded thereon for realizing the learning-model generation apparatus and the learning-model generation method.

BACKGROUND ART

In recent years, systems for estimating human pose from images have been proposed. Such systems are expected to be used in fields such as video monitoring and user interfaces. For example, an image monitoring system capable of estimating human pose would make it possible to estimate what a person captured by a camera is doing, and monitoring accuracy can thus be improved. Furthermore, a user interface capable of estimating human pose would make it possible to perform input via gestures. For example, Non-Patent Document 1 discloses a system for estimating human pose, or more specifically, the pose of a human hand, from images.
The system disclosed in Non-Patent Document 1 first acquires image data including an image of a hand, and then inputs the acquired image data to a neural network that has performed machine learning of image feature amounts of individual joints, causing the neural network to output, for each joint, a heat map that represents the likelihood of presence of the joint via color and density. Subsequently, the system inputs the output heat maps to a neural network that has performed machine learning of the relationships between joints and corresponding heat maps. Furthermore, a plurality of such neural networks are prepared, and the results output from one neural network are input to another neural network. Thus, the positions of joints on the heat maps are refined.

In addition, Patent Document 1 also discloses a system for estimating hand pose from images. Similarly to the system disclosed in Non-Patent Document 1, the system disclosed in Patent Document 1 estimates coordinates of joints using a neural network.

LIST OF RELATED ART DOCUMENTS

Patent Document

Patent Document 1: Japanese Patent Laid-Open Publication No. 2017-191576

Non-Patent Document

Non-Patent Document 1: Christian Zimmermann, Thomas Brox, "Learning to Estimate 3D Hand Pose from Single RGB Images", [online], University of Freiburg, [retrieved on Feb. 8, 2021], Internet: <URL: https://openaccess.thecvf.com/content_ICCV_2017/papers/Zimmermann_Learning_to_Estimate_ICCV_2017_paper.pdf>

SUMMARY OF INVENTION

Problems to be Solved by the Invention

While coordinates of joints of a human hand can be estimated from an image as described above by using the systems disclosed in Non-Patent Document 1 and Patent Document 1, these systems are problematic in that estimation accuracy decreases, as described in the following. First of all, some of the many joints that a living body has may not be visible in an image.
In such a case, with the systems disclosed in Non-Patent Document 1 and Patent Document 1, the joints that are not visible in the image may be located at incorrect positions in the heat maps. Furthermore, due to this, when the positions of joints are refined by the neural networks, even the joints that are visible in the image end up positioned incorrectly as a result of being dragged toward the incorrect positions of the joints that are not visible in the image.

An example object of the invention is to provide a joint detection apparatus, a learning-model generation apparatus, a joint detection method, a learning-model generation method, and a computer readable recording medium with which the accuracy of the estimation of joint positions can be improved.

Means for Solving the Problems

In order to achieve the above-described object, a joint detection apparatus includes: an all-feature-amount-outputting unit that outputs, from image data of an object and for each joint of the object, a first feature amount representing the joint; and a partial-feature-amount-outputting unit that receives input of the first feature amounts of the individual joints of the object, and, for each joint of the object, outputs a second feature amount representing the joint using a machine learning model that has performed machine learning of positional relationships between other joints in cases in which a feature amount of a certain joint is missing.
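The two units of the apparatus can be wired together in a minimal sketch. This is illustrative only and not the patent's implementation; the shapes (21 joints, 64x64 heat maps) and the identity placeholder for the learned model are assumptions for the sake of the example.

```python
import numpy as np

def all_feature_unit(image):
    """Stand-in for the all-feature-amount-outputting unit: maps image
    data to per-joint heat maps (first feature amounts). A dummy that
    returns zeros for 21 joints at 64x64 resolution."""
    return np.zeros((21, 64, 64))

def partial_feature_unit(first_feats, model):
    """Stand-in for the partial-feature-amount-outputting unit: a learned
    model maps the first feature amounts to second feature amounts,
    recovering joints whose heat maps are missing or wrong."""
    return model(first_feats)

# Wiring the two units together; an identity function stands in for the
# trained convolutional neural network.
image = np.zeros((256, 256, 3))
first = all_feature_unit(image)
second = partial_feature_unit(first, model=lambda f: f)
```

The point of the second unit is that, because the model was trained on samples with deliberately missing joints, its output heat maps remain consistent even when some first feature amounts are unreliable.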