
CN-122024273-A - Human body size prediction method based on mobile-end cross-view graph attention

CN122024273A

Abstract

The application discloses a human body size prediction method based on mobile-end cross-view graph attention, at the intersection of computer vision and clothing engineering. The method comprises: collecting cross-view images of a human body and preprocessing them; extracting features from the cross-view images; obtaining a view-independent human body embedding through cross-view graph-attention fusion; mapping the embedding onto a differentiable human body template and constructing a human body size prediction model to obtain target sizes; performing semantic calibration and uncertainty estimation on the target sizes and outputting size prediction results with confidence intervals; and executing quality control and rollback according to the predictions and confidence intervals. The user need only feed cross-view images captured with a mobile device into the constructed human body size prediction model to obtain high-precision body measurements automatically, providing an efficient, convenient and low-cost solution for garment tailoring.

Inventors

  • CHI CHENG
  • ZHOU GUANQING
  • SHA SHA
  • WANG LEI
  • SHI YANLI

Assignees

  • Wuhan Textile University (武汉纺织大学)
  • Songzi Honghan Garment Co., Ltd. (松滋弘翰服装有限公司)

Dates

Publication Date
2026-05-12
Application Date
2025-11-12

Claims (10)

  1. A human body size prediction method based on mobile-end cross-view graph attention, characterized by comprising the following steps: S01, acquiring cross-view images of a human body and preprocessing the images; S02, extracting depth features from the preprocessed cross-view images, and obtaining keypoint heatmaps and contour pixel points by combining keypoint detection and contour detection; S03, obtaining a view-independent human body embedding by fusing the depth features, keypoint heatmaps and contour pixel points through a cross-view graph; S04, mapping the human body embedding onto a differentiable human body template and constructing a human body size prediction model to obtain target sizes; S05, performing semantic calibration and uncertainty estimation on the target sizes, and outputting size prediction results and confidence intervals; S06, executing quality control and rollback according to the prediction results and confidence intervals.
  2. The human body size prediction method based on mobile-end cross-view graph attention as set forth in claim 1, wherein step S01 comprises: S11, acquiring the human body cross-view images with a smartphone or camera, covering at least two of the front, left-side, right-side and back views of the human body; S12, preprocessing the images, including distortion correction, view normalization and scale anchoring, wherein the scale anchoring establishes a pixel-to-centimeter conversion ratio based on a reference-object calibration method or a height calibration method, the pixel-to-centimeter conversion being: ratio (cm/pixel) = 2.54 / ρ, where ρ is the pixel density, obtained by dividing the diagonal resolution in pixels by the screen size, and the screen size refers to the diagonal length of the smartphone or camera screen in inches.
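As an illustration of the scale-anchoring step in claim 2 (S12), the sketch below assumes the standard screen-geometry relation (pixel density = diagonal resolution / diagonal screen size, 1 inch = 2.54 cm); the original formula appears only as an image in the patent, so the function name and example values here are illustrative, not the patent's.

```python
import math

def cm_per_pixel(width_px: int, height_px: int, diagonal_inches: float) -> float:
    """Pixel-to-centimeter conversion ratio from screen geometry.

    Hypothetical reconstruction of the claim's calibration step:
    pixel density (PPI) = diagonal resolution / diagonal screen size,
    and 1 inch = 2.54 cm, so cm-per-pixel = 2.54 / PPI.
    """
    ppi = math.hypot(width_px, height_px) / diagonal_inches  # pixel density
    return 2.54 / ppi

# Example: a 1080x2400 phone screen with a 6.5-inch diagonal
ratio = cm_per_pixel(1080, 2400, 6.5)
```

In practice the reference-object or height calibration mentioned in the claim would replace the screen diagonal with a known real-world length visible in the image.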
  3. The human body size prediction method based on mobile-end cross-view graph attention as set forth in claim 1, wherein step S02 comprises: adopting a MobileNetV3-Large network pre-trained on the ImageNet dataset, removing the classification layer at the top of the network, and applying knowledge distillation and channel pruning to the network, to extract depth features from the preprocessed cross-view images; based on the preprocessed cross-view images, detecting and extracting the two-dimensional pixel coordinates of the human body pose keypoints with a MediaPipe model and outputting keypoint heatmaps; and, based on the preprocessed images, extracting the outer contour of the human body with the Canny edge detection algorithm to obtain the human body contour pixel points.
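Claim 3's contour step specifies the Canny edge detector (in practice, e.g., cv2.Canny). As a minimal self-contained illustration of extracting contour pixel points, the pure-NumPy gradient-magnitude threshold below is a simplified stand-in for Canny, not the patent's implementation.

```python
import numpy as np

def contour_pixels(gray: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Return (row, col) coordinates of strong-edge pixels.

    Simplified stand-in for Canny: compute the image gradient magnitude,
    normalize it to [0, 1], and keep pixels above a threshold. A real
    pipeline would use cv2.Canny with hysteresis thresholding instead.
    """
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    if mag.max() > 0:
        mag = mag / mag.max()            # normalize to [0, 1]
    return np.argwhere(mag > thresh)     # N x 2 array of edge coordinates

# A toy "silhouette": a bright square on a dark background
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
edges = contour_pixels(img)
```

The returned pixel coordinates play the role of the "human body contour pixel points" that later become contour sampling-point nodes in the cross-view graph (claim 4).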
  4. The human body size prediction method based on mobile-end cross-view graph attention as set forth in claim 1, wherein step S03 comprises: S31, constructing a cross-view graph structure consisting of nodes and edges, wherein the nodes comprise human body keypoint nodes and contour sampling-point nodes from different views, and the edges comprise cross-view edges and intra-view edges; S32, constructing mixed node features comprising geometric coordinate information, semantic feature descriptors and local appearance descriptors, thereby obtaining the feature of each node; S33, fusing the node features with a cross-view graph attention mechanism, using a graph attention network to compute attention weights between nodes and update the node features; S34, globally pooling all node features into a fixed-length embedding vector, which is the view-independent human body embedding.
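The graph construction of S31 and the global pooling of S34 can be sketched concretely. All view names, part names and the skeleton below are illustrative placeholders, not taken from the patent; the point is the two edge types (cross-view vs. intra-view) and the fixed-length pooled embedding.

```python
import numpy as np

# Hypothetical cross-view graph: nodes are (view, part) pairs; cross-view
# edges link the same semantic part across views; intra-view edges follow
# a skeleton topology within one view.
views = ["front", "side"]
parts = ["shoulder_l", "shoulder_r", "hip_l", "hip_r"]
skeleton = [("shoulder_l", "shoulder_r"), ("shoulder_l", "hip_l"),
            ("shoulder_r", "hip_r"), ("hip_l", "hip_r")]

node_id = {(v, p): i for i, (v, p) in
           enumerate((v, p) for v in views for p in parts)}

# Cross-view edges: same body part seen from different views.
cross_edges = [(node_id[("front", p)], node_id[("side", p)]) for p in parts]
# Intra-view edges: adjacent body parts within one view (skeleton topology).
intra_edges = [(node_id[(v, a)], node_id[(v, b)])
               for v in views for a, b in skeleton]
edges = cross_edges + intra_edges

# S34: global mean pooling over node features yields a fixed-length,
# view-independent embedding (features are random placeholders here).
feats = np.random.default_rng(0).normal(size=(len(node_id), 16))
embedding = feats.mean(axis=0)
```

Because mean pooling is permutation-invariant over nodes, the resulting embedding does not depend on any particular view's node ordering, matching the "view-independent" property claimed for S34.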
  5. The human body size prediction method based on mobile-end cross-view graph attention as set forth in claim 4, wherein in step S31 the cross-view edges connect the same semantic part across different views, and the intra-view edges connect adjacent body-part nodes within the same view and are constructed according to a standard human skeleton topology.
  6. The human body size prediction method based on mobile-end cross-view graph attention as set forth in claim 4, wherein in step S32 the feature x_i of each node i is formed by splicing the following three parts: 1) geometric coordinate information, namely the node's two-dimensional pixel coordinates (p_x, p_y), normalized; 2) a semantic feature descriptor, used only for keypoint nodes, namely a feature vector f_sem extracted from the output depth feature map by bilinear interpolation at the keypoint coordinates; 3) a local appearance descriptor, used only for contour point nodes, obtained by extracting a local image patch centered on the node and computing a local binary pattern feature f_lbp. Thus the node feature is expressed as: for a keypoint node, x_i = [p_x, p_y, f_sem]; for a contour point node, x_i = [p_x, p_y, f_lbp]; where [·] denotes the splicing (concatenation) operation.
  7. The human body size prediction method based on mobile-end cross-view graph attention as set forth in claim 4, wherein the cross-view attention fusion in step S33 comprises L layers of graph attention network (GAT), each layer provided with H attention heads, implemented as follows: computing attention coefficients from the node features of S32; normalizing the attention coefficients to obtain attention weights; and, for each attention head h, updating the feature of node i by combining the attention weights with an activation function, the outputs of the multiple attention heads being spliced or averaged as the final node feature.
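The attention fusion of claim 7 can be sketched as a single NumPy GAT layer in the standard style (coefficients over neighbors, softmax normalization, per-head aggregation, head concatenation). The weights here are random placeholders, not the patent's trained network.

```python
import numpy as np

def gat_layer(X, edges, W_heads, a_heads):
    """One graph-attention layer (a sketch, not the patent's network):
    per head, project features, compute attention coefficients over each
    node's neighbors, softmax-normalize them, aggregate neighbor features,
    then concatenate the heads."""
    n = X.shape[0]
    nbrs = {i: [] for i in range(n)}
    for u, v in edges:                      # undirected edges + self-loops
        nbrs[u].append(v); nbrs[v].append(u)
    for i in range(n):
        nbrs[i].append(i)
    outs = []
    for W, a in zip(W_heads, a_heads):      # one pass per attention head
        H = X @ W
        out = np.zeros_like(H)
        for i in range(n):
            js = nbrs[i]
            # attention coefficient e_ij from concatenated node features
            e = np.array([np.concatenate([H[i], H[j]]) @ a for j in js])
            e = np.maximum(0.2 * e, e)      # LeakyReLU
            alpha = np.exp(e - e.max()); alpha /= alpha.sum()  # softmax
            out[i] = sum(w * H[j] for w, j in zip(alpha, js))
        outs.append(out)
    return np.concatenate(outs, axis=1)     # heads spliced together

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                 # 5 nodes, 8-dim features
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
W_heads = [rng.normal(size=(8, 4)) for _ in range(2)]
a_heads = [rng.normal(size=8) for _ in range(2)]
H_out = gat_layer(X, edges, W_heads, a_heads)
```

Stacking L such layers and then pooling the node features (as in S34) yields the fixed-length embedding the claims describe.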
  8. The human body size prediction method based on mobile-end cross-view graph attention as set forth in claim 1, wherein step S04 comprises: S41, predicting initial parameters and rendering differentiably: based on the human body embedding, predicting initial shape and pose parameters of the differentiable human body template and generating a three-dimensional mesh together with a differentiably rendered two-dimensional contour and keypoints; specifically, inputting the view-independent human body embedding generated in step S03 into an initial-parameter regressor, preliminarily predicting the shape parameter β_init and pose parameter θ_init of the differentiable template model SMPL, generating an initial three-dimensional human mesh V_3D through the differentiable SMPL function M(β, θ), and projecting V_3D back into two-dimensional space with a differentiable renderer R to obtain a rendered two-dimensional contour map S_render and two-dimensional keypoint map J_render; S42, computing physical-constraint residuals, including the re-projection loss between the rendering result and the real two-dimensional information, and a biomechanical constraint loss based on a physical-knowledge network; for the re-projection loss, the difference between the rendering result and the real 2D information extracted in S02 is computed, including 1) the contour loss L_sil = ||S_render − S_gt||², where S_gt is the true human body contour extracted from the original image, and 2) the keypoint loss L_kp = (1/N) Σ_i ||J_render^i − J_gt^i||², where N is the total number of keypoints, J_gt^i are the MediaPipe-detected 2D keypoint coordinates, and J_render^i and J_gt^i respectively denote the rendered and real coordinates of the i-th keypoint; for the biomechanical constraint loss, a lightweight physical-knowledge network PK-Net is introduced, whose inputs are the SMPL parameters [β, θ] and the three-dimensional mesh vertices V_3D generated from them, and whose output is a physical-plausibility score L_physics serving as a regularization loss; S43, fine-tuning instance-level parameters: on the basis of the trained model, fine-tuning the three-dimensional shape specific to the current input sample by fixing all trained network parameters of the human body size prediction model, taking only the SMPL parameters β and θ as optimizable variables, and minimizing the image-fitting and physical-plausibility loss L_inference = λ_1·L_sil + λ_2·L_kp + λ_3·L_physics by gradient descent, where λ_1, λ_2, λ_3 are the weight coefficients of the respective losses, the SMPL parameters being updated directly as (β, θ) ← (β, θ) − η·∇_(β,θ) L_inference, yielding the optimized parameters β* and θ*; S44, driving the SMPL model with the optimized parameters β* and θ* to generate the final three-dimensional mesh, and measuring the human body sizes along predefined paths on the optimized mesh with a virtual flexible tape measure.
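The instance-level fine-tuning of S43 (freeze all network weights, optimize only the template parameters by gradient descent on a weighted loss) can be sketched with toy quadratic surrogates in place of the rendered losses. The targets, weights and learning rate below are arbitrary placeholders; the patent's version differentiates through the SMPL function, a renderer and PK-Net instead.

```python
import numpy as np

# Toy stand-in for S43: only (beta, theta) are optimizable, minimizing
#   L_inference = l1 * L_sil + l2 * L_kp + l3 * L_physics
# where each term is replaced here by a simple quadratic surrogate.
rng = np.random.default_rng(1)
beta_target, theta_target = rng.normal(size=10), rng.normal(size=24)

def loss_and_grad(beta, theta, l1=1.0, l2=1.0, l3=0.1):
    db, dt = beta - beta_target, theta - theta_target
    # surrogate contour + keypoint fit terms, plus a "physics" regularizer
    loss = l1 * (db ** 2).sum() + l2 * (dt ** 2).sum() + l3 * (beta ** 2).sum()
    g_beta = 2 * l1 * db + 2 * l3 * beta
    g_theta = 2 * l2 * dt
    return loss, g_beta, g_theta

beta, theta = np.zeros(10), np.zeros(24)
lr = 0.05
for _ in range(300):             # plain gradient descent on (beta, theta)
    _, gb, gt = loss_and_grad(beta, theta)
    beta -= lr * gb
    theta -= lr * gt
```

Note how the regularizer pulls beta slightly away from the pure image-fit optimum, mirroring how L_physics trades fit against plausibility in the claim.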
  9. The human body size prediction method based on mobile-end cross-view graph attention according to claim 8, wherein the network structure, training data and training method of the physical-knowledge network PK-Net are as follows: network structure: PK-Net is a lightweight discriminator network whose core is a multi-layer perceptron; specifically, an input layer receives a feature vector formed by jointly splicing the SMPL shape parameter β, the pose parameter θ and key human body size descriptors computed from the three-dimensional mesh V_3D; the vector is processed sequentially by three fully-connected layers with 512, 256 and 128 neurons respectively, each followed by a batch-normalization layer and a SiLU activation function; finally an output layer constrains the output to the range (0, 1) through a Sigmoid activation function, producing the scalar physical-plausibility score L_physics. Training data construction: positive samples, i.e. plausible shapes, are selected directly from static, neutral-pose samples of the public MVHumanNet dataset and assigned the label 0; negative samples, i.e. implausible shapes, are generated artificially by applying random perturbations of specific distributions to the SMPL parameters β and θ of positive samples, including proportion perturbations, pose perturbations and shape extremes, and are all assigned the label 1. Training method: 1) a pre-training stage, in which PK-Net is trained alone on the constructed positive and negative sample sets until it can distinguish plausible from implausible physical forms; 2) an integrated fine-tuning stage, in which the pre-trained PK-Net is integrated as a physical-constraint module into the end-to-end training of the human body size prediction model; in this stage the PK-Net parameters are either frozen, with its output L_physics used as one term of the total loss, or PK-Net is fine-tuned together with the main model, i.e. the human body size prediction model, at a small learning rate so that it better adapts to the form distribution during main-model optimization; whenever the three-dimensional human body predicted by the main model exhibits abnormal limb proportions or out-of-limit joint angles, the gradient back-propagated from PK-Net guides the optimization, so that the final size prediction model both matches the input image observations and remains physically plausible and high-fidelity.
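An inference-time forward pass of a PK-Net-shaped MLP (claim 9: fully-connected layers of 512, 256 and 128 units with batch norm and SiLU, then a sigmoid head) can be sketched in NumPy. The weights are random placeholders, batch norm is folded to identity for brevity, and the input dimensionality (10 shape + 72 pose + 8 size descriptors) is an assumption, since the claim does not state it.

```python
import numpy as np

def silu(x):                      # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def pk_net_forward(x, weights):
    """Forward pass of a PK-Net-shaped MLP: FC layers of 512, 256, 128
    units, each followed by (folded) batch norm and SiLU, then a sigmoid
    output constrained to (0, 1). Untrained placeholder weights."""
    h = x
    for W, b in weights[:-1]:
        h = silu(h @ W + b)       # FC -> (folded) BN -> SiLU
    W, b = weights[-1]
    return 1.0 / (1.0 + np.exp(-(h @ W + b)))   # Sigmoid head

rng = np.random.default_rng(0)
# Assumed input: beta (10) + theta (72, i.e. 24 joints x 3) + 8 descriptors
dims = [10 + 72 + 8, 512, 256, 128, 1]
weights = [(rng.normal(scale=0.05, size=(a, b)), np.zeros(b))
           for a, b in zip(dims[:-1], dims[1:])]
x = rng.normal(size=dims[0])
score = pk_net_forward(x, weights)
```

With the label convention of claim 9 (0 = plausible, 1 = implausible), a score near 0 indicates a physically plausible shape, which is why the score can be used directly as a regularization loss.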
  10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 9.

Description

Human body size prediction method based on mobile-end cross-view graph attention

Technical Field

The invention belongs to the intersection of computer vision and clothing engineering, relates to human body size prediction, and in particular relates to a human body size prediction method based on mobile-end cross-view graph attention.

Background

With the informatization of the clothing industry, intelligent customization has become one of the main technologies driving the development of garment customization, and automatic measurement of the human body plays a vital role in realizing it. Traditional contact-based measurement is inefficient and depends on the expertise of the measurer, while high-precision three-dimensional body scanning is difficult to popularize because of its high cost and complex operation; neither meets the remote, real-time, low-cost and convenient measurement requirements of modern e-commerce and personalized customization. In contrast, methods based on two-dimensional images are inexpensive, easy to operate, and easy to popularize through mobile devices.

However, existing two-dimensional-image-based measurement methods, especially those targeting the mobile end, still face many technical problems. Most methods rely on single-view images taken only from the front or side and cannot recover three-dimensional information of the human body. The current mainstream approaches mostly adopt end-to-end black-box regression models that map image features directly to size values; their decision process is opaque, and they are not robust to non-standard body types or varying shooting conditions.
Furthermore, existing mobile measurement applications lack judgment of input image quality and error-handling mechanisms. In practical mobile scenarios there are further challenges: user self-shooting leads to non-uniform viewing angles and scales, phone-lens distortion, and imaging-quality differences across devices, all of which directly affect feature reliability. Although some multi-view fusion research exists, it generally adopts simple feature stitching or average pooling, cannot effectively model the complex geometric relationships between different views, and makes poor use of the complementarity of multi-view information. In summary, the prior art struggles to balance measurement accuracy, model robustness, mobile-end suitability and user experience. Developing a multi-view human body size prediction method that fully exploits the convenience of the mobile end while achieving high accuracy, strong robustness and user-friendliness has therefore become key to pushing the technology toward commercial application.

Disclosure of Invention

The invention aims to solve the problem that the prior art cannot balance measurement accuracy, model robustness, mobile-end adaptability and user experience, and provides a human body size prediction method based on mobile-end cross-view graph attention, offering the field of intelligent clothing customization a technical solution that fully exploits mobile-end convenience and is accurate, robust and user-friendly, with great application value and practical significance.
The above object of the application is achieved by the following technical solution: S01, acquiring cross-view images of a human body and preprocessing the images; S02, extracting depth features from the preprocessed cross-view images, and obtaining keypoint heatmaps and contour pixel points by combining keypoint detection and contour detection; S03, obtaining a view-independent human body embedding by fusing the depth features, keypoint heatmaps and contour pixel points through a cross-view graph; S04, mapping the human body embedding onto a differentiable human body template and constructing a human body size prediction model to obtain target sizes; S05, performing semantic calibration and uncertainty estimation on the target sizes, and outputting size prediction results and confidence intervals; S06, executing quality control and rollback according to the prediction results and confidence intervals. Optionally, step S01 includes: S11, acquiring the human body cross-view images with a smartphone or camera, covering at least two of the front, left-side, right-side and back views of the human body; S12, preprocessing the images, including distortion correction, visual angle n