CN-112307815-B - Image processing method, device, electronic equipment and readable storage medium
Abstract
Embodiments of the application provide an image processing method, an image processing device, electronic equipment, and a readable storage medium. The method comprises: acquiring a face image of a user; and obtaining the gaze focus position of the user from the face image using a neural network model. The method can effectively improve the accuracy of estimating the user's gaze focus position.
Inventors
- GUO TIANCHU
- LIU YONGCHAO
- LIU XIABING
- ZHANG HUI
- HAN ZAIJUN
- CUI CHANGGUI
- GUO RONGJUN
- Shu Bingren
Assignees
- Beijing Samsung Telecom R&D Center (北京三星通信技术研究有限公司)
- Samsung Electronics Co., Ltd. (三星电子株式会社)
Dates
- Publication Date: 20260421
- Application Date: 20190726
- Priority Date: 20190726
Claims (17)
- 1. An image processing method, comprising: acquiring an output gaze position from an input face image of a user using a neural network model; and obtaining, based on the output gaze position, a gaze position of the user by at least one of: determining a position adjustment parameter of the user, and adjusting the output gaze position using the position adjustment parameter to obtain the gaze position of the user; or determining a predicted loss of the output gaze position, determining a confidence of the output gaze position based on the predicted loss, and predicting the gaze position of the user based on the confidence of the output gaze position; wherein the position adjustment parameter is obtained by: acquiring an adjusted face image of the user, wherein the adjusted face image corresponds to a calibration object; obtaining, using the neural network model, a gaze position corresponding to the adjusted face image; and determining the position adjustment parameter according to the gaze position corresponding to the adjusted face image and the position of the calibration object.
- 2. The method of claim 1, wherein predicting the gaze position of the user based on the confidence of the output gaze position comprises: if the confidence is greater than a set threshold, determining the output gaze position as the gaze position of the user; or, if the confidence is not greater than the set threshold, adjusting the output gaze position to obtain the gaze position of the user, or determining the gaze position corresponding to the face image of the previous frame as the gaze position of the user.
- 3. The method of claim 1, wherein determining the confidence of the output gaze position based on the predicted loss comprises: determining at least two sets of perturbations of the face image from the predicted loss; correcting the face image based on each of the at least two sets of perturbations to obtain at least two corrected face images; inputting the at least two corrected face images into the neural network model to obtain output gaze positions corresponding to the at least two corrected face images; and obtaining the confidence according to the output gaze positions corresponding to the at least two corrected face images.
- 4. The method of claim 3, wherein obtaining the confidence according to the output gaze positions corresponding to the at least two corrected face images comprises: determining a standard deviation of the output gaze positions corresponding to the at least two corrected face images; and taking the reciprocal of the standard deviation as the confidence.
- 5. The method of claim 3, wherein determining the at least two perturbations of the face image from the predicted loss comprises at least one of: determining, based on the predicted loss in each of at least two directions, at least two perturbations of the face image in the at least two directions; or determining the at least two perturbations based on at least two perturbation coefficients.
- 6. The method of claim 1, wherein acquiring the output gaze position comprises: cropping the face image to obtain a global face image and a local face image of the face image; and inputting the global face image and the local face image respectively into the neural network model to obtain the output gaze position and a local gaze position; and wherein determining the predicted loss of the output gaze position and determining the confidence of the output gaze position based on the predicted loss comprises: determining a predicted loss of the output gaze position and a predicted loss of the local gaze position; determining at least two perturbations of the global face image from the predicted loss of the output gaze position, and at least two perturbations of the local face image from the predicted loss of the local gaze position; correcting each of the global face image and the local face image based on its at least two perturbations to obtain at least four corrected images; inputting the at least four corrected images into the neural network model to obtain output gaze positions corresponding to the at least four corrected images; and obtaining the confidence based on the output gaze positions corresponding to the at least four corrected images.
- 7. The method of claim 3 or 6, wherein determining a perturbation of the face image from the predicted loss comprises: determining the gradient of the predicted loss with respect to each pixel in the face image; and determining the perturbation of each pixel according to the gradient corresponding to that pixel; and wherein correcting the face image based on the perturbation to obtain a corrected image comprises: adding the perturbation corresponding to each pixel to the original value of that pixel in the face image to obtain the corrected image.
- 8. The method according to any one of claims 1 to 6, wherein the neural network model is trained by: acquiring a training sample set, wherein the training sample set comprises sample images; and training an initial neural network model based on the sample images until the loss function converges, to obtain the trained neural network model.
- 9. The method of claim 8, wherein training the initial neural network model based on the sample images comprises: in each training, determining the predicted loss of the neural network model for each sample image, correcting each sample image according to the predicted loss, and performing the next training of the neural network model based on the corrected sample images.
- 10. The method of claim 8, wherein training the initial neural network model based on the sample images comprises: acquiring a first neural network model; training the first neural network model at least twice based on the sample images to obtain the first neural network model after each training; predicting each sample image with the first neural network model after each training to obtain, for each training, a prediction result for each sample image; and deleting sample images from the training sample set based on the difference between each prediction result for a sample image and the real result of that sample image, to obtain processed sample images.
- 11. The method of claim 10, wherein, when the first neural network model is trained at least twice based on the sample images, the sample images used in a later training are those obtained by deleting, from the sample images used in the previous training, a set number or a set proportion of sample images whose prediction results differ least from their real results.
- 12. The method of claim 8, wherein training the initial neural network model based on the sample images comprises: inputting each sample image into a teacher network model to obtain an output result for each sample image; and training the neural network model based on the sample images, taking each output result as the real result of the corresponding sample image; wherein the teacher network model is a teacher network model randomly selected from a teacher queue; wherein, each time the neural network model is trained based on the sample images with each output result taken as the real result of the corresponding sample image, the method further comprises: adding the trained neural network model to the teacher queue; and wherein the teacher queue contains no model when it is initialized, and the real results of the sample images are the real results corresponding to the annotation labels of the sample images.
- 13. The method of claim 8, wherein training the initial neural network model based on the sample images comprises: after each training, initializing one part of the model parameters of the neural network model, taking the remaining model parameters together with the initialized part as the new model parameters of the neural network model, and performing the next training of the neural network model.
- 14. The method of claim 13, wherein initializing a part of the model parameters of the neural network model after each training comprises: determining the importance of each filter in the neural network model; determining, according to the importance of each filter, target filters whose parameters need to be initialized; and initializing the model parameters of each target filter; wherein initializing the model parameters of each target filter comprises: decomposing the filter parameter matrix of the neural network layer in which a target filter is located to obtain an orthogonal matrix of the filter parameter matrix; for each neural network layer in which target filters are located, determining, in the orthogonal matrix corresponding to that layer, the feature vector corresponding to each target filter according to the position of the target filter in the layer; determining, for the target filters in the same neural network layer, the 2-norm of the feature vector of each target filter according to the feature vector corresponding to that target filter; and, for each target filter, determining the initialized parameters of the target filter according to the feature vector corresponding to the target filter and the corresponding 2-norm in the neural network layer to which the target filter belongs.
- 15. The method of claim 9, wherein determining the predicted loss of the neural network model for each sample image in each training, and correcting each sample image according to the predicted loss, comprises: determining the predicted loss of the neural network model for each sample image in each training; for each sample image, determining the gradient of the predicted loss of the sample image with respect to each pixel in the sample image; determining the perturbation of each pixel according to the gradient corresponding to that pixel; and adding the perturbation corresponding to each pixel to the original value of that pixel in the sample image to obtain a corrected sample image.
- 16. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to invoke the computer program to perform the method of any one of claims 1 to 15.
- 17. A computer readable storage medium, characterized in that the storage medium has stored therein a computer program which, when executed by a processor, implements the method of any one of claims 1 to 15.
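The confidence estimate of claims 3, 4 and 7, together with the threshold rule of claim 2, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the linear `predict` function stands in for the neural network model, the predicted loss is taken to be a squared error against a hypothetical reference position `target` (the claims do not say how the loss is defined at inference time), the perturbation is the sign of the gradient scaled by a perturbation coefficient, and the names `confidence` and `select_gaze` are invented.

```python
import statistics

# Toy stand-in for the neural network model (an assumption): a linear map
# from a flattened image (list of pixel values) to a 1-D gaze coordinate.
WEIGHTS = [0.5, -0.25, 0.75, 0.1]

def predict(image):
    return sum(w * p for w, p in zip(WEIGHTS, image))

def loss_gradient(image, target):
    """Gradient of the assumed loss (predict(image) - target)**2 per pixel."""
    residual = predict(image) - target
    return [2.0 * residual * w for w in WEIGHTS]

def sign(x):
    return (x > 0) - (x < 0)

def corrected_images(image, target, coefficients):
    """One corrected image per perturbation coefficient (claims 5 and 7)."""
    grad = loss_gradient(image, target)
    # Perturbation per pixel = coefficient * sign(gradient); it is added to
    # the original pixel value. The sign-based rule is an assumption.
    return [[p + eps * sign(g) for p, g in zip(image, grad)]
            for eps in coefficients]

def confidence(image, target, coefficients=(0.01, 0.02, 0.03)):
    """Reciprocal of the standard deviation of the outputs (claim 4)."""
    outputs = [predict(img)
               for img in corrected_images(image, target, coefficients)]
    std = statistics.pstdev(outputs)
    return float("inf") if std == 0 else 1.0 / std

def select_gaze(output, conf, threshold, previous):
    """Claim 2: trust the output only when the confidence exceeds the threshold."""
    return output if conf > threshold else previous
```

A caller would compute `confidence(...)` for the current frame and pass it to `select_gaze` along with the previous frame's gaze position. Falling back to the previous frame is only one of the two low-confidence options that claim 2 lists.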
Description
Image processing method, device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular to an image processing method, an image processing apparatus, an electronic device, and a readable storage medium.
Background
With the development of science and technology, electronic devices have become an integral part of people's lives. Many application scenarios require estimating the focus of the user's line of sight while the user is using an electronic device, for example to select and launch an application by gaze (using the line of sight like a mouse), or to push advertisements according to the gaze position. These scenarios all require real-time, accurate estimation of the user's gaze position on the screen of the electronic device. However, the estimation accuracy of existing schemes in practical applications still needs to be improved.
Disclosure of Invention
The application aims to provide an image processing method, an image processing device, electronic equipment and a readable storage medium, so as to improve the accuracy of estimating the focus position of a user's line of sight. The scheme provided by the embodiments of the application is as follows:
In a first aspect, an embodiment of the present application provides an image processing method based on a neural network model, the method including: acquiring a face image of a user; and obtaining the gaze focus position of the user from the face image using the neural network model.
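Claim 1 refines this first aspect with a per-user position adjustment parameter derived from a calibration object. The following Python sketch shows one possible reading, assuming the parameter is a simple additive screen-space offset averaged over several calibration points. The claim only states that the parameter is determined from the predicted gaze position and the position of the calibration object, so the additive form and the function names `position_adjustment` and `adjust` are illustrative assumptions.

```python
def position_adjustment(predicted_positions, calibration_positions):
    """Mean offset between calibration-object positions and model outputs.

    predicted_positions: model outputs for face images captured while the
    user looks at each calibration object, as (x, y) tuples.
    calibration_positions: the known positions of those calibration objects.
    """
    n = len(predicted_positions)
    dx = sum(c[0] - p[0]
             for p, c in zip(predicted_positions, calibration_positions)) / n
    dy = sum(c[1] - p[1]
             for p, c in zip(predicted_positions, calibration_positions)) / n
    return (dx, dy)

def adjust(output_position, adjustment):
    """Apply the (assumed additive) adjustment to a new model output."""
    return (output_position[0] + adjustment[0],
            output_position[1] + adjustment[1])

# Example: the model is consistently 10 px left of and 10 px below the targets.
offset = position_adjustment([(100, 200), (300, 420)],
                             [(110, 190), (310, 410)])
corrected = adjust((50, 50), offset)
```

An additive offset is the simplest choice; a scale factor or an affine map fitted to the same calibration pairs would fit the wording of the claim equally well.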
In a second aspect, an embodiment of the present application provides a training method for a neural network model, the method including: acquiring a training sample set, wherein the training sample set comprises sample images; and training an initial target neural network model based on the sample images until the loss function converges, to obtain a trained target neural network model.
In a third aspect, an embodiment of the present application provides an image processing apparatus, including: an image acquisition module, configured to acquire a face image of a user; and a gaze focus position determining module, configured to obtain the gaze focus position of the user from the face image using the neural network model.
In a fourth aspect, an embodiment of the present application provides a training apparatus for a neural network model, including: a sample acquisition module, configured to acquire a training sample set, wherein the training sample set comprises sample images; and a model training module, configured to train an initial target neural network model based on the sample images until the loss function converges, to obtain a trained target neural network model.
In a fifth aspect, an embodiment of the present application provides an electronic device including a memory and a processor, where the memory stores a computer program, and the processor is configured to invoke the computer program to perform the method provided in the first aspect or the second aspect of the present application.
In a sixth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the method provided in the first aspect or the second aspect of the present application.
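The training method of the second aspect, combined with the teacher queue of claim 12, can be sketched in Python as follows. This is a toy illustration under stated assumptions: the "model" is a single weight fitted by gradient descent, convergence means the loss changes by less than a tolerance, the student is re-initialized to zero at the start of each round, and the annotation labels are used as targets only while the teacher queue is empty. None of these details, nor the names `train_once` and `train_with_teacher_queue`, come from the source.

```python
import random

def train_once(weight, inputs, targets, lr=0.1, tol=1e-10, max_steps=10_000):
    """Fit y = weight * x by gradient descent until the MSE loss converges."""
    prev_loss = float("inf")
    for _ in range(max_steps):
        preds = [weight * x for x in inputs]
        loss = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(inputs)
        if abs(prev_loss - loss) < tol:   # loss has stopped changing
            break
        grad = sum(2 * (p - t) * x
                   for p, t, x in zip(preds, targets, inputs)) / len(inputs)
        weight -= lr * grad
        prev_loss = loss
    return weight

def train_with_teacher_queue(inputs, labels, rounds=3, seed=0):
    rng = random.Random(seed)
    teacher_queue = []                    # empty when initialized
    weight = 0.0
    for _ in range(rounds):
        if teacher_queue:
            # Randomly pick a teacher; its outputs act as the "real results".
            teacher = rng.choice(teacher_queue)
            targets = [teacher * x for x in inputs]
        else:
            # No teacher yet: fall back to the annotation labels (assumption).
            targets = labels
        weight = train_once(0.0, inputs, targets)
        teacher_queue.append(weight)      # trained model joins the queue
    return weight, teacher_queue
```

For example, `train_with_teacher_queue([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])` returns a weight close to 2 and a queue of three trained models, one per round.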
The beneficial effects of the technical solution provided by the present application are described in detail below with reference to specific embodiments and drawings, and are not described here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments are briefly described below.
Fig. 1 is a schematic flow chart of a training method of a neural network model according to an embodiment of the present application;
Fig. 2 shows a flow diagram of a training method in an example of the application;
Fig. 3 is a schematic flow chart of an image processing method according to an embodiment of the present application;
Fig. 4 shows a schematic view of a screen calibration object in an example of the application;
Fig. 5 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as