CN-114913274-B - Child image synthesis method based on street view

Abstract

The application provides a child image synthesis method based on street view. The method detects the body orientation attribute of the child in each image of a child foreground image dataset, and detects the best reference adult target and its body orientation attribute in each image of a street view image dataset containing adults. Any pair of a foreground image and a street view image with the same body orientation attribute is extracted; the foreground image is scaled according to the size of the detection frame of the best reference adult target and pasted onto the area surrounding that target in the street view image. This solves the problem that, in synthesized images, the foreground is inconsistent with the environment around the pasting area of the street view image. The foreground and background therefore remain natural and plausible, the relative body proportions of adults and children remain reasonable, and, because child images can be synthesized into street views of different cities and countries, the generalization capability of child detection across varied street view scenes is improved.

Inventors

Hong Zhiyang
LI QIU
CHEN CHANGQUAN
FU LEI
LI SHANLU
ZHANG XIAOTIAN
CHEN HAITAO
GONG XIAOLONG
LI XIAOKAI
XU BO

Assignees

盛视科技股份有限公司

Dates

Publication Date: 20260505
Application Date: 20220408

Claims (8)

1. A child image synthesis method based on street view, characterized by comprising the following steps: detecting body orientation attributes of children in a child foreground image dataset, and detecting best reference adult targets and their body orientation attributes in a street view image dataset with adults; extracting any pair of a foreground image and a street view image with the same body orientation attribute; scaling the foreground image according to the detection frame size of the best reference adult target, followed by the steps of: performing ground segmentation on the street view image to segment a ground area; randomly selecting, on the periphery of the detection frame of the best reference adult target in the street view image, a foreground pasting candidate area with the same shape and size as the mask image of the scaled foreground image; judging whether the bottom of the foreground pasting candidate area is in the ground area; if the bottom is not in the ground area, randomly selecting again a foreground pasting candidate area with the same shape and size on the periphery of the detection frame of the best reference adult target; if the bottom is in the ground area, selecting the current foreground pasting candidate area as the pasting area of the foreground image, and calculating the intersection over union (IoU) of the pasting area and the detection frame of the best reference adult target; if the IoU is smaller than a threshold, finally determining the pasting area; otherwise, randomly selecting again a foreground pasting candidate area with the same shape and size on the periphery of the detection frame of the best reference adult target, and judging again whether its bottom is in the ground area; and pasting the foreground image onto the area surrounding the best reference adult target in the street view image.
2. The street view-based child image synthesis method according to claim 1, wherein the body orientation attribute includes at least four classes: forward, backward, left, and right.
3. The street view-based child image synthesis method according to claim 1, wherein the method of detecting the best reference adult target in the street view image dataset with adults comprises the steps of: detecting all adult targets in the street view image dataset with adults; and selecting, from all adult targets, the adult target with the highest detection score as the best reference adult target.
4. The street view-based child image synthesis method according to claim 1, wherein the method of scaling the foreground image according to the detection frame size of the best reference adult target comprises the steps of: acquiring the detection frame coordinates of the best reference adult target; calculating the width and/or height of the detection frame of the best reference adult target; and scaling the foreground image so that the ratio of the width and/or height of the foreground image to the width and/or height of the detection frame of the best reference adult target is a reasonable ratio.
5. The street view-based child image synthesis method according to claim 4, wherein the reasonable ratio is calculated by the steps of: detecting the age of the child in the foreground image; and setting the size of the reasonable ratio according to the age.
6. The street view-based child image synthesis method according to any one of claims 1 to 5, further comprising, after scaling the foreground image according to the detection frame size of the best reference adult target, the steps of: performing an erosion operation on the mask image of the foreground image; and performing Gaussian blur on the eroded mask image of the foreground image.
7. The street view-based child image synthesis method according to claim 6, further comprising, after scaling the foreground image according to the detection frame size of the best reference adult target, the steps of: selecting a pasting area of the foreground image in the street view image; cropping, from the street view image, a region centered on the pasting area and enlarged by a certain proportion, to obtain a local background image containing the pasting area; and calculating the color and illumination information of the local background image, and adjusting the color and illumination information of the foreground image by an adaptive threshold method to match the local background image.
8. The street view-based child image synthesis method according to claim 7, wherein the method of pasting the foreground image onto the area surrounding the best reference adult target in the street view image comprises the steps of: pasting the foreground image, with its color and illumination information adjusted, together with the mask image of the foreground image, onto the pasting area of the street view image; and obtaining a street view image containing children and adults.
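
The scaling and pasting-area selection steps recited in claims 1 and 4 can be illustrated with a minimal sketch. It is not the patented implementation and rests on several assumptions that do not come from the claims: boxes are `(x1, y1, x2, y2)` tuples, the ground segmentation result is a nested list of booleans, the candidate area is approximated by the bounding rectangle of the foreground mask, "bottom in the ground area" is read as "every pixel of the bottom row is ground", and the names `scaled_size`, `bottom_on_ground`, `select_paste_box`, `margin`, and `max_tries` are hypothetical. The claims do not bound the number of retries; `max_tries` is added only so the sketch always terminates.

```python
import random

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def scaled_size(fg_w, fg_h, adult_box, ratio):
    """Size of the foreground after scaling its height to ratio * adult height.

    `ratio` stands in for the "reasonable ratio" (assumed to be supplied by
    the caller, e.g. chosen by age); the aspect ratio is preserved.
    """
    adult_h = adult_box[3] - adult_box[1]
    new_h = max(1, round(adult_h * ratio))
    new_w = max(1, round(fg_w * new_h / fg_h))
    return new_w, new_h

def bottom_on_ground(ground, box):
    """Assumed test: every pixel of the box's bottom row lies on the ground."""
    x1, _, x2, y2 = box
    return all(ground[y2 - 1][x] for x in range(x1, x2))

def select_paste_box(ground, adult_box, fg_w, fg_h,
                     iou_thresh=0.1, margin=1.0, max_tries=1000, rng=None):
    """Randomly sample a paste box near the adult box; None if none is found."""
    rng = rng or random.Random()
    img_h, img_w = len(ground), len(ground[0])
    ax1, ay1, ax2, ay2 = adult_box
    aw, ah = ax2 - ax1, ay2 - ay1
    # Search window: the adult box grown by `margin` times its own size,
    # clipped to the image (an assumed reading of "periphery").
    wx1 = max(0, int(ax1 - margin * aw))
    wy1 = max(0, int(ay1 - margin * ah))
    wx2 = min(img_w, int(ax2 + margin * aw))
    wy2 = min(img_h, int(ay2 + margin * ah))
    if wx2 - fg_w < wx1 or wy2 - fg_h < wy1:
        return None  # the foreground does not fit inside the window
    for _ in range(max_tries):
        x1 = rng.randint(wx1, wx2 - fg_w)
        y1 = rng.randint(wy1, wy2 - fg_h)
        box = (x1, y1, x1 + fg_w, y1 + fg_h)
        if not bottom_on_ground(ground, box):
            continue  # bottom not on the ground: sample again
        if iou(box, adult_box) < iou_thresh:
            return box  # overlaps the adult little enough: accept
    return None
```

A caller would first compute the scaled foreground size from the adult detection frame and then pass that size, the ground mask, and the detection frame to `select_paste_box`. A result of `None` means no acceptable area was sampled within the retry budget.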

Description

Child image synthesis method based on street view

Technical Field

The application belongs to the technical field of computer vision, and particularly relates to a child image synthesis method based on street view.

Background Art

As convolutional neural networks (CNNs) are applied ever more widely in computer vision, their dependence on image data grows heavier, and the accuracy of a CNN on a given task is strongly affected by the amount of image data available. In practical computer vision tasks, however, image data is not sufficient for every scene. Some task scenes may have only a few, or at most tens of thousands, of images, which falls far short of the thousands, tens of thousands, hundreds of thousands, millions, tens of millions, or even billions of images that CNNs require, so the amount of image data limits the application of CNNs. Obtaining image data for a specific scene is therefore particularly important. Collecting image data for every scene requires enormous manpower and financial resources in practice, and image synthesis can solve this problem well.

Image synthesis refers to cutting a specific foreground out of one picture and pasting it onto another picture (the background) to obtain a new image. Image synthesis turns a few images into many, provides strong support where image data is extremely scarce for a given task scene, and supplies sufficient training data for CNN models. Besides data augmentation, image synthesis has many other applications, such as portrait background replacement, virtual social interaction, artistic creation, and automatic generation of advertising pictures.
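
The cut-and-paste operation described above can be sketched as a per-pixel mask blend. This is a generic illustration rather than the method of the application. It assumes single-channel images stored as nested lists, a mask with weights in [0, 1] (1 keeps the foreground, 0 keeps the background), and a hypothetical function name `composite`.

```python
def composite(background, foreground, mask, top, left):
    """Paste `foreground` onto a copy of `background` at (top, left).

    Each output pixel is mask * foreground + (1 - mask) * background.
    Foreground pixels that fall outside the background are skipped.
    """
    out = [row[:] for row in background]  # leave the input untouched
    for i, (fg_row, m_row) in enumerate(zip(foreground, mask)):
        for j, (fg, m) in enumerate(zip(fg_row, m_row)):
            y, x = top + i, left + j
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = m * fg + (1 - m) * out[y][x]
    return out
```

A hard binary mask reproduces a plain cut-and-paste, while fractional weights near the silhouette edge give a softer transition between foreground and background.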
In urban management systems, convolutional neural networks are used to detect missing children in street view images, and improving the detection accuracy requires a huge amount of image data, so image synthesis is needed. However, street views are more complex than conventional environments, and their styles differ across countries, cities, and streets. Conventional image synthesis methods therefore produce child images whose foreground and background are uncoordinated, unnatural, and implausible: the foreground is pasted at unreasonable positions such as the sky, a river, a wall, or a car; the foreground is too large or too small for the background scene; or a foreground child with an unsuitable orientation is placed on the target street view background and conflicts with the background content. This seriously reduces the generalization capability of CNN models for detecting children in varied street view scenes.

Disclosure of Invention

The embodiments of the application aim to provide a child image synthesis method based on street view, so as to solve the technical problem in the prior art that child images synthesized into street view scenes by conventional image synthesis are implausible, which impairs the generalization capability of child detection in street view scenes.
To achieve this purpose, the application adopts the following technical scheme: a child image synthesis method based on street view, comprising the following steps: detecting body orientation attributes of children in a child foreground image dataset, and detecting best reference adult targets and their body orientation attributes in a street view image dataset with adults; extracting any pair of a foreground image and a street view image with the same body orientation attribute; scaling the foreground image according to the detection frame size of the best reference adult target; and pasting the foreground image onto the area surrounding the best reference adult target in the street view image.

Preferably, the body orientation attribute includes at least four classes: forward, backward, left, and right.

Preferably, the method of detecting the best reference adult target in the street view image dataset with adults comprises the following steps: detecting all adult targets in the street view image dataset with adults; and selecting, from all adult targets, the adult target with the highest detection score as the best reference adult target.

Preferably, the method of scaling the foreground image according to the detection frame size of the best reference adult target comprises the steps of: acquiring the detection frame coordinates of the best reference adult target; calculating the width and/or height of the detection frame of the best reference adult target; scaling the foreground image to a reasonable ratio of the width and/or height of the foreground image to the width and/or height of