
CN-121981877-A - Intelligent image style real-time conversion method and system based on deep learning

CN 121981877 A

Abstract

The invention provides a method and system for real-time intelligent image style conversion based on deep learning, in the technical field of deep learning. The method extracts content features through a multi-level convolutional neural network, analyzes the spatial complexity distribution, establishes a reverse mapping relationship between spatial position and style fusion proportion, and iteratively adjusts the fusion based on a semantic deviation index and a user-set style intensity parameter to produce the final style conversion. The invention achieves adaptive style fusion while preserving the semantic content of the image, improving the quality and efficiency of image style conversion.

Inventors

  • FANG JIANPING
  • LIU YONG

Assignees

  • 深圳市束光文化科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-01-26

Claims (10)

  1. An intelligent image style real-time conversion method based on deep learning, characterized by comprising the following steps: obtaining an input image and a style reference image, extracting features of the input image through a multi-level convolutional neural network to obtain multi-level content features, and performing spatial gradient analysis on the multi-level content features to obtain a spatial complexity distribution; establishing a reverse mapping relationship between spatial position and style fusion proportion according to the spatial complexity distribution, so that the spatial complexity distribution value is negatively correlated with the fusion proportion of the style feature representation, thereby obtaining initial fusion features; inputting the initial fusion features and the multi-level content features respectively into a pre-trained semantic extractor to calculate a distance metric in the semantic feature space, thereby obtaining a semantic deviation index; and, based on a global style intensity parameter set by the user and the semantic deviation index, iteratively adjusting the fusion proportion in the reverse mapping relationship through a back-propagation mechanism to obtain regulated fusion features, and reconstructing the regulated fusion features through a decoding network to obtain an output image.
  2. The method of claim 1, wherein the step of performing spatial gradient analysis on the multi-level content features to obtain a spatial complexity distribution comprises: performing multi-directional gradient calculation on each level of the multi-level content features to obtain gradient components in the horizontal and vertical directions at each spatial position, and synthesizing the gradient components into a gradient magnitude at each spatial position; statistically aggregating the gradient magnitudes over neighborhoods of different scales to obtain multi-scale gradient statistics, calculating the degree of variation among the multi-scale gradient statistics, marking regions whose degree of variation exceeds a preset threshold as multi-scale complex regions, and marking regions whose degree of variation does not exceed the preset threshold as single-scale flat regions; and constructing, based on the spatial distribution of the multi-scale complex regions and the single-scale flat regions, a spatial complexity distribution that comprehensively reflects the fineness and the scale-variation characteristics of the content structure.
  3. The method of claim 1, wherein the step of establishing a reverse mapping relationship between spatial position and style fusion proportion according to the spatial complexity distribution, so that the spatial complexity distribution value is negatively correlated with the fusion proportion of the style feature representation, thereby obtaining initial fusion features, comprises: performing structural similarity analysis on the multi-level content features, mapping each spatial position into a structural representation space, calculating structural similarity between positions, and clustering according to a similarity threshold to obtain a plurality of structural-similarity region groups; performing multi-scale decomposition on the style feature representation, separating it into coarse-granularity style components containing low-frequency global components and fine-granularity style components containing high-frequency local components; after normalizing the spatial complexity distribution, mapping it into coarse-granularity fusion weights through a first monotonically decreasing function, and into fine-granularity fusion weights through a second monotonically decreasing function whose decreasing rate is greater than that of the first; for each structural-similarity region group, uniformly adjusting the coarse-granularity and fine-granularity fusion weights of all positions in the group to their respective in-group weighted averages, to obtain a region-consistent fusion weight distribution; and weighting and fusing the coarse-granularity and fine-granularity style components with the multi-level content features according to the corresponding fusion weight distributions, then merging the results to obtain the initial fusion features.
  4. The method of claim 3, wherein the step of performing structural similarity analysis on the multi-level content features, mapping each spatial position into a structural representation space, calculating structural similarity between positions, and clustering according to a similarity threshold to obtain a plurality of structural-similarity region groups comprises: extracting a local neighborhood feature block at each spatial position in the multi-level content features, and applying geometric transformation augmentation to the local neighborhood feature block to generate a plurality of sample feature blocks with unchanged structure but changed geometric attributes; inputting the original feature blocks and the sample feature blocks into a Siamese (twin) encoding network with shared parameters, mapping them into low-dimensional structural representation vectors, and training the Siamese encoding network with a contrastive learning loss function that minimizes the embedding-space distance between each original feature block and its transformed samples while maximizing the distance between original feature blocks at different positions, thereby obtaining a structural representation space with geometric transformation invariance; encoding the local neighborhood feature blocks at all spatial positions with the trained Siamese encoding network to obtain the structural representation vector corresponding to each spatial position; and constructing a similarity adjacency matrix, marking pairs of spatial positions whose structural similarity exceeds a set similarity threshold as similar-adjacency relations, extracting connected components through graph connectivity analysis, and identifying each connected component as a structural-similarity region group.
  5. The method of claim 1, wherein the step of inputting the initial fusion features and the multi-level content features respectively into a pre-trained semantic extractor to calculate a distance metric in the semantic feature space, thereby obtaining a semantic deviation index, comprises: decomposing the initial fusion features, through a feature separation network, into a content-preserving component characterizing the retained original content semantic information and a style-introducing component characterizing the fused style semantic information; inputting the multi-level content features into the pre-trained semantic extractor to obtain an original content semantic representation, and inputting the content-preserving component into the pre-trained semantic extractor to obtain a fused content semantic representation; calculating a direction offset angle between the original content semantic representation and the fused content semantic representation in the semantic feature space, the direction offset angle representing the degree of directional change of the content semantic vector in that space; inputting the style-introducing component into the pre-trained semantic extractor to obtain a style semantic representation, and calculating the modulus (vector norm) of the style semantic representation in the semantic feature space as the style semantic strength; and calculating a weighted combination of the direction offset angle and the style semantic strength to obtain the semantic deviation index, wherein the weight coefficient of the direction offset angle is greater than that of the style semantic strength.
  6. The method of claim 1, wherein the step of iteratively adjusting the fusion proportion in the reverse mapping relationship through a back-propagation mechanism, based on the global style intensity parameter set by the user and the semantic deviation index, to obtain the regulated fusion features comprises: constructing a multi-objective optimization function comprising a style intensity matching term, a semantic preservation term, and a spatial smoothness term, wherein the style intensity matching term measures the deviation between the current overall style intensity and the user-set global style intensity parameter, the semantic preservation term measures the degree to which the semantic deviation index exceeds a semantic preservation threshold, and the spatial smoothness term measures the gradient of the fusion proportion parameters of each spatial position within its neighborhood; setting, for each optimization term in the multi-objective optimization function, a dynamic weight coefficient based on the current iteration number, the dynamic weight coefficient being updated as the iteration proceeds; taking the multi-objective optimization function as a loss function, and iteratively adjusting the fusion proportion in the reverse mapping relationship by back-propagating its gradient with respect to the parameters of the reverse mapping relationship; and recalculating the fusion features based on the adjusted fusion proportion, and outputting the current fusion features as the regulated fusion features when the multi-objective optimization function meets a preset termination condition.
  7. The method of claim 6, wherein the step of setting, for each optimization term in the multi-objective optimization function, a dynamic weight coefficient based on the current iteration number comprises: in each iteration, calculating the current values of the style intensity matching term, the semantic preservation term, and the spatial smoothness term in the multi-objective optimization function, and comparing the current value of each optimization term with its corresponding target convergence threshold to obtain a normalized convergence progress for each optimization term; calculating a coupling adjustment factor among the optimization terms based on their normalized convergence progress, the coupling adjustment factor being computed from the variance among the normalized convergence progress values of the different optimization terms, with a larger variance yielding a larger coupling adjustment factor; calculating the deviation between the normalized convergence progress of each optimization term and the average convergence progress of all optimization terms as a weight adjustment direction indicator, and multiplying the weight adjustment direction indicator by the coupling adjustment factor to obtain the weight adjustment amount of each optimization term; and adding each optimization term's current weight coefficient to its corresponding weight adjustment amount to obtain an updated dynamic weight coefficient, and normalizing the updated dynamic weight coefficients so that the sum of the weight coefficients of all optimization terms remains constant.
  8. An intelligent image style real-time conversion system based on deep learning, for implementing the method of any one of claims 1 to 7, characterized in that it comprises: a feature extraction and analysis module, configured to obtain an input image and a style reference image, extract features of the input image through a multi-level convolutional neural network to obtain multi-level content features, and perform spatial gradient analysis on the multi-level content features to obtain a spatial complexity distribution; an adaptive fusion module, configured to establish a reverse mapping relationship between spatial position and style fusion proportion according to the spatial complexity distribution, so that the spatial complexity distribution value is negatively correlated with the fusion proportion of the style feature representation, thereby obtaining initial fusion features; a semantic deviation evaluation module, configured to input the initial fusion features and the multi-level content features respectively into a pre-trained semantic extractor to calculate a distance metric in the semantic feature space, thereby obtaining a semantic deviation index; and an iterative optimization and reconstruction module, configured to iteratively adjust the fusion proportion in the reverse mapping relationship through a back-propagation mechanism, based on the global style intensity parameter set by the user and the semantic deviation index, to obtain regulated fusion features, and to reconstruct the regulated fusion features through a decoding network to obtain an output image.
  9. An electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any one of claims 1 to 7.
  10. A computer-readable storage medium having computer program instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1 to 7.
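The semantic deviation index of claim 5 combines a direction offset angle between two content semantic vectors with the norm of a style semantic vector. A minimal numerical sketch is given below; the specific weights `w_angle=0.7` and `w_strength=0.3` are illustrative assumptions (the claim only requires the angle's weight to be the larger), and the semantic vectors are assumed to be precomputed by the semantic extractor.

```python
import numpy as np

def semantic_deviation_index(orig_sem, fused_sem, style_sem,
                             w_angle=0.7, w_strength=0.3):
    """Sketch of the semantic deviation index described in claim 5.

    orig_sem:  semantic representation of the original content features
    fused_sem: semantic representation of the content-preserving component
    style_sem: semantic representation of the style-introducing component
    """
    # Direction offset angle (radians) between the two content semantics.
    cos_sim = np.dot(orig_sem, fused_sem) / (
        np.linalg.norm(orig_sem) * np.linalg.norm(fused_sem))
    angle = np.arccos(np.clip(cos_sim, -1.0, 1.0))
    # Style semantic strength: modulus (L2 norm) of the style representation.
    strength = np.linalg.norm(style_sem)
    # Weighted combination; the angle term carries the larger weight.
    return w_angle * angle + w_strength * strength
```

With identical content semantics the angle term vanishes, so the index reduces to the weighted style strength alone.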

Description

Intelligent image style real-time conversion method and system based on deep learning

Technical Field

The invention relates to deep learning technology, and in particular to an intelligent image style real-time conversion method and system based on deep learning.

Background

Image style conversion is an important research direction in computer vision and deep learning. It aims to preserve the content of one image while adopting the artistic style of another, so as to generate an image with a new visual effect. Most existing style conversion methods cannot perceive image spatial complexity and apply style conversion at the same intensity to all regions of an image. This causes excessive stylization in detail-rich regions and insufficient style characteristics in smooth regions, degrading the balance of the overall visual effect.

Disclosure of the Invention

Embodiments of the invention provide an intelligent image style real-time conversion method and system based on deep learning that address the above problems in the prior art.
In a first aspect, an embodiment of the present invention provides an intelligent image style real-time conversion method based on deep learning, including: obtaining an input image and a style reference image, extracting features of the input image through a multi-level convolutional neural network to obtain multi-level content features, and performing spatial gradient analysis on the multi-level content features to obtain a spatial complexity distribution; establishing a reverse mapping relationship between spatial position and style fusion proportion according to the spatial complexity distribution, so that the spatial complexity distribution value is negatively correlated with the fusion proportion of the style feature representation, thereby obtaining initial fusion features; inputting the initial fusion features and the multi-level content features respectively into a pre-trained semantic extractor to calculate a distance metric in the semantic feature space, thereby obtaining a semantic deviation index; and, based on a global style intensity parameter set by the user and the semantic deviation index, iteratively adjusting the fusion proportion in the reverse mapping relationship through a back-propagation mechanism to obtain regulated fusion features, and reconstructing the regulated fusion features through a decoding network to obtain an output image.
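The core idea above, that the style fusion proportion is negatively correlated with the spatial complexity value, can be sketched as a per-position blend. The exponential mapping and the gain `k` are assumptions for illustration; the document does not specify the form of the negative correlation.

```python
import numpy as np

def initial_fusion(content_feat, style_feat, complexity, k=4.0):
    """Illustrative negative-correlation fusion of content and style features.

    content_feat, style_feat: (H, W, C) feature maps
    complexity: (H, W) spatial complexity distribution, assumed in [0, 1]
    """
    # Higher complexity -> lower style fusion proportion (negative correlation).
    alpha = np.exp(-k * complexity)          # per-position style proportion
    alpha = alpha[..., None]                 # broadcast over channels
    return (1.0 - alpha) * content_feat + alpha * style_feat
```

At zero complexity the output is pure style; as complexity grows, the blend shifts toward the original content, preserving detail-rich regions.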
The step of performing spatial gradient analysis on the multi-level content features to obtain a spatial complexity distribution comprises the following steps: performing multi-directional gradient calculation on each level of the multi-level content features to obtain gradient components in the horizontal and vertical directions at each spatial position, and synthesizing the gradient components into a gradient magnitude at each spatial position; statistically aggregating the gradient magnitudes over neighborhoods of different scales to obtain multi-scale gradient statistics, calculating the degree of variation among the multi-scale gradient statistics, marking regions whose degree of variation exceeds a preset threshold as multi-scale complex regions, and marking regions whose degree of variation does not exceed the preset threshold as single-scale flat regions; and constructing, based on the spatial distribution of the multi-scale complex regions and the single-scale flat regions, a spatial complexity distribution that comprehensively reflects the fineness and the scale-variation characteristics of the content structure.
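The gradient analysis above can be sketched for a single 2-D feature level as follows. The box-filter aggregation, the variance-based variation measure, and the threshold value are assumptions; the document only requires multi-scale statistics and a preset degree threshold.

```python
import numpy as np

def spatial_complexity(feat, scales=(1, 2, 4), threshold=0.1):
    """Sketch of spatial gradient analysis on one content feature level.

    feat: (H, W) feature map. Returns the gradient magnitude and a boolean
    mask marking multi-scale complex regions.
    """
    # Horizontal/vertical gradient components and magnitude synthesis.
    gy, gx = np.gradient(feat)
    mag = np.hypot(gx, gy)

    # Aggregate the magnitude over neighborhoods of different scales.
    stats = []
    for s in scales:
        pad = np.pad(mag, s, mode='edge')
        agg = np.zeros_like(mag)
        # Mean over a (2s+1) x (2s+1) box around each position.
        for dy in range(-s, s + 1):
            for dx in range(-s, s + 1):
                agg += pad[s + dy: s + dy + mag.shape[0],
                           s + dx: s + dx + mag.shape[1]]
        stats.append(agg / (2 * s + 1) ** 2)

    # Variation across scales separates complex regions from flat ones.
    variation = np.var(np.stack(stats), axis=0)
    complex_mask = variation > threshold
    return mag, complex_mask
```

A perfectly flat feature map has zero gradient everywhere, so every position falls into the single-scale flat region.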
The step of establishing a reverse mapping relationship between spatial position and style fusion proportion according to the spatial complexity distribution, so that the spatial complexity distribution value is negatively correlated with the fusion proportion of the style feature representation, thereby obtaining initial fusion features, comprises the following steps: performing structural similarity analysis on the multi-level content features, mapping each spatial position into a structural representation space, calculating structural similarity between positions, and clustering according to a similarity threshold to obtain a plurality of structural-similarity region groups; performing multi-scale decomposition on the style feature representation, separating it into coarse-granularity style components containing low-frequency global components and fine-granularity style components containing high-frequency local components; after normalizing the spatial complexity distribution, mapping it into coarse-granularity fusion weights through a first monotonically decreasing function, and into fine-granularity fusion weights through a second monotonically decreasing function whose decreasing rate is greater than that of the first.
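The two monotonically decreasing mappings above can be sketched with exponentials. The exponential forms and the rates `k_coarse` and `k_fine` are assumptions; the only stated requirement is that the fine-granularity mapping decreases faster than the coarse-granularity one.

```python
import numpy as np

def fusion_weights(complexity, k_coarse=2.0, k_fine=6.0):
    """Sketch of the coarse- and fine-granularity weight mappings.

    complexity: spatial complexity distribution, assumed normalized to [0, 1].
    """
    # Both mappings decrease with complexity; the fine-granularity one
    # decreases faster, so detailed regions receive even less fine style.
    w_coarse = np.exp(-k_coarse * complexity)   # coarse-granularity weight
    w_fine = np.exp(-k_fine * complexity)       # fine-granularity weight
    return w_coarse, w_fine
```

At zero complexity both weights equal 1 (full style fusion); at any positive complexity the fine-granularity weight is strictly smaller than the coarse one.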