CN-121353676-B - UI design diagram segmentation method based on front-end semantic understanding and related equipment
Abstract
The invention discloses a UI design diagram segmentation method based on front-end semantic understanding, and related equipment. The method comprises: obtaining a UI design diagram to be segmented and preprocessing it to obtain a processed design diagram; performing feature extraction on the processed design diagram with a pre-trained multi-modal deep neural network to obtain multi-modal features; performing component semantic segmentation on the processed design diagram based on the multi-modal features to obtain component data; performing rule clustering on the atomic-level components based on their original boundaries and component categories to construct a multi-level semantic hierarchy; and outputting a UI design diagram segmentation result according to that hierarchy. The method effectively realizes semantic organization from atomic components up to composite components, addresses the poor adaptability of prior methods to complex UIs and their difficulty in capturing implicit semantics, and can be widely applied in the technical field of image processing.
Inventors
- LI XUELONG
- LI BINGYU
- ZHAO ZHIYUAN
- GAO JUNYU
- SUN HAO
Assignees
- 中电信人工智能科技(北京)有限公司 (China Telecom Artificial Intelligence Technology (Beijing) Co., Ltd.)
Dates
- Publication Date
- 20260512
- Application Date
- 20251217
Claims (9)
- 1. A method for segmenting a UI design diagram based on front-end semantic understanding, the method comprising the steps of: acquiring a UI design diagram to be segmented, and preprocessing the UI design diagram to obtain a processed design diagram; based on the processed design diagram, performing feature extraction with a pre-trained multi-modal deep neural network to obtain multi-modal features, wherein the multi-modal features comprise visual features, text features and layout features; based on the multi-modal features, performing component semantic segmentation on the processed design diagram to obtain component data, wherein the component data comprises an original boundary and a component category for each atomic-level component in the processed design diagram; based on the original boundaries and component categories, performing rule clustering on the atomic-level components to construct a multi-level semantic hierarchy; and outputting a UI design diagram segmentation result according to the multi-level semantic hierarchy; wherein, when the result of the feature extraction is the visual features, the feature extraction with the pre-trained multi-modal deep neural network comprises the steps of: loading the convolution layers of a pre-trained visual encoder into an original visual model, and performing a freezing operation on the convolution layers of the original visual model to initialize them, obtaining an initial visual model, wherein the freezing operation freezes a first number of leading convolution layers and keeps a second number of trailing convolution layers trainable, the sum of the first number and the second number being the total number of convolution layers in the initial visual model; training the initial visual model on design diagrams annotated with visual feature labels, computing an average classification accuracy from the training feature maps output by the penultimate convolution layer of the initial visual model and the visual feature labels, and adjusting the parameters of the initial visual model based on the average classification accuracy to obtain a target visual model; and inputting the processed design diagram into the target visual model, and applying feature dimensionality reduction and encoding, via global average pooling and L2 normalization, to the target feature map output by the penultimate convolution layer of the target visual model to obtain the visual features.
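The final encoding step of claim 1 (global average pooling followed by L2 normalization of the penultimate layer's feature map) can be sketched numerically. This is a minimal NumPy illustration, not the patented implementation: `encode_visual_features` is a hypothetical name, and the toy array stands in for a real convolution layer's output.

```python
import numpy as np

def encode_visual_features(feature_map):
    """Reduce a (C, H, W) feature map from the penultimate convolution
    layer to a single C-dimensional vector via global average pooling,
    then L2-normalize it to obtain the visual feature vector."""
    pooled = feature_map.mean(axis=(1, 2))        # global average pooling -> (C,)
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled  # L2 normalization

# toy feature map: 8 channels over a 4x4 spatial grid
fmap = np.arange(8 * 4 * 4, dtype=np.float64).reshape(8, 4, 4)
vec = encode_visual_features(fmap)
```

Pooling collapses the spatial dimensions, and the unit-norm constraint makes downstream cosine comparisons between design diagrams scale-invariant.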
- 2. The method of claim 1, wherein preprocessing the UI design diagram comprises at least one of: scaling the UI design diagram to a target size, and recording the mapping between the original size of the UI design diagram and the target size; converting the UI design diagram into a preset color space; denoising the UI design diagram with Gaussian filtering; sharpening blurred regions of the UI design diagram; and applying gray-scale normalization to the UI design diagram.
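Two of these preprocessing steps, scaling with a recorded size mapping and gray-scale normalization, can be illustrated with a small NumPy sketch. The function name `preprocess` and the mapping dictionary are illustrative assumptions; a real pipeline would use an image library for proper interpolation, Gaussian filtering and sharpening.

```python
import numpy as np

def preprocess(img, target_size):
    """Sketch of two claim-2 steps: nearest-neighbour scaling to a target
    size (recording the original-to-target size mapping for later
    coordinate back-projection) and gray-scale normalization to [0, 1]."""
    h, w = img.shape[:2]
    th, tw = target_size
    mapping = {"original": (h, w), "target": (th, tw),
               "scale": (th / h, tw / w)}   # recorded size mapping
    rows = np.arange(th) * h // th          # nearest-neighbour source rows
    cols = np.arange(tw) * w // tw          # nearest-neighbour source cols
    scaled = img[rows][:, cols]
    lo, hi = scaled.min(), scaled.max()
    normalized = (scaled - lo) / (hi - lo) if hi > lo else scaled * 0.0
    return normalized, mapping

img = np.arange(16, dtype=np.float64).reshape(4, 4)
out, mapping = preprocess(img, (2, 2))
```

Recording the mapping matters because the segmentation result must ultimately be reported in the original diagram's coordinates.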
- 3. The method according to claim 1, wherein, when the result of the feature extraction is the text features, the feature extraction with the pre-trained multi-modal deep neural network comprises the steps of: performing text recognition on the processed design diagram with a preset optical character recognition engine to obtain recognized text; locating the text-region bounding box corresponding to each recognized text with a multi-scale text detection algorithm; identifying the text content within each text-region bounding box with a recurrent neural network or a transformer model to obtain structured text; and inputting the structured text into a pre-trained large language model, obtaining text semantic embeddings through a forward pass as the text features.
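The detection-then-recognition pipeline of claim 3 yields per-box text that must be assembled into structured text before embedding. Below is a plain-Python sketch of that assembly step only (a reading-order sort); the tuple format and function name are assumptions, and the OCR engine, recognition model and language-model embedding are outside its scope.

```python
def structure_ocr_results(ocr_boxes):
    """Arrange raw OCR hits into reading order (top-to-bottom, then
    left-to-right) to form the structured text sequence that claim 3
    feeds into a pre-trained large language model for embedding."""
    # each item: (recognized_text, (x, y, width, height)) in pixel coords
    ordered = sorted(ocr_boxes, key=lambda item: (item[1][1], item[1][0]))
    return [text for text, _ in ordered]

boxes = [("Password", (10, 60, 80, 20)),
         ("Login", (10, 100, 60, 20)),
         ("Username", (10, 20, 80, 20))]
```

Reading order matters here: "Username" above "Password" above "Login" conveys form semantics that a bag of unordered strings would lose.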
- 4. The method according to claim 1, wherein, when the result of the feature extraction is the layout features, the feature extraction with the pre-trained multi-modal deep neural network comprises the steps of: when the HTML source code of the UI design diagram is available, performing DOM-structure-based layout feature extraction on the processed design diagram, which comprises the following operations: constructing a DOM tree from the HTML source code, the DOM tree recording the element hierarchy of all DOM elements; converting the CSS style information in the HTML source code to obtain the bounding-box coordinates of each DOM element; and matching key elements among the DOM elements by tag name and/or class name, and generating layout features from the key elements, wherein the layout features comprise the bounding-box coordinates of the key elements, together with the hierarchical relationships and sibling-element counts determined from the element hierarchy; and when the HTML source code of the UI design diagram is not available, performing visual-analysis-based layout feature inference on the processed design diagram, which comprises the following operations: partitioning the processed design diagram into text regions and non-text regions by combining optical character recognition with an image segmentation algorithm; treating each text region as a text element and outputting its bounding-box coordinates from the optical character recognition result; performing contour detection on the non-text regions, reinforcing weak edges through adaptive thresholding and edge enhancement, to obtain non-text elements and their bounding-box coordinates; deriving the element hierarchy from the vertical ordering and containment relations of the text and non-text elements in the processed design diagram; and generating the layout features from the bounding-box coordinates and the element hierarchy.
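For the DOM branch of claim 4, the hierarchy-related part of the layout features (element depth and sibling counts for key elements, matched by tag name) can be derived with the standard-library HTML parser alone. The sketch below makes several assumptions: the `KEY_TAGS` set, the feature dictionary layout, and well-formed input with explicitly closed or self-closed tags; bounding-box coordinates from CSS are omitted.

```python
from html.parser import HTMLParser

# assumed set of "key element" tag names; the patent also allows class names
KEY_TAGS = {"nav", "header", "footer", "button", "input", "img", "a"}

class LayoutExtractor(HTMLParser):
    """Walk an HTML fragment and record, per key element, its depth in
    the DOM tree and how many earlier siblings it has -- the hierarchy
    portion of the claim-4 layout features."""
    def __init__(self):
        super().__init__()
        self.stack = []          # open-tag stack; its length = current depth
        self.child_counts = [0]  # children seen so far at each open level
        self.features = []

    def handle_starttag(self, tag, attrs):
        self.child_counts[-1] += 1
        if tag in KEY_TAGS:
            self.features.append({"tag": tag,
                                  "depth": len(self.stack),
                                  "prior_siblings": self.child_counts[-1] - 1})
        self.stack.append(tag)
        self.child_counts.append(0)

    def handle_startendtag(self, tag, attrs):
        # self-closing elements (e.g. <input/>) add no depth level
        self.child_counts[-1] += 1
        if tag in KEY_TAGS:
            self.features.append({"tag": tag,
                                  "depth": len(self.stack),
                                  "prior_siblings": self.child_counts[-1] - 1})

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()
            self.child_counts.pop()
```

Depth and sibling position are exactly the signals the claim uses to tell, say, a navigation link inside a `<nav>` apart from a standalone button at page level.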
- 5. The method according to claim 1, wherein performing component semantic segmentation on the processed design diagram based on the multi-modal features to obtain component data comprises the steps of: processing the processed design diagram with a semantic segmentation network to obtain the atomic-level components and their corresponding original boundaries; and performing semantic label judgment, with a lightweight classifier, on the detection region corresponding to each original boundary to obtain the component category of each atomic-level component.
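A segmentation network outputs per-pixel masks; recovering each atomic component's original boundary from such a mask amounts to connected-component analysis. The following is a toy stand-in in pure Python (4-connected flood fill over a binary mask), not the patented network: `atomic_regions` and the `(x0, y0, x1, y1)` box format are illustrative choices.

```python
def atomic_regions(mask):
    """Extract 4-connected components from a binary mask (list of rows)
    and return their bounding boxes as (x0, y0, x1, y1) -- a toy stand-in
    for recovering each atomic component's original boundary in claim 5."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack = [(y, x)]
                seen[y][x] = True
                x0, y0, x1, y1 = x, y, x, y
                while stack:                      # iterative flood fill
                    cy, cx = stack.pop()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for ny, nx in ((cy-1, cx), (cy+1, cx),
                                   (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes
```

Each recovered box would then be cropped and passed to the lightweight classifier for its component category.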
- 6. The method according to claim 1, wherein performing rule clustering on the atomic-level components to construct a multi-level semantic hierarchy comprises the steps of: taking each atomic-level component as a component-level fine-grained region, its original boundary as a component-level bounding box, and its component category as a component-level semantic label; determining the spatial relationships among the atomic-level components from their original boundaries, and then performing functionally associated component clustering based on the component categories to obtain module-level medium-granularity regions; marking a module-level bounding box from the area covered by each medium-granularity region, and performing semantic label judgment on the detection region corresponding to each module-level bounding box to obtain the module-level semantic label of each medium-granularity region; dividing the processed design diagram into several large areas according to page layout logic using a rule template, and assigning each medium-granularity region to its corresponding large area to obtain page-level coarse-grained regions; and marking a page-level bounding box from the area covered by each coarse-grained region, and performing semantic label judgment on the detection region corresponding to each page-level bounding box to obtain the page-level semantic label of each coarse-grained region.
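The component-to-module step of claim 6 can be illustrated with a simple spatial-proximity rule. Below is a hedged plain-Python sketch: the `max_gap` threshold, the greedy single-pass merging, and the dictionary shapes are all assumptions standing in for the patent's rule clustering, which would also weigh component categories and page layout templates.

```python
def cluster_components(components, max_gap=20):
    """Greedy rule clustering sketch: an atomic component whose bounding
    box lies within `max_gap` pixels of an existing cluster's box joins
    that cluster; each cluster becomes a module-level region whose
    bounding box is the union of its members' boxes."""
    # component: {"box": (x0, y0, x1, y1), "category": str}
    def near(a, b):
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        dx = max(bx0 - ax1, ax0 - bx1, 0)  # horizontal gap (0 if overlapping)
        dy = max(by0 - ay1, ay0 - by1, 0)  # vertical gap (0 if overlapping)
        return dx <= max_gap and dy <= max_gap

    modules = []
    for comp in components:
        for mod in modules:
            if near(mod["box"], comp["box"]):
                b, c = mod["box"], comp["box"]
                mod["box"] = (min(b[0], c[0]), min(b[1], c[1]),
                              max(b[2], c[2]), max(b[3], c[3]))
                mod["members"].append(comp)
                break
        else:
            modules.append({"box": comp["box"], "members": [comp]})
    return modules
```

Applying the same grouping again over the module boxes, guided by a page layout template, would yield the page-level coarse-grained regions of the claim.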
- 7. A UI design diagram segmentation device based on front-end semantic understanding, the device comprising: a first module, configured to acquire a UI design diagram to be segmented and preprocess it to obtain a processed design diagram; a second module, configured to perform feature extraction with a pre-trained multi-modal deep neural network based on the processed design diagram to obtain multi-modal features, wherein the multi-modal features comprise visual features, text features and layout features; a third module, configured to perform component semantic segmentation on the processed design diagram based on the multi-modal features to obtain component data, wherein the component data comprises an original boundary and a component category for each atomic-level component in the processed design diagram; a fourth module, configured to perform rule clustering on the atomic-level components based on the original boundaries and component categories to construct a multi-level semantic hierarchy; and a fifth module, configured to output a UI design diagram segmentation result according to the multi-level semantic hierarchy; wherein, when the result of the feature extraction is the visual features, the feature extraction with the pre-trained multi-modal deep neural network comprises the steps of: loading the convolution layers of a pre-trained visual encoder into an original visual model, and performing a freezing operation on the convolution layers of the original visual model to initialize them, obtaining an initial visual model, wherein the freezing operation freezes a first number of leading convolution layers and keeps a second number of trailing convolution layers trainable, the sum of the first number and the second number being the total number of convolution layers in the initial visual model; training the initial visual model on design diagrams annotated with visual feature labels, computing an average classification accuracy from the training feature maps output by the penultimate convolution layer of the initial visual model and the visual feature labels, and adjusting the parameters of the initial visual model based on the average classification accuracy to obtain a target visual model; and inputting the processed design diagram into the target visual model, and applying feature dimensionality reduction and encoding, via global average pooling and L2 normalization, to the target feature map output by the penultimate convolution layer of the target visual model to obtain the visual features.
- 8. An electronic device comprising a memory storing a computer program and a processor, wherein the processor implements the method of any one of claims 1 to 6 when executing the computer program.
- 9. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 6.
Description
UI design diagram segmentation method based on front-end semantic understanding and related equipment
Technical Field
The invention relates to the technical field of image processing, and in particular to a UI design diagram segmentation method based on front-end semantic understanding and related equipment.
Background
Although existing UI design diagram segmentation methods have made some progress in visual-level recognition and partitioning, they generally suffer from the following technical problems, which limit their effectiveness and level of intelligence in practical applications:
(1) Existing methods mostly stop at recognizing visual features and dividing physical regions, and can hardly truly understand the meaning and function of UI elements. For example, they can recognize that a rectangular area is a "button", but cannot tell whether that button is used to "submit a form" or "jump to a page", nor distinguish two visually similar but semantically different buttons (e.g., a "login" button and a "registration" button). This lack of semantic understanding yields segmentation results of low intelligence, making it difficult to meet higher-level automation requirements such as intelligent code generation and intelligent UI test case generation.
(2) Adaptability to complex and diverse UIs is poor: the segmentation accuracy and robustness of existing methods tend to drop markedly when facing UI design diagrams with varied design styles, complex layouts and many custom components. This is because purely vision-based methods struggle to capture the logic and semantic information underlying a UI design, which is critical for correctly partitioning complex UIs. For example, a navigation bar may visually appear in multiple forms (landscape, portrait, folded, etc.), and existing methods may require separate training or rule definitions for each form, which hurts generality.
(3) Non-explicit semantic information is hard to process: the semantics of many UI elements are not entirely determined by their visual appearance, but may also depend on context, text content, and even potential interactive behavior. Existing methods often cannot exploit such non-explicit semantic information effectively during segmentation. For example, the semantics of a text input box (search box, username box, comment box) often depend on the label or placeholder text next to it, not just its visual shape. Handling such cases with existing methods often requires additional manual annotation or complex post-processing, increasing both workload and error rate.
(4) The granularity of the segmentation result does not match the semantics: existing methods can only perform coarse-grained region segmentation (such as dividing a page into a header, a body and a footer) or fine-grained element recognition (such as recognizing a single button), but can hardly switch flexibly between granularities while keeping the segmentation result closely aligned with the semantic structure of the UI. For example, a "product list" section may contain several "product cards", each of which in turn contains elements such as a "product picture", a "product name" and a "price". Existing methods may struggle to identify the "product list" semantic region and, at the same time, the semantic sub-regions and elements within it.
Disclosure of Invention
The embodiments of the invention mainly aim to provide a UI design diagram segmentation method, device, electronic equipment, storage medium and program product based on front-end semantic understanding, so as to solve at least one of the problems in the prior art. To achieve the above object, one aspect of the embodiments of the present invention provides a method for segmenting a UI design diagram based on front-end semantic understanding, the method comprising: acquiring a UI design diagram to be segmented, and preprocessing the UI design diagram to obtain a processed design diagram; based on the processed design diagram, performing feature extraction with a pre-trained multi-modal deep neural network to obtain multi-modal features, wherein the multi-modal features comprise visual features, text features and layout features; based on the multi-modal features, performing component semantic segmentation on the processed design diagram to obtain component data, wherein the component data comprises an original boundary and a component category for each atomic-level component in the processed design diagram; based on the original boundaries and component categories, performing rule clustering on the atomic-level components to construct a multi-level semantic hierarchy; and outputting the UI design diagram segmentation result according to the multi-level semantic hierarchy.