CN-121979556-A - Design draft code generation and visual error iteration correction method and system based on multi-agent cooperation


Abstract

The invention discloses a design draft code generation and visual error iterative correction method and system based on multi-agent cooperation. It aims to solve the problems of low front-end code generation efficiency, poor visual fidelity, and the lack of an automatic correction mechanism in the prior art. The system uses LangGraph to construct a four-node state graph that connects four processing nodes in series: automatic initialization of the engineering scaffold; design data acquisition and structured protocol generation; protocol-based component code generation; and visual verification and iterative correction. A large language model agent dynamically generates the engineering scaffold, Figma design data is converted into a structured protocol, and a judger-repairer mode iteratively optimizes the code until the error falls below a threshold. The invention achieves high-fidelity automated delivery from design draft to code, and outputs the front-end engineering code and a visual fidelity analysis report.
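
The four-node flow described in the abstract can be sketched in plain Python as follows. This is a minimal illustration only, not the patented implementation: it deliberately does not use the LangGraph API, and the stub nodes, the state keys (trace, mae, rounds), the max_rounds cap and the assumption that each repair round halves the error are all hypothetical. The 3 px threshold is borrowed from claim 6.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def make_stub(name: str) -> Node:
    """Return a placeholder node that only records that it ran (hypothetical)."""
    def node(state: State) -> State:
        state["trace"].append(name)
        return state
    return node


def verify_and_repair(state: State) -> State:
    """Stub for node 4: pretend each judge/repair round halves the error."""
    state["trace"].append("verify")
    state["mae"] = state["mae"] / 2
    return state


def run_pipeline(linear_nodes: List[Node], loop_node: Node, state: State,
                 threshold_px: float = 3.0, max_rounds: int = 5) -> State:
    # Nodes 1-3 run once, in series.
    for node in linear_nodes:
        state = node(state)
    # Node 4 repeats until the error is at or below the threshold,
    # or until the (assumed) round cap is reached.
    for round_no in range(1, max_rounds + 1):
        state = loop_node(state)
        state["rounds"] = round_no
        if state["mae"] <= threshold_px:
            break
    return state


nodes = [make_stub("scaffold"), make_stub("protocol"), make_stub("codegen")]
result = run_pipeline(nodes, verify_and_repair, {"trace": [], "mae": 10.0})
print(result["rounds"], result["mae"])  # prints "2 2.5"
```

In a real state-graph framework the loop would typically be expressed as a conditional edge from the verification node back to itself; the explicit for loop here only makes the termination condition visible.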

Inventors

ZHANG MIN
TU ZHONGYING
WEI QIUHAN
XU YUCHENG
LIN DONG
ZHANG KEYI
LIU BANGBANG
CHAI JIANAN
FANG FANG
ZHAO YING

Assignees

上海人工智能创新中心

Dates

Publication Date: 20260505
Application Date: 20260408

Claims (10)

1. A design draft code generation and visual error iterative correction method based on multi-agent cooperation, characterized by comprising the following steps: dynamically generating an engineering scaffold based on a large language model agent; automatically converting Figma design data into a structured front-end component description protocol based on the engineering scaffold; automatically converting the structured front-end component description protocol into front-end code; and performing automatic verification and iterative repair of the front-end code.
2. The method of claim 1, wherein dynamically generating the engineering scaffold based on the large language model agent comprises: extracting the project name from the Figma link; constructing, in the local file system, a directory tree according to the project name to complete deterministic initialization of the workspace directory structure; and generating the engineering scaffold with the large language model agent under system prompt constraints.
3. The method of claim 1, wherein automatically converting Figma design data into the structured front-end component description protocol based on the engineering scaffold comprises: recursively traversing the design data, extracting the spatial coordinate information of each node, and constructing a lightweight hierarchical coordinate representation; using the hierarchical coordinate data together with the design draft thumbnail as multimodal input, inferring the semantic component hierarchy through a large model, and partitioning it to obtain a component tree; performing single-pass post-processing on the output of the large model that simultaneously completes field normalization, backfilling of the original design nodes, and file path labeling; recursively traversing the component tree, identifying repeated similar component instances, calling a large model to extract the attribute definitions and actual content data of each instance, and merging the repeated instances into a single template node; automatically identifying the images corresponding to the nodes and saving them to the corresponding file paths; and obtaining the structured front-end component description protocol.
4. The method of claim 3, wherein each component in the component tree contains a number of design elements not lower than a given threshold; the maximum nesting depth of the component tree is not higher than a given threshold; and the component tree divides sub-components by visual spatial position.
5. The method of claim 1, wherein automatically converting the structured front-end component description protocol into front-end code comprises: traversing the structured front-end component description protocol to obtain a node list; generating code for the nodes in the node list concurrently through a concurrency pool, wherein each node is either a leaf node component or a non-leaf node component; writing a cache file immediately after each node is generated, so that progress is not lost on interruption; and, after all nodes are generated, injecting the root component into the entry file.
6. The method of claim 1, wherein performing automatic verification and iterative repair of the front-end code comprises: a server-launch agent installs dependencies for the generated project, builds and compiles it, automatically repairs syntax errors, and starts the local development server on success; locating the actual rendered bounding-box coordinates of each component in the page, calculating the two-dimensional absolute error component by component against the Figma design coordinates recorded in the structured front-end component description protocol, and aggregating the results into a global MAE and SAE; outputting a list of components whose deviation exceeds a threshold; marking, with a visualization tool, the deviation regions of the deviating components on the rendered screenshot and on the design draft thumbnail respectively, to generate an annotated rendering image, an annotated design draft image, and a merged comparison image; the Judger agent receives the coordinate data of the deviating components, the Figma layout metadata, the source code file paths, the merged comparison image, and the historical repair records, and outputs a structured diagnosis containing the error type and repair instructions; the Refiner agent performs targeted modifications to the source code of the target component according to the diagnosis, and returns the number of modifications and a summary of the modifications; after all deviating components have been repaired, closing the current development server, re-executing the project build and runnability verification, starting the local development server after the verification passes, and entering the next iteration; and, if the MAE is less than or equal to 3 px, executing a Git commit and terminating the iteration, and generating, based on the last round of verification results, an interactive HTML report containing a side-by-side comparison of screenshots of the design draft and the final rendering, a pixel-difference heat map, and MAE quantification data for each component.
7. The method of claim 2, wherein the directory tree comprises: my-app, configured as the source code directory of the generated application; process, configured as the directory for intermediate artifacts; and checkpoint, configured as the database directory for resuming from breakpoints.
8. The method of claim 5, wherein the leaf node component is an atomic-level UI element that contains no sub-components; and the non-leaf node component is a container-level component that combines and arranges a plurality of sub-components according to layout rules.
9. A design draft code generation and visual error iterative correction system based on multi-agent cooperation, characterized by comprising a processor and a memory, wherein the memory stores a computer program, and the processor implements the method of any one of claims 1 to 8 when executing the computer program.
10. The system according to claim 9, wherein the system comprises: an engineering scaffold automatic initialization module configured to dynamically generate an engineering scaffold based on a large language model agent; a design data acquisition and structured protocol generation module configured to automatically convert Figma design data into a structured front-end component description protocol based on the engineering scaffold; a component code generation module configured to automatically convert the structured front-end component description protocol into front-end code; and a front-end code automatic verification and iterative repair module configured to perform automatic verification and iterative repair of the front-end code.
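
The coordinate comparison in claim 6 can be illustrated with a short Python sketch. The claims do not define MAE and SAE, so this example assumes they stand for mean absolute error and sum of absolute errors, and assumes that a component's scalar error is the average of its horizontal and vertical absolute offsets. The Box type, the function names and the sample data are hypothetical; only the 3 px threshold comes from the claim.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Top-left corner of a component's bounding box, in pixels."""
    x: float
    y: float


def abs_error_2d(design: Box, rendered: Box) -> Tuple[float, float]:
    """Two-dimensional absolute error: (|dx|, |dy|)."""
    return abs(rendered.x - design.x), abs(rendered.y - design.y)


def summarize(design: Dict[str, Box], rendered: Dict[str, Box],
              threshold_px: float = 3.0) -> Tuple[float, float, List[str]]:
    """Return (MAE, SAE, ids of components whose error exceeds the threshold)."""
    errors: Dict[str, float] = {}
    for component_id, design_box in design.items():
        ex, ey = abs_error_2d(design_box, rendered[component_id])
        # Assumption: collapse the 2D error to one number by averaging the axes.
        errors[component_id] = (ex + ey) / 2
    sae = sum(errors.values())
    mae = sae / len(errors) if errors else 0.0
    deviating = sorted(cid for cid, err in errors.items() if err > threshold_px)
    return mae, sae, deviating


design = {"header": Box(0, 0), "card": Box(10, 100)}
rendered = {"header": Box(0, 2), "card": Box(18, 100)}
mae, sae, deviating = summarize(design, rendered)
print(mae, sae, deviating)  # prints "2.5 5.0 ['card']"
```

Note that in this sample the global MAE (2.5 px) is already under 3 px even though one component deviates by 4 px, which is why a per-component deviation list is useful alongside the global figure.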

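The "lightweight hierarchical coordinate representation" of claim 3 could look roughly like the following Python sketch. It assumes the design data is a Figma REST API style node tree in which each node may carry an absoluteBoundingBox object (x, y, width, height) and a children array; the output field names (w, h) and the function name are hypothetical choices for illustration, not the protocol defined by the invention.

```python
from typing import Any, Dict


def to_coordinate_tree(node: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively keep only id, type and bounding box for each design node."""
    box = node.get("absoluteBoundingBox") or {}
    slim: Dict[str, Any] = {
        "id": node.get("id"),
        "type": node.get("type"),
        "x": box.get("x"),
        "y": box.get("y"),
        "w": box.get("width"),
        "h": box.get("height"),
    }
    # Recurse into child nodes; omit the key entirely for leaves
    # to keep the representation small.
    children = [to_coordinate_tree(child) for child in node.get("children", [])]
    if children:
        slim["children"] = children
    return slim


sample = {
    "id": "1:2", "type": "FRAME", "name": "Card",
    "absoluteBoundingBox": {"x": 0, "y": 0, "width": 320, "height": 200},
    "fills": [{"type": "SOLID"}],  # visual attributes are dropped
    "children": [
        {"id": "1:3", "type": "TEXT", "name": "Title",
         "absoluteBoundingBox": {"x": 16, "y": 16, "width": 288, "height": 24}},
    ],
}
tree = to_coordinate_tree(sample)
print(tree["children"][0]["x"], tree["children"][0]["h"])  # prints "16 24"
```

Dropping fills, strokes and text styles at this stage keeps the input to the large model compact; those attributes can be backfilled later from the original nodes by id, as the post-processing step in claim 3 describes.
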
Description

Design draft code generation and visual error iterative correction method and system based on multi-agent cooperation

Technical Field

The invention relates to the technical field of front-end development automation, and in particular to a design draft code generation and visual error iterative correction method and system based on multi-agent cooperation.

Background

As the iteration speed of internet products accelerates, front-end UI development faces increasingly urgent efficiency demands. After a designer finishes drawing a UI design draft in Figma, a front-end engineer must reproduce the design ("design restoration") by manual coding. The key challenge is that the design draft describes the interface appearance in terms of pixel coordinates and visual attributes, whereas code must describe the interface behavior and data flow in a componentized, structured language. A marked semantic gap therefore exists between the two, manual restoration is time-consuming, and the restoration deviation is difficult to quantify.
The core goal of Design-to-Code technology is to close this semantic gap by automated means, directly converting a design into runnable front-end code. Its technical evolution has gone through roughly three stages. The first stage is direct image-to-code conversion based on deep learning: in 2017, Beltramelli proposed the pix2code method in the paper "pix2code: Generating Code from a Graphical User Interface Screenshot" (arXiv:1705.07962), which adopts a combined CNN and LSTM architecture and takes a UI screenshot as input to generate markup-language code, with code generated for iOS, ... The second stage is multimodal code generation based on large language models: Laurençon et al. (2024) released the WebSight dataset (arXiv:2403.09029), containing 2 million pairs of webpage screenshots and HTML/CSS code, for training vision-language models to generate functional HTML code; Si et al. (2024) proposed the Design2Code benchmark at the NAACL 2025 conference (ACL Anthology: 2025.naacl-long.199), manually curating 484 real webpage test samples and constructing a multi-dimensional automatic evaluation system, and their evaluation found that GPT-4V, ... The third stage is multimodal code generation for real engineering design files: Gui et al. published "Figma2Code: Automating Multimodal Design to Code in the Wild" at the ICLR 2026 conference (OpenReview ID: caXZB6bI 31), constructed a dataset of paired Figma design files and code containing 3,055 samples (including 213 high-quality samples), and benchmarked 10 mainstream open-source and commercial MLLMs. The results show that commercial models have an advantage in visual fidelity, but in terms of layout responsiveness and code maintainability they remain limited to directly mapping the raw visual attributes in Figma metadata and cannot effectively understand the semantic structure of components.
Current mainstream design-draft-to-code technology falls into two categories: rule-based plug-in tools, represented by Locofy, Anima and Builder.io, and generation tools based on a large language model (LLM), represented by Figma Dev Mode and Figma Make. Rule-based plug-in tools run as Figma plug-ins. They obtain the node structure data of the design file through the Figma REST API or Plugin API, and traverse the node tree to extract raw visual attributes such as x/y coordinates, width and height, fill colors, strokes, font attributes and effects. They then check the node's layoutMode attribute to decide whether it is an Auto Layout node: if so, they attempt to map it to a CSS Flexbox layout; otherwise, they default to generating position: absolute positioning with hard-coded pixel values. Finally, the attributes are written into a component template to generate React JSX or HTML/CSS code files. LLM-based generation tools encode the Figma design draft as a Base64 image, or serialize the Figma node metadata into JSON text, and assemble it into the prompt context together with the target technology stack, the code specification requirements and the output format. They perform single-round inference with a multimodal large language model such as GPT-4o, Claude 3.5 Sonnet or Gemini 1.5, directly output the complete component code, and present the result directly to the user, who decides whether to modify it manually. These prior-art schemes have obvious defects. First, they lack an objective, quantitative verification mechanism for visual fidelity: after code generation, the visual difference between the generated page and the original design draft cannot be evaluated objectively and quantitatively, only manual visual comparison or static code preview is relied on, and