CN-121996863-A - Webpage generation method and device, electronic equipment and storage medium
Abstract
The present disclosure relates to a web page generation method, apparatus, electronic device, computer-readable storage medium, and computer program product. In response to a user instruction, the method obtains web page configuration information and a source picture, and uses a multimodal model to generate, from the configuration information and the source picture, HyperText Markup Language (HTML) code containing picture tags. Following the order of the picture tags in the HTML code, the method builds a multimodal context for each current picture tag from the web page configuration information, the source picture, the picture text description of the current picture tag, the picture text description of the previous picture tag, and the web page picture already generated for the previous picture tag. It then generates the web page picture for the current picture tag with the multimodal model and that context. Once a web page picture has been obtained for every picture tag in the HTML code, a target web page is generated from all the generated web page pictures and the HTML code. The method improves web page generation efficiency.
Inventors
- DENG ZHIJIE
- KOU SIQI
- MA YE
- CHEN QUAN
- JIANG PENG
Assignees
- Beijing Dajia Internet Information Technology Co., Ltd. (北京达佳互联信息技术有限公司)
- Shanghai Jiao Tong University (上海交通大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2025-12-17
Claims (10)
- 1. A web page generation method, comprising: in response to a user instruction, acquiring web page configuration information and a source picture; generating, based on a multimodal model and from the web page configuration information and the source picture, hypertext markup language (HTML) code containing picture tags, wherein each picture tag is associated with picture text description information of the corresponding picture; following the order of the picture tags in the HTML code, generating a multimodal context based on the web page configuration information, the source picture, the picture text description information of the current picture tag, the picture text description information of the previous picture tag, and the web page picture already generated for the previous picture tag, and generating the web page picture corresponding to the current picture tag through the multimodal model and the multimodal context; and after the web page picture corresponding to each picture tag in the HTML code has been obtained, generating a target web page based on all the generated web page pictures and the HTML code.
- 2. The web page generation method according to claim 1, wherein generating, based on the multimodal model, the HTML code containing picture tags from the web page configuration information and the source picture comprises: parsing the web page configuration information with the multimodal model to obtain visual content display rules and web page design requirement information, and analyzing the visual content of the source picture to obtain source picture features; and, based on an autoregressive generation mechanism, fusing the visual content display rules, the web page design requirement information, and the source picture features, and generating the HTML code containing picture tags by token-by-token prediction.
- 3. The web page generation method according to claim 1, wherein generating the multimodal context, in the order of the picture tags in the HTML code, based on the web page configuration information, the source picture, the picture text description information of the current picture tag, the picture text description information of the previous picture tag, and the web page picture generated for the previous picture tag comprises: following the arrangement order of the picture tags in the HTML code, sequentially extracting first picture text description information corresponding to the current picture tag and second picture text description information associated with the previous picture tag; and concatenating, according to a preset concatenation order, the web page configuration information, the visual content display rules and the web page design requirement information, the source picture, the first picture text description information, the second picture text description information, and the web page picture generated for the previous picture tag into the multimodal context.
- 4. The web page generation method according to claim 3, wherein generating the web page picture corresponding to the current picture tag through the multimodal model and the multimodal context comprises: performing feature extraction and feature fusion on the multimodal context through the multimodal model to construct a multimodal context feature vector, and generating the web page picture corresponding to the current picture tag based on the multimodal context feature vector.
- 5. The web page generation method according to claim 4, wherein performing feature extraction and feature fusion on the multimodal context through the multimodal model to construct the multimodal context feature vector comprises: tokenizing the text information in the multimodal context with a text tokenizer of the multimodal model to obtain structured text features, and extracting, with an understanding encoder, visual features of the source picture and of the previously generated web page picture in the multimodal context; and fusing the structured text features and the visual features through a multimodal self-attention mechanism, extracting the associations between features of different modalities, and constructing the multimodal context feature vector.
- 6. The web page generation method according to claim 4, wherein generating the web page picture corresponding to the current picture tag based on the multimodal context feature vector comprises: processing the multimodal context feature vector through a generation expert layer of the multimodal model to obtain a conditional encoding vector; and performing step-by-step denoising on the conditional encoding vector based on a velocity prediction mechanism in the multimodal model to generate the web page picture corresponding to the current picture tag.
- 7. The web page generation method according to claim 1, wherein, before generating the multimodal context based on the web page configuration information, the source picture, the picture text description information of the current picture tag, the picture text description information of the previous picture tag, and the web page picture generated for the previous picture tag, the method further comprises: if the current picture tag is the first picture tag in the HTML code, generating an initial multimodal context based on the web page configuration information, the source picture, and the picture text description information associated with the current picture tag; and generating a first web page picture corresponding to the first picture tag through the initial multimodal context and the multimodal model.
- 8. The web page generation method according to claim 1, wherein generating the target web page based on all the generated web page pictures and the HTML code comprises: associating all the generated web page pictures with the HTML code according to the positions of the picture tags contained in the HTML code; invoking a web page rendering engine to load the HTML code and all the associated web page pictures, parse the structural rules of the HTML code, and render the visual elements in the web page pictures; and integrating all the rendered visual elements according to the web page design requirement information, and outputting a visualized target web page.
- 9. A web page generation apparatus, comprising: an acquisition unit configured to acquire, in response to a user instruction, web page configuration information and a source picture; a first generation unit configured to generate, based on a multimodal model and from the web page configuration information and the source picture, hypertext markup language (HTML) code containing picture tags, wherein each picture tag is associated with picture text description information of the corresponding picture; a second generation unit configured to generate, in the order of the picture tags in the HTML code, a multimodal context based on the web page configuration information, the source picture, the picture text description information of the current picture tag, the picture text description information of the previous picture tag, and the web page picture generated for the previous picture tag, and to generate the web page picture corresponding to the current picture tag through the multimodal model and the multimodal context; and a third generation unit configured to generate a target web page based on all the generated web page pictures and the HTML code after the web page pictures corresponding to the picture tags in the HTML code have been obtained.
- 10. An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the web page generation method of any one of claims 1 to 8.
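Claims 5 and 6 name two concrete mechanisms: multimodal self-attention that fuses text and visual features, and step-by-step denoising driven by velocity prediction. The NumPy sketch below illustrates both under loudly stated assumptions that do not come from the source: a single attention head with random weights, mean pooling as a hypothetical stand-in for the conditional encoding vector produced by the generation expert layer, and a rectified-flow style Euler sampler (one common reading of "velocity prediction", not confirmed by the disclosure) driven by a toy oracle velocity instead of a trained network.

```python
import numpy as np


def self_attention_fuse(text_feats, visual_feats, w_q, w_k, w_v):
    """Single-head self-attention over concatenated text and visual tokens.

    Every token attends to every other token, so text and visual features
    exchange information (the cross-modal associations of claim 5).
    """
    tokens = np.concatenate([text_feats, visual_feats], axis=0)  # (n_text + n_vis, d)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ v, weights


def euler_sample(velocity_fn, x_init, num_steps):
    """Step-by-step denoising: integrate dx/dt = v(x, t) from t=0 (noise) to t=1.

    Assumes the rectified-flow convention x_t = (1 - t) * noise + t * data.
    """
    x = x_init.copy()
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        x = x + dt * velocity_fn(x, t)
    return x


# --- toy usage with hypothetical shapes and random features ---
rng = np.random.default_rng(0)
d = 8
text_feats = rng.normal(size=(5, d))    # stand-in for tokenized text features
visual_feats = rng.normal(size=(3, d))  # stand-in for understanding-encoder features
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

fused, attn = self_attention_fuse(text_feats, visual_feats, w_q, w_k, w_v)
cond = fused.mean(axis=0)  # hypothetical "conditional encoding vector" (mean pooling)
target = np.tanh(cond)     # toy "clean picture" implied by the condition


def oracle_velocity(x, t):
    # Exact velocity along a straight path from x_t to the target; a trained
    # model would instead predict it from x, t, and the condition.
    return (target - x) / (1.0 - t)


picture = euler_sample(oracle_velocity, rng.normal(size=d), num_steps=20)
print(np.allclose(picture, target))  # True: the straight path ends at the target
```

With an exact velocity the straight-line path is integrated without error, which is why the toy sampler lands on the target; a learned velocity predictor would only approximate this, and more steps would reduce the discretization error.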
Description
Technical Field
The present disclosure relates to the field of computer science and technology, and in particular to a web page generation method, apparatus, electronic device, computer-readable storage medium, and computer program product.
Background
With the development of computer technology, techniques that automatically generate web pages with artificial intelligence models have emerged. Such techniques can automatically generate both the HTML code of a web page and the visual content the page needs, such as its pictures.
Conventionally, a web page developer calls a large language model (a "large model") with a user instruction, and the large model responds by generating the complete HTML code of the web page from the web page configuration information in the instruction. This HTML code contains <img> tags (the tag used to insert pictures), whose alt attribute (alternative text attribute) is typically filled with the picture text description needed to generate the picture. The picture text description in each <img> tag, together with the source product picture, is then fed to an image editing model, which edits the source product picture accordingly to produce one picture per <img> tag. Finally, the generated pictures are embedded into the generated HTML code, which is rendered to form a complete multimodal web page.
However, because the conventional process runs in separate stages, the visual content displayed in the generated web page is poorly consistent, and the multi-step workflow is complex, which makes web page generation inefficient.
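Both the conventional pipeline above and the disclosed method (claims 3 and 8) rely on two bookkeeping steps: reading the picture text description from each <img> tag's alt attribute in document order, and later attaching each generated picture to its tag by position. The following is a minimal standard-library Python sketch of those two steps. The function names are hypothetical, picture file paths stand in for actual image data, and the regex-based embedding is deliberately naive (it assumes no `>` characters inside attribute values).

```python
import re
from html.parser import HTMLParser
from typing import List


class ImgAltCollector(HTMLParser):
    """Collects the alt text of every <img> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.alts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing <img ... /> tags here.
        if tag == "img":
            self.alts.append(dict(attrs).get("alt") or "")


def extract_picture_descriptions(html: str) -> List[str]:
    """Return the picture text descriptions (alt attributes) in tag order."""
    collector = ImgAltCollector()
    collector.feed(html)
    collector.close()
    return collector.alts


# Naive patterns: assume no '>' inside attribute values (fine for a sketch only).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(r"""\ssrc\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)""", re.IGNORECASE)


def embed_pictures(html: str, picture_paths: List[str]) -> str:
    """Set the src of the k-th <img> tag to the k-th generated picture."""
    if len(IMG_TAG.findall(html)) != len(picture_paths):
        raise ValueError("expected exactly one generated picture per <img> tag")
    paths = iter(picture_paths)

    def fill(match):
        tag = SRC_ATTR.sub("", match.group(0))  # drop any placeholder src
        return f'{tag[:4]} src="{next(paths)}"{tag[4:]}'  # tag[:4] is "<img"

    return IMG_TAG.sub(fill, html)
```

In practice a rendering engine would then load the resulting HTML; a real implementation would more likely rewrite the parsed DOM than the raw string, which this sketch avoids only to stay short.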
Disclosure of Invention
The present disclosure provides a web page generation method, apparatus, electronic device, computer-readable storage medium, and computer program product. The technical solutions of the present disclosure are as follows.
According to a first aspect of the embodiments of the present disclosure, a web page generation method is provided, comprising: in response to a user instruction, acquiring web page configuration information and a source picture; generating, based on a multimodal model and from the web page configuration information and the source picture, hypertext markup language (HTML) code containing picture tags, wherein each picture tag is associated with picture text description information of the corresponding picture; following the order of the picture tags in the HTML code, generating a multimodal context based on the web page configuration information, the source picture, the picture text description information of the current picture tag, the picture text description information of the previous picture tag, and the web page picture already generated for the previous picture tag, and generating the web page picture corresponding to the current picture tag through the multimodal model and the multimodal context; and after the web page picture corresponding to each picture tag in the HTML code has been obtained, generating a target web page based on all the generated web page pictures and the HTML code.
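To make the per-tag ordering concrete, here is a minimal Python sketch of the sequential loop in the first aspect, including the first-tag case of claim 7. It assumes the context is a plain list of (modality, payload) segments concatenated in the order listed in claim 3; the real preset order, the data types, and the multimodal model call are not specified in the source, so `build_context`, `generate_pictures`, and the `generate_picture` callback are hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# One context segment: (modality, payload), e.g. ("text", "...") or ("image", <picture>).
Segment = Tuple[str, object]


def build_context(config: str, design_info: str, source_picture: object,
                  current_desc: str, prev_desc: Optional[str] = None,
                  prev_picture: Optional[object] = None) -> List[Segment]:
    """Concatenate context segments in one fixed order.

    The order follows the listing in claim 3; the actual "preset
    concatenation order" is not given in the source, so it is an assumption.
    """
    context: List[Segment] = [
        ("text", config),           # web page configuration information
        ("text", design_info),      # display rules / design requirements parsed from it
        ("image", source_picture),  # source picture
        ("text", current_desc),     # description of the current picture tag
    ]
    if prev_desc is not None and prev_picture is not None:
        context.append(("text", prev_desc))      # description of the previous tag
        context.append(("image", prev_picture))  # picture generated for the previous tag
    return context


def generate_pictures(config: str, design_info: str, source_picture: object,
                      descriptions: Sequence[str],
                      generate_picture: Callable[[List[Segment]], object]) -> List[object]:
    """Generate one picture per tag, in tag order, conditioning on the previous result."""
    pictures: List[object] = []
    for i, desc in enumerate(descriptions):
        if i == 0:
            # First tag: initial context with no previous picture (cf. claim 7).
            context = build_context(config, design_info, source_picture, desc)
        else:
            context = build_context(config, design_info, source_picture, desc,
                                    descriptions[i - 1], pictures[i - 1])
        pictures.append(generate_picture(context))  # hypothetical multimodal-model call
    return pictures
```

Because each call sees the picture generated for its immediate predecessor, the loop is strictly sequential: the picture for tag k cannot be produced until the picture for tag k-1 exists. That dependency is what carries visual consistency from one picture to the next.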
In one embodiment, generating, based on the multimodal model, the HTML code containing picture tags from the web page configuration information and the source picture comprises: parsing the web page configuration information with the multimodal model to obtain visual content display rules and web page design requirement information, and analyzing the visual content of the source picture to obtain source picture features; and, based on an autoregressive generation mechanism, fusing the visual content display rules, the web page design requirement information, and the source picture features, and generating the HTML code containing picture tags by token-by-token prediction.
In one embodiment, generating the multimodal context, in the order of the picture tags in the HTML code, based on the web page configuration information, the source picture, the picture text description information of the current picture tag, the picture text description information of the previous picture tag, and the web page picture generated for the previous picture tag comprises: following the arrangement order of the picture tags in the HTML code, sequentially extracting first picture text description information corresponding to the current picture tag and second picture text description information associated with the previous picture tag; and concatenating the web page configuration information, the visual content display rules and the web page design requirement information, the source picture, the first picture text description information, the second picture text de