CN-122019908-A - Webpage classification method and system based on Delaunay triangulation double-graph neural network
Abstract
The invention discloses a webpage classification method and a system based on Delaunay triangulation double-graph neural network, belonging to the technical field of webpage classification, wherein the method comprises the steps of obtaining an original graph and node characteristics of webpage classification; the method comprises the steps of constructing an original graph, constructing a shadow graph by adopting a Delaunay triangulation method, respectively carrying out graph convolution operation on the original graph and the shadow graph to obtain local structure information and global relation information of the nodes, fusing the local structure information and the global relation information to generate final node representation, classifying all nodes by adopting a neural network classifier based on the final node representation, and obtaining the final node representation. The method and the device effectively enhance the expression capacity and information transmission efficiency of the graph and improve the classification effect of the webpage on the premise of not damaging the structural semantics of the original graph.
Inventors
- CUI SHICHENG
- LI XINXIN
- HU ZHENYU
- WANG SHUO
- Shi Kailai
- JIANG RUOTIAN
Assignees
- 南京工程学院
Dates
- Publication Date
- 20260512
- Application Date
- 20260128
Claims (8)
- 1. The webpage classification method based on Delaunay triangulation double-graph neural network is characterized by comprising the following steps of: Step S1, acquiring an original image and node characteristics of webpage classification, wherein the original image comprises a plurality of nodes and directed edges among the nodes, the nodes are webpages, the directed edges are hyperlinks among the webpages, and the node characteristics are webpage contents; Step S2, reducing the node characteristics to a two-dimensional space, and constructing a shadow graph by adopting a Delaunay triangulation method; s3, performing graph convolution operation on the original graph and the shadow graph respectively to obtain local structure information and global relation information of the nodes; And S4, fusing the local structure information and the global relation information to generate final node representation, and classifying all nodes by adopting a neural network classifier based on the final node representation.
- 2. The method for classifying web pages based on Delaunay triangulation dual-graph neural network according to claim 1, wherein the step S2 specifically comprises: Step S21, using UMAP algorithm or PCA algorithm to reduce the dimension of all node characteristics to two-dimensional space to obtain two-dimensional coordinates of the node, and putting the two-dimensional coordinates into a coordinate set Wherein, among them, Representing nodes Is used for the two-dimensional coordinates of (c), Representing the number of nodes; step S22, constructing an initial triangle capable of enclosing all elements of the coordinate set, placing the initial triangle into the triangle set, and initializing the edge set of the shadow graph ; Step S23, traversing each element of the coordinate set in turn Locating the current triangle from the triangle set, when the element is Inside the current triangle, then it is in element Dividing the current triangle into three new triangles for the vertexes, and updating the triangle set after local optimization of the new triangles through overturning operation; step S24, removing all triangles containing initial triangle vertexes in the triangle set, and extracting edges from the rest triangle set to obtain an edge set of the shadow graph The final shadow graph is formed in combination with the node set of the original graph.
- 3. The webpage classification method based on Delaunay triangulation dual-graph neural network according to claim 2, wherein in step S23, the new triangle is updated after being locally optimized by the flipping operation, specifically: step e1, element to element Adding edges generated when dividing new triangles for vertexes into queues to be optimized, putting the three divided new triangles into a triangle set, and deleting elements The current triangle is positioned when in positioning; Step e2, taking one edge out of the queue to be optimized Finding shared edges Is of the shape of two triangles And ; Step e3, opposite sides Performing an air-to-external circle test, and if the test fails, forming a side Contravenes the criterion of the external circle of the air and turns over the edge Generating edges Forming two optimized triangles And And delete the original two triangles from the triangle set And Triangle is added And If the test is successful, edges The method meets the criterion of the empty circle and does not process the triangle set; step e4, deleting the edge from the queue to be optimized And e2, outputting the updated triangle set if the queue to be optimized is empty, and returning to the step e2 if the queue to be optimized is not empty.
- 4. The method for classifying web pages based on Delaunay triangulation dual-graph neural network according to claim 1, wherein in step S3, two graph neural networks are used to perform a graph convolution operation on the original graph and the shadow graph, respectively.
- 5. The method for classifying web pages based on Delaunay triangulation double-graph neural network according to claim 1, wherein the step S4 of fusing the local structure information and the global relationship information comprises the steps of splicing and linearly transforming the local structure information and the global relationship information.
- 6. A web page classification system based on Delaunay triangulation dual-graph neural network, comprising: The system comprises a data acquisition module, a data processing module and a data processing module, wherein the data acquisition module is used for acquiring an original image and node characteristics of webpage classification, the original image comprises a plurality of nodes and directed edges among the nodes, the nodes are webpages, the directed edges are hyperlinks among the webpages, and the node characteristics are webpage contents; the shadow image construction module is used for reducing the node characteristics to a two-dimensional space, and constructing a shadow image by adopting a Delaunay triangulation method, wherein the shadow image and the original image form a double-image structure; The two-stage message transfer module is used for respectively carrying out graph convolution operation on the original graph and the shadow graph to obtain local structure information and global relation information of the nodes; the double fusion module is used for fusing the local structure information and the global relation information to generate a final node representation; And the classification module is used for classifying all the nodes by adopting a neural network classifier based on the final representation of the nodes.
- 7. A computer device comprising a processor and a memory, wherein the processor, when executing a computer program stored in the memory, implements the steps of the Delaunay triangulation dual graph neural network based web page classification method of any one of claims 1-5.
- 8. A computer readable storage medium for storing a computer program which when executed by a processor performs the steps of the method for classifying web pages based on Delaunay triangulation dual graph neural network according to any one of claims 1 to 5.
Description
Webpage classification method and system based on Delaunay triangulation double-graph neural network Technical Field The invention belongs to the technical field of network classification, and particularly relates to a webpage classification method and system based on a Delaunay triangulation double-graph neural network. Background The large-scale website has massive webpage data, webpage classification refers to tasks of dividing the webpage into different categories according to characteristics such as webpage content, theme or link structure, and the like, so that systematic organization and management of the webpage are realized, and more accurate search recommendation or content distribution is supported. However, due to the huge size of websites, the number of web pages is very large, the link relationship is complex, and the link between web pages and the content association are automatically classified by adopting a neural network (Graph Neural Networks, GNNs). However, the existing GNN model generally has the problems of over-smoothing and over-rolling (over-squashing), which makes it difficult for the model to effectively capture the global structure and deep semantic association of the graph, and prevents the further development and application of the web page classification technology in the actual large-scale scene. To alleviate the above problems, some of the prior art proposes reconnecting (GRAPH REWIRING) the graph structure, i.e., modifying the edge structure of the original graph to enhance information propagation capabilities. However, the existing graph reconnection method is mostly based on heuristic rules or random strategies, so that false relations are easily introduced, the semantic structure of the original graph is destroyed, and the interpretability and the robustness of the model are reduced. Disclosure of Invention Aiming at the defects in the prior art, the invention provides a webpage classification method and a webpage classification system based on a Delaunay triangulation dual-graph neural network, which can effectively enhance the graph expression capability and the information transmission efficiency and improve the webpage classification effect on the premise of not damaging the original graph structure semantics. The invention provides the following technical scheme: in a first aspect, a method for classifying web pages based on Delaunay triangulation dual graph neural network is provided, including: Step S1, acquiring an original image and node characteristics of webpage classification, wherein the original image comprises a plurality of nodes and directed edges among the nodes, the nodes are webpages, the directed edges are hyperlinks among the webpages, and the node characteristics are webpage contents; Step S2, reducing the node characteristics to a two-dimensional space, and constructing a shadow graph by adopting a Delaunay triangulation method; s3, performing graph convolution operation on the original graph and the shadow graph respectively to obtain local structure information and global relation information of the nodes; And S4, fusing the local structure information and the global relation information to generate final node representation, and classifying all nodes by adopting a neural network classifier based on the final node representation. Optionally, the step S2 specifically includes: Step S21, using UMAP algorithm or PCA algorithm to reduce the dimension of all node characteristics to two-dimensional space to obtain two-dimensional coordinates of the node, and putting the two-dimensional coordinates into a coordinate set Wherein, among them,Representing nodesIs used for the two-dimensional coordinates of (c),Representing the number of nodes; step S22, constructing an initial triangle capable of enclosing all elements of the coordinate set, placing the initial triangle into the triangle set, and initializing the edge set of the shadow graph ; Step S23, traversing each element of the coordinate set in turnLocating the current triangle from the triangle set, when the element isInside the current triangle, then it is in elementDividing the current triangle into three new triangles for the vertexes, and updating the triangle set after local optimization of the new triangles through overturning operation; step S24, removing all triangles containing initial triangle vertexes in the triangle set, and extracting edges from the rest triangle set to obtain an edge set of the shadow graph The final shadow graph is formed in combination with the node set of the original graph. Optionally, in the step S23, the triangle set is updated after the new triangle is locally optimized through the flipping operation, specifically: step e1, element to element Adding edges generated when dividing new triangles for vertexes into queues to be optimized, putting the three divided new triangles into a triangle set, and deleting elementsThe current triangle is positioned when in po