CN-117312489-B - Modeling method for characterization model of network community group and user

Abstract

The invention provides a graph pre-training method for unified modeling of network community groups and users. Semantic information and structural information are extracted for two kinds of objects, at the community level and at the user level, and a heterogeneous graph model with hierarchical characteristics is constructed. During pre-training, self-supervised tasks adapted to objects at the different levels are introduced, so that the trained model can represent the characteristics of objects at each level and the interrelationships between the levels. A hierarchical iterative training method is also designed so that the representations at the different levels reinforce each other, allowing the model to complete a variety of complex inference tasks that take communities and users as target objects.

Inventors

  • WEI ZHONGYU
  • ZHANG XINNONG

Assignees

  • Fudan University (复旦大学)

Dates

Publication Date
20260512
Application Date
20230825

Claims (9)

  1. A modeling method for a network community group and user characterization model, characterized by collaboratively modeling a community level and a user level, the method comprising: screening network community groups by the users belonging to them; extracting semantic information of each network community group and of each individual user in the network community groups, and extracting structural information among different network community groups, among different individual users, and between the network community groups and the users, which comprises: constructing node types, wherein nodes at the community level represent network community groups and nodes at the user level represent individual users; constructing node relations, wherein hyperlinks between communities are used as the construction rule for relations among network community groups, and social interactions between users are used as the construction rule for relations among individuals; and assigning semantic information to the nodes, wherein, for nodes at the community level, the semantic representation is extracted from the texts carrying hyperlink edges, and LIWC and other dictionary-based statistics computed over all original texts containing the hyperlinks are used as the macroscopic semantic information of the network community groups; extracting semantic information of the different network community groups and structural information among them; extracting, for the users in the network community groups taken as individuals, semantic information of the different users and structural information among them; extracting structural information between the network community groups and the users; and integrating all semantic information and all structural information to construct a heterogeneous social network comprising a community level and a user level.
  2. The method of claim 1, wherein the steps of division and information extraction for the different levels comprise: for each network community group, selecting the text features within the community as the semantic information of the community, and, for different network community groups, selecting the interaction modes among the network community groups as the structural information among communities; for each user, selecting the text features published by the user as the semantic information of the user, and selecting the social interaction modes among users as the structural information among users; and taking the attribution states of users with respect to the different network community groups as the structural information between the community level and the user level.
  3. A network community group and user characterization model constructed by the modeling method of claim 1 or 2.
  4. A method of pre-training the network community group and user characterization model of claim 3, wherein the method comprises: covering (masking) network community group nodes and user nodes on the heterogeneous graph respectively, and outputting, through a node attribute encoder, the generated result at each covered position as the conceptual representation of the covered node; and covering the various relations formed by community nodes and user nodes on the heterogeneous graph, predicting, through an edge generation encoder, the probability that an edge exists between the nodes of each unconnected node pair, and guiding the model, through negative contrastive estimation optimization, to represent the structural information on the heterogeneous graph.
  5. A training apparatus applying the pre-training method of claim 4 to a network community group and user characterization model, the apparatus comprising: a heterogeneous graph construction unit for constructing a social network heterogeneous graph from the semantic information and structural information present in the data and using it as the input structure for pre-training the characterization model; a structure encoding unit for encoding and outputting the relations between nodes on the graph through a node attribute encoder and an edge generation encoder; a self-supervised learning unit for generating and learning the covered nodes and edges on the graph through two types of self-supervised tasks, namely node attribute reconstruction and edge reconstruction; and a hierarchical iteration unit for generating and learning the community level and the user level respectively through iterative training on the different node types.
  6. The model training apparatus of claim 5, wherein the operation of the hierarchical iteration unit comprises: in the community-level stage, sampling with community nodes as the target nodes and correspondingly using the community-related self-supervised training tasks in the self-supervised learning unit; and, after training for a certain number of rounds in the current stage, saving the current model state and training with the target nodes of the next stage; wherein this operation is iterated throughout the training process, and each iteration always proceeds from the user level to the community level.
  7. A method for task assessment of the network community group and user characterization model of claim 3, wherein the method comprises a user community attribution task with a user as the inference object and a community aggressiveness detection task with a community as the inference object.
  8. The method for task assessment according to claim 7, wherein the user community attribution task with a user as the inference object comprises predicting the list of communities to which the user belongs based on the input user encoding features.
  9. The method for task assessment according to claim 7, wherein the community aggressiveness detection task with a community as the inference object comprises predicting the aggressiveness label class of the community based on the input community encoding features.
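As a non-limiting illustration of the covering and edge-prediction steps recited in claim 4, the following Python sketch selects nodes of one level to cover, draws unconnected nodes as negative examples for a covered edge, and evaluates a softmax-style contrastive loss. The function names (`mask_nodes`, `sample_negatives`, `contrastive_loss`), the masking ratio, and the use of plain numeric scores in place of encoder outputs are assumptions made for illustration only; the claims do not specify them.

```python
import math
import random


def mask_nodes(node_types, target_type, ratio, rng):
    """Pick a subset of nodes of one level (community or user) to cover."""
    candidates = sorted(n for n, t in node_types.items() if t == target_type)
    if not candidates:
        return set()
    k = max(1, round(len(candidates) * ratio))
    return set(rng.sample(candidates, k))


def sample_negatives(src, edges, candidates, k, rng):
    """Draw up to k nodes that are NOT connected to src, as negative targets."""
    connected = {dst for s, dst in edges if s == src}
    pool = sorted(c for c in candidates if c != src and c not in connected)
    return rng.sample(pool, min(k, len(pool)))


def contrastive_loss(pos_score, neg_scores):
    """Negative log-probability of the true edge among true + negative edges."""
    scores = [pos_score] + list(neg_scores)
    peak = max(scores)  # subtract the maximum for numerical stability
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_z - pos_score
```

In such a sketch, a lower loss means the scoring function ranks the covered (true) edge above the sampled unconnected pairs, which is the behaviour the edge-reconstruction task is meant to encourage.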

Description

Modeling method for characterization model of network community group and user

Technical Field

The invention relates to the field of computers, and in particular to a graph pre-training method for unified modeling of network community groups and users.

Background

Network community groups play an increasingly important role in modern society. To study the population characteristics and representation information of different network community groups, researchers have constructed various models that take either the whole network community group or individual users as the object of study, and have designed corresponding downstream tasks to evaluate those models, which achieve excellent results in their respective fields. Current research methods, however, usually model a single task object and discuss the performance of the model only on the downstream tasks associated with that object, ignoring the complex relationships that inherently exist between the different objects in a network community group.

Disclosure of Invention

The embodiments of this specification aim to provide a graph pre-training method for unified modeling of network community groups and users. The collaborative modeling of the community level and the user level provided by the embodiments yields a unified framework that can be applied to a variety of downstream task objectives. Based on this collaborative modeling method, the embodiments provide a hierarchical iterative pre-training method in which several self-supervised learning tasks are introduced at different stages, so that the trained model develops a joint understanding of the user level and the community level and learns better representations of both levels.
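The hierarchical iterative pre-training described above can be sketched as a simple training schedule. The Python sketch below is illustrative only: the function name `run_hierarchical_pretraining` and the `train_round` and `save_state` callbacks are hypothetical placeholders for the actual training step and checkpointing, and the user-then-community ordering follows the ordering stated in claim 6.

```python
# Order of stages within one iteration: user level first, then community level.
LEVEL_ORDER = ("user", "community")


def run_hierarchical_pretraining(num_iterations, rounds_per_stage,
                                 train_round, save_state):
    """Alternate training stages between levels and save state after each stage.

    train_round(level) is assumed to run one round of the self-supervised
    tasks with nodes of the given level as target nodes.
    save_state(iteration, level) is assumed to persist the current model.
    Returns the list of completed (iteration, level) stages.
    """
    completed = []
    for iteration in range(num_iterations):
        for level in LEVEL_ORDER:
            for _ in range(rounds_per_stage):
                train_round(level)
            # The model state is saved before switching to the next stage.
            save_state(iteration, level)
            completed.append((iteration, level))
    return completed
```

Keeping the schedule separate from the training step makes it straightforward to vary the number of rounds per stage without touching the self-supervised tasks themselves.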
To achieve the above purpose, the embodiments of this disclosure provide a collaborative modeling method for the community level and the user level. The method comprises: performing a prior division of the network community groups; extracting semantic information of the different network community groups and structural information among them; extracting, for the individuals in the network community groups, semantic information of the different users and structural information among them; extracting structural information between the network community groups and the users; and integrating all semantic information and structural information to construct a heterogeneous social network comprising a community level and a user level.

In one embodiment, the steps of division and information extraction for the different levels comprise: for each network community group, selecting the text features within the community as the semantic information of the community; for different network community groups, selecting the interaction modes among them as the structural information among communities; for each user, selecting the text features published by the user as the semantic information of the user; for different users, selecting the social interaction modes among them as the structural information among users; and, for different network community groups and users, selecting the attribution states of the users with respect to the network community groups as the structural information between the community level and the user level.
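By way of illustration, the integration step can be sketched as a small heterogeneous graph container with two node types (community, user) and three relation types (community to community, user to user, user to community). The Python sketch below is a minimal example under stated assumptions: the class `HeteroGraph`, the function `build_graph`, the edge-type names, and the representation of semantic information as plain feature lists are all hypothetical and are not taken from the specification.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical names for the three kinds of structural information.
EDGE_TYPES = ("community-community", "user-user", "user-community")
NODE_TYPES = ("community", "user")


@dataclass
class HeteroGraph:
    # node id -> (node type, semantic feature vector)
    nodes: dict = field(default_factory=dict)
    # edge type -> set of (source id, target id) pairs
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, node_type, features):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = (node_type, list(features))

    def add_edge(self, edge_type, src, dst):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges[edge_type].add((src, dst))


def build_graph(community_feats, user_feats, hyperlinks, interactions, memberships):
    """Integrate semantic features and the three relation lists into one graph."""
    graph = HeteroGraph()
    for cid, feats in community_feats.items():
        graph.add_node(cid, "community", feats)
    for uid, feats in user_feats.items():
        graph.add_node(uid, "user", feats)
    for a, b in hyperlinks:        # structure among communities
        graph.add_edge("community-community", a, b)
    for a, b in interactions:      # structure among users
        graph.add_edge("user-user", a, b)
    for uid, cid in memberships:   # attribution of users to communities
        graph.add_edge("user-community", uid, cid)
    return graph
```

A graph library with native heterogeneous-graph support could replace this container; the plain-dictionary form is used here only to keep the example self-contained.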
The embodiments of this specification also provide a model pre-training method, which comprises a heterogeneous graph construction unit, a structure encoding unit, a self-supervised learning unit, and a hierarchical iteration unit. The heterogeneous graph construction unit constructs a social network heterogeneous graph from the semantic information and structural information present in the data and uses it as the input structure for pre-training the characterization model. The structure encoding unit encodes and outputs the relations between nodes on the graph through a node attribute encoder and an edge generation encoder. The self-supervised learning unit generates and learns the covered nodes and edges on the graph through two types of self-supervised tasks, node attribute reconstruction and edge reconstruction. The hierarchical iteration unit generates and learns the community level and the user level respectively through iterative training on the different node types.

In one embodiment, the method comprises a heterogeneous graph construction unit, which constructs a social network heterogeneous graph comprising two node types and three edge types from the semantic information and structural information of the community level and the user level.

In one embodiment, the method comprises a structure encoding unit and a self-supervised learning unit, wherein the structure encoding unit and the self-supervised learning unit cover network community group nodes and user nodes on the heterogeneous graph respectively, the node attribute