CN-121834537-B - Social robot identification method, terminal and medium based on multi-scale entropy features


Abstract

The invention relates to the technical field of information processing, and discloses a social robot identification method, a terminal and a medium based on multi-scale entropy features. The method comprises the following steps. Social network behavior data of a user to be detected are acquired, and an initial binary time sequence is constructed for each predefined behavior type. A plurality of time windows with different granularities are set, and each initial binary time sequence is mapped into coarse-grained behavior sequences at different time scales. The behavior information entropy of the coarse-grained behavior sequence at each time scale is calculated to form a behavior entropy set. Based on the behavior entropy set, the coefficient of variation of the entropy values across the different time scales is calculated to quantify the dynamic fluctuation characteristics of user behavior. The behavior entropy set and the coefficients of variation are fused into a comprehensive feature vector, which is input into a trained supervised classification model that outputs whether the user is a real user or a social robot. The method improves the ability to recognize the complex camouflage behavior of social network robots.

Inventors

LV LINYUAN
Jiang Shengyue
XU XIAOKE

Assignees

University of Science and Technology of China (中国科学技术大学)
Beijing Normal University, Zhuhai Campus (北京师范大学珠海校区)

Dates

Publication Date: 2026-05-12
Application Date: 2026-03-11

Claims (7)

1. A social robot identification method based on multi-scale entropy features, characterized by comprising the following steps:
S1, acquiring social network behavior data of a user to be detected over the life cycle of the user, and constructing an initial binary time sequence for each predefined behavior type; the predefined behavior types comprise posting, forwarding and replying, and the initial binary time sequence is constructed such that, if the user performs a behavior of a specific type at a specific time point, the value at that time point is recorded as 1, and otherwise it is recorded as 0;
S2, setting a plurality of time windows with different granularities, and mapping the initial binary time sequence into a plurality of coarse-grained behavior sequences at different time scales;
S3, calculating the behavior information entropy of the coarse-grained behavior sequence at each time scale to form a behavior entropy set, wherein the behavior information entropy is calculated as:
$$H_a^{(\tau)} = -\sum_{s \in \{0,1\}} p_a^{(\tau)}(s) \log p_a^{(\tau)}(s)$$
where $H_a^{(\tau)}$ denotes the information entropy of behavior $a$ at time scale $\tau$; $p_a^{(\tau)}(s)$ denotes the empirical probability that binary state $s$ occurs in the coarse-grained behavior sequence; $a \in \mathcal{A}$, where $\mathcal{A}$ is the global behavior set of each user; the calculated behavior information entropy is normalized so that its value lies within $[0, 1]$; and, by traversing $K$ different time scales $\tau_1, \tau_2, \dots, \tau_K$, the entropy value set of behavior $a$ at the $K$ time scales, $E_a = \{H_a^{(\tau_1)}, H_a^{(\tau_2)}, \dots, H_a^{(\tau_K)}\}$, is built, i.e., the behavior entropy set;
S4, calculating the coefficient of variation of the entropy values across the different time scales based on the behavior entropy set, so as to quantify the dynamic fluctuation characteristics of user behavior, wherein the coefficient of variation is calculated as:
$$CV_a = \frac{\sigma(E_a)}{\mu(E_a)}$$
where $CV_a$ is the coefficient of variation corresponding to behavior $a$, $\sigma(E_a)$ is the standard deviation of the set $E_a$, and $\mu(E_a)$ is the mean of the set $E_a$;
S5, fusing the behavior entropy set and the coefficient of variation to construct a comprehensive feature vector, inputting the comprehensive feature vector into a trained supervised classification model, and outputting a classification result indicating whether the user is a real user or a social robot.
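A minimal Python sketch of the entropy and coefficient-of-variation computations in steps S3 and S4, for illustration only. It assumes base-2 logarithms, so the entropy of a binary sequence already lies in [0, 1] and the normalization step is trivial, and it assumes the population standard deviation; the claim fixes neither choice. The names `binary_entropy`, `coefficient_of_variation` and `E_a` are hypothetical, and returning 0 when the mean entropy is 0 is an added convention, since $CV_a$ is undefined in that case.

```python
import math
from statistics import mean, pstdev


def binary_entropy(seq):
    """Shannon entropy, in bits, of the 0/1 states of a coarse-grained sequence.

    Assumption: base-2 logarithm, so the result already lies in [0, 1]
    and no further normalization is applied.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    p1 = sum(seq) / n  # empirical probability of state s = 1
    h = 0.0
    for p in (p1, 1.0 - p1):  # states s = 1 and s = 0
        if p > 0:  # the term 0 * log 0 is taken as 0
            h -= p * math.log2(p)
    return h


def coefficient_of_variation(entropies):
    """CV_a = sigma(E_a) / mu(E_a) over the K per-scale entropies of one behavior.

    Assumptions: population standard deviation; CV defined as 0 when the mean is 0.
    """
    mu = mean(entropies)
    if mu == 0:
        return 0.0
    return pstdev(entropies) / mu


# Entropy set E_a of one behavior at three hypothetical time scales, then its CV.
E_a = [binary_entropy(s) for s in ([0, 1, 0, 1], [1, 1, 0, 1], [1, 1])]
cv_a = coefficient_of_variation(E_a)
```

A behavior whose entropy barely changes across scales gets a CV near 0, while one whose regularity depends strongly on the observation window gets a larger CV, which is the fluctuation signal step S4 is meant to capture.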
2. The social robot identification method based on multi-scale entropy features according to claim 1, wherein in step S2, the time windows with different granularities are non-overlapping time windows, and the initial binary time sequence is mapped into the coarse-grained behavior sequence at time scale $\tau$ as follows:
$$y_a^{(\tau)}(k) = \mathbb{1}\left[\sum_{t=t_k}^{t_k+\tau-1} x_a(t) \geq 1\right]$$
where $y_a^{(\tau)}(k)$ denotes the coarse-grained behavior state within the $k$-th time window; $\tau$ is the time scale; $t_k$ is the starting reference point of the $k$-th time window; and $x_a(t)$ is the initial binary time sequence, indicating whether behavior $a$ occurs at time $t$.
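A minimal Python sketch of steps S1 and S2 for one behavior type, for illustration only. It assumes that timestamps have already been discretized to integer time steps (for example, hours since the start of observation), that window $k$ (0-based) starts at $k\tau$, and that an incomplete trailing window is discarded; the claim does not fix these details. The names `build_binary_series`, `coarse_grain`, `x_post` and `multi_scale` are hypothetical.

```python
def build_binary_series(event_times, start, end):
    """Initial binary series x_a(t) for one behavior over integer steps [start, end).

    x_a(t) = 1 if the behavior occurs at step t, else 0 (step S1).
    """
    series = [0] * (end - start)
    for t in event_times:
        if start <= t < end:
            series[t - start] = 1
    return series


def coarse_grain(series, tau):
    """Map x_a(t) onto non-overlapping windows of length tau (step S2).

    Window k (0-based) covers steps k*tau .. k*tau + tau - 1. Its state is 1
    if the behavior occurs at least once in it, which keeps the sequence
    binary. A partial window at the end is dropped (assumption).
    """
    if tau <= 0:
        raise ValueError("tau must be a positive integer")
    n_windows = len(series) // tau
    return [1 if any(series[k * tau:(k + 1) * tau]) else 0 for k in range(n_windows)]


# Posting events at steps 0, 3, 4 and 9, coarse-grained at three hypothetical scales.
x_post = build_binary_series([0, 3, 4, 9], start=0, end=10)
multi_scale = {tau: coarse_grain(x_post, tau) for tau in (1, 2, 3)}
```

Running the same two functions for each behavior type and each scale in $\tau_1, \dots, \tau_K$ yields the binary sequences whose entropies step S3 computes; at $\tau = 1$ the coarse-grained sequence equals the initial series.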
3. The social robot identification method based on multi-scale entropy features according to claim 2, wherein in step S5, the comprehensive feature vector is constructed as follows:
for the three behavior types posting $P$, forwarding $F$ and replying $R$, the behavior information entropy at each of the different time scales is extracted to construct the multi-scale information entropy feature matrix $\mathbf{M}_u$ of user $u$:
$$\mathbf{M}_u = \begin{bmatrix} H_P^{(\tau_1)} & H_F^{(\tau_1)} & H_R^{(\tau_1)} \\ H_P^{(\tau_2)} & H_F^{(\tau_2)} & H_R^{(\tau_2)} \\ \vdots & \vdots & \vdots \\ H_P^{(\tau_K)} & H_F^{(\tau_K)} & H_R^{(\tau_K)} \end{bmatrix}$$
where $H_P^{(\tau_m)}$, $H_F^{(\tau_m)}$ and $H_R^{(\tau_m)}$ respectively denote the behavior information entropy of posting, forwarding and replying at the $m$-th time scale $\tau_m$, $m = 1, 2, \dots, K$;
the coefficients of variation $CV_P$, $CV_F$ and $CV_R$ corresponding to posting, forwarding and replying are obtained to construct the coefficient-of-variation feature vector $\mathbf{c}_u$ of user $u$:
$$\mathbf{c}_u = \left[CV_P, CV_F, CV_R\right]$$
and the multi-scale information entropy feature matrix $\mathbf{M}_u$ and the coefficient-of-variation feature vector $\mathbf{c}_u$ are concatenated to form a comprehensive feature vector representing the dynamic behavior pattern of the user.
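A minimal Python sketch of the feature assembly described in claim 3, assuming the normalized per-scale entropies of each behavior have already been computed. The keys `post`, `forward` and `reply`, the row-major flattening of $\mathbf{M}_u$ (one row per time scale), and the population standard deviation are assumptions; the claim states only that the matrix and the vector are concatenated. `build_feature_vector` is a hypothetical name.

```python
from statistics import mean, pstdev

BEHAVIORS = ("post", "forward", "reply")  # hypothetical keys for P, F and R


def build_feature_vector(entropy_sets):
    """Concatenate the flattened K x 3 entropy matrix M_u with c_u = [CV_P, CV_F, CV_R].

    entropy_sets maps each behavior key to its K normalized entropies,
    ordered by time scale tau_1 .. tau_K.
    """
    k = len(entropy_sets[BEHAVIORS[0]])
    if any(len(entropy_sets[b]) != k for b in BEHAVIORS):
        raise ValueError("every behavior needs entropies at the same K time scales")
    # Row m of M_u holds (H_P, H_F, H_R) at time scale tau_m.
    matrix = [[entropy_sets[b][m] for b in BEHAVIORS] for m in range(k)]
    # One coefficient of variation per column (behavior); 0 if the mean is 0.
    cvs = []
    for j in range(len(BEHAVIORS)):
        column = [row[j] for row in matrix]
        mu = mean(column)
        cvs.append(pstdev(column) / mu if mu > 0 else 0.0)
    # Row-major flattening of M_u followed by c_u: 3K + 3 features in total.
    return [h for row in matrix for h in row] + cvs


example = build_feature_vector(
    {"post": [0.5, 0.5], "forward": [0.2, 0.6], "reply": [0.0, 0.0]}
)
```

With $K$ time scales the resulting vector has $3K + 3$ entries; because each behavior's entropies and CV are computed separately and only joined at this point, the sketch also reflects the feature-level fusion described in claim 5.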
4. The social robot identification method based on multi-scale entropy features according to claim 1, wherein in step S5, the supervised classification model is a random forest classifier, and, before the comprehensive feature vector is input into the supervised classification model, historical user data are processed by downsampling to balance the class distribution between social robots and real users.
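A minimal Python sketch of class balancing by random undersampling of the historical training data. Random undersampling is one common form of downsampling; the claim does not name the specific technique, so this choice is an assumption, as are the function name `random_undersample`, the label convention (1 = social robot, 0 = real user) and the fixed seed.

```python
import random
from collections import defaultdict


def random_undersample(features, labels, seed=0):
    """Balance classes by randomly discarding samples from the larger classes.

    Every class keeps as many samples as the smallest class has.
    Hypothetical label convention: 1 = social robot, 0 = real user.
    """
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    n_min = min(len(samples) for samples in by_class.values())
    rng = random.Random(seed)  # fixed seed so the balanced set is reproducible
    X_bal, y_bal = [], []
    for y in sorted(by_class):
        for x in rng.sample(by_class[y], n_min):  # sampling without replacement
            X_bal.append(x)
            y_bal.append(y)
    return X_bal, y_bal


# Eight real users and two robots, each with a toy one-dimensional feature vector.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_undersample(X, y)
```

The balanced lists could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier` through its `fit(X, y)` method, with `predict` applied to the comprehensive feature vectors of users to be detected. Undersampling discards majority-class data, which is cheap and simple but can lose information when the imbalance is extreme.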
5. The social robot identification method based on multi-scale entropy features according to claim 1, characterized in that in steps S1 to S4, the behavior types are processed independently of one another when constructing the initial binary time sequences, performing the multi-scale time-series mapping, extracting the multi-scale behavior information entropy and calculating the coefficients of variation, so as to preserve the inherent temporal evolution characteristics of each individual behavior type; and in step S5, the independently extracted behavior entropy sets and coefficients of variation are concatenated to achieve feature-level fusion across multiple behavior dimensions.
6. A computer terminal, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the social robot recognition method based on multi-scale entropy features according to any one of claims 1 to 5.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the social robot recognition method based on multi-scale entropy features according to any of claims 1 to 5.

Description

Social robot identification method, terminal and medium based on multi-scale entropy features

Technical Field

The invention relates to the technical field of information processing, and in particular to a social robot identification method, a terminal and a medium based on multi-scale entropy features.

Background

With the rapid development of social media and content-sharing platforms, the manner and scale of information dissemination have changed dramatically. Online social networks have become an important channel for spreading ideas and opinions and profoundly affect billions of users worldwide. A social robot is an algorithmic agent deployed on a social media platform that automatically performs specific behaviors according to goals preset by humans. Researchers have proposed various detection frameworks aimed at accurately distinguishing social robots from real users. Existing social robot detection methods can be broadly divided into three categories: content-based, network-structure-based and behavior-based. Content-based methods rely mainly on the text or multimedia content published by a user; however, as the content-generation capabilities of large language models rapidly improve, these methods find it increasingly difficult to identify robot-generated content, and their discriminative power keeps weakening. Network-based methods rely mainly on the connection structure between users, social topological characteristics and the like, but the highly dynamic nature and enormous scale of social network structures pose clear challenges to their efficiency and scalability.
In contrast, behavior-based methods have attracted increasing attention in recent years. By analyzing microscopic dynamic features of a user's behavior sequence, such as its temporal distribution and interaction patterns, these methods offer stronger interpretability and adaptability and can achieve more robust detection without relying on content or network structure. However, current mainstream behavior modeling methods generally rely on manually predefined statistical features or fixed-granularity time windows. Such static approaches are not only highly scenario-dependent, but also fail almost completely when user behavior is structurally complex or camouflaged. In particular, attackers on social platforms continuously evolve their behavior patterns, so the features used by traditional methods break down easily and the model cannot identify strongly camouflaged social robots. In addition, existing methods typically model behavior at a single time scale and cannot simultaneously capture how a user's behavior changes between short-term bursts and long-term regularities. Such coarse-grained representations ignore the multi-layer structure of behavior evolution, so classification performance drops sharply on accounts with irregular rhythms or mixed behaviors, which forms the main performance bottleneck of detection systems. Meanwhile, although some methods introduce deep learning frameworks, their internal features lack transparency, and their outputs are difficult for humans to understand or trace. This lack of explainability is especially serious in security-sensitive scenarios: once an error occurs, its cause is difficult to troubleshoot, and serious misjudgments may result. As social robot attack methods become more sophisticated, a feature construction strategy based on a single behavior type cannot meet practical detection requirements.
Existing methods generally lack a unified modeling mechanism for different behavior types (such as posting, forwarding and commenting), which results in poor model generalization and leaves them ill-suited to cross-platform and cross-context robot detection.

Disclosure of Invention

To solve the technical problems in the prior art, the invention provides a social robot recognition method, a terminal and a medium based on multi-scale entropy features, which markedly improves the ability to recognize the complex camouflage behavior of social network robots by combining independent modeling of multiple behavior types with analysis of temporal fluctuation.

To achieve the above purpose, the invention provides the following technical solutions: the invention discloses a social robot identification method based on multi-scale entropy features, comprising the following steps: S1, acquiring social network behavior data of a user to be detected over the life cycle of the user, and constructing an initial binary time sequence for each predefined behavior type; S2, setting a plurality of time windows with different granularities, and mapping an initial binary ti