
CN-122021561-A - News text generation method based on text style

CN122021561A

Abstract

The invention discloses a news text generation method based on text style, relating to the technical field of false-news text research, and comprising the following steps: S1, using a seq2seq model with a loaded-language technique to replace sentences of real news, thereby obtaining credible but false information; and S2, performing feature analysis and detection on the generated text using a pre-trained language model. Starting from the markedly different writing styles and latent intentions of false news, the invention generates a new training corpus through seq2seq technology, in which the training samples replace the stylistically and intentionally distinctive sentences of real news with trusted but false information, and further provides a false-news detection method based on text style.

Inventors

  • GUO HENG

Assignees

  • Hebei University of Engineering (河北工程大学)

Dates

Publication Date
2026-05-12
Application Date
2024-01-31

Claims (10)

  1. A method for generating news text based on text style, comprising: S1, using a seq2seq model with a loaded-language technique to replace key sentences of real news, thereby obtaining trusted but false information; and S2, performing feature analysis and detection on the generated text using a pre-trained language model. In S1, the seq2seq model comprises two parts, an encoder and a decoder, both built from RNN neural networks. The encoder's main function is to encode a variable-length input into a fixed-size feature vector c and pass it to the decoder; the decoder's main function is to use the feature vector c to output a variable-length sequence token by token. For a text generation task, given an input text sequence X = [x1, x2, …, xm] and a target text sequence Y = [y1, y2, …, yn], the training goal of the seq2seq-based text generation model is to maximize the probability of generating the text sequence Y given the input text sequence X, i.e., p(y1, y2, …, yn | x1, x2, …, xm).
  2. The method for generating news text based on text style according to claim 1, wherein the encoder encodes the input text sequence X using an RNN neural network f_enc(·) and takes the hidden-layer output h_m of the RNN at time step m as the feature vector c of the input sequence, specifically: c = h_m, with h_t = f_enc(h_{t−1}, e_{x_t}, θ_enc), where f_enc(·) is the RNN neural network with parameters θ_enc, and e_{x_t} is the word vector input at time t.
  3. The method according to claim 2, wherein the decoder uses another RNN neural network f_dec(·), the feature vector c, and a feed-forward neural network g(·) to generate the target text sequence Y step by step. The decoder first initializes f_dec(·) with the feature vector c, then uses the output y_{t−1} at time t−1 to generate the output y_t at time t. Let s_t denote the hidden-layer output of f_dec(·), and let o_t ∈ (0, 1)^{|V|} be the posterior probability over all words in the vocabulary: s_0 = c; s_t = f_dec(s_{t−1}, e_{y_{t−1}}, θ_dec); o_t = g(s_t, θ_0); y_t = arg max o_t, where f_dec(·) is the decoding recurrent neural network, g(·) is a feed-forward neural network whose activation function is softmax, θ_dec and θ_0 are the parameters of f_dec(·) and g(·) respectively, e_{y_{t−1}} is the word vector of y_{t−1}, and y_0 and y_n denote the special start token <SOS> and the special end token <EOS>.
  4. The method for generating news text based on text style according to claim 3, wherein in S2 the generated text is characterized as follows: the pre-trained language model XLNet is used to obtain word vectors carrying the propaganda-skill style, deep semantic information is obtained through a BiLSTM (bidirectional long short-term memory) network, and finally an attention mechanism assigns different feature weights according to the importance of the features, after which text-authenticity detection is performed.
  5. The method for generating news text based on text style according to claim 4, wherein in step S2 the generated text undergoes feature analysis and detection through a constructed false-news detection model, which comprises a fully connected layer, an Attention layer, a BiLSTM layer, an XLNet layer, and an Input layer.
  6. The method for generating news text based on text style according to claim 5, wherein the XLNet layer adds the position information g_θ to the objective function of the AR model by means of a two-stream self-attention mechanism.
  7. The method for generating news text based on text style according to claim 6, wherein the objective function of the AR model is updated with a "two-stream" formulation, the two streams being the Query stream and the Content stream: the Query stream can see the position information of the current word but not its content information, while the Content stream can see both the content information and the position information of the current word. In the two-stream update, g is the Query hidden state, h is the Content hidden state, m is the layer index of XLNet, Q is the query vector (Query), K is the key vector (Key), V is the content vector (Value), and Q, K, and V are obtained from the corresponding matrices through linear projections.
  8. The method for generating news text based on text style according to claim 7, wherein a false-news detection system is designed and implemented based on the false-news detection model; the system is developed locally under the Windows 10 operating system, using Python as the system development language and PyTorch 1.12.1 as the neural-network framework.
  9. The text-style-based news text generation method of claim 8, wherein the false-news detection system comprises: a data crawling module, which acquires news data and news comment data from social platforms through a crawler framework and stores the news data in a database, the news data comprising news titles, news release times, news contents, the comments corresponding to each news item, and comment times; a false-news detection module, which obtains text features carrying propaganda sentences by incorporating the already-trained XLNet, then obtains higher-level text features through a bidirectional long short-term memory network, assigns different weights through a self-attention mechanism, and finally outputs a detection result; and an analysis-result visualization module.
  10. The text-style-based news text generation method of claim 8, wherein the false-news detection system further comprises: a user management sub-module, in which users are divided into administrator accounts and common accounts, a common account being able to execute basic operations, while an administrator account, set as an internal account, can operate all accounts; after a user registers, the user's related information is stored in a database table; and a detection-record module, which mainly stores news detection records, the detection-record information comprising news headlines, detection dates, news release times, and detection results.
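The attention step running through claims 4, 5, and 9 (XLNet/BiLSTM features weighted by importance, then pooled for the real/fake classifier) can be sketched in a few lines. This is a minimal NumPy illustration, not the patent's implementation: the feature matrix `H` stands in for XLNet + BiLSTM outputs, and the scoring vector `w` and all dimensions are illustrative assumptions.

```python
import numpy as np

def attention_pool(H, w):
    """H: (tokens, features) per-token features; w: (features,) scoring vector.
    Returns the softmax attention weights and the weighted document vector."""
    scores = H @ w                           # one importance score per token
    weights = np.exp(scores - scores.max())  # stable softmax over tokens
    weights /= weights.sum()                 # weights sum to 1
    return weights, weights @ H              # weighted sum -> pooled vector

rng = np.random.default_rng(1)
H = rng.normal(size=(6, 16))                 # 6 tokens, 16-dim features (assumed)
w = rng.normal(size=16)
alpha, doc_vec = attention_pool(H, w)
print(alpha.shape, doc_vec.shape)
```

The pooled `doc_vec` is what a fully connected layer, as in claim 5, would consume for the final authenticity decision.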

Description

News text generation method based on text style

Technical Field

The invention relates to the technical field of false-news text research, in particular to a news text generation method based on text style.

Background

With the diversification of social media and the development of network technology, smartphones and social media platforms are growing exponentially; at the same time, false information floods the entire network, causing adverse effects on individuals and society, so curbing the propagation of false information has become an urgent need. Some machine learning approaches, while exhibiting superior discrimination performance, are significantly deficient in extracting text-style features and latent intent. Through retrieval, the patent with Chinese patent application number CN202311235587.X discloses a method for simultaneously detecting malicious comments and false news: a set of interpretable parameters potentially effective for single malicious-comment or false-news detection and a corresponding set of classification models are selected and combined pairwise, training and verification are carried out on a combined set C formed from an existing malicious-comment data set A and a false-news data set B, and the results are screened with the evaluation index (evaluation value = accuracy + precision + recall + F1) to obtain the optimal combination of interpretable parameter and classification model. The detection method in that patent has the defect that text-style features and latent intent cannot be extracted well, so the detection method needs improvement.

Disclosure of Invention

The invention aims to remedy the defects in the prior art and provides a news text generation method based on text style.
In order to achieve the above purpose, the present invention adopts the following technical scheme. A news text generation method based on text style includes: S1, using a seq2seq model with a loaded-language technique to replace key sentences of real news, thereby obtaining trusted but false information; and S2, performing feature analysis and detection on the generated text using a pre-trained language model. In S1, the seq2seq model comprises two parts, an encoder and a decoder, both built from RNN neural networks. The encoder's main function is to encode a variable-length input into a fixed-size feature vector c and pass it to the decoder; the decoder's main function is to use the feature vector c to output a variable-length sequence token by token. For a text generation task, given an input text sequence X = [x1, x2, …, xm] and a target text sequence Y = [y1, y2, …, yn], the training goal of the seq2seq-based text generation model is to maximize the probability of generating the text sequence Y given the input text sequence X, i.e., p(y1, y2, …, yn | x1, x2, …, xm). Preferably, the encoder encodes the input text sequence X using an RNN neural network f_enc(·) and takes the hidden-layer output h_m of the RNN at time step m as the feature vector c of the input sequence, specifically: c = h_m, with h_t = f_enc(h_{t−1}, e_{x_t}, θ_enc), where f_enc(·) is the RNN neural network with parameters θ_enc, and e_{x_t} is the word vector input at time t.
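The encoder described above (a variable-length input folded into a fixed-size feature vector c = h_m) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: a vanilla tanh RNN cell stands in for f_enc(·), and the weight matrices and dimensions are illustrative, not the patent's parameters.

```python
import numpy as np

def rnn_encoder(embeddings, W_h, W_x, b):
    """Run a vanilla RNN over a list of word vectors e_{x_1}..e_{x_m};
    return the final hidden state h_m, used as the feature vector c."""
    h = np.zeros(W_h.shape[0])
    for e_t in embeddings:                       # h_t = f_enc(h_{t-1}, e_{x_t}, theta_enc)
        h = np.tanh(W_h @ h + W_x @ e_t + b)
    return h                                     # c = h_m

rng = np.random.default_rng(0)
d_hid, d_emb = 8, 4                              # illustrative dimensions
W_h = rng.normal(size=(d_hid, d_hid)) * 0.1
W_x = rng.normal(size=(d_hid, d_emb)) * 0.1
b = np.zeros(d_hid)

seq = [rng.normal(size=d_emb) for _ in range(5)]  # m = 5 input word vectors
c = rnn_encoder(seq, W_h, W_x, b)
print(c.shape)
```

Note that `c` has the same fixed shape no matter how long the input sequence is, which is exactly what lets the decoder be initialized uniformly.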
Preferably, the decoder uses another RNN neural network f_dec(·), the feature vector c, and a feed-forward neural network g(·) to generate the target text sequence Y step by step. The decoder first initializes f_dec(·) with the feature vector c and then uses the output y_{t−1} at time t−1 to generate the output y_t at time t. Let s_t denote the hidden-layer output of f_dec(·), and let o_t ∈ (0, 1)^{|V|} be the posterior probability over all words in the vocabulary: s_0 = c; s_t = f_dec(s_{t−1}, e_{y_{t−1}}, θ_dec); o_t = g(s_t, θ_0); y_t = arg max o_t, where f_dec(·) is the decoding recurrent neural network, g(·) is a feed-forward neural network whose activation function is softmax, θ_dec and θ_0 are the parameters of f_dec(·) and g(·) respectively, e_{y_{t−1}} is the word vector of y_{t−1}, and y_0 and y_n denote the special start token <SOS> and the special end token <EOS>. In step S2, the generated text undergoes feature analysis and detection as follows: the pre-trained language model XLNet is used to acquire word vectors carrying propaganda-skill styles, deep semantic information is acquired through a BiLSTM (bidirectional long short-term memory) network, and finally an attention mechanism assigns different feature weights according to the importance of the features, so that text authenticity is detected. In step S2, the generated text undergoes feature analysis and detection through a constructed false-news detection model, which comprises a fully connected layer, an Attention layer, a BiLSTM layer, an XLNet layer, and an Input layer.
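The decoder loop above (s_0 = c, recurrent update, softmax posterior, greedy arg max, stop at <EOS>) can be sketched in the same spirit. This is a minimal NumPy sketch under stated assumptions: a tanh cell stands in for f_dec(·), a single linear-plus-softmax layer stands in for g(·), and the tiny vocabulary, embedding table, and weights are all illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                          # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def greedy_decode(c, E, W_s, W_e, W_o, sos_id, eos_id, max_len=10):
    """Generate token ids greedily: s_0 = c, y_0 = <SOS>, stop at <EOS>."""
    s = c                                    # s_0 = c initializes f_dec
    y = sos_id                               # y_0 = <SOS>
    out = []
    for _ in range(max_len):
        s = np.tanh(W_s @ s + W_e @ E[y])    # s_t = f_dec(s_{t-1}, e_{y_{t-1}})
        o = softmax(W_o @ s)                 # o_t: posterior over the vocabulary
        y = int(np.argmax(o))                # y_t = arg max o_t
        if y == eos_id:                      # y_n = <EOS> ends generation
            break
        out.append(y)
    return out

rng = np.random.default_rng(0)
V, d = 6, 8                                  # illustrative vocab size and width
E = rng.normal(size=(V, d)) * 0.1            # word-embedding table (assumed)
W_s = rng.normal(size=(d, d)) * 0.1
W_e = rng.normal(size=(d, d)) * 0.1
W_o = rng.normal(size=(V, d)) * 0.1
c = rng.normal(size=d)                       # feature vector from the encoder
tokens = greedy_decode(c, E, W_s, W_e, W_o, sos_id=0, eos_id=1)
print(tokens)
```

A trained system would replace the greedy arg max with beam search or sampling, but the claim as written specifies y_t = arg max o_t.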