CN-116245109-B - Text processing method and device and electronic equipment

Abstract

The disclosure provides a text processing method and relates to technical fields such as content auditing and sensitive word matching. The method comprises: obtaining a text to be published and a reference word list comprising a plurality of sensitive words and a first weight corresponding to each sensitive word; traversing the text to be published based on the reference word list to determine a target sensitive word set contained in the text; when the number of target sensitive words in the set is smaller than a first threshold and the number of head target sensitive words is smaller than a second threshold, processing the text with a text processing model to determine the sensitivity probability of the text; and determining whether to block the text according to the first weight of each target sensitive word and the sensitivity probability. The reliability of text processing is thereby improved.

Inventors

  • ZHANG HUAZHENG
  • BAO CHENFU
  • WANG YANG
  • Lv Zhonghou
  • HUANG YINGREN
  • TIAN WEIJUAN
  • Gan Yixian
  • GAO MENGHAN

Assignees

  • Beijing Baidu Netcom Science and Technology Co., Ltd. (北京百度网讯科技有限公司)

Dates

Publication Date
20260505
Application Date
20221223

Claims (12)

  1. A text processing method, the method comprising:
     acquiring a text to be published and a reference word list, wherein the reference word list comprises a plurality of sensitive words and a first weight corresponding to each sensitive word;
     traversing the text to be published based on the reference word list, and determining a target sensitive word set contained in the text to be published;
     when the number of target sensitive words in the target sensitive word set is smaller than a first threshold and the number of head target sensitive words is smaller than a second threshold, processing the text to be published using a text processing model and determining a sensitivity probability of the text to be published, wherein the head target sensitive words are target sensitive words whose first weight is greater than a third threshold; and
     determining whether to block the text to be published according to the first weight of each target sensitive word and the sensitivity probability;
     wherein determining whether to block the text to be published according to the first weight of each target sensitive word and the sensitivity probability comprises:
     determining, as the coefficient of the first weight corresponding to each target sensitive word, the ratio of that first weight to the sum of the first weights corresponding to all target sensitive words in the target sensitive word set; correcting the sensitivity probability based on the sum of the products of the first weights and the coefficients of the target sensitive words; and blocking the text to be published when the corrected sensitivity probability is greater than a fourth threshold;
     or
     increasing the sensitivity probability when the maximum of the first weights corresponding to the target sensitive words is in a first preset range and the sensitivity probability is greater than a fifth threshold; decreasing the sensitivity probability when the maximum of the first weights corresponding to the target sensitive words is in a second preset range and the sensitivity probability is greater than the fifth threshold, wherein the minimum value of the first preset range is greater than or equal to the maximum value of the second preset range and the maximum value of the first preset range is smaller than the third threshold; and blocking the text to be published when the corrected sensitivity probability is greater than the fourth threshold.
  2. The method of claim 1, further comprising:
     acquiring an update request for the first weight of a sensitive word, wherein the update request comprises the sensitive word to be updated and the type of the sensitive word to be updated;
     when the type of the sensitive word to be updated is a missed-recall data type and the reference word list does not contain the sensitive word to be updated, adding the sensitive word to be updated to the reference word list and setting the first weight corresponding to the sensitive word to be updated to a default weight;
     when the type of the sensitive word to be updated is the missed-recall data type and the reference word list contains the sensitive word to be updated, increasing the first weight corresponding to the sensitive word to be updated in the reference word list; and
     when the type of the sensitive word to be updated is a false-recall data type and the reference word list contains the sensitive word to be updated, reducing the first weight corresponding to the sensitive word to be updated in the reference word list.
  3. The method of claim 2, wherein the update request further comprises a second weight corresponding to the sensitive word to be updated, and the first weight corresponding to the sensitive word to be updated in the reference word list is updated using the second weight.
  4. The method of claim 1, further comprising: deleting a sensitive word from the reference word list when both the first weight corresponding to that sensitive word and its updated first weight are smaller than a sixth threshold within a preset time period.
  5. The method of claim 1, further comprising: blocking the text to be published when the number of target sensitive words is greater than the first threshold or the number of head target sensitive words is greater than the second threshold.
  6. A text processing apparatus, the apparatus comprising:
     an acquisition module, configured to acquire a text to be published and a reference word list, wherein the reference word list comprises a plurality of sensitive words and a first weight corresponding to each sensitive word;
     a determining module, configured to traverse the text to be published based on the reference word list and determine a target sensitive word set contained in the text to be published;
     a prediction module, configured to process the text to be published and determine a sensitivity probability of the text to be published when the number of target sensitive words in the target sensitive word set is smaller than a first threshold and the number of head target sensitive words is smaller than a second threshold, wherein the head target sensitive words are target sensitive words whose first weight is greater than a third threshold; and
     a processing module, configured to determine whether to block the text to be published according to the first weight of each target sensitive word and the sensitivity probability;
     wherein the processing module is specifically configured to:
     determine, as the coefficient of the first weight corresponding to each target sensitive word, the ratio of that first weight to the sum of the first weights corresponding to all target sensitive words in the target sensitive word set; correct the sensitivity probability based on the sum of the products of the first weights and the coefficients of the target sensitive words; and block the text to be published when the corrected sensitivity probability is greater than a fourth threshold;
     or
     increase the sensitivity probability when the maximum of the first weights corresponding to the target sensitive words is in a first preset range and the sensitivity probability is greater than a fifth threshold; decrease the sensitivity probability when the maximum of the first weights corresponding to the target sensitive words is in a second preset range and the sensitivity probability is greater than the fifth threshold, wherein the minimum value of the first preset range is greater than or equal to the maximum value of the second preset range and the maximum value of the first preset range is smaller than the third threshold; and block the text to be published when the corrected sensitivity probability is greater than the fourth threshold.
  7. The apparatus of claim 6, further comprising an update module configured to:
     acquire an update request for the first weight of a sensitive word, wherein the update request comprises the sensitive word to be updated and the type of the sensitive word to be updated;
     when the type of the sensitive word to be updated is a missed-recall data type and the reference word list does not contain the sensitive word to be updated, add the sensitive word to be updated to the reference word list and set the first weight corresponding to the sensitive word to be updated to a default weight;
     when the type of the sensitive word to be updated is the missed-recall data type and the reference word list contains the sensitive word to be updated, increase the first weight corresponding to the sensitive word to be updated in the reference word list; and
     when the type of the sensitive word to be updated is a false-recall data type and the reference word list contains the sensitive word to be updated, reduce the first weight corresponding to the sensitive word to be updated in the reference word list.
  8. The apparatus of claim 7, wherein the update request further comprises a second weight corresponding to the sensitive word to be updated, and the update module is further configured to update the first weight corresponding to the sensitive word to be updated in the reference word list using the second weight.
  9. The apparatus of claim 7, wherein the update module is further configured to: delete a sensitive word from the reference word list when both the first weight corresponding to that sensitive word and its updated first weight are smaller than a sixth threshold within a preset time period.
  10. The apparatus of claim 7, wherein the processing module is further configured to: block the text to be published when the number of target sensitive words is greater than the first threshold or the number of head target sensitive words is greater than the second threshold.
  11. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
  12. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
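
The word list maintenance recited in claims 2 and 4 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the names `apply_update` and `prune`, the labels `"missed"` and `"false_hit"` (one reading of the two update types), the default weight, the step size, and the clamping of weights to the range 0 to 1. The claims fix none of these values.

```python
from typing import Dict, List

DEFAULT_WEIGHT = 0.5  # ASSUMPTION: default weight for a newly added word
STEP = 0.1            # ASSUMPTION: size of each increase or decrease


def apply_update(word_list: Dict[str, float], word: str, kind: str) -> None:
    """Apply one update request to the word list in place.

    kind is "missed" (a sensitive text slipped through) or "false_hit"
    (a harmless text was flagged). Both labels are hypothetical.
    """
    if kind == "missed":
        if word not in word_list:
            # Unknown word reported as missed: add it with the default weight.
            word_list[word] = DEFAULT_WEIGHT
        else:
            # Known word reported as missed: raise its weight.
            word_list[word] = min(1.0, word_list[word] + STEP)
    elif kind == "false_hit" and word in word_list:
        # Known word that caused a false hit: lower its weight.
        word_list[word] = max(0.0, word_list[word] - STEP)


def prune(word_list: Dict[str, float],
          observed: Dict[str, List[float]],
          sixth_threshold: float) -> None:
    """Delete words whose weights stayed below the threshold all period.

    `observed` maps a word to the weights it held during the preset period
    (a hypothetical bookkeeping structure); the current weight is included.
    """
    for word in list(word_list):
        samples = observed.get(word, []) + [word_list[word]]
        if all(weight < sixth_threshold for weight in samples):
            del word_list[word]
```

A caller would run `apply_update` once per incoming request and call `prune` at the end of each period. Claim 3's second weight is left out because the claims do not say whether it replaces the first weight or adjusts it.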

Description

Text processing method and device and electronic equipment

Technical Field

The disclosure relates to the technical field of artificial intelligence, in particular to technical fields such as content risk control and sensitive word detection, and specifically to a text processing method, a text processing device and electronic equipment.

Background

With the advent of the internet era, massive network resources have made daily life, social communication, learning and work increasingly convenient. However, while people enjoy the convenience brought by the Internet, some use it to publish harmful information, which causes many adverse effects. Auditing and filtering content before it is published is therefore important.

Disclosure of Invention

The disclosure provides a text processing method, a text processing device and electronic equipment.

According to an aspect of the present disclosure, there is provided a text processing method including:
acquiring a text to be published and a reference word list, wherein the reference word list comprises a plurality of sensitive words and a first weight corresponding to each sensitive word;
traversing the text to be published based on the reference word list, and determining a target sensitive word set contained in the text to be published;
when the number of target sensitive words in the target sensitive word set is smaller than a first threshold and the number of head target sensitive words is smaller than a second threshold, processing the text to be published using a text processing model and determining the sensitivity probability of the text to be published, wherein the head target sensitive words are target sensitive words whose first weight is greater than a third threshold; and
determining whether to block the text to be published according to the first weight of each target sensitive word and the sensitivity probability.

According to another aspect of the present disclosure, there is provided a text processing apparatus including:
an acquisition module, configured to acquire a text to be published and a reference word list, wherein the reference word list comprises a plurality of sensitive words and a first weight corresponding to each sensitive word;
a determining module, configured to traverse the text to be published based on the reference word list and determine a target sensitive word set contained in the text to be published;
a prediction module, configured to process the text to be published using the text processing model and determine the sensitivity probability of the text to be published when the number of target sensitive words in the target sensitive word set is smaller than a first threshold and the number of head target sensitive words is smaller than a second threshold, wherein the head target sensitive words are target sensitive words whose first weight is greater than a third threshold; and
a processing module, configured to determine whether to block the text to be published according to the first weight of each target sensitive word and the sensitivity probability.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the above embodiments.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method according to the above-described embodiments.

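The steps of the method aspect can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `Thresholds`, `find_targets`, `weighted_sum` and `should_block` are hypothetical, the threshold values are invented, weights are assumed to lie between 0 and 1, the word matching is a naive substring scan, and `model` stands in for the text processing model as any callable that returns a probability. The weight coefficients follow the first alternative of claim 1, but the claims do not say how the weighted sum corrects the probability, so the sketch simply averages the two as a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Thresholds:
    # Illustrative values only; the source gives no concrete numbers.
    first: int = 5       # limit on the number of matched sensitive words
    second: int = 2      # limit on the number of head sensitive words
    third: float = 0.8   # a word is a "head" word if its weight exceeds this
    fourth: float = 0.6  # block when the corrected probability exceeds this


def find_targets(text: str, word_list: Dict[str, float]) -> List[str]:
    """Return every listed word that occurs in the text (naive substring scan)."""
    return [word for word in word_list if word in text]


def weighted_sum(targets: List[str], word_list: Dict[str, float]) -> float:
    """Sum of weight * coefficient, where coefficient = weight / total weight."""
    total = sum(word_list[w] for w in targets)
    if total == 0:
        return 0.0
    return sum(word_list[w] * (word_list[w] / total) for w in targets)


def should_block(text: str,
                 word_list: Dict[str, float],
                 model: Callable[[str], float],
                 t: Thresholds = Thresholds()) -> bool:
    targets = find_targets(text, word_list)
    heads = [w for w in targets if word_list[w] > t.third]

    # Too many matches: block without consulting the model (claim 5).
    # ASSUMPTION: counts exactly equal to a threshold fall through to the
    # model, since the source does not cover the equality case.
    if len(targets) > t.first or len(heads) > t.second:
        return True

    probability = model(text)
    if targets:
        # ASSUMPTION: averaging is a placeholder for the unspecified correction.
        probability = (probability + weighted_sum(targets, word_list)) / 2
    return probability > t.fourth
```

For example, with the placeholder list `{"foo": 0.9, "bar": 0.3, "baz": 0.5}` and a stub model `lambda _: 0.7`, the text `"a bar and a baz"` matches two words with a weighted sum of 0.425, and the averaged probability of 0.5625 stays under the assumed fourth threshold of 0.6. A production system would likely replace the substring scan with a multi-pattern matcher such as an Aho-Corasick automaton.
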
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic flow chart of a text processing method according to an embodiment of the disclosure;
FIG. 2 is a flow chart of another text processing method according to an embodiment of the disclosure;
FIG. 3 is a flowchart illustrating another text processing method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of another text processing method according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of another text processing device according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of an