CN-121996557-A - Method, apparatus, device, medium and program product for detecting output text of large model
Abstract
The application provides a method, an apparatus, a device, a medium and a program product for detecting output text of a large model, which can be applied to the technical fields of artificial intelligence and financial technology. The method comprises: obtaining a text to be tested output by a large model, wherein the large model carries out a generation task according to input information of a user; matching the text to be tested with a preset dictionary tree to obtain a matching result, wherein the nodes of the preset dictionary tree respectively represent a plurality of preset characters related to preset abnormal words, the edge relationships among the nodes represent the positional relationships among the preset characters in a preset character string, and the preset character string represents a preset abnormal word; and determining, according to the matching result, abnormal words with semantic violation risks from the text to be tested.
Inventors
- CAO ZHE
Assignees
- 中国工商银行股份有限公司
Dates
- Publication Date: 20260508
- Application Date: 20260126
Claims (12)
- 1. A method of detecting output text of a large model, the method comprising: obtaining a text to be tested output by a large model, wherein the large model executes a generation task according to input information of a user to obtain the text to be tested; matching the text to be tested with a preset dictionary tree to obtain a matching result, wherein a plurality of nodes of the preset dictionary tree respectively represent a plurality of preset characters related to preset abnormal words, the edge relationships among the plurality of nodes represent the positional relationships among the plurality of preset characters in a preset character string, and the preset character string represents a preset abnormal word; and determining, according to the matching result, abnormal words with semantic violation risks from the text to be tested.
- 2. The method of claim 1, wherein matching the text to be tested with the preset dictionary tree to obtain the matching result comprises: matching a plurality of characters to be tested of the text to be tested with the plurality of nodes of the preset dictionary tree to obtain a path matching result, wherein the path matching result indicates that a target preset character string represented by a target path in the preset dictionary tree matches a target character string to be tested in the text to be tested, and the target path comprises the plurality of preset characters in the target preset character string and the positional relationships among those preset characters.
- 3. The method according to claim 2, wherein matching the plurality of characters to be tested of the text to be tested with the plurality of nodes of the preset dictionary tree to obtain the path matching result comprises: matching the (n-1)th character to be tested with an (n-1)th node corresponding to the (n-1)th character to be tested to obtain a character matching result of the (n-1)th character to be tested, wherein n is an integer greater than 1; in a case where the character matching result of the (n-1)th character to be tested indicates that the (n-1)th character to be tested matches the (n-1)th node, determining an nth node having an edge relationship with the (n-1)th node, wherein the hierarchical level of the nth node is lower than that of the (n-1)th node; and matching the nth node with the nth character to be tested to obtain a character matching result of the nth character to be tested.
- 4. The method of claim 3, wherein matching the plurality of characters to be tested of the text to be tested with the plurality of nodes of the preset dictionary tree to obtain the path matching result further comprises: in a case where the character matching result of the (n-1)th character to be tested indicates that the (n-1)th character to be tested does not match the (n-1)th node, determining a node combination according to an (n-1)th associated node having an edge relationship with the (n-1)th node; determining, from the preset dictionary tree, an associated node combination matching the node combination, wherein the field represented by the associated node combination is the same as the field represented by the node combination; and matching the (n-1)th character to be tested with the (n-1)th associated node corresponding to the associated node combination to obtain a target character matching result indicating a match.
- 5. The method according to claim 1, wherein the method further comprises: performing word segmentation on the text to be tested to obtain a plurality of words to be tested; performing hash calculation on the plurality of words to be tested using a plurality of hash functions, respectively, to obtain respective hash values to be tested of the plurality of words to be tested; performing hash calculation on a plurality of preset abnormal words using the plurality of hash functions, respectively, to obtain respective preset hash values of the preset abnormal words; matching the hash values to be tested with the preset hash values to obtain a hash value matching result; and taking a hash value to be tested that the hash value matching result indicates is consistent with a preset hash value as a target hash value, and updating the text to be tested based on the word to be tested indicated by the target hash value to obtain an updated text to be tested; wherein matching the text to be tested with the preset dictionary tree to obtain the matching result comprises: matching the updated text to be tested with the preset dictionary tree to obtain the matching result.
- 6. The method according to claim 1, wherein the method further comprises: determining, from a plurality of threads, a target thread matched with the text quantity of the text to be tested according to the number of idle threads in a thread pool, wherein the target thread is used for matching the text to be tested with the preset dictionary tree to obtain the matching result.
- 7. The method of claim 6, wherein determining, from the plurality of threads, the target thread matched with the text quantity of the text to be tested according to the number of idle threads in the thread pool comprises: in a case where the number of idle threads meets a preset quantity threshold, determining a plurality of target threads matched with the text quantity from the plurality of idle threads; or, in a case where the number of idle threads does not meet the preset quantity threshold, creating a specified number of new idle threads in the thread pool based on the text quantity of the text to be tested, and determining the plurality of target threads from the idle threads currently in the thread pool.
- 8. The method according to claim 1, wherein the method further comprises: for a plurality of pieces of input information of the large model, determining abnormal input information among the plurality of pieces of input information according to the abnormal words of the texts to be tested corresponding to the plurality of pieces of input information; determining an evaluation result of the large model according to the quantity of the input information and the quantity of the abnormal input information; and optimizing the large model according to the evaluation result of the large model to obtain an optimized large model.
- 9. An apparatus for detecting output text of a large model, the apparatus comprising: an acquisition module configured to obtain a text to be tested output by a large model, wherein the large model executes a generation task according to input information of a user to obtain the text to be tested; a matching module configured to match the text to be tested with a preset dictionary tree to obtain a matching result, wherein a plurality of nodes of the preset dictionary tree respectively represent a plurality of preset characters related to preset abnormal words, the edge relationships among the plurality of nodes represent the positional relationships among the plurality of preset characters in a preset character string, and the preset character string represents a preset abnormal word; and a determining module configured to determine, according to the matching result, abnormal words with semantic violation risks from the text to be tested.
- 10. An electronic device, comprising: one or more processors; and a memory for storing one or more computer programs, characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1-8.
- 11. A computer-readable storage medium on which a computer program or instructions are stored, wherein the computer program or instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-8.
- 12. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 8.
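The hash pre-filtering of claim 5 could be sketched roughly as follows. This is an illustrative reading only, not the applicant's implementation: whitespace splitting stands in for word segmentation, SHA-256 and BLAKE2b stand in for the unspecified "plurality of hash functions", and "updating the text" is assumed to mean keeping only the words whose hash values match a preset abnormal word. The names `signature` and `prefilter` are hypothetical.

```python
import hashlib
from typing import Iterable, Tuple


def _sha256(word: str) -> str:
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def _blake2b(word: str) -> str:
    return hashlib.blake2b(word.encode("utf-8")).hexdigest()


# Assumption: two standard-library digests play the role of the hash functions.
HASH_FUNCTIONS = (_sha256, _blake2b)


def signature(word: str) -> Tuple[str, ...]:
    """Hash one word with every hash function."""
    return tuple(fn(word) for fn in HASH_FUNCTIONS)


def prefilter(text: str, abnormal_words: Iterable[str]) -> str:
    """Return an updated text containing only words whose hashes match a preset word."""
    preset_hashes = {signature(w) for w in abnormal_words}
    # Assumption: whitespace splitting replaces a real word segmenter.
    words_to_test = text.split()
    kept = [w for w in words_to_test if signature(w) in preset_hashes]
    return " ".join(kept)
```

Under this reading, the shortened text returned by `prefilter` is what would then be matched against the preset dictionary tree, so the tree only has to examine candidate words.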
Description
Method, apparatus, device, medium and program product for detecting output text of large model
Technical Field
The present application relates to the fields of artificial intelligence and financial technology, and more particularly to a method, apparatus, device, medium and program product for detecting output text of a large model.
Background
Large language models, by virtue of their powerful natural language understanding and generation capabilities, show great application potential in numerous fields. Before a large language model is put into use, it needs to be tested to prevent it from outputting abnormal results. In the related art, testing a large language model consumes excessive human resources, testing efficiency is low when massive data is involved, and testing accuracy is difficult to improve.
Disclosure of Invention
In view of the foregoing, the present application provides a method, apparatus, device, medium, and program product for detecting output text of a large model.
According to a first aspect of the application, a method for detecting output text of a large model is provided, comprising: obtaining a text to be tested output by the large model, wherein the large model executes a generation task according to input information of a user to obtain the text to be tested; matching the text to be tested with a preset dictionary tree to obtain a matching result, wherein a plurality of nodes of the preset dictionary tree respectively represent a plurality of preset characters related to preset abnormal words, the edge relationships among the plurality of nodes represent the positional relationships among the plurality of preset characters in a preset character string, and the preset character string represents a preset abnormal word; and determining, according to the matching result, abnormal words with semantic violation risks from the text to be tested.
According to an embodiment of the application, matching the text to be tested with the preset dictionary tree to obtain the matching result comprises: matching a plurality of characters to be tested of the text to be tested with the plurality of nodes of the preset dictionary tree to obtain a path matching result, wherein the path matching result indicates that a target preset character string represented by a target path in the preset dictionary tree matches a target character string to be tested in the text to be tested, and the target path comprises the plurality of preset characters in the target preset character string and the positional relationships among those preset characters.
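A minimal sketch of the dictionary tree (trie) matching described above, assuming each node holds one character, each edge links consecutive characters of a preset abnormal word, and a flag marks where a word ends. The names `TrieNode`, `build_trie` and `find_abnormal_words` are hypothetical, and the placeholder words in the usage note are not from the application.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One node of the dictionary tree; each outgoing edge is labelled by one character."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word_end = False  # True if the path from the root spells a preset word


def build_trie(abnormal_words: List[str]) -> TrieNode:
    """Insert every preset abnormal word, character by character."""
    root = TrieNode()
    for word in abnormal_words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word_end = True
    return root


def find_abnormal_words(text: str, root: TrieNode) -> List[Tuple[int, str]]:
    """Return (start index, matched word) for every preset word found in the text."""
    hits: List[Tuple[int, str]] = []
    for start in range(len(text)):
        node = root
        for end in range(start, len(text)):
            node = node.children.get(text[end])
            if node is None:
                break  # no edge for this character: the path ends here
            if node.is_word_end:
                hits.append((start, text[start:end + 1]))
    return hits
```

For example, with the placeholder words `["abc", "bcd", "ab"]`, scanning `"xabcd"` reports `"ab"` and `"abc"` starting at index 1 and `"bcd"` starting at index 2. Each reported hit corresponds to one root-to-flag path, which is the "target path" in the text's terminology.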
According to an embodiment of the application, matching the plurality of characters to be tested of the text to be tested with the plurality of nodes of the preset dictionary tree to obtain the path matching result comprises: matching the (n-1)th character to be tested with an (n-1)th node corresponding to the (n-1)th character to be tested to obtain a character matching result of the (n-1)th character to be tested; in a case where the character matching result of the (n-1)th character to be tested indicates that the (n-1)th character to be tested matches the (n-1)th node, determining an nth node having an edge relationship with the (n-1)th node, wherein the hierarchical level of the nth node is lower than that of the (n-1)th node; and matching the nth node with the nth character to be tested to obtain a character matching result of the nth character to be tested.
According to an embodiment of the application, matching the plurality of characters to be tested of the text to be tested with the plurality of nodes of the preset dictionary tree to obtain the path matching result further comprises: in a case where the character matching result of the (n-1)th character to be tested indicates that the (n-1)th character to be tested does not match the (n-1)th node, determining a node combination according to an (n-1)th associated node having an edge relationship with the (n-1)th node; determining, from the preset dictionary tree, an associated node combination matching the node combination, wherein the field represented by the associated node combination is the same as the field represented by the node combination; and matching the (n-1)th character to be tested with the (n-1)th associated node corresponding to the associated node combination to obtain a target character matching result indicating a match.
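One well-known way to continue matching after a mismatch without rescanning the text is the Aho-Corasick automaton, in which each node carries a fallback link to another node in the tree that represents the same trailing characters. Whether the "associated node combination" above corresponds to such links is an assumption; the sketch below shows that swapped-in technique only, with hypothetical names `build_automaton` and `scan`.

```python
from collections import deque
from typing import Dict, List, Tuple


def build_automaton(words: List[str]):
    """Build trie edges, fallback links and per-node match lists (node 0 is the root)."""
    goto: List[Dict[str, int]] = [{}]
    fail: List[int] = [0]
    out: List[List[str]] = [[]]

    # Step 1: insert each word; a matched character moves one level down the tree.
    for word in words:
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[node][ch] = len(goto) - 1
            node = goto[node][ch]
        out[node].append(word)

    # Step 2: breadth-first pass computing, for each node, the node that
    # represents the longest proper suffix of its path that is also in the tree.
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def scan(text: str, goto, fail, out) -> List[Tuple[int, str]]:
    """Return (start index, word) for every preset word occurring in the text."""
    hits: List[Tuple[int, str]] = []
    node = 0
    for i, ch in enumerate(text):
        # On a mismatch, follow fallback links instead of restarting from the root.
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            hits.append((i - len(word) + 1, word))
    return hits
```

With the placeholder words `["abcd", "bce"]` and the text `"abce"`, the scan follows `a`, `b`, `c` down the first path, fails on `e`, jumps to the node for `bc` on the second path, and then reports `"bce"` at index 1.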
According to the embodiment of the application, the method further comprises the steps of segmenting a text to be tested to obtain a plurality of words to be tested, respectively carrying out hash calculation on the plurality of words to be tested by utilizing a plurality of hash functions to obtain respective hash values to be tested of the plurality of words to be tested, respectively carrying