CN-122021616-A - Chat data processing method and device based on cooperation of dynamic fragmentation and multiple agents

Abstract

The invention discloses a chat data processing method and device based on the cooperation of dynamic fragmentation and multiple agents. The method comprises: obtaining original chat data; searching and segmenting the original chat data based on keywords to obtain a plurality of fragments and constructing a fragment set, wherein each fragment consists of the chat text containing a keyword found in the original chat data together with the N chat texts before and after it; calculating the weight of each fragment in the fragment set according to the basic weight corresponding to each keyword, and screening out Q important fragments; inputting each of the Q important fragments into the agent of each dimension to obtain the label of the corresponding dimension and its confidence, and constructing a confidence matrix; and calculating the final confidence of each label based on the confidence matrix, and screening out the final label corresponding to each of the Q important fragments. The invention addresses the problems of fragmented labels, insufficient cooperation among agents, and low credibility.
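
The fragmentation step described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Fragment` and `build_fragments`, the use of plain substring matching, and the reading of "N chat texts before and after" as N messages on each side are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    keyword: str      # keyword whose hit produced this fragment
    hit_index: int    # index of the message that contains the keyword
    messages: tuple   # the hit message plus up to N neighbours on each side


def build_fragments(chat, keywords, n):
    """Return one fragment per (keyword, matching message) pair."""
    fragments = []
    for keyword in keywords:                      # traverse each keyword
        for i, message in enumerate(chat):
            if keyword in message:                # substring search (assumption)
                start = max(0, i - n)             # clamp at the start of the chat
                end = min(len(chat), i + n + 1)   # clamp at the end of the chat
                fragments.append(Fragment(keyword, i, tuple(chat[start:end])))
    return fragments


# Hypothetical chat log and keywords, for illustration only.
chat = [
    "hi, are you there?",
    "yes, what is up",
    "can you do the transfer today",
    "sure, send me the account",
    "ok, sending it now",
]
fragments = build_fragments(chat, ["transfer", "account"], n=1)
for fragment in fragments:
    print(fragment.keyword, fragment.hit_index, fragment.messages)
```

Because the window is centred on each keyword hit rather than cut at fixed positions, neighbouring hits can yield overlapping fragments; the sketch keeps them all and leaves selection to the later weighting step.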

Inventors

  • WANG DELIANG
  • WEN RUOHUI
  • CHEN JUNSHAN
  • SU ZAITIAN
  • CHEN YUN

Assignees

  • 厦门市美亚柏科信息安全研究所有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-18

Claims (10)

  1. A chat data processing method based on the cooperation of dynamic fragmentation and multiple agents, characterized by comprising the following steps: constructing agents of a plurality of dimensions based on a large language model, wherein the agent of each dimension can predict a label of the corresponding dimension and its confidence based on the input text; constructing a keyword library corresponding to the labels, wherein the keyword library comprises the agents of the corresponding dimensions, the keywords, and the basic weight corresponding to each keyword; acquiring original chat data, traversing each keyword, searching and segmenting the original chat data based on the keywords to obtain a plurality of fragments, and constructing a fragment set, wherein each fragment consists of the chat text containing the keyword found in the original chat data together with the N chat texts before and after it; calculating the weight of each fragment in the fragment set according to the basic weight corresponding to each keyword, and screening out Q important fragments according to the weight of each fragment in the fragment set; inputting each of the Q important fragments into the agent of each dimension, obtaining the label of the corresponding dimension and its confidence, and constructing a confidence matrix; and calculating the final confidence of each label based on the confidence matrix, and screening out the final label corresponding to each of the Q important fragments.
  2. The chat data processing method based on the cooperation of dynamic fragmentation and multiple agents according to claim 1, wherein the agents of the plurality of dimensions comprise a role recognition agent, a behavior analysis agent, and an intention recognition agent, and the corresponding labels comprise a role label, a behavior label, and an intention label, each comprising at least one category.
  3. The chat data processing method based on the cooperation of dynamic fragmentation and multiple agents according to claim 1, wherein calculating the weight of each fragment in the fragment set according to the basic weight corresponding to each keyword specifically comprises: counting the number of times a single keyword in the keyword library appears in the fragment set and the total number of occurrences of all keywords in the fragment set, and calculating the word frequency weight of the single keyword, as shown in the following formula: [formula not preserved in source]; counting the number of fragments in the fragment set in which the single keyword is found and the total number of fragments in the fragment set in which all keywords are found, and calculating the distribution weight of the single keyword, as shown in the following formula: [formula not preserved in source]; and calculating the final weight of the single keyword based on its word frequency weight, its distribution weight, and its basic weight, as shown in the following formula: [formula not preserved in source]; wherein the terms of the formula are the final weight of the single keyword, the basic weight of the single keyword, and the weight coefficients corresponding to the word frequency weight and the distribution weight, respectively.
  4. The chat data processing method based on the cooperation of dynamic fragmentation and multiple agents according to claim 1, wherein screening out Q important fragments according to the weight of each fragment in the fragment set specifically comprises: calculating the sum of the final weights of all keywords appearing in each fragment in the fragment set to obtain the weight of each fragment in the fragment set; and sorting the weights of the fragments in the fragment set from large to small, and screening out the Q top-ranked fragments as the important fragments.
  5. The chat data processing method based on the cooperation of dynamic fragmentation and multiple agents according to claim 1, wherein, in the confidence matrix corresponding to each of the Q important fragments, the rows are the dimensions of the agents, the columns are all the labels, and the elements are the confidence of each label in each dimension; if a label does not occur in a dimension, the corresponding confidence is 0.
  6. The chat data processing method based on the cooperation of dynamic fragmentation and multiple agents according to claim 5, wherein calculating the final confidence of each label based on the confidence matrix and screening out the final label corresponding to each of the Q important fragments specifically comprises: obtaining the weight of the agent of each dimension, and weighting the confidence of each label in each dimension in the confidence matrix corresponding to each of the Q important fragments to obtain the final confidence of each label corresponding to each of the Q important fragments, as shown in the following formula: [formula not preserved in source]; wherein the terms of the formula are the final confidence of each label, the weight of the agent of each dimension, the total number of dimensions M, and the confidence of each label in each dimension; and taking the label with the maximum final confidence among all labels corresponding to each of the Q important fragments as the final label corresponding to that fragment.
  7. A chat data processing device based on the cooperation of dynamic fragmentation and multiple agents, comprising: a multi-agent construction module configured to construct agents of a plurality of dimensions based on a large language model, wherein the agent of each dimension can predict a label of the corresponding dimension and its confidence based on the input text; a keyword library construction module configured to construct a keyword library corresponding to the labels, wherein the keyword library comprises the agents of the corresponding dimensions, the keywords, and the basic weight corresponding to each keyword; a fragment screening module configured to acquire original chat data, traverse each keyword, search and segment the original chat data based on the keywords to obtain a plurality of fragments, and construct a fragment set, wherein each fragment consists of the chat text containing the keyword found in the original chat data together with the N chat texts before and after it, and further configured to calculate the weight of each fragment in the fragment set according to the basic weight corresponding to each keyword and screen out Q important fragments according to the weight of each fragment in the fragment set; and a label screening module configured to input each of the Q important fragments into the agent of each dimension, obtain the label of the corresponding dimension and its confidence, construct a confidence matrix, calculate the final confidence of each label based on the confidence matrix, and screen out the final label corresponding to each of the Q important fragments.
  8. An electronic device, comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
  9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-6.
  10. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
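
The keyword weighting of claim 3 and the fragment screening of claim 4 can be sketched as follows. The formulas themselves are not preserved in this text, so everything numeric here is an assumption: the word frequency weight is taken as a simple ratio of occurrence counts, the distribution weight as the share of fragments containing the keyword, the total fragment count as the size of the fragment set, and the final weight as `base * (alpha * tf + beta * df)`. The function names and the coefficients `alpha` and `beta` are hypothetical.

```python
def keyword_weights(fragments, base_weights, alpha=0.5, beta=0.5):
    """fragments: list of fragment texts; base_weights: {keyword: basic weight}."""
    # Occurrences of each keyword across the whole fragment set.
    counts = {k: sum(text.count(k) for text in fragments) for k in base_weights}
    # Number of fragments in which each keyword is found.
    hits = {k: sum(1 for text in fragments if k in text) for k in base_weights}
    total_count = sum(counts.values())
    total_fragments = len(fragments)   # assumed reading of the "total number of fragments"

    weights = {}
    for keyword, base in base_weights.items():
        tf = counts[keyword] / total_count if total_count else 0.0        # assumed ratio
        df = hits[keyword] / total_fragments if total_fragments else 0.0  # assumed ratio
        weights[keyword] = base * (alpha * tf + beta * df)                # assumed combination
    return weights


def top_q_fragments(fragments, weights, q):
    """Score each fragment by the summed final weights of the keywords it contains."""
    def score(text):
        return sum(w for keyword, w in weights.items() if keyword in text)
    # Sort from large to small and keep the Q top-ranked fragments.
    return sorted(fragments, key=score, reverse=True)[:q]


# Hypothetical fragments and keyword library, for illustration only.
fragments = ["transfer to account", "account ready", "transfer transfer now"]
base_weights = {"transfer": 1.0, "account": 2.0}
weights = keyword_weights(fragments, base_weights)
important = top_q_fragments(fragments, weights, q=2)
print(weights)
print(important)
```

Under these assumptions a keyword gains weight both by occurring often and by being spread across many fragments, while the basic weight from the keyword library scales the result.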

Description

Chat data processing method and device based on cooperation of dynamic fragmentation and multiple agents

Technical Field

The invention relates to the field of data processing, and in particular to a chat data processing method and device based on the cooperation of dynamic fragmentation and multiple agents.

Background

In the forensics industry, there is a growing need to analyze and process chat content. Group chat and private chat records may contain key information such as case-related labels, role labels, behavior patterns, and psychological characteristics. Traditional data processing methods, such as static fragmentation and single-agent analysis, struggle to meet increasingly complex and large-scale data processing requirements.

At present, the forensics industry mainly uses the following methods to process group chat and private chat data:

  1. Static fragmentation: the chat content is split by a fixed length or a fixed time window, and each fragment is then analyzed independently. This method is simple to implement, but it tends to omit information or introduce redundancy when the chat content changes dynamically.
  2. Single-agent analysis: a single agent performs label analysis on the chat content, for example for case-related labels. This method works well on a single type of data, but when handling labels of multiple types and dimensions it tends to produce label conflicts and inaccurate or incomplete results.

Disclosure of Invention

The application aims to provide a chat data processing method and device based on the cooperation of dynamic fragmentation and multiple agents.
In a first aspect, the present invention provides a chat data processing method based on the cooperation of dynamic fragmentation and multiple agents, comprising the following steps: constructing agents of a plurality of dimensions based on a large language model, wherein the agent of each dimension can predict a label of the corresponding dimension and its confidence based on the input text; constructing a keyword library corresponding to the labels, wherein the keyword library comprises the agents of the corresponding dimensions, the keywords, and the basic weight corresponding to each keyword; acquiring original chat data, traversing each keyword, searching and segmenting the original chat data based on the keywords to obtain a plurality of fragments, and constructing a fragment set, wherein each fragment consists of the chat text containing the keyword found in the original chat data together with the N chat texts before and after it; calculating the weight of each fragment in the fragment set according to the basic weight corresponding to each keyword, and screening out Q important fragments according to the weight of each fragment in the fragment set; inputting each of the Q important fragments into the agent of each dimension to obtain the label of the corresponding dimension and its confidence, and constructing a confidence matrix; and calculating the final confidence of each label based on the confidence matrix, and screening out the final label corresponding to each of the Q important fragments.

Preferably, the agents of the plurality of dimensions comprise a role recognition agent, a behavior analysis agent, and an intention recognition agent, and the corresponding labels comprise a role label, a behavior label, and an intention label, each comprising at least one category.
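
The confidence matrix and final label selection in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the agents are replaced by fixed hypothetical predictions rather than large language model calls, the final confidence of a label is assumed to be the agent-weighted sum of its column, and the dimension names, label names, and agent weights are invented for the example.

```python
def build_confidence_matrix(predictions, dimensions, labels):
    """Rows are agent dimensions, columns are all labels, missing entries are 0."""
    matrix = []
    for dimension in dimensions:
        row = [0.0] * len(labels)
        if dimension in predictions:
            label, confidence = predictions[dimension]
            row[labels.index(label)] = confidence
        matrix.append(row)
    return matrix


def fuse(matrix, agent_weights, labels):
    """Weight each row by its agent weight, sum per column, and pick the maximum."""
    final = [
        sum(weight * row[j] for weight, row in zip(agent_weights, matrix))
        for j in range(len(labels))
    ]
    best = max(range(len(labels)), key=final.__getitem__)
    return labels[best], final


# Hypothetical agent outputs for one important fragment.
dimensions = ["role", "behavior", "intention"]
labels = ["organizer", "money_transfer", "fraud_intent"]
predictions = {
    "role": ("organizer", 0.7),
    "behavior": ("money_transfer", 0.9),
    "intention": ("fraud_intent", 0.6),
}
agent_weights = [0.3, 0.5, 0.2]   # assumed per-dimension agent weights

matrix = build_confidence_matrix(predictions, dimensions, labels)
final_label, final_confidences = fuse(matrix, agent_weights, labels)
print(final_label, final_confidences)
```

In the sketch the fusion runs once per important fragment, so each of the Q fragments ends up with one final label.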
Preferably, calculating the weight of each fragment in the fragment set according to the basic weight corresponding to each keyword specifically comprises: counting the number of times a single keyword in the keyword library appears in the fragment set and the total number of occurrences of all keywords in the fragment set, and calculating the word frequency weight of the single keyword, as shown in the following formula: [formula not preserved in source]; counting the number of fragments in the fragment set in which the single keyword is found and the total number of fragments in the fragment set in which all keywords are found, and calculating the distribution weight of the single keyword, as shown in the following formula: [formula not preserved in source]; and calculating the final weight of the single keyword based on its word frequency weight, its distribution weight, and its basic weight, as shown in the following formula: [formula not preserved in source]; wherein the terms of the formula are the final weight of the single keyword, the basic weight of the single keyword, and the weight coefficients corresponding to the word frequency weight and the distribution weight, respectively. Preferably, screening out Q important fragments according to the weight of each fragment in the fragment set specifically comprises: calculating the sum of the final weights of all keywords appearing in each fragment in the fragment set to obtain the weight of each fragment in the fragment set; and sorting the weights of the fragments in the fragment set from large to small, and screening out Q fragments with the top ranking a