CN-122019856-A - Text content rapid classification management method and system based on artificial intelligence
Abstract
The application relates to the technical field of artificial intelligence, and in particular to a text content rapid classification management method and system based on artificial intelligence. The method comprises: obtaining a text to be classified and performing multi-language compatible language unit segmentation; matching the resulting semantic units with a preset known semantic set to identify known semantic units and unknown semantic units; capturing context information of the unknown semantic units; extracting co-occurrence association features and text tendency features based on the context information to construct a behavior portrait; updating and adjusting the behavior portrait as newly added texts arrive; comparing the behavior portrait with a preset known semantic behavior portrait set to infer a preliminary meaning and related attributes and to generate a confidence; correcting the preliminary meaning and related attributes and adjusting the confidence when the unknown semantic unit reappears; integrating the meanings and related attributes of the known and unknown semantic units; and performing a classification decision, outputting a classification result, and generating an explanatory clue that indicates the semantic units whose contribution to the classification result is greater than a preset contribution threshold. The application addresses the prior-art problems of poor timeliness and classification accuracy in identifying sensitive information, continuously degrading system performance, and costly, lagging manual intervention.
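The first steps summarized above (segmentation, matching against a known semantic set, and context capture) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the application: the `KNOWN_SEMANTIC_SET` contents, the regular expression used for segmentation, the function names, and the two-unit context window. Treating each CJK character as its own unit is a deliberate simplification of real multi-language segmentation.

```python
import re

# Hypothetical known semantic set; a real system would load a curated lexicon.
KNOWN_SEMANTIC_SET = {"buy", "sell", "the", "before", "it", "币"}

# Assumption: Latin/digit runs form one unit, each CJK character is one unit.
# Real CJK segmentation would need a dictionary or a trained model.
_UNIT_PATTERN = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


def segment(text):
    """Split mixed-language text into lowercase language units."""
    return [unit.lower() for unit in _UNIT_PATTERN.findall(text)]


def split_known_unknown(units, known=KNOWN_SEMANTIC_SET, window=2):
    """Separate known units from unknown ones and capture each unknown
    unit's surrounding context (up to `window` units on each side)."""
    known_units = []
    unknown_contexts = {}
    for i, unit in enumerate(units):
        if unit in known:
            known_units.append(unit)
        else:
            left = units[max(0, i - window):i]
            right = units[i + 1:i + 1 + window]
            unknown_contexts.setdefault(unit, []).append(left + right)
    return known_units, unknown_contexts


units = segment("Buy the moonbag 币 before it dumps")
known_units, unknown_contexts = split_known_unknown(units)
print(known_units)
print(unknown_contexts)
```

In this sketch the unknown units are kept together with their context windows rather than discarded as noise, which is the property the later behavior-portrait steps rely on.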
Inventors
- FU FAN
Assignees
- 深圳火炎焱人工智能有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260130
Claims (10)
- 1. A text content rapid classification management method based on artificial intelligence, characterized by comprising the following steps: obtaining a text to be classified, performing multi-language compatible language unit segmentation on the text to be classified, matching the semantic units obtained by segmentation with a preset known semantic set to identify the known semantic units and unknown semantic units in the text to be classified, and capturing context information of the unknown semantic units in the text to be classified; extracting co-occurrence association features, grammar action features and text tendency features of the unknown semantic units based on the context information to construct a behavior portrait of the unknown semantic units, updating the context information when a new text containing the unknown semantic units is subsequently received, and adjusting the behavior portrait according to the updated context information; comparing the adjusted behavior portrait with a preset known semantic behavior portrait set, inferring a preliminary meaning and related attributes of the unknown semantic units according to the comparison result, and generating a corresponding confidence; when reappearance of the unknown semantic unit is detected in a newly added text, correcting the preliminary meaning and related attributes according to the adjusted behavior portrait to obtain the meaning and related attributes of the unknown semantic unit, and synchronously adjusting the confidence; integrating the meaning and related attributes of the known semantic units and the meaning and related attributes of the unknown semantic units in the text to be classified, determining, based on the adjusted confidence, the degree to which the meaning and related attributes of the unknown semantic units participate in a classification decision, performing the classification decision on the text to be classified, outputting a classification result, and generating an explanatory clue, wherein the explanatory clue indicates the semantic units whose contribution to the classification result is greater than a preset contribution threshold.
- 2. The text content rapid classification management method based on artificial intelligence according to claim 1, wherein extracting co-occurrence association features, grammar action features and text tendency features of the unknown semantic units based on the context information further comprises: identifying whether a repair structure exists in the context information; when the repair structure is identified, performing semantic deconstruction on the repair structure to obtain the real text tendency feature of the repair structure; and correcting the text tendency feature of the unknown semantic unit based on the real text tendency feature, wherein the corrected text tendency feature is subsequently combined with the co-occurrence association features and the grammar action features of the unknown semantic unit to construct the behavior portrait of the unknown semantic unit.
- 3. The text content rapid classification management method based on artificial intelligence according to claim 1, wherein comparing the adjusted behavior portrait with a preset known semantic behavior portrait set, inferring a preliminary meaning and related attributes of the unknown semantic units according to the comparison result, and generating a corresponding confidence comprises: comparing the adjusted behavior portrait with the preset known semantic behavior portrait set, and identifying a plurality of known semantic units whose similarity to the adjusted behavior portrait exceeds a preset similarity threshold; checking whether there is a contradiction in meaning or related attributes among the plurality of known semantic units; when a contradiction is identified, analyzing the feature subsets in the adjusted behavior portrait that cause it to be similar to the different known semantic units; generating, based on the feature subsets, a multidimensional semantic inference comprising a plurality of preliminary meanings and related attributes of the unknown semantic unit in the different contradictory semantic directions, and assigning an initial confidence to the semantic inference corresponding to each preliminary meaning and its related attributes; tracking changes in the feature subsets while new texts containing the unknown semantic unit are continuously received and the adjusted behavior portrait is updated, and adjusting the initial confidence according to the changes in the feature subsets to obtain the confidence; and when the adjusted behavior portrait continuously exhibits features of a plurality of contradictory semantic directions more than a preset number of contradiction times within a preset time, maintaining the multidimensional semantic inference and generating a semantic uncertainty report.
- 4. The method for rapid classification management of text content based on artificial intelligence according to claim 3, wherein when the adjusted behavior representation continuously presents features of a plurality of contradictory semantic directions greater than a preset contradiction number within a preset time, maintaining the multi-dimensional semantic inference and generating a semantic uncertainty report, further comprising: Performing preliminary evaluation on potential risks according to the contradictory semantic directions and related information sources indicated in the semantic uncertainty report; selecting and activating one or more automatic risk mitigation processes and/or information collection processes matched with the potential risks from a preset risk linkage strategy library; Sending an early warning signal, wherein the early warning signal comprises identification information of the unknown semantic unit, multidimensional semantic inference corresponding to the contradictory semantic direction, a semantic uncertainty index and a suggested follow-up action scheme, and the semantic uncertainty index is determined by the semantic uncertainty report and is used for representing the semantic uncertainty degree; and starting an information tracing and verifying process according to the key information sources indicated in the semantic uncertainty report.
- 5. The text content rapid classification management method based on artificial intelligence according to claim 4, further comprising, after generating the explanatory clue: marking the semantic units whose contribution to the classification result is greater than the preset contribution threshold as contribution semantic units, identifying the multidimensional semantic inference and/or semantic uncertainty index corresponding to the contribution semantic units as a first multidimensional semantic inference and/or a first semantic uncertainty index, and, when the first semantic uncertainty index exists, acquiring the semantic uncertainty report corresponding to the first semantic uncertainty index as a first semantic uncertainty report; extracting, from the first multidimensional semantic inference, a first semantic inference that is consistent with the category corresponding to the classification result, together with its first confidence; when the contribution semantic unit is an unknown semantic unit, extracting a key feature subset supporting the first semantic inference from the behavior portrait of the contribution semantic unit; when the first semantic uncertainty index is identified, analyzing the first semantic uncertainty report to identify a principal factor that causes the uncertainty, and quantifying the degree of influence of the principal factor on the classification decision; constructing a hierarchical explanatory clue based on the role of the contribution semantic unit in the classification decision, the first semantic inference, the first confidence, the key feature subset, the principal factor, and the degree of influence; evaluating the potential influence of the hierarchical explanatory clue on manual decision-making to obtain an evaluation result; and when the evaluation result indicates that the clue is potentially misleading, generating a supplementary description that guides how the contribution semantic unit should be interpreted in manual decision-making.
- 6. The method for rapid categorization management of text content based on artificial intelligence of claim 1, further comprising, after generating the explanatory cue: Determining semantic units which are indicated by the explanatory clues and have a contribution degree to the classification result greater than a preset contribution threshold as contribution semantic units, and determining a semantic evolution period of the contribution semantic units according to the occurrence frequency of new texts containing the contribution semantic units in a preset historical time period, the change rate of text tendency characteristics corresponding to the contribution semantic units and/or the behavior portrait difference degree of the contribution semantic units at different time points; Continuously receiving the newly added text containing the contribution semantic units in the semantic evolution period, and updating the context information of the contribution semantic units based on the newly added text containing the contribution semantic units; Based on the updated context information, extracting co-occurrence associated features, grammar action features and text tendency features of the contribution semantic units according to preset time granularity respectively to construct behavior portraits of the contribution semantic units at different time points; comparing behavior images of the contribution semantic units at different time points, and identifying semantic evolution trend and/or evolution amplitude of the contribution semantic units; And when the semantic evolution trend and/or the evolution amplitude exceed a preset evolution threshold, adjusting the meaning and related attributes of the contribution semantic units, and updating the explanatory clues based on the adjusted meaning and related attributes.
- 7. The method for rapid categorization management of text content based on artificial intelligence of claim 6, further comprising, after determining the semantic evolution period of the contributing semantic units: In a preset sliding time window, when the occurrence frequency of the new text of the contribution semantic unit and/or the change rate of the text tendency characteristic corresponding to the contribution semantic unit exceeds a corresponding preset abrupt change threshold, adjusting the semantic evolution period into a short-term high-frequency semantic evolution period, and continuously receiving the new text containing the contribution semantic unit in a high-frequency manner in the short-term high-frequency semantic evolution period; In the short-term high-frequency semantic evolution period, constructing a short-term behavior portrait sequence of the contribution semantic unit based on the newly added text containing the contribution semantic unit, and calculating the semantic evolution rate and direction of the contribution semantic unit based on the short-term behavior portrait sequence; when the semantic evolution rate and the semantic evolution direction meet preset stable conditions, the semantic evolution period of the contribution semantic unit is adjusted according to the preset stable conditions, and the short-term high-frequency semantic evolution period is switched into the adjusted semantic evolution period.
- 8. The method for rapid categorization management of text content based on artificial intelligence of claim 6, further comprising, after updating context information of the contributing semantic unit based on the newly added text containing the contributing semantic unit: Identifying a plurality of influencing factors related to the occurrence frequency of the new text of the contribution semantic unit and/or the change rate of text tendency features corresponding to the contribution semantic unit based on the new text containing the contribution semantic unit received in the semantic evolution period, wherein the plurality of influencing factors comprise one or more of market emotion factors, policy guidance factors, industry dynamic factors and event factors; for each influencing factor, extracting text features corresponding to the influencing factor from the newly added text containing the contribution semantic units received in the semantic evolution period, wherein the text features at least comprise tendentious vocabulary density and/or the occurrence frequency of preset official terms; Calculating the contribution degree of each influence factor to the occurrence frequency of the newly added text of the contribution semantic unit and/or the change rate of the text tendency characteristic corresponding to the contribution semantic unit based on the text characteristic; And integrating the contribution degree of each influence factor, and generating a comprehensive evaluation result of the contribution semantic unit, wherein the comprehensive evaluation result is used for representing the real change trend of the contribution semantic unit under the combined action of multiple influence factors and indicating potential risks.
- 9. The method for rapid categorization management of text content based on artificial intelligence of claim 8, wherein integrating the contribution degree of each of the influencing factors generates a comprehensive evaluation result of the contribution semantic unit, comprising: carrying out standardization processing on the contribution degree of each influence factor to obtain standardized contribution degree; based on a preset financial field experience rule, identifying and constructing a nonlinear association structure between the influence factors; Converting the nonlinear association structure into a dynamically adjusted weight parameter; calculating a comprehensive evaluation result of the contribution semantic unit based on the occurrence frequency of the newly added text of the contribution semantic unit and/or the change rate of the text tendency characteristic corresponding to the contribution semantic unit, the standardized contribution degree and the weight parameter by adopting an iterative optimization mode; Monitoring deviation between the comprehensive evaluation result and a preset market performance index and/or risk event result; and correcting the weight parameters according to the deviation so as to update the comprehensive evaluation result.
- 10. A text content rapid classification management system based on artificial intelligence, the system comprising: an acquisition module, configured to obtain a text to be classified, perform multi-language compatible language unit segmentation on the text to be classified, match the semantic units obtained by segmentation with a preset known semantic set to identify the known semantic units and the unknown semantic units in the text to be classified, and capture context information of the unknown semantic units in the text to be classified; a behavior portrait construction module, configured to extract co-occurrence association features, grammar action features and text tendency features of the unknown semantic units based on the context information to construct a behavior portrait of the unknown semantic units, update the context information when a new text containing the unknown semantic units is subsequently received, and adjust the behavior portrait according to the updated context information; a meaning inference and confidence generation module, configured to compare the adjusted behavior portrait with a preset known semantic behavior portrait set, infer a preliminary meaning and related attributes of the unknown semantic units according to the comparison result, and generate a corresponding confidence; a meaning correction and confidence adjustment module, configured to correct the preliminary meaning and related attributes according to the adjusted behavior portrait when reappearance of the unknown semantic unit is detected in a newly added text, so as to obtain the meaning and related attributes of the unknown semantic unit, and to synchronously adjust the confidence; and a classification decision and interpretation module, configured to integrate the meaning and related attributes of the known semantic units and the meaning and related attributes of the unknown semantic units in the text to be classified, determine, based on the adjusted confidence, the degree to which the meaning and related attributes of the unknown semantic units participate in the classification decision, perform the classification decision on the text to be classified, output a classification result, and generate an explanatory clue, wherein the explanatory clue indicates the semantic units whose contribution to the classification result is greater than a preset contribution threshold.
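The portrait, inference, and decision steps recited in claims 1 and 10 can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicant's implementation: the function names, the use of cosine similarity as both the comparison measure and the confidence, the per-unit risk scores, and the example vocabulary are all hypothetical. Only co-occurrence features are modeled; grammar action and text tendency features are omitted for brevity.

```python
from collections import Counter
from math import sqrt


def build_portrait(contexts):
    """Behavior portrait reduced to co-occurrence counts over context windows."""
    portrait = Counter()
    for window in contexts:
        portrait.update(window)
    return portrait


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def infer_meaning(portrait, known_portraits):
    """Return the closest known unit and the similarity, used here as confidence."""
    best = max(known_portraits, key=lambda name: cosine(portrait, known_portraits[name]))
    return best, cosine(portrait, known_portraits[best])


def classify(unit_scores, confidences, threshold=0.3):
    """Weight each unit's score by its confidence (known units default to 1.0),
    then report the units whose absolute contribution exceeds the threshold."""
    contributions = {u: s * confidences.get(u, 1.0) for u, s in unit_scores.items()}
    label = "sensitive" if sum(contributions.values()) > 0 else "normal"
    clues = sorted(u for u, c in contributions.items() if abs(c) > threshold)
    return label, clues


# Hypothetical portraits of two known semantic units.
known_portraits = {
    "scam": Counter({"lose": 2, "money": 2, "fake": 1}),
    "profit": Counter({"gain": 2, "money": 1, "win": 1}),
}

# Context windows captured so far for the unknown unit "rugpull".
contexts = [["lose", "money"], ["fake", "money"]]
meaning, confidence = infer_meaning(build_portrait(contexts), known_portraits)

# A newly added text mentions the unit again: rebuild the portrait and re-infer.
updated_portrait = build_portrait(contexts + [["lose", "fake"]])
updated_meaning, updated_confidence = infer_meaning(updated_portrait, known_portraits)

# The unknown unit inherits the score of its inferred meaning (assumed 1.0 for "scam").
label, clues = classify(
    {"rugpull": 1.0, "money": 0.1, "holiday": -0.2},
    {"rugpull": updated_confidence},
)
print(meaning, round(confidence, 3), round(updated_confidence, 3), label, clues)
```

Scaling the unknown unit's score by its confidence is one simple way to realize the "participation degree" of claim 1: a weakly supported inference contributes little to the decision, and it only appears among the explanatory clues once its weighted contribution passes the threshold.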
Description
Text content rapid classification management method and system based on artificial intelligence
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a text content rapid classification management method and system based on artificial intelligence.
Background
In professional fields such as financial risk control and public opinion monitoring, an artificial intelligence system that classifies and manages massive volumes of text quickly and accurately is core infrastructure for efficient business operation. However, as internet language keeps evolving and information sources diversify, conventional text processing methods fall short when dealing with mixed-language expressions, emerging vocabulary, and deliberate disguise, which directly harms the timeliness of sensitive information identification and the accuracy of classification. In particular, existing systems struggle with text that contains code-switching. For example, on social media and internet forums, users habitually mix the vocabulary and grammatical structures of two or more languages in the same sentence, which makes it difficult for a system to determine the main language and to understand the full meaning behind the mixed structure, thereby reducing classification accuracy. In addition, in emerging fields such as fintech and cryptocurrency, users coin large numbers of brand-new jargon terms or new words that blend elements of multiple languages; these words have short life cycles, spread quickly, and depend heavily on context for their meaning. Because they are absent from the training data, existing systems often treat these new words as meaningless noise and miss the significant risk signals they may carry.
To further complicate matters, some users intentionally "morph" text to evade keyword filtering, replacing normal words with non-standard characters that sound or look similar. This practice disrupts the preprocessing stage that precedes text classification, so the text information is incomplete or misinterpreted from the outset; the system cannot recognize the disguised words and therefore misclassifies the text or misses potential early-warning information. Eventually, existing systems fall into a vicious circle of continually degrading performance. Technical teams have to invest huge manpower to keep track of new words, new memes, and evasion tactics on the internet, label new data manually, and retrain frequently. This passive "patching" mode of working is costly and always lags behind the pace at which internet language evolves, so it cancels out the efficiency advantages of an automated system and cannot meet the timeliness and accuracy required for financial risk control. In view of the above, there is a need for improvement in the art.
Disclosure of Invention
The application discloses a text content rapid classification management method and system based on artificial intelligence, and aims to solve the problems in the prior art that, when facing massive text information, the timeliness of sensitive information identification and the classification accuracy decline, system performance continuously degrades, and manual intervention is costly and lagging.
The technical scheme of the application is as follows. In a first aspect, the application discloses a text content rapid classification management method based on artificial intelligence, which comprises the following steps: obtaining a text to be classified, performing multi-language compatible language unit segmentation on the text to be classified, matching the semantic units obtained by segmentation with a preset known semantic set to identify the known semantic units and the unknown semantic units in the text to be classified, and capturing context information of the unknown semantic units in the text to be classified; extracting, based on the context information, co-occurrence association features, grammar action features and text tendency features of the unknown semantic units to construct a behavior portrait of the unknown semantic units, updating the context information when a new text containing the unknown semantic units is subsequently received, and adjusting the behavior portrait according to the updated context information; comparing the adjusted behavior portrait with a preset known semantic behavior portrait set, inferring a preliminary meaning and related attributes of the unknown semantic unit according to the comparison result, and generating a corresponding confidence; when reappearance of the unknown semantic unit is detected in a newly added text, correcting the preliminary meaning and related attributes according to the adjusted behavior portrait to obtain the meaning and the relate