
CN-122001965-A - Unstructured message processing method, system, equipment and medium


Abstract

The invention discloses a method, system, device and medium for unstructured message processing, belonging to the technical field of data analysis and processing. The method comprises constructing a heterogeneous source data model, locking an atomization processing component sequence, extracting valid elements through feature cutting and cascading recursive parsing, and finally loading the valid elements into a normalized target model. The system comprises a source data model construction module, a component sequence locking module, a feature recognition cutting module, a cascading parsing model construction module, a recursive flow driving module and a structured recombination module. Through dynamic orchestration of atomization components and a multi-level cascading mechanism, the invention achieves accurate recursive parsing of complex nested messages. The method supports hot loading of components and dynamic updating of policies, so capabilities can be extended flexibly while the service remains highly available. It removes the dependence on complex regular expressions, markedly improves the processing efficiency of unstructured data, and reduces maintenance cost.
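The hot loading and dynamic policy updating mentioned in the abstract rest on two pieces described in claims 3, 4 and 7: a container of named components, and a two-level policy index (data source identifier first, then a detail pointer) that resolves to an ordered component list. The Python sketch below is a minimal illustration under stated assumptions, not the patented implementation: the class names `SourceDataObject`, `ComponentContainer` and `PolicyLibrary`, the `register`/`put`/`lock_sequence` methods, and the use of a plain callable as the "unified contract interface" are all hypothetical.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical component contract: a callable that turns a payload into fragments.
Component = Callable[[str], List[str]]


@dataclass
class SourceDataObject:
    """Illustrative heterogeneous source data object (cf. claim 7)."""
    payload: str
    source_id: str   # assumed primary index key: data source attribution identifier
    detail_key: str  # assumed secondary index key: detail positioning pointer
    serial: str = field(default_factory=lambda: uuid.uuid4().hex)  # globally unique sequence number
    received_at: float = field(default_factory=time.time)          # receipt timestamp


class ComponentContainer:
    """Illustrative 'dynamic operator resident container' (cf. claim 3)."""

    def __init__(self) -> None:
        self._components: Dict[str, Component] = {}

    def register(self, name: str, component: Component) -> None:
        # Stand-in compliance check; the real contract protocol is not specified.
        if not callable(component):
            raise TypeError(f"{name!r} does not satisfy the component contract")
        # Registering under an existing name replaces it without a restart.
        self._components[name] = component

    def get(self, name: str) -> Component:
        return self._components[name]


class PolicyLibrary:
    """Illustrative two-level policy index (cf. claim 4)."""

    def __init__(self) -> None:
        self._rules: Dict[str, Dict[str, List[str]]] = {}

    def put(self, source_id: str, detail_key: str, names: List[str]) -> None:
        self._rules.setdefault(source_id, {})[detail_key] = list(names)

    def lock_sequence(self, obj: SourceDataObject,
                      container: ComponentContainer) -> List[Component]:
        rule_set = self._rules[obj.source_id]  # primary key -> parsing rule set
        names = rule_set[obj.detail_key]       # secondary key -> component orchestration
        return [container.get(n) for n in names]


container = ComponentContainer()
container.register("split_lines", lambda s: s.splitlines())
policies = PolicyLibrary()
policies.put("gateway-a", "access-log", ["split_lines"])

obj = SourceDataObject("a=1\nb=2", "gateway-a", "access-log")
print(policies.lock_sequence(obj, container)[0](obj.payload))  # ['a=1', 'b=2']
```

Because `lock_sequence` resolves component names at lock time, re-registering `split_lines` with a new callable changes the sequence locked for the next message, while a sequence already locked keeps the old callable. This is one simple reading of "hot loading" plus "refreshing the index mapping"; a production system would also need versioning and thread safety.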

Inventors

  • ZHONG DANYE
  • SUN XIN

Assignees

  • Jiangsu Baowangda Software Technology Co., Ltd. (江苏保旺达软件技术有限公司)

Dates

Publication Date
2026-05-08
Application Date
2026-01-22

Claims (10)

  1. A method for unstructured message processing, comprising: obtaining an unstructured source data stream, encapsulating a data fingerprint and meta-attributes, and constructing a heterogeneous source data object model; in response to construction of the heterogeneous source data object model, locking an adapted atomization processing component sequence from a preset policy library indexed by the data fingerprint; traversing the atomization processing component sequence and cutting the unstructured source data stream through feature recognition logic to obtain a discrete feature set; constructing, according to the nesting structure of the discrete feature set, a multi-level cascading parsing model for path planning of deeply nested information; when a feature to be refined is detected through the multi-level cascading parsing model, establishing a mapping channel between the current output and a component input, and driving the data stream to circulate iteratively through the atomization processing component sequence to obtain valid information elements; and loading, based on mapping conversion logic, the valid information elements into a normalized target information model to complete structured recombination of the data.
  2. The method for unstructured message processing according to claim 1, wherein the atomization processing component sequence comprises a feature extraction component and an attribute assignment component that are logically coupled, and traversing the atomization processing component sequence comprises: activating the feature extraction component to cut the unstructured source data stream through preset feature delimiting logic, generating a staged feature set; monitoring a recursion attribute of the staged feature set by means of the multi-level cascading parsing model; in response to detecting the recursion attribute, constructing a secondary input stream that loops the staged feature set back to the feature extraction component for iterative parsing; and in response to absence of the recursion attribute, activating the attribute assignment component to map the staged feature set to target business data, thereby generating the valid information elements.
  3. The method for unstructured message processing according to claim 1, further comprising: constructing a dynamic operator resident container for hosting component instances that conform to a unified contract interface protocol; in response to a function extension instruction being triggered, parsing an external component package and checking its compliance with the unified contract interface protocol; instantiating the external component package through a hot-loading mechanism to generate an extension component instance, and registering the extension component instance in the dynamic operator resident container; and refreshing the index mapping of the policy library so that the atomization processing component sequence is authorized to invoke the extension component instance.
  4. The method for unstructured message processing according to claim 1, wherein locking the adapted atomization processing component sequence comprises: accessing a policy index library resident in a memory buffer, the policy index library maintaining the mapping between policy indexes and policy entities through a periodic refresh mechanism; extracting a data source attribution identifier from the heterogeneous source data object model as a primary index key, and locating the corresponding parsing rule set in the policy index library; extracting a detail positioning pointer from the heterogeneous source data object model as a secondary index key, and addressing, within the parsing rule set, a specific parsing logic unit that encapsulates component orchestration configuration data; and parsing the component orchestration configuration data and instantiating the atomization processing component sequence in order.
  5. The method for unstructured message processing according to claim 2, wherein the preset feature delimiting logic comprises: key-value pair anchoring logic, which locates and cuts key-value pair structures based on predefined key-name identifiers and connector patterns; position offset delimiting logic, which intercepts feature segments at fixed positions based on a preset byte length or character index interval; fence closure delimiting logic, which extracts the payload within a closed interval bounded by paired start and end symbols; and structured serialization logic, which traverses the syntax tree of an object markup language and maps the attribute values of its leaf nodes.
  6. The method for unstructured message processing according to claim 1, wherein the mapping conversion logic comprises: time sequence normalization logic, which parses heterogeneous timestamp formats and converts them to a uniform time measurement standard; semantic translation logic, which translates original feature values into business semantic values according to preconfigured association mapping rules; encoding restoration logic, which identifies the transmission encoding protocol and restores the encoded content to plaintext characters; and morphology reshaping logic, which reconstructs the surface form of the data through character modification operations.
  7. The method for unstructured message processing according to claim 1, wherein constructing the heterogeneous source data object model comprises: protocol normalization logic, which establishes communication connections with multi-source data access points and converts heterogeneous transmission protocol payloads into a standard input stream; route tagging logic, which injects the data fingerprint into the standard input stream according to the static configuration context of the data access task, the data fingerprint containing guidance information for indexing the policy library; and metadata synthesis logic, which generates a globally unique sequence number and a receipt timestamp and binds them to the standard input stream, thereby establishing the lifecycle attributes of the data object.
  8. A system for unstructured message processing, implementing the method for unstructured message processing according to any one of claims 1 to 7, comprising: a source data model construction module, configured to obtain the unstructured source data stream, encapsulate the data fingerprint and meta-attributes, and construct the heterogeneous source data object model; a component sequence locking module, configured to lock, in response to construction of the heterogeneous source data object model, the adapted atomization processing component sequence from the preset policy library indexed by the data fingerprint; a feature recognition cutting module, configured to traverse the atomization processing component sequence and cut the unstructured source data stream through feature recognition logic to obtain the discrete feature set; a cascading parsing model construction module, configured to construct, according to the nesting structure of the discrete feature set, the multi-level cascading parsing model for path planning of deeply nested information; a recursive flow driving module, configured to establish, when a feature to be refined is detected through the multi-level cascading parsing model, a mapping channel between the current output and a component input, and to drive the data stream to circulate iteratively through the atomization processing component sequence to obtain the valid information elements; and a structured recombination module, configured to load, based on the mapping conversion logic, the valid information elements into the normalized target information model to complete structured recombination of the data.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, carries out the steps of the method for unstructured message processing according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, carries out the steps of the method for unstructured message processing according to any one of claims 1 to 7.
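Claims 1, 2, 5 and 6 together describe a loop: a feature extraction component cuts the message, any fragment that is itself a nested payload is fed back as a secondary input, and leaf values are finally mapped into a target model. The Python sketch below illustrates that loop under assumptions of our own: messages are space-separated `key=value` pairs (key-value anchoring), nested payloads are wrapped in `{...}` (fence closure), a worklist stands in for the "secondary input stream", and the hypothetical `FIELD_MAP` plus an epoch-seconds conversion stand in for the semantic translation and time normalization of claim 6. None of these names or formats come from the patent.

```python
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def split_top_level(text: str, sep: str = " ") -> List[str]:
    """Cut on `sep`, ignoring separators inside {...} fences."""
    parts: List[str] = []
    depth, start = 0, 0
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == sep and depth == 0:
            if i > start:
                parts.append(text[start:i])
            start = i + 1
    if start < len(text):
        parts.append(text[start:])
    return parts


def extract_features(text: str) -> List[Tuple[str, str]]:
    """Feature extraction: anchor each token on its first '=' connector."""
    pairs = []
    for token in split_top_level(text):
        key, _, value = token.partition("=")
        pairs.append((key, value))
    return pairs


def has_recursion_attribute(value: str) -> bool:
    """A value wrapped in a paired {...} fence is itself a nested message."""
    return len(value) >= 2 and value[0] == "{" and value[-1] == "}"


def parse(message: str) -> Dict[str, str]:
    """Loop nested payloads back into extraction until only leaves remain."""
    leaves: Dict[str, str] = {}
    pending = [("", message)]  # (path prefix, secondary input stream)
    while pending:
        prefix, text = pending.pop()
        for key, value in extract_features(text):
            path = prefix + key
            if has_recursion_attribute(value):
                pending.append((path + ".", value[1:-1]))  # strip the fence, feed back
            else:
                leaves[path] = value
    return leaves


# Hypothetical association mapping rules: source path -> target field.
FIELD_MAP = {"ts": "event_time", "user": "user_name", "ext.city": "city"}


def assign(leaves: Dict[str, str]) -> Dict[str, str]:
    """Attribute assignment: keep mapped paths and normalize epoch timestamps."""
    record: Dict[str, str] = {}
    for path, value in leaves.items():
        target = FIELD_MAP.get(path)
        if target is None:
            continue  # not needed by this target model
        if target == "event_time":
            value = datetime.fromtimestamp(int(value), tz=timezone.utc).isoformat()
        record[target] = value
    return record


msg = "ts=1767225600 user=alice ext={city=Nanjing zip=210000}"
print(assign(parse(msg)))
# {'event_time': '2026-01-01T00:00:00+00:00', 'user_name': 'alice', 'city': 'Nanjing'}
```

The worklist makes the iteration explicit instead of relying on Python recursion, so deeply nested messages do not run into the interpreter's recursion limit. The sketch does not validate unbalanced braces and covers only two of the four delimiting logics in claim 5.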

Description

Unstructured message processing method, system, equipment and medium

Technical Field

The invention relates to the technical field of data analysis and processing, and in particular to a method, system, device and medium for unstructured message processing.

Background

With the rapid development of information technology, modern integrated information systems are deployed ever more widely across business scenarios, accompanied by an explosion in data access requirements. In a complex digital ecosystem, a system must interact at high frequency with heterogeneous sources over multiple communication protocols, generating massive volumes of log messages and business files. These data usually take the form of unstructured or semi-structured text whose formats are complex, variable, and lack a uniform standard. To make effective use of these data resources, an information collector must perform deep protocol adaptation, content parsing and information extraction for each data format, converting the original unstructured text into standardized structured data for subsequent business analysis, statistical computation or decision support. Faced with a continuous influx of large-scale data traffic, especially in high-concurrency, multi-source scenarios, receiving, efficiently parsing, accurately extracting and quickly converting data has become a key challenge limiting the data processing capability of existing systems.

To meet these requirements, mainstream text data processing systems currently rely mainly on regular expressions, or on simple optimizations built on them, for message parsing. This approach is essentially pattern matching and has clear limitations in practice.
First, writing regular expressions has a very high technical threshold. Authors need a strong professional background to cope with complex and variable message formats, the resulting expressions tend to be obscure, and ordinary practitioners find them hard to understand and maintain, so testing, debugging and iteration cycles are long. Second, regular matching has an inherent efficiency bottleneck. To accommodate diverse data patterns, a complex regular expression may trigger a large number of backtracking operations at run time, which markedly reduces matching efficiency. When processing massive data streams in particular, a system often has to add hardware to maintain the required throughput, which both wastes computing resources and greatly increases construction and operating costs.

Disclosure of Invention

In view of the above problems, the present invention provides a method, system, device and medium for unstructured message processing. The invention thereby addresses the technical problems of the prior art, which relies on regular matching to process unstructured messages and consequently suffers from a high rule-writing threshold, low operating efficiency, heavy resource usage, and difficulty in parsing complex multi-layer nested formats.
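The backtracking cost described in the Background can be made concrete with a textbook pattern; the patent names no specific expression, so `(a+)+$` is our own choice. Its nested quantifier can split a run of n `a` characters into groups in 2^(n-1) ways, and when the overall match fails, a backtracking engine such as Python's `re` may try a large share of those splits. The sketch counts the splits by dynamic programming and times the nested pattern against an equivalent flat pattern. Absolute timings depend on the machine and Python version, so no figures are quoted here.

```python
import re
import time


def count_splits(n: int) -> int:
    """Ways to partition a run of n characters into non-empty consecutive groups."""
    ways = [1] + [0] * n  # ways[i]: partitions of the first i characters
    for i in range(1, n + 1):
        ways[i] = sum(ways[j] for j in range(i))  # last group spans characters j..i-1
    return ways[n]


NESTED = re.compile(r"(a+)+$")  # nested quantifier: prone to exponential backtracking
FLAT = re.compile(r"a+$")       # accepts the same strings without nesting


def seconds_to_fail(pattern: re.Pattern, text: str) -> float:
    """Time one match attempt that is expected to fail."""
    start = time.perf_counter()
    result = pattern.match(text)
    elapsed = time.perf_counter() - start
    if result is not None:
        raise ValueError("expected the match to fail")
    return elapsed


for n in (12, 14, 16, 18):
    text = "a" * n + "!"  # the trailing '!' forces every attempt to fail
    print(f"n={n:2d} splits={count_splits(n):6d} "
          f"nested={seconds_to_fail(NESTED, text):.5f}s "
          f"flat={seconds_to_fail(FLAT, text):.5f}s")
```

On a typical backtracking engine the nested column grows roughly in step with the split count while the flat column stays near constant, which is the "large number of backtracking operations" the Background refers to.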
The technical solution is as follows: an unstructured source data stream is obtained, a data fingerprint and meta-attributes are encapsulated, and a heterogeneous source data object model is constructed; in response to construction of the heterogeneous source data object model, an adapted atomization processing component sequence is locked from a preset policy library indexed by the data fingerprint; the atomization processing component sequence is traversed, and the unstructured source data stream is cut through feature recognition logic to obtain a discrete feature set; a multi-level cascading parsing model is constructed according to the nesting structure of the discrete feature set for path planning of deeply nested information; when a feature to be refined is detected through the multi-level cascading parsing model, a mapping channel between the current output and a component input is established, and the data stream is driven to circulate iteratively through the atomization processing component sequence to obtain valid information elements; and the valid information elements are loaded into a normalized target information model based on mapping conversion logic to complete structured recombination of the data. In the method for unstructured message processing, the atomization processing component sequence comprises a feature extraction component and an attribute assignment component that are logically coupled; traversing the atomization processing component sequence comprises activating the feature extraction component to cut the unstructured source data stream through preset feature delimiting logic to generate a temp