CN-116108805-B - Stream processing method, device and equipment for tree file and storage medium
Abstract
The embodiments of the disclosure provide a streaming method, apparatus, device, and storage medium for tree files. The method comprises: dividing a tree file to be processed into data blocks, each data block being a start tag, a corresponding end tag, or the text content between an adjacent start tag and end tag; sending the data blocks to a processor in sequence, in the order in which they were obtained, to form a data stream; and performing stream processing on the data stream. In this way, the processor continuously processes small batches of data as a stream, which improves file processing efficiency while reducing the load on the processor.
Inventors
- HAO WEI
- SHEN CHUANBAO
- LIU JIARUI
Assignees
- Anhui Huayun'an Technology Co., Ltd. (安徽华云安科技有限公司)
Dates
- Publication Date: 2026-05-08
- Application Date: 2022-12-01
Claims (8)
- 1. A method for streaming tree files, the method comprising: dividing a tree file to be processed into data blocks, each data block being a start tag, a corresponding end tag, or the text content between an adjacent start tag and end tag; sending the data blocks to a processor in sequence, in the order in which they were obtained by the dividing, to form a data stream; and performing stream processing on the data stream; wherein performing stream processing on the data stream comprises processing the data blocks in the data stream in sequence as follows: if the current data block is a start tag and is the first data block, generating a simulated path of the current data block, and a node corresponding to that simulated path, from the root path and the current data block; if the current data block is a start tag and is not the first data block, generating a simulated path of the current data block, and a node corresponding to that simulated path, from the simulated path of the previous data block and the current data block, and further, if the current data block has tag attributes, storing the tag attributes in the node of the current data block; if the current data block is text content, storing the current data block in the node of the previous data block; if the current data block is an end tag and is not the final data block, taking as the simulated path of the current data block the simulated path of the data block that precedes the start tag corresponding to this end tag; and if the current data block is an end tag and is the final data block, setting the simulated path of the current data block to the root path.
- 2. The method according to claim 1, wherein the method further comprises: if it is determined, based on the file size and the timeliness level of the tree file, that the tree file meets a preset processing condition, performing stream processing on the tree file.
- 3. The method according to claim 1, wherein the method further comprises: storing the data in each node in a memory; and, when data statistics are required, extracting the data corresponding to each node from the memory and generating a statistics file.
- 4. The method according to claim 1, wherein the method further comprises: querying the corresponding node according to an input simulated path, and acquiring the data in that node.
- 5. The method of any one of claims 1-4, wherein the tree file is an XML file, a JSON file, an HTML file, or a YAML file.
- 6. A streaming apparatus for tree files, the apparatus comprising: a dividing module, configured to divide a tree file to be processed into data blocks, each data block being a start tag, a corresponding end tag, or the text content between an adjacent start tag and end tag; a sending module, configured to send the data blocks to a processor in sequence, in the order in which they were obtained by the dividing, to form a data stream; and a processing module, configured to perform stream processing on the data stream; wherein the processing module is specifically configured to process the data blocks in the data stream in sequence as follows: if the current data block is a start tag and is the first data block, generating a simulated path of the current data block, and a node corresponding to that simulated path, from the root path and the current data block; if the current data block is a start tag and is not the first data block, generating a simulated path of the current data block, and a node corresponding to that simulated path, from the simulated path of the previous data block and the current data block, and further, if the current data block has tag attributes, storing the tag attributes in the node of the current data block; if the current data block is text content, storing the current data block in the node of the previous data block; if the current data block is an end tag and is not the final data block, taking as the simulated path of the current data block the simulated path of the data block that precedes the start tag corresponding to this end tag; and if the current data block is an end tag and is the final data block, setting the simulated path of the current data block to the root path.
- 7. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
- 8. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-5.
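Claims 3 and 4 add an in-memory node store, a statistics file, and lookup of a node by its simulated path. The sketch below is illustrative only and rests on assumptions not stated in the claims: the store is a Python dict mapping each simulated path (written like `"/root/item"`) to a node holding lists of attribute sets and text values, and the statistics columns (`path`, `text_count`, `attr_count`) are hypothetical, since the claims do not say what the statistics file contains. The names `query_node`, `write_statistics` and `statistics_as_text` are likewise hypothetical.

```python
import csv
import io
from typing import Optional, TextIO

# Hypothetical node layout: {"attrs": [dict, ...], "texts": [str, ...]},
# kept in memory under its simulated path, e.g. "/root/item".
Store = dict

def query_node(store: Store, simulated_path: str) -> Optional[dict]:
    """Return the node stored under the given simulated path, or None."""
    return store.get(simulated_path)

def write_statistics(store: Store, out: TextIO) -> None:
    """Write one CSV row per node (assumed columns: path, text_count, attr_count)."""
    writer = csv.writer(out)
    writer.writerow(["path", "text_count", "attr_count"])
    for path in sorted(store):
        node = store[path]
        writer.writerow([path, len(node["texts"]), len(node["attrs"])])

def statistics_as_text(store: Store) -> str:
    """Render the statistics file into a string instead of a file on disk."""
    buffer = io.StringIO()
    write_statistics(store, buffer)
    return buffer.getvalue()
```

To write the statistics to disk instead, pass `write_statistics` a file opened with `open(name, "w", newline="")`, as the `csv` module documentation recommends.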
Description
Technical Field
The disclosure relates to the technical field of data processing, and in particular to a streaming method, apparatus, device, and storage medium for tree files.
Background
In traditional data processing, data are first collected and stored in a database; all data in a table are then computed in a single batch and the result is output. Batch processing completes the work in one pass: it places low demands on data timeliness but high demands on processor performance. As information systems develop, the number of systems and the volume of data keep growing. Files such as XML and JSON are used to store large volumes of tree-structured data, and HTML files used by web pages and YAML files used for configuration are likewise tree-structured. Although processor performance continues to improve, processing very large tree files in batch mode places a heavy load on the processor, lowers processing efficiency, and increases latency.
Disclosure of Invention
The present disclosure provides a method, an apparatus, a device, and a storage medium for streaming a tree file, which can improve the efficiency of tree file processing. In a first aspect, an embodiment of the present disclosure provides a method for streaming a tree file, the method including: dividing a tree file to be processed into data blocks, each data block being a start tag, a corresponding end tag, or the text content between an adjacent start tag and end tag; sending the data blocks to a processor in sequence, in the order in which they were obtained, to form a data stream; and performing stream processing on the data stream.
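The dividing and sending steps can be pictured with a small Python generator. This is a minimal sketch, not the disclosed implementation: it assumes a simplified XML subset (no comments, CDATA sections, processing instructions, entity decoding, or self-closing tags, and only double-quoted attributes), reads the whole document from a string rather than incrementally from disk, and drops whitespace-only text between tags. `DataBlock` and `divide_into_blocks` are hypothetical names.

```python
import re
from typing import Iterator, NamedTuple

class DataBlock(NamedTuple):
    """One hypothetical data block: a start tag, an end tag, or text content."""
    kind: str    # "start", "end" or "text"
    tag: str     # tag name for start/end blocks, "" for text blocks
    attrs: dict  # tag attributes for start blocks, {} otherwise
    text: str    # text content for text blocks, "" otherwise

# Either a tag (<name ...> or </name>) or a run of text up to the next "<".
_TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)([^>]*)>|([^<]+)")
# Double-quoted attributes only (simplifying assumption).
_ATTR = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"')

def divide_into_blocks(document: str) -> Iterator[DataBlock]:
    """Yield data blocks one at a time, in document order."""
    for match in _TOKEN.finditer(document):
        closing, name, rest, text = match.groups()
        if text is not None:
            if text.strip():  # skip whitespace-only text between tags
                yield DataBlock("text", "", {}, text.strip())
        elif closing:
            yield DataBlock("end", name, {}, "")
        else:
            yield DataBlock("start", name, dict(_ATTR.findall(rest)), "")
```

Because `divide_into_blocks` is a generator, each block reaches the consumer as soon as it is produced and in document order, which is the property the data-stream step relies on. A production version would more likely feed file chunks to an incremental parser such as Python's `xml.parsers.expat`.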
In some implementations of the first aspect, the method further includes: if it is determined, based on the file size and the timeliness level of the tree file, that the tree file meets a preset processing condition, performing stream processing on the tree file.
In some implementations of the first aspect, performing stream processing on the data stream includes processing the data blocks in the data stream in sequence as follows: if the current data block is a start tag and is the first data block, generating a simulated path of the current data block, and a node corresponding to that simulated path, from the root path and the current data block; if the current data block is a start tag and is not the first data block, generating a simulated path of the current data block, and a node corresponding to that simulated path, from the simulated path of the previous data block and the current data block; if the current data block is text content, storing the current data block in the node of the previous data block; if the current data block is an end tag and is not the final data block, taking as the simulated path of the current data block the simulated path of the data block that precedes the start tag corresponding to this end tag; and if the current data block is an end tag and is the final data block, setting the simulated path of the current data block to the root path.
In some implementations of the first aspect, the method further includes: if the current data block is a start tag, is not the first data block, and has tag attributes, storing the tag attributes in the node of the current data block.
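The per-block rules above can be sketched as a single pass that tracks the current simulated path. All of the following are assumptions made for illustration: blocks arrive as `(kind, tag, attrs, text)` tuples; the root path is `"/"` and a simulated path is the slash-joined tag names, such as `"/root/item"`; and sibling elements with the same tag share one node, which accumulates their attribute sets and text values in lists. A stack records, for every open start tag, the path of the block that preceded it, so an end tag restores that path; when the final end tag closes the root element, the restored path is the root path, matching the last rule. Unlike the text above, this sketch also stores the attributes of the first start tag. `ROOT_PATH`, `join_path` and `stream_process` are hypothetical names.

```python
from typing import Iterable, Tuple

ROOT_PATH = "/"  # assumed representation of the root path

def join_path(parent: str, tag: str) -> str:
    """Build a child simulated path, e.g. ("/root", "item") -> "/root/item"."""
    return parent.rstrip("/") + "/" + tag

def stream_process(blocks: Iterable[Tuple[str, str, dict, str]]) -> Tuple[dict, str]:
    """Apply the per-block rules in order; return (nodes, final simulated path)."""
    nodes: dict = {}         # simulated path -> {"attrs": [...], "texts": [...]}
    current = ROOT_PATH      # simulated path of the previous data block
    before_start: list = []  # path preceding each start tag that is still open
    for index, (kind, tag, attrs, text) in enumerate(blocks):
        if kind == "start":
            # First block: derive from the root path; otherwise from the previous block.
            base = ROOT_PATH if index == 0 else current
            before_start.append(base)
            current = join_path(base, tag)
            node = nodes.setdefault(current, {"attrs": [], "texts": []})
            if attrs:  # store tag attributes when present
                node["attrs"].append(attrs)
        elif kind == "text":
            # Text goes into the node of the previous data block; the path is unchanged.
            nodes.setdefault(current, {"attrs": [], "texts": []})["texts"].append(text)
        elif kind == "end":
            # Restore the path of the block that preceded the matching start tag;
            # for the final end tag of a well-formed document this is ROOT_PATH.
            current = before_start.pop()
        else:
            raise ValueError(f"unknown block kind: {kind!r}")
    return nodes, current
```

The stack makes the "first block" and "final block" cases fall out of the general rules: before the first start tag the current path is already the root path, and popping the root element's entry restores it.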
In some implementations of the first aspect, the method further includes: storing the data in each node in a memory; and, when data statistics are required, extracting the data corresponding to each node from the memory and generating a statistics file.
In some implementations of the first aspect, the method further includes: querying the corresponding node according to an input simulated path, and acquiring the data in that node.
In some implementations of the first aspect, the tree file is an XML file, a JSON file, an HTML file, or a YAML file.
In a second aspect, an embodiment of the present disclosure provides a streaming apparatus for a tree file, the apparatus including: a dividing module, configured to divide a tree file to be processed into data blocks, each data block being a start tag, a corresponding end tag, or the text content between an adjacent start tag and end tag; a sending module, configured to send the data blocks to a processor in sequence, in the order in which they were obtained, to form a data stream; and a processing module, configured to perform stream processing on the data stream.
In a third aspect, an embodiment of the present disclosure provides an electronic device comprising at least one processor, and a memory communicatively coupled t