CN-122019215-A - Multi-mode data recording method and device
Abstract
The application relates to the technical field of robot middleware and distributed system architecture, and in particular to a multi-mode data recording method and device. The method comprises: receiving messages of at least one data stream from a robot middleware; executing a byte-based admission control operation for each received message, wherein the admission control operation comprises attempting to apply for a corresponding quota permission according to the actual memory footprint of the message, discarding the message in response to application failure, and associating the quota permission with the message in response to application success; reorganizing the payload data and the metadata of a plurality of messages for which the application succeeded, respectively, to form batch data units conforming to a columnar storage format; and writing the batch data units into a storage file. The application enables recording and analysis, supports efficient zero-copy reading, and greatly improves the stability, efficiency and integrity of multi-mode data recording.
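The byte-based admission control described in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the names `ByteQuota`, `Permit`, `Message` and `admit` are hypothetical, and the payload length stands in for the "actual memory footprint" of a message, which is an assumption.

```python
import threading
from dataclasses import dataclass
from typing import Optional


class ByteQuota:
    """Thread-safe byte budget (hypothetical name, illustration only)."""

    def __init__(self, capacity_bytes: int) -> None:
        self._capacity = capacity_bytes
        self._used = 0
        self._lock = threading.Lock()

    def try_acquire(self, nbytes: int) -> Optional["Permit"]:
        """Non-blocking: return a Permit, or None if the budget would be exceeded."""
        with self._lock:
            if self._used + nbytes > self._capacity:
                return None
            self._used += nbytes
        return Permit(self, nbytes)

    def _give_back(self, nbytes: int) -> None:
        with self._lock:
            self._used -= nbytes

    @property
    def used(self) -> int:
        with self._lock:
            return self._used


class Permit:
    """Holds nbytes of quota until release() is called; release is idempotent."""

    def __init__(self, quota: ByteQuota, nbytes: int) -> None:
        self._quota = quota
        self._nbytes = nbytes
        self._released = False

    def release(self) -> None:
        if not self._released:
            self._released = True
            self._quota._give_back(self._nbytes)


@dataclass
class Message:
    stream: str
    payload: bytes
    permit: Optional[Permit] = None


def admit(quota: ByteQuota, msg: Message) -> bool:
    """Attach a permit sized by the payload, or report that the message is dropped."""
    # Assumption: len(payload) approximates the message's memory footprint.
    permit = quota.try_acquire(len(msg.payload))
    if permit is None:
        return False      # application failed: the caller discards the message
    msg.permit = permit   # application succeeded: the permit travels with the message
    return True
```

In this sketch a caller would invoke `msg.permit.release()` once the message has been written or dropped downstream, so the bytes it reserved become available to later messages. Because `try_acquire` never blocks, a slow writer leads to dropped messages rather than back pressure on the sender.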
Inventors
- ZHANG XIAODONG
- MA WEI
- YANG ZIJIANG
Assignees
- University of Science and Technology of China (中国科学技术大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-16
Claims (10)
- 1. A multi-mode data recording method, comprising: receiving messages of at least one data stream from a robot middleware; executing a byte-based admission control operation for each received message, wherein the admission control operation comprises attempting to apply for a corresponding quota permission according to the actual memory footprint of the message, discarding the message in response to application failure, and associating the quota permission with the message in response to application success; reorganizing the payload data and the metadata of a plurality of messages for which the application succeeded, respectively, to form batch data units conforming to a columnar storage format; and writing the batch data units into a storage file.
- 2. The method of claim 1, wherein associating the quota permission with the message comprises: transferring ownership of the successfully applied-for quota permission to the message object, wherein the quota permission is released when the message object is written to the storage file or destroyed.
- 3. The method of claim 1, wherein reorganizing the payload data and the metadata of the plurality of messages for which the application succeeded, respectively, to form batch data units conforming to a columnar storage format comprises: storing the payload data and the metadata of the same field of the plurality of messages for which the application succeeded into corresponding memory arrays; constructing, based on the memory arrays, ListArray data structures conforming to the Apache Arrow format; and packaging the ListArray data structures into a RecordBatch to form the batch data unit.
- 4. The method of claim 1, wherein the reorganizing of the payload data and the metadata of the plurality of messages for which the application succeeded and the writing of the batch data units into the storage file are performed in an asynchronous task independent of message reception, and wherein writing the batch data units into the storage file comprises: writing each single batch data unit into the storage file through one contiguous I/O operation.
- 5. The method of claim 4, further comprising, before reorganizing the payload data and the metadata of the plurality of messages for which the application succeeded: establishing, for the asynchronous task, an asynchronous transmission channel governed by a memory quota as an independent data channel; wherein the asynchronous task reads the messages for which the application succeeded from the asynchronous transmission channel for processing.
- 6. The method of claim 1, further comprising: when a message belonging to a new data type is received for the first time, dynamically creating an asynchronous processing task that is dedicated to that data type and isolated at the physical thread level, together with an independent asynchronous data channel governed by a memory quota; and performing the reorganizing and the writing on subsequently received messages of the same type through the corresponding independent asynchronous data channel.
- 7. The method of claim 1 or 4, further comprising: in response to a stop-recording instruction, stopping the reception of new messages; and closing the storage file after waiting for all in-progress reorganizing and writing to complete.
- 8. The method of claim 1, wherein the robot middleware is dora-rs, and/or the at least one data stream comprises at least two of: lidar, camera, inertial measurement unit, map data, and control instruction streams.
- 9. The method of claim 1, wherein the storage file is an Apache Arrow format file containing a self-describing file footer, and the data in the storage file is organized in a columnar storage format to support zero-copy reading by the pandas and PyTorch data processing frameworks.
- 10. A multi-mode data recording device, comprising: a receiving module for receiving messages of at least one data stream from a robot middleware; an admission control module for executing a byte-based admission control operation for each received message, wherein the admission control operation comprises attempting to apply for a corresponding quota permission according to the actual memory footprint of the message, discarding the message in response to application failure, and associating the quota permission with the message in response to application success; a reorganization module for reorganizing the payload data and the metadata of a plurality of messages for which the application succeeded, respectively, to form batch data units conforming to a columnar storage format; and a writing module for writing the batch data units into a storage file.
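The columnar reorganization recited in claims 1 and 3 can be illustrated in plain Python. The sketch below does not use the Apache Arrow library. It only mimics the layout of an Arrow variable-size list column, which stores one flat values buffer plus an offsets array with one more entry than there are rows. The function names, the `(timestamp_ns, payload)` row shape and the dictionary keys are hypothetical.

```python
from typing import Dict, List, Tuple


def to_columnar_batch(messages: List[Tuple[int, bytes]]) -> Dict[str, object]:
    """Regroup (timestamp_ns, payload) rows into per-field columns.

    Payloads are concatenated into one values buffer; offsets[i] and
    offsets[i + 1] delimit the payload of row i.
    """
    timestamps: List[int] = []
    offsets: List[int] = [0]
    values = bytearray()
    for timestamp_ns, payload in messages:
        timestamps.append(timestamp_ns)
        values.extend(payload)
        offsets.append(len(values))
    return {
        "timestamp_ns": timestamps,      # metadata column
        "payload_offsets": offsets,      # n + 1 entries for n rows
        "payload_values": bytes(values), # all payload bytes, back to back
    }


def payload_at(batch: Dict[str, object], row: int) -> bytes:
    """Recover the payload of one row from the columnar layout."""
    offsets = batch["payload_offsets"]
    values = batch["payload_values"]
    return values[offsets[row]:offsets[row + 1]]
```

For example, `to_columnar_batch([(1, b"ab"), (2, b""), (3, b"cde")])` yields offsets `[0, 2, 2, 5]` over the values buffer `b"abcde"`. Because each field ends up in one contiguous buffer, a whole batch can be handed to the writer as a single unit, which is the property claim 4 relies on.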
Description
Multi-mode data recording method and device
Technical Field
The application relates to the technical field of robot middleware and distributed system architecture, and in particular to a multi-mode data recording method and device.
Background
In real-time computing fields such as robotics and autonomous driving, efficiently and reliably recording the multi-mode data streams generated while the system runs (such as camera images, lidar point clouds and inertial measurement data) is an important basis for algorithm training, functional verification and fault diagnosis. At present, the industry generally adopts the recording tools of robot middleware (such as rosbag for ROS/ROS 2) or later-developed general-purpose container formats (such as MCAP) as mainstream solutions. These solutions meet basic data recording needs, but their infrastructure has inherent defects that are difficult to overcome when faced with the demanding high-throughput, strict real-time and data-heterogeneity requirements of robot systems, and these defects are interrelated and compound one another.
These defects manifest in three main ways. First, system stability is lacking. Existing tools use a passive, blocking write model and lack a fine-grained resource control mechanism. When storage I/O performance fluctuates, blocking in the recording path quickly builds back pressure that propagates to upstream core service modules such as perception, localization and control; in severe cases, the system loses its real-time behavior or even crashes entirely. Second, data processing efficiency is low.
Files produced by existing solutions use a row-oriented storage structure in which raw data is stored record by record in message-time order. This suits conventional recording scenarios but is poorly suited to column-oriented batch computation and analysis tasks such as AI training. Before use, the data must undergo repeated deserialization and row-to-column format conversion, which significantly increases overhead and creates a pipeline bottleneck between data acquisition and model training. Finally, the recording of multiple data streams is prone to mutual interference. Existing architectures generally record all kinds of data uniformly through a shared processing queue and I/O channel, so a burst from a high-bandwidth data stream (such as an image stream) can monopolize system resources. As a result, high-frequency, lightweight critical data streams (such as control instructions and IMU data) are frequently dropped, and the recording can no longer reflect the running state of the system truthfully and completely.
In summary, existing recording technology in the robotics field has long faced three intertwined problems: system instability caused by back-pressure propagation, low data processing efficiency caused by row-oriented storage, and critical data loss caused by resource contention. The present application therefore provides a method and device for efficiently recording multi-mode data, which aims to address these defects systematically at the architecture level.
Disclosure of Invention
The efficient multi-mode data recording scheme provided by the embodiments of the application addresses three core defects of the prior art: back-pressure propagation and loss of real-time behavior caused by the lack of precise resource control; low data processing efficiency and high preprocessing overhead caused by the row-oriented storage format; and critical data loss and incomplete recordings caused by multiple data streams competing for shared resources.
A first aspect of an embodiment of the present application provides a multi-mode data recording method, comprising: receiving messages of at least one data stream from a robot middleware; executing a byte-based admission control operation for each received message, wherein the admission control operation comprises attempting to apply for a corresponding quota permission according to the actual memory footprint of the message, discarding the message in response to application failure, and associating the quota permission with the message in response to application success; reorganizing the payload data and the metadata of a plurality of messages for which the application succeeded, respectively, to form batch data units conforming to a columnar storage format; and writing the batch data units into a storage file.
In some embodiments of the application, associating the quota permission with the message includes: transferring ownership of the successfully applied-for quota permission to the message object, wherein the quota permission is released when the message object is written to the storage file or destroyed. In some embodiments of th