CN-121980512-A - Processing method and system for multi-source heterogeneous data of new energy truck

Abstract

The invention belongs to the field of data processing and relates in particular to a method and system for processing multi-source heterogeneous data of a new energy truck. The method monitors the data streams continuously until it identifies the start feature of a vehicle operation event that appears in multiple data sources at the same time. The time window data containing the start feature is input into a pre-trained generative adversarial network: the generator learns the timestamp deviation features and generates time-series alignment parameters, while the discriminator verifies the accuracy of those parameters. Based on the alignment parameters, the multi-source feature sequences in a future time period are time-synchronized to generate standardized feature sequences. Data quality is evaluated by calculating an integrity index and a consistency index, and the standardized feature sequences are dynamically screened according to the degree of quality improvement before being added to a target data set. Finally, vehicle state vectors are reconstructed from the target data set and packaged into a standardized data model, which effectively solves the problems of time-series misalignment and missing features in multi-source heterogeneous data.

Inventors

JIANG MINGHUI
LU JIANXIN
XU JIAN
ZHANG YUXI

Assignees

江苏零浩网络科技有限公司

Dates

Publication Date: 20260505
Application Date: 20260126

Claims (10)

1. A method for processing multi-source heterogeneous data of a new energy truck, characterized by comprising the following steps:
S1, controlling a sensor network configured at a data acquisition end to acquire the raw data stream of each data source in parallel, marking each data record with the master clock time of the data acquisition system, and monitoring continuously until at least two heterogeneous data sources simultaneously produce start feature data representing the same vehicle operation event;
S2, inputting the multi-source time series in an initial processing time window containing the start feature data into a pre-trained generative adversarial network, learning timestamp deviation features through a generator, generating time-series alignment parameters, and verifying the discrimination accuracy of the time-series alignment parameters through a discriminator;
S3, based on the time-series alignment parameters output by the generator and the discrimination accuracy, performing time synchronization on the multi-source feature sequences of the to-be-processed time windows that flow in within a preset future duration, so as to generate a standardized feature sequence for each time window;
S4, calculating an integrity index and a consistency index for the standardized feature sequence of each to-be-processed time window, comparing the comprehensive quality evaluation index of the current processing time window with that of the previous processing time window, and determining, according to the degree of quality improvement, whether to add the current standardized feature sequence to a target data set;
S5, iteratively performing the time-series correction, data synchronization and quality assessment process; terminating the process when the standardized feature sequences of a plurality of consecutive processing time windows within the preset future duration fail to reach a preset quality improvement threshold; reconstructing vehicle state vectors based on all standardized feature sequences in the target data set; and packaging multiple groups of complete vehicle state vectors together with their internal feature association relations into a standardized data model for monitoring and analyzing the vehicle state.
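The termination condition in step S5 of claim 1 can be illustrated with a small counter. This is only a sketch under stated assumptions: the function name `windows_until_termination`, the parameter `patience` (standing in for "a plurality of consecutive processing time windows") and the example values are hypothetical and do not come from the patent text.

```python
def windows_until_termination(improvements, improvement_threshold, patience):
    """Return how many windows are processed before the flow terminates.

    improvements: quality improvement degree of each window, in arrival order.
    improvement_threshold: assumed preset quality improvement threshold.
    patience: assumed number of consecutive below-threshold windows that
        triggers termination.
    """
    stalled = 0  # consecutive windows that failed to reach the threshold
    for count, improvement in enumerate(improvements, start=1):
        if improvement < improvement_threshold:
            stalled += 1
        else:
            stalled = 0  # any sufficiently improving window resets the run
        if stalled >= patience:
            return count
    # The flow never stalled long enough; every window was processed.
    return len(improvements)


# Example: three consecutive stalled windows end the flow at window 6.
print(windows_until_termination(
    [0.05, 0.0, 0.03, 0.0, 0.0, 0.0, 0.1],
    improvement_threshold=0.01,
    patience=3,
))
```

Resetting the counter on every sufficiently improving window is one reading of "consecutive"; a sliding-window count would be another.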
2. The method for processing multi-source heterogeneous data of a new energy truck according to claim 1, wherein the implementation of the generative adversarial network comprises:
S2.1, inputting the multi-source time series in the initial processing time window into the generator of the pre-trained generative adversarial network to extract the timestamp deviation features;
S2.2, the generator outputting, through fully connected layer regression on the extracted timestamp deviation features, a set of time-series alignment parameters, wherein the time-series alignment parameters comprise a time offset compensation value of each data source relative to a system reference time axis and an acquisition frequency normalization scaling factor of each data source; and
S2.3, concatenating the time-series alignment parameters output by the generator with the corresponding multi-source time series to form feature vectors, and inputting the feature vectors into the discriminator of the generative adversarial network, wherein the discriminator adopts a recurrent neural network structure with an attention mechanism and is used to analyze the internal correlation between the time-series alignment parameters and the multi-source time series.
3. The method for processing multi-source heterogeneous data of a new energy truck according to claim 2, wherein the implementation of the generative adversarial network further comprises:
S2.4, the discriminator outputting a discrimination confidence for the time-series alignment parameters based on the attention weight analysis result, wherein the discrimination confidence represents the degree of match between the time-series alignment parameters and the real time-series relation;
S2.5, during training of the generative adversarial network, iteratively optimizing the network parameters by minimizing the alignment error loss of the generator and maximizing the discrimination accuracy of the discriminator, until the discriminator can no longer distinguish the time-series alignment parameters output by the generator from the real alignment parameters with an accuracy exceeding the preset alignment error loss threshold.
4. The method for processing multi-source heterogeneous data of a new energy truck according to claim 3, wherein performing time synchronization on the multi-source feature sequences of the to-be-processed time windows that flow in within the preset future duration comprises:
S3.1, extracting the time-series alignment parameters output by the generator of the pre-trained generative adversarial network, and at the same time obtaining the discrimination confidence output by the discriminator;
S3.2, starting from the end time of the current processing time window and advancing by a preset time step, generating a continuous sequence of to-be-processed time windows within the preset future duration, wherein each time window has the same duration as the initial processing time window;
S3.3, for the multi-source feature sequences flowing in within each to-be-processed time window, compensating and correcting the time mark of each data record according to the time offset compensation value of the corresponding data source, wherein: when the time offset compensation value is positive, the time mark of the data record is adjusted forward by the absolute value of the time offset compensation value; and when the time offset compensation value is negative, the time mark of the data record is adjusted backward by the absolute value of the time offset compensation value.
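Steps S3.2 and S3.3 of claim 4 can be sketched as follows. The names `Record`, `make_windows` and `compensate` are hypothetical, timestamps are assumed to be seconds on the master clock, and "forward" is assumed to mean "later in time", so a signed offset is simply added to the time mark; the claim text does not fix that sign convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str       # name of the data source, e.g. "can" or "gps"
    timestamp: float  # assumed: seconds on the master clock
    value: float


def make_windows(current_end, step, window_len, horizon):
    """S3.2 sketch: windows of length `window_len`, advancing by `step`,
    that fit inside `horizon` seconds after `current_end`."""
    windows = []
    k = 0
    while current_end + k * step + window_len <= current_end + horizon:
        start = current_end + k * step
        windows.append((start, start + window_len))
        k += 1
    return windows


def compensate(records, offsets):
    """S3.3 sketch: shift each record by its source's signed offset.

    A positive offset moves the time mark later and a negative offset moves
    it earlier (assumed convention). Sources without an offset are unchanged.
    """
    return [
        Record(r.source, r.timestamp + offsets.get(r.source, 0.0), r.value)
        for r in records
    ]


windows = make_windows(current_end=10.0, step=2.0, window_len=2.0, horizon=6.0)
shifted = compensate(
    [Record("gps", 100.0, 1.0), Record("can", 50.0, 2.0)],
    {"gps": 0.25, "can": -0.5},
)
print(windows)
print([r.timestamp for r in shifted])
```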
5. The method for processing multi-source heterogeneous data of a new energy truck according to claim 4, wherein performing time synchronization on the multi-source feature sequences of the to-be-processed time windows that flow in within the preset future duration further comprises:
S3.4, resampling the feature sequences of all data sources based on the acquisition frequency normalization scaling factor of each data source, so as to unify the acquisition frequency of each data source to a standard frequency reference, specifically: for a data source whose acquisition frequency is higher than the standard frequency reference, downsampling after anti-aliasing filtering; for a data source whose acquisition frequency is lower than the standard frequency reference, upsampling with an interpolation algorithm based on physical constraints;
S3.5, dynamically adjusting the application strength of the time-series alignment parameters according to the discrimination confidence output by the discriminator, wherein: when the discrimination confidence is greater than a preset discrimination confidence threshold, the generated time-series alignment parameters are adopted in full; when the discrimination confidence is less than or equal to the preset discrimination confidence threshold, the application strength of the time-series alignment parameters is attenuated proportionally, and a smooth transition is made in combination with the historical alignment parameters;
S3.6, rearranging the multi-source feature sequences that have undergone timestamp correction and frequency unification along the system reference time axis, so as to generate standardized feature sequences whose timing is fully aligned within each time window.
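A rough illustration of steps S3.4 and S3.5 of claim 5 follows. Everything here is an assumption made for the sketch: block averaging stands in for the unspecified anti-aliasing filter, linear interpolation clamped to a physical range stands in for the unspecified physically constrained interpolation, and the attenuation rule (strength equal to confidence divided by threshold, blended linearly with the historical parameters) is one possible reading of "attenuated proportionally".

```python
def downsample_mean(values, factor):
    """Average each block of `factor` samples (a crude low-pass filter)
    and keep one value per block. Trailing partial blocks are dropped."""
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values) - factor + 1, factor)
    ]


def upsample_linear(values, factor, lo, hi):
    """Insert `factor - 1` linearly interpolated points between neighbours,
    then clamp everything to the assumed physical range [lo, hi]."""
    if not values:
        return []
    out = []
    for a, b in zip(values, values[1:]):
        for j in range(factor):
            out.append(a + (b - a) * j / factor)
    out.append(values[-1])
    return [min(hi, max(lo, v)) for v in out]


def blend_alignment(generated, history, confidence, threshold):
    """S3.5 sketch: use the generated parameters fully above the threshold,
    otherwise attenuate them and lean on the historical parameters."""
    if confidence > threshold:
        return dict(generated)
    strength = max(0.0, confidence / threshold)  # assumed attenuation rule
    return {
        key: strength * generated[key]
        + (1.0 - strength) * history.get(key, generated[key])
        for key in generated
    }


print(downsample_mean([1.0, 3.0, 5.0, 7.0], 2))
print(upsample_linear([0.0, 10.0], 2, lo=0.0, hi=100.0))
print(blend_alignment({"gps": 0.4}, {"gps": 0.0}, confidence=0.4, threshold=0.8))
```

A production system would more likely use a proper FIR or IIR low-pass filter before decimation; block averaging is used here only to keep the sketch dependency-free.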
6. The method for processing multi-source heterogeneous data of a new energy truck according to claim 5, wherein determining, according to the degree of quality improvement, whether to add the current standardized feature sequence to the target data set comprises:
S4.1, counting the number of valid data points in the standardized feature sequence of each data source within the current processing time window, calculating the ratio of the number of valid data points of each data source to the corresponding total number of expected data points to obtain the data integrity rate of each data source, and selecting the minimum of the data integrity rates of all data sources as the integrity index of the current processing time window;
S4.2, selecting data source combinations that are associated at the physical level, calculating the correlation coefficient between the standardized feature sequences of each data source combination within the current processing time window, averaging the correlation coefficients of all data source combinations, and taking the resulting average as the consistency index of the current processing time window; and
S4.3, weighting the integrity index and the consistency index through a preset working condition-weight mapping table to obtain the comprehensive quality evaluation index of the current processing time window.
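The integrity and consistency indexes of steps S4.1 and S4.2 in claim 6 can be sketched in a few lines. The function names and the example sources are hypothetical, and the Pearson coefficient is assumed as the "correlation coefficient", which the claim does not name.

```python
import math


def integrity_index(valid_counts, expected_counts):
    """S4.1 sketch: minimum over sources of valid points / expected points."""
    return min(valid_counts[s] / expected_counts[s] for s in valid_counts)


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed metric).
    Returns 0.0 when either sequence is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return cov / (norm_x * norm_y)


def consistency_index(sequences, pairs):
    """S4.2 sketch: mean correlation over physically related source pairs."""
    return sum(pearson(sequences[a], sequences[b]) for a, b in pairs) / len(pairs)


# Hypothetical window: GPS is missing 2 of 10 expected points.
print(integrity_index({"can": 90, "gps": 8}, {"can": 100, "gps": 10}))
print(consistency_index(
    {"speed": [1.0, 2.0, 3.0], "rpm": [2.0, 4.0, 6.0]},
    [("speed", "rpm")],
))
```

Taking the minimum rather than the mean in `integrity_index` follows the claim: one badly degraded source pulls the whole window's index down.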
7. The method for processing multi-source heterogeneous data of a new energy truck according to claim 6, wherein determining, according to the degree of quality improvement, whether to add the current standardized feature sequence to the target data set further comprises:
S4.4, comparing the comprehensive quality evaluation index value of the current processing time window with that of the previous processing time window, calculating the difference between the two, and taking the difference as the degree of quality improvement of the current processing time window relative to the previous processing time window;
S4.5, when the degree of quality improvement is positive and the comprehensive quality evaluation index value of the current processing time window exceeds a preset quality threshold, adding the current standardized feature sequence to the target data set;
S4.6, when the degree of quality improvement is non-positive or the comprehensive quality evaluation index value of the current processing time window does not reach the preset quality threshold, not adding the current standardized feature sequence to the target data set; and
S4.7, after it is determined that the current standardized feature sequence is added to the target data set, taking the comprehensive quality evaluation index value of the current processing time window as the new quality reference value for calculating the degree of quality improvement of the next to-be-processed time window.
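The screening rule of steps S4.4 to S4.7 in claim 7 amounts to a small stateful gate. The class name `QualityGate` and the initial reference value of 0.0 are assumptions made for this sketch; the reference is updated only on acceptance, following S4.7.

```python
class QualityGate:
    """Sketch of the accept/reject rule for standardized feature sequences."""

    def __init__(self, quality_threshold, initial_reference=0.0):
        self.quality_threshold = quality_threshold
        self.reference = initial_reference  # assumed starting reference value
        self.target_dataset = []

    def offer(self, sequence, quality):
        """Return True and store `sequence` if it passes both conditions."""
        improvement = quality - self.reference              # S4.4
        accepted = improvement > 0 and quality > self.quality_threshold
        if accepted:                                        # S4.5
            self.target_dataset.append(sequence)
            self.reference = quality                        # S4.7
        return accepted                                     # S4.6 otherwise


gate = QualityGate(quality_threshold=0.7)
for name, quality in [("w1", 0.75), ("w2", 0.72), ("w3", 0.80), ("w4", 0.65)]:
    print(name, gate.offer(name, quality))
print(gate.target_dataset)
```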
8. The method for processing multi-source heterogeneous data of a new energy truck according to claim 7, wherein obtaining the comprehensive quality evaluation index of the current processing time window by weighted calculation through the preset working condition-weight mapping table comprises:
S4.3.1, extracting predefined working-condition discrimination features based on the time-synchronized multi-source standardized feature sequences in the current processing time window and a lightweight decision tree model, wherein the lightweight decision tree model outputs the vehicle working-condition label corresponding to the current processing time window by comparison against a preset working-condition feature threshold library;
S4.3.2, according to the working-condition discrimination features and in combination with a predefined working condition-weight mapping rule base, obtaining the dynamic weight parameter pair corresponding to the current working-condition label;
S4.3.3, based on the dynamic weight parameter pair corresponding to the current working-condition label, combined with the corresponding integrity index and consistency index, obtaining the comprehensive quality evaluation index through a weighted average algorithm.
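The working-condition weighting of claim 8 might look like the sketch below. The labels, thresholds, features and weight pairs are all invented for illustration; the patent text does not list the contents of the threshold library or the mapping rule base, and a pair of nested threshold checks stands in for the lightweight decision tree.

```python
# Hypothetical working condition -> (integrity weight, consistency weight).
CONDITION_WEIGHTS = {
    "parked": (0.7, 0.3),
    "charging": (0.6, 0.4),
    "driving": (0.4, 0.6),
}

# Hypothetical thresholds standing in for the feature threshold library.
SPEED_THRESHOLD_KMH = 1.0
CHARGE_CURRENT_THRESHOLD_A = 5.0


def classify_condition(mean_speed_kmh, mean_charge_current_a):
    """Two-level threshold tree that outputs a working-condition label."""
    if mean_speed_kmh > SPEED_THRESHOLD_KMH:
        return "driving"
    if mean_charge_current_a > CHARGE_CURRENT_THRESHOLD_A:
        return "charging"
    return "parked"


def comprehensive_quality(label, integrity, consistency):
    """Weighted average of the two indexes using the label's weight pair."""
    w_integrity, w_consistency = CONDITION_WEIGHTS[label]
    total = w_integrity + w_consistency
    return (w_integrity * integrity + w_consistency * consistency) / total


label = classify_condition(mean_speed_kmh=40.0, mean_charge_current_a=0.0)
print(label, comprehensive_quality(label, integrity=0.9, consistency=0.8))
```

Dividing by the sum of the weights keeps the result a true weighted average even if a weight pair in the table does not sum to one.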
9. A system for processing multi-source heterogeneous data of a new energy truck, used to implement the method for processing multi-source heterogeneous data of a new energy truck according to any one of claims 1 to 8, characterized by comprising a monitoring module, an adversarial module and a synchronization module;
the monitoring module is used to control the sensor network configured at the data acquisition end to acquire the raw data stream of each data source in parallel, to mark each data record with the master clock time of the data acquisition system, and to monitor continuously until at least two heterogeneous data sources simultaneously produce start feature data representing the same vehicle operation event;
the adversarial module is used to input the multi-source time series in the initial processing time window containing the start feature data into the pre-trained generative adversarial network, to learn the timestamp deviation features through the generator, to generate the time-series alignment parameters, and to verify the discrimination accuracy of the time-series alignment parameters through the discriminator;
the synchronization module is used to perform time synchronization, based on the time-series alignment parameters output by the generator and the discrimination accuracy, on the multi-source feature sequences of the to-be-processed time windows that flow in within the preset future duration, and to generate a standardized feature sequence for each time window.
10. The system for processing multi-source heterogeneous data of a new energy truck according to claim 9, wherein the system further comprises an evaluation and discrimination module and a packaging module;
the evaluation and discrimination module is used to calculate the integrity index and the consistency index for the standardized feature sequence of each to-be-processed time window, to compare the comprehensive quality evaluation index of the current processing time window with that of the previous processing time window, and to determine, according to the degree of quality improvement, whether to add the current standardized feature sequence to the target data set;
the packaging module is used to iteratively execute the time-series correction, data synchronization and quality assessment process, to terminate the process when the standardized feature sequences of a plurality of consecutive processing time windows within the preset future duration fail to reach the preset quality improvement threshold, to reconstruct vehicle state vectors based on all standardized feature sequences in the target data set, and to package multiple groups of complete vehicle state vectors together with their internal feature association relations into a standardized data model for monitoring and analyzing the vehicle state.

Description

Processing method and system for multi-source heterogeneous data of new energy truck

Technical Field

The invention belongs to the field of data processing and relates in particular to a method and system for processing multi-source heterogeneous data of a new energy truck.

Background

The large-scale operation of new energy trucks generates massive multi-source heterogeneous data, including high-frequency whole-vehicle CAN signals, battery state data, low-frequency GPS position information and discrete driving events. These data differ markedly in timing, format and acquisition frequency, forming a complex data ecosystem. The prior art generally relies on a big data platform or data center for centralized processing and adopts a simple scheme of data access and batch storage. However, when processing streaming data, existing methods often lack a fine-grained time-series alignment mechanism: they can only perform coarse-grained fusion based on the data's own timestamps, or handle data of different frequencies by simple interpolation filling. Such methods struggle to overcome the inherent challenges of unsynchronized underlying hardware clocks and inconsistent acquisition periods. As a result, timestamp misalignment and large numbers of empty data fields arise when a unified vehicle state feature vector is constructed, which severely restricts the accuracy and timeliness of subsequent data analysis and fails to meet the stringent data quality requirements of upper-layer applications such as precise energy efficiency management and fault early warning.
The prior art has the following problem: when the multi-source heterogeneous data of a new energy truck are fused, the hardware clocks of different data sources are not synchronized and their acquisition frequencies differ greatly, so the constructed unified time-series feature vector suffers from timestamp misalignment and large numbers of empty data fields, which distorts causality in subsequent data analysis and reduces its accuracy.

Disclosure of Invention

In view of the defects of the prior art, the invention provides a method and system for processing multi-source heterogeneous data of a new energy truck. The method controls a sensor network configured at the data acquisition end to acquire the raw data stream of each data source in parallel and marks the records with a unified clock time, then monitors continuously until it identifies the start feature of a vehicle operation event that appears in multiple data sources simultaneously. The time window data containing the start feature is input into a pre-trained generative adversarial network; the generator learns the timestamp deviation features and generates time-series alignment parameters, while the discriminator verifies the accuracy of the parameters. Based on the alignment parameters, the multi-source feature sequences in a future time period are time-synchronized to generate standardized feature sequences. Data quality is evaluated by calculating an integrity index and a consistency index, and the standardized feature sequences are dynamically screened according to the degree of quality improvement and added to a target data set. This process is executed iteratively until a plurality of consecutive time windows fail to reach the quality improvement threshold. Finally, vehicle state vectors are reconstructed from the target data set and packaged into a standardized data model, which effectively solves the problems of time-series misalignment and missing features in multi-source heterogeneous data and improves the quality and accuracy of vehicle state analysis.

To achieve the above purpose, the invention provides the following technical solution: a method for processing multi-source heterogeneous data of a new energy truck, comprising the following steps:
S1, controlling a sensor network configured at a data acquisition end to acquire the raw data stream of each data source in parallel, marking each data record with the master clock time of the data acquisition system, and monitoring continuously until at least two heterogeneous data sources simultaneously produce start feature data representing the same vehicle operation event;
S2, inputting the multi-source time series in an initial processing time window containing the start feature data into a pre-trained generative adversarial network, learning timestamp deviation features through a generator, generating time-series alignment parameters, and verifying the discrimination accuracy of the time-series alignment parameters through a discriminator;
S3, based on the time-series alignment parameters output by the generator and the discrimination accuracy, performing time synchronization