CN-118916679-B - Incremental learning method and system for dynamic pulsar data analysis


Abstract

The invention provides an incremental learning method and system for dynamic pulsar data analysis. The method improves the accuracy and efficiency of pulsar identification and addresses the problem of data drift in pulsar candidate data. First, the pulsar data is preprocessed through multi-source data integration and feature engineering to extract key features and improve the quality of the model input. Next, a base model is trained using an initial Bayesian neural network to ensure the adaptability and stability of the model across data environments. To cope with dynamically changing data, the invention adopts an incremental learning strategy so that the model can be updated in real time when new data is received, without retraining the whole network, which significantly improves learning efficiency and real-time performance. In addition, model uncertainty evaluation is introduced to further optimize and adjust the model parameters, ensuring the accuracy and reliability of the identification process.

Inventors

JIN JING
LIU YI

Assignees

Harbin Institute of Technology (哈尔滨工业大学)

Dates

Publication Date: 20260508
Application Date: 20240802

Claims (8)

1. An incremental learning method for dynamic pulsar data analysis, the method comprising: S1, analyzing a pulsar data set to determine that a data drift problem exists; S2, adopting a weight adjustment technique to address the potential data imbalance caused by an excess of non-pulsar candidate data; S3, replacing the neural network in the model with a Bayesian neural network to capture the uncertainty of the data and enhance the robustness of the model; S4, designing an incremental learning model that can be dynamically adjusted when new data is received and adapts to the new data set through posterior distribution updating, without training from scratch; S5, adopting a multi-modal structure for the incremental learning model, based on a Bayesian neural network, and combining an adaptive replay mechanism that is integrated into the training of the incremental learning model for the current task, so that the incremental learning model can use the knowledge of previous tasks to enhance accuracy and stability when processing a new task; wherein, in S5, the multi-modal incremental learning model is based on a Bayesian neural network, modeling is performed by combining a Bayesian one-dimensional convolutional neural network and a Bayesian two-dimensional convolutional neural network, and a Bayesian linear network performs further processing to generate the final output; and the Bayesian two-dimensional convolutional neural network is used for extracting local features and achieves multiple feature enhancement and extraction by including a batch normalization layer, an activation function, an adaptive average pooling layer, a linear layer, and a Sigmoid-activated SEBlock residual module, wherein the attention mechanism helps the incremental learning model focus more effectively on important feature regions, thereby improving the quality of the data representation.
2. The method according to claim 1, wherein in S1, the trends over time of the best period, the signal-to-noise ratio, and the best DM value of the pulsar candidate data are observed to determine that the data has a drift problem; in addition, the drift problem of the data is caused by changes in the observation environment, adjustments of the observation strategy, and dynamic changes of the interplanetary medium.
3. The method of claim 1, wherein in S2, a Focal loss function is introduced to dynamically adjust the loss weight of the easily-classified samples, thereby reducing the negative impact of unbalanced data on incremental learning model training.
4. The method according to claim 1, wherein in S3, the incremental learning models each employ Bayesian neural networks, and a KL divergence calculation is performed on each network layer to capture the uncertainty of the data and enhance the robustness of the model; for Bayesian inference, the evidence lower bound (ELBO) approximation is used as the objective function: (1).
5. The method of claim 1, wherein in S4, the posterior distribution is updated iteratively so that the network gradually absorbs new data without forgetting old data; when new data arrives, the current posterior distribution is used as the new prior distribution, and the posterior distribution of the parameters is updated by Bayesian inference to realize the dynamic update of the incremental learning model, the dynamic update formula being expressed as: (2).
6. An incremental learning system for dynamic pulsar data analysis, the system comprising: an analysis module for analyzing a pulsar data set and determining that a data drift problem exists; an adjustment module for addressing, by a weight adjustment technique, the potential data imbalance caused by an excess of non-pulsar candidate data; a replacement module for replacing the neural network in the model with a Bayesian neural network to capture the uncertainty of the data and enhance the robustness of the model; a design module for designing an incremental learning model that can be dynamically adjusted when new data is received and adapts to the new data set through posterior distribution updating, without training from scratch; and a processing module, in which the incremental learning model adopts a multi-modal structure, is based on a Bayesian neural network, and combines an adaptive replay mechanism that is integrated into the training of the incremental learning model for the current task, so that the incremental learning model can use the knowledge of previous tasks to enhance accuracy and stability when processing a new task; wherein, in the processing module, the multi-modal incremental learning model is based on a Bayesian neural network, modeling is performed by combining a Bayesian one-dimensional convolutional neural network and a Bayesian two-dimensional convolutional neural network, and a Bayesian linear network performs further processing to generate the final output; and the Bayesian two-dimensional convolutional neural network is used for extracting local features and achieves multiple feature enhancement and extraction by including a batch normalization layer, an activation function, an adaptive average pooling layer, a linear layer, and a Sigmoid-activated SEBlock residual module, wherein the attention mechanism helps the incremental learning model focus more effectively on important feature regions, thereby improving the quality of the data representation.
7. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1-5 when the computer program is executed.
8. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-5.
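
Illustrative note on claim 3 (not part of the claims). The Focal loss referred to in claim 3 can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the patented implementation: the function name focal_loss, the binary pulsar/non-pulsar setting, and the focusing parameter gamma = 2.0 are assumptions introduced here for illustration and are not specified in the source.

```python
import numpy as np

def focal_loss(probs, labels, gamma=2.0):
    """Binary focal loss averaged over samples (illustrative sketch).

    probs  : predicted probability of the positive (pulsar) class, shape (N,)
    labels : 0/1 ground-truth labels, shape (N,)
    gamma  : focusing parameter (assumed value); gamma = 0 gives plain cross-entropy
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-7, 1.0 - 1e-7)
    labels = np.asarray(labels)
    # p_t is the probability the model assigns to the true class of each sample.
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    # The factor (1 - p_t)^gamma shrinks the loss of confidently correct (easy) samples,
    # so the abundant, easily classified non-pulsar candidates contribute less.
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))
```

With gamma = 2.0, a sample classified correctly with probability 0.9 has its cross-entropy term scaled by (1 - 0.9)^2 = 0.01, whereas a sample at 0.6 is scaled by 0.16, so harder samples dominate the training signal.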

Description

Incremental learning method and system for dynamic pulsar data analysis

Technical Field

The invention relates to the technical field of pulsar candidate identification, in particular to an incremental learning model for handling the data drift problem that is suitable for intelligent processing and analysis of astronomical observation data, and more particularly to an incremental learning method and system for dynamic pulsar data analysis.

Background

With the rapid development of astronomical observation technology and the marked improvement in data acquisition capability, scientists are able to acquire large amounts of astronomical data, especially in the field of pulsar observation. A pulsar is a highly compact neutron star with extreme physical properties and regular pulse emission, and is an important object of astrophysical research. Traditional pulsar identification methods rely mainly on predefined thresholds and manually selected features, which tend to be inefficient in big-data environments and are sensitive to changes in data quality and data volume. In particular, the nature and distribution of the collected data may change significantly over time because of changes and upgrades to the observation equipment, changes in the observation environment, and adjustments to the observation strategy; this is the so-called "data drift".

Data drift poses a significant challenge for automatic pulsar identification systems, because conventional machine learning models generally assume that training data and future data follow the same distribution, an assumption that often does not hold in astronomical observations. In addition, technical and design differences between telescopes and observatories may introduce systematic errors between data sets, which, without proper handling, can degrade the generalization ability and recognition accuracy of the model. The non-uniformity and sparsity of pulsar data further increase the complexity of the recognition task.

The prior art attempts to solve these problems with various statistical methods and machine learning techniques, for example conventional algorithms such as support vector machines and random forests, and deep learning algorithms such as ResNet and CNN, for classification and recognition. However, these methods still have difficulty handling large-scale data sets in practical applications, particularly where the data distribution varies significantly. A new technical scheme is therefore urgently needed to improve the accuracy and efficiency of pulsar identification in dynamic, big-data environments, and in particular to handle data drift and equipment differences adaptively, so as to achieve breakthrough progress in the field of astronomical data analysis. The invention is proposed against this background and aims to solve these problems and improve the identification performance for pulsar candidates by introducing incremental learning and Bayesian neural network technology.

Disclosure of Invention

The invention provides an incremental learning method and system for dynamic pulsar data analysis. The method aims to dynamically adjust the learning model to adapt to new data distributions, effectively addressing the problem of data drift and improving the accuracy and efficiency of pulsar candidate identification.
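
As an illustration of adapting a model to a new data distribution without retraining from scratch, the following sketch shows the general idea of reusing the current posterior as the prior for the next batch. It is a toy example under stated assumptions, not the Bayesian neural network of the invention: it assumes a single unknown mean with Gaussian observations of known variance, where the update has a closed form, and the function name update_gaussian_posterior, the prior N(0, 10), and the simulated batches are all hypothetical.

```python
import numpy as np

def update_gaussian_posterior(prior_mean, prior_var, batch, noise_var=1.0):
    """Posterior over an unknown mean mu after observing one batch.

    Toy model (assumption): x_i ~ N(mu, noise_var), prior mu ~ N(prior_mean, prior_var).
    """
    batch = np.asarray(batch, dtype=float)
    # Precisions (inverse variances) add: prior precision plus one term per observation.
    post_precision = 1.0 / prior_var + batch.size / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + batch.sum() / noise_var)
    return post_mean, post_var

rng = np.random.default_rng(0)
old_batch = rng.normal(loc=1.0, scale=1.0, size=200)  # earlier observations
new_batch = rng.normal(loc=1.5, scale=1.0, size=200)  # later, drifted observations

# Incremental route: the posterior after old_batch becomes the prior for new_batch.
mean_old, var_old = update_gaussian_posterior(0.0, 10.0, old_batch)
mean_inc, var_inc = update_gaussian_posterior(mean_old, var_old, new_batch)

# Batch route: a single update over all data, for comparison.
all_data = np.concatenate([old_batch, new_batch])
mean_all, var_all = update_gaussian_posterior(0.0, 10.0, all_data)
```

In this conjugate toy case the incremental and batch routes give the same posterior, so the old data never has to be revisited. For a neural network the posterior is not available in closed form and would have to be approximated, so the equivalence holds only approximately there.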
The invention is realized by the following technical scheme. An incremental learning method for dynamic pulsar data analysis is provided, comprising the following steps:

S1, analyzing a pulsar data set to determine that a data drift problem exists;
S2, adopting a weight adjustment technique to address the potential data imbalance caused by an excess of non-pulsar candidate data;
S3, replacing the neural network in the model with a Bayesian neural network to capture the uncertainty of the data and enhance the robustness of the model;
S4, designing an incremental learning model that can be dynamically adjusted when new data is received and adapts to the new data set through posterior distribution updating, without training from scratch;
S5, adopting a multi-modal structure for the incremental learning model, based on a Bayesian neural network, and combining an adaptive replay mechanism that is integrated into the training of the incremental learning model for the current task, so that the incremental learning model can use the knowledge of previous tasks to enhance accuracy and stability when processing a new task.

Further, in step S1, the trends over time of the best period, the signal-to-noise ratio and the best DM value of the pulsar candidate data are observed to determine that the data has a drift problem; in addition, the drift problem of the data is caused by the change of the observation environment, the adjustment of the observation strategy and the dyna