
CN-115810390-B - Automatic design method of transcriptome processing scheme based on reinforcement learning

CN 115810390 B

Abstract

The invention discloses an automatic design method for a transcriptome processing scheme based on reinforcement learning. The method comprises: constructing a reinforcement learning agent module; obtaining a transcriptome processing scheme by sampling, based on an agent operation space and an agent recurrent neural network; applying the transcriptome processing scheme to transcriptome observation data to obtain processed transcriptome data; inputting the processed transcriptome data and independent-modality observation data into a reinforcement learning environment module to obtain a quantitative evaluation result for the transcriptome processing scheme; calculating the parameter gradients of the agent recurrent neural network based on the reward function of the transcriptome processing scheme and updating the network parameters by back-propagation; and judging whether the maximum number of iterations has been reached, stopping the iteration if so, and taking the highest-rewarded processing scheme encountered during the iterations as the final design result. The method can design a corresponding optimal transcriptome processing scheme from scratch in the scenario of a specified multi-modal analysis task.

Inventors

  • DENG YUE
  • HOU YIMIN

Assignees

  • Beihang University (北京航空航天大学)

Dates

Publication Date
2026-05-12
Application Date
2022-11-29

Claims (5)

  1. An automatic design method for a reinforcement-learning-based transcriptome processing scheme, comprising:
     S1, constructing a reinforcement learning agent module, wherein the reinforcement learning agent module comprises an agent operation space and an agent recurrent neural network;
     S2, initializing the relevant parameters of the agent recurrent neural network;
     S3, obtaining a transcriptome processing scheme by sampling, based on the agent operation space and the agent recurrent neural network;
     S4, applying the transcriptome processing scheme to transcriptome observation data to obtain processed transcriptome data;
     S5, inputting the processed transcriptome data and independent-modality observation data into a reinforcement learning environment module, quantifying the degree of agreement between the two, and taking this as the quantitative evaluation result of the transcriptome processing scheme;
     S6, calculating the reward function of the transcriptome processing scheme;
     S7, calculating the parameter gradients of the agent recurrent neural network based on the reward function obtained in S6, and updating the parameters of the agent recurrent neural network by back-propagation;
     S8, judging whether the maximum number of iterations has been reached; if so, stopping the iterative optimization and taking the highest-rewarded processing scheme encountered during the iterations as the final design result; otherwise, returning to S3 to continue updating the relevant parameters of the agent recurrent neural network;
     wherein the transcriptome processing scheme is represented as an ordered operation sequence S = (s_1, s_2, ..., s_T), where S denotes the transcriptome processing scheme, T denotes the length of the operation sequence and is determined by when the agent draws the terminator, and s_t denotes the one-hot encoded operation vector at position t;
     the sampling process in S3 comprises: initializing the start operation s_0, the hidden variable h_0 and the memory vector c_0 as zero vectors; the agent recurrent neural network then selects the operations of the transcriptome processing scheme one at a time in a loop, which ends once the terminator is selected, completing the sampling; the sampling process is characterized as s_{t+1} ~ P(s_{t+1} | (s_i)_{i<=t}), t = 0, 1, ..., T-1, where s_{t+1} denotes the one-hot operation vector at position t+1 and (s_i)_{i<=t} denotes the existing operation sequence of length t;
     in S5, the quantitative evaluation of the transcriptome processing scheme by the reinforcement learning environment module is characterized as X̃ = S(X), r(X, S, Y) = f_Eval(f_Env(X̃), Y), where X and X̃ denote the transcription expression matrices before and after processing respectively, S denotes the transcriptome processing scheme, Y denotes the matrix or vector of independent-modality observation data, the environment function f_Env(·) maps the processed transcription matrix X̃ into the space of Y, and the evaluation function f_Eval(·) computes the degree of agreement between f_Env(X̃) and Y and outputs the quantitative evaluation r(X, S, Y) of the transcriptome processing scheme;
     in S6, the reward function r*(X, S, Y) equals r(X, S, Y) when the transcriptome processing scheme S of length T outputs a meaningful processing result, and otherwise outputs a negative number -r_penalty (r_penalty > 0) as a penalty.
  2. The automatic design method for a reinforcement-learning-based transcriptome processing scheme according to claim 1, wherein the agent operation space comprises a plurality of scaling, normalization and feature selection operations, together with a terminator representing the end of the operation sequence.
  3. The automatic design method for a reinforcement-learning-based transcriptome processing scheme according to claim 1, wherein the agent recurrent neural network comprises a weight-shared basic time unit, and the implementation of the basic time unit comprises the following steps:
     Step 1, given the current sequence position t, an encoder encodes the one-hot operation vector s_t at the current position into a low-dimensional dense representation: e_t = f_Encoder(s_t), where f_Encoder(·) denotes a neural network and e_t denotes the low-dimensional dense vector;
     Step 2, a long short-term memory network is constructed, whose inputs are the current encoded operation vector e_t, the memory c_t of the existing operation sequence and the current hidden state h_t, and whose output is the prediction for the operation type at position t+1: (h_{t+1}, c_{t+1}) = f_LSTM(e_t, h_t, c_t), where f_LSTM(·) denotes the long short-term memory network, h_{t+1} denotes the predicted hidden variable for the operation at position t+1, and c_{t+1} denotes the updated sequence memory;
     Step 3, a decoder interprets the predicted hidden variable h_{t+1} obtained in Step 2 into the form of a one-hot encoded vector: d_{t+1} = f_Decoder(h_{t+1}), s_{t+1} ~ P(s_{t+1} | (s_i)_{i<=t}) = Cat(d_{t+1}), where f_Decoder(·) denotes a neural network whose output layer employs a softmax activation function, the decoder output d_{t+1} ∈ R^H gives the probability that each dimension of s_{t+1} is set to 1, and the categorical distribution Cat(·) models the selection of the new operation s_{t+1} given the existing operations (s_i)_{i<=t}.
  4. The automatic design method for a reinforcement-learning-based transcriptome processing scheme according to claim 1, wherein the process of applying the transcriptome processing scheme to the transcriptome observation data in S4 is characterized as X̃ = S(X), X ∈ R^{M×N_1}, X̃ ∈ R^{M×N_2}, where S(·) denotes the mapping function defined by the transcriptome processing scheme S, which maps the transcription data from the original pre-processing space to the post-processing space, X and X̃ denote the transcription expression matrices before and after processing respectively, M denotes the number of samples of transcription data and remains unchanged by the processing, and N_1 and N_2 denote the numbers of genes before and after processing, respectively.
  5. The automatic design method for a transcriptome processing scheme according to claim 3, wherein the process of calculating the parameter gradients of the agent recurrent neural network in S7 specifically comprises:
     Step 1, calculating the objective function of the agent recurrent neural network, expressed as J(θ) = E_S[ min( ρ(θ)·A(X, S, Y), clip(ρ(θ), 1-ε, 1+ε)·A(X, S, Y) ) + γ·Σ_t Entropy(d_t) ], where E_S[·] denotes the expected value over transcriptome processing schemes S obtained by sampling, A(X, S, Y) denotes the advantage function defined as r*(X, S, Y) - b, b denotes an exponential moving average baseline over the history of r*(X, S, Y), θ denotes all parameters of the agent recurrent neural network {f_Encoder(·), f_LSTM(·), f_Decoder(·)}, ρ(θ) denotes the ratio of the new to the old scheme probability under the agent recurrent neural network, clip(·) clips ρ(θ) between 1-ε and 1+ε and, combined with the minimum function min(·), limits the update amplitude of the network parameters and accelerates convergence, γ > 0 denotes a hyperparameter balancing the contribution of the entropy constraint to the expected objective, d_t denotes the probability distribution vector output by the decoder network, whose elements sum to 1, and Entropy(d_t) denotes the information entropy of the distribution d_t;
     Step 2, calculating the parameter gradients of the agent recurrent neural network and updating its parameters by back-propagation: θ ← θ + ζ·∇_θ J(θ), where ζ > 0 denotes the parameter update step size and ∇_θ J(θ) denotes the gradient of the objective with respect to the agent recurrent neural network parameters θ.
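The recurrent sampling loop of claims 1 and 3 (encoder, weight-shared LSTM cell, softmax decoder, categorical draw until a terminator) can be sketched as follows. This is an illustrative toy, not part of the patent: the dimensions H, D and MAX_LEN, the randomly initialised weight matrices, and the single-layer linear encoder and decoder are all assumptions standing in for f_Encoder, f_LSTM and f_Decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: H operations (last index = terminator), D-dim hidden state.
H, D, MAX_LEN = 6, 16, 10
TERMINATOR = H - 1

W_enc = rng.normal(0, 0.1, (D, H))           # stand-in for f_Encoder
W_lstm = rng.normal(0, 0.1, (4 * D, 2 * D))  # LSTM gates on [e_t, h_t]
b_lstm = np.zeros(4 * D)
W_dec = rng.normal(0, 0.1, (H, D))           # stand-in for f_Decoder

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(e, h, c):
    """One step of a standard LSTM cell (weight-shared across positions)."""
    z = W_lstm @ np.concatenate([e, h]) + b_lstm
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def sample_scheme():
    """Sample an operation sequence until the terminator is drawn (S3)."""
    s = np.zeros(H)                 # start operation s_0 = zero vector
    h, c = np.zeros(D), np.zeros(D)  # h_0, c_0 = zero vectors
    ops = []
    for _ in range(MAX_LEN):
        e = W_enc @ s                # e_t = f_Encoder(s_t)
        h, c = lstm_cell(e, h, c)    # (h_{t+1}, c_{t+1}) = f_LSTM(e_t, h_t, c_t)
        d = softmax(W_dec @ h)       # d_{t+1} = f_Decoder(h_{t+1})
        op = rng.choice(H, p=d)      # s_{t+1} ~ Cat(d_{t+1})
        ops.append(op)
        if op == TERMINATOR:
            break
        s = np.eye(H)[op]            # feed the chosen one-hot back in
    return ops

scheme = sample_scheme()
```

With untrained random weights the sampled schemes are essentially uniform; training (claim 5) is what shapes the distribution toward high-reward schemes.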
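The clipped-surrogate objective of claim 5 can likewise be sketched numerically. The function below is a hypothetical stand-in, not the patented implementation: it evaluates min(ρA, clip(ρ, 1-ε, 1+ε)A) plus an entropy bonus for a single sampled scheme's scalar log-probabilities, whereas the patent takes this in expectation over sampled schemes.

```python
import numpy as np

def ppo_objective(logp_new, logp_old, advantage, d, eps=0.2, gamma=0.01):
    """Clipped surrogate objective with entropy bonus (sketch of claim 5).

    logp_new / logp_old: log-probability of the sampled scheme under the
    current and previous agent parameters; advantage: A = r* - b, with b an
    exponential-moving-average baseline; d: decoder probability
    distributions, one row per sequence position; gamma: entropy weight.
    """
    rho = np.exp(logp_new - logp_old)                  # probability ratio rho(theta)
    clipped = np.clip(rho, 1.0 - eps, 1.0 + eps)       # clip(rho, 1-eps, 1+eps)
    surrogate = np.minimum(rho * advantage, clipped * advantage)
    entropy = -np.sum(d * np.log(d + 1e-12), axis=-1)  # Entropy(d_t) per position
    return surrogate + gamma * entropy.mean()
```

For example, when the new and old policies coincide (ρ = 1) the surrogate reduces to the advantage itself, and a large positive ratio is capped at 1+ε, which is what limits the update amplitude.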

Description

Automatic design method of transcriptome processing scheme based on reinforcement learning

Technical Field

The invention belongs to the technical field of transcriptome processing and reinforcement learning, and in particular relates to an automatic design method for a transcriptome processing scheme based on reinforcement learning.

Background

Advances in transcriptome sequencing technology have driven the explosive development of a variety of transcriptome analysis methods, such as differential expression analysis, cell type identification and developmental trajectory inference. As an essential element of transcriptome analysis, transcriptome processing aims to remove technical noise introduced during sequencing, thereby recovering the true gene expression data of important biological value. Popular existing transcriptome processing schemes fall roughly into two classes: 1) expert-driven processing schemes, defined by experts based on prior computational experience as ordered operation sequences composed of scaling, normalization and feature selection operations; and 2) deep-neural-network-driven processing schemes, which adopt probabilistic priors on true gene expression, or more advanced self-supervised paradigms, as the loss function of a neural network so as to constrain the network to recover the true expression. Although both classes of schemes can recover true gene expression, quantitative evaluation of a transcriptome processing scheme is limited by the fact that the true gene expression level is in practice difficult to observe effectively. This makes it difficult for those skilled in the relevant arts to select an appropriate transcriptome processing scheme for a specified transcriptome analysis task, and it is a technical problem to be solved by those skilled in the art.
Over the last decade, sequencing technology has evolved from single-modality observation of the transcriptome to simultaneous multi-modal observation. With the assistance of multi-modal data, the difficulty of quantitatively evaluating transcriptome processing schemes can be overcome: the quality of a transcriptome processing scheme can be quantitatively evaluated through the consistency, computed by comparison, between the processed transcriptome and observation data from other, independent modalities. Accordingly, breakthroughs at the level of sequencing technology place new demands on transcriptome analysis methods, enabling a wholly new shift from "selecting an appropriate transcriptome processing scheme" to "designing an optimal transcriptome processing scheme from scratch for a given multi-modal analysis task". Such a design best captures the tissue-biological information contained in the transcriptome, such as the spatial distribution of cell types, and can reveal the design patterns of processing schemes that connect the transcriptome with other modalities, thereby inspiring practitioners to design better transcriptome processing schemes. However, those skilled in the art currently lack dedicated tools for the de novo design of optimal transcriptome processing schemes for a given multi-modal analysis task. Therefore, how to provide a method capable of designing a corresponding optimal transcriptome processing scheme from scratch in the scenario of a specified multi-modal analysis task is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

In view of the foregoing, the present invention provides an automatic design method for a transcriptome processing scheme based on reinforcement learning, which solves at least some of the above technical problems and is capable of designing a corresponding optimal transcriptome processing scheme from scratch in the scenario of a specified multi-modal analysis task.
An embodiment of the invention provides an automatic design method for a transcriptome processing scheme based on reinforcement learning, comprising the following steps: S1, constructing a reinforcement learning agent module, wherein the reinforcement learning agent module comprises an agent operation space and an agent recurrent neural network; S2, initializing the relevant parameters of the agent recurrent neural network; S3, obtaining a transcriptome processing scheme by sampling, based on the agent operation space and the agent recurrent neural network; S4, applying the transcriptome processing scheme to transcriptome observation data to obtain processed transcriptome data; S5, inputting the processed transcriptome data and independent-modality observation data into a reinforcement learning environment module, quantifying the degree of agreement between the two, and taking this as the quantitative evaluation result of the transcriptome processing scheme;
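Taken together, steps S1 through S8 describe a sample-evaluate-update loop. A minimal end-to-end toy of that loop is sketched below; the operation names, the synthetic data, the correlation-based stand-in for f_Env/f_Eval, and the use of plain random search in place of the policy-gradient update of claim 5 are all illustrative assumptions, not the patented method.

```python
import numpy as np

rng = np.random.default_rng(1)

# S1 (toy): an operation space of three processing operations; sampling a
# scheme here just draws a short random sequence of their names.
OPS = {
    "log1p":  lambda X: np.log1p(X),
    "scale":  lambda X: X / (X.sum(axis=1, keepdims=True) + 1e-9),
    "zscore": lambda X: (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9),
}

def apply_scheme(X, scheme):
    """S4: apply the operation sequence to the expression matrix."""
    for op in scheme:
        X = OPS[op](X)
    return X

def evaluate(X_proc, Y):
    """S5 (toy): agreement between the processed transcriptome and an
    independent modality, via correlation of per-sample summaries."""
    return float(np.corrcoef(X_proc.mean(axis=1), Y)[0, 1])

# Synthetic data: M=50 samples x N=20 genes, plus an independent modality Y.
X = rng.gamma(2.0, 1.0, size=(50, 20))
Y = X.mean(axis=1) + rng.normal(0.0, 0.1, size=50)

best_scheme, best_reward = None, -np.inf
names = list(OPS)
for _ in range(200):                 # S3-S8 loop (random search, not PPO)
    scheme = list(rng.choice(names, size=rng.integers(1, 4)))
    try:
        r = evaluate(apply_scheme(X.copy(), scheme), Y)
    except Exception:
        r = -1.0                     # -r_penalty when a scheme fails (S6)
    if r > best_reward:              # track the highest-rewarded scheme (S8)
        best_reward, best_scheme = r, scheme

print(best_scheme, round(best_reward, 3))
```

Replacing the random search with the recurrent agent and clipped policy-gradient update described in the claims is what turns this exhaustive-style loop into the patented design procedure.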