CN-121999859-A - Transcript isomer transition event identification method, device, equipment and memory

Abstract

The present application relates to a method, apparatus, computer device, memory, and computer program product for identifying isoform switching events of transcripts. The method comprises: obtaining, from a transcript expression matrix, an expression matrix of different transcripts under different cell types; performing matrix transformation on the expression matrix to obtain a first target matrix; assigning ranks to the elements of the first target matrix in order of magnitude to obtain a second target matrix; screening the data of the first target matrix according to the second target matrix to obtain a third target matrix; and calculating, based on the elements of the third target matrix, a first target value of each target transcript under the different cell types, wherein the first target value describes the intensity of the isoform switching event. Isoform switching events can thus be identified effectively and accurately from that intensity, yielding the transcripts in which such an event has occurred.

Inventors

  • WANG XUE
  • LIN XIUMEI
  • LIU CHANG
  • CHEN LIANG
  • ZHOU JIE
  • LIU CHUANYU
  • LIU LONGQI

Assignees

  • 杭州华大生命科学研究院
  • 深圳华大生命科学研究院

Dates

Publication Date
2026-05-08
Application Date
2024-11-08

Claims (11)

  1. A method for identifying an isoform switching event of a transcript, the method comprising: obtaining, from a transcript expression matrix, an expression matrix of different transcripts under each cell type; performing matrix transformation on the expression matrix to obtain a first target matrix, wherein the first target matrix describes the expression level of the different transcripts in each cell type; assigning ranks to the elements of the first target matrix in order of magnitude to obtain a second target matrix; screening the data of the first target matrix according to the second target matrix to obtain a third target matrix, wherein the third target matrix describes the expression difference of transcripts across different cell types; calculating, based on each element of the third target matrix, a first target value of a target transcript under the different cell types, wherein the first target value describes the intensity of the isoform switching event; and identifying, based on the intensity of the isoform switching event, the transcripts in which an isoform switching event has occurred.
  2. The method of claim 1, wherein obtaining the expression matrix of different transcripts under each cell type from the transcript expression matrix comprises: extracting data of a plurality of cell types from the transcript expression matrix; and summing the data by cell type to obtain the expression matrix of different transcripts under each cell type.
  3. The method of claim 1, wherein performing matrix transformation on the expression matrix to obtain the first target matrix comprises: calculating, for each cell type, the sum of the expression quantities of all transcripts from the expression quantities of the different transcripts in the expression matrix; calculating the ratio of the expression quantity of each transcript under each cell type to that sum; and generating the first target matrix from the ratios.
  4. The method of claim 1, further comprising: screening, from the second target matrix, the data of transcripts whose rank changes across different cell types, to obtain a target matrix of the target transcripts under each cell type; wherein screening the data of the first target matrix according to the second target matrix to obtain the third target matrix comprises: screening the first target matrix according to the target matrix of the target transcripts under each cell type, to obtain the third target matrix of the target transcripts under each cell type.
  5. The method of claim 1, wherein the cell types comprise a first cell type and a second cell type, and wherein calculating the first target value of the target transcript under the different cell types based on each element of the third target matrix comprises: calculating residual values between the elements corresponding to the first cell type and the elements corresponding to the second cell type in the third target matrix; calculating the quotient of each residual value and the number of transcripts in the third target matrix; and summing the quotients to obtain a residual sum of the elements of the rank-changed target transcripts in the first cell type and the second cell type.
  6. The method of claim 5, wherein the residual sum comprises a residual sum of squares, the method further comprising: calculating the square of the product of each residual value and a preset multiple; wherein calculating the quotient of each residual value and the number of transcripts in the third target matrix comprises: calculating the quotient of each square value and the number of transcripts in the third target matrix; and wherein summing the quotients to obtain the residual sum comprises: summing the quotients obtained from the square values to obtain the residual sum of squares of the elements of the rank-changed target transcripts in the first cell type and the second cell type.
  7. The method of any one of claims 1 to 6, further comprising: performing a significance test on the transcripts in each cell type in which the isoform switching event occurred, to obtain a second target value indicating significance; and screening the transcripts in which the isoform switching event occurred according to the second target value, to obtain screened transcripts in which the isoform switching event occurred.
  8. An apparatus for identifying an isoform switching event of a transcript, the apparatus comprising: an acquisition module configured to obtain, from a transcript expression matrix, an expression matrix of different transcripts under each cell type; a transformation module configured to perform matrix transformation on the expression matrix to obtain a first target matrix, wherein the first target matrix describes the expression level of the different transcripts in each cell type; an assignment module configured to assign ranks to the elements of the first target matrix in order of magnitude to obtain a second target matrix; a first screening module configured to screen the data of the first target matrix according to the second target matrix to obtain a third target matrix, wherein the third target matrix describes the expression difference of transcripts across different cell types; a first calculation module configured to calculate, based on each element of the third target matrix, a first target value of a target transcript under the different cell types, wherein the first target value describes the intensity of the isoform switching event; and a second screening module configured to identify, according to the intensity of the isoform switching event, the transcripts in which an isoform switching event has occurred.
  9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable memory storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
  11. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
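
The squared variant recited in claim 6 can be illustrated with a minimal Python sketch. It assumes the proportions of the rank-changed transcripts are already available for two cell types as dictionaries with identical keys. The function name `residual_sum_of_squares`, the data layout, and the preset multiple of 100 are hypothetical choices for illustration; the claims do not specify a value for the multiple.

```python
def residual_sum_of_squares(first, second, multiple=100.0):
    """Sum over transcripts of ((first - second) * multiple) ** 2 / n.

    first, second: {transcript_id: proportion} for the rank-changed
    transcripts in the first and second cell type (same keys assumed).
    multiple: the "preset multiple"; 100.0 is an assumed example value.
    """
    n = len(first)
    if n == 0:
        return 0.0  # nothing changed rank, so no switching intensity
    total = 0.0
    for transcript_id, p_first in first.items():
        residual = p_first - second[transcript_id]
        total += (residual * multiple) ** 2 / n
    return total


# Hypothetical proportions of two rank-changed transcripts.
first = {"tx_A": 0.60, "tx_B": 0.30}
second = {"tx_A": 0.25, "tx_B": 0.60}
score = residual_sum_of_squares(first, second, multiple=100.0)
```

Because each residual is squared before the division and summation, positive and negative differences both increase the score instead of offsetting one another.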

Description

Transcript isomer transition event identification method, device, equipment and memory

Technical Field

The application relates to the field of biotechnology, and in particular to a method, apparatus, device, and memory for identifying an isoform switching event of a transcript.

Background

Many examples of important functional changes show why isoforms need to be analyzed; a particularly prominent one is pyruvate kinase. In normal adult homeostasis, cells use the adult isoform (M1), which supports oxidative phosphorylation, whereas almost all cancer cells use the embryonic isoform (M2), which promotes aerobic glycolysis, one of the hallmarks of cancer. Such changes in isoform usage are referred to as "isoform switching" and may go undetected if the data are analyzed only at the gene level. At present, however, there is no solution for evaluating how different cell types use different transcripts of the same gene.

Disclosure of Invention

In view of the above, it is desirable to provide a method, apparatus, device, and memory for identifying an isoform switching event of a transcript that can identify such events effectively and accurately.
In a first aspect, the present application provides a method for identifying an isoform switching event of a transcript, the method comprising: obtaining, from a transcript expression matrix, an expression matrix of different transcripts under each cell type; performing matrix transformation on the expression matrix to obtain a first target matrix, wherein the first target matrix describes the expression level of the different transcripts in each cell type; assigning ranks to the elements of the first target matrix in order of magnitude to obtain a second target matrix; screening the data of the first target matrix according to the second target matrix to obtain a third target matrix, wherein the third target matrix describes the expression difference of transcripts across different cell types; calculating, based on each element of the third target matrix, a first target value of a target transcript under the different cell types, wherein the first target value describes the intensity of the isoform switching event; and identifying, based on the intensity of the isoform switching event, the transcripts in which an isoform switching event has occurred.

In one embodiment, obtaining the expression matrix of different transcripts under each cell type from the transcript expression matrix comprises: extracting data of a plurality of cell types from the transcript expression matrix; and summing the data by cell type to obtain the expression matrix of different transcripts under each cell type.
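
A minimal Python sketch of the summing step in this embodiment follows. It assumes the transcript expression matrix is available as per-cell counts together with a cell-type label for each cell. The nested-dictionary layout, the function name `sum_by_cell_type`, and the example identifiers are hypothetical and are not taken from the application.

```python
from collections import defaultdict


def sum_by_cell_type(cell_counts, cell_types):
    """Sum per-cell transcript counts into one profile per cell type.

    cell_counts: {cell_id: {transcript_id: count}}  (assumed layout)
    cell_types:  {cell_id: cell_type_label}
    Returns {cell_type: {transcript_id: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for cell_id, counts in cell_counts.items():
        cell_type = cell_types[cell_id]
        for transcript_id, count in counts.items():
            totals[cell_type][transcript_id] += count
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {ct: dict(tx) for ct, tx in totals.items()}


# Hypothetical example: three cells, two cell types, two transcripts.
cell_counts = {
    "cell1": {"tx_A": 4, "tx_B": 1},
    "cell2": {"tx_A": 6, "tx_B": 3},
    "cell3": {"tx_A": 2, "tx_B": 8},
}
cell_types = {"cell1": "T", "cell2": "T", "cell3": "B"}
by_type = sum_by_cell_type(cell_counts, cell_types)
```

Here the two cells labeled `T` are collapsed into a single column, so each cell type ends up with one summed expression value per transcript.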
In one embodiment, performing matrix transformation on the expression matrix to obtain the first target matrix comprises: calculating, for each cell type, the sum of the expression quantities of all transcripts from the expression quantities of the different transcripts in the expression matrix; calculating the ratio of the expression quantity of each transcript under each cell type to that sum; and generating the first target matrix from the ratios.

In one embodiment, the method further comprises: screening, from the second target matrix, the data of transcripts whose rank changes across different cell types, to obtain a target matrix of the target transcripts under each cell type. Screening the data of the first target matrix according to the second target matrix to obtain the third target matrix then comprises: screening the first target matrix according to the target matrix of the target transcripts under each cell type, to obtain the third target matrix of the target transcripts under each cell type.

In one embodiment, the cell types include a first cell type and a second cell type, and calculating the first target value of the target transcript under the different cell types based on each element of the third target matrix comprises: calculating residual values between the elements corresponding to the first cell type and the elements corresponding to the second cell type in the third target matrix; calculating the quotient of each residual value and the number of transcripts in the third target matrix; and summing the quotients to obtain a residual sum of the elements of the rank-changed target transcripts in the first cell type and the second cell type.
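
The embodiments above (conversion to ratios, rank assignment, rank-based screening, and the residual sum) can be sketched in Python as follows. This is one possible reading under stated assumptions: the ratios are taken over whatever set of transcripts is supplied (for example, the isoforms of one gene), rank 1 is given to the largest ratio, and ties are broken by transcript name. All function names and the example data are hypothetical.

```python
def to_proportions(expr):
    """First target matrix: each value divided by its cell type's total.

    expr: {cell_type: {transcript_id: summed expression}} (assumed layout)
    """
    props = {}
    for cell_type, tx in expr.items():
        total = sum(tx.values())
        props[cell_type] = {t: (v / total if total else 0.0) for t, v in tx.items()}
    return props


def rank_by_size(props):
    """Second target matrix: rank 1 = largest ratio within each cell type."""
    ranks = {}
    for cell_type, tx in props.items():
        ordered = sorted(tx, key=lambda t: (-tx[t], t))  # ties: by name (assumption)
        ranks[cell_type] = {t: i + 1 for i, t in enumerate(ordered)}
    return ranks


def rank_changed(ranks, type_a, type_b):
    """Transcripts whose rank differs between the two cell types."""
    return sorted(t for t in ranks[type_a] if ranks[type_a][t] != ranks[type_b].get(t))


def residual_sum(props, targets, type_a, type_b):
    """Sum of (ratio in type_a - ratio in type_b) / n over the target transcripts."""
    n = len(targets)
    if n == 0:
        return 0.0
    return sum((props[type_a][t] - props[type_b][t]) / n for t in targets)


# Hypothetical summed expression for three transcripts in two cell types.
expr = {
    "T": {"tx_A": 12, "tx_B": 6, "tx_C": 2},
    "B": {"tx_A": 5, "tx_B": 12, "tx_C": 3},
}
props = to_proportions(expr)
ranks = rank_by_size(props)
targets = rank_changed(ranks, "T", "B")        # third-matrix rows: tx_A and tx_B
score = residual_sum(props, targets, "T", "B")
```

In this example `tx_A` and `tx_B` swap ranks while `tx_C` stays third, so only the first two enter the residual sum. Their signed residuals (about +0.35 and -0.30) largely offset each other, which is a property of summing signed differences.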
In one embodiment, the sum of residuals includes a sum of squares of residuals, and the method further includes: respectively calculati