CN-116912866-B - Medical manifest error correction method, device, equipment and computer readable storage medium
Abstract
The invention relates to the field of medical technology data processing and discloses a medical list error correction method. The method comprises: performing row-column segmentation on a medical list to be corrected to obtain a first medical list; constructing a list structure feature of the first medical list and a cell structure feature of each cell; calculating a structure feature difference value between each cell structure feature and the list structure feature; identifying the cells that require structure correction according to the structure feature difference value, and performing structure error correction on those cells to obtain a second medical list; constructing a context for each cell in the second medical list and generating a predicted text for the corresponding cell according to the context; calculating a text difference value between the predicted text and the real text of each cell; and identifying the cells that require text correction according to the text difference value, and performing text error correction on those cells. The invention also provides a medical manifest error correction device, electronic equipment and a computer readable storage medium. The invention can improve the accuracy of medical manifest error correction.
Inventors
- XU XIAN
Assignees
- 平安科技(深圳)有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20230628
Claims (10)
- 1. A medical manifest error correction method, the method comprising: performing a row-column segmentation operation on the medical list to be corrected to obtain a first medical list; constructing a list structure feature of the first medical list and a cell structure feature of each cell in the first medical list, wherein the list structure feature refers to an overall distribution feature of the first medical list and comprises a header distribution area feature, a content distribution area feature, an embedded list feature, and the number of rows and the number of columns of the list, the header distribution area feature comprises a row header, a column header and a cascaded header, and the cell structure feature refers to the position of each cell in the first medical list, the positional relation between adjacent cells and a text data relation feature, and comprises the row number and the column number of each cell and the header feature corresponding to the cell; calculating a structural feature difference value between the cell structure feature of each cell and the list structure feature; identifying the cells to be corrected according to the structural feature difference value, and performing structure error correction on the cells to be corrected to obtain a second medical list; constructing a context of each cell in the second medical list, and generating a predicted text of the corresponding cell according to the context; calculating a text difference value between the predicted text and the real text of each cell in the second medical list; and identifying the cells to be corrected according to the text difference value, and performing text error correction on the cells to be corrected to obtain a corrected medical list.
- 2. The medical manifest error correction method as claimed in claim 1, wherein said performing a row-column segmentation operation on the medical list to be corrected to obtain a first medical list comprises: performing edge detection on the medical list to be corrected to obtain a grid structure corresponding to the medical list to be corrected; extracting features of each grid in the grid structure to obtain grid features; identifying row-column position information of the corresponding grids according to the grid features by using a machine learning algorithm; and splitting each grid into cells according to the row-column position information of each grid to obtain the first medical list.
- 3. The medical manifest error correction method of claim 1, wherein the constructing the list structure feature of the first medical list comprises: acquiring the value of each preset table dimension in the first medical list; performing normalized coding on the value of each preset table dimension to obtain a table dimension code; and performing matrixing and splicing on the table dimension codes to obtain the list structure feature of the first medical list.
- 4. The medical manifest error correction method of claim 1, wherein said calculating a structural feature difference value between the cell structure feature of each of said cells and said list structure feature comprises: performing vector conversion on the cell structure feature of each cell to obtain a cell vector; performing vector conversion on the list structure feature to obtain a list vector; and calculating a distance value between the cell vector of each cell and the list vector, and taking the calculated distance value as the structural feature difference value.
- 5. The medical manifest error correction method of claim 1, wherein the constructing the context of each cell in the second medical list and generating the predicted text of the corresponding cell according to the context comprises: acquiring the text content of each cell and the position information of each cell; generating a context corresponding to each cell according to the text content and the position information of each cell by using a pre-trained language model; sequentially selecting one cell in the second medical list as a target cell; and generating a predicted text of the target cell by using the pre-trained language model according to the context corresponding to the target cell.
- 6. The medical manifest error correction method of claim 1, wherein prior to performing the row-column segmentation operation on the medical list to be corrected, the method further comprises: performing image enhancement and denoising operations on the medical list to be corrected by using an image processing technology; identifying a distortion region in the medical list to be corrected by using a table detection algorithm; and correcting the distortion region by using a table correction algorithm.
- 7. A medical manifest error correction apparatus, the apparatus comprising: a row-column segmentation module, used for performing a row-column segmentation operation on the medical list to be corrected to obtain a first medical list; a structure feature extraction module, used for constructing a list structure feature of the first medical list and a cell structure feature of each cell in the first medical list, wherein the list structure feature refers to an overall distribution feature of the first medical list and comprises a header distribution area feature, a content distribution area feature, an embedded list feature, and the number of rows and the number of columns of the list, the header distribution area feature comprises a row header, a column header and a cascaded header, and the cell structure feature refers to the position of each cell in the first medical list, the positional relation between adjacent cells and a text data relation feature, and comprises the row number and the column number of each cell and the header feature corresponding to the cell; a structure error correction module, used for calculating a structural feature difference value between the cell structure feature of each cell and the list structure feature, identifying the cells to be corrected according to the structural feature difference value, and performing structure error correction on the cells to be corrected to obtain a second medical list; a context generation module, used for constructing the context of each cell in the second medical list and generating a predicted text of the corresponding cell according to the context; and a context error correction module, used for calculating a text difference value between the predicted text and the real text of each cell in the second medical list, identifying the cells to be corrected according to the text difference value, and performing text error correction on the cells to be corrected to obtain the corrected medical list.
- 8. The medical manifest error correction apparatus according to claim 7, wherein the row-column segmentation module performs the row-column segmentation operation on the medical list to be corrected by: performing edge detection on the medical list to be corrected to obtain a grid structure corresponding to the medical list to be corrected; extracting features of each grid in the grid structure to obtain grid features; identifying row-column position information of the corresponding grids according to the grid features by using a machine learning algorithm; and splitting each grid into cells according to the row-column position information of each grid to obtain the first medical list.
- 9. An electronic device, the electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the medical manifest correction method according to any one of claims 1 to 6.
- 10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the medical manifest correction method according to any one of claims 1 to 6.
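The normalized coding of claim 3 and the distance-based difference of claim 4 can be illustrated with a minimal Python sketch. The claims do not name the table dimensions, the encoding, the distance metric or a threshold, so everything concrete here is an assumption made for illustration: the two dimensions (cells per row, header levels), the scaling constants `max_columns` and `max_levels`, the Euclidean distance, and the `threshold` value are all hypothetical.

```python
import math

def normalize(value, maximum):
    """Scale a raw dimension value into [0, 1] (assumed encoding)."""
    return value / maximum if maximum else 0.0

def list_vector(num_columns, header_levels, max_columns=20, max_levels=5):
    """Encode list-level dimensions and splice them into one vector."""
    return [normalize(num_columns, max_columns),
            normalize(header_levels, max_levels)]

def cell_vector(cells_in_row, header_levels_above, max_columns=20, max_levels=5):
    """Encode the same dimensions as observed from a single cell."""
    return [normalize(cells_in_row, max_columns),
            normalize(header_levels_above, max_levels)]

def flag_structural_outliers(table_vec, cell_vecs, threshold=0.1):
    """Return ids of cells whose distance to the list vector exceeds threshold."""
    flagged = []
    for cell_id, vec in cell_vecs.items():
        difference = math.dist(vec, table_vec)  # Euclidean distance (assumed metric)
        if difference > threshold:
            flagged.append(cell_id)
    return flagged

# Hypothetical table: 4 columns, 2 header levels.
table_vec = list_vector(num_columns=4, header_levels=2)
cell_vecs = {
    (0, 0): cell_vector(cells_in_row=4, header_levels_above=2),
    (1, 3): cell_vector(cells_in_row=7, header_levels_above=2),  # row has extra cells
    (2, 1): cell_vector(cells_in_row=5, header_levels_above=2),
}
flagged_cells = flag_structural_outliers(table_vec, cell_vecs)
```

With these assumed values only cell `(1, 3)` is flagged, because its row contains far more cells than the list-level column count suggests. A real system would need a richer feature set and a tuned threshold.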
Description
Medical manifest error correction method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of medical technology data processing, and in particular, to a method and apparatus for correcting a medical manifest, an electronic device, and a computer readable storage medium.
Background
A medical list carries rich content, such as a patient's personal information, medical history, physical information, symptoms and examination information. Accordingly, the table structure of a medical list is relatively complex and includes merged cells, nested tables, and multi-level headers. The commonly applied OCR (Optical Character Recognition) technology does not handle complex table structures well when recognizing a medical list, so the recognition results are prone to being misaligned or misplaced. Existing OCR technology also has limited capability for context understanding and error correction: in complex tables, particularly when misalignment occurs, it cannot adequately understand the overall semantics and structure of the table, so the error correction accuracy for medical lists is low.
Disclosure of Invention
The invention provides a medical manifest error correction method, a device, electronic equipment and a computer readable storage medium, and mainly aims to improve the accuracy of medical manifest error correction.
In order to achieve the above object, the present invention provides a medical manifest error correction method, including: performing a row-column segmentation operation on the medical list to be corrected to obtain a first medical list; constructing a list structure feature of the first medical list and a cell structure feature of each cell in the first medical list; calculating a structural feature difference value between the cell structure feature of each cell and the list structure feature; identifying the cells to be corrected according to the structural feature difference value, and performing structure error correction on the cells to be corrected to obtain a second medical list; constructing a context of each cell in the second medical list, and generating a predicted text of the corresponding cell according to the context; calculating a text difference value between the predicted text and the real text of each cell in the second medical list; and identifying the cells to be corrected according to the text difference value, and performing text error correction on the cells to be corrected to obtain a corrected medical list.
Optionally, the performing a row-column segmentation operation on the medical list to be corrected to obtain a first medical list includes: performing edge detection on the medical list to be corrected to obtain a grid structure corresponding to the medical list to be corrected; extracting features of each grid in the grid structure to obtain grid features; identifying row-column position information of the corresponding grids according to the grid features by using a machine learning algorithm; and splitting each grid into cells according to the row-column position information of each grid to obtain the first medical list.
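The row-column position identification step can be sketched in Python as follows. The source does not specify which machine learning algorithm is used, so this sketch swaps in a simple tolerance-based grouping of grid coordinates instead. The input format (bounding boxes as `(x, y, w, h)` tuples, assumed to come from a prior edge detection step), the function names and the `tolerance` parameter are all hypothetical.

```python
def cluster_positions(values, tolerance):
    """Group coordinates; start a new group when a value lies more than
    `tolerance` beyond the first value of the current group.
    Returns the first value of each group, in ascending order."""
    starts = []
    for value in sorted(values):
        if not starts or value - starts[-1] > tolerance:
            starts.append(value)
    return starts

def nearest_index(value, starts):
    """Index of the group start closest to `value`."""
    return min(range(len(starts)), key=lambda i: abs(starts[i] - value))

def assign_rows_and_columns(boxes, tolerance=5):
    """Map each grid bounding box (x, y, w, h) to a (row, column) index pair."""
    row_starts = cluster_positions([y for _, y, _, _ in boxes], tolerance)
    col_starts = cluster_positions([x for x, _, _, _ in boxes], tolerance)
    cells = {}
    for box in boxes:
        x, y, _, _ = box
        key = (nearest_index(y, row_starts), nearest_index(x, col_starts))
        cells[key] = box
    return cells

# Four slightly jittered boxes forming a hypothetical 2 x 2 grid.
boxes = [(0, 0, 50, 20), (51, 1, 50, 20), (0, 21, 50, 20), (52, 20, 50, 20)]
cells = assign_rows_and_columns(boxes)
```

Here the four boxes map to the index pairs `(0, 0)`, `(0, 1)`, `(1, 0)` and `(1, 1)`. Merged cells and nested tables, which the background section names as the hard cases, would need additional handling that this sketch omits.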
Optionally, the constructing the list structure feature of the first medical list includes: acquiring the value of each preset table dimension in the first medical list; performing normalized coding on the value of each preset table dimension to obtain a table dimension code; and performing matrixing and splicing on the table dimension codes to obtain the list structure feature of the first medical list.
Optionally, the calculating a structural feature difference value between the cell structure feature of each cell and the list structure feature includes: performing vector conversion on the cell structure feature of each cell to obtain a cell vector; performing vector conversion on the list structure feature to obtain a list vector; and calculating a distance value between the cell vector of each cell and the list vector, and taking the calculated distance value as the structural feature difference value.
Optionally, the constructing the context of each cell in the second medical list and generating the predicted text of the corresponding cell according to the context includes: acquiring the text content of each cell and the position information of each cell; generating a context corresponding to each cell according to the text content and the position information of each cell by using a pre-trained language model; sequentially selecting one cell in the second medical list as a target cell; and generating a predicted text of the target cell by using the pre-trained language model according to the cont