CN-121996739-A - Automatic processing method and system for research and development data based on keyword extraction

Abstract

The application discloses a method and system for automatically processing research and development data based on keyword extraction, and relates to the technical field of automated data processing. The method comprises: performing an initial extraction on the text information in chemical research and development data with a text extraction algorithm to obtain keywords; performing a secondary extraction on the keywords using chemical proper nouns to obtain a target text set; in response to a user processing requirement, retrieving the target texts in the target text set that match the requirement, together with their position coordinates; constructing a requirement result table from the target texts; retrieving, according to the position coordinates, the data information corresponding to the target texts in the chemical research and development data into the requirement result table; performing anomaly identification on the data information in the requirement result table according to the data ranges and data conflict relations in historical chemical-related data; and marking abnormal results. The application improves the efficiency and accuracy of processing chemical research and development data.
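
The extraction and table-construction steps summarized above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, not the applicant's implementation: the report is modeled as lines of `label: value` text, the "text extraction algorithm" is replaced by a simple regular expression, the position coordinate is a (row, column) pair, and `CHEMICAL_TERMS`, `TargetText`, and the function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabulary standing in for the configured chemical proper nouns.
CHEMICAL_TERMS = {"purity", "melting point", "ph"}


@dataclass(frozen=True)
class TargetText:
    text: str
    row: int  # line index in the source report (assumed coordinate scheme)
    col: int  # character offset of the label within that line


def initial_extract(lines):
    """First pass: treat the label before each colon as a candidate keyword."""
    found = []
    for row, line in enumerate(lines):
        match = re.match(r"\s*([^:]+?)\s*:", line)
        if match:
            found.append(TargetText(match.group(1), row, match.start(1)))
    return found


def secondary_extract(candidates, terms):
    """Second pass: keep only candidates that are known chemical terms."""
    return [c for c in candidates if c.text.lower() in terms]


def build_result_table(lines, targets, requested):
    """Use each requested target's row coordinate to fetch its value."""
    table = {}
    for target in targets:
        if target.text.lower() in requested:
            table[target.text] = lines[target.row].split(":", 1)[1].strip()
    return table


report = [
    "Sample ID: A-17",
    "Purity: 99.2 %",
    "Melting point: 135 C",
    "Operator: Li",
]
targets = secondary_extract(initial_extract(report), CHEMICAL_TERMS)
table = build_result_table(report, targets, {"purity", "melting point"})
print(table)
```

Storing the coordinate alongside each target text is what lets the table step go straight back to the source line instead of searching the report a second time.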

Inventors

  • BAO ENWEI
  • ZHANG FUJIA
  • GU SHANYUAN
  • BAI XIN

Assignees

  • 浙江保融科技股份有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-04-09

Claims (10)

  1. A method for automatically processing research and development data based on keyword extraction, characterized by comprising the following steps: performing an initial extraction on the text information in chemical research and development data using a text extraction algorithm to obtain keywords; performing a secondary extraction on the keywords using proper nouns of chemical drugs to obtain a target text set, wherein each target text in the target text set has a corresponding position coordinate; in response to a user processing requirement, retrieving from the target text set the target text that matches the user processing requirement, together with its position coordinate; constructing a requirement result table according to the target text; retrieving, according to the position coordinate, the data information corresponding to the target text in the chemical research and development data into the requirement result table; and performing anomaly identification on the data information in the requirement result table according to the data range and the data conflict relation in historical chemical-related data, and marking an abnormal result.
  2. The method for automatically processing research and development data based on keyword extraction according to claim 1, wherein performing anomaly identification on the data information in the requirement result table according to the data range and the data conflict relation in the historical chemical-related data, and marking the abnormal result, comprises: obtaining a first type conflict relation according to the proportion in which the chemical proper noun types occur together in the historical chemical-related data; dividing the historical chemical-related data by chemical proper noun type to obtain a first classification set; obtaining, from the first classification set, the data range corresponding to each chemical proper noun type, wherein the data range comprises at least a text range and a numerical range; obtaining a second classification set from the historical chemical-related data that share the same combination of chemical proper noun types; constructing a second combination conflict relation according to the data range differences between the first classification set and the second classification set; constructing the data conflict relation from the first type conflict relation and the second combination conflict relation; and determining the abnormal result according to the data range and the data conflict relation.
  3. The method for automatically processing research and development data based on keyword extraction according to claim 2, wherein determining the abnormal result according to the data range and the data conflict relation comprises: if the data information conforms to the data range, performing a secondary anomaly identification according to the first type conflict relation and the second combination conflict relation; if the data information does not conform to the data range, marking it as an abnormal result; and if a conflict is found according to the first type conflict relation and the second combination conflict relation, marking the data information as an abnormal result.
  4. The method for automatically processing research and development data based on keyword extraction according to claim 2 or 3, wherein constructing the second combination conflict relation according to the data range differences between the first classification set and the second classification set comprises: obtaining, from the first classification set and the second classification set, data range differences that contain text difference information and numerical difference information; and constructing the second combination conflict relation according to how the data range differences change across different combinations of chemical proper noun types.
  5. The method for automatically processing research and development data based on keyword extraction according to claim 1, further comprising: training on historical chemical research and development forms and chemical proper nouns using the Qwen-Turbo model architecture, so as to learn the chemical proper nouns and the way the forms are constructed, and constructing an experimental data processing model, wherein the experimental data processing model responds to the user processing requirement.
  6. The method for automatically processing research and development data based on keyword extraction according to claim 1, further comprising: constructing a term mapping relation according to the synonym correlations and antonym correlations among the chemical proper nouns; performing an initial identification on the chemical research and development data according to the antonym correlations of the chemical proper nouns to obtain antonym-correlated data; and performing unified term replacement on the chemical research and development data according to the synonym correlations of the chemical proper nouns, and replacing the antonym-correlated data so as to invalidate it, to obtain the preprocessed chemical research and development data.
  7. The method for automatically processing research and development data based on keyword extraction according to claim 1, further comprising: constructing a first correction relation according to the physicochemical properties and the uses of the chemical; traversing the chemical research and development data according to the first correction relation to obtain data to be corrected; obtaining correction information according to the source of the data to be corrected and the correction history of that source, and recording the correction information and its corresponding position coordinate, wherein the correction information comprises at least the correction proportion and the correction type of the source of the data to be corrected; and obtaining identification data to be corrected according to the position coordinate of the abnormal result and the position coordinate corresponding to the correction information, performing anomaly identification on the identification data to be corrected according to the data range and the data conflict relation, and determining whether to display the identification data to be corrected.
  8. The method for automatically processing research and development data based on keyword extraction according to claim 7, wherein obtaining the identification data to be corrected according to the position coordinate of the abnormal result and the position coordinate corresponding to the correction information comprises: if the position coordinate of the abnormal result is the same as the position coordinate of the correction information, retrieving the correction information and correcting the data information in the abnormal result according to the correction information to obtain the identification data to be corrected.
  9. The method for automatically processing research and development data based on keyword extraction according to claim 8, wherein performing anomaly identification on the identification data to be corrected according to the data range and the data conflict relation, and determining whether to display the identification data to be corrected, comprises: outputting the number of identification values to be corrected that are to be generated according to the correction proportion; generating that number of identification values to be corrected according to the correction type and the original data information; and performing anomaly identification on the identification values to be corrected according to the data range and the data conflict relation, filtering out the identification values to be corrected that are identified as abnormal results, and retaining and displaying the identification values to be corrected that are identified as non-abnormal results.
  10. A system for automatically processing research and development data based on keyword extraction, for implementing the method of any one of claims 1 to 9, comprising: a configuration database for storing chemical proper nouns; a target text extraction module for performing an initial extraction on the text information in chemical research and development data using a text extraction algorithm to obtain keywords, and for calling the chemical proper nouns in the configuration database to perform a secondary extraction on the keywords to obtain a target text set; a form construction module for retrieving from the target text set the target text that matches the user processing requirement, together with its position coordinate, constructing a requirement result table according to the target text, and retrieving, according to the position coordinate, the data information corresponding to the target text in the chemical research and development data into the requirement result table; and an anomaly identification module for performing anomaly identification on the data information in the requirement result table according to the data range and the data conflict relation in historical chemical-related data, and marking an abnormal result.
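
The anomaly identification order recited in claims 2 and 3 (range check first, conflict check second) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: only numerical ranges and the first type conflict relation are modeled, the second combination conflict relation is omitted, and the sample history, the `MIN_COOCCURRENCE` threshold, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical historical records: each maps a term type to a numeric value.
HISTORY = [
    {"purity": 99.1, "ph": 6.8},
    {"purity": 98.7, "ph": 7.1},
    {"purity": 99.5, "ph": 7.0},
    {"melting_point": 134.0},
    {"melting_point": 136.5},
]
MIN_COOCCURRENCE = 0.1  # assumed threshold; the source gives no value


def data_ranges(history):
    """Numeric range observed for each term type across the history."""
    ranges = {}
    for record in history:
        for term, value in record.items():
            low, high = ranges.get(term, (value, value))
            ranges[term] = (min(low, value), max(high, value))
    return ranges


def type_conflicts(history, min_share):
    """Pairs of term types that co-occur in too small a share of records."""
    terms = sorted({t for record in history for t in record})
    conflicts = set()
    for a, b in combinations(terms, 2):
        together = sum(1 for r in history if a in r and b in r)
        if together / len(history) < min_share:
            conflicts.add((a, b))
    return conflicts


def find_anomalies(record, ranges, conflicts):
    """Range check first, then the conflict check on entries not yet flagged."""
    flagged = {}
    for term, value in record.items():
        low, high = ranges.get(term, (float("-inf"), float("inf")))
        if not low <= value <= high:
            flagged[term] = "out of range"
    for a, b in conflicts:
        if a in record and b in record:
            for term in (a, b):
                flagged.setdefault(term, "type conflict")
    return flagged


record = {"purity": 97.0, "ph": 7.0, "melting_point": 135.0}
result = find_anomalies(
    record, data_ranges(HISTORY), type_conflicts(HISTORY, MIN_COOCCURRENCE)
)
print(result)
```

In this sample, `purity` falls outside its historical range, while `ph` and `melting_point` are in range but are flagged because `melting_point` never appears together with the other two types in the assumed history.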

Description

Automatic processing method and system for research and development data based on keyword extraction

Technical Field

The application relates to the technical field of automated data processing, and in particular to a method and system for automatically processing research and development data based on keyword extraction.

Background

In chemical research and development experiments, research and development enterprises must regularly submit the drugs they develop to the corresponding authoritative testing laboratories. The laboratories return the experimental test data to the enterprises in the form of reports, and the enterprises' operators must regularly organize this experimental data into formatted data for daily operation and management.

However, chemical research and development data contains many proper nouns, has heterogeneous data dimensions and non-uniform formats, and demands extremely high accuracy. The related art relies on manually screening key information out of massive amounts of data, which is time-consuming and labor-intensive and prone to subjective misjudgment and omitted information, which in turn affects subsequent research and development tests.

Some related techniques extract data and assemble it into a data table automatically, but they only extract and organize the data. Whether the data is abnormal must still be judged manually, and retrieval errors easily introduce data anomalies, so a large amount of time is still needed to screen abnormal data out of the organized table.
The patent titled "Method, device, medium and program product for processing table data" (publication number CN120407645A, publication date August 1, 2025) discloses a method comprising: inputting table data entered by an object through an interactive interface, together with the associated processing request, into a preset model to generate a tool call requirement corresponding to the processing request; sending the tool call requirement to a task processing engine; inputting the tool information returned by the task processing engine into the preset model to obtain a plurality of target tools, the target tools including a reading tool and a processing tool; when it is determined that the task processing engine has finished reading the table data with the reading tool, inputting the reading result and the function information of the processing tool into the preset model to obtain a processing task corresponding to the processing request; sending the processing task to the task processing engine; and, in response to receiving an execution result from the task processing engine, displaying the execution result in the interactive interface. That scheme processes data automatically but cannot identify abnormal data, so it is unsuitable for scenarios that demand high data accuracy, such as chemical research and development data processing.
Disclosure of Invention

Prior-art data processing methods cannot achieve both accuracy and efficiency when integrating large amounts of chemical research and development data. To address this problem, the application provides a method and system for automatically processing research and development data based on keyword extraction. Chemical research and development data is processed and integrated automatically through keyword identification and extraction combined with position coordinate matching. Anomaly identification is performed on the data information using the data ranges and data conflict relations in historical chemical-related data, which points the user to abnormal data, makes that data easier to handle, and improves the accuracy and efficiency of automated data processing.

The technical scheme is as follows: text information in chemical research and development data is initially extracted using a text extraction algorithm to obtain keywords; the keywords are secondarily extracted using chemical proper nouns to obtain a target text set, wherein each target text in the target text set has a corresponding position coordinate; in response to a user processing requirement, the target text in the target text set that matches the requirement is retrieved together with its position coordinate; a requirement result table is constructed according to the target text; the data information corresponding to the target text in the chemical research and development data is retrieved into the requirement result table according to the position coordinate; anomaly identification is performed on the data information in the requirement result table according to data range and data conflict