
US-12620467-B2 - Data extraction and data-structure transformation for improving data accessibility and usage consistency


Abstract

A system and computer-implemented method include querying a data source for treatment plan templates based on a medical condition of a patient. Raw data related to the treatment plan templates is received from the data source. The raw data includes text information. The raw data is parsed, and tags are identified. Elements are extracted from the tags using complex regular expressions and keywords. Formatted data is constructed from the elements in a machine-readable format. The elements are arranged in a hierarchical structure within the formatted data. Structured data having data frames is generated based on the formatted data. The structured data is provided to a medical provider for review. An update to the structured data is received from the medical provider based on a deviation from the schedule provided in the treatment plan templates. Treatment plans are generated by incorporating the update into the structured data.
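
The parsing and extraction steps summarized above can be pictured with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the XML layout, the `section` and `item` tag names, the route keywords, and the regular expression are hypothetical and are not taken from the disclosure or from any real template source.

```python
import json
import re
import xml.etree.ElementTree as ET

# Hypothetical raw template; tag names and text layout are invented for illustration.
RAW_XML = """
<template condition="example-condition">
  <section heading="Cycle 1">
    <item>Drug A 50 mg/m2 IV on Day 1</item>
    <item>Drug B 10 mg PO on Day 2</item>
  </section>
  <section heading="Cycle 2">
    <item>Drug A 40 mg/m2 IV on Day 1</item>
  </section>
</template>
"""

# Assumed keyword set for routes of administration (not from the source).
ROUTE_KEYWORDS = ("IV", "PO", "SC", "IM")

# Assumed pattern: medication, dose, unit, route keyword, and day of the phase.
ITEM_PATTERN = re.compile(
    r"(?P<medication>.+?)\s+"
    r"(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg/m2|mg|g)\s+"
    r"(?P<route>" + "|".join(ROUTE_KEYWORDS) + r")\s+"
    r"on\s+Day\s+(?P<day>\d+)"
)


def extract_elements(raw_xml: str) -> dict:
    """Parse the raw data, walk its tags, and build hierarchical formatted data."""
    root = ET.fromstring(raw_xml)
    formatted = {"condition": root.get("condition"), "phases": []}
    for section in root.iter("section"):
        # Each heading becomes one level of the hierarchy.
        phase = {"heading": section.get("heading"), "items": []}
        for item in section.iter("item"):
            match = ITEM_PATTERN.search(item.text or "")
            if match is None:
                continue  # skip text the pattern does not recognize
            fields = match.groupdict()
            fields["dose"] = float(fields["dose"])
            fields["day"] = int(fields["day"])
            phase["items"].append(fields)
        formatted["phases"].append(phase)
    return formatted


formatted = extract_elements(RAW_XML)
print(json.dumps(formatted, indent=2))
```

The named groups keep each extracted element addressable by field name, which is one way to make the later conversion into per-phase tables straightforward.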

Inventors

  • Aneesha Kakkesal
  • Brianne Glasbrenner
  • Narasimha Prasad
  • Sayon Chakraborti
  • Rohith Shetty
  • Sandeepa Aithal
  • Amit Mudugal Jagadeesh
  • Disha Bundela

Assignees

  • CERNER INNOVATION, INC.

Dates

Publication Date
20260505
Application Date
20240516

Claims (20)

  1. A computer-implemented method, the method comprising: querying at least one data source for one or more treatment plan templates, wherein the at least one data source is queried based on a medical condition of a patient; receiving raw data related to the one or more treatment plan templates from the at least one data source in response to the querying, wherein the raw data includes text information; parsing the raw data and identifying one or more tags from the parsed raw data; extracting elements required for preparing treatment plan from the one or more tags using complex regular expression and a standard set of keywords; constructing formatted data from the extracted elements in a machine-readable format, wherein the elements are arranged in a hierarchical structure within the formatted data; generating structured data having one or more data frames based on the formatted data, wherein each data frame of the one or more data frames includes information related to a particular phase of a treatment schedule proposed in the one or more treatment plan templates; providing the structured data to one or more devices associated with a medical provider for review, wherein the structured data is provided to the medical provider in a natural language format; receiving an update in the structured data from the one or more devices associated with the medical provider, wherein the update is received based on a deviation in existing schedule provided in the one or more treatment plan templates; and generating one or more treatment plans based on incorporation of the update into the structured data, wherein the one or more treatment plans are generated in at least one of the machine-readable format and the natural language format.
  2. The method of claim 1, wherein the formatted data is constructed using one or more headings present in the text information of the raw data.
  3. The method of claim 1, wherein the formatted data is accessible by one or more applications pertaining to a health-care system.
  4. The method of claim 1, wherein the raw data is received from the at least one data source using an Application Programming Interface (API).
  5. The method of claim 1, wherein the raw data is received in form of one or more of an Extensible Markup Language (XML) format, a Hypertext Markup Language (HTML) format, and a Portable Document Format (PDF) format.
  6. The method of claim 1, wherein the machine-readable format is a JavaScript Object Notation (JSON) format.
  7. The method of claim 1, wherein the structured data is provided to the one or more devices associated with the medical provider in a spreadsheet format.
  8. The method of claim 1, wherein the one or more treatment plans are generated in XML format.
  9. The method of claim 1, wherein the complex regular expression and the standard set of keywords are dynamically updated in real time.
  10. The method of claim 1, wherein the one or more tags include a Unique Identification Number (UIN) or Universal Unique Identifier (UUID) that are generated programmatically.
  11. The method of claim 1, wherein the medical condition of the patient is determined based on information obtained from a patient information database.
  12. The method of claim 1, further comprising: identifying one or more errors in the structured data based on the review of the medical provider; updating the structured data based on the one or more errors; and generating the one or more treatment plans based on the updated structured data.
  13. A system comprising: one or more processors; and a memory coupled to the one or more processors, the memory storing a plurality of instructions executable by the one or more processors, the plurality of instructions that when executed by the one or more processors cause the one or more processors to perform a set of operations comprising: querying at least one data source for one or more treatment plan templates, wherein the at least one data source is queried based on a medical condition of a patient; receiving raw data related to the one or more treatment plan templates from the at least one data source in response to the querying, wherein the raw data includes text information; parsing the raw data and identifying one or more tags from the parsed raw data; extracting elements required for preparing treatment plan from the one or more tags using complex regular expression and a standard set of keywords; constructing formatted data from the extracted elements in a machine-readable format, wherein the elements are arranged in a hierarchical structure within the formatted data; generating structured data having one or more data frames based on the formatted data, wherein each data frame of the one or more data frames includes information related to a particular phase of a treatment schedule proposed in the one or more treatment plan templates; providing the structured data to one or more devices associated with a medical provider for review, wherein the structured data is provided to the medical provider in a natural language format; receiving an update in the structured data from the one or more devices associated with the medical provider, wherein the update is received based on a deviation in existing schedule provided in the one or more treatment plan templates; and generating one or more treatment plans based on incorporation of the update into the structured data, wherein the one or more treatment plans are generated in at least one of the machine-readable format and the natural language format.
  14. The system of claim 13, wherein the raw data is received from the at least one data source using an Application Programming Interface (API).
  15. The system of claim 13, wherein the raw data is received in form of one or more of an Extensible Markup Language (XML) format, a Hypertext Markup Language (HTML) format, and a Portable Document Format (PDF) format.
  16. The system of claim 13, wherein the machine-readable format is a JavaScript Object Notation (JSON) format.
  17. The system of claim 13, wherein the structured data is provided to the one or more devices associated with the medical provider in a spreadsheet format.
  18. The system of claim 13, wherein the one or more treatment plans are generated in XML format.
  19. The system of claim 13, wherein the one or more tags include a Unique Identification Number (UIN) or Universal Unique Identifier (UUID) that are generated programmatically.
  20. A non-transitory computer-readable medium storing a plurality of instructions executable by one or more processors that cause the one or more processors to perform operations comprising: querying at least one data source for one or more treatment plan templates, wherein the at least one data source is queried based on a medical condition of a patient; receiving raw data related to the one or more treatment plan templates from the at least one data source in response to the querying, wherein the raw data includes text information; parsing the raw data and identifying one or more tags from the parsed raw data; extracting elements from the one or more tags using complex regular expression and a standard set of keywords; constructing formatted data from the extracted elements in a machine-readable format, wherein the elements are arranged in a hierarchical structure within the formatted data; generating structured data having one or more data frames based on the formatted data, wherein each data frame of the one or more data frames includes information related to a particular phase of a treatment schedule proposed in the one or more treatment plan templates; providing the structured data to one or more devices associated with a medical provider for review, wherein the structured data is provided to the medical provider in a natural language format; receiving an update in the structured data from the one or more devices associated with the medical provider, wherein the update is received based on a deviation in existing schedule provided in the one or more treatment plan templates; and generating one or more treatment plans based on incorporation of the update into the structured data, wherein the one or more treatment plans are generated in at least one of the machine-readable format and the natural language format.
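
The review-and-update steps recited in the claims above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the update shape (`phase`, `row`, `changes`), the XML element names, and the sample values are hypothetical, and the programmatically generated identifier uses Python's standard `uuid` module.

```python
import copy
import uuid
import xml.etree.ElementTree as ET

# Hypothetical structured data: one table (list of rows) per treatment phase.
STRUCTURED = {
    "Cycle 1": [
        {"medication": "Drug A", "dose": 50.0, "unit": "mg/m2", "route": "IV", "day": 1},
    ],
}

# Hypothetical provider update reflecting a schedule deviation: day 1 moves to day 3.
UPDATE = {"phase": "Cycle 1", "row": 0, "changes": {"day": 3}}


def apply_update(structured: dict, update: dict) -> dict:
    """Return a copy of the structured data with the provider's changes applied."""
    updated = copy.deepcopy(structured)  # leave the reviewed original untouched
    updated[update["phase"]][update["row"]].update(update["changes"])
    return updated


def to_plan_xml(structured: dict) -> str:
    """Serialize the structured data as an XML treatment plan (assumed element names)."""
    plan = ET.Element("treatmentPlan", id=str(uuid.uuid4()))
    for heading, rows in structured.items():
        phase = ET.SubElement(plan, "phase", heading=heading)
        for row in rows:
            # XML attribute values must be strings.
            ET.SubElement(phase, "order", {key: str(value) for key, value in row.items()})
    return ET.tostring(plan, encoding="unicode")


updated = apply_update(STRUCTURED, UPDATE)
plan_xml = to_plan_xml(updated)
print(plan_xml)
```

Copying the structured data before applying the update keeps the originally reviewed version available for comparison, which is a design choice of this sketch rather than something the claims require.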

Description

FIELD

The present disclosure relates generally to select extraction and transformation of data embedded within files. More particularly, the present disclosure relates to systems and methods that detect select data content and that transform the structure of the detected data to improve accessibility and usage consistency of the select data.

BACKGROUND

An oncology treatment plan is a comprehensive and individualized strategy designed by a team of healthcare professionals to guide the care of a cancer patient. It outlines the specific treatments, therapies, and interventions that will be used to manage the patient's cancer. This plan considers factors such as the type and stage of cancer, the patient's overall health, and their treatment goals. The importance of an oncology treatment plan cannot be overstated, as it serves as a roadmap for both the medical team and the patient, ensuring that effective and appropriate treatments are administered, while also helping to manage side effects and monitor progress throughout the cancer journey. It plays an important role in improving the patient's chances of successful treatment and recovery.

Currently, the process of creating the oncology treatment plan involves a manual approach of gathering data from a data source and feeding relevant data into a health information system. For example, a user device receives an input from an analyst and translates the input into a query to a data source, such as a National Comprehensive Cancer Network® (NCCN®) server. The data source provides templates associated with the treatment plan to the user device. The analyst may download the templates. The templates obtained from the data source may be present in different data formats, such as an Extensible Markup Language (XML) format, a Hypertext Markup Language (HTML) format, and a Portable Document Format (PDF) format. Further, the data obtained from the research institute is large in quantity.
After downloading, the analyst manually navigates through the large quantity of data to extract relevant data related to the treatment plan, such as medications, cycle definitions, dosing, frequency, routes of administration, premedication, and notes. Further, the analyst feeds the relevant data into the health information system to create a final treatment plan. To extract the relevant data, the analyst has to traverse multiple flowcharts and a large amount of data. Manual data entry and navigation through the large quantity of data are tedious and prone to errors.

BRIEF SUMMARY

In an embodiment, a computer-implemented method includes querying at least one data source for one or more treatment plan templates. The at least one data source is queried based on a medical condition of a patient. Raw data related to the one or more treatment plan templates is received from the at least one data source in response to the querying. The raw data includes text information. The raw data is parsed, and one or more tags are identified. Elements required for preparing a treatment plan are extracted from the one or more tags using complex regular expressions and a standard set of keywords. Formatted data is constructed from the extracted elements in a machine-readable format. The elements are arranged in a hierarchical structure within the formatted data. Structured data having one or more data frames is generated based on the formatted data. Each data frame of the one or more data frames includes information related to a particular phase of a treatment schedule proposed in the one or more treatment plan templates. The structured data is provided to one or more devices associated with a medical provider for review in a natural language format. An update in the structured data is received from the one or more devices associated with the medical provider based on a deviation in an existing schedule provided in the one or more treatment plan templates.
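
As a rough illustration of the data-frame step described above, the following Python sketch turns hierarchical formatted data into one table per treatment phase and renders each table as CSV, one possible spreadsheet-style form for review. The text does not name a data-frame library at this point, so plain lists of row dictionaries and the standard `csv` module stand in here; the field names and sample values are hypothetical.

```python
import csv
import io

# Hypothetical formatted data in the hierarchical shape assumed for this sketch.
FORMATTED = {
    "phases": [
        {"heading": "Cycle 1", "items": [
            {"medication": "Drug A", "dose": 50.0, "unit": "mg/m2", "route": "IV", "day": 1},
            {"medication": "Drug B", "dose": 10.0, "unit": "mg", "route": "PO", "day": 2},
        ]},
        {"heading": "Cycle 2", "items": [
            {"medication": "Drug A", "dose": 40.0, "unit": "mg/m2", "route": "IV", "day": 1},
        ]},
    ]
}

# Assumed column order for every per-phase table.
COLUMNS = ["medication", "dose", "unit", "route", "day"]


def build_frames(formatted: dict) -> dict:
    """Return one table (list of row dicts) per treatment phase, keyed by heading."""
    frames = {}
    for phase in formatted["phases"]:
        frames[phase["heading"]] = [
            {column: item.get(column) for column in COLUMNS}
            for item in phase["items"]
        ]
    return frames


def frame_to_csv(rows: list) -> str:
    """Render one phase table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


frames = build_frames(FORMATTED)
for heading, rows in frames.items():
    print(heading)
    print(frame_to_csv(rows))
```

Keeping one table per phase mirrors the statement that each data frame covers a particular phase of the treatment schedule; a tabular library could be substituted for the row lists without changing the overall flow.
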
The one or more treatment plans are generated based on incorporation of the update into the structured data. The one or more treatment plans are generated in at least one of the machine-readable format and the natural language format. The formatted data is constructed using one or more headings present in the text information of the raw data. The formatted data is accessible by one or more applications pertaining to a health-care system. The raw data is received from the at least one data source using an Application Programming Interface (API). The raw data is received in the form of one or more of an Extensible Markup Language (XML) format, a Hypertext Markup Language (HTML) format, and a Portable Document Format (PDF) format. The machine-readable format is a JavaScript Object Notation (JSON) format. The structured data is provided to the one or more devices associated with the medical provider in a spreadsheet format. The one or more treatment plans are generated in XML format. The complex regular expression and the standard set of keywords are dynamicall