IN-202547033103-A - ARTIFICIAL INTELLIGENCE-ENABLED SYSTEM AND METHOD FOR AUTHORING A SCIENTIFIC DOCUMENT
Abstract
A system and a method for automatically authoring a scientific document using a machine learning model and natural language processing (NLP) with minimal user intervention are provided. The system configures a scientific document template including multiple sections based on scientific document requirements. The system maps the sections in the scientific document template with content from the source documents by executing a section mapping algorithm and automatically generates the scientific document. The mapping includes matching the sections of the scientific document template with sections extracted from the source documents, and predicting appropriate sections in the scientific document template for rendering the content from the source documents based on the matching using the machine learning model and historical scientific document information. The system executes one or more content editing functions, for example, tense conversion, additional information fetch and display, post-text to in-text conversion, etc., on the scientific document using NLP.
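The matching step of the section mapping described above can be sketched as follows. This is a minimal illustration, not the patented algorithm: it uses plain string similarity from Python's standard `difflib` module as a stand-in for the machine learning prediction, which the abstract does not specify. The function names (`normalize`, `match_sections`, `render`), the `threshold` value, the heading-to-text dictionary shape, and the placeholder text are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase a heading and collapse whitespace so comparisons are stable."""
    return " ".join(title.lower().split())


def match_sections(template_sections, source_sections, threshold=0.6):
    """Map each template section to its most similar source heading, or None.

    source_sections is a dict of heading -> extracted text (hypothetical shape).
    A section whose best similarity falls below the threshold stays unmapped.
    """
    mapping = {}
    for target in template_sections:
        best_heading, best_score = None, 0.0
        for heading in source_sections:
            score = SequenceMatcher(
                None, normalize(target), normalize(heading)
            ).ratio()
            if score > best_score:
                best_heading, best_score = heading, score
        mapping[target] = best_heading if best_score >= threshold else None
    return mapping


def render(template_sections, source_sections, mapping):
    """Fill mapped sections with source text and flag the rest for the author."""
    return {
        section: source_sections[mapping[section]]
        if mapping[section]
        else "[NEEDS AUTHOR INPUT]"
        for section in template_sections
    }


template = ["Study Objectives", "Statistical Methods", "Appendices"]
source = {
    "Study objectives": "The primary objective was ...",
    "Statistical methods and analysis": "Analyses were performed ...",
}
mapping = match_sections(template, source)
document = render(template, source, mapping)
```

In this sketch, unmapped sections are flagged rather than guessed, which mirrors the abstract's notion of highlighting content that needs user attention. A trained model informed by historical documents would replace the similarity score in a real system.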
Inventors
- RAMANUJAM ILANGO
Dates
- Publication Date
- 20250425
- Application Date
- 20250403
- Priority Date
- 20220908
Claims (20)
I claim:
- 1. A system for automatically authoring a scientific document using a machine learning model and natural language processing with minimal user intervention, the system comprising: at least one processor; a non-transitory, computer-readable storage medium operably and communicatively coupled to the at least one processor and configured to store computer program instructions executable by the at least one processor; and an automated authoring engine defining the computer program instructions, which when executed by the at least one processor, cause the at least one processor to: configure a scientific document template comprising a plurality of sections based on scientific document requirements, wherein one or more of the plurality of sections are configured as feedback to retrain the machine learning model; receive and store a plurality of source documents in a source database; automatically extract and pre-process content from the plurality of source documents using natural language processing; map the sections configured in the scientific document template with the content from the plurality of source documents by executing a section mapping algorithm, wherein the mapping comprises: matching the sections of the scientific document template with sections extracted from the plurality of source documents; and predicting appropriate sections from among the plurality of sections in the scientific document template for rendering the content from the plurality of source documents based on the matching using the machine learning model and historical scientific document information acquired from users; automatically generate the scientific document by rendering the content from the plurality of source documents into the predicted sections of the scientific document template; and execute one or more of a plurality of content editing functions on the automatically generated scientific document using natural language processing.
- 2. The system of claim 1, wherein the plurality of sections of the scientific document template comprises fixed sections and user-configurable sub-sections.
- 3. The system of claim 1, wherein the plurality of content editing functions comprises: automatically converting tenses of the content in the automatically generated scientific document based on user preferences by executing a natural language generation algorithm; highlighting data fields in the automatically generated scientific document that require attention and editing from a user; and executing post-text to in-text conversion.
- 4. The system of claim 1, wherein one or more of the computer program instructions defined by the automated authoring engine, when executed by the at least one processor, cause the at least one processor to interpret in-text tables from the plurality of source documents and generate an in-text table summary by executing a natural language understanding algorithm.
- 5. The system of claim 1, wherein one or more of the computer program instructions defined by the automated authoring engine, when executed by the at least one processor, cause the at least one processor to fetch and display, in response to a user input, additional information from the plurality of source documents for selection and rendering into one or more of the plurality of sections in the scientific document template, wherein the user input is configured as additional feedback to retrain the machine learning model.
- 6. The system of claim 1, wherein one or more of the computer program instructions defined by the automated authoring engine, when executed by the at least one processor, cause the at least one processor to provide selective access of one of: an entirety of the automatically generated scientific document and one or more sections of the automatically generated scientific document, to one or more co-authors of the automatically generated scientific document for performing one or more actions on the automatically generated scientific document.
- 7. The system of claim 1, wherein one or more of the computer program instructions defined by the automated authoring engine, when executed by the at least one processor, cause the at least one processor to generate and render a preview of the automatically generated scientific document on a preview screen of a user interface for subsequent editing and automatic regeneration of the scientific document.
- 8. The system of claim 1, wherein one or more of the computer program instructions defined by the automated authoring engine, when executed by the at least one processor, cause the at least one processor to generate and render one or more of a plurality of reports comprising: a traceability report configured to display the mapping of the sections with the source documents containing the rendered content; an audit report configured to record and display actions performed on the automatically generated scientific document; and a version history report configured to display versions of the automatically generated scientific document.
- 9. The system of claim 1, wherein the scientific document is a clinical study report, and wherein the scientific document requirements based on which the scientific document template is configured comprise regulatory authority guidelines, and wherein the regulatory authority guidelines comprise the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) E3 guidelines defined by the ICH.
- 10. The system of claim 1, wherein the plurality of source documents comprises a protocol document, a statistical analysis plan document, a case report form, safety narratives, in-text tables, post-text tables, summary reports, and tables, listings, and figures.
- 11. A method employing an automated authoring engine defining computer program instructions executable by at least one processor for automatically authoring a scientific document using a machine learning model and natural language processing with minimal user intervention, the method comprising: configuring a scientific document template comprising a plurality of sections based on scientific document requirements, wherein one or more of the plurality of sections are configured as feedback to retrain the machine learning model; receiving and storing a plurality of source documents in a source database; automatically extracting and pre-processing content from the plurality of source documents using natural language processing; mapping the sections configured in the scientific document template with the content from the plurality of source documents by executing a section mapping algorithm, wherein the mapping comprises: matching the sections of the scientific document template with sections extracted from the plurality of source documents; and predicting appropriate sections from among the plurality of sections in the scientific document template for rendering the content from the plurality of source documents based on the matching using the machine learning model and historical scientific document information acquired from users; automatically generating the scientific document by rendering the content from the plurality of source documents into the predicted sections of the scientific document template; and executing one or more of a plurality of content editing functions on the automatically generated scientific document using natural language processing.
- 12. The method of claim 11, wherein the plurality of sections of the scientific document template comprises fixed sections and user-configurable sub-sections.
- 13. The method of claim 11, wherein the plurality of content editing functions comprises: automatically converting tenses of the content in the automatically generated scientific document based on user preferences by executing a natural language generation algorithm; fetching and displaying, in response to a user input, additional information from the plurality of source documents for selection and rendering into one or more of the plurality of sections in the scientific document template, wherein the user input is configured as additional feedback to retrain the machine learning model; highlighting data fields in the automatically generated scientific document that require attention and editing from a user; and executing post-text to in-text conversion.
- 14. The method of claim 11, further comprising interpreting in-text tables from the plurality of source documents and generating an in-text table summary by executing a natural language understanding algorithm.
- 15. The method of claim 11, further comprising providing selective access of one of: an entirety of the automatically generated scientific document and one or more sections of the automatically generated scientific document, to one or more co-authors of the automatically generated scientific document for performing one or more actions on the automatically generated scientific document.
- 16. The method of claim 11, further comprising generating and rendering a preview of the automatically generated scientific document on a preview screen of a user interface for subsequent editing and automatic regeneration of the scientific document.
- 17. The method of claim 11, further comprising generating and rendering one or more of a plurality of reports comprising: a traceability report configured to display the mapping of the sections with the source documents containing the rendered content; an audit report configured to record and display actions performed on the automatically generated scientific document; and a version history report configured to display versions of the automatically generated scientific document.
- 18. A non-transitory, computer-readable storage medium having embodied thereon, computer program instructions executable by at least one processor for automatically authoring a scientific document using a machine learning model and natural language processing with minimal user intervention, the computer program instructions when executed by the at least one processor cause the at least one processor to: configure a scientific document template comprising a plurality of sections based on scientific document requirements, wherein one or more of the plurality of sections are configured as feedback to retrain the machine learning model; receive and store a plurality of source documents in a source database; automatically extract and pre-process content from the plurality of source documents using natural language processing; map the sections configured in the scientific document template with the content from the plurality of source documents by executing a section mapping algorithm, wherein the mapping comprises: matching the sections of the scientific document template with sections extracted from the plurality of source documents; and predicting appropriate sections from among the plurality of sections in the scientific document template for rendering the content from the plurality of source documents based on the matching using the machine learning model and historical scientific document information acquired from users; automatically generate the scientific document by rendering the content from the plurality of source documents into the predicted sections of the scientific document template; and execute one or more of a plurality of content editing functions on the automatically generated scientific document using natural language processing.
- 19. The non-transitory, computer-readable storage medium of claim 18, wherein the plurality of content editing functions comprises: automatically converting tenses of the content in the automatically generated scientific document based on user preferences by executing a natural language generation algorithm; fetching and displaying, in response to a user input, additional information from the plurality of source documents for selection and rendering into one or more of the plurality of sections in the scientific document template, wherein the user input is configured as additional feedback to retrain the machine learning model; highlighting data fields in the automatically generated scientific document that require attention and editing from a user; and executing post-text to in-text conversion.
- 20. The non-transitory, computer-readable storage medium of claim 18, wherein one or more of the computer program instructions when executed by the at least one processor further cause the at least one processor to interpret in-text tables from the plurality of source documents and generate an in-text table summary by executing a natural language understanding algorithm.
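The traceability, audit, and version history reports recited in claims 8 and 17 can be pictured with a small bookkeeping structure. The sketch below is illustrative only and is not taken from the specification: the class name `AuthoredDocument`, its fields, and its methods are hypothetical, and it assumes that each rendered section comes from a single named source document.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuthoredDocument:
    """Hypothetical container tracking provenance, actions, and versions."""

    sections: dict = field(default_factory=dict)   # section -> rendered text
    trace: dict = field(default_factory=dict)      # section -> source document
    audit_log: list = field(default_factory=list)  # (timestamp, user, action)
    versions: list = field(default_factory=list)   # snapshots of sections

    def render(self, section, text, source_doc, user):
        """Render content into a section and record where it came from."""
        self.sections[section] = text
        self.trace[section] = source_doc
        self._log(user, f"rendered '{section}' from {source_doc}")

    def save_version(self, user):
        """Snapshot the current sections and return the 1-based version number."""
        self.versions.append(dict(self.sections))
        self._log(user, f"saved version {len(self.versions)}")
        return len(self.versions)

    def traceability_report(self):
        """Return (section, source document) pairs sorted by section name."""
        return sorted(self.trace.items())

    def _log(self, user, action):
        """Append a timestamped entry to the audit log."""
        self.audit_log.append((datetime.now(timezone.utc), user, action))


doc = AuthoredDocument()
doc.render("Study Objectives", "The primary objective was ...", "protocol.docx", "author1")
first_version = doc.save_version("author1")
doc.render("Study Objectives", "Revised objective text ...", "protocol.docx", "author2")
```

Each snapshot is a copy of the section dictionary, so a later edit does not alter an earlier version, and every action appends to the audit log in order.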
Description
COMPLETE SPECIFICATION
**1. TITLE OF THE INVENTION:** Artificial Intelligence-Enabled System And Method For Authoring A Scientific Document
**2. APPLICANT(S)**
(a) NAME: Symbiance Inc.
(b) NATIONALITY: USA
(c) ADDRESS: 500 College Rd E, Suite 415, Princeton, NJ 08540 USA
# **3. PREAMBLE TO THE DESCRIPTION**
# **PROVISIONAL**
The following specification describes the invention.
# **COMPLETE**
The following specification particularly describes the invention and the manner in which it is to be performed.
# **4. DESCRIPTION**
# CROSS-REFERENCE TO RELATED APPLICATIONS
**[0001]** This application is a national phase application of PCT international application no. PC/US22/46344, filed in the United States Patent and Trademark Office on May 11, 2022, which claims priority to and the benefit of the non-provisional patent application titled "Artificial Intelligence-Enabled System And Method For Authoring A Scientific Document", application number 17/940,019, filed in the United States Patent and Trademark Office on September 08, 2022. The specification of the above referenced patent application is incorporated herein by reference in its entirety.
# BACKGROUND
**[0002]** Scientific documents such as clinical study reports (CSRs) are lengthy, manually written or typed documents that describe clinical trial methods and results. These scientific documents are comprehensive documents comprising a substantial amount of information collected from multiple source documents such as a protocol, a statistical analysis plan, a case report form, safety narratives, in-text tables, post-text tables, and tables, listings, and figures (TLFs). For example, the CSR is similar to a peer-reviewed manuscript comprising an introduction, a background, summary sections, appendices, experimental methods, descriptions of study subjects, efficacy results, safety results, conclusions, etc.
The CSR describes endpoints of a clinical study or outcomes being researched, provides detailed information on how data was collected and analyzed, and confirms whether the endpoints were met or outcomes were achieved. The CSR helps regulatory agencies determine whether a potential new medication is safe and effective.
**[0003]** Authoring scientific documents such as clinical study reports (CSRs) is time-consuming and requires substantial manual effort. Writers, for example, medical writers, typically spend days, weeks, and even months preparing CSRs. The writers typically copy and paste content from other sources into relevant sections of a CSR template and spend a substantial amount of time writing safety narratives and interpretations of study results from the tables, listings, and figures (TLFs). Moreover, editing or correcting these lengthy scientific documents, identifying and incorporating missing information therewithin, implementing efficient co-authoring, correcting grammar, and maintaining consistency of language and grammar throughout these scientific documents, while adhering to guidelines defined by regulatory authorities, are substantially difficult, time-consuming, and subject to several errors, thereby affecting the quality of these scientific documents.
Furthermore, incorporating and interpreting objects such as tables, listings, figures, etc., in these scientific documents add to the extensive manual efforts that need to be taken by writers.
**[0004]** Hence, there is a long-felt need for an artificial intelligence (AI)-enabled system and method for automatically authoring a scientific document, for example, a clinical study report, using a machine learning model and natural language processing with minimal user intervention, while addressing the above-recited problems associated with the related art.
# SUMMARY OF THE INVENTION
**[0005]** This summary is provided to introduce a selection of concepts in a simplified form that are further disclosed in the detailed description of the invention. This summary is not intended to determine the scope of the claimed subject matter.
**[0006]** The artificial intelligence (AI)-enabled system and method disclosed herein address the above-recited need for automatically authoring a scientific document, for example, a clinical study report (CSR), using a machine learning (ML) model and natural language processing (NLP) with minimal user intervention. The AI-enabled system uses AI techniques to extract content from source documents and automatically author or write the scientific document. The AI-enabled system reads from unstructured source documents and summarizes the content into another document, that is, the automatically generated scientific document. The AI-enabled system reduces manual efforts and time consumed in preparing CSRs and other scientific documents substantially, thereby allowing users to focus more on discussion points and interpretations. The AI-enabled system accelerates authoring of scientific documents using ML and NLP comprising natural language generation (NLG) and natu