CN-122021948-A - Method and system for designing a quantitative research scheme based on a large language model
Abstract
The invention discloses a method and a system for designing a quantitative research scheme based on a large language model, addressing the challenges and needs that novice HCI researchers face across the whole process of designing such a scheme. The large language model parses the information entered by the user, search conditions are constructed from the parsed result and submitted to an academic index API, and an original text injection method is combined with the search to avoid the semantic deviation introduced by vectorization methods. The generated scheme therefore has a complete evidence chain and is verifiable, which solves the lack of evidence traceability in existing methods. By establishing a reference pool, the method makes generation of the quantitative research scheme, or of individual modules within it, preferentially use confirmed evidence, thereby improving the reliability of the scheme.
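As an illustration of the original text injection idea, the following minimal Python sketch splits a document's full text into chunks and places each chunk verbatim into an extraction prompt, rather than retrieving passages through embeddings. The names `Chunk`, `split_into_chunks`, `build_extraction_prompt`, and `EXTRACTION_FORMAT`, the paragraph-based splitting rule, and the `max_chars` limit are all assumptions made for this example; the source does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    index: int
    text: str  # verbatim source text; it is never paraphrased or embedded


def split_into_chunks(doc_id: str, full_text: str, max_chars: int = 1200) -> list[Chunk]:
    """Group paragraphs into chunks of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[Chunk] = []
    buffer = ""
    for paragraph in full_text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if buffer and len(buffer) + len(paragraph) + 2 > max_chars:
            # The buffer is full: emit it and start a new chunk.
            chunks.append(Chunk(doc_id, len(chunks), buffer))
            buffer = paragraph
        else:
            buffer = f"{buffer}\n\n{paragraph}" if buffer else paragraph
    if buffer:
        chunks.append(Chunk(doc_id, len(chunks), buffer))
    return chunks


# Hypothetical extraction format; the source only says "a set output format".
EXTRACTION_FORMAT = "hypotheses; experimental design; measures; analysis methods"


def build_extraction_prompt(chunk: Chunk) -> str:
    """Inject the chunk's original text directly into the prompt."""
    return (
        "Extract the research scheme described in the passage below.\n"
        f"Report these fields: {EXTRACTION_FORMAT}.\n"
        f"Source: {chunk.doc_id}, chunk {chunk.index}\n"
        f"<<<\n{chunk.text}\n>>>"
    )
```

Each prompt would be sent to the language model separately, and the per-chunk results merged into one research scheme per document. Because the prompt carries the source identifier and the unmodified passage, every extracted item can be traced back to the text it came from.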
Inventors
- LIU QI
- JIN YIKE
- ZHOU YIXIANG
- PAN JIAMAN
- LI ZEJIAN
Assignees
- Zhejiang University (浙江大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260415
Claims (9)
- 1. A method for designing a quantitative research scheme based on a large language model, comprising: analyzing a research background and objective input by a user through a large language model; constructing a Boolean search condition based on the parsed elements; retrieving a plurality of documents through an academic index API based on the Boolean search condition; extracting a corresponding research scheme from the full-text fragments of each retrieved document through an original text injection method; adding the documents selected by the user from the plurality of documents to a reference pool; adding the reference pool and the research schemes to a model context in a set output format; and splicing the research background and objective input by the user, the model context, and the output format of the quantitative research scheme into a first prompt, and generating the quantitative research scheme through the large language model based on the first prompt.
- 2. The method for designing a quantitative research scheme based on a large language model according to claim 1, wherein the generated quantitative research scheme, the research background and objective input by the user, the model context, and a research scheme evaluation template are spliced into a second prompt, and the generated quantitative research scheme is evaluated through the large language model based on the second prompt to obtain an evaluation result and modification suggestions.
- 3. The method for designing a quantitative research scheme based on a large language model according to claim 2, wherein a third prompt is constructed from the generated quantitative research scheme or from different modules within it (the modules including research hypotheses, experimental design, data analysis, and expected results), together with the evaluation result, the modification suggestions, and user-given improvement goals; an optimized quantitative research scheme or an optimized module is generated through the large language model based on the third prompt; and the modified content is marked.
- 4. The method for designing a quantitative research scheme based on a large language model according to claim 3, wherein the generation of the quantitative research scheme, the evaluation of the quantitative research scheme, and the optimization of the quantitative research scheme or of modules within it can be performed simultaneously in a plurality of threads, each thread having a separate reference pool and generation history.
- 5. The method for designing a quantitative research scheme based on a large language model according to claim 1, wherein the research background and objective input by the user are analyzed through the large language model, the parsed elements are structured query elements, and the information input by the user comprises keywords, research interests, target variables, and method preferences.
- 6. The method for designing a quantitative research scheme based on a large language model according to claim 1, wherein retrieving the plurality of documents through the academic index API based on the Boolean search condition comprises: matching against the titles, abstracts, keywords, and journals of the documents; filtering by publication year, citation count, and field; and selecting the documents that satisfy accessibility requirements.
- 7. The method for designing a quantitative research scheme based on a large language model according to claim 1, wherein adding the full-text fragments of each retrieved document to the model context in the set output format through the original text injection method comprises: calling CORE or another full-text library to obtain full-text fragments of the screened documents; splitting the full-text fragments into chunks; extracting a research scheme from each chunk in the set output format; and integrating the research schemes of the chunks to obtain the research scheme of each screened document.
- 8. The method for designing a quantitative research scheme based on a large language model according to claim 1, wherein adding the documents selected by the user from the plurality of documents to the reference pool comprises: constructing a structured list entry for each screened document based on its metadata, abstract, and full text; displaying the structured list of the screened documents to the user; receiving a selection instruction from the user; and adding the structured list entries corresponding to the documents selected by the user to the reference pool.
- 9. A system for designing a quantitative research scheme based on a large language model, comprising a memory and one or more processors, the memory having executable code stored therein, wherein the one or more processors, when executing the executable code, implement the method for designing a quantitative research scheme based on a large language model of any one of claims 1-8.
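The query construction and filtering steps recited in claims 1, 5, and 6 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the claimed implementation: the `QueryElements` and `Document` types, the rule of OR-ing terms within an element and AND-ing across elements, and the `open_access` flag standing in for "accessibility" are all hypothetical. Real academic index APIs each define their own query syntax, so the string built here would need adapting to the chosen service.

```python
from dataclasses import dataclass, field


@dataclass
class QueryElements:
    """Structured query elements parsed from the user's input (claim 5)."""
    keywords: list[str]
    research_interests: list[str] = field(default_factory=list)
    target_variables: list[str] = field(default_factory=list)
    method_preferences: list[str] = field(default_factory=list)


def _or_group(terms: list[str]) -> str:
    # Quote each term so multi-word phrases are matched as phrases.
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_boolean_query(elements: QueryElements) -> str:
    """Assumed rule: OR within an element, AND across non-empty elements."""
    groups = [
        elements.keywords,
        elements.research_interests,
        elements.target_variables,
        elements.method_preferences,
    ]
    return " AND ".join(_or_group(g) for g in groups if g)


@dataclass
class Document:
    title: str
    year: int
    citation_count: int
    field_of_study: str
    open_access: bool  # hypothetical stand-in for the accessibility check


def filter_documents(
    docs: list[Document], min_year: int, min_citations: int, fields: set[str]
) -> list[Document]:
    """Keep documents passing the year, citation, field, and access filters."""
    return [
        d for d in docs
        if d.year >= min_year
        and d.citation_count >= min_citations
        and d.field_of_study in fields
        and d.open_access
    ]
```

For example, keywords `["haptic feedback", "vibrotactile"]` with target variable `["task completion time"]` yield `("haptic feedback" OR "vibrotactile") AND ("task completion time")`. Filtering on accessibility before the full-text step avoids selecting documents whose text cannot be fetched for injection.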
Description
Method and system for designing a quantitative research scheme based on a large language model
Technical Field
The invention belongs to the field of artificial intelligence, and in particular relates to a method and a system for designing a quantitative research scheme based on a large language model.
Background
Quantitative research is a core method for ensuring the transparency and reproducibility of research in human-computer interaction. It covers hypothesis formulation, experimental design, sample size estimation, statistical testing, and presentation of results, and places high demands on standardization and logical rigor. A high-quality research scheme not only promotes academic output but also plays a key role in research training and in guidance by supervisors. However, novice researchers, lacking domain knowledge and systematic methodological training, often do not know how to carry out a quantitative study and find it difficult to discover potential problems in a scheme in time. Traditional methods that rely on literature retrieval and manual writing are of limited efficiency. A general-purpose large language model can generate text but suffers from outdated knowledge, semantic drift, and insufficient academic traceability, so it struggles to meet the professional requirements of quantitative research. Existing aids also often lack structured output oriented to quantitative research and systematic evaluation mechanisms.
By combining the generation capability of a large language model with the up-to-date content of academic databases, a workflow system integrating retrieval, generation, and evaluation can be established. Engineered prompt templates and a context management mechanism help ensure that the output meets academic norms and that sources are explicitly cited in the research scheme, improving the reliability and traceability of the results.
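To make the prompt template and context management idea concrete, the following Python sketch assembles a prompt from the user's background and objective, a pool of confirmed references, and a required output format. It is a minimal illustration under assumptions: the `Reference` type, the section headings, the citation-key convention, and the wording of `OUTPUT_FORMAT` are hypothetical. Only the four module names (research hypotheses, experimental design, data analysis, expected results) come from the claims.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    ref_id: str          # short citation key shown to the model, e.g. "R1"
    citation: str        # bibliographic string taken from index metadata
    scheme_summary: str  # research scheme extracted from the document's full text


# Hypothetical layout; the module names follow claim 3 of this document.
OUTPUT_FORMAT = (
    "1. Research hypotheses\n"
    "2. Experimental design\n"
    "3. Data analysis\n"
    "4. Expected results\n"
    "Support each design decision with a citation key from the references."
)


def build_model_context(pool: list[Reference]) -> str:
    """Render the reference pool so each claim can be traced to a source key."""
    if not pool:
        return "(no confirmed references)"
    return "\n".join(
        f"[{ref.ref_id}] {ref.citation}\n    Extracted scheme: {ref.scheme_summary}"
        for ref in pool
    )


def build_first_prompt(background: str, objective: str, pool: list[Reference]) -> str:
    """Splice user input, model context, and output format into one prompt."""
    sections = [
        ("Research background", background),
        ("Research objective", objective),
        ("Confirmed references", build_model_context(pool)),
        ("Required output format", OUTPUT_FORMAT),
    ]
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)
```

Keeping the reference pool as its own labeled section, with a stable key per entry, is what lets the generated scheme cite sources explicitly and lets a reader check each citation against the pool.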
U.S. patent publication No. US 2025/0209273 A1 discloses a method for conceiving a research scheme based on a large language model, characterized in that: (1) a dual-agent architecture is adopted, in which one agent raises or clarifies research questions and the other generates the corresponding research methods and scheme drafts; (2) a dual-repository retrieval mechanism is adopted, comprising recall from a global library and fine screening from a user library; and (3) an iterative process of dynamic verification and method synthesis is adopted, implemented as a multi-step iteration in which users can interact with the system at key nodes to progressively improve the scheme. The method also defines a four-step usage flow for the system and specifies the steps of scheme generation: decomposing a core research question into sub-questions, retrieving related information for each sub-question, and generating candidate schemes by integrating the retrieval results. However, that solution focuses on proposing research questions, verifying research motivation, and synthesizing preliminary schemes. Quantitative research stages are not incorporated into its functional system, and it lacks structured process design and technical support for quantitative research scenarios (such as experimental schemes, measurement metrics, and data analysis methods).
The invention patent application with publication No. CN 116991977A discloses a method for accurate retrieval of domain vector knowledge based on a large language model. Candidate documents are split into sub-texts, each sub-text is vectorized, and a file vector database is built; key information is also structured and stored as metadata. At retrieval time, matching is first performed on the structured conditions, followed by a second, vectorized match, which improves retrieval accuracy and efficiency for unstructured text. That application belongs to the technical field of text data processing, aims to solve the low accuracy and efficiency of existing unstructured text retrieval, provides a corresponding method and device, and is suitable for efficient and accurate retrieval over various unstructured text resources (such as PDF, Word, and XML files). However, this approach focuses mainly on the retrieval module itself and has not been applied or verified in academic research scenarios. Notably, the retrieval and analysis of academic papers are essential to researchers' scheme design: they deepen understanding of the relevant field and provide strong support for selecting and justifying a methodology.
Disclosure