WO-2026094421-A1 - RELEVANCE CALCULATION DEVICE, RETRIEVAL DEVICE, METHOD, AND PROGRAM THEREFOR
Abstract
[Problem] To provide a relevance calculation device for calculating the relevance of document data to an input keyword. [Solution] The present invention provides a relevance calculation device for calculating the relevance between document data and a keyword, the relevance calculation device having: an input unit that receives input of text data; a document data feature extraction unit that generates a document data feature vector from document text data having two or more sentences; a keyword feature extraction unit that generates a keyword feature vector from keyword text data; and a relevance calculation unit that compares the document data feature vector generated by the document data feature extraction unit with the keyword feature vector generated by the keyword feature extraction unit to calculate the relevance.
Inventors
- LI, CHAO
- MANAGI, Shunsuke
- KEELEY, Alexander Ryota
- TAKEDA, SHUTARO
- SEKI, Daikichi
Assignees
- 株式会社aiESG
Dates
- Publication Date
- 20260507
- Application Date
- 20250904
- Priority Date
- 20241101
Claims (9)
- A relevance calculation device that calculates the degree of relevance between document data and a keyword, comprising: an input unit that accepts input of text data; a document data feature extraction unit that generates a document data feature vector from document text data having two or more sentences; a keyword feature extraction unit that generates a keyword feature vector from keyword text data; and a relevance calculation unit that compares the document data feature vector generated by the document data feature extraction unit with the keyword feature vector generated by the keyword feature extraction unit to calculate the degree of relevance.
- The relevance calculation device according to claim 1, further comprising: a document data preprocessing unit that generates document token data having a predetermined length from document text data input from the input unit; and a keyword preprocessing unit that generates keyword token data having a predetermined length from keyword text data input from the input unit, wherein the document data feature extraction unit generates the document data feature vector based on the document token data generated by the document data preprocessing unit, and the keyword feature extraction unit generates the keyword feature vector based on the keyword token data generated by the keyword preprocessing unit.
- The relevance calculation device according to claim 1, wherein the relevance calculation unit is a neural network model trained to assign a relevance of 1 when a keyword matches the content of the document data.
- The relevance calculation device according to claim 1, further comprising a document data feature vector storage unit that stores the document data feature vector generated by the document data feature extraction unit, wherein the relevance calculation unit, upon receiving keyword input from the input unit, compares the keyword feature vector generated by the keyword feature extraction unit with the document data feature vector read from the document data feature vector storage unit to calculate the degree of relevance.
- A search device that calculates the degree of relevance between document data and a keyword to search for highly relevant document data, comprising: an input unit that accepts input of text data; a document data feature extraction unit that generates a document data feature vector from document text data having two or more sentences; a keyword feature extraction unit that generates a keyword feature vector from keyword text data; a document data feature vector storage unit that stores the document data feature vector generated by the document data feature extraction unit; a relevance calculation unit that, upon receiving keyword input from the input unit, calculates the degree of relevance by comparing the keyword feature vector generated by the keyword feature extraction unit with two or more document data feature vectors read from the document data feature vector storage unit; and a search unit that searches for highly relevant document data based on the degree of relevance calculated by the relevance calculation unit.
- A method for calculating the degree of relevance between a paragraph and a keyword using a computer, comprising the steps of: accepting input of text data; generating a paragraph feature vector from paragraph text data having two or more sentences; generating a keyword feature vector from keyword text data; and comparing the generated paragraph feature vector with the generated keyword feature vector to calculate the degree of relevance.
- A search method that uses a computer to calculate the degree of relevance between document data and a keyword and thereby search for highly relevant document data, comprising the steps of: accepting input of text data; generating a document data feature vector from document text data having two or more sentences; storing the generated document data feature vector; generating a keyword feature vector from keyword text data; upon receiving keyword input, calculating the degree of relevance by comparing the generated keyword feature vector with two or more stored document data feature vectors; and searching for highly relevant document data based on the calculated degree of relevance.
- A relevance calculation program that calculates the degree of relevance between a paragraph and a keyword, the program causing a computer to perform a process comprising the steps of: accepting input of text data; generating a paragraph feature vector from paragraph text data having two or more sentences; generating a keyword feature vector from keyword text data; and comparing the generated paragraph feature vector with the generated keyword feature vector to calculate the degree of relevance.
- A search program that uses a computer to calculate the degree of relevance between document data and a keyword and thereby search for highly relevant document data, the program causing the computer to perform a process comprising the steps of: accepting input of text data; generating a document data feature vector from document text data having two or more sentences; storing the generated document data feature vector; generating a keyword feature vector from keyword text data; upon receiving keyword input, calculating the degree of relevance by comparing the generated keyword feature vector with two or more stored document data feature vectors; and searching for highly relevant document data based on the calculated degree of relevance.
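The store-then-search flow recited in the search device, search method, and search program claims can be illustrated with a minimal sketch. It is not the claimed implementation: the class `DocumentVectorStore`, the function `cosine_relevance`, and the `top_k` parameter are hypothetical names, and cosine similarity is used only as a stand-in for the relevance calculation unit, which claim 3 describes as a trained neural network model.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine_relevance(doc_vec: Vector, kw_vec: Vector) -> float:
    """Stand-in relevance score (assumption: cosine similarity, not the trained model)."""
    dot = sum(d * k for d, k in zip(doc_vec, kw_vec))
    norm = math.sqrt(sum(d * d for d in doc_vec)) * math.sqrt(sum(k * k for k in kw_vec))
    return dot / norm if norm else 0.0


class DocumentVectorStore:
    """Hypothetical in-memory stand-in for the document data feature vector storage unit."""

    def __init__(self) -> None:
        self._vectors: Dict[str, Vector] = {}

    def add(self, doc_id: str, vector: Vector) -> None:
        # Store a document data feature vector computed ahead of time.
        self._vectors[doc_id] = vector

    def search(self, kw_vec: Vector, top_k: int = 3) -> List[Tuple[str, float]]:
        # Compare the keyword vector with every stored vector, most relevant first.
        scored = [
            (doc_id, cosine_relevance(vec, kw_vec))
            for doc_id, vec in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

Because the document vectors are stored in advance, only the keyword vector has to be generated when a keyword arrives; the stored vectors are simply read back and compared.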
Description
Relevance calculation device, search device, method, and program therefor
This invention relates to a relevance calculation device, method, and program for calculating the degree of relevance of document data to input keywords.
It is common to search for and extract data related to desired keywords from digitized documents, news articles, and academic papers stored on the internet and in databases. For example, Patent Document 1 discloses a method that performs morphological analysis on an input search query to calculate word vectors and, for document data, calculates word vectors, word-sentence vectors, and word-document vectors. The method then searches for document data with a high degree of word-document vector relevance and displays the relevant parts of those documents.
Figure 1 is a block diagram showing an example of the functional configuration of the relevance calculation device according to the present invention.
Figure 2 is a flowchart showing the data preprocessing in the document data preprocessing unit 120 or keyword preprocessing unit 150 of the present invention.
Figure 3 shows an example of a neural network configuration for the document data feature extraction unit 130 according to the present invention.
Figure 4 shows an example of the configuration of the transformer block layer 303.
Figure 5 shows an example of the configuration of the residual connection section 305.
Figure 6 shows an example of a neural network configuration for the keyword feature extraction unit 160.
Figure 7 shows an example of the neural network configuration of the relevance calculation unit 170.
Figure 8 is a flowchart illustrating the procedure for training the document data feature extraction unit 130, keyword feature extraction unit 160, and relevance calculation unit 170 of the present invention.
Figure 9 is a flowchart illustrating the process by which the relevance calculation device 100 performs relevance calculation using the trained neural network of the present invention.
Figure 10 is a block diagram showing an example of the functional configuration of a search device, which is a second embodiment of the present invention.
Figure 11 is a flowchart illustrating the process of performing a search using a trained neural network in the search device 200 in the second embodiment of the present invention.
A further drawing is an example of a hardware configuration diagram for the relevance calculation device 100 or search device 200 of the present invention.
The embodiments for carrying out the present invention will be described below with reference to the drawings. In this specification and the drawings, components having substantially the same function and configuration are denoted by the same reference numerals, and redundant explanations are omitted.
(First embodiment)
Figure 1 is a block diagram showing an example of the functional configuration of the relevance calculation device of the present invention. The relevance calculation device 100 of the present invention includes an input unit 110, a document data preprocessing unit 120, a document data feature extraction unit 130, a document data feature vector storage unit 140, a keyword preprocessing unit 150, a keyword feature extraction unit 160, and a relevance calculation unit 170.
The input unit 110 accepts text data. This text data includes, for example, document text data (text data of document data) and keyword text data (text data of keyword data). The language of the text data is not limited to Japanese; it can be English, Chinese, or any other language to which the natural language processing model can be applied. Document data is a collection of two or more sentences, such as a news article or an academic paper. Keyword data is, for example, one or more words or phrases. Keyword data may also be words or phrases set by the user for document searching.
Of the text data received by the input unit 110, document text data is input to the document data preprocessing unit 120, and keyword text data is input to the keyword preprocessing unit 150.
The document data preprocessing unit 120 preprocesses the document text data input from the input unit 110. Specifically, it generates document token data of a predetermined length from the document text data so that a feature vector can be generated for the document data. More precisely, it divides the document data into words and phrases, maps the divided words and phrases to token IDs, and then generates document token data of a predetermined length. Fixing the document token data to a predetermined length makes it easier to generate document data feature vectors even when the length of the document data varies.
The document data feature extraction unit 130 generates document data feature vectors from document text data containing two or more sentences. Specifically, it generates document data feature vectors based on the document token data generated by the document data preprocessing unit 120.
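The preprocessing performed by the document data preprocessing unit 120 (dividing text into words, mapping them to token IDs, and fixing the length) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the helpers `build_vocab` and `to_token_data`, the reserved IDs `PAD_ID` and `UNK_ID`, and the whitespace-based word splitting, which would not suit Japanese or Chinese text, where a morphological analyzer or subword tokenizer would be needed. The specification does not state which tokenizer is used.

```python
from typing import Dict, List

PAD_ID = 0  # assumed ID used to pad inputs shorter than the fixed length
UNK_ID = 1  # assumed ID for words that are not in the vocabulary


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign a token ID to every distinct lowercase word (hypothetical helper)."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 2  # IDs 0 and 1 are reserved
    return vocab


def to_token_data(text: str, vocab: Dict[str, int], length: int) -> List[int]:
    """Split text into words, map them to token IDs, and fix the length."""
    ids = [vocab.get(word, UNK_ID) for word in text.lower().split()]
    ids = ids[:length]                     # truncate inputs that are too long
    ids += [PAD_ID] * (length - len(ids))  # pad inputs that are too short
    return ids
```

Under these assumptions, a three-word input and a three-hundred-word input both come out as lists of exactly `length` IDs, which is what lets a downstream feature extractor accept documents of varying length.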