US-12625899-B2 - Document search system and document search method
Abstract
Highly accurate document search, especially intellectual property-related document search, is achieved with a simple input method. A processing portion has a function of generating text analysis data from text data input to an input portion; a function of extracting a search word from words included in the text analysis data; and a function of generating first search data from the search word on the basis of weight dictionary data and thesaurus data. A memory portion stores second search data generated when the first search data is modified by a user. The processing portion updates the thesaurus data in accordance with the second search data.
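The flow summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`extract_search_words`, `build_first_search_data`, `apply_user_edit`), the dictionary layouts, and the regex tokenizer standing in for text analysis are all hypothetical.

```python
import copy
import re


def extract_search_words(text, weight_dict):
    """Pick search words from the input text.

    Assumption: "text analysis" is approximated by lowercase regex
    tokenization, and a token becomes a search word only if it has an
    entry in the weight dictionary. Results are ordered by weight.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    found = []
    for token in tokens:
        if token in weight_dict and token not in found:
            found.append(token)
    return sorted(found, key=lambda w: weight_dict[w], reverse=True)


def build_first_search_data(search_words, weight_dict, thesaurus):
    """Combine each search word's weight with its weighted related terms."""
    return {
        word: {
            "weight": weight_dict[word],
            "related": dict(thesaurus.get(word, {})),
        }
        for word in search_words
    }


def apply_user_edit(first_search_data, keyword, term, new_weight):
    """Return second search data: a copy of the first with one weight changed."""
    second = copy.deepcopy(first_search_data)
    second[keyword]["related"][term] = new_weight
    return second


# Hypothetical example data.
weight_dict = {"transistor": 2.0, "oxide": 1.5}
thesaurus = {"transistor": {"fet": 1.6}}
words = extract_search_words("A transistor including an oxide semiconductor.", weight_dict)
first = build_first_search_data(words, weight_dict, thesaurus)
second = apply_user_edit(first, "transistor", "fet", 0.4)
```

Keeping the first search data unchanged and storing the user-edited copy separately is what later makes it possible to compare the two when the thesaurus data is updated.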
Inventors
- Kazuki HIGASHI
- Junpei MOMO
Assignees
- SEMICONDUCTOR ENERGY LABORATORY CO., LTD.
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-08-16
- Priority Date: 2019-04-26
Claims (9)
- 1. A semiconductor device comprising: a processing portion, the processing portion comprising: a processor comprising a transistor; and a memory portion in which a program is stored, wherein the processor is configured to perform a document search comprising the steps of: generating weight dictionary data and thesaurus data on the basis of a plurality of pieces of reference document data stored in a database; generating text analysis data from text data; extracting a search word from words included in the text analysis data; generating first search data from the search word on the basis of the weight dictionary data and the thesaurus data; modifying the first search data to generate second search data by modifying a value of weight data of the first search data; storing the second search data in the memory portion; updating the thesaurus data by adding a product of a contribution ratio and a difference between the first search data and second search data after the second search data has been stored in the memory portion; generating ranking data by giving scores to the plurality of pieces of reference document data on the basis of the second search data and ranking the plurality of pieces of reference document data on the basis of the scores to the thesaurus data, wherein reference text analysis data is generated from the plurality of pieces of reference document data, wherein a plurality of keywords and related terms of the keywords are extracted from words included in the reference text analysis data, wherein the value of weight data is modified in accordance with a degree of relevance of the related terms of the keywords by a user, wherein the weight dictionary data is generated by extracting appearance frequencies of the keywords from the words included in the reference text analysis data and adding, to each of the plurality of keywords, a first weight based on the appearance frequency, and wherein the first weight is a value based on an inverse document frequency of the keyword in the reference text analysis data.
- 2. The semiconductor device according to claim 1, wherein the thesaurus data is generated by adding a second weight to each of the related terms.
- 3. The semiconductor device according to claim 2, wherein the second weight is a product of the first weight of the keyword and a value based on a similarity degree or a distance between a distributed representation vector of the related term and a distributed representation vector of the keyword.
- 4. The semiconductor device according to claim 3, wherein the distributed representation vector is generated with use of a neural network.
- 5. The semiconductor device according to claim 1, wherein, in updating the thesaurus data, the second search data is modified by a plurality of users.
- 6. The semiconductor device according to claim 1, wherein the user modifies the value of the weight data using a display device.
- 7. A semiconductor device comprising: a processing portion, the processing portion comprising: a processor comprising a transistor; and a memory portion in which a program is stored, wherein the processor is configured to perform a document search comprising the steps of: generating weight dictionary data and thesaurus data on the basis of a plurality of pieces of reference document data stored in a database; generating text analysis data from text data; extracting a search word from words included in the text analysis data; generating first search data from the search word on the basis of the weight dictionary data and the thesaurus data; modifying the first search data to generate second search data by modifying a value of weight data of the first search data; storing the second search data in the memory portion; updating the thesaurus data by adding a product of a contribution ratio and a difference between the first search data and second search data after the second search data has been stored in the memory portion; generating ranking data by giving scores to the plurality of pieces of reference document data on the basis of the second search data and ranking the plurality of pieces of reference document data on the basis of the scores to the thesaurus data, wherein the value of weight data is modified in accordance with a degree of relevance of related terms of a keyword by a user.
- 8. The semiconductor device according to claim 7, wherein the related terms are generated by adding a weight based on a similarity degree to terms related to the keyword, and wherein the related terms include synonyms, antonyms, broader terms, and narrower terms of the keyword.
- 9. The semiconductor device according to claim 7, wherein the user modifies the value of the weight data using a display device.
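The weighting, update, and ranking steps recited in the claims can be illustrated with a short sketch. It is not the claimed implementation, and several details are assumptions the claims leave open: the inverse document frequency is taken as log(N / df), the "similarity degree" is taken as cosine similarity, the difference in the update is taken as second minus first (so that raising a weight raises the stored value), and a document's score is taken as the sum of the weights of the search terms it contains. All function names and data layouts are hypothetical.

```python
import math
from collections import Counter


def idf_weights(tokenized_docs):
    """First weight per keyword, assumed here to be idf = log(N / df)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each word once per document
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def cosine(u, v):
    """Cosine similarity between two distributed representation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_term_weight(first_weight, keyword_vec, term_vec):
    """Second weight: first weight of the keyword times a similarity value."""
    return first_weight * cosine(keyword_vec, term_vec)


def update_thesaurus(thesaurus, first, second, contribution_ratio):
    """Add contribution_ratio * (second - first) to each stored weight.

    thesaurus, first, second: {keyword: {related_term: weight}}.
    The sign of the difference is an assumption.
    """
    for keyword, terms in second.items():
        for term, new_weight in terms.items():
            delta = new_weight - first[keyword][term]
            thesaurus[keyword][term] += contribution_ratio * delta


def rank_documents(tokenized_docs, search_data):
    """Score each document and sort by descending score.

    search_data: {term: weight}. Assumed scoring rule: sum of the weights
    of the search terms that occur in the document.
    """
    scores = []
    for index, tokens in enumerate(tokenized_docs):
        present = set(tokens)
        score = sum(w for term, w in search_data.items() if term in present)
        scores.append((index, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With a contribution ratio below 1, a single user's edit moves the stored thesaurus weight only part of the way toward the edited value, so repeated edits by several users accumulate gradually rather than overwriting one another.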
Description
TECHNICAL FIELD
One embodiment of the present invention relates to a document search system and a document search method.
Note that one embodiment of the present invention is not limited to the above technical field. Examples of the technical field of one embodiment of the present invention include a semiconductor device, a display device, a light-emitting device, a power storage device, a memory device, an electronic device, a lighting device, an input device (e.g., a touch sensor), an input/output device (e.g., a touch panel), a driving method thereof, and a manufacturing method thereof.
BACKGROUND ART
A prior art search conducted before filing an application for an invention can reveal whether a relevant intellectual property right exists. Domestic or foreign patent documents, papers, and the like obtained through the prior art search are helpful in confirming the novelty and non-obviousness of the invention and in determining whether to file the application. In addition, a patent invalidity search can reveal whether the patent right owned by an applicant could be invalidated or whether patent rights owned by others can be rendered invalid.
When a user enters a keyword into a patent document search system, the system outputs, for example, patent documents containing the keyword.
Conducting a highly accurate prior art search with such a system requires skill on the part of the user; for example, the user must select appropriate search keywords and then pick out the needed patent documents from the many patent documents output by the system.
Use of artificial intelligence is under consideration for various applications. In particular, artificial neural networks are expected to provide computers having higher performance than conventional von Neumann computers. In recent years, a variety of studies on creation of artificial neural networks with electronic circuits have been carried out.
For example, Patent Document 1 discloses an invention in which weight data necessary for calculation with an artificial neural network is retained in a memory device including a transistor that includes an oxide semiconductor in its channel formation region.
REFERENCE
Patent Document
[Patent Document 1] United States Patent Application Publication No. 2016/0343452
SUMMARY OF THE INVENTION
Problems to be Solved by the Invention
An object of one embodiment of the present invention is to provide a document search system that enables highly accurate document search. Alternatively, an object of one embodiment of the present invention is to provide a document search method that enables highly accurate document search. Alternatively, an object of one embodiment of the present invention is to achieve highly accurate document search, especially for a document relating to intellectual property, with an easy input method.
The description of a plurality of objects does not preclude the existence of each object. One embodiment of the present invention does not necessarily achieve all the objects described as examples. Furthermore, objects other than those listed are apparent from the description of this specification, and such objects can be objects of one embodiment of the present invention.
Means for Solving the Problems
One embodiment of the present invention is a document search system including an input portion, a database, a memory portion, and a processing portion. The database has a function of storing a plurality of pieces of reference document data, weight dictionary data, and thesaurus data.
The processing portion has a function of generating the weight dictionary data and the thesaurus data on the basis of the reference document data; a function of generating text analysis data from text data input to the input portion; a function of extracting a search word from words included in the text analysis data; and a function of generating first search data from the search word on the basis of the weight dictionary data and the thesaurus data. The memory portion has a function of storing second search data generated when the first search data is modified by a user. The processing portion has a function of updating the thesaurus data in accordance with the second search data.
In one embodiment of the present invention, the document search system is preferable in which the processing portion has a function of generating reference text analysis data from the reference document data; and a function of extracting a plurality of keywords and related terms of the keywords from words included in the reference text analysis data.
In one embodiment of the present invention, the document search system is preferable in which the weight dictionary data is data generated by extracting appearance frequencies of the keywords from the words included in the reference text analysis data and adding, to each of the keywords, a first weight based on the appearance frequency.
In one embodiment of the present invention, the document search system is preferable in which the fi