CN-121981113-A - Semiconductor patent analysis method and device based on process tag knowledge graph
Abstract
The application discloses a semiconductor patent analysis method and device based on a process tag knowledge graph. The method comprises: obtaining a semiconductor process tag system; constructing, based on the semiconductor process tag system, a structured prompt instruction to be input into a large language model, the instruction constraining the large language model to select process tags only from the semiconductor process tag system; obtaining a semiconductor patent text to be analyzed and performing text processing operations to determine a semiconductor patent technical text; inputting the semiconductor patent technical text and the structured prompt instruction into the large language model to determine the process tags of the semiconductor patent text; constructing a process tag knowledge graph based on the process tags and a preset process tag data structure; and querying and/or statistically analyzing the process tag knowledge graph according to semiconductor patent analysis requirements to obtain an analysis result representing the semiconductor process layout. The application has the technical effect of improving the efficiency of semiconductor patent analysis.
Inventors
- Ma Jinzhe
- LI JUNMING
- YU JIE
- ZHANG BINBIN
- ZHENG WU
- CAI CHAOHUI
Assignees
- 上海华虹计通智能系统股份有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251211
Claims (10)
- 1. A semiconductor patent analysis method based on a process tag knowledge graph, characterized by comprising the following steps: acquiring a semiconductor process tag system, wherein the semiconductor process tag system is determined based on the process flow division of a semiconductor manufacturing process and comprises a process major class, a process sub-class, a technology type and a specific implementation of the technology type; constructing, based on the semiconductor process tag system, a structured prompt instruction to be input into a large language model, wherein the structured prompt instruction is used for constraining the large language model to select process tags only from the semiconductor process tag system; acquiring a semiconductor patent text to be analyzed, and performing text processing operations to determine a semiconductor patent technical text, wherein the text processing operations comprise one or more of text extraction, text translation, technical information extraction and term alignment; inputting the semiconductor patent technical text and the structured prompt instruction into the large language model, and determining one or more process tags of the semiconductor patent text, wherein the process tags are classification nodes in the semiconductor process tag system that have a semantic mapping relation with the semiconductor patent technical text; constructing a process tag knowledge graph based on the process tags and a preset process tag data structure, wherein the preset process tag data structure comprises an association relation between patent information and the process tags; and querying and/or statistically analyzing the process tag knowledge graph according to a semiconductor patent analysis requirement to obtain an analysis result representing the semiconductor process layout.
- 2. The semiconductor patent analysis method according to claim 1, wherein the semiconductor process tag system is configured as a four-level process tag hierarchy comprising a first process tag level for identifying a semiconductor manufacturing process category, a second process tag level for identifying a specific process link under the process category, a third process tag level for identifying a technical solution type under the process category, and a fourth process tag level for identifying a specific implementation of the technology type.
- 3. The semiconductor patent analysis method according to claim 2, wherein the text processing operations include one or more of the following operations: the text extraction is configured to extract text content and restore the layout based on an optical character recognition engine when the semiconductor patent text is in an image format; the text translation is configured to translate the semiconductor patent text into a target language when it is in a non-target language; the technical information extraction is configured to identify and remove non-technical noise content in the semiconductor patent text, the non-technical noise content including legal claims, citation lists, and reference numeral descriptions; and the term alignment is configured to map term variants in the semiconductor patent text to standard terms based on a preset term knowledge base.
- 4. The semiconductor patent analysis method according to claim 3, wherein the structured prompt instruction includes output range constraint information for constraining the large language model to select process tags only from the semiconductor process tag system, and output format constraint information for constraining the large language model to output the process tags in a preset output format; the output range constraint information specifies that the number of process tags selected by the large language model from the semiconductor process tag system does not exceed a preset tag number, and that the process tags belong to the four-level process tag hierarchy of the semiconductor process tag system; and the output format constraint information is used for instructing the large language model to output the process tags with a preset separator and in a preset order.
- 5. The semiconductor patent analysis method according to claim 4, wherein the structured prompt instruction further comprises weight indication information, the weight indication information being used for instructing the large language model, when analyzing the semiconductor patent technical text, to determine the process tags based on the importance of the paragraphs of the semiconductor patent technical text.
- 6. The semiconductor patent analysis method according to claim 5, wherein the structured prompt instruction further comprises a reference sample set for example-based learning by the large language model; the reference sample set comprises a positive sample and a negative sample, wherein the positive sample comprises a first segment of a preset semiconductor patent technical text and the standard process tags corresponding to the first segment in the semiconductor process tag system, and is used for illustrating the semantic mapping relation; and the negative sample comprises a second segment of non-semiconductor patent technical text and exclusion indication information corresponding to the second segment, the exclusion indication information being used for instructing the large language model to filter out technical content that does not belong to the semiconductor process tag system.
- 7. The semiconductor patent analysis method according to claim 1, wherein querying and/or statistically analyzing the process tag knowledge graph according to the semiconductor patent analysis requirement comprises at least one of the following modes: based on a natural language query instruction, generating a graph query statement and retrieving associated nodes in the process tag knowledge graph to generate a question-answer result; based on a process tag query instruction, determining patent quantity information under the process tag corresponding to the process tag query instruction, and generating a process capability comparison result for different applicants, wherein the patent quantity information comprises the number of patents, the number of patent citations or the average number of claims; and based on a technology gap-filling instruction, identifying the process flow links that a target applicant is missing in relation to the process tag corresponding to the technology gap-filling instruction, and generating a technology gap-filling recommendation result.
- 8. The semiconductor patent analysis method according to any one of claims 1 to 7, wherein the preset process tag data structure is configured as a triplet data structure, and constructing the process tag knowledge graph based on the process tags and the preset process tag data structure comprises the following steps: extracting patent identification information from the semiconductor patent technical text, and generating first-class triplet data, wherein the first-class triplet data comprises the patent identification information, the process tag corresponding to the patent identification information and a first relation between the patent identification information and the process tag; extracting applicant information from the semiconductor patent technical text, and generating second-class triplet data, wherein the second-class triplet data comprises the applicant information, the process tag corresponding to the applicant information and a second relation between the applicant information and the process tag; and extracting a process flow relation or a patent citation relation from the semiconductor patent technical text, and generating third-class triplet data, wherein the third-class triplet data comprises a first process tag, a second process tag associated with the first process tag and a third relation between the first process tag and the second process tag.
- 9. A semiconductor patent analysis device based on a process tag knowledge graph, characterized by comprising: an acquisition unit, used for acquiring a semiconductor process tag system, wherein the semiconductor process tag system is determined based on the process flow division of a semiconductor manufacturing process and comprises a process major class, a process sub-class, a technology type and a specific implementation of the technology type; a building unit, used for constructing, based on the semiconductor process tag system, a structured prompt instruction to be input into a large language model, the structured prompt instruction being used for constraining the large language model to select process tags only from the semiconductor process tag system; a text processing unit, used for acquiring a semiconductor patent text to be analyzed and performing text processing operations to determine a semiconductor patent technical text, wherein the text processing operations comprise one or more of text extraction, text translation, technical information extraction and term alignment; a process tag determining unit, used for inputting the semiconductor patent technical text and the structured prompt instruction into the large language model and determining one or more process tags of the semiconductor patent text, wherein the process tags are classification nodes in the semiconductor process tag system that have a semantic mapping relation with the semiconductor patent technical text; a graph construction unit, used for constructing a process tag knowledge graph based on the process tags and a preset process tag data structure, wherein the preset process tag data structure comprises an association relation between patent information and the process tags; and an analysis unit, used for querying and/or statistically analyzing the process tag knowledge graph according to a semiconductor patent analysis requirement to obtain an analysis result representing the semiconductor process layout.
- 10. A computer-readable storage medium comprising a memory having instructions stored thereon that, when read by a processor, perform the process tag knowledge-graph based semiconductor patent analysis method of any one of claims 1 to 8.
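As an illustration only (not part of the claims), the triplet data structure of claim 8 and the per-tag counting of claim 7 might be sketched in Python as below. The relation names, patent identifiers, applicant names and helper functions are all hypothetical; the claims only require that a relation exists between each pair. Note that the three claimed triplet classes contain no patent-to-applicant link, so this sketch approximates patent counts per applicant by counting applicant-to-tag triples, one of which is emitted per patent.

```python
from collections import Counter
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical relation names chosen for this sketch.
PATENT_HAS_TAG = "HAS_PROCESS_TAG"          # first-class: patent -> tag
APPLICANT_HAS_TAG = "APPLICANT_COVERS_TAG"  # second-class: applicant -> tag
TAG_PRECEDES_TAG = "PRECEDES"               # third-class: tag -> tag


def build_triples(patent_id, applicant, tags, flow_pairs=()):
    """Return first-, second- and third-class triples for one tagged patent."""
    triples = []
    for tag in tags:
        triples.append(Triple(patent_id, PATENT_HAS_TAG, tag))
        triples.append(Triple(applicant, APPLICANT_HAS_TAG, tag))
    for first_tag, second_tag in flow_pairs:
        triples.append(Triple(first_tag, TAG_PRECEDES_TAG, second_tag))
    return triples


def patents_under_tag(triples, tag):
    """Sorted list of distinct patents linked to a process tag."""
    return sorted(
        {t.head for t in triples if t.relation == PATENT_HAS_TAG and t.tail == tag}
    )


def applicant_counts(triples, tag):
    """Count applicant-to-tag triples per applicant (one is emitted per patent)."""
    return Counter(
        t.head for t in triples if t.relation == APPLICANT_HAS_TAG and t.tail == tag
    )


# Made-up example data.
graph = []
graph += build_triples("P-001", "Applicant A", ["Dry etching"], [("Exposure", "Dry etching")])
graph += build_triples("P-002", "Applicant A", ["Dry etching"])
graph += build_triples("P-003", "Applicant B", ["Dry etching", "Chemical vapor deposition"])

print(patents_under_tag(graph, "Dry etching"))
print(applicant_counts(graph, "Dry etching"))
```

A flat list of triples is used here only to keep the sketch short; a production system would more plausibly load the same triples into a graph database and express these lookups as graph queries.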
Description
Semiconductor patent analysis method and device based on process tag knowledge graph
Technical Field
The embodiments of the application relate to the technical field of semiconductor patent analysis, and in particular to a semiconductor patent analysis method and device based on a process tag knowledge graph.
Background
Technology in the semiconductor manufacturing industry advances rapidly and its process flows are complex, so patent analysis is an important means for enterprises to make technology development decisions and to stay aware of the competitive situation. Currently, semiconductor patent analysis mainly relies on the International Patent Classification (IPC) or the Cooperative Patent Classification (CPC), combined with keyword-based patent text retrieval. However, the IPC and CPC classification systems must accommodate many technical categories and their hierarchies are shallow, so finer technical subdivision within a single process category is difficult. Moreover, patent analysis based on keyword matching or traditional machine learning lacks a deep understanding of semiconductor technical terms and process flows, and has difficulty recognizing the varied domain terms and technical solutions that appear in patent texts. In addition, existing analysis stays at the level of quantity statistics and cannot convert discrete patent texts into a process knowledge network that supports association and reasoning. Semiconductor patent analysis is therefore a problem that deserves attention.
Disclosure of Invention
In view of the above, the embodiments of the application provide a semiconductor patent analysis method and device based on a process tag knowledge graph, so as to improve the efficiency of semiconductor patent analysis.
A semiconductor patent analysis method based on a process tag knowledge graph comprises: obtaining a semiconductor process tag system, wherein the semiconductor process tag system is determined based on the process flow division of a semiconductor manufacturing process and comprises a process major class, a process sub-class, a technology type and a specific implementation of the technology type; constructing, based on the semiconductor process tag system, a structured prompt instruction to be input into a large language model, wherein the structured prompt instruction is used for constraining the large language model to select process tags only from the semiconductor process tag system; obtaining a semiconductor patent text to be analyzed and performing text processing operations to determine a semiconductor patent technical text, wherein the text processing operations comprise one or more of text extraction, text translation, technical information extraction and term alignment; inputting the semiconductor patent technical text and the structured prompt instruction into the large language model to determine one or more process tags of the semiconductor patent text, wherein the process tags are classification nodes in the semiconductor process tag system that have a semantic mapping relation with the semiconductor patent technical text; constructing a process tag knowledge graph based on the process tags and a preset process tag data structure, wherein the preset process tag data structure comprises an association relation between patent information and the process tags; and querying and/or statistically analyzing the process tag knowledge graph according to semiconductor patent analysis requirements to obtain an analysis result representing the semiconductor process layout.
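The prompt-construction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the tag names in `TAG_SYSTEM`, the path separator, the prompt wording, the default tag limit and the helper names `flatten_tags`, `build_prompt` and `parse_tags` are all hypothetical. The call to the large language model itself is omitted; `parse_tags` only shows how a model reply could be checked against the tag system after the fact.

```python
# Hypothetical four-level tag system:
# major class -> sub-class -> technology type -> specific implementations.
TAG_SYSTEM = {
    "Etching": {
        "Dry etching": {"Plasma etching": ["Inductively coupled plasma"]},
    },
    "Thin film deposition": {
        "Chemical vapor deposition": {"PECVD": ["Dual-frequency PECVD"]},
    },
}


def flatten_tags(system, sep=" > "):
    """List every node of the hierarchy as a path string, at all four levels."""
    paths = []
    for major, subs in system.items():
        paths.append(major)
        for sub, types in subs.items():
            paths.append(sep.join([major, sub]))
            for tech, impls in types.items():
                paths.append(sep.join([major, sub, tech]))
                for impl in impls:
                    paths.append(sep.join([major, sub, tech, impl]))
    return paths


def build_prompt(system, max_tags=3, delimiter=";"):
    """Build a prompt carrying output-range and output-format constraints."""
    allowed = "\n".join(f"- {path}" for path in flatten_tags(system))
    return (
        "You are labeling a semiconductor patent.\n"
        f"Select at most {max_tags} tags, only from this list:\n{allowed}\n"
        f"Return the tags on one line separated by '{delimiter}' and nothing else."
    )


def parse_tags(raw, system, max_tags=3, delimiter=";"):
    """Keep only replies that are real nodes of the tag system, up to max_tags."""
    allowed = set(flatten_tags(system))
    picked = [part.strip() for part in raw.split(delimiter) if part.strip()]
    unique = list(dict.fromkeys(picked))  # drop duplicates, keep order
    return [tag for tag in unique if tag in allowed][:max_tags]


print(build_prompt(TAG_SYSTEM))
print(parse_tags("Etching > Dry etching; Made-up tag; Etching", TAG_SYSTEM))
```

Validating the reply in code as well as constraining it in the prompt is a design choice of this sketch: a prompt alone cannot guarantee that the model stays inside the tag system, so out-of-system strings are simply discarded.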
Optionally, the semiconductor process tag system is configured as a four-level process tag hierarchy comprising a first process tag level for identifying a semiconductor manufacturing process category, a second process tag level for identifying a specific process link under the process major class, a third process tag level for identifying a technical solution type under the process sub-class, and a fourth process tag level for identifying a specific implementation of the technology type.
Optionally, the text processing operations include one or more of the following: text extraction, configured to extract text content and restore the layout based on an optical character recognition engine when the semiconductor patent text is in an image format; text translation, configured to translate the semiconductor patent text into a target language when it is in a non-target language; technical information extraction, configured to identify and remove non-technical noise content in the semiconductor patent text, the non-technical noise content including legal claims, citation lists, and reference numeral descriptions; and term alignment, configured to map term variants in the semiconductor patent text to standard terms based on a preset term knowledge base.
The structured prompt instruction comprises output range constraint information used for constraining the large language model to select process labe