CN-121996740-A - Natural language processing method and equipment
Abstract
Embodiments of the present disclosure provide a natural language processing method and device. A vocabulary item set corresponding to an initial input sentence to be processed is acquired, wherein the vocabulary item set comprises the initial vocabulary items included in the initial input sentence and expanded vocabulary items of those initial vocabulary items; word order reduction (restoring the vocabulary items to an ordered sentence) is performed on the vocabulary items in the vocabulary item set to obtain an expanded sentence of the initial input sentence; and a processing result is acquired based on the expanded sentence and a natural language processing model. Because the initial vocabulary items and their expanded vocabulary items are restored into an expanded sentence with a coherent word order, the initial input sentence is expanded and enhanced, the natural language processing model can better understand the resulting expanded sentence, and the accuracy and stability of the processing result obtained with the model can be improved.
Inventors
- YU WEIQIANG
- QI JIAHUI
- LU YUNCHENG
- WANG DONGYU
- WANG ZHENG
- LI YAKUN
Assignees
- 北京火山引擎科技有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-10-29
Claims (15)
- 1. A method for processing natural language, comprising: acquiring a vocabulary item set corresponding to an initial input sentence to be processed, wherein the vocabulary item set comprises initial vocabulary items included in the initial input sentence and expanded vocabulary items of the initial vocabulary items; performing word order reduction on the vocabulary items in the vocabulary item set to obtain an expanded sentence of the initial input sentence; and acquiring a processing result based on the expanded sentence and a natural language processing model.
- 2. The method of claim 1, wherein performing word order reduction on the vocabulary items in the vocabulary item set to obtain the expanded sentence of the initial input sentence comprises: traversing the vocabulary items in the vocabulary item set, acquiring from the vocabulary item set the vocabulary items whose position information is connected, and constructing an expanded sentence, wherein the position information is the position and/or offset value of a vocabulary item in the initial input sentence.
- 3. The method of claim 2, wherein traversing the vocabulary items in the vocabulary item set, acquiring from the vocabulary item set the vocabulary items whose position information is connected, and constructing an expanded sentence comprises: traversing, in sequence, the vocabulary items in the vocabulary item set that have not been added to any expanded sentence, and determining whether the position information of the current vocabulary item is connected with that of the vocabulary item previously added to the current expanded sentence; if the position information is connected, adding the current vocabulary item to the current expanded sentence after the previously added vocabulary item, or, if the position information is not connected, skipping the current vocabulary item; and, when the traversal is finished, completing construction of the current expanded sentence and restarting the in-sequence traversal of the vocabulary items that have not been added to any expanded sentence in order to construct a new expanded sentence.
- 4. The method of claim 3, further comprising: creating an array according to the vocabulary item set, wherein each element of the array stores a state corresponding to one vocabulary item in the vocabulary item set, and the state indicates whether that vocabulary item has been added to any expanded sentence; and, during the traversal, if any vocabulary item is added to any expanded sentence, updating the state corresponding to that vocabulary item in the array; correspondingly, traversing, in sequence, the vocabulary items in the vocabulary item set that have not been added to any expanded sentence comprises: traversing those vocabulary items in sequence according to the array.
- 5. The method of claim 3, wherein determining whether the position information of the current vocabulary item is connected with that of the vocabulary item previously added to the current expanded sentence comprises: determining whether the position and/or offset value of the current vocabulary item is connected with the position and/or offset value of the vocabulary item previously added to the current expanded sentence.
- 6. The method of claim 5, wherein determining whether the position and/or offset value of the current vocabulary item is connected with the position and/or offset value of the vocabulary item previously added to the current expanded sentence comprises: determining whether the start offset value of the current vocabulary item is connected with the end offset value of the previously added vocabulary item, and/or determining whether the start offset value of the current vocabulary item is the same as the start offset value of the previously added vocabulary item while the position of the previously added vocabulary item and the position of the current vocabulary item are in an increasing relation; and, if the start offset value of the current vocabulary item is connected with the end offset value of the previously added vocabulary item, or if the two start offset values are the same and the positions are in an increasing relation, determining that the position information of the current vocabulary item is connected with that of the previously added vocabulary item.
- 7. The method of claim 1, wherein acquiring the processing result based on the expanded sentence and the natural language processing model comprises: inputting the expanded sentence into the natural language processing model, rewriting the expanded sentence through the natural language processing model to obtain a rewritten expanded sentence, and performing natural language processing according to the rewritten expanded sentence to obtain the processing result.
- 8. The method of claim 1, wherein acquiring the processing result based on the expanded sentence and the natural language processing model comprises: inputting the expanded sentence into the natural language processing model, acquiring a sentence vector corresponding to the expanded sentence through the natural language processing model, and performing vector processing according to the sentence vector to obtain the processing result.
- 9. The method of claim 1, wherein acquiring the processing result based on the expanded sentence and the natural language processing model comprises: querying with the expanded sentence through the natural language processing model to obtain query results and matching scores between the query results and the expanded sentence.
- 10. The method of any one of claims 1-9, wherein the expanded vocabulary items of the initial vocabulary items comprise one or more of: synonyms of the initial vocabulary items, pinyin of the initial vocabulary items, and standard-format forms corresponding to the initial vocabulary items.
- 11. The method of any one of claims 1-9, wherein acquiring the vocabulary item set corresponding to the initial input sentence to be processed comprises: calling an analyzer of a search engine to obtain the vocabulary item set corresponding to the initial input sentence to be processed, wherein the analyzer is a component of the search engine for processing text data and comprises one or more of a tokenizer (word segmenter), a synonym analyzer, a pinyin analyzer, and a multilingual text analyzer.
- 12. A natural language processing device, comprising: an analysis unit, configured to acquire a vocabulary item set corresponding to an initial input sentence to be processed, wherein the vocabulary item set comprises initial vocabulary items included in the initial input sentence and expanded vocabulary items of the initial vocabulary items; a word order reduction unit, configured to perform word order reduction on the vocabulary items in the vocabulary item set to obtain an expanded sentence of the initial input sentence; and a model processing unit, configured to acquire a processing result based on the expanded sentence and a natural language processing model.
- 13. An electronic device, comprising a processor and a memory, wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the method of any one of claims 1-11.
- 14. A computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement the method of any of claims 1-11.
- 15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-11.
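The traversal recited in claims 3 to 6 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the `Term` record, the function names, the `max_gap` tolerance (which treats a single separator character between offsets as still "connected"), and the space used to join vocabulary items are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Term:
    """One vocabulary item with its position information (hypothetical record)."""
    text: str
    position: int
    start_offset: int
    end_offset: int


def is_connected(prev: Term, cur: Term, max_gap: int = 1) -> bool:
    """Claim 6 style check; max_gap is an assumption for space-separated text."""
    # Condition 1: the current start offset follows the previous end offset.
    adjacent = 0 <= cur.start_offset - prev.end_offset <= max_gap
    # Condition 2: same start offset and strictly increasing position.
    stacked = (cur.start_offset == prev.start_offset
               and prev.position < cur.position)
    return adjacent or stacked


def restore_word_order(terms: List[Term], max_gap: int = 1,
                       sep: str = " ") -> List[str]:
    """Build expanded sentences by repeated in-order traversal (claims 3 and 4)."""
    added = [False] * len(terms)  # state array: has terms[i] been used yet?
    sentences = []
    while not all(added):
        current: List[Term] = []
        for i, term in enumerate(terms):
            if added[i]:
                continue  # only visit items not yet in any expanded sentence
            # The first unused item opens the sentence; later ones must connect.
            if not current or is_connected(current[-1], term, max_gap):
                current.append(term)
                added[i] = True
            # otherwise the item is skipped and waits for a later pass
        sentences.append(sep.join(t.text for t in current))
    return sentences


terms = [
    Term("quick", 0, 0, 5),
    Term("fast", 0, 0, 5),   # hypothetical synonym occupying the same span
    Term("fox", 1, 6, 9),
]
print(restore_word_order(terms))  # ['quick fox', 'fast']
```

Each pass consumes at least the first unused vocabulary item, so the loop always terminates. How the resulting expanded sentences are combined before being passed to the model is not specified by the claims and is left open here.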
Description
Natural language processing method and equipment
Technical Field
Embodiments of the present disclosure relate to the technical field of computer and network communication, and in particular to a natural language processing method and device.
Background
Search engines such as OpenSearch and Elasticsearch usually have a rich analyzer ecosystem that intervenes on query inputs (queries), for example through word segmentation, synonyms, ICU (International Components for Unicode), and pinyin. This analyzer ecosystem is built on terms, which suits full-text search scenarios well. However, with the recent explosion of applications based on natural language processing (NLP) models, such applications lack a bypass intervention mechanism comparable to search engine analyzers, and term-based analyzers are not directly suitable for natural language processing models.
Disclosure of Invention
Embodiments of the present disclosure provide a natural language processing method and device to overcome the above problems.
In a first aspect, an embodiment of the present disclosure provides a method for processing natural language, including: acquiring a vocabulary item set corresponding to an initial input sentence to be processed, wherein the vocabulary item set comprises initial vocabulary items included in the initial input sentence and expanded vocabulary items of the initial vocabulary items; performing word order reduction on the vocabulary items in the vocabulary item set to obtain an expanded sentence of the initial input sentence; and acquiring a processing result based on the expanded sentence and a natural language processing model.
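As an illustration of the first step, acquiring the vocabulary item set, the following Python sketch builds a set of initial and expanded vocabulary items that carry position and offset information. It is a toy stand-in rather than a call to a real search engine: the whitespace tokenizer, the `SYNONYMS` dictionary, the `build_term_set` function, and the `kind` field are assumptions made for this example. The `token`, `position`, `start_offset`, and `end_offset` field names are modeled on the token entries returned by the OpenSearch and Elasticsearch analyze API.

```python
import re
from typing import Dict, List

# Hypothetical synonym dictionary, used only for this illustration.
SYNONYMS: Dict[str, List[str]] = {"quick": ["fast"]}


def build_term_set(sentence: str,
                   synonyms: Dict[str, List[str]] = SYNONYMS) -> List[dict]:
    """Return initial and expanded vocabulary items with position information."""
    terms = []
    # Whitespace tokenization is an assumption; a real analyzer would differ.
    for position, match in enumerate(re.finditer(r"\S+", sentence)):
        word = match.group().lower()
        base = {
            "token": word,
            "position": position,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "kind": "initial",
        }
        terms.append(base)
        # An expanded item shares the position and offsets of its initial item.
        for synonym in synonyms.get(word, []):
            terms.append({**base, "token": synonym, "kind": "expanded"})
    return terms


for term in build_term_set("quick fox"):
    print(term)
```

Because an expanded vocabulary item reuses the position and offsets of the initial vocabulary item it was derived from, the later word order reduction step can tell which items occupy the same slot of the initial input sentence.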
In a second aspect, an embodiment of the present disclosure provides a natural language processing device, including: an analysis unit, configured to acquire a vocabulary item set corresponding to an initial input sentence to be processed, wherein the vocabulary item set comprises initial vocabulary items included in the initial input sentence and expanded vocabulary items of the initial vocabulary items; a word order reduction unit, configured to perform word order reduction on the vocabulary items in the vocabulary item set to obtain an expanded sentence of the initial input sentence; and a model processing unit, configured to acquire a processing result based on the expanded sentence and a natural language processing model.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including a processor and a memory, wherein the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the natural language processing method described in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the natural language processing method described in the first aspect and the various possible designs of the first aspect.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the natural language processing method described in the first aspect and the various possible designs of the first aspect.
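The three-unit structure of the device in the second aspect can be read as a simple composition of stages. The Python sketch below shows only that composition; the class name `NaturalLanguageProcessor`, its field names, and the trivial stand-in callables are hypothetical, and a real analysis unit, word order reduction unit, and model would replace them.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class NaturalLanguageProcessor:
    """Hypothetical composition of the three units as plain callables."""
    analysis_unit: Callable[[str], List[str]]     # sentence -> vocabulary items
    word_order_unit: Callable[[List[str]], str]   # items -> expanded sentence
    model_unit: Callable[[str], Any]              # expanded sentence -> result

    def process(self, sentence: str) -> Any:
        terms = self.analysis_unit(sentence)
        expanded_sentence = self.word_order_unit(terms)
        return self.model_unit(expanded_sentence)


# Trivial stand-ins so the sketch runs; none of them performs real expansion
# or calls a real natural language processing model.
processor = NaturalLanguageProcessor(
    analysis_unit=str.split,
    word_order_unit=" ".join,
    model_unit=lambda s: {"input": s, "tokens": len(s.split())},
)
print(processor.process("quick fox"))  # {'input': 'quick fox', 'tokens': 2}
```

Keeping the units behind plain callables means the analysis step can be swapped, for example for a search engine analyzer, without touching the model-facing code.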
According to the natural language processing method and device provided by the embodiments of the present disclosure, a vocabulary item set corresponding to an initial input sentence to be processed is acquired, wherein the vocabulary item set comprises initial vocabulary items included in the initial input sentence and expanded vocabulary items of the initial vocabulary items; word order reduction is performed on the vocabulary items in the vocabulary item set to obtain an expanded sentence of the initial input sentence; and a processing result is acquired based on the expanded sentence and a natural language processing model. Because the initial vocabulary items and their expanded vocabulary items are restored into an expanded sentence with a coherent word order, the initial input sentence is expanded and enhanced, the natural language processing model can better understand the resulting expanded sentence, and the accuracy and stability of the processing result obtained with the model can be improved.
Drawings
To illustrate the embodiments of the present disclosure or the prior-art solutions more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below; obviously, the drawings in