CN-115964992-B - Method, device and equipment for converting spoken language into written language based on graph attention network

Abstract

The invention provides a method, a device and equipment for converting spoken language into written language based on a graph attention network. The method comprises: performing semantic encoding on a spoken document to obtain a semantic representation of the spoken document; determining an initial representation of each node in a document structure graph of the spoken document based on the semantic representation of the spoken document, wherein the document structure graph comprises document nodes, sentence nodes and word segmentation nodes; performing message propagation on the initial representation of each node in the document structure graph based on an attention mechanism to obtain a structure graph representation of the document structure graph; and performing semantic decoding based on the structure graph representation to obtain the written document corresponding to the spoken document. By constructing the document structure graph, the method, device and equipment provided by the invention can obtain a more concise and more readable written document, avoid missing spoken expressions that cross sentence boundaries during text conversion, and ensure the quality of written-language conversion for document-level spoken text.

Inventors

ZHAO YUNLONG
XU SHUANG
XU BO

Assignees

Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)

Dates

Publication Date: 20260505
Application Date: 20221212

Claims (8)

1. A method for converting spoken language into written language based on a graph attention network, comprising:
performing semantic encoding on a spoken document to obtain a semantic representation of the spoken document;
determining an initial representation of each node in a document structure graph of the spoken document based on the semantic representation of the spoken document, wherein the document structure graph comprises document nodes, sentence nodes and word segmentation nodes, the document nodes are connected to the sentence nodes corresponding to the spoken sentences belonging to the spoken document, and the sentence nodes are connected to the word segmentation nodes corresponding to the segmented words belonging to the spoken sentences;
performing, based on an attention mechanism, message propagation on the initial representation of each node in the document structure graph to obtain a structure graph representation of the document structure graph; and
performing semantic decoding based on the structure graph representation to obtain a written document corresponding to the spoken document;
wherein performing, based on the attention mechanism, message propagation on the initial representation of each node in the document structure graph to obtain the structure graph representation of the document structure graph comprises:
performing, based on the attention mechanism, message propagation on the initial representations of nodes at the same level in the document structure graph to obtain a hierarchical representation of each node in the document structure graph, wherein the hierarchical representations of the nodes comprise the initial representation of the document node, the hierarchical representations of the sentence nodes and the hierarchical representations of the word segmentation nodes; and
performing, based on the attention mechanism, message propagation on the hierarchical representation of each node in the document structure graph to obtain the structure graph representation of the document structure graph;
and wherein performing, based on the attention mechanism, message propagation on the initial representations of nodes at the same level in the document structure graph to obtain the hierarchical representation of each node in the document structure graph comprises:
constructing a word-level fully connected graph from the word segmentation nodes in the document structure graph;
constructing a sentence-level fully connected graph from the sentence nodes in the document structure graph;
performing, based on the attention mechanism, message propagation on the initial representation of each word segmentation node in the word-level fully connected graph to obtain the hierarchical representation of each word segmentation node; and
performing, based on the attention mechanism, message propagation on the initial representation of each sentence node in the sentence-level fully connected graph to obtain the hierarchical representation of each sentence node.
2. The method for converting spoken language into written language based on a graph attention network according to claim 1, wherein performing semantic decoding based on the structure graph representation to obtain a written document corresponding to the spoken document comprises: performing feature fusion on the semantic representation and the structure graph representation to obtain a fused representation; and performing semantic decoding based on the fused representation to obtain the written document corresponding to the spoken document.
3. The method for converting spoken language into written language based on a graph attention network according to claim 2, wherein performing feature fusion on the semantic representation and the structure graph representation to obtain a fused representation comprises: performing a gated attention calculation based on the semantic representation and the structure graph representation to obtain an attention weight; enhancing the structure graph representation based on the attention weight to obtain an enhanced graph representation; and performing feature fusion on the semantic representation and the enhanced graph representation to obtain the fused representation.
4. The method for converting spoken language into written language based on a graph attention network according to any one of claims 1 to 3, wherein determining an initial representation of each node in a document structure graph of the spoken document based on the semantic representation of the spoken document comprises: using the semantic representation of each segmented word in the spoken document as the initial representation of the corresponding word segmentation node in the document structure graph; determining the initial representation of each sentence node in the document structure graph based on the semantic representations of the segmented words in the corresponding sentence of the spoken document; and determining the initial representation of the document node in the document structure graph based on the initial representations of the sentence nodes in the document structure graph.
5. A device for converting spoken language into written language based on a graph attention network, comprising:
an encoding unit, configured to perform semantic encoding on a spoken document to obtain a semantic representation of the spoken document;
a graph representation unit, configured to determine an initial representation of each node in a document structure graph of the spoken document based on the semantic representation of the spoken document, wherein the document structure graph comprises document nodes, sentence nodes and word segmentation nodes, the document nodes are connected to the sentence nodes corresponding to the spoken sentences belonging to the spoken document, and the sentence nodes are connected to the word segmentation nodes corresponding to the segmented words belonging to the spoken sentences;
a graph propagation unit, configured to perform, based on an attention mechanism, message propagation on the initial representation of each node in the document structure graph to obtain a structure graph representation of the document structure graph; and
a decoding unit, configured to perform semantic decoding based on the structure graph representation to obtain a written document corresponding to the spoken document;
wherein the graph propagation unit is specifically configured to:
perform, based on the attention mechanism, message propagation on the initial representations of nodes at the same level in the document structure graph to obtain a hierarchical representation of each node in the document structure graph, wherein the hierarchical representations of the nodes comprise the initial representation of the document node, the hierarchical representations of the sentence nodes and the hierarchical representations of the word segmentation nodes; and
perform, based on the attention mechanism, message propagation on the hierarchical representation of each node in the document structure graph to obtain the structure graph representation of the document structure graph;
and wherein the graph propagation unit is further specifically configured to:
construct a word-level fully connected graph from the word segmentation nodes in the document structure graph;
construct a sentence-level fully connected graph from the sentence nodes in the document structure graph;
perform, based on the attention mechanism, message propagation on the initial representation of each word segmentation node in the word-level fully connected graph to obtain the hierarchical representation of each word segmentation node; and
perform, based on the attention mechanism, message propagation on the initial representation of each sentence node in the sentence-level fully connected graph to obtain the hierarchical representation of each sentence node.
6. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for converting spoken language into written language based on a graph attention network according to any one of claims 1 to 4.
7. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method for converting spoken language into written language based on a graph attention network according to any one of claims 1 to 4.
8. A computer program product comprising a computer program which, when executed by a processor, implements the method for converting spoken language into written language based on a graph attention network according to any one of claims 1 to 4.
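
The three steps of the feature fusion in claim 3 (compute a gated attention weight, enhance the graph representation with it, fuse with the semantic representation) can be illustrated with a minimal sketch. The claims do not state the gate formula or the fusion operator, so everything concrete here is an assumption made for illustration: a single scalar sigmoid gate, element-wise addition as the fusion, and the hypothetical names `gated_fusion`, `w_sem`, `w_graph` and `bias`.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(semantic: Vector, graph: Vector,
                 w_sem: Vector, w_graph: Vector, bias: float = 0.0) -> Vector:
    """Fuse one token's semantic vector with its structure graph vector.

    w_sem, w_graph and bias stand in for learned gate parameters
    (hypothetical; the patent text does not specify them).
    """
    # Step 1: gated attention weight computed from both representations.
    logit = (sum(w * x for w, x in zip(w_sem, semantic))
             + sum(w * x for w, x in zip(w_graph, graph))
             + bias)
    gate = sigmoid(logit)
    # Step 2: enhance the graph representation by scaling it with the gate.
    enhanced = [gate * x for x in graph]
    # Step 3: fuse by element-wise addition (assumed; concatenation
    # followed by a projection would be another plausible choice).
    return [s + e for s, e in zip(semantic, enhanced)]


# With zero gate parameters the gate is sigmoid(0) = 0.5.
print(gated_fusion([1.0, 2.0], [4.0, 6.0], [0.0, 0.0], [0.0, 0.0]))  # → [3.0, 5.0]
```

A gate near 0 leaves the decoder input close to the plain semantic representation, while a gate near 1 passes the graph representation through almost unchanged, which is one way to read "enhancing the structure graph representation based on the attention weight".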

Description

Method, device and equipment for converting spoken language into written language based on graph attention network

Technical Field

The invention relates to the technical field of natural language processing, and in particular to a method, a device and equipment for converting spoken language into written language based on a graph attention network.

Background

People use language differently when speaking than when writing. Syntactic and grammatical errors and disfluencies in speech, together with noise introduced during voice recording, reduce the accessibility and readability of the spoken text obtained by speech recognition. Converting spoken text into written text is therefore important for making the text content easier to understand.

Current research on spoken-to-written text conversion usually converts text sentence by sentence. In practice, however, spoken expressions in document-level spoken text may cross sentence boundaries, and the spoken text itself is lengthy and poorly organized. Sentence-by-sentence conversion cannot capture the influence between preceding and following sentences and can only delete spoken expressions, so it cannot reorganize and simplify document-level spoken text, and the conversion result is unsatisfactory.

Disclosure of Invention

The invention provides a method, a device and equipment for converting spoken language into written language based on a graph attention network, to solve the problem in the prior art that sentence-by-sentence spoken-to-written conversion methods are unsuitable for document-level conversion scenarios and produce unsatisfactory results.
The invention provides a method for converting spoken language into written language based on a graph attention network, comprising: performing semantic encoding on a spoken document to obtain a semantic representation of the spoken document; determining an initial representation of each node in a document structure graph of the spoken document based on the semantic representation of the spoken document, wherein the document structure graph comprises document nodes, sentence nodes and word segmentation nodes, the document nodes are connected to the sentence nodes corresponding to the spoken sentences belonging to the spoken document, and the sentence nodes are connected to the word segmentation nodes corresponding to the segmented words belonging to the spoken sentences; performing, based on an attention mechanism, message propagation on the initial representation of each node in the document structure graph to obtain a structure graph representation of the document structure graph; and performing semantic decoding based on the structure graph representation to obtain a written document corresponding to the spoken document.
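
The graph layout and the attention-based propagation described above can be made concrete with a small sketch. It is not the patented implementation: the encoder output is replaced by hand-written word vectors, sentence and document nodes are initialised by mean pooling (an assumed choice), and the learned graph attention layer is replaced by unparameterised scaled dot-product attention over each node's neighbours. The function names and node identifiers (`doc`, `s0`, `s0w1`, ...) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Element-wise mean; an assumed pooling choice, not taken from the patent."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_document_graph(
    doc: List[List[Vector]],
) -> Tuple[Dict[str, Vector], Dict[str, List[str]]]:
    """Build initial node vectors and adjacency lists for a document.

    `doc` holds one non-empty list of word vectors per sentence
    (a stand-in for the semantic encoder's output).
    """
    nodes: Dict[str, Vector] = {}
    edges: Dict[str, List[str]] = {"doc": []}
    for s, words in enumerate(doc):
        sent_id = f"s{s}"
        edges["doc"].append(sent_id)        # document node <-> sentence node
        edges[sent_id] = ["doc"]
        for w, vec in enumerate(words):
            word_id = f"s{s}w{w}"
            nodes[word_id] = vec            # word node starts from its semantic vector
            edges[sent_id].append(word_id)  # sentence node <-> word node
            edges[word_id] = [sent_id]
        nodes[sent_id] = mean_pool(words)   # sentence node from its words
    nodes["doc"] = mean_pool([nodes[f"s{s}"] for s in range(len(doc))])
    return nodes, edges


def attention_step(
    nodes: Dict[str, Vector], edges: Dict[str, List[str]]
) -> Dict[str, Vector]:
    """One round of message propagation: each node attends to itself and its neighbours."""
    updated: Dict[str, Vector] = {}
    for node_id, query in nodes.items():
        neighbours = [node_id] + edges[node_id]
        scale = math.sqrt(len(query))
        scores = [sum(q * k for q, k in zip(query, nodes[n])) / scale
                  for n in neighbours]
        peak = max(scores)                  # subtract the max for a stable softmax
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        updated[node_id] = [
            sum(wt / total * nodes[n][i] for wt, n in zip(weights, neighbours))
            for i in range(len(query))
        ]
    return updated


doc = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]  # two sentences, three words
nodes, edges = build_document_graph(doc)
graph_repr = attention_step(nodes, edges)
print(edges["s0"])  # → ['doc', 's0w0', 's0w1']
```

Because every word node reaches every other word node through its sentence node and the document node, repeating `attention_step` lets information flow across sentence boundaries, which is the property the document-level conversion relies on.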
According to the method for converting spoken language into written language based on a graph attention network provided by the invention, performing, based on the attention mechanism, message propagation on the initial representation of each node in the document structure graph to obtain the structure graph representation of the document structure graph comprises: performing, based on the attention mechanism, message propagation on the initial representations of nodes at the same level in the document structure graph to obtain a hierarchical representation of each node in the document structure graph; and performing, based on the attention mechanism, message propagation on the hierarchical representation of each node in the document structure graph to obtain the structure graph representation of the document structure graph.
According to the method for converting spoken language into written language based on a graph attention network provided by the invention, performing, based on the attention mechanism, message propagation on the initial representations of nodes at the same level in the document structure graph to obtain the hierarchical representation of each node in the document structure graph comprises: constructing a word-level fully connected graph from the word segmentation nodes in the document structure graph; constructing a sentence-level fully connected graph from the sentence nodes in the document structure graph; performing, based on the attention mechanism, message propagation on the initial representation of each word segmentation node in the word-level fully connected graph to obtain the hierarchical representation of each word segmentation node; and performing, based on the attention mechanism, message propagation on the initial representation of each sentence node in the sentence-level fully connected graph to obtain the hierarchical representation of each sentence node. According to the me