CN-121257548-B - Semantic analysis and recognition method based on artificial intelligence


Abstract

The invention discloses a semantic analysis and recognition method based on artificial intelligence, relating in particular to the field of semantic analysis. It aims to solve the problem that, in existing cross-language text processing, cultural load words are difficult to recognize and disambiguate accurately, which causes frequent semantic offset and inconsistent translation results. The method comprises: performing text structure decomposition and cultural metaphor analysis on a cross-language comparison corpus and identifying cultural load words with culture-specific meanings; obtaining semantic vector sets of the cultural load words under a source cultural context and a target cultural context; calculating the semantic escape distance between the two contexts to generate escape paths; and training a context disambiguation model by combining the escape paths with the contexts. When a text to be analyzed is input, the model outputs the final semantic probability distribution of each cultural load word in the target culture, and semantic mapping prompts are generated during cross-language semantic conversion based on that distribution, thereby achieving accurate semantic analysis and prompting for cultural load words.

Inventors

Xing zhuang
HUANG ZIXUAN

Assignees

西安大麦网络科技有限公司

Dates

Publication Date: 20260505
Application Date: 20250930

Claims (6)

1. A semantic analysis and recognition method based on artificial intelligence, characterized by comprising the following steps: S1, performing text structure decomposition and cultural metaphor analysis on a cross-language comparison corpus, and identifying cultural load words with culture-specific meanings in the comparison corpus; S2, acquiring a first semantic vector set of the cultural load word under a source cultural context and a second semantic vector set of the cultural load word under a target cultural context; S3, calculating a cross-culture semantic difference index between the first semantic vector set and the second semantic vector set, and generating a load word escape path from the source culture to the target culture; S4, based on the load word escape path, performing context disambiguation model training in combination with the contexts in the cross-language comparison corpus; S5, inputting the context of the cultural load word of the source language text to be analyzed and the corresponding escape path into the context disambiguation model, and outputting a final semantic probability distribution of the cultural load word in the target culture; S6, performing semantic mapping prompting during cross-language semantic conversion based on the final semantic probability distribution; wherein in step S3, calculating the cross-culture semantic difference index between the first semantic vector set and the second semantic vector set and generating the load word escape path from the source culture to the target culture specifically includes: mapping the first semantic vector set and the second semantic vector set to a unified semantic vector space; performing similarity measurement on the pair of semantic vector sets of each cultural load word in the unified vector space, recording the similarity values, and storing them in an escape comparison result table; performing weighted statistics on the similarity values recorded in the escape comparison result table based on the confidence level of the corpus sources to generate an overall cross-culture semantic difference index; outputting an escape path of the cultural load words from the source culture to the target culture according to the semantic difference index, and establishing an escape path table that records the correspondence between the first semantic vector set of each cultural load word and its escape path; the escape path comprises the direction and magnitude of change from the source culture semantic vector to the target culture semantic vector; wherein in step S4, performing context disambiguation model training based on the load word escape path in combination with the contexts in the cross-language comparison corpus specifically includes: constructing a training sample set for the context disambiguation model, wherein each training sample comprises a context segment and an escape path of a cultural load word in the source cultural corpus and a semantic annotation of the load word in the target language; establishing a two-channel neural network architecture, wherein a first channel uses an attention mechanism to encode the source language context, and a second channel performs feature extraction on the escape path; designing a cross-channel feature fusion layer that fuses the context coding features and the escape path features to generate comprehensive semantic features; mapping the comprehensive semantic features to the semantic space corresponding to the target cultural corpus through a fully connected layer, with an output layer outputting the probability distribution over the corresponding semantic annotations; and using the semantic annotation of the load word in the target language as a supervision signal to perform supervised learning on the model and iteratively update the model parameters.
2. The semantic analysis and recognition method based on artificial intelligence according to claim 1, wherein in S1, performing text structure decomposition and cultural metaphor analysis on the cross-language comparison corpus and identifying cultural load words with culture-specific meanings in the comparison corpus specifically comprises: performing word segmentation and part-of-speech tagging on the cross-language comparison corpus, dividing the text into vocabulary units, and recording their part-of-speech categories; invoking a preset cultural load word dictionary, comparing the vocabulary units with the load word entries recorded in the dictionary, and marking the vocabulary units that match an entry as cultural load words; for each cultural load word in the cross-language comparison corpus, extracting a context window of the sentence in which it occurs, and analyzing the function of the cultural load word in that context by combining syntactic dependency relations and semantic role labeling, wherein the function comprises symbolic noun, behavioral phrase, and fixed semantic expression; and recording the context fragments of the cultural load words in the cross-language comparison corpus text.
3. The semantic analysis and recognition method based on artificial intelligence according to claim 2, wherein the cultural load word dictionary is established as follows: based on the language category of the comparison corpus, collecting words and phrases containing cultural metaphors, symbolic usages, and fixed expressions from a cultural corpus corresponding to the language category; the cultural corpus is divided into a source cultural corpus and a target cultural corpus, wherein the source cultural corpus comprises historical documents, literary works, media reports, and social texts; performing manual screening and semantic annotation on the collected words and phrases, and establishing cultural load word entries according to language category, part-of-speech category, and semantic annotation; and storing the cultural load word entries as structured data to form the cultural load word dictionary used for identifying cultural load words.
4. The semantic analysis and recognition method based on artificial intelligence according to claim 1, wherein in S2, acquiring the first semantic vector set of the cultural load word under the source cultural context and the second semantic vector set under the target cultural context specifically comprises: reading the context fragments of the cultural load words, and inputting the fragments respectively into a semantic coding model trained on the source cultural corpus and a semantic coding model trained on the target cultural corpus; based on the outputs of the semantic coding models, obtaining the semantic vector representations of the cultural load words under the source cultural context and merging them into the first semantic vector set, and obtaining the semantic vector representations under the target cultural context and merging them into the second semantic vector set; during merging, a correspondence is established between each of the first and second semantic vector sets and the corresponding cultural load word.
5. The semantic analysis and recognition method based on artificial intelligence according to claim 1, wherein in S5, inputting the context of the cultural load word of the source language text to be analyzed and the corresponding escape path into the context disambiguation model and outputting the final semantic probability distribution of the cultural load word in the target culture specifically includes: preprocessing the text to be analyzed, and extracting the cultural load words and their corresponding context fragments from the text to be analyzed; acquiring semantic vector representations, under the source cultural context, of the context fragments of the cultural load words in the text to be analyzed, and loading the corresponding escape paths from the escape path table according to the semantic vector representations; and inputting the context fragments of the cultural load words of the text to be analyzed and the corresponding escape paths into the trained context disambiguation model, and outputting the final semantic probability distribution of the cultural load words in the target culture.
6. The semantic analysis and recognition method based on artificial intelligence according to claim 1, wherein in S6, performing semantic mapping prompting during cross-language semantic conversion based on the final semantic probability distribution specifically includes: detecting the semantic annotation with the highest probability in the final semantic probability distribution, and judging that a semantic offset exists when that semantic annotation does not belong to the literal meaning of the cultural load word; and when a semantic offset is judged to exist, generating semantic mapping prompt information, wherein the prompt information comprises the position index of the cultural load word in the text, the semantic annotation under the target cultural context, and the corresponding probability value.
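
The S3 computation recited in claim 1 (similarity measurement, confidence-weighted difference index, and an escape path with a direction and a magnitude) can be pictured with a minimal sketch. The claims do not name a similarity measure or a weighting formula, so everything concrete here is an assumption made for illustration: cosine similarity, centroids as the summary of each vector set, an index defined as one minus the confidence-weighted mean similarity, and the function names `cosine`, `centroid`, `escape_path`, and `difference_index`. The vectors are assumed to be already mapped into the unified semantic vector space.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Component-wise mean; an assumed way to summarize a semantic vector set."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def escape_path(source_set, target_set):
    """Direction and magnitude of change from the source to the target centroid."""
    src, tgt = centroid(source_set), centroid(target_set)
    delta = [t - s for s, t in zip(src, tgt)]
    magnitude = math.sqrt(sum(d * d for d in delta))
    direction = [d / magnitude for d in delta] if magnitude else [0.0] * len(delta)
    return {"similarity": cosine(src, tgt), "direction": direction, "magnitude": magnitude}

def difference_index(table):
    """table holds (similarity, source_confidence) rows; higher index = larger gap."""
    total = sum(conf for _, conf in table)
    if total <= 0:
        raise ValueError("confidence weights must sum to a positive value")
    return 1.0 - sum(sim * conf for sim, conf in table) / total

# Toy data: one load word whose meaning differs completely between cultures,
# plus a second row standing in for a word that keeps its meaning.
source_set = [[1.0, 0.0], [1.0, 0.0]]
target_set = [[0.0, 1.0], [0.0, 1.0]]
path = escape_path(source_set, target_set)
comparison_table = [(path["similarity"], 0.9), (1.0, 0.1)]
index = difference_index(comparison_table)
```

In this toy setup the low-similarity row comes from the high-confidence source, so it dominates the index; a real system would populate the table from the escape comparison results for every load word.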

Description

Semantic analysis and recognition method based on artificial intelligence

Technical Field

The invention relates to the technical field of semantic analysis, in particular to a semantic analysis and recognition method based on artificial intelligence.

Background

In cross-language information processing and intelligent translation, as global communication deepens, semantic differences between texts from different cultural backgrounds become increasingly apparent. In particular, words with strong cultural dependence are highly prone to ambiguity and deviation in cross-cultural transmission. Such words often carry specific social history, regional customs, or group metaphors, and are called cultural load words. In news reports, text translation, cross-border business contracts, and social media content understanding, translating or parsing cultural load words by literal meaning alone often results in misreading or even communication risk. For example, "dragon" carries different symbolic meanings in the Chinese context and in the English context, and points to different semantics in everyday and specialized contexts. Existing cross-language processing methods usually focus on word alignment or statistical probability and have difficulty accurately capturing the ambiguity and semantic offset of cultural load words between the source cultural context and the target cultural context, which leads to improper semantic conversion, information mistransmission, or cultural conflict in cross-cultural communication.
Especially in application scenarios that require high accuracy, such as multi-language version comparison of legal documents, bilingual analysis of international business negotiation materials, cross-cultural public opinion analysis, and intelligent customer service response systems, stable and reliable cross-cultural semantic conversion results cannot be provided if the semantic connotation of cultural load words cannot be effectively identified and disambiguated. Therefore, a method is needed that combines the comparison corpus, context information, and cross-culture semantic mapping relations to accurately identify cultural load words, establish an escape path from the source culture to the target culture, and perform context disambiguation through modeling, so as to provide accurate prompts and explanations during cross-language semantic conversion and improve the adaptability and reliability of intelligent translation and semantic analysis in cross-cultural communication.

Disclosure of Invention

In order to overcome the above drawbacks of the prior art, embodiments of the present invention provide a semantic analysis and recognition method based on artificial intelligence to solve the problems set forth in the background above.
In order to achieve the above purpose, the present invention provides the following technical solution: a semantic analysis and recognition method based on artificial intelligence, comprising the following steps:
S1, performing text structure decomposition and cultural metaphor analysis on a cross-language comparison corpus, and identifying cultural load words with culture-specific meanings in the comparison corpus;
S2, acquiring a first semantic vector set of the cultural load word under a source cultural context and a second semantic vector set of the cultural load word under a target cultural context;
S3, calculating a semantic escape distance between the first semantic vector set and the second semantic vector set, and generating a load word escape path from the source culture to the target culture;
S4, based on the load word escape path, performing context disambiguation model training in combination with the contexts in the cross-language comparison corpus;
S5, inputting the context of the cultural load word of the source language text to be analyzed and the corresponding escape path into the context disambiguation model, and outputting a final semantic probability distribution of the cultural load word in the target culture;
S6, performing semantic mapping prompting during cross-language semantic conversion based on the final semantic probability distribution.
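
Steps S5 and S6 end with a probability distribution over semantic annotations and a prompt derived from it. The sketch below illustrates only that last decision, following the rule stated in claim 6: take the most probable annotation and raise a prompt when it is not one of the literal senses. The function name `semantic_mapping_hint`, the dictionary layout, and the example labels and probabilities are hypothetical; the distribution would in practice come from the trained context disambiguation model.

```python
def semantic_mapping_hint(position, distribution, literal_senses):
    """Return a prompt dict when the top annotation is not a literal sense, else None.

    position       -- index of the cultural load word in the text
    distribution   -- mapping of semantic annotation -> probability
    literal_senses -- set of annotations treated as the word's literal meaning
    """
    top_label, top_prob = max(distribution.items(), key=lambda item: item[1])
    if top_label in literal_senses:
        return None  # no semantic offset detected, so no prompt is needed
    return {"position": position, "annotation": top_label, "probability": top_prob}

# Hypothetical distribution for a load word found at token index 3.
distribution = {"auspicious power": 0.7, "mythical reptile": 0.2, "menace": 0.1}
hint = semantic_mapping_hint(3, distribution, literal_senses={"mythical reptile"})
```

Returning `None` when the top annotation is literal keeps the prompt channel quiet for ordinary words, so downstream translation tooling only sees entries where a semantic offset was judged to exist.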
In a preferred embodiment, in S1, the text structure decomposition and the cultural metaphor parsing are performed on the cross-language comparison corpus, and the identifying cultural load words with cultural specific meanings in the comparison corpus specifically includes: respectively executing word segmentation and part-of-speech tagging on cross-language comparison corpus, dividing a text into vocabulary units and recording part-of-speech categories; invoking a preset cultural load word dictionary, comparing the vocabulary units with load word items recorded in the dictionary, and marking the vocabulary units matched with the items as cultural load words; In cross-language compa