CN-121980331-A - Cross-border trade question-answering method, system, equipment and storage medium

Abstract

The application discloses a cross-border trade question-answering method, system, device and storage medium. The method comprises: determining a first feature based on text data, a second feature based on image data, a third feature based on voice data, and a fourth feature based on video data; performing feature fusion on the first feature, the second feature, the third feature and the fourth feature through a graph attention mechanism model to obtain a fused intention feature; inputting the fused intention feature into a trained multi-label ViT-CNN hybrid model to obtain an intention classification result output by the trained multi-label ViT-CNN hybrid model; determining answer data for a target question through a proximal policy optimization (PPO) algorithm based on the intention classification result and a preset resource database; and sending the answer data to a user.

Inventors

WANG HAO
LI YUHUI
CHEN SHIHANG

Assignees

Guangxi Communication Planning and Design Consulting Co., Ltd. (广西通信规划设计咨询有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-23

Claims (10)

1. A cross-border trade question-answering method, characterized in that the method comprises: acquiring text data, image data, voice data and video data of a target question; determining a first feature based on the text data, determining a second feature based on the image data, determining a third feature based on the voice data, and determining a fourth feature based on the video data; performing feature fusion through a graph attention mechanism model based on the first feature, the second feature, the third feature and the fourth feature to obtain a fused intention feature; inputting the fused intention feature into a trained multi-label ViT-CNN hybrid model to obtain an intention classification result output by the trained multi-label ViT-CNN hybrid model; and determining answer data for the target question through a proximal policy optimization (PPO) algorithm based on the intention classification result and a preset resource database, and sending the answer data to a user.
2. The cross-border trade question-answering method according to claim 1, wherein the determining a first feature based on the text data, determining a second feature based on the image data, determining a third feature based on the voice data, and determining a fourth feature based on the video data comprises: performing text cleaning on the text data to obtain cleaned text data; inputting the cleaned text data into a trained BERT-large model to obtain the first feature output by the trained BERT-large model; performing image enhancement on the image data to obtain enhanced image data; inputting the enhanced image data into a trained ResNet-50 model to obtain the second feature output by the trained ResNet-50 model; performing noise reduction on the voice data to obtain noise-reduced voice data; extracting features from the noise-reduced voice data through Mel-frequency cepstral coefficients (MFCC) and a bidirectional long short-term memory (BiLSTM) network model to obtain the third feature; and determining the fourth feature based on the video data through an inflated three-dimensional convolutional network (I3D) model and a temporal self-attention mechanism network model.
3. The cross-border trade question-answering method according to claim 2, wherein the determining the fourth feature based on the video data through the inflated three-dimensional convolutional network (I3D) model and the temporal self-attention mechanism network model comprises: acquiring the total duration of the video data; determining an initial sampling interval based on the total duration of the video data by a formula relating the initial sampling interval to the total duration of the video data (the formula and its symbols are not reproduced in this text); sampling the video data according to the initial sampling interval to obtain sampled video data; extracting image features of the sampled video data through a third-generation lightweight convolutional neural network model, and calculating the cosine similarity between all adjacent frames based on the image features of each frame; calculating motion vector fields between all adjacent frames in the sampled video data through the Farneback optical flow algorithm; extracting short-term temporal features of the sampled video data through the inflated three-dimensional convolutional network model, and obtaining a temporal feature sequence of the short-term temporal features through a 16-frame sliding window; calculating temporal consistency scores between all adjacent frames in the temporal feature sequence; screening video key frames from the sampled video data based on the cosine similarity, the motion vector fields and the temporal consistency scores; and inputting the first video feature into the temporal self-attention mechanism network model to obtain the fourth feature output by the temporal self-attention mechanism network model.
4. The cross-border trade question-answering method according to claim 3, wherein the performing feature fusion through a graph attention mechanism model based on the first feature, the second feature, the third feature and the fourth feature to obtain a fused intention feature comprises: determining a first weight value of the first feature, a second weight value of the second feature, a third weight value of the third feature, and a fourth weight value of the fourth feature; and performing weighted concatenation of the first feature, the second feature, the third feature and the fourth feature based on the first weight value, the second weight value, the third weight value and the fourth weight value to obtain the fused intention feature.
5. The cross-border trade question-answering method according to claim 4, wherein the determining a first weight value of the first feature, a second weight value of the second feature, a third weight value of the third feature, and a fourth weight value of the fourth feature comprises: constructing a historical training data set, wherein the historical training data set comprises, for each item of historical training data, a manually annotated historical intention label, a modality quality score for each modality type, and historical text data, historical image data, historical voice data and/or historical video data; the modality types comprise a text modality, an image modality, a voice modality and a video modality; and the modality quality score is a constant value representing the data quality of each modality type of each item of historical training data; and constructing an initial graph attention mechanism network, and training the initial graph attention mechanism network with the historical training data set to obtain a trained graph attention mechanism network and the first weight value, the second weight value, the third weight value and the fourth weight value output by the trained graph attention mechanism network.
6. The cross-border trade question-answering method according to claim 4, wherein the preset resource database comprises an image recognition library, a policy knowledge base, a video analysis tool and a financial institution model, and wherein the determining answer data for the target question through a proximal policy optimization (PPO) algorithm based on the intention classification result and the preset resource database comprises: acquiring the resource load and the historical matching success rate of the preset resource database; constructing a reward function, wherein the reward function comprises a resource matching accuracy, a resource database response speed and a modality adaptation degree, the resource matching accuracy is a confidence value recorded in a history log, and the modality adaptation degree is obtained by looking up a preset matching table based on the modality type; and taking the intention classification result, the resource load and the historical matching success rate as input data, and calling the preset resource database through the PPO algorithm based on the reward function to obtain the answer data for the target question.
7. The cross-border trade question-answering method according to claim 4, wherein the answer data comprises text-modality answer data, voice-modality answer data and video-modality answer data, and the sending the answer data to a user comprises: obtaining the user type of the user, wherein the user type is either a new user or an existing user; in the case that the user type is an existing user, sending the text-modality answer data to the user; and in the case that the user type is a new user, sending the text-modality answer data, the voice-modality answer data and the video-modality answer data to the user.
8. A cross-border trade question-answering system, comprising: a data acquisition module, configured to acquire text data, image data, voice data and video data of a target question; a feature determination module, configured to determine a first feature based on the text data, a second feature based on the image data, a third feature based on the voice data, and a fourth feature based on the video data; a feature fusion module, configured to perform feature fusion through a graph attention mechanism model based on the first feature, the second feature, the third feature and the fourth feature to obtain a fused intention feature; a model output module, configured to input the fused intention feature into a trained multi-label ViT-CNN hybrid model to obtain an intention classification result output by the trained multi-label ViT-CNN hybrid model; and an answer data determination module, configured to determine answer data for the target question through a proximal policy optimization (PPO) algorithm based on the intention classification result and a preset resource database, and to send the answer data to a user.
9. A cross-border trade question-answering device, comprising at least one control processor and a memory communicatively coupled to the at least one control processor, the memory storing instructions executable by the at least one control processor to enable the at least one control processor to perform the cross-border trade question-answering method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer-executable instructions for causing a computer to perform the cross-border trade question-answering method according to any one of claims 1 to 7.
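
As an illustration of the adjacent-frame cosine similarity recited in claim 3, the following Python sketch computes the similarity between each pair of neighbouring frame feature vectors. The function names, the toy feature vectors and the `threshold` screening rule are assumptions made for illustration only; the claims do not state how the cosine similarity, the motion vector fields and the temporal consistency scores are combined when key frames are screened.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed convention: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def adjacent_similarities(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Similarity between frame i and frame i + 1 for every adjacent pair."""
    return [
        cosine_similarity(frame_features[i], frame_features[i + 1])
        for i in range(len(frame_features) - 1)
    ]


def candidate_key_frames(
    frame_features: Sequence[Sequence[float]], threshold: float = 0.8
) -> List[int]:
    """Hypothetical screening rule: keep the first frame, plus every frame
    whose similarity to its predecessor falls below `threshold`."""
    if not frame_features:
        return []
    sims = adjacent_similarities(frame_features)
    return [0] + [i + 1 for i, s in enumerate(sims) if s < threshold]


# Toy per-frame features (in the application these would come from the
# lightweight convolutional neural network model).
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
similarities = adjacent_similarities(features)
key_frames = candidate_key_frames(features)
```

In this toy input only the transition between the second and third frames changes the feature direction, so that frame is flagged alongside the first frame.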

Description

Cross-border trade question-answering method, system, equipment and storage medium

Technical Field

The application relates to the technical field of cross-border trade question answering, and in particular to a cross-border trade question-answering method, system, device and storage medium.

Background

Driven by both deepening economic globalization and the rapid growth of digital trade, the scale of cross-border trade keeps expanding, and trading entities place increasingly stringent demands on the accuracy and efficiency of services. However, conventional cross-border trade service platforms are limited by inherent defects in their technical architecture and algorithm system and cannot deeply mine the intrinsic relations among data of different modalities. As a result, data processing efficiency is low, information loss is severe, and question-answering efficiency and accuracy are low.

Disclosure of Invention

The present application aims to solve at least one of the technical problems existing in the prior art. To this end, the application provides a cross-border trade question-answering method, system, device and storage medium, which can realize deep intention analysis of multi-modal input and improve the efficiency and accuracy of cross-border trade question answering.
The application provides a cross-border trade question-answering method, which comprises the following steps: acquiring text data, image data, voice data and video data of a target question; determining a first feature based on the text data, determining a second feature based on the image data, determining a third feature based on the voice data, and determining a fourth feature based on the video data; performing feature fusion through a graph attention mechanism model based on the first feature, the second feature, the third feature and the fourth feature to obtain a fused intention feature; inputting the fused intention feature into a trained multi-label ViT-CNN hybrid model to obtain an intention classification result output by the trained multi-label ViT-CNN hybrid model; and determining answer data for the target question through a proximal policy optimization (PPO) algorithm based on the intention classification result and a preset resource database, and sending the answer data to a user.
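
The fusion step can be pictured with a minimal Python sketch of the weighted concatenation described in claim 4. It is an illustration only: the modality keys, the feature dimensions and the weight values are hypothetical, and in the application the four weight values are output by the trained graph attention mechanism network rather than set by hand.

```python
from typing import Dict, List, Sequence

# Fixed modality order, so the fused vector always has the same layout.
MODALITIES = ("text", "image", "voice", "video")


def weighted_concat(
    features: Dict[str, Sequence[float]], weights: Dict[str, float]
) -> List[float]:
    """Scale each modality feature by its weight, then concatenate them."""
    fused: List[float] = []
    for name in MODALITIES:
        if name not in features or name not in weights:
            raise KeyError(f"missing feature or weight for modality: {name}")
        fused.extend(weights[name] * value for value in features[name])
    return fused


# Toy inputs; real features would come from the per-modality encoders.
features = {
    "text": [1.0, 2.0],
    "image": [3.0],
    "voice": [4.0],
    "video": [5.0, 6.0],
}
# Hypothetical weight values standing in for the trained network's output.
weights = {"text": 0.5, "image": 0.25, "voice": 0.125, "video": 0.125}

fused_intention_feature = weighted_concat(features, weights)
```

The fused vector keeps one slot per input dimension, so its length is the sum of the four feature lengths; a fixed modality order is used here so that a downstream classifier always sees the same layout.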
The cross-border trade question-answering method according to the embodiments of the application has at least the following beneficial effects. Text data, image data, voice data and video data of a target question are acquired; a first feature is determined based on the text data, a second feature based on the image data, a third feature based on the voice data, and a fourth feature based on the video data; and feature fusion is performed through a graph attention mechanism model based on the four features to obtain a fused intention feature. This realizes multi-modal data fusion and provides a more accurate data basis for the subsequent steps. The fused intention feature is then input into a trained multi-label ViT-CNN hybrid model to obtain the intention classification result output by that model, which realizes deep intention analysis of multi-modal input. Finally, answer data for the target question is determined through a proximal policy optimization (PPO) algorithm based on the intention classification result and a preset resource database, and the answer data is sent to a user, which improves the efficiency and accuracy of cross-border trade question answering.
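
To make the notion of a multi-label intention classification result concrete, the sketch below shows one common way of turning per-label scores into a set of predicted intentions: an independent sigmoid per label followed by a threshold. This is an assumption for illustration; the application does not specify the output layer or decision rule of the ViT-CNN hybrid model, and the intention label names and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_intents(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every label whose sigmoid probability reaches the threshold.

    Unlike single-label softmax classification, several labels (or none)
    can be selected for the same question.
    """
    return [label for label, z in logits.items() if sigmoid(z) >= threshold]


# Hypothetical raw scores for three hypothetical intention labels.
logits = {
    "tariff_inquiry": 2.0,
    "logistics_tracking": -1.0,
    "payment_settlement": 0.0,
}
intents = predict_intents(logits)
```

Because each label is decided independently, a single question can carry several intentions at once, which is what distinguishes a multi-label result from a single-class prediction.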
According to some embodiments of the application, the determining a first feature based on the text data, determining a second feature based on the image data, determining a third feature based on the voice data, and determining a fourth feature based on the video data comprises: performing text cleaning on the text data to obtain cleaned text data; inputting the cleaned text data into a trained BERT-large model to obtain the first feature output by the trained BERT-large model; performing image enhancement on the image data to obtain enhanced image data; inputting the enhanced image data into a trained ResNet-50 model to obtain the second feature output by the trained ResNet-50 model; performing noise reduction on the voice data to obtain noise-reduced voice data; extracting features from the noise-reduced voice data through Mel-frequency cepstral coefficients (MFCC) and a bidirectional long short-term memory (BiLSTM) network model to obtain the third feature; and determining the fourth feature based on the video data through an inflated three-dimensional convolutional network (I3D) model and a temporal self-attention mechanism network model.
According to some embodiments of the application, the determining the fourth feature through an inflated three-dimensional convolutional network (I3D) model and a temporal self-