CN-121982716-A - Data processing method, device, computer equipment and medium based on artificial intelligence


Abstract

The application belongs to the technical field of artificial intelligence and relates to a data processing method based on artificial intelligence. The method comprises: performing attention calculation on input image data and initial text data based on a target large-scale multi-modal model to obtain an original attention weight matrix; dividing the original attention weight matrix to obtain a first attention weight set belonging to a preset convergence point mark and a second attention weight set belonging to an effective mark; setting the first attention weight set to zero in the original attention weight matrix and performing attention redistribution on the second attention weight set to obtain a target attention weight matrix; performing enhancement calculation on the target attention weight matrix and the original visual mark embedding to obtain a target visual embedding; and performing reasoning on the target visual embedding based on a decoder to generate and output text data. The application can be applied to image processing scenarios in the financial technology and digital medical fields, and improves the accuracy of generated answers.

Inventors

WANG JIANZONG
ZHANG XULONG
SHI JIAQI

Assignees

平安科技（深圳）有限公司

Dates

Publication Date: 20260505
Application Date: 20260119

Claims (10)

1. A data processing method based on artificial intelligence, comprising the steps of: acquiring input image data and initial text data corresponding to the image data; performing attention calculation processing on the image data and the initial text data based on a preset target large-scale multi-modal model to obtain a corresponding original attention weight matrix; dividing the original attention weight matrix to obtain a first attention weight set belonging to a preset convergence point mark and a second attention weight set belonging to an effective mark; setting the first attention weight set to zero in the original attention weight matrix, and performing attention redistribution processing on the second attention weight set to obtain a processed target attention weight matrix; performing enhancement calculation processing on the target attention weight matrix and a preset original visual mark embedding to obtain an enhanced target visual embedding; performing reasoning processing on the target visual embedding and the initial text data based on a decoder in the target large-scale multi-modal model to generate corresponding text data; and outputting the text data.
2. The artificial intelligence based data processing method according to claim 1, wherein the target large-scale multi-modal model includes a visual encoder and a text encoder, and the step of performing attention calculation processing on the image data and the initial text data based on the preset target large-scale multi-modal model to obtain a corresponding original attention weight matrix specifically comprises: processing the image data based on the visual encoder to obtain a corresponding visual mark; processing the initial text data based on the text encoder to obtain a corresponding text mark; performing calculation on the visual mark and the text mark based on a preset cross attention mechanism to obtain a corresponding first calculation result; and taking the first calculation result as the original attention weight matrix.
3. The artificial intelligence based data processing method according to claim 1, wherein the step of performing attention redistribution processing on the second attention weight set specifically comprises: acquiring the sum of all weight values in the second attention weight set; invoking a preset renormalization strategy; and based on the renormalization strategy, renormalizing the second attention weight set by using the sum to complete the attention redistribution processing of the second attention weight set.
4. The artificial intelligence based data processing method according to claim 1, wherein the step of performing enhancement calculation processing on the target attention weight matrix and a preset original visual mark embedding to obtain an enhanced target visual embedding specifically comprises: acquiring an original visual mark embedding corresponding to the image data; acquiring a preset enhancement calculation strategy; performing calculation processing on the target attention weight matrix and the original visual mark embedding based on the enhancement calculation strategy to obtain a corresponding second calculation result; and taking the second calculation result as the target visual embedding.
5. The artificial intelligence based data processing method according to claim 1, wherein the step of performing reasoning processing on the target visual embedding and the initial text data based on a decoder in the target large-scale multi-modal model to generate corresponding text data specifically comprises: acquiring a preset decoding optimization strategy; optimizing the decoding processing of the decoder based on the decoding optimization strategy to obtain a corresponding target decoder; performing reasoning processing on the target visual embedding and the initial text data based on the target decoder to generate a corresponding reasoning result; and taking the reasoning result as the text data.
6. The artificial intelligence based data processing method according to claim 1, wherein the step of performing output processing on the text data specifically comprises: optimizing the text data based on a preset content optimizing strategy to obtain optimized target text data; acquiring a preset text output mode; and carrying out output processing on the target text data based on the text output mode.
7. The artificial intelligence based data processing method according to claim 6, wherein the step of optimizing the text data based on a preset content optimizing strategy to obtain optimized target text data specifically comprises: performing visual information completion processing on the text data to obtain corresponding first text data; performing logic consistency optimization processing on the first text data to obtain corresponding second text data; performing semantic richness optimization processing on the second text data to obtain corresponding third text data; and taking the third text data as the target text data.
8. An artificial intelligence based data processing apparatus comprising: an acquisition module, used for acquiring input image data and initial text data corresponding to the image data; a first calculation module, used for carrying out attention calculation processing on the image data and the initial text data based on a preset target large-scale multi-modal model to obtain a corresponding original attention weight matrix; a dividing module, used for dividing the original attention weight matrix to obtain a first attention weight set belonging to a preset convergence point mark and a second attention weight set belonging to an effective mark; a processing module, used for setting the first attention weight set to zero in the original attention weight matrix, and carrying out attention redistribution processing on the second attention weight set to obtain a processed target attention weight matrix; a second calculation module, used for carrying out enhancement calculation processing on the target attention weight matrix and the preset original visual mark embedding to obtain an enhanced target visual embedding; a reasoning module, used for carrying out reasoning processing on the target visual embedding and the initial text data based on a decoder in the target large-scale multi-modal model to generate corresponding text data; and an output module, used for carrying out output processing on the text data.
9. A computer device comprising a memory having stored therein computer readable instructions which when executed implement the steps of the artificial intelligence based data processing method of any of claims 1 to 7.
10. A computer readable storage medium, characterized in that it has stored thereon computer readable instructions which, when executed by a processor, implement the steps of the artificial intelligence based data processing method according to any of claims 1 to 7.
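
The zeroing, redistribution and enhancement steps recited in claims 1, 3 and 4 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The claims leave the renormalization strategy and the enhancement calculation strategy unspecified, so the sketch assumes division by the sum of the remaining weights and an attention-weighted sum of the visual mark embeddings, respectively. It also assumes that "mark" is read as "token", that the column indices of the convergence point marks are already known, and that the function names `redistribute_attention` and `enhance_visual_embedding` are hypothetical.

```python
from typing import List, Set

Matrix = List[List[float]]


def redistribute_attention(weights: Matrix, sink_indices: Set[int]) -> Matrix:
    """Zero the columns of convergence point marks, then renormalize each row.

    Assumption: the renormalization strategy divides every remaining weight
    by the sum of the remaining weights in its row.
    """
    result: Matrix = []
    for row in weights:
        kept = [0.0 if j in sink_indices else w for j, w in enumerate(row)]
        total = sum(kept)
        # Leave an all-zero row untouched to avoid dividing by zero.
        result.append([w / total for w in kept] if total > 0 else kept)
    return result


def enhance_visual_embedding(weights: Matrix, embeddings: Matrix) -> Matrix:
    """Assumed enhancement: attention-weighted sum of visual mark embeddings."""
    dim = len(embeddings[0])
    return [
        [sum(w * emb[k] for w, emb in zip(row, embeddings)) for k in range(dim)]
        for row in weights
    ]


# Toy data (hypothetical): one text mark attending to three visual marks,
# where visual mark 0 is treated as a convergence point mark.
original = [[0.6, 0.3, 0.1]]
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

target = redistribute_attention(original, {0})
enhanced = enhance_visual_embedding(target, visual)
print(target)
print(enhanced)
```

In this toy example the weight 0.6 on the convergence point mark is removed and the remaining weights 0.3 and 0.1 are rescaled to roughly 0.75 and 0.25, so the row again sums to 1 before it is combined with the visual mark embeddings.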

Description

Data processing method, device, computer equipment and medium based on artificial intelligence

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a data processing method, a data processing device, computer equipment and a storage medium based on artificial intelligence, which can be applied to fields such as financial technology and digital medical treatment.

Background

With the rapid development of large multi-modal model (LMM) technology, such models show great potential in processing multi-modal data and are widely applied to complex task scenarios such as open question answering and image description generation. However, existing LMMs have a significant drawback in the visual information integration process: the model consistently assigns high attention weights to specific, insignificant visual marks in the image, which results in low visual information extraction efficiency. This inefficient attention distribution mechanism makes it difficult for the model to accurately capture the core semantic information in the image when generating an answer, so the model produces erroneous descriptions that are inconsistent with the image content, and the accuracy of the answer drops significantly. In open question-answering tasks in particular, this problem further limits the model's ability to understand and respond to complex questions and restricts its performance in practical applications.

For example, in an insurance claims audit scenario in the field of financial insurance, an existing LMM may, due to biased attention weight distribution, concentrate high weights on irrelevant background elements in the image (e.g., ground reflections at the scene of an accident) while ignoring critical areas of vehicle damage (e.g., collision dents, a broken bumper). As a result, the model misjudges the damage level or misses key evidence when generating the claims conclusion, which increases the risk of claims disputes and reduces the efficiency and fairness of insurance services.

As another example, in a medical image diagnosis assistance scenario in the medical field, an existing model may pay excessive attention to non-lesion regions in the image (such as normal tissue texture) and thereby weaken the attention distributed to lesion features (such as tumor boundaries and abnormal density shadows). This may lead to model-generated diagnostic advice that does not match the actual condition, delay the patient's treatment, and even pose a risk of misdiagnosis, severely affecting the reliability and safety of medical decisions.

Therefore, an improved attention allocation mechanism for LMMs is needed to optimize visual information integration efficiency and improve the accuracy and robustness of the model in multi-modal tasks, thereby raising the practical value of the model in key fields such as financial insurance and digital medical treatment.

Disclosure of Invention

The embodiments of the application aim to provide a data processing method, a device, computer equipment and a storage medium based on artificial intelligence, so as to solve the technical problem that the inefficient attention distribution mechanism adopted by existing large-scale multi-modal models significantly reduces the accuracy of answers.

In a first aspect, there is provided an artificial intelligence based data processing method, comprising: acquiring input image data and initial text data corresponding to the image data; performing attention calculation processing on the image data and the initial text data based on a preset target large-scale multi-modal model to obtain a corresponding original attention weight matrix; dividing the original attention weight matrix to obtain a first attention weight set belonging to a preset convergence point mark and a second attention weight set belonging to an effective mark; setting the first attention weight set to zero in the original attention weight matrix, and performing attention redistribution processing on the second attention weight set to obtain a processed target attention weight matrix; performing enhancement calculation processing on the target attention weight matrix and a preset original visual mark embedding to obtain an enhanced target visual embedding; performing reasoning processing on the target visual embedding and the initial text data based on a decoder in the target large-scale multi-modal model to generate corresponding text data; and outputting the text data.

In a second aspect, there is provided an artificial intelligence based data processing apparatus comprising: an acquisition module, used for acquiring input image data and initial text data corresponding to the image data; a first calculation module, used for carrying out attention calculation processing on the image data and the initial text data based