CN-121982495-A - Detection method, apparatus, device, medium, and program product


Abstract

The application provides a detection method applicable to the technical field of artificial intelligence. The method comprises: dividing a first picture sample used to train a large model into N regions, where N is an integer greater than or equal to 2; covering some of the N regions with a target watermark to obtain M second picture samples, where different second picture samples are obtained by covering different regions with the target watermark and M is an integer greater than or equal to 2; inputting the M second picture samples into the large model to obtain M recognition results output by the large model, where each recognition result represents the predicted class probabilities of a second picture sample; and determining a poisoned-sample detection result for the large model according to the M recognition results. The application also provides a detection device, a storage medium, and a program product.
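The divide, cover, and recognize steps summarized above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a picture sample is a grayscale image stored as nested lists, that the N regions form an equal rows × cols grid, and that the target watermark is a flat patch of a single pixel value (a real watermark would more likely be a logo or text image). `model` stands in for the large model as any callable returning a list of class probabilities; `divide_into_regions`, `cover`, and `recognize_all` are hypothetical names, not taken from the application. In this sketch each region is covered on its own, so M = N.

```python
from typing import Callable, List, Tuple

# Illustrative sketch only: the nested-list image format, the equal grid, and the
# flat-value watermark are assumptions, not details disclosed in the application.
Image = List[List[float]]        # grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def divide_into_regions(height: int, width: int, rows: int, cols: int) -> List[Box]:
    """Split a height x width image into N = rows * cols equal grid regions."""
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((r * height // rows, c * width // cols,
                          (r + 1) * height // rows, (c + 1) * width // cols))
    return boxes


def cover(image: Image, box: Box, watermark_value: float) -> Image:
    """Return a copy of `image` whose pixels inside `box` are replaced by the watermark."""
    top, left, bottom, right = box
    out = [row[:] for row in image]  # copy so the first picture sample is untouched
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = watermark_value
    return out


def recognize_all(model: Callable[[Image], List[float]], image: Image,
                  boxes: List[Box], watermark_value: float) -> List[List[float]]:
    """Build one second picture sample per box and collect the model's probabilities."""
    return [model(cover(image, box, watermark_value)) for box in boxes]
```

For example, `divide_into_regions(4, 4, 2, 2)` yields four 2 × 2 boxes, so a 4 × 4 image produces M = N = 4 second picture samples, and `recognize_all` returns one probability vector per sample.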

Inventors

AI YI
ZHANG NINGYUE
XIAO YAO
ZHOU YUCAN

Assignees

中国工商银行股份有限公司 (Industrial and Commercial Bank of China Limited)

Dates

Publication Date: 20260505
Application Date: 20250725

Claims (11)

1. A detection method, comprising: dividing a first picture sample used to train a large model into N regions, wherein N is an integer greater than or equal to 2; covering a target watermark over some of the N regions to obtain M second picture samples, wherein different second picture samples are obtained by covering different regions with the target watermark, and M is an integer greater than or equal to 2; inputting the M second picture samples into the large model to obtain M recognition results output by the large model, wherein the recognition results represent the predicted class probabilities of the second picture samples; and determining a poisoned-sample detection result for the large model according to the M recognition results.
2. The method of claim 1, wherein dividing the first picture sample used to train the large model into N regions comprises: identifying at least one target object region in the first picture sample using a pre-trained object detection model; and dividing the at least one target object region into the N regions.
3. The method of claim 2, wherein identifying at least one target object region in the first picture sample using the pre-trained object detection model comprises: dividing the first picture sample into a plurality of grid sets, wherein the grids within each grid set have the same size and the grids in different grid sets have different sizes; and identifying the at least one target object region based on the plurality of grid sets, wherein grids of different sizes are used to identify target object regions of different sizes.
4. The method according to any one of claims 1 to 3, wherein covering the target watermark over some of the N regions to obtain M second picture samples comprises: covering each of the N regions with the target watermark in turn to obtain N second picture samples; and covering each of S combined regions formed from the N regions with the target watermark in turn to obtain S second picture samples, wherein each combined region comprises at least two adjacent regions, S is an integer greater than or equal to 1, and M is the sum of N and S.
5. The method of claim 4, wherein the N regions are N equal regions, and each of the S combined regions is obtained by combining at least two adjacent regions among the N regions.
6. The method according to any one of claims 1 to 3, wherein covering the target watermark over some of the N regions to obtain M second picture samples comprises: acquiring a plurality of target watermarks, wherein the plurality of target watermarks have different watermark contents; and covering the same region with each of the plurality of target watermarks to obtain a plurality of second picture samples.
7. The method of claim 1, wherein determining the poisoned-sample detection result for the large model according to the M recognition results comprises: calculating an information entropy based on the M recognition results; and determining that the first picture sample is not poisoned when the value of the information entropy is greater than or equal to a preset value.
8. A detection device, comprising: a dividing module, configured to divide a first picture sample used to train a large model into N regions, wherein N is an integer greater than or equal to 2; a covering module, configured to cover a target watermark over some of the N regions to obtain M second picture samples, wherein different second picture samples are obtained by covering different regions with the target watermark, and M is an integer greater than or equal to 2; a recognition module, configured to input the M second picture samples into the large model and obtain M recognition results output by the large model, wherein the recognition results represent the predicted class probabilities of the second picture samples; and a determining module, configured to determine a poisoned-sample detection result for the large model according to the M recognition results.
9. An electronic device, comprising: one or more processors; and a memory for storing one or more computer programs, characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program or instructions are stored that, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
11. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
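Claims 4 to 6 enumerate which regions are covered. The sketch below builds such a coverage plan on a rows × cols grid of equal regions, assuming (as one possible reading of "at least two adjacent regions") that each combined region is a pair of horizontally or vertically adjacent cells; larger combinations would also satisfy the claims. Regions are identified by the index r * cols + c, watermarks by a string label, and `coverage_plan` and `with_watermarks` are hypothetical helpers rather than anything named in the application.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Illustrative sketch: combined regions are assumed to be pairs of 4-adjacent cells.
RegionSet = Tuple[int, ...]  # indices of grid cells covered together


def coverage_plan(rows: int, cols: int) -> List[RegionSet]:
    """N single regions, then S combined regions made of two adjacent cells."""
    singles = [(i,) for i in range(rows * cols)]
    pairs: List[RegionSet] = []
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:   # horizontally adjacent neighbour
                pairs.append((i, i + 1))
            if r + 1 < rows:   # vertically adjacent neighbour
                pairs.append((i, i + cols))
    return singles + pairs     # M = N + S entries


def with_watermarks(plan: Sequence[RegionSet],
                    watermarks: Sequence[str]) -> List[Tuple[RegionSet, str]]:
    """Claim 6 variant: cover each region set once with every (hypothetical) watermark."""
    return list(product(plan, watermarks))
```

Under this pairing assumption S = rows(cols − 1) + (rows − 1)cols, so a 3 × 3 grid gives N = 9, S = 12, and M = 21.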

Description

Technical Field

The present application relates to the field of artificial intelligence, in particular to large models and deep learning, and more particularly to a detection method, apparatus, device, medium, and program product.

Background

As large models are widely applied in finance, healthcare, security, and other fields, poisoning attacks against large models have become a major concern in emerging cybersecurity. Existing techniques for countering poisoning attacks on large models typically reverse-engineer a trigger and train a discriminator, or reinforce the original model (non-model detection), and in most cases require a white-box model, that is, access to the model's training gradients. Such white-box detection methods are effective in some scenarios, but their strong dependence on model internals, high computing-resource and time costs, and high misjudgment rate limit their practical application.

Disclosure of Invention

In view of the above, the present application provides a detection method, apparatus, device, medium, and program product that reduce computing-resource and time costs and improve detection efficiency.
According to a first aspect of the application, a detection method is provided, comprising: dividing a first picture sample used to train a large model into N regions, where N is an integer greater than or equal to 2; covering some of the N regions with a target watermark to obtain M second picture samples, where different second picture samples are obtained by covering different regions with the target watermark and M is an integer greater than or equal to 2; inputting the M second picture samples into the large model to obtain M recognition results output by the large model, where the recognition results represent the predicted class probabilities of the second picture samples; and determining a poisoned-sample detection result for the large model according to the M recognition results.
According to an embodiment of the application, dividing the first picture sample used to train the large model into N regions comprises identifying at least one target object region in the first picture sample using a pre-trained object detection model, and dividing the at least one target object region into the N regions.
According to an embodiment of the application, identifying at least one target object region in the first picture sample using the pre-trained object detection model comprises dividing the first picture sample into a plurality of grid sets, where the grids within each grid set have the same size and the grids in different grid sets have different sizes, and identifying the at least one target object region based on the plurality of grid sets, where grids of different sizes are used to identify target object regions of different sizes.
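The application does not name the object detection model. The multi-scale grid sets it describes resemble detectors of the YOLOv3 family, which predict on several grids with different cell sizes and make the cell containing an object's centre responsible for that object. The sketch below assumes that style of design: it builds one grid set per cell size, picks a grid set for a detected box with a hypothetical closest-cell-size rule (real YOLO models assign scales by anchor matching instead), and splits the detected target object region into N = rows × cols equal sub-regions. Box coordinates are (x0, y0, x1, y1) in pixels, and all function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch; the scale-selection rule below is an assumption.
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1/y1 exclusive


def grid_sets(width: int, height: int, cell_sizes: Sequence[int]) -> Dict[int, List[Box]]:
    """One grid set per cell size; all cells inside a set share that size."""
    return {
        s: [(x, y, min(x + s, width), min(y + s, height))
            for y in range(0, height, s) for x in range(0, width, s)]
        for s in cell_sizes
    }


def responsible_scale(box: Box, cell_sizes: Sequence[int]) -> int:
    """Hypothetical rule: use the grid whose cell size is closest to the box's longer side."""
    side = max(box[2] - box[0], box[3] - box[1])
    return min(cell_sizes, key=lambda s: abs(s - side))


def responsible_cell(box: Box, cell_size: int) -> Tuple[int, int]:
    """(column, row) of the cell containing the box centre, as in YOLO-style detectors."""
    cx, cy = (box[0] + box[2]) // 2, (box[1] + box[3]) // 2
    return cx // cell_size, cy // cell_size


def split_box(box: Box, rows: int, cols: int) -> List[Box]:
    """Divide a detected target object region into N = rows * cols equal sub-regions."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return [(x0 + c * w // cols, y0 + r * h // rows,
             x0 + (c + 1) * w // cols, y0 + (r + 1) * h // rows)
            for r in range(rows) for c in range(cols)]
```

Restricting the split to the detected box means the watermark only ever covers parts of the object, which is where the application expects the relevant evidence to lie.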
According to an embodiment of the application, covering some of the N regions with the target watermark to obtain M second picture samples comprises covering each of the N regions with the target watermark in turn to obtain N second picture samples, and covering each of S combined regions formed from the N regions with the target watermark in turn to obtain S second picture samples, where each combined region comprises at least two adjacent regions, S is an integer greater than or equal to 1, and M is the sum of N and S.
According to an embodiment of the application, the N regions are N equal regions, and each of the S combined regions is obtained by combining at least two adjacent regions among the N regions.
According to an embodiment of the application, covering some of the N regions with the target watermark to obtain M second picture samples comprises acquiring a plurality of target watermarks with different watermark contents, and covering the same region with each of the plurality of target watermarks to obtain a plurality of second picture samples.
According to an embodiment of the application, determining the poisoned-sample detection result for the large model according to the M recognition results comprises calculating an information entropy based on the M recognition results, and determining that the first picture sample is not poisoned when the value of the information entropy is greater than or equal to a preset value.
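The application does not specify how the information entropy is computed from the M recognition results. One assumption, modelled on the STRIP backdoor-detection approach, is to average the Shannon entropy H(p) = −Σ_k p_k log2 p_k of the M predicted probability vectors; another is to take the entropy of the distribution of the M predicted labels. Under either reading, a sample whose prediction stays confidently fixed no matter which region is covered yields low entropy. The claimed rule only states that entropy at or above the preset value means "not poisoned"; treating lower values as suspicious, the threshold value, and the helper names are all assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence

# Illustrative sketch; both entropy readings and the threshold are assumptions.


def shannon_entropy(probs: Sequence[float]) -> float:
    """H(p) = -sum_k p_k * log2(p_k), skipping zero-probability classes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mean_entropy(results: List[Sequence[float]]) -> float:
    """STRIP-style reading: average entropy of the M probability vectors."""
    return sum(shannon_entropy(p) for p in results) / len(results)


def label_entropy(results: List[Sequence[float]]) -> float:
    """Alternative reading: entropy of the M predicted (arg-max) labels."""
    labels = [max(range(len(p)), key=lambda k: p[k]) for p in results]
    m = len(labels)
    return -sum((n / m) * math.log2(n / m) for n in Counter(labels).values())


def is_not_poisoned(results: List[Sequence[float]], threshold: float,
                    by_label: bool = False) -> bool:
    """Claimed rule: entropy >= preset value means the sample is judged not poisoned."""
    h = label_entropy(results) if by_label else mean_entropy(results)
    return h >= threshold
```

For instance, four results of `[0.5, 0.5]` give a mean entropy of 1.0 bit and pass a threshold of 0.5, while four results of `[0.99, 0.01]` give roughly 0.08 bit and do not.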
A second aspect of the application provides a detection device comprising a dividing module, a covering module, a recognition module, and a determining module, wherein the dividing module is configured to divide a first picture sample used to train a large model into N regions, N being an integer greater than or equal to 2, and the covering module is configured to cover a target watermark on partial ar