EP-4736047-A1 - A METHOD OF ASSESSING VULNERABILITY OF AN AI MODEL AND A FRAMEWORK THEREOF
Abstract
This invention discloses a framework (100) for assessing vulnerability of an AI model (M) and a method (200) thereof. The framework (100) comprises a stolen AI Model (S), an XAI module (30) and at least a processor (20). The AI Model (M) is fed with a first set of pre-determined attack vectors to generate a first output by means of the processor (20). The stolen AI Model (S) is initialized by examining the input pre-determined attack vectors and the corresponding first output. The processor (20) is configured to update the stolen AI model (S) to an updated stolen AI model (S1...Sn) by performing multiple iterations of method step (203) using the XAI module (30) for the stolen AI model (S). The processor (20) analyzes responses of the updated stolen AI model (Sn) to random inputs to assess the vulnerability of the AI model (M).
Inventors
- Mankodiya, Harsh
- Parmar, Manojkumar Somabhai
- Kulkarni, Pavan
- Yuvaraj, Govindarajulu
Assignees
- Robert Bosch GmbH
- Bosch Global Software Technologies Private Limited
Dates
- Publication Date: 2026-05-06
- Application Date: 2024-06-18
Claims (6)
- 1. A method (200) of assessing vulnerability of an AI model (M), the method comprising: feeding (201) a first set of pre-determined attack vectors to the AI Model (M) to generate a first output by means of a processor (20); initializing (202) a stolen AI Model (S) by examining the input pre-determined attack vectors and the corresponding first output; updating (203) the stolen AI model (S, S1...Sn) using an XAI module (30) for the stolen AI model (S, S1...Sn); analyzing (204) responses of the updated stolen AI model (Sn) by means of the processor (20) for random inputs to assess vulnerability of the AI model (M).
- 2. The method (200) of assessing vulnerability of an AI model (M) as claimed in claim 1, wherein the updating (203) the stolen AI model further comprises the sub-steps: providing a randomly chosen input from a test dataset to the XAI module (30) for the stolen AI model (S) to get a saliency map (SM); comparing the saliency map (SM) with the random input to identify low-importance and high-importance features; adding perturbations in the low-importance features of the random input to generate a refined attack vector (AV); feeding the refined attack vector (AV) as input to the AI Model (M) to generate a second output by means of the processor (20); updating the stolen AI model (S to S1) by examining the input refined attack vector (AV) and the corresponding second output.
- 3. The method (200) of assessing vulnerability of an AI model (M) as claimed in claim 1, wherein multiple iterations of the sub-steps claimed in claim 2 are performed for different random inputs chosen from the test dataset on the latest update of the stolen AI model (S1) to get the eventual updated stolen AI model (Sn).
- 4. A framework (100) for assessing the vulnerability of an AI Model (M), the framework (100) comprising a stolen AI Model (S), an XAI module (30) in communication with the stolen AI model (S) and at least a processor (20), said processor (20) in communication with the AI model (M), characterized in that the processor (20) is configured to: feed a first set of pre-determined attack vectors to the AI Model (M) to generate a first output; initialize a stolen AI Model (S) by examining the input pre-determined attack vectors and the corresponding first output; update the stolen AI model (S, S1...Sn) using the XAI module (30) for the stolen AI model (S, S1...Sn); analyze responses of the updated stolen AI model (Sn) for random inputs to assess vulnerability of the AI model (M).
- 5. The framework (100) for assessing the vulnerability of an AI Model (M) as claimed in claim 4, wherein the processor (20) is configured to: provide a random input chosen from a test dataset to the XAI module (30) for the stolen AI model (S) to get a saliency map (SM1); compare the saliency map (SM1) with the random input to identify low-importance and high-importance features; add perturbations in the low-importance features of the random input to generate a refined attack vector (AV); feed the refined attack vector (AV) as input to the AI Model (M) to generate a second output; update the stolen AI model (from S to S1) by examining the input refined attack vector and the corresponding second output.
- 6. The framework (100) for assessing the vulnerability of an AI Model (M) as claimed in claim 5, wherein the processor (20) performs multiple iterations of the sub-steps claimed in claim 5 for different random inputs chosen from the test dataset on the latest update of the stolen AI model (S1) to get the eventual updated stolen AI model (Sn).
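The iterative extraction loop of claims 1 to 3 can be sketched as follows. This is a minimal toy illustration, not the claimed implementation: the linear target model, the weight-based saliency stand-in for the XAI module (30), the perceptron-style surrogate update, and all function names (`query_target`, `saliency`, `refine_attack_vector`, `update_stolen`) are hypothetical assumptions, since the specification does not prescribe any particular model family or XAI technique.

```python
import numpy as np

rng = np.random.default_rng(0)

def query_target(x):
    # Placeholder target AI model (M): a fixed black-box linear classifier.
    w = np.array([2.0, -1.0, 0.5, 0.0])
    return int(x @ w > 0)

def saliency(stolen_w, x):
    # Toy stand-in for the XAI module (30): per-feature importance
    # |weight * input| acts as the saliency map (SM).
    return np.abs(stolen_w * x)

def refine_attack_vector(x, sal, eps=0.5):
    # Perturb only the low-importance features of the input (claim 2).
    low = sal <= np.median(sal)
    av = x.copy()
    av[low] += eps * np.sign(rng.standard_normal(low.sum()))
    return av

def update_stolen(stolen_w, x, y, lr=0.1):
    # Perceptron-style update of the stolen model (S -> S1 ... Sn).
    pred = int(x @ stolen_w > 0)
    return stolen_w + lr * (y - pred) * x

stolen_w = np.zeros(4)                   # initial stolen AI model (S)
test_set = rng.normal(size=(50, 4))      # attacker's test dataset

for x in test_set:                       # iterations of method step (203)
    av = refine_attack_vector(x, saliency(stolen_w, x))
    y = query_target(av)                 # second output from M
    stolen_w = update_stolen(stolen_w, av, y)

# Step (204): compare responses of M and the stolen model on random inputs.
probe = rng.normal(size=(200, 4))
agreement = np.mean([query_target(p) == int(p @ stolen_w > 0) for p in probe])
print(f"agreement between M and Sn: {agreement:.2f}")
```

A high agreement score on random probes indicates that the surrogate has absorbed the target's decision boundary, which is precisely the vulnerability the method quantifies; a practical assessment would substitute a real XAI saliency technique and a trainable surrogate network for the toy stand-ins above.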
Description
COMPLETE SPECIFICATION

Title of the Invention: A method of assessing vulnerability of an AI model and a framework thereof

Complete Specification: The following specification describes and ascertains the nature of this invention and the manner in which it is to be performed.

Field of the invention

[0001] The present disclosure relates to the field of Artificial Intelligence security. In particular, it proposes a method of assessing vulnerability of an AI Model and a framework thereof.

Background of the invention

[0002] With the advent of data science, data processing and decision-making systems are implemented using artificial intelligence modules. The artificial intelligence modules use different techniques such as machine learning, neural networks, deep learning, etc. Most AI-based systems receive large amounts of data and process the data to train AI models. Trained AI models generate output based on the use cases requested by the user. Typically, AI systems are used in the fields of computer vision, speech recognition, natural language processing, audio recognition, healthcare, autonomous driving, manufacturing, robotics, etc., where they process data to generate the required output based on certain rules/intelligence acquired through training.

[0003] To process the inputs and give a desired output, the AI systems use various models/algorithms which are trained using the training data. Once the AI system is trained using the training data, it uses the models to analyze the real-time data and generate an appropriate result. The models may be fine-tuned in real time based on the results. The AI models in the AI systems form the core of the system. A lot of effort, resources (tangible and intangible), and knowledge go into developing these models.

[0004] It is possible that some adversary may try to tamper with/manipulate/evade the AI model to create incorrect outputs. The adversary may use different techniques to manipulate the output of the model.
One of the simplest techniques used by the adversary is to send queries to the AI system using his own test data to compute or approximate the gradients through the model. Based on these gradients, the adversary can then manipulate the input in order to manipulate the output of the model. Another technique is where the adversary manipulates the input data to produce an artificial output. This will cause hardships to the original developer of the AI in the form of business disadvantages, loss of confidential information, loss of lead time spent in development, loss of intellectual property, loss of future revenues, etc. Hence there is a need to identify samples in the test data, or to generate samples, that can efficiently extract internal information about the working/architecture of these models, and to assess the vulnerability of the AI system against those sample-based queries.

[0005] There are methods known in the prior art for attacking an AI system. The prior art WO2021/095984 A1 - Apparatus and Method for Retraining Substitute Model for Evasion Attack and Evasion Attack Apparatus discloses one such method. The method describes retraining a substitute model that partially imitates the target model by allowing the target model to misclassify specific attack data. However, in a classifier-type AI Model there is a need to identify adversarial input attack vectors spread across all classes and test the vulnerability of the AI Model against them.

Brief description of the accompanying drawings

[0006] An embodiment of the invention is described with reference to the following accompanying drawings:

[0007] Figure 1 depicts a framework (100) for assessing vulnerability of an AI Model (M);

[0008] Figure 2 depicts an AI system (10);

[0009] Figure 3 illustrates method steps (200) of assessing vulnerability of the AI model (M);

[0010] Figure 4 is a process flow diagram for method step (203).
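The query-based gradient-approximation attack described in paragraph [0004] can be illustrated with a short sketch. The scoring model, the step size, and the function names (`target_score`, `estimate_gradient`) below are hypothetical placeholders, not part of the specification; the point is only that black-box queries alone suffice to estimate gradients and steer the model's output.

```python
import numpy as np

def target_score(x):
    # Placeholder black-box model: the adversary can only query it for
    # an output score, never inspect its weights or true gradients.
    w = np.array([1.0, -2.0, 0.5])
    return float(1.0 / (1.0 + np.exp(-(x @ w))))

def estimate_gradient(f, x, eps=1e-4):
    # Central finite differences: two queries per coordinate approximate
    # df/dx_i using only the model's observable outputs.
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

x = np.array([0.2, -0.1, 0.4])
g = estimate_gradient(target_score, x)

# A small step along the estimated gradient pushes the output upward,
# i.e. the input is manipulated to manipulate the output.
manipulated = x + 0.1 * g
print(target_score(x), "->", target_score(manipulated))
```

Each gradient estimate here costs two queries per input dimension, which is why such attacks are detectable as unusually structured query patterns, and why the present disclosure instead probes the model with refined attack vectors to assess its exposure.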
Detailed description of the drawings

[0011] It is important to understand some aspects of artificial intelligence (AI) technology and artificial intelligence (AI) based systems. Some important aspects of the AI technology and AI systems can be explained as follows. Depending on the architecture of the implementation, AI systems may include many components. One such component is an AI model. A model can be defined as a reference or an inference set of data, which uses different forms of correlation matrices. Using these models and the data from these models, correlations can be established between different types of data to arrive at some logical understanding of the data. A person skilled in the art would be aware of the different types of AI models such as linear regression, naive Bayes classifier, support vector machine, neural networks and the like. It must be understood that this disclosure is not specific to the type of model being executed and can be applied to any AI module irrespective of the AI model being executed.