KR-20260065460-A - METHOD FOR PERFORMING FEDERATED LEARNING CONSIDERING RESPONSIBLE ARTIFICIAL INTELLIGENCE, ELECTRONIC DEVICE SUPPORTING THE SAME, AND RECORDING MEDIUM


Abstract

According to one embodiment, an electronic device comprises a communication circuit, at least one processor including a processing circuit, and a memory storing instructions. The instructions, when executed individually or collectively by the at least one processor, may cause the electronic device to: transmit a first global model to a plurality of external electronic devices through the communication circuit; receive a plurality of first local models from the plurality of external electronic devices through the communication circuit; obtain, based on fine-tuning the plurality of first local models using a red teaming prompt, a plurality of second local models fine-tuned to output an ethical answer in response to the red teaming prompt; and transmit, through the communication circuit, a second global model obtained based on a plurality of parameter sets corresponding to the plurality of second local models to the plurality of external electronic devices.
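
The abstract states only that the second global model is "obtained based on a plurality of parameter sets"; it does not name an aggregation rule. As an illustration, the following minimal sketch assumes an unweighted element-wise mean of the parameter sets (in the style of federated averaging). The `ParameterSet` type and the function name `aggregate_parameter_sets` are hypothetical and do not come from the disclosure.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of weights.
ParameterSet = Dict[str, List[float]]


def aggregate_parameter_sets(parameter_sets: List[ParameterSet]) -> ParameterSet:
    """Combine local parameter sets into one global parameter set.

    Assumption: an unweighted element-wise mean. The disclosure does not
    specify the aggregation rule, so this is only one possible choice.
    """
    if not parameter_sets:
        raise ValueError("at least one parameter set is required")
    aggregated: ParameterSet = {}
    for name in parameter_sets[0]:
        # Group the i-th weight of every local model together, then average.
        columns = zip(*(ps[name] for ps in parameter_sets))
        aggregated[name] = [sum(col) / len(parameter_sets) for col in columns]
    return aggregated
```

In this sketch each local model contributes equally; a deployment could instead weight each parameter set, for example by the amount of user data behind it, which is likewise an assumption rather than something the abstract states.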

Inventors

노은정
박두건

Assignees

삼성전자주식회사

Dates

Publication Date: 20260508
Application Date: 20250120
Priority Date: 20241101

Claims (20)

An electronic device (201) comprising: a communication circuit (210); at least one processor (230) including a processing circuit; and a memory (220) storing instructions, wherein the instructions, when executed individually or collectively by the at least one processor (230), cause the electronic device (201) to: transmit, through the communication circuit (210), a first global model to a plurality of external electronic devices, wherein the first global model includes a parameter set with which the plurality of external electronic devices perform fine-tuning on a generative AI model stored in each of the plurality of external electronic devices; receive, through the communication circuit (210), a plurality of first local models from the plurality of external electronic devices, wherein each first local model of the plurality of first local models includes a parameter set obtained by training the generative AI model corresponding to the first global model using user data obtained by the external electronic device, among the plurality of external electronic devices, that transmits the first local model; obtain, based on performing fine-tuning on the plurality of first local models using a red teaming prompt, a plurality of second local models fine-tuned to output an ethical answer in response to the red teaming prompt, wherein the red teaming prompt includes at least one query that elicits an inappropriate answer; and transmit, through the communication circuit (210), a second global model obtained based on a plurality of parameter sets corresponding to the plurality of second local models to the plurality of external electronic devices.
The electronic device (201) of claim 1, wherein the instructions, when executed individually or collectively by the at least one processor (230), cause the electronic device (201) to: identify, by inputting test data into the plurality of first local models and based on information output from the plurality of first local models, a plurality of unsafe local models, among the plurality of first local models, that output an unethical answer in response to the test data, wherein the test data includes a plurality of queries that require the local models to output an unethical answer; obtain, by inputting the red teaming prompt into the plurality of unsafe local models and based on information output from the plurality of unsafe local models, first data including an unethical response to the red teaming prompt; obtain, by inputting the first data into the plurality of unsafe local models and based on information that the plurality of unsafe local models output according to at least one constitution established for the plurality of unsafe local models, second data including an ethical answer; and obtain, based on performing parameter-efficient fine-tuning (PEFT) on the plurality of unsafe local models using the red teaming prompt and the second data, the plurality of second local models fine-tuned to output an ethical answer in response to the red teaming prompt.
The electronic device (201) of claim 1 or claim 2, wherein the instructions, when executed individually or collectively by the at least one processor (230), cause the electronic device (201) to: obtain, based on the at least one principle and on information output from the plurality of unsafe local models in response to the first data, third data including a critique of the first data; and obtain, based on information that the plurality of unsafe local models output based on the third data, the second data including a revised answer to the first data.
The electronic device (201) of any one of claims 1 to 3, wherein the instructions, when executed individually or collectively by the at least one processor (230), cause the electronic device (201) to: identify, by inputting the test data into the plurality of first local models and based on information output from the plurality of first local models, a plurality of safe local models, among the plurality of first local models, that output an ethical response in response to the test data; and obtain, based on a plurality of parameter sets corresponding to the plurality of safe local models, a second global model that outputs an ethical answer in response to a query requesting an unethical answer.
The electronic device (201) of any one of claims 1 to 4, wherein the instructions, when executed individually or collectively by the at least one processor (230), cause the electronic device (201) to: obtain, by inputting a plurality of queries included in the test data into one first local model of the plurality of first local models and based on information output from the first local model, a plurality of answers corresponding to the plurality of queries; obtain, based on performing safety filtering on the plurality of queries and the plurality of answers, information indicating whether each of the plurality of queries and the plurality of answers is an ethical sentence; obtain, based on the obtained information, a ratio of ethical sentences among the plurality of queries and the plurality of answers; and identify the first local model as an unsafe local model based on the obtained ratio being less than a set threshold ratio.
The electronic device (201) of any one of claims 1 to 5, wherein the instructions, when executed individually or collectively by the at least one processor (230), cause the electronic device (201) to: obtain, by inputting a plurality of queries included in the test data into one second local model of the plurality of second local models and based on information output from the second local model, a plurality of answers corresponding to the plurality of queries; obtain, based on performing safety filtering on the plurality of queries and the plurality of answers, information indicating whether each of the plurality of queries and the plurality of answers is an ethical sentence; obtain, based on the obtained information, a ratio of ethical sentences among the plurality of queries and the plurality of answers; and confirm the second local model as a safe local model based on the obtained ratio exceeding a set threshold ratio.
The electronic device (201) of any one of claims 1 to 6, wherein the instructions, when executed individually or collectively by the at least one processor (230), cause the electronic device (201) to: receive, from an external electronic device, a request to be provided with a global model obtained based on a plurality of parameter sets corresponding to a plurality of local models; and register, based on the request, identification information of the external electronic device in a group of client devices for federated learning.
An electronic device (201) comprising: a communication circuit (210); at least one processor (230) including a processing circuit; and a memory (220) storing instructions, wherein the instructions, when executed individually or collectively by the at least one processor (230), cause the electronic device (201) to: transmit, through the communication circuit (210), a first global model to a plurality of external electronic devices, wherein the first global model includes a parameter set for performing fine-tuning on a generative AI model stored in each of the plurality of external electronic devices; receive, through the communication circuit (210), a plurality of first local models from the plurality of external electronic devices, wherein each first local model of the plurality of first local models includes a parameter set obtained by training the generative AI model corresponding to the first global model using user data obtained by the external electronic device, among the plurality of external electronic devices, that transmits the first local model; obtain an integrated global model based on a plurality of parameter sets corresponding to the plurality of first local models; obtain, based on performing fine-tuning on the integrated global model using a red teaming prompt, a second global model fine-tuned to output an ethical answer in response to the red teaming prompt; and transmit, through the communication circuit (210), the second global model to the plurality of external electronic devices.
The electronic device (201) of claim 8, wherein the instructions, when executed individually or collectively by the at least one processor (230), cause the electronic device (201) to: obtain, by inputting the red teaming prompt into the integrated global model and based on information output from the integrated global model, data including an unethical answer to the red teaming prompt; obtain, based on at least one principle established for the integrated global model and on information that the integrated global model outputs in response to the data including the unethical answer, data including a critique of the unethical answer; obtain, based on information that the integrated global model outputs in response to the data including the critique of the unethical answer, data including a modified answer to the unethical answer; and obtain, based on performing parameter-efficient fine-tuning (PEFT) on the integrated global model using data including the red teaming prompt and the modified answer, the second global model fine-tuned to output an ethical answer in response to the red teaming prompt.
A method comprising: transmitting, through a communication circuit (210) of an electronic device (201), a first global model to a plurality of external electronic devices, wherein the first global model includes a parameter set with which the plurality of external electronic devices perform fine-tuning on a generative AI model stored in each of the plurality of external electronic devices; receiving, through the communication circuit (210), a plurality of first local models from the plurality of external electronic devices, wherein each first local model of the plurality of first local models includes a parameter set obtained by training the generative AI model corresponding to the first global model using user data obtained by the external electronic device, among the plurality of external electronic devices, that transmits the first local model; obtaining, based on performing fine-tuning on the plurality of first local models using a red teaming prompt, a plurality of second local models fine-tuned to output an ethical answer in response to the red teaming prompt, wherein the red teaming prompt includes at least one query that elicits an inappropriate answer; and transmitting, through the communication circuit (210), a second global model obtained based on a plurality of parameter sets corresponding to the plurality of second local models to the plurality of external electronic devices.
The method of claim 10, wherein obtaining the plurality of second local models fine-tuned to output an ethical answer in response to the red teaming prompt, based on performing fine-tuning on the plurality of first local models using the red teaming prompt, comprises: identifying, by inputting test data into the plurality of first local models and based on information output from the plurality of first local models, a plurality of unsafe local models, among the plurality of first local models, that output an unethical answer in response to the test data, wherein the test data includes a plurality of queries that require the local models to output an unethical answer; obtaining, by inputting the red teaming prompt into the plurality of unsafe local models and based on information output from the plurality of unsafe local models, first data including an unethical response to the red teaming prompt; obtaining, by inputting the first data into the plurality of unsafe local models and based on information that the plurality of unsafe local models output according to at least one constitution established for the plurality of unsafe local models, second data including an ethical answer; and obtaining, based on performing parameter-efficient fine-tuning (PEFT) on the plurality of unsafe local models using the red teaming prompt and the second data, the plurality of second local models fine-tuned to output an ethical answer in response to the red teaming prompt.
The method of claim 10 or claim 11, wherein obtaining, by inputting the first data into the plurality of unsafe local models, the second data including an ethical answer based on information that the plurality of unsafe local models output according to at least one principle established for the plurality of unsafe local models comprises: obtaining, based on the at least one principle and on information output from the plurality of unsafe local models in response to the first data, third data including a critique of the first data; and obtaining, based on information that the plurality of unsafe local models output based on the third data, the second data including a revised answer to the first data.
The method of any one of claims 10 to 12, further comprising: identifying, by inputting the test data into the plurality of first local models and based on information output from the plurality of first local models, a plurality of safe local models, among the plurality of first local models, that output an ethical response in response to the test data; and obtaining, based on a plurality of parameter sets corresponding to the plurality of safe local models, a second global model that outputs an ethical answer in response to a query requesting an unethical answer.
The method of any one of claims 10 to 13, wherein identifying, by inputting test data into the plurality of first local models and based on information output from the plurality of first local models, the plurality of unsafe local models, among the plurality of first local models, that output an unethical response in response to the test data comprises: obtaining, by inputting a plurality of queries included in the test data into one first local model of the plurality of first local models and based on information output from the first local model, a plurality of answers corresponding to the plurality of queries; obtaining, based on performing safety filtering on the plurality of queries and the plurality of answers, information indicating whether each of the plurality of queries and the plurality of answers is an ethical sentence; obtaining, based on the obtained information, a ratio of ethical sentences among the plurality of queries and the plurality of answers; and determining that the first local model is an unsafe local model based on the obtained ratio being less than a set threshold ratio.
The method of any one of claims 10 to 14, further comprising: obtaining, by inputting a plurality of queries included in the test data into one second local model of the plurality of second local models and based on information output from the second local model, a plurality of answers corresponding to the plurality of queries; obtaining, based on performing safety filtering on the plurality of queries and the plurality of answers, information indicating whether each of the plurality of queries and the plurality of answers is an ethical sentence; obtaining, based on the obtained information, a ratio of ethical sentences among the plurality of queries and the plurality of answers; and confirming that the second local model is a safe local model based on the obtained ratio exceeding a set threshold ratio.
The method of any one of claims 10 to 15, further comprising: receiving, from an external electronic device, a request to be provided with a global model obtained based on a plurality of parameter sets corresponding to a plurality of local models; and registering, based on the request, identification information of the external electronic device in a group of client devices for federated learning.
A method comprising: transmitting, through a communication circuit (210) of an electronic device (201), a first global model to a plurality of external electronic devices, wherein the first global model includes a parameter set for performing fine-tuning on a generative AI model stored in each of the plurality of external electronic devices; receiving, through the communication circuit (210), a plurality of first local models from the plurality of external electronic devices, wherein each first local model of the plurality of first local models includes a parameter set obtained by training the generative AI model corresponding to the first global model using user data obtained by the external electronic device, among the plurality of external electronic devices, that transmits the first local model; obtaining an integrated global model based on a plurality of parameter sets corresponding to the plurality of first local models; obtaining, based on performing fine-tuning on the integrated global model using a red teaming prompt, a second global model fine-tuned to output an ethical answer in response to the red teaming prompt; and transmitting, through the communication circuit (210), the second global model to the plurality of external electronic devices.
The method of claim 17, wherein obtaining the second global model fine-tuned to output an ethical answer in response to the red teaming prompt, based on performing fine-tuning on the integrated global model using the red teaming prompt, comprises: obtaining, by inputting the red teaming prompt into the integrated global model and based on information output from the integrated global model, first data including an unethical response to the red teaming prompt; obtaining, by inputting the first data into the integrated global model and based on information that the integrated global model outputs according to at least one constitution established for the integrated global model, second data including an ethical answer; and obtaining, based on performing parameter-efficient fine-tuning (PEFT) on the integrated global model using the red teaming prompt and the second data, the second global model fine-tuned to output an ethical answer in response to the red teaming prompt.
A non-transitory computer-readable storage medium storing computer-executable instructions that, when executed individually or collectively by at least one processor (230) of an electronic device (201), cause the electronic device (201) to: transmit, through a communication circuit (210) of the electronic device (201), a first global model to a plurality of external electronic devices, wherein the first global model includes a parameter set with which the plurality of external electronic devices perform fine-tuning on a generative AI model stored in each of the plurality of external electronic devices; receive, through the communication circuit (210), a plurality of first local models from the plurality of external electronic devices, wherein each first local model of the plurality of first local models includes a parameter set obtained by training the generative AI model corresponding to the first global model using user data obtained by the external electronic device, among the plurality of external electronic devices, that transmits the first local model; obtain, based on performing fine-tuning on the plurality of first local models using a red teaming prompt, a plurality of second local models fine-tuned to output an ethical answer in response to the red teaming prompt, wherein the red teaming prompt includes at least one query that elicits an inappropriate answer; and transmit, through the communication circuit (210), a second global model obtained based on a plurality of parameter sets corresponding to the plurality of second local models to the plurality of external electronic devices.
A non-transitory computer-readable storage medium storing computer-executable instructions that, when executed individually or collectively by at least one processor (230) of an electronic device (201), cause the electronic device (201) to: transmit, through a communication circuit (210) of the electronic device (201), a first global model to a plurality of external electronic devices, wherein the first global model includes a parameter set for performing fine-tuning on a generative AI model stored in each of the plurality of external electronic devices; receive, through the communication circuit (210), a plurality of first local models from the plurality of external electronic devices, wherein each first local model of the plurality of first local models includes a parameter set obtained by training the generative AI model corresponding to the first global model using user data obtained by the external electronic device, among the plurality of external electronic devices, that transmits the first local model; obtain an integrated global model based on a plurality of parameter sets corresponding to the plurality of first local models; obtain, based on performing fine-tuning on the integrated global model using a red teaming prompt, a second global model fine-tuned to output an ethical answer in response to the red teaming prompt; and transmit, through the communication circuit (210), the second global model to the plurality of external electronic devices.
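
The safety check recited in the claims (safety filtering over the queries and answers, a ratio of ethical sentences, and a comparison against a set threshold ratio) can be illustrated with the minimal sketch below. The safety filter itself is not specified in the claims; `toy_filter` is a placeholder assumption, and all function names are hypothetical.

```python
from typing import Callable, Sequence

# Hypothetical filter signature: returns True if a sentence is judged ethical.
SafetyFilter = Callable[[str], bool]


def ethical_ratio(queries: Sequence[str], answers: Sequence[str],
                  is_ethical: SafetyFilter) -> float:
    """Ratio of ethical sentences among the queries and the answers."""
    sentences = list(queries) + list(answers)
    if not sentences:
        raise ValueError("test data must contain at least one sentence")
    ethical = sum(1 for sentence in sentences if is_ethical(sentence))
    return ethical / len(sentences)


def is_unsafe_local_model(queries: Sequence[str], answers: Sequence[str],
                          is_ethical: SafetyFilter, threshold: float) -> bool:
    """Unsafe when the ratio is less than the set threshold ratio."""
    return ethical_ratio(queries, answers, is_ethical) < threshold


def is_safe_local_model(queries: Sequence[str], answers: Sequence[str],
                        is_ethical: SafetyFilter, threshold: float) -> bool:
    """Safe when the ratio exceeds the set threshold ratio."""
    return ethical_ratio(queries, answers, is_ethical) > threshold


def toy_filter(sentence: str) -> bool:
    # Placeholder assumption: a real system would use a trained safety classifier.
    return "[unsafe]" not in sentence
```

Because the claims as translated use strict comparisons in both directions, a ratio exactly equal to the threshold is neither identified as unsafe nor confirmed as safe in this sketch.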

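The critique-and-revision steps recited in the claims (first data with an unethical response, third data with a critique based on a principle, second data with a revised answer, then PEFT on the red teaming prompt and the second data) can be sketched as a data-preparation loop. This is a minimal sketch under stated assumptions: the `Generate` callable, the `CRITIQUE` and `REVISE` instruction tags, and `stub_model` are hypothetical, and the PEFT step itself is not shown.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: one text-in, text-out call to a local or global model.
Generate = Callable[[str], str]


def build_peft_pairs(generate: Generate, red_team_prompts: Sequence[str],
                     principle: str) -> List[Tuple[str, str]]:
    """Return (red teaming prompt, revised answer) pairs for fine-tuning."""
    pairs: List[Tuple[str, str]] = []
    for prompt in red_team_prompts:
        # First data: the model's raw response to the red teaming prompt.
        first_data = generate(prompt)
        # Third data: a critique of the first data against the principle.
        third_data = generate(
            f"CRITIQUE\nPrinciple: {principle}\nPrompt: {prompt}\nAnswer: {first_data}"
        )
        # Second data: a revised answer produced in light of the critique.
        second_data = generate(
            f"REVISE\nPrompt: {prompt}\nAnswer: {first_data}\nCritique: {third_data}"
        )
        pairs.append((prompt, second_data))
    return pairs


def stub_model(text: str) -> str:
    # Stand-in for a generative AI model, used only to exercise the loop.
    if text.startswith("REVISE"):
        return "revised answer"
    if text.startswith("CRITIQUE"):
        return "critique"
    return "unethical answer"
```

The resulting pairs would then be passed to whatever parameter-efficient fine-tuning procedure is in use; the choice of procedure is left open here.
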
Description

Method for performing federated learning considering responsible artificial intelligence, electronic device supporting the same, and storage medium

Embodiments of the present disclosure relate to a method for performing federated learning considering responsible AI, an electronic device supporting the same, and a storage medium.

For many people living in the modern era, portable digital communication devices have become an essential element. Consumers want to use these devices to receive a variety of high-quality services of their choice anytime and anywhere.

Analytical artificial intelligence (analytical AI) models can perform data analysis and/or pattern recognition. In contrast, generative AI models can generate data or content in response to user input and provide the generated content. As the deep learning models used as generative AI models become more advanced, the quality of the data or content provided by generative AI models is also improving. The types of generation tasks may include, for example, text generation, image generation, code generation, speech generation, and/or video generation. Users can select an AI model that supports the desired generation task and use the service for that AI model.

The information described above may be provided as related art for the purpose of aiding understanding of this document. None of the above is to be claimed as prior art related to this document, nor can it be used to determine prior art.

FIG. 1 is a block diagram of an electronic device in a network environment according to one embodiment of the present disclosure.
FIG. 2 is a drawing for explaining an example of the configuration of an electronic device in a network environment according to one embodiment of the present disclosure.
FIG. 3 is an illustrative diagram for explaining a method of performing federated learning of an electronic device according to one embodiment of the present disclosure.
FIG. 4 is a flowchart illustrating a method for an electronic device according to one embodiment of the present disclosure to perform federated learning using a plurality of clients.
FIG. 5 is an illustrative diagram for explaining a local model transmitted to an electronic device according to one embodiment of the present disclosure.
FIG. 6 is a flowchart illustrating a method for an electronic device according to one embodiment of the present disclosure to transmit, to a plurality of external electronic devices, a global model that outputs an ethical answer in response to a query that elicits an unethical answer.
FIG. 7 is an illustrative diagram illustrating a method for an electronic device according to one embodiment of the present disclosure to transmit, to a plurality of external electronic devices, a global model that outputs an ethical answer in response to a query that elicits an unethical answer.
FIG. 8 is a flowchart illustrating a method for an electronic device according to one embodiment of the present disclosure to obtain a plurality of second local models fine-tuned to output an ethical answer in response to a red teaming prompt.
FIGS. 9a and 9b are exemplary diagrams illustrating a method for an electronic device according to one embodiment of the present disclosure to train a large-scale language model to output an ethical answer in response to a query that elicits an unethical answer.
FIGS. 10a and 10b are exemplary diagrams illustrating a method for an electronic device according to one embodiment of the present disclosure to perform fine-tuning on a large-scale language model using training data.
FIG. 11 is a flowchart illustrating a method for obtaining a global model that outputs an ethical answer in response to a query requesting an unethical answer, based on identifying a plurality of safe local models among a plurality of local models that output an ethical answer in response to test data, according to one embodiment of the present disclosure.
FIG. 12 is a flowchart illustrating a method for determining whether one of a plurality of local models received from a plurality of external electronic devices is a safe local model, according to one embodiment of the present disclosure.
FIG. 13 is a flowchart illustrating a method for determining whether one of a plurality of local models obtained by an electronic device according to one embodiment of the present disclosure is a safe local model.
FIG. 14 is a flowchart illustrating a method for an electronic device according to one embodiment of the present disclosure to transmit, to a plurality of external electronic devices, a global model that outputs an ethical answer in response to a query that elicits an unethical answer.
FIG. 15 is an illustrative diagram illustrating a method for an electronic device according to one embodiment of the present disclosure to transmit a global model to a plurality of external electronic devices, which outputs an ethical answer in response to a query that elicits an unethi