EP-4738353-A1 - METHODS AND SYSTEMS FOR ENHANCING THE DETECTION OF FRAUDULENT AUDIO DATA


Abstract

A method for enhancing the detection of fraudulent audio data is provided that includes capturing audio data of a user speaking during an authentication transaction, dividing the audio data into segments, determining a quality control vector for each segment, and determining whether each segment is of adequate quality. Moreover, the method includes calculating a voice replay score and a voice cloning detection score for each adequate quality segment, and determining, by a trained machine learning model operated by the electronic device, a weight for each adequate quality segment. Furthermore, the method includes applying the weight determined for each adequate quality segment to the voice replay and voice cloning scores calculated for the respective adequate quality segment, calculating a decision score, and comparing the decision score against a threshold value. In response to determining that the decision score satisfies the threshold value, the method determines that the captured audio data is genuine.
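The weighting and decision steps summarized in the abstract can be illustrated with a minimal Python sketch. The sketch rests on several assumptions that the document does not state. The quality control vector is taken to hold two hypothetical measures: segment level in dBFS and the fraction of clipped samples. A logistic function with made-up coefficients stands in for the trained machine learning model. Both detection scores are assumed to lie in [0, 1], with higher values meaning more likely genuine, and each segment's two scores are averaged before its weight is applied. "Satisfies the threshold value" is read as greater than or equal to. The names `Segment`, `segment_weight`, `decision_score`, and `is_genuine` are illustrative only.

```python
import math
from dataclasses import dataclass

# Hypothetical parameters standing in for the trained machine learning model.
# A real model would learn its parameters from labelled data; these are made up.
WEIGHT_COEFFS = (0.08, -25.0)  # coefficients for (level in dBFS, clipped fraction)
WEIGHT_BIAS = 4.0


@dataclass
class Segment:
    quality: tuple[float, float]  # hypothetical quality control vector (dBFS, clipped fraction)
    replay: float                 # voice replay score, assumed in [0, 1], higher = more genuine
    cloning: float                # voice cloning detection score, same assumed convention


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def segment_weight(quality: tuple[float, float]) -> float:
    """Logistic-regression stand-in for the trained model: quality vector -> weight in (0, 1)."""
    z = WEIGHT_BIAS + sum(c * f for c, f in zip(WEIGHT_COEFFS, quality))
    return _sigmoid(z)


def decision_score(segments: list[Segment]) -> float:
    """Average each segment's two scores, apply its weight, and normalise by the total weight."""
    if not segments:
        raise ValueError("no adequate-quality segments to score")
    weights = [segment_weight(s.quality) for s in segments]
    weighted = sum(w * (s.replay + s.cloning) / 2 for w, s in zip(weights, segments))
    return weighted / sum(weights)


def is_genuine(score: float, threshold: float = 0.6) -> bool:
    """Assumes 'satisfies the threshold' means score >= threshold; 0.6 is an arbitrary example."""
    return score >= threshold


# Example usage: a clean segment and a quieter, slightly clipped one.
segments = [
    Segment(quality=(-20.0, 0.0), replay=0.9, cloning=0.8),
    Segment(quality=(-35.0, 0.005), replay=0.6, cloning=0.4),
]
score = decision_score(segments)
print(f"{score:.3f}", is_genuine(score))  # 0.693 True
```

With these example numbers, the weighted decision score (about 0.693) is higher than the plain mean of the two per-segment scores (0.675). The cleaner first segment receives the larger weight, so its scores count for more.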

Inventors

BLOUET, RAPHAEL
BALCIUNAS, Linas
JOVANOVIC, Kosta
MANTECON, ANA
FLOOD, GORDON
Patefield-Smith, Martin

Assignees

Daon Technology

Dates

Publication Date: 2026-05-06
Application Date: 2025-10-28

Claims (11)

1. A method for enhancing the detection of fraudulent audio data comprising the steps of: capturing, by an electronic device, audio data of a user speaking during an authentication transaction; dividing the audio data into segments; determining a quality control vector for each segment; determining whether each segment is of adequate quality based on the quality control vector for the respective segment; calculating a voice replay score and a voice cloning detection score for each adequate quality segment; determining, by a trained machine learning model operated by the electronic device, a weight for each adequate quality segment; applying the weight determined for each adequate quality segment to the voice replay and voice cloning scores calculated for the respective adequate quality segment and calculating a decision score; comparing the decision score against a threshold value; and in response to determining the decision score satisfies the threshold value, determining the captured audio data is genuine.
2. The method according to claim 1, further comprising determining the captured audio data is fraudulent in response to determining the decision score fails to satisfy the threshold value.
3. The method according to claim 1 or 2, further comprising discarding segments of inadequate quality.
4. The method according to any preceding claim, wherein the segments vary in duration.
5. The method according to any preceding claim, wherein said step of calculating the decision score comprises combining the determined weights.
6. An electronic device for enhancing the detection of fraudulent audio data comprising: a processor; and a memory configured to store data, said electronic device being associated with a network and said memory being in communication with said processor and having instructions stored thereon which, when read and executed by said processor, cause said electronic device to: capture audio data of a user speaking during an authentication transaction; divide the audio data into segments; determine a quality control vector for each segment; determine whether each segment is of adequate quality based on the quality control vector for the respective segment; calculate a voice replay score and a voice cloning detection score for each adequate quality segment; determine, by a trained machine learning model operated by said electronic device, a weight for each adequate quality segment; apply the weight determined for each adequate quality segment to the voice replay and voice cloning scores calculated for the respective adequate quality segment and calculate a decision score; compare the decision score against a threshold value; and in response to determining the decision score satisfies the threshold value, determine the captured audio data is genuine.
7. The electronic device according to claim 6, wherein the instructions, when read and executed by said processor, further cause said electronic device to determine the captured audio data is fraudulent when the decision score fails to satisfy the threshold value.
8. The electronic device according to claim 6 or 7, wherein the instructions, when read and executed by said processor, further cause said electronic device to discard segments of inadequate quality.
9. The electronic device according to any one of claims 6 to 8, wherein the segments vary in duration.
10. The electronic device according to any one of claims 6 to 9, wherein the instructions, when read and executed by said processor, further cause said electronic device to combine the determined weights to calculate the decision score.
11. A computer program for enhancing the detection of fraudulent audio data, which when executed by a hardware processor is configured to carry out the method of any one of claims 1 to 5.
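Claims 1, 3 and 4 (and their device counterparts) involve three steps: dividing the audio into segments that may vary in duration, computing a quality control vector per segment, and discarding segments of inadequate quality. The Python sketch below illustrates those steps under stated assumptions. The document does not define the quality control vector, so the two measures used here (RMS level in dBFS and the fraction of clipped samples) are hypothetical. The adequacy thresholds and the caller-supplied segment boundaries are also hypothetical. Audio is represented as a plain list of float samples in [-1.0, 1.0].

```python
import math
from dataclasses import dataclass


@dataclass
class QualityVector:
    """Hypothetical quality control vector; the document does not specify its components."""
    rms_dbfs: float          # segment level relative to digital full scale
    clipped_fraction: float  # share of samples at or above the clipping level


def split_at(samples: list[float], boundaries: list[int]) -> list[list[float]]:
    """Divide the audio at the given sample indices, so segments can differ in duration."""
    edges = [0, *sorted(boundaries), len(samples)]
    return [samples[a:b] for a, b in zip(edges, edges[1:]) if b > a]


def quality_vector(segment: list[float], clip_level: float = 0.99) -> QualityVector:
    """Compute the illustrative quality measures for one non-empty segment."""
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    rms_dbfs = 20 * math.log10(rms) if rms > 0 else float("-inf")
    clipped = sum(abs(x) >= clip_level for x in segment) / len(segment)
    return QualityVector(rms_dbfs, clipped)


def is_adequate(q: QualityVector, min_dbfs: float = -40.0, max_clipped: float = 0.01) -> bool:
    """Adequacy rule with arbitrary example thresholds."""
    return q.rms_dbfs >= min_dbfs and q.clipped_fraction <= max_clipped


def adequate_segments(samples: list[float], boundaries: list[int]) -> list[list[float]]:
    """Keep adequate-quality segments and discard the rest."""
    return [s for s in split_at(samples, boundaries) if is_adequate(quality_vector(s))]


# Example usage: a clean segment, a segment that is too quiet, and a clipped segment.
audio = [0.1] * 100 + [0.0005] * 50 + [1.0] * 30
kept = adequate_segments(audio, boundaries=[100, 150])
print(len(kept), len(kept[0]))  # 1 100
```

In practice the boundaries might come from, for example, a voice activity detector. That choice is an assumption; the document does not say how segment boundaries are chosen.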

Description

BACKGROUND OF THE INVENTION

This invention relates generally to audio data obtained during authentication transactions, and more particularly, to methods and systems for enhancing the detection of fraudulent audio data.

Users are required to prove who they claim to be during authentication transactions conducted under many different circumstances. For example, users may be required to prove their identity when contacting a call center or a merchant while attempting to remotely purchase a product from a merchant system over the Internet. Claims of identity may be proven during authentication transactions based on audio data captured from the user.

During authentication transactions based on audio data, it is known for users to speak freely or to utter a passphrase. The passphrase can be divided into segments and a local liveness score computed for each segment. It is known to average the local liveness scores to calculate a composite liveness score, which is compared against a threshold value to determine whether a live user spoke the passphrase and thus whether the audio data is fraudulent.

However, some of the segments are of better quality than others. Because averaging gives every local liveness score equal weight, it reduces the impact of the higher quality segments and increases the impact of the lower quality segments on the liveness determination. As a result, the liveness determination results, and thus the fraudulent audio data detection results, tend to be less rigorous, accurate, and trustworthy than desired.

Thus, it would be advantageous and an improvement over the relevant technology to provide a method, an electronic device, and a computer-readable recording medium capable of enhancing the detection of fraudulent audio data.
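The drawback of averaging described in the background can be seen in a small worked example with made-up numbers. Suppose a genuine speaker produces two clean segments with high local liveness scores and one noisy segment with an unreliable low score. Plain averaging lets the noisy segment pull the composite score below an example threshold, while a simple quality-weighted average does not. The quality values, the threshold, and the proportional weighting are illustrative assumptions. They are not the trained-model weighting of the disclosed method.

```python
from statistics import fmean

# Hypothetical local liveness scores for three passphrase segments (higher = more
# likely spoken live) and hypothetical per-segment quality values in [0, 1].
local_scores = [0.92, 0.88, 0.30]  # the third segment is noisy, so its score is unreliable
quality = [0.9, 0.8, 0.1]
THRESHOLD = 0.8  # arbitrary example threshold

# Averaging: every segment counts equally, so the poor segment drags the result down.
composite_mean = fmean(local_scores)

# Quality weighting: each score counts in proportion to the quality of its segment.
composite_weighted = sum(q * s for q, s in zip(quality, local_scores)) / sum(quality)

print(round(composite_mean, 3), composite_mean >= THRESHOLD)          # 0.7 False
print(round(composite_weighted, 3), composite_weighted >= THRESHOLD)  # 0.868 True
```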
BRIEF DESCRIPTION OF THE INVENTION

An aspect of the present disclosure provides a method for enhancing the detection of fraudulent audio data including the steps of capturing, by an electronic device, audio data of a user speaking during an authentication transaction, dividing the audio data into segments, determining a quality control vector for each segment, and determining whether each segment is of adequate quality based on the quality control vector for the respective segment. Moreover, the method includes the steps of calculating a voice replay score and a voice cloning detection score for each adequate quality segment, determining, by a trained machine learning model operated by the electronic device, a weight for each adequate quality segment, and applying the weight determined for each adequate quality segment to the voice replay and voice cloning scores calculated for the respective adequate quality segment. A decision score is calculated and compared against a threshold value. In response to determining the decision score satisfies the threshold value, the method determines the captured audio data is genuine.

In an embodiment of the present disclosure, the method further includes determining the captured audio data is fraudulent in response to determining the decision score fails to satisfy the threshold value. In another embodiment of the present disclosure, the method includes discarding segments of inadequate quality. In yet another embodiment of the present disclosure, the segments vary in duration. In yet another embodiment of the present disclosure, the step of calculating the decision score includes combining the determined weights.

Another aspect of the present disclosure provides a non-transitory computer-readable recording medium in an electronic device capable of enhancing the detection of fraudulent audio data.
The non-transitory computer-readable recording medium stores instructions which, when executed by a hardware processor, perform the steps of the methods described above.

Another aspect of the present disclosure provides an electronic device for enhancing the detection of fraudulent audio data including a processor and a memory configured to store data. The electronic device is associated with a network and the memory is in communication with the processor. The memory has instructions stored thereon which, when read and executed by the processor, cause the electronic device to capture audio data of a user speaking during an authentication transaction, divide the audio data into segments, determine a quality control vector for each segment, and determine whether each segment is of adequate quality based on the quality control vector for the respective segment. The instructions, when read and executed by the processor, further cause the electronic device to calculate a voice replay score and a voice cloning detection score for each adequate quality segment, determine, by a trained machine learning model operated by the electronic device, a weight for each adequate quality segment, and apply the weight determined for each adequate quality segment to the voice replay and voice cloning scores calculated for the respective adequate quality segment. Moreover, the instructions which whe