US-20260129119-A1 - METHOD AND SYSTEM FOR DETECTION OF A DEEPFAKE WITHIN AN ELECTRONIC AUDIO STREAM VIA AN INTEGRATED SECURE FRAMEWORK ENVIRONMENT

US 20260129119 A1

Abstract

A method and system for detection of a deepfake within an electronic audio stream by an integrated secure framework environment may be provided. The method may include receiving and routing the electronic audio stream to a microservice communications platform and a session border controller (SBC) agent. The method may also include generating a first fork and a second fork of the electronic audio stream for transmission to a proxy interactive voice response (IVR) platform and creating an enhanced electronic audio stream. The method may also include generating an integrated multi-operation platform and transmitting the enhanced electronic audio stream to the integrated multi-operation platform. The method may also include generating at least one replica of the enhanced electronic audio stream for transmission to each of at least one downstream machine learning (ML) environment and performing the detection of the deepfake for the at least one replica by an ML model.

Inventors

  • Rohit Nilekar
  • Mohammed Ahamed Mohiseen
  • Sagrika KHATIWALA

Assignees

  • JPMORGAN CHASE BANK, N.A.

Dates

Publication Date
May 7, 2026
Application Date
December 23, 2024
Priority Date
November 4, 2024

Claims (20)

  1. A method for detection of a deepfake within an electronic audio stream by an integrated secure framework environment, the method being implemented by at least one processor, the method comprising: receiving the electronic audio stream from a user; routing the electronic audio stream to a microservice communications platform and a session border controller (SBC) agent and generating a secured call audio of the electronic audio stream from the SBC agent; generating a first fork audio stream of the electronic audio stream from the microservice communications platform and a second fork audio stream of the electronic audio stream from the SBC agent; transmitting the first fork audio stream and the second fork audio stream to a proxy interactive voice response (IVR) platform; attaching business logic key-value pairs to the first fork audio stream and the second fork audio stream at the proxy IVR platform to create an enhanced electronic audio stream; generating an integrated multi-operation platform comprising an encryption protocol standard framework and a remote procedure call (RPC) framework; transmitting the enhanced electronic audio stream and the secured call audio to the integrated multi-operation platform; generating at least one replica of the enhanced electronic audio stream at the integrated multi-operation platform for transmission to each of at least one downstream machine learning (ML) environment; and performing the detection of the deepfake for the at least one replica by an ML model operating in the at least one downstream ML environment.
  2. The method of claim 1, wherein the encryption protocol standard framework comprises a session initiation protocol recording (SIPREC) framework; and wherein the at least one downstream ML environment comprises a first ML environment configured to perform the detection of the deepfake, a first redaction, and a first transcription of the at least one replica, and a second ML environment configured to perform a second redaction and a second transcription of the at least one replica.
  3. The method of claim 2, further comprising: receiving the at least one replica at a remote conferencing platform in the first ML environment; and performing dual operations on the at least one replica; wherein a first operation of the dual operations comprises performing the deepfake detection of the at least one replica by an ML model; and wherein a second operation of the dual operations comprises: performing the first redaction of the at least one replica that generates a first redacted version of the at least one replica, and performing the first transcription of the first redacted version for storage on a cloud storage platform.
  4. The method of claim 2, further comprising: receiving the at least one replica at a voice transcription handler in the second ML environment; performing the second redaction of the at least one replica that generates a second redacted version of the at least one replica; and performing the second transcription of the second redacted version for storage on a cloud storage platform.
  5. The method of claim 2, wherein the transmitting the enhanced electronic audio stream and the secured call audio to the integrated multi-operation platform comprises: transmitting the enhanced electronic audio stream to the SIPREC framework; and transmitting the secured call audio to the RPC framework.
  6. The method of claim 2, wherein the generating the secured call audio comprises securing the electronic audio stream based on a secure real-time transport protocol (SRTP) that provides security protections to the electronic audio stream; and wherein the security protections comprise at least one from among validation, authentication, encryption, and replay protection of the electronic audio stream.
  7. The method of claim 6, wherein the generating the at least one replica further comprises: generating a first metadata from the at least one replica via the SIPREC framework for input into the first ML environment.
  8. The method of claim 6, wherein the generating the at least one replica further comprises: converting the secured call audio with a first format comprising the SRTP to a second format with an RPC protocol via the RPC framework; and transmitting the converted secured call audio to the first ML environment.
  9. The method of claim 6, wherein the generating the at least one replica further comprises: generating an unredacted call audio of the secured call audio and a second metadata of the unredacted call audio via the RPC framework for input into the second ML environment.
  10. The method of claim 1, further comprising: generating a control metadata via the encryption protocol standard framework for input into the RPC framework.
  11. A computing apparatus for detection of a deepfake within an electronic audio stream by an integrated secure framework environment, comprising: a processor; a memory; a display; and a communication interface coupled to each of the processor, the memory, and the display, wherein the processor is configured to implement the integrated secure framework environment to: receive the electronic audio stream from a user; route the electronic audio stream to a microservice communications platform and a session border controller (SBC) agent and generate a secured call audio of the electronic audio stream from the SBC agent; generate a first fork audio stream of the electronic audio stream from the microservice communications platform and a second fork audio stream of the electronic audio stream from the SBC agent; transmit the first fork audio stream and the second fork audio stream to a proxy interactive voice response (IVR) platform; attach business logic key-value pairs to the first fork audio stream and the second fork audio stream at the proxy IVR platform to create an enhanced electronic audio stream; generate an integrated multi-operation platform comprising an encryption protocol standard framework and a remote procedure call (RPC) framework; transmit the enhanced electronic audio stream and the secured call audio to the integrated multi-operation platform; generate at least one replica of the enhanced electronic audio stream at the integrated multi-operation platform for transmission to each of at least one downstream machine learning (ML) environment; and perform the detection of the deepfake for the at least one replica by an ML model operating in the at least one downstream ML environment.
  12. The computing apparatus of claim 11, wherein the encryption protocol standard framework comprises a session initiation protocol recording (SIPREC) framework; and wherein the at least one downstream ML environment comprises a first ML environment configured to perform the detection of the deepfake, a first redaction, and a first transcription of the at least one replica, and a second ML environment configured to perform a second redaction and a second transcription of the at least one replica.
  13. The computing apparatus of claim 12, wherein the processor is further configured to implement the integrated secure framework environment to: receive the at least one replica at a remote conferencing platform in the first ML environment; and perform dual operations on the at least one replica; wherein the processor performs a first operation of the dual operations by performing the deepfake detection of the at least one replica by an ML model; and wherein the processor performs a second operation of the dual operations by: performing the first redaction of the at least one replica that generates a first redacted version of the at least one replica, and performing the first transcription of the first redacted version for storage on a cloud storage platform.
  14. The computing apparatus of claim 12, wherein the processor is further configured to implement the integrated secure framework environment to: receive the at least one replica at a voice transcription handler in the second ML environment; perform the second redaction of the at least one replica that generates a second redacted version of the at least one replica; and perform the second transcription of the second redacted version for storage on a cloud storage platform.
  15. The computing apparatus of claim 12, wherein the processor is further configured to implement the integrated secure framework environment to: transmit the enhanced electronic audio stream and the secured call audio to the integrated multi-operation platform by: transmitting the enhanced electronic audio stream to the SIPREC framework, and transmitting the secured call audio to the RPC framework; and generate the secured call audio by securing the electronic audio stream based on a secure real-time transport protocol (SRTP) that provides security protections to the electronic audio stream, wherein the security protections comprise at least one from among validation, authentication, encryption, and replay protection of the electronic audio stream.
  16. The computing apparatus of claim 12, wherein the processor is further configured to implement the integrated secure framework environment to: generate the at least one replica further by: generating a first metadata from the at least one replica via the SIPREC framework for input into the first ML environment; converting the secured call audio with a first format comprising the SRTP to a second format with an RPC protocol via the RPC framework; transmitting the converted secured call audio to the first ML environment; and generating an unredacted call audio of the secured call audio and a second metadata of the unredacted call audio via the RPC framework for input into the second ML environment.
  17. A non-transitory computer readable storage medium storing instructions for detection of a deepfake within an electronic audio stream by an integrated secure framework environment, the non-transitory computer readable storage medium comprising executable code which, when executed by a processor, causes the processor to implement the integrated secure framework environment to: receive the electronic audio stream from a user; route the electronic audio stream to a microservice communications platform and a session border controller (SBC) agent and generate a secured call audio of the electronic audio stream from the SBC agent; generate a first fork audio stream of the electronic audio stream from the microservice communications platform and a second fork audio stream of the electronic audio stream from the SBC agent; transmit the first fork audio stream and the second fork audio stream to a proxy interactive voice response (IVR) platform; attach business logic key-value pairs to the first fork audio stream and the second fork audio stream at the proxy IVR platform to create an enhanced electronic audio stream; generate an integrated multi-operation platform comprising an encryption protocol standard framework and a remote procedure call (RPC) framework; transmit the enhanced electronic audio stream and the secured call audio to the integrated multi-operation platform; generate at least one replica of the enhanced electronic audio stream at the integrated multi-operation platform for transmission to each of at least one downstream machine learning (ML) environment; and perform the detection of the deepfake for the at least one replica by an ML model operating in the at least one downstream ML environment.
  18. The non-transitory computer readable storage medium of claim 17, wherein the encryption protocol standard framework comprises a session initiation protocol recording (SIPREC) framework; and wherein the at least one downstream ML environment comprises a first ML environment configured to perform the detection of the deepfake, a first redaction, and a first transcription of the at least one replica, and a second ML environment configured to perform a second redaction and a second transcription of the at least one replica.
  19. The non-transitory computer readable storage medium of claim 18, wherein the executable code further causes the processor to implement the integrated secure framework environment to: receive the at least one replica at a remote conferencing platform in the first ML environment; and perform dual operations on the at least one replica; wherein the processor performs a first operation of the dual operations by performing the deepfake detection of the at least one replica by an ML model; and wherein the processor performs a second operation of the dual operations by: performing the first redaction of the at least one replica that generates a first redacted version of the at least one replica, and performing the first transcription of the first redacted version for storage on a cloud storage platform.
  20. The non-transitory computer readable storage medium of claim 18, wherein the executable code further causes the processor to implement the integrated secure framework environment to: receive the at least one replica at a voice transcription handler in the second ML environment; perform the second redaction of the at least one replica that generates a second redacted version of the at least one replica; and perform the second transcription of the second redacted version for storage on a cloud storage platform.
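The audio-handling steps that the independent claims recite (forking the inbound stream, attaching business-logic key-value pairs at the proxy IVR, and replicating the enhanced stream for each downstream ML environment) can be sketched in code. The following is a minimal, hypothetical Python illustration only; the claims specify no implementation, and every name below (`AudioStream`, `fork`, `attach_business_logic`, `replicate`, the metadata keys) is an assumption introduced for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical data model for the claimed pipeline; none of these names
# appear in the specification.

@dataclass
class AudioStream:
    source: str
    payload: bytes
    metadata: Dict[str, str] = field(default_factory=dict)

def fork(stream: AudioStream, origin: str) -> AudioStream:
    """Duplicate the stream, tagging which component produced the fork."""
    return AudioStream(origin, stream.payload, dict(stream.metadata))

def attach_business_logic(stream: AudioStream, kv: Dict[str, str]) -> AudioStream:
    """Proxy IVR step: attach business-logic key-value pairs to a fork."""
    stream.metadata.update(kv)
    return stream

def replicate(stream: AudioStream, n: int) -> List[AudioStream]:
    """Multi-operation platform step: one replica per downstream ML environment."""
    return [fork(stream, f"replica-{i}") for i in range(n)]

# The inbound call audio is routed to both the microservice communications
# platform and the SBC agent, each of which produces a fork.
call = AudioStream("caller", b"\x00\x01")
first_fork = fork(call, "microservice-comms")   # first fork audio stream
second_fork = fork(call, "sbc-agent")           # second fork audio stream

# The proxy IVR enhances the forked audio with business-logic key-value pairs.
enhanced = attach_business_logic(first_fork, {"line-of-business": "retail"})

# One replica per downstream ML environment (e.g., detection and transcription).
replicas = replicate(enhanced, n=2)
print(len(replicas))  # 2
```

Modeling the forks as value copies with shared metadata mirrors the claims' structure, where each replica carries the enhanced stream's business-logic pairs to its downstream ML environment.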

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority benefit from Indian Application No. 202411084052, filed Nov. 4, 2024 in the India Patent Office, which is hereby incorporated by reference in its entirety.

FIELD OF DISCLOSURE

This technology generally relates to methods and systems for detection of a deepfake within an electronic audio stream via an integrated secure framework environment.

BACKGROUND INFORMATION

The prevalence of artificial intelligence (AI)/machine learning (ML) programs and tools makes it exceedingly easy to impersonate the audio or voice of a person, that is, to create a deepfake of a person's voice. Deepfakes are highly problematic because such impersonations are often used for nefarious purposes, e.g., in scams, frauds, and misinformation campaigns. For financial institutions in particular, deepfakes can have a significant impact on the person whose audio or voice was impersonated. Consider, for example, a customer whose audio or voice has been impersonated using AI/ML programs or tools, that is, a deepfake of the customer's audio or voice. A fraudster can use this deepfake to contact the financial institution, gain access to the customer's financial information and accounts, and steal the customer's money. The financial institution can also be impacted by being subjected to lawsuits and regulatory violations. Thus, the consequences for both the customer and the financial institution can be dire. Given the increasing prevalence of AI/ML programs and tools capable of generating deepfakes and the ease with which such deepfakes can be produced, there is a heightened need to detect them.
While models in the status quo may provide individual services or applications relating to detecting deepfakes, the status quo does not provide a manner in which these models may be integrated with a framework or platform that is presently used by an enterprise, e.g., the financial institution, in handling real-time audio or voice from a user/customer. Therefore, to protect both customers and financial institutions, a platform associated with call communications must be capable of detecting deepfakes in real time within the real-time audio or voice of a customer, in order to distinguish the audio or voice of the customer from that of a deepfake. Accordingly, there is a need for techniques to detect a deepfake of audio streams in a secure environment.

SUMMARY

The present disclosure, through one or more of its various aspects, embodiments, and/or specific features or sub-components, provides, inter alia, various systems, servers, devices, methods, media, programs, and platforms for detection of a deepfake within an electronic audio stream. According to an aspect of the present disclosure, a method for detection of a deepfake within an electronic audio stream by an integrated secure framework environment may be provided. The method may be implemented by at least one processor. The method may include receiving the electronic audio stream from a user, routing the electronic audio stream to a microservice communications platform and a session border controller (SBC) agent, and generating a secured call audio of the electronic audio stream from the SBC agent. The method may also include generating a first fork audio stream of the electronic audio stream from the microservice communications platform and a second fork audio stream of the electronic audio stream from the SBC agent.
The method may also include transmitting the first fork audio stream and the second fork audio stream to a proxy interactive voice response (IVR) platform, and attaching business logic key-value pairs to the first fork audio stream and the second fork audio stream at the proxy IVR platform to create an enhanced electronic audio stream. The method may also include generating an integrated multi-operation platform comprising an encryption protocol standard framework and a remote procedure call (RPC) framework, and transmitting the enhanced electronic audio stream and the secured call audio to the integrated multi-operation platform. The method may also include generating at least one replica of the enhanced electronic audio stream at the integrated multi-operation platform for transmission to each of at least one downstream machine learning (ML) environment. The method may also include performing the detection of the deepfake for the at least one replica by an ML model operating in the at least one downstream ML environment. The encryption protocol standard framework may include a session initiation protocol recording (SIPREC) framework. The at least one downstream ML environment may include a first ML environment configured to perform the detection of the deepfake, a first redaction, and a first transcription of the at least one replica, and a second ML environment configured to perform a second redaction and a second transcription of the at least one replica. The method
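The "dual operations" that the first downstream ML environment performs on a replica (deepfake scoring on one path, redaction followed by transcription on the other) could be sketched as follows. This is a hypothetical illustration only: the disclosure does not specify any scoring or redaction logic, and `detect_deepfake`, `redact`, and `dual_operations` are placeholder names introduced here, with a trivial digit-masking rule standing in for the claimed redaction and a constant standing in for the ML model's inference.

```python
import re
from typing import Tuple

def detect_deepfake(audio: bytes) -> float:
    """Placeholder score in [0, 1]; a real system would run ML inference."""
    return 0.0 if audio else 1.0

def redact(transcript: str) -> str:
    """Stand-in first redaction: mask long digit runs (e.g., account numbers)."""
    return re.sub(r"\d{4,}", "[REDACTED]", transcript)

def dual_operations(audio: bytes, transcript: str) -> Tuple[float, str]:
    score = detect_deepfake(audio)   # first operation: deepfake detection
    redacted = redact(transcript)    # second operation, step 1: redaction
    # Step 2 would transcribe/persist `redacted` to a cloud storage platform.
    return score, redacted

score, redacted = dual_operations(b"\x01", "my account is 12345678")
print(redacted)  # my account is [REDACTED]
```

Running redaction before transcription storage, as the claims describe, keeps sensitive values out of every persisted artifact, which is likely the rationale for ordering the second operation this way.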