EP-4229625-B1 - SYSTEM AND/OR METHOD FOR SEMANTIC PARSING OF AIR TRAFFIC CONTROL AUDIO
Inventors
- PUST, MICHAEL
- BONDARYK, JOSEPH
- GEORGE, MATTHEW
Dates
- Publication Date
- 2026-05-13
- Application Date
- 2021-10-13
Claims (11)
- A system (100) for an aircraft for semantic parsing of air traffic control (ATC) utterances, the system (100) comprising: • a communication subsystem (110) onboard the aircraft, the communication subsystem (110) configured to receive an ATC radio signal and transform the ATC radio signal into an audio signal (102); and • a first processing system connected to the communication subsystem (110), comprising: • a speech-to-text module (120) configured to determine an utterance hypothesis (104) from the audio signal (102), wherein the speech-to-text module (120) comprises an integrated automatic speech recognition (ASR) and sentence boundary detection (SBD) module (125), the integrated ASR/SBD module comprising a pre-trained neural network model tuned for ATC audio; and • a question-and-answer (Q/A) module (130) configured to determine aircraft commands (106) based on the utterance hypothesis (104) using a plurality of natural language queries, the Q/A module (130) being configured to query the pre-trained neural network model according to a structured sequence of the natural language queries to determine the aircraft commands (106), the structured sequence comprising a tree-based sequence with a plurality of dependencies linking one or more of the natural language queries to a determination that the aircraft is an intended recipient of an utterance corresponding to the utterance hypothesis (104).
- The system (100) of Claim 1, wherein each utterance hypothesis (104) comprises a boundary hypothesis, wherein the speech-to-text module (120) comprises a sentence boundary detection (SBD) model configured to tag entities within the audio signal (102) and generate the boundary hypothesis based on the tagged entities.
- The system (100) of Claim 2, wherein the entities comprise a transition speaker, wherein the SBD model is a neural network pre-trained to identify the transition speaker within multi-utterance ATC audio based on audio artifact annotations.
- The system (100) of Claim 1, wherein the speech-to-text module (120) further comprises an ATC-tuned language model, wherein determining the utterance hypothesis (104) comprises: • with the integrated ASR/SBD module (125), generating a plurality of linguistic hypotheses for each utterance; • using the ATC-tuned language model, determining a language score for each of the plurality of linguistic hypotheses; and • selecting an utterance hypothesis (104) from the plurality of linguistic hypotheses based on the corresponding language score.
- The system (100) of Claim 4, wherein the integrated ASR/SBD module (125) is configured to assign a phonetic score to each of the plurality of linguistic hypotheses, wherein the utterance hypothesis (104) is selected based on a combination of the corresponding language and phonetic scores.
- The system (100) of Claim 1, wherein determining the utterance hypothesis (104) comprises: • with the ASR/SBD module (125) of the speech-to-text module (120), generating a plurality of utterance hypotheses for an utterance within the audio signal (102); • using a language model, selecting an utterance hypothesis (104) of the plurality of utterance hypotheses.
- The system (100) of Claim 6, wherein the language model comprises the pre-trained neural network, the pre-trained neural network pre-trained using entity tagged ATC transcripts.
- The system (100) of Claim 7, wherein the entity tagged ATC transcripts comprise tags corresponding to phonetically conflicting entities.
- The system (100) of any preceding Claim, wherein the utterance hypothesis (104) comprises a text transcript.
- The system (100) of Claim 1, wherein the utterance hypothesis (104) comprises a speaker identification, wherein the determination that the aircraft is the intended recipient is based on the speaker identification.
- The system (100) of Claim 1, wherein each command comprises a command parameter and a set of values corresponding to the command parameter, wherein the command parameter is selected from a predetermined set of command parameters, wherein the set of values and the command parameter are determined via distinct natural language queries of the structured sequence.
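The claims above describe a tree-based sequence of natural language queries in which dependencies gate follow-up queries on a determination that the aircraft is the intended recipient (Claim 1), and in which a command parameter and its values are filled by distinct queries (Claim 11). The following sketch is purely illustrative and is not the patented implementation; all names (`QueryNode`, `run_queries`, the canned answer function) are assumptions used to convey the dependency structure.

```python
# Illustrative sketch (not the patented implementation) of a tree-based
# query sequence: dependent queries run only after an ancestor query
# answers, e.g. after the recipient determination succeeds.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class QueryNode:
    question: str   # natural language query posed to the Q/A model
    parameter: str  # command parameter this query fills
    children: list = field(default_factory=list)  # dependent follow-up queries

def run_queries(root: QueryNode, answer: Callable[[str], Optional[str]]) -> dict:
    """Walk the query tree; descend into children only when a query answers."""
    commands = {}
    value = answer(root.question)
    if value is None:
        return commands  # dependency not satisfied; prune this subtree
    commands[root.parameter] = value
    for child in root.children:
        commands.update(run_queries(child, answer))
    return commands

# Example tree: the recipient check gates all command-parameter queries.
tree = QueryNode("Who is the message addressed to?", "recipient", [
    QueryNode("What altitude is commanded?", "altitude"),
    QueryNode("What heading is commanded?", "heading"),
])

def toy_answer(question: str) -> Optional[str]:
    # Stand-in for the Q/A model; returns canned answers for this example.
    canned = {
        "Who is the message addressed to?": "N123AB",
        "What altitude is commanded?": "FL240",
        "What heading is commanded?": None,  # not present in the utterance
    }
    return canned[question]

print(run_queries(tree, toy_answer))  # → {'recipient': 'N123AB', 'altitude': 'FL240'}
```

If the recipient query returned no answer (the aircraft is not the intended recipient), the entire subtree of command-parameter queries would be skipped, mirroring the dependency described in Claim 1.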
Description
TECHNICAL FIELD
This invention relates generally to the aviation field, and more specifically to a new and useful semantic parser in the aviation field.
BACKGROUND
US 5652897 A can be regarded as useful to understand the invention, as it describes a system that segments speech-recognizer output of air-traffic-control commands into individual instructions. Furthermore, HELMKE HARTMUT ET AL: "Machine Learning of Air Traffic Controller Command Extraction Models for Speech Recognition Applications", 2020 AIAA/IEEE 39TH DIGITAL AVIONICS SYSTEMS CONFERENCE (DASC), IEEE, 11 October 2020 (2020-10-11), pages 1-9, describes an automatically learned Command Extraction Model for air-traffic-controller speech that, with as little as six hours of labelled data, raises command-recognition coverage from 60 % to over 94 % (up to 98 %).
SUMMARY
According to the present disclosure there is provided a system for an aircraft for semantic parsing of air traffic control (ATC) utterances as described in the accompanying claims.
BRIEF DESCRIPTION OF THE FIGURES
FIGURE 1 is a schematic representation of a variant of the system.
FIGURE 2 is a diagrammatic representation of a variant of the method.
FIGURE 3 is a diagrammatic representation of a variant of the method.
FIGURE 4 is a diagrammatic representation of an example of training an ASR model in a variant of the method.
FIGURE 5 is a diagrammatic representation of an example of training a language model in a variant of the method.
FIGURE 6 is a diagrammatic representation of an example of training a Question/Answer model in a variant of the method.
FIGURE 7 is a schematic representation of an example of the system.
FIGURE 8 is a graphical representation of an example of a domain expert evaluation tool in a variant of the method.
FIGURE 9 is a diagrammatic representation of a variant of the method.
FIGURES 10A-D are diagrammatic representations of a first, second, third, and fourth variant of the system, respectively.
FIGURES 11A-C are first, second, and third examples of tree-based query structures, respectively.
FIGURE 12 is a diagrammatic representation of a variant of the system and/or method.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The following description of the preferred embodiments of the invention is not intended to limit the invention to these preferred embodiments, but rather to enable any person skilled in the art to make and use this invention.
1. Overview.
The method, an example of which is shown in FIGURE 2, can include performing inference using the system S200; and can optionally include training the system components S100. The method functions to automatically interpret flight commands from a stream of air traffic control (ATC) radio communications. The method can additionally or alternatively function to train and/or update a natural language processing system based on ATC communications.
The performing inference S200 can include: at an aircraft, receiving an audio utterance from air traffic control S210, converting the audio utterance into a predetermined format S215, determining commands using a question-and-answer model S240, and optionally controlling the aircraft based on the commands S250 (example shown in FIGURE 3). The method functions to automatically interpret flight commands from the air traffic control (ATC) stream. The flight commands can be: automatically used to control aircraft flight; presented to a user (e.g., a pilot, a remote teleoperator); relayed to an auto-pilot system in response to a user (e.g., pilot) confirmation; and/or otherwise used. In an illustrative example, the method can receive an ATC audio stream, convert the ATC audio stream to ATC text, and provide the ATC text (as the reference text) and a predetermined set of queries (each associated with a different flight command parameter) to an ATC-tuned question-and-answer model (e.g., ATC-tuned BERT), which analyzes the ATC text for the query answers. The query answers (e.g., responses of the question-and-answer model) can then be used to select follow-up queries and/or fill out a command parameter value, which can be used for direct or indirect aircraft control. The ATC audio stream can be converted to the ATC text using an ATC-tuned integrated sentence boundary detection and automatic speech recognition model (SBD/ASR model) and an ATC-tuned language model, wherein an utterance hypothesis (e.g., a sentence hypothesis, an utterance by an individual speaker, etc.) can be selected for inclusion in the ATC text based on the joint score from the SBD/ASR model and the language model. S200 can be performed using a system 100 including a Speech-to-Text module and a question-and-answer (Q/A) module (e.g., cooperatively forming a semantic parser).
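The passage above describes selecting an utterance hypothesis based on a joint score from the SBD/ASR model and the language model. A minimal sketch of that selection step follows; the function name, the weighting scheme, and the example scores are assumptions for illustration, not the disclosed implementation (the patent does not specify how the two scores are combined).

```python
# Minimal sketch (assumed, not from the disclosure) of joint-score
# hypothesis selection: each candidate transcript carries a phonetic
# score from the SBD/ASR model and a score from the language model,
# and the best weighted combination is kept for the ATC text.

def select_hypothesis(hypotheses, weight=0.5):
    """hypotheses: list of (text, phonetic_score, language_score) tuples,
    with higher (less negative) log-scores being better.
    Returns the transcript with the best weighted joint score."""
    def joint(h):
        _, phonetic, language = h
        return weight * phonetic + (1.0 - weight) * language
    best = max(hypotheses, key=joint)
    return best[0]

# Hypothetical candidates for one ATC utterance (scores are made up):
hyps = [
    ("cessna one two three alpha bravo climb flight level two four zero", -4.1, -3.2),
    ("cessna one two three alpha bravo climb to four zero", -3.9, -6.5),
]
print(select_hypothesis(hyps))
```

Here the second candidate has a slightly better phonetic score, but the ATC-tuned language model strongly prefers the first (a well-formed flight-level clearance), so the joint score selects it; this mirrors how the language model can correct phonetically plausible but linguistically unlikely ASR output.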
The system functions to interpret audio air traffic control (ATC) audio into flight commands, and can optionally control the aircraft based on the set of flight commands. The system 100 is preferably mounted to, installed on, integrated into, and/or configured to op