CN-122024722-A - Full duplex dialogue interruption and restoration method for vehicle-mounted voice interaction


Abstract

The invention discloses a vehicle-mounted full duplex voice conversation interruption and restoration method, and belongs to the technical field of vehicle-mounted voice interaction. The method comprises: in response to detecting, during a dialogue, a user interrupt instruction that meets preset conditions, triggering a dialogue interrupt event; generating semantic mark information corresponding to the interrupt event; generating, based on the semantic mark information, a context snapshot containing the dialogue history and the execution state; when a dialogue restoration request is received, retrieving the context snapshot according to the semantic mark information and performing context reconstruction based on the retrieved context snapshot; and controlling a dialogue management state machine to perform state transition according to the context reconstruction result, so that execution resumes from the dialogue node at which the interrupt event occurred. The invention preserves the dialogue state and restores it seamlessly, and solves the prior-art problems of context loss and disordered recovery logic caused by interruption.

Inventors

XIA YUN

Assignees

东风汽车集团股份有限公司

Dates

Publication Date: 20260512
Application Date: 20260205

Claims (10)

1. A full duplex dialogue interruption and restoration method for vehicle-mounted voice interaction, characterized by comprising the following steps: in response to detecting a user interrupt instruction meeting a preset condition during the dialogue, triggering a dialogue interrupt event; generating semantic mark information corresponding to the interrupt event, and generating a context snapshot comprising a dialogue history and an execution state based on the semantic mark information; when a dialogue restoration request is received, retrieving the context snapshot according to the semantic mark information, and performing context reconstruction based on the retrieved context snapshot; and controlling a dialogue management state machine to perform state transition according to the context reconstruction result, so as to resume execution from the dialogue node at which the interrupt event occurred.
2. The method of claim 1, wherein the preset condition includes at least one of the following: the detected voice command matches a keyword in a preset interrupt keyword library; the detected speech signal energy exceeds a preset energy threshold and its duration reaches a preset time threshold.
3. The method of claim 1, wherein the semantic mark information includes a session ID, a break point ID, a dialogue turn, a semantic role, an intention type, and an interrupt state; the session ID uniquely identifies a complete dialogue session; the break point ID is generated by combining the session ID and the dialogue turn, and uniquely identifies an interrupt event within the session; the semantic role distinguishes whether the speaking party at the moment of interruption is the user or the system; the intention type identifies the core task category of the interrupted dialogue; and the interrupt state represents the operation stage that the system was executing at the moment of interruption.
4. The method according to claim 3, wherein the context snapshot comprises the following content determined by the semantic mark information: a session identifier corresponding to the session ID; a break point identifier corresponding to the break point ID; a history record of the last N dialogue turns corresponding to the dialogue turn, wherein the record of each turn includes a dialogue role, dialogue content, and the corresponding intention; current task information corresponding to the intention type and the interrupt state, comprising a task identifier, a task state, and an execution progress; user information corresponding to the current user; and a speaker identifier corresponding to the semantic role, used for identifying whether the speaker at the moment of interruption is the user or the system.
5. The method of claim 4, wherein the dialogue restoration request includes an active recovery request and a passive recovery request; the active recovery request is triggered by the user through a voice command or a touch operation; the passive recovery request is triggered automatically by the system when a preset recovery condition is detected; and the preset recovery condition comprises completion of a system restart, recovery of the network connection, or an application restart.
6. The method according to claim 4, wherein the context reconstruction comprises a semantic completion process, specifically comprising: locating the corresponding break point according to the session identifier and the break point identifier in the context snapshot; restoring the dialogue context before the interruption according to the dialogue history record in the context snapshot; restoring the task execution state and the execution progress according to the current task information in the context snapshot; determining the current speaking party according to the semantic role in the context snapshot; and, based on the recovered dialogue context, task execution state, and current speaking party, reconstructing the state of the dialogue management state machine, the states of which include a waiting-for-user-input state, a task execution state, and a TTS broadcast state.
7. The method according to claim 6, wherein the semantic completion process further comprises fault-tolerant processing for ensuring that the reconstruction process continues in the following abnormal scenarios, specifically comprising: when the dialogue history record in the context snapshot is missing or damaged, re-extracting the latest N dialogue turns from a dialogue history storage module according to the dialogue turn in the semantic mark information; when the current task information in the context snapshot is missing or damaged, re-acquiring the current task information from a task management module according to the intention type and the interrupt state in the semantic mark information; when the semantic role in the context snapshot is missing or damaged, re-determining the current speaking party according to the semantic role in the semantic mark information; and when the user information in the context snapshot is missing or damaged, re-acquiring the user identity information according to the current user login state.
8. The method according to claim 6, wherein controlling the dialogue management state machine to perform state transition according to the context reconstruction result, so as to resume execution from the dialogue node at which the interrupt event occurred, specifically comprises: determining the execution stage of the current task according to the task execution state and the execution progress recovered in the context reconstruction; determining an initial state of the state transition according to the current speaking party determined in the context reconstruction; determining a target state of the state transition according to the dialogue context and the current task information restored in the context reconstruction; and, based on the initial state and the target state, driving the dialogue management state machine to perform the state transition, wherein the state transition comprises the following five core state transitions: from the waiting-for-user-input state to the task execution state, for continuing to execute the interrupted task; from the task execution state to the waiting-for-user-input state, for waiting for user confirmation or supplementary information; from the task execution state to the TTS broadcast state, for broadcasting the task execution result to the user; from the TTS broadcast state to the waiting-for-user-input state, for waiting for the user's response; and from the TTS broadcast state to the task execution state, for continuing to execute the subsequent task; after the state transition is completed, the dialogue management state machine resumes execution from the dialogue node at which the interrupt event occurred.
9. The method according to claim 1, characterized in that, for weak network environments, the method further comprises the following optimization measures: when the context snapshot is generated, adopting a local-cache-first strategy in which the context snapshot is stored preferentially in a local storage module of the vehicle head unit and is backed up asynchronously to cloud storage when the network connection is restored; during the context reconstruction, if a network connection abnormality is detected, reading the context snapshot from the local storage module of the vehicle head unit and executing the semantic completion process with an offline semantic analysis engine; and after execution is resumed, if recovery of the network connection is detected, automatically synchronizing the dialogue history record and the task execution state to the cloud storage and updating the state of the dialogue management state machine.
10. The method of claim 1, wherein, for a multi-user scenario, the method further comprises: when the semantic mark information is generated, acquiring a user identifier of the current user through a voiceprint recognition module, and taking the user identifier as the user information in the semantic mark information; in the context snapshot, recording the personalized dialogue preference and task execution history corresponding to the user identifier; during the context reconstruction, loading the corresponding dialogue strategy and task execution parameters from a personalized configuration database according to the user identifier; and after execution is resumed, updating the personalized dialogue preference and task execution history according to the user identifier, and adjusting the response strategy of the dialogue management state machine based on the personalized configuration.
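
As an illustration of the five core state transitions recited in claim 8, the following is a minimal Python sketch of a dialogue management state machine that accepts only those transitions. The names `State`, `ALLOWED`, `DialogStateMachine`, `initial_state`, `target_state`, and `resume`, the task state label `"needs_confirmation"`, and the rules that map the speaking party and task information to states are all assumptions made for the example; the claims do not specify them.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_USER_INPUT = auto()
    TASK_EXECUTION = auto()
    TTS_BROADCAST = auto()


# The five core transitions listed in claim 8, as (source, target) pairs.
ALLOWED = {
    (State.WAIT_USER_INPUT, State.TASK_EXECUTION),  # continue the interrupted task
    (State.TASK_EXECUTION, State.WAIT_USER_INPUT),  # wait for confirmation or more input
    (State.TASK_EXECUTION, State.TTS_BROADCAST),    # broadcast the task result
    (State.TTS_BROADCAST, State.WAIT_USER_INPUT),   # wait for the user's response
    (State.TTS_BROADCAST, State.TASK_EXECUTION),    # continue with the subsequent task
}


class DialogStateMachine:
    def __init__(self, state: State) -> None:
        self.state = state

    def transition(self, target: State) -> None:
        if target is self.state:
            return  # already in the target state, nothing to do
        if (self.state, target) not in ALLOWED:
            raise ValueError(f"transition {self.state.name} -> {target.name} not allowed")
        self.state = target


def initial_state(speaker: str) -> State:
    # Assumed rule: if the system was speaking at the interrupt, start from
    # the TTS broadcast state; otherwise start from waiting for user input.
    return State.TTS_BROADCAST if speaker == "system" else State.WAIT_USER_INPUT


def target_state(task_state: str, progress: float) -> State:
    # Assumed rule: a task that needs confirmation waits for the user, a
    # finished task is broadcast, and anything else keeps executing.
    if task_state == "needs_confirmation":
        return State.WAIT_USER_INPUT
    if progress >= 1.0:
        return State.TTS_BROADCAST
    return State.TASK_EXECUTION


def resume(speaker: str, task_state: str, progress: float) -> State:
    """Drive the state machine from the initial state to the target state."""
    machine = DialogStateMachine(initial_state(speaker))
    machine.transition(target_state(task_state, progress))
    return machine.state
```

In this sketch a direct move from waiting for user input to TTS broadcast raises `ValueError`, because that pair is not among the five listed transitions; a real system might instead route through the task execution state.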

Description

Full duplex dialogue interruption and restoration method for vehicle-mounted voice interaction

Technical Field

The invention relates to the technical field of vehicle-mounted voice interaction, in particular to a vehicle-mounted full duplex voice conversation interruption and restoration method and system, which are particularly suitable for application scenarios such as intelligent cabins and vehicle-mounted voice assistants.

Background

With the rapid development of intelligent cabin technology, vehicle-mounted voice interaction has become a core mode of human-vehicle interaction. Full duplex voice conversation technology allows the user to interrupt at any time while the system is broadcasting, which improves the naturalness and efficiency of interaction. However, the prior art has the following problems in practical application:

Context loss: when a user interrupts a system broadcast, existing schemes often cannot completely save the dialogue state at the moment of interruption, so semantic analysis and task planning must be carried out again after recovery, and the user experience is discontinuous.

Disordered recovery logic: the prior art lacks a standardized recovery mechanism, and recovery strategies are inconsistent across scenarios, which easily causes state jump errors or repeated execution.

Disclosure of Invention

In view of the technical defects existing in the prior art, the embodiments of the invention provide a vehicle-mounted full duplex voice conversation interruption and restoration method and system that overcome the above problems or at least partially solve them. The specific scheme is as follows.

A full duplex dialogue interruption and restoration method for vehicle-mounted voice interaction comprises the following steps: in response to detecting a user interrupt instruction meeting a preset condition during the dialogue, triggering a dialogue interrupt event; generating semantic mark information corresponding to the interrupt event, and generating a context snapshot comprising a dialogue history and an execution state based on the semantic mark information; when a dialogue restoration request is received, retrieving the context snapshot according to the semantic mark information, and performing context reconstruction based on the retrieved context snapshot; and controlling a dialogue management state machine to perform state transition according to the context reconstruction result, so as to resume execution from the dialogue node at which the interrupt event occurred.

In some embodiments, the preset condition includes at least one of the following: the detected voice command matches a keyword in a preset interrupt keyword library; the detected speech signal energy exceeds a preset energy threshold and its duration reaches a preset time threshold.

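
The two preset conditions described above (a keyword match, or signal energy above a threshold for long enough) can be sketched in Python as follows. This is only an illustrative sketch: the keyword set, the threshold values, the `Frame` type, and all function names are assumptions, since the source says only that these values are "preset".

```python
from dataclasses import dataclass

# Hypothetical preset values; the source does not specify them.
INTERRUPT_KEYWORDS = {"stop", "wait", "pause", "hold on"}
ENERGY_THRESHOLD = 0.02  # mean squared amplitude, samples assumed in [-1, 1]
MIN_DURATION_S = 0.3     # how long the energy must stay above the threshold


@dataclass
class Frame:
    samples: list[float]
    duration_s: float


def frame_energy(frame: Frame) -> float:
    """Mean squared amplitude of one audio frame."""
    if not frame.samples:
        return 0.0
    return sum(s * s for s in frame.samples) / len(frame.samples)


def keyword_matched(transcript: str) -> bool:
    """Condition 1: the recognised command contains an interrupt keyword.

    Plain substring matching is crude and is used here only for brevity.
    """
    text = transcript.lower()
    return any(keyword in text for keyword in INTERRUPT_KEYWORDS)


def energy_sustained(frames: list[Frame]) -> bool:
    """Condition 2: energy exceeds the threshold for a continuous run
    lasting at least MIN_DURATION_S."""
    run = 0.0
    for frame in frames:
        if frame_energy(frame) > ENERGY_THRESHOLD:
            run += frame.duration_s
            if run >= MIN_DURATION_S:
                return True
        else:
            run = 0.0  # the run was broken, start counting again
    return False


def should_interrupt(transcript: str, frames: list[Frame]) -> bool:
    """Trigger the interrupt event if at least one condition holds."""
    return keyword_matched(transcript) or energy_sustained(frames)
```

Requiring a continuous run rather than a total duration is a design choice of this sketch, intended to keep short noises such as a door closing from triggering an interrupt.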
In some embodiments, the semantic mark information includes a session ID, a break point ID, a dialogue turn, a semantic role, an intention type, and an interrupt state. The session ID uniquely identifies a complete dialogue session; the break point ID is generated by combining the session ID and the dialogue turn, and uniquely identifies an interrupt event within the session; the semantic role distinguishes whether the speaking party at the moment of interruption is the user or the system; the intention type identifies the core task category of the interrupted dialogue; and the interrupt state represents the operation stage that the system was executing at the moment of interruption.

In some embodiments, the context snapshot includes the following content determined by the semantic mark information: a session identifier corresponding to the session ID; a break point identifier corresponding to the break point ID; a history record of the last N dialogue turns corresponding to the dialogue turn, wherein the record of each turn includes a dialogue role, dialogue content, and the corresponding intention; current task information corresponding to the intention type and the interrupt state, comprising a task identifier, a task state, and an execution progress; user information corresponding to the current user; and a speaker identifier corresponding to the semantic role, used for identifying whether the speaker at the moment of interruption is the user or the system.

In some embodiments, the dialogue restoration request includes an active recovery request and a passive recovery request; the active recovery request is triggered by the user through a voice command or a touch operation; the passive recovery request is triggered automatically by the system when a preset recovery condition is detected; and the preset recovery condition comprises completion of a system restart, recovery of the network connection, or an application restart.

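
The semantic mark information and the context snapshot described above might be modelled as in the following Python sketch. The class and field names, the `session:turn` format of the break point ID, the in-memory `SnapshotStore`, and the default `n=3` are assumptions made for illustration; the source states only which fields exist and that the break point ID combines the session ID and the dialogue turn.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Speaker(Enum):
    USER = "user"
    SYSTEM = "system"


@dataclass
class SemanticMark:
    session_id: str
    dialog_turn: int
    semantic_role: Speaker  # who was speaking at the moment of interruption
    intent_type: str        # core task category of the interrupted dialogue
    interrupt_state: str    # operation stage the system was executing
    break_point_id: str = field(init=False)

    def __post_init__(self) -> None:
        # Assumed format: session ID and dialogue turn joined by a colon.
        self.break_point_id = f"{self.session_id}:{self.dialog_turn}"


@dataclass
class Turn:
    role: Speaker
    content: str
    intent: str


@dataclass
class TaskInfo:
    task_id: str
    task_state: str
    progress: float


@dataclass
class ContextSnapshot:
    session_id: str
    break_point_id: str
    history: list[Turn]  # the last N dialogue turns
    task: TaskInfo
    user_id: str
    speaker: Speaker


def make_snapshot(mark: SemanticMark, full_history: list[Turn],
                  task: TaskInfo, user_id: str, n: int = 3) -> ContextSnapshot:
    """Build a snapshot whose fields are determined by the semantic mark."""
    last_n = full_history[-n:] if n > 0 else []
    return ContextSnapshot(mark.session_id, mark.break_point_id,
                           last_n, task, user_id, mark.semantic_role)


class SnapshotStore:
    """In-memory stand-in for snapshot storage, keyed by break point ID."""

    def __init__(self) -> None:
        self._by_break_point: dict[str, ContextSnapshot] = {}

    def save(self, snapshot: ContextSnapshot) -> None:
        self._by_break_point[snapshot.break_point_id] = snapshot

    def load(self, mark: SemanticMark) -> Optional[ContextSnapshot]:
        # Retrieval "according to the semantic mark information".
        return self._by_break_point.get(mark.break_point_id)
```

Keying the store on the break point ID means a restoration request only needs the semantic mark to find the matching snapshot, whether the request is active or passive.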
In some embodiments, the context reconstruction includes a semantic comp