CN-121998114-A - Quantum self-encoder model, video prediction method and related device
Abstract
The application discloses a quantum self-encoder model, a video prediction method and a related device, which belong to the technical field of quantum computation. The quantum self-encoder model comprises an encoder, an intermediate module and a decoder connected in series, each of which comprises a different variational quantum circuit. The encoder extracts features from an input target image frame, the target image frame being the image frame corresponding to the time immediately before a target time. The intermediate module generates an initial predicted image frame using the features extracted by the encoder and the relation between adjacent image frames constructed when predicting the time before the target time. The decoder reconstructs the initial predicted image frame to obtain the target predicted image frame at the target time. The embodiments of the application aim to improve the accuracy of video prediction.
Inventors
- ZHAO YONGJIE
Assignees
- 本源天工(郑州)量子科技有限公司
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2024-11-01
Claims (12)
- 1. A quantum self-encoder model, characterized in that the quantum self-encoder model comprises an encoder, an intermediate module and a decoder connected in series in that order, wherein the encoder, the intermediate module and the decoder each comprise a different variational quantum circuit; the encoder is configured to extract features of an input target image frame, wherein the target image frame is the image frame corresponding to the time immediately before a target time; the intermediate module is configured to generate an initial predicted image frame using the features extracted by the encoder and a relation between adjacent image frames constructed when predicting the time before the target time; and the decoder is configured to reconstruct the initial predicted image frame to obtain a target predicted image frame at the target time.
- 2. The model of claim 1, wherein, when both the target time and the time immediately before the target time are future times, the target data is the target predicted image frame obtained by prediction for the time immediately before the target time; and when the time immediately before the target time is not a future time, the target data is the image frame actually generated at that time.
- 3. The model of claim 1 or 2, wherein the encoder comprises a first convolutional neural network (CNN) layer, a variational quantum circuit layer and a second CNN layer connected in series in that order; the variational quantum circuit layer comprises a first quantum circuit and a second quantum circuit whose circuit structures differ; and the second CNN layer extracts features of fused data, the fused data being obtained by fusing the data output by the first quantum circuit and the second quantum circuit.
- 4. The model of claim 3, wherein the first quantum circuit comprises an encoding layer consisting of an H gate and a first RZ gate acting on each qubit, and a variational layer consisting of controlled modules; each controlled module comprises, in order, a CNOT gate, a second RZ gate, a first RX gate and a CNOT gate, the number of controlled modules is n(n-1)/2, and n is the number of qubits in the first quantum circuit.
- 5. The model of claim 3, wherein the second quantum circuit comprises, acting on each qubit in turn, an H gate, an RY gate, a second RX gate, a Y gate, a Z gate, and controlled RZZ and CNOT gates.
- 6. The model of claim 3, wherein the decoder comprises a first deconvolution neural network layer, the first quantum circuit, and a second deconvolution neural network layer, wherein one output of the first deconvolution neural network layer is connected to the input of the first quantum circuit, and the other output of the first deconvolution neural network layer is connected, through a residual structure, to the output of the first quantum circuit and to the input of the second deconvolution neural network layer.
- 7. The model of claim 1, wherein the intermediate module comprises M third quantum circuits and N fourth quantum circuits; the M third quantum circuits are configured to extract temporal features from first input data to obtain first feature data, where the first input data comprises the features extracted by the encoder and target temporal feature data, the target temporal feature data is the temporal feature data obtained by prediction based on the target data at the previous time, and the first feature data is the feature data extracted by the M third quantum circuits and input to the fourth quantum circuits; the N fourth quantum circuits are configured to extract spatial features from second input data to obtain second feature data, where the second input data comprises the first feature data and target spatial feature data, the target spatial feature data is the spatial feature data obtained by prediction based on the target data at the previous time, and the second feature data is the feature data used to generate the initial predicted image frame; and the relation between adjacent image frames comprises the target temporal feature data and the target spatial feature data.
- 8. The model of claim 7, wherein the third quantum circuit and the fourth quantum circuit are identical in structure, each comprising an encoding layer consisting of an H gate and a third RZ gate acting on each qubit, and a variational layer consisting of a CNOT gate, a U3 gate and a third RX gate (a minimal circuit sketch of this structure is given after the claims).
- 9. A video prediction method, the method comprising: obtaining a target image frame, wherein the target image frame is the image frame corresponding to the time immediately before a target time; and inputting the target image frame into a trained quantum self-encoder model according to any one of claims 1-8 to obtain a target predicted image frame.
- 10. A video prediction apparatus, the apparatus comprising: a first obtaining module, configured to obtain a target image frame, wherein the target image frame is the image frame corresponding to the time immediately before a target time; and a second obtaining module, configured to input the target image frame into the trained quantum self-encoder model according to any one of claims 1-8 to obtain a target predicted image frame.
- 11. A computer device comprising a memory storing a computer program and a processor which, when executing the computer program, implements the quantum self-encoder model of any one of claims 1-8 or the video prediction method of claim 9.
- 12. A computer readable storage medium having stored thereon a computer program which, when executed by a computer, causes the computer to implement the quantum self-encoder model of any one of claims 1-8 or to perform the video prediction method of claim 9.
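As a reference for the circuit structure named in claim 8, the following is a minimal, hypothetical sketch of the third/fourth quantum circuit written with PennyLane. Only the gate types and the encoding/variational layering come from the claim; the qubit count, the ring entangling pattern of the CNOT gates and the parameter shapes are assumptions.

```python
# Hypothetical sketch of the third/fourth quantum circuit of claim 8 (PennyLane).
# Qubit count, entangling pattern and parameter shapes are assumptions; the claim
# only fixes the gate types (H + RZ encoding, CNOT + U3 + RX variational layer).
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4  # assumption: the claim does not fix the register size
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def third_fourth_circuit(inputs, weights):
    # Encoding layer: an H gate and an RZ gate on every qubit,
    # with the classical features loaded as RZ rotation angles.
    for q in range(n_qubits):
        qml.Hadamard(wires=q)
        qml.RZ(inputs[q], wires=q)
    # Variational layer: CNOT entanglement (ring pattern assumed) followed by
    # a trainable U3 and RX rotation on each qubit.
    for q in range(n_qubits):
        qml.CNOT(wires=[q, (q + 1) % n_qubits])
    for q in range(n_qubits):
        qml.U3(*weights[q, :3], wires=q)
        qml.RX(weights[q, 3], wires=q)
    return [qml.expval(qml.PauliZ(q)) for q in range(n_qubits)]

# Example call with random features and parameters.
features = np.random.uniform(0, np.pi, n_qubits)
params = np.random.uniform(0, np.pi, (n_qubits, 4))
print(third_fourth_circuit(features, params))
```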
Description
Quantum self-encoder model, video prediction method and related device

Technical Field

The application belongs to the technical field of quantum computing, and particularly relates to a quantum self-encoder model, a video prediction method and a related device.

Background

Video prediction techniques require analyzing temporal dependencies in a video sequence, understanding the relationship between successive frames, and predicting future frames. With the development of automation technology, video prediction can be used as part of an intelligent system, such as a self-driving car or robot navigation, to realize more intelligent decisions and responses. In the field of safety monitoring, video prediction can give early warning of potential risks and threats, such as early recognition of suspicious behaviour or prediction of traffic accidents, thereby improving safety. In addition, in the entertainment and gaming industries, video prediction may provide a more immersive and interactive experience, such as adjusting the gaming environment in real time by predicting user actions. Video data contains rich information, and video prediction techniques can help analyze and understand the data, providing support for decision making. Because of the limited computational power of classical computing, existing video prediction models find it difficult to accurately capture dynamic changes and details, so the prediction result deviates from what actually occurs and the prediction accuracy is reduced.

Disclosure of Invention

The application aims to provide a quantum self-encoder model, a video prediction method and a related device that improve prediction accuracy. One embodiment of the present application provides a quantum self-encoder model comprising an encoder, an intermediate module and a decoder connected in series, each comprising a different variational quantum circuit. The encoder is configured to extract features of an input target image frame, wherein the target image frame is the image frame corresponding to the time immediately before a target time. The intermediate module is configured to generate an initial predicted image frame using the features extracted by the encoder and the relation between adjacent image frames constructed when predicting the time before the target time. The decoder is configured to reconstruct the initial predicted image frame to obtain a target predicted image frame at the target time.

Optionally, when both the target time and the time immediately before the target time are future times, the target data is the target predicted image frame obtained for the time immediately before the target time; when the time immediately before the target time is not a future time, the target data is the image frame actually generated at that time.

Optionally, the encoder comprises a first convolutional neural network (CNN) layer, a variational quantum circuit layer and a second CNN layer connected in series in that order; the variational quantum circuit layer comprises a first quantum circuit and a second quantum circuit whose circuit structures differ; and the second CNN layer extracts features of fused data, the fused data being obtained by fusing the data output by the first quantum circuit and the second quantum circuit.
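To make the encoder structure described above concrete, the following is a minimal, hypothetical sketch of the hybrid encoder (first CNN layer, two structurally different variational circuits, fusion of their outputs, second CNN layer) using PyTorch and PennyLane. The layer sizes, the qubit count, the encoding scheme, the stand-in variational layers and the element-wise-addition fusion rule are all assumptions not fixed by the text.

```python
# Hypothetical sketch of the encoder's hybrid structure: first CNN layer,
# two different variational circuits, fusion, second CNN layer.
import torch
import torch.nn as nn
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

def make_circuit(variant):
    @qml.qnode(dev, interface="torch")
    def circuit(inputs, weights):
        # Encoding layer: H and RZ on every qubit (assumed for both circuits).
        for q in range(n_qubits):
            qml.Hadamard(wires=q)
            qml.RZ(inputs[q], wires=q)
        if variant == "first":
            # Stand-in variational layer for the first circuit.
            for q in range(n_qubits):
                qml.RX(weights[q], wires=q)
        else:
            # Different structure stands in for the second circuit.
            for q in range(n_qubits):
                qml.RY(weights[q], wires=q)
        return [qml.expval(qml.PauliZ(q)) for q in range(n_qubits)]
    return circuit

class HybridEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn1 = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(2))   # -> 4 * 2 * 2 = 16 values
        self.proj = nn.Linear(16, n_qubits)                   # one rotation angle per qubit
        shape = {"weights": n_qubits}
        self.q1 = qml.qnn.TorchLayer(make_circuit("first"), shape)
        self.q2 = qml.qnn.TorchLayer(make_circuit("second"), shape)
        self.cnn2 = nn.Conv2d(1, 8, 1)                        # second CNN layer on fused data

    def forward(self, x):
        h = self.proj(self.cnn1(x).flatten(1))
        fused = self.q1(h) + self.q2(h)                       # fuse the two circuit outputs
        return self.cnn2(fused.view(-1, 1, 2, 2))             # assumes n_qubits reshapes to 2x2

model = HybridEncoder()
print(model(torch.rand(1, 1, 8, 8)).shape)
```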
Optionally, the first quantum circuit includes an encoding layer consisting of an H gate and a first RZ gate acting on each qubit, and a variational layer consisting of controlled modules; each controlled module comprises, in order, a CNOT gate, a second RZ gate, a first RX gate and a CNOT gate, the number of controlled modules is n(n-1)/2, and n is the number of qubits in the first quantum circuit.

Optionally, the second quantum circuit comprises, acting on each qubit in turn, an H gate, an RY gate, a second RX gate, a Y gate, a Z gate, and controlled RZZ and CNOT gates.

Optionally, the decoder comprises a first deconvolution neural network layer, the first quantum circuit, and a second deconvolution neural network layer, wherein one output of the first deconvolution neural network layer is connected to the input of the first quantum circuit, and the other output of the first deconvolution neural network layer is connected, through a residual structure, to the output of the first quantum circuit and to the input of the second deconvolution neural network layer.

Optionally, the intermediate module includes M third quantum circuits and N fourth quantum circuits; the M third quantum circuits are configured to extract temporal features from first input data to obtain first feature data, where the first input data includes the features extracted by the encoder and target temporal feature data, the target temporal feature data is the temporal feature data obtained by prediction based on the target data at the previous time, and the first feature data is the feature data extracted by the M third quantum circuits and input to the fourth quantum circuits.
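The first quantum circuit described above (encoding layer of H and RZ gates on every qubit, variational layer of n(n-1)/2 controlled modules, each consisting of a CNOT, RZ, RX and CNOT gate) can be sketched as follows with PennyLane. The pairing order of the qubits and the choice of which qubit in each pair carries the rotations are assumptions, since the text does not specify them.

```python
# Hypothetical PennyLane sketch of the first quantum circuit: H + RZ encoding,
# then n(n-1)/2 controlled modules (CNOT, RZ, RX, CNOT), one per qubit pair.
import itertools
import pennylane as qml
from pennylane import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)
pairs = list(itertools.combinations(range(n_qubits), 2))  # n(n-1)/2 pairs

@qml.qnode(dev)
def first_circuit(inputs, weights):
    # Encoding layer: H and RZ on every qubit, features as rotation angles.
    for q in range(n_qubits):
        qml.Hadamard(wires=q)
        qml.RZ(inputs[q], wires=q)
    # Variational layer: one controlled module per qubit pair
    # (rotations placed on the target qubit by assumption).
    for m, (c, t) in enumerate(pairs):
        qml.CNOT(wires=[c, t])
        qml.RZ(weights[m, 0], wires=t)
        qml.RX(weights[m, 1], wires=t)
        qml.CNOT(wires=[c, t])
    return [qml.expval(qml.PauliZ(q)) for q in range(n_qubits)]

# Example call with random features and parameters.
features = np.random.uniform(0, np.pi, n_qubits)
params = np.random.uniform(0, np.pi, (len(pairs), 2))
print(first_circuit(features, params))
```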