CN-121259674-B - Virtual reality photography composition optimization method and system based on layered reinforcement learning
Abstract
The invention discloses a virtual reality photography composition optimization method and system based on layered reinforcement learning. The method comprises: acquiring, in real time, image frames captured by a virtual camera in a virtual environment; and inputting the image frames into a pre-trained hierarchical reinforcement learning model comprising a high-level decision module and a low-level execution module. The high-level decision module performs high-level feature extraction and abstraction on the image frames to generate high-level guidance information; the low-level execution module extracts low-level features of the image frames and, in combination with the high-level guidance information, generates a motion control instruction for the virtual camera; the motion control instruction is then sent back to the virtual environment to drive the virtual camera to move, realizing real-time optimization of the photographic composition. The invention integrates aesthetic evaluation, hierarchical decision-making, and natural interaction in a real virtual engine environment, realizing an efficient, intelligent, and real-time photographic composition optimization solution.
Inventors
- Wu Bolun
- Zhang Liaoruxing
- Jiang Shurong
- Jin Xin
Assignees
- Beijing Electronic Science and Technology Institute (北京电子科技学院)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-10-27
Claims (7)
- 1. A virtual reality photography composition optimization method based on hierarchical reinforcement learning, characterized by comprising the following steps: S1, acquiring image frames captured in real time by a virtual camera in a virtual environment; S2, inputting the image frames into a pre-trained hierarchical reinforcement learning model, wherein the hierarchical reinforcement learning model comprises a high-level decision module and a low-level execution module; S3, the high-level decision module performing high-level feature extraction and abstraction on the image frames to generate high-level guidance information, comprising: extracting features of the image frames with a feature extraction network to obtain a first feature representation; applying channel attention weights to the first feature representation to perform channel recalibration, obtaining a first optimized feature; and generating, based on the first optimized feature, feature channel modulation parameters and a state sub-goal vector for the virtual camera, wherein the feature channel modulation parameters are used for channel-level conditional modulation of the low-level features, and the state sub-goal vector serves as a conditional input to the low-level execution module to guide it to produce action outputs consistent with the sub-goal direction; S4, the low-level execution module extracting low-level features of the image frames and, in combination with the high-level guidance information, generating a motion control instruction for the virtual camera, comprising: extracting features of the image frames with a feature extraction network to obtain a second feature representation; applying channel attention weights to the second feature representation to perform channel recalibration, obtaining a second optimized feature; applying FiLM modulation to the second optimized feature using the feature channel modulation parameters to obtain a conditional feature; feeding the conditional feature into an LSTM unit, retaining the hidden state of the last time step, and outputting the motion control instruction after smoothing; and S5, sending the motion control instruction back to the virtual environment to drive the virtual camera to move, thereby realizing real-time optimization of the photographic composition.
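The channel recalibration used in both S3 and S4 follows the squeeze-and-excitation pattern: globally average-pool each channel, pass the pooled vector through a small gating network, and rescale each channel by the resulting weight. The following pure-Python sketch is illustrative only and not part of the claims; the two-layer ReLU/sigmoid gate is an assumption, since the claim does not fix the gating network's form:

```python
import math

def channel_recalibrate(feat, w1, w2):
    """SE-style channel recalibration (illustrative sketch).

    feat: list of C channels, each a flat list of spatial activations.
    w1:   reduction weights (rows of length C), w2: expansion weights
          (C rows, each of length equal to the reduced dimension).
    Returns the feature with each channel rescaled by its learned gate.
    """
    # Squeeze: global average pooling per channel.
    pooled = [sum(ch) / len(ch) for ch in feat]
    # Excitation: two-layer gate, ReLU then sigmoid, yielding one gate per channel.
    hidden = [max(0.0, sum(p * w for p, w in zip(pooled, row))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(h * w for h, w in zip(hidden, row))))
             for row in w2]
    # Recalibrate: scale every channel by its gate in (0, 1).
    return [[v * g for v in ch] for ch, g in zip(feat, gates)]
```

In practice the gate weights `w1`/`w2` would be learned jointly with the rest of the policy network; here they are plain nested lists to keep the sketch self-contained.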
- 2. The method for optimizing virtual reality photography composition based on hierarchical reinforcement learning as claimed in claim 1, wherein generating the feature channel modulation parameters in S3 comprises inputting the first optimized feature into a multi-layer perceptron to generate the feature channel modulation parameters in parallel, wherein the feature channel modulation parameters comprise a scaling factor and a bias term for applying a linear-transformation modulation to each channel of the low-level features.
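The per-channel linear modulation of claim 2 is the FiLM (feature-wise linear modulation) operation: parallel heads emit a scaling factor gamma and a bias beta for each low-level channel, and each channel is transformed as gamma * x + beta. An illustrative sketch, not part of the claims; the single-linear-layer heads are an assumption, as the claim only says the parameters are generated "in parallel":

```python
def film_params(pooled_high_feat, w_gamma, w_beta):
    """Two parallel linear heads map the high-level feature to (gamma, beta).

    pooled_high_feat: flat vector summarizing the first optimized feature.
    w_gamma / w_beta: one weight row per low-level channel.
    """
    gamma = [sum(x * w for x, w in zip(pooled_high_feat, row)) for row in w_gamma]
    beta = [sum(x * w for x, w in zip(pooled_high_feat, row)) for row in w_beta]
    return gamma, beta

def film_modulate(low_feat, gamma, beta):
    """FiLM: channel-wise affine transform gamma_c * x + beta_c."""
    return [[g * v + b for v in ch] for ch, g, b in zip(low_feat, gamma, beta)]
```

Because gamma and beta depend on the high-level module's output, the low-level features are conditioned on the current sub-goal without any architectural change to the low-level feature extractor.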
- 3. The method for optimizing virtual reality photography composition based on hierarchical reinforcement learning according to claim 1, wherein the motion control instruction of the virtual camera is a continuous six-degree-of-freedom camera motion vector, and the continuous six-degree-of-freedom camera motion vector corresponds to an instantaneous motion speed or displacement of the virtual camera in six degrees of freedom.
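The continuous six-degree-of-freedom action of claim 3 can be read as a vector (vx, vy, vz, ωyaw, ωpitch, ωroll) of instantaneous velocities integrated over one frame interval. A hypothetical integration step, illustrative only; Euler integration and this axis ordering are assumptions not fixed by the claim:

```python
def apply_motion(pose, action, dt):
    """Integrate a 6-DoF velocity action over one frame interval.

    pose:   (x, y, z, yaw, pitch, roll) of the virtual camera.
    action: (vx, vy, vz, wyaw, wpitch, wroll) instantaneous velocities.
    dt:     frame interval in seconds.
    """
    return tuple(p + a * dt for p, a in zip(pose, action))
```

A real engine integration would instead hand the velocity vector to the engine's camera transform API; simple per-axis Euler integration is used here only to make the action semantics concrete.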
- 4. The method for optimizing virtual reality photography composition based on hierarchical reinforcement learning as claimed in claim 1, wherein the hierarchical reinforcement learning model is trained by maximizing a preset composite reward function, and the composite reward function at least comprises: an aesthetic reward term whose value is positively correlated with the aesthetic evaluation score of the image frame; an action smoothness term whose value is inversely correlated with the magnitude of the motion control instruction; and a dynamic speed adjustment term configured to adaptively adjust the penalty on the magnitude of the motion control instruction according to the current aesthetic evaluation score.
- 5. The method for optimizing virtual reality photography composition based on hierarchical reinforcement learning as claimed in claim 4, wherein the dynamic speed adjustment term is configured as r_dyn = -δ · λ · ‖a‖, wherein λ is a scale factor positively correlated with the aesthetic evaluation score of the current frame, λ ∈ [0, 1], δ is a weight coefficient, and a is the motion control vector of the virtual camera.
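Claims 4 and 5 together describe a composite reward combining an aesthetic term, a smoothness penalty, and an aesthetic-score-gated speed penalty: the better the current frame scores, the more large camera motions are penalized. An illustrative sketch, not part of the claims; the linear combination, the Euclidean action norm, and the choice λ = aesthetic score are assumptions, as the claims fix only the signs and correlations:

```python
import math

def composite_reward(aesthetic_score, action, delta=0.1, w_smooth=0.05):
    """Composite reward sketch for training the hierarchical RL model.

    aesthetic_score: evaluation of the current frame, assumed in [0, 1].
    action:          6-DoF motion control vector a.
    """
    magnitude = math.sqrt(sum(a * a for a in action))  # ||a||
    r_aesthetic = aesthetic_score          # positively correlated with score
    r_smooth = -w_smooth * magnitude       # inversely correlated with |action|
    lam = aesthetic_score                  # scale factor lambda in [0, 1]
    r_dynamic = -delta * lam * magnitude   # dynamic speed adjustment term
    return r_aesthetic + r_smooth + r_dynamic
```

With this gating, the agent is free to move quickly while the composition is still poor, but is pushed toward small, stable adjustments once the frame already scores well.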
- 6. The method of claim 1, wherein the high-level decision module synchronizes the high-level guidance information to the low-level execution module at a preset period of steps.
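The fixed-period synchronization of claim 6 means the high-level guidance (modulation parameters and sub-goal) is recomputed only every K steps, while the low-level controller acts on every frame using the most recent guidance. An illustrative loop, not part of the claims; the period `k` and the caching of the last guidance between refreshes are assumptions:

```python
def run_episode(frames, high_policy, low_policy, k=8):
    """Low-level acts every frame; high-level guidance refreshes every k frames."""
    guidance = None
    actions = []
    for t, frame in enumerate(frames):
        if t % k == 0:                       # preset period: resync guidance
            guidance = high_policy(frame)
        actions.append(low_policy(frame, guidance))
    return actions
```

Running the high-level module at a lower rate keeps the per-frame cost close to that of the low-level controller alone, which matters for the high-frame-rate, low-latency setting the description targets.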
- 7. A virtual reality photography composition optimization system based on the hierarchical-reinforcement-learning virtual reality photography composition optimization method as claimed in any one of claims 1 to 6, comprising: a data acquisition module configured to acquire image frames captured in real time by a virtual camera in a virtual environment; a hierarchical decision module communicatively connected to the data acquisition module and comprising a pre-trained high-level decision module and a pre-trained low-level execution module, the high-level decision module being configured to perform high-level feature extraction and abstraction on the image frames and generate high-level guidance information, and the low-level execution module being configured to extract low-level features of the image frames and, in combination with the high-level guidance information, generate a motion control instruction for the virtual camera; and a control execution module communicatively connected to the hierarchical decision module and configured to send the motion control instruction back to the virtual environment to drive the virtual camera to move, thereby realizing real-time optimization of the photographic composition.
Description
Virtual reality photography composition optimization method and system based on layered reinforcement learning Technical Field The invention relates to the technical field of computer vision processing, and in particular to a virtual reality photography composition optimization method and system based on hierarchical reinforcement learning, suitable for photography automation and intelligent guidance systems in virtual engine environments. Background Virtual Reality (VR), Augmented Reality (AR), and the "metaverse" concepts built on them are increasingly becoming new infrastructure in digital entertainment, virtual photography, architectural design, and interactive education. In these three-dimensional virtual spaces, the quality of the visual presentation directly determines the user's immersion and satisfaction. As in real-world photography, the selection of the camera's viewpoint and the composition of the shot are the core factors determining the aesthetic quality and narrative tension of the picture. However, obtaining an ideal photographic composition in current virtual environments still faces significant challenges, mainly due to the following technical limitations: first, current three-dimensional cameras rely largely on manual control, requiring the user to possess photographic knowledge and perform complex operations, which reduces the efficiency and experience of content creation. Second, current mainstream composition optimization methods mainly target static images, mostly through offline cropping, and struggle to meet the real-time feedback and high-interactivity requirements of three-dimensional scenes. Although reinforcement learning has been studied for automatic cropping and unmanned aerial vehicle path planning, these approaches are mostly limited to simplified simulation environments, are difficult to deploy in practical application scenarios demanding high frame rates, low latency, and multi-modal interaction, and have limited adaptability to complex new scenes. Therefore, there is a need for a method and system that implements intelligent photographic composition in a real application environment, can be deeply integrated into a modern virtual engine, and has hierarchical decision capabilities like a human photographer. Disclosure of Invention In view of the above, the invention provides a virtual reality photography composition optimization method and system based on hierarchical reinforcement learning, which integrates aesthetic evaluation, hierarchical decision-making, and natural interaction in a real virtual engine environment to realize an efficient, intelligent, and real-time photographic composition optimization solution. In order to achieve the above purpose, the present invention adopts the following technical scheme. The invention first provides a virtual reality photography composition optimization method based on hierarchical reinforcement learning, comprising the following steps: S1, acquiring image frames captured in real time by a virtual camera in a virtual environment; S2, inputting the image frames into a pre-trained hierarchical reinforcement learning model, wherein the hierarchical reinforcement learning model comprises a high-level decision module and a low-level execution module; S3, the high-level decision module performing high-level feature extraction and abstraction on the image frames to generate high-level guidance information; S4, the low-level execution module extracting low-level features of the image frames and, in combination with the high-level guidance information, generating a motion control instruction for the virtual camera; and S5, sending the motion control instruction back to the virtual environment to drive the virtual camera to move, thereby realizing real-time optimization of the photographic composition.
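The S1-S5 steps form a per-frame control loop: capture a frame, run the hierarchical model, and actuate the camera. A minimal sketch of that loop; the engine interface methods `capture_frame` and `apply_action` are hypothetical placeholders, not APIs named in the disclosure:

```python
def optimize_composition(env, model, num_frames):
    """S1-S5 as a real-time loop: capture, infer, actuate.

    env:   object exposing hypothetical capture_frame() / apply_action(a) hooks
           into the virtual engine.
    model: hierarchical policy mapping an image frame to a 6-DoF action.
    """
    for _ in range(num_frames):
        frame = env.capture_frame()    # S1: real-time frame acquisition
        action = model(frame)          # S2-S4: hierarchical inference
        env.apply_action(action)       # S5: drive the virtual camera
```

In an actual engine integration this loop would be driven by the engine's tick callback rather than a fixed-count `for` loop, so inference stays synchronized with rendering.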
Preferably, the step of generating the high-level guidance information in S3 comprises: extracting features of the image frames with a feature extraction network to obtain a first feature representation; applying channel attention weights to the first feature representation to perform channel recalibration, obtaining a first optimized feature; and generating, based on the first optimized feature, feature channel modulation parameters and a state sub-goal vector for the virtual camera, wherein the feature channel modulation parameters are used for channel-level conditional modulation of the low-level features, and the state sub-goal vector serves as a conditional input to the low-level execution module to guide it to produce action outputs consistent with the sub-goal direction. Preferably, the first optimized feature is input into a multi-layer perceptron to generate the feature channel modulation parameters in parallel, wherein the feature channel modulation parameters comprise a scaling