
CN-121982611-A - Instructive video process planning method based on cascade stacking generator and flowsheet optimization

CN 121982611 A

Abstract

The invention discloses a guided video process planning method based on a cascade stacking generator and flow graph optimization, comprising the following steps: 1, acquiring a process planning data set and a corresponding action step truth-value set A, and dividing the data set into a training set and a test set; 2, constructing a guided video process planning network structure based on the cascade stacking generator and flow graph optimization; 3, constructing a loss function; 4, setting training parameters, feeding the training set into the network structure in batches, computing the loss function against the action step truth-value set corresponding to the training set, and training to obtain a network model; 5, feeding the test set into the trained network model to output the planning results. The method addresses the insufficient planning precision of prior guided video process planning caused by low differentiation among intermediate steps, weak correlation between states, and poor temporal consistency.

Inventors

  • ZHAO FAN
  • WANG YIWEN
  • YAO ZHUO
  • ZHANG LETIAN
  • GONG JUNHAO
  • WANG JING

Assignees

  • 西安理工大学 (Xi'an University of Technology)

Dates

Publication Date
2026-05-05
Application Date
2026-01-27

Claims (10)

  1. A guided video process planning method based on cascade stack generator and flowsheet optimization, characterized by comprising the following steps: Step 1, acquiring a process planning data set and a corresponding action step truth-value set A, and dividing the data set into a training set and a test set; Step 2, constructing a guided video process planning network structure based on the cascade stack generator and flowsheet optimization; Step 3, constructing a loss function; Step 4, setting training parameters, feeding the training set into the network structure in batches, and computing the loss function from the action step truth-value set corresponding to the training set, thereby training to obtain a network model; Step 5, feeding the test set into the trained network model and outputting the corresponding planning results.
  2. The guided video process planning method based on cascade stack generator and flowsheet optimization of claim 1, wherein step 1 specifically comprises: downloading, from the official website of a video action process planning data set, the data set V of start and target states of video action sequences together with the corresponding action step truth-value set A, where V contains the start frame and end frame images of each of its N action sequences; A contains the truth set of action steps in the nth action sequence, T is the total number of action steps, and the tth action step in the nth action sequence consists of a text description of the action step and a class label; the data set is divided by a numerical proportion into a training set and a test set, with start and end frame images of the pth action sequence in the training set and of the qth action sequence in the test set; A is divided by the same numerical proportion into the action step truth-value set corresponding to the training set and the action step truth-value set corresponding to the test set, i.e., the truth sets of action steps in the pth and qth action sequences, respectively.
  3. The guided video process planning method based on cascade stack generator and flowsheet optimization of claim 2, wherein the guided video process planning network structure of step 2 comprises an input module, a feature extraction module, a conditional cascade generation module and a prediction module connected in sequence; the conditional cascade generation module comprises a one-stage condition generation sub-module and a two-stage condition generation sub-module connected in sequence; each of the one-stage and two-stage condition generation sub-modules comprises a conditional feature enhancement unit, a flow graph planning generator and a discriminator connected in sequence.
  4. The guided video process planning method based on cascade stack generator and flowsheet optimization of claim 3, wherein the workflow of the network structure is as follows: step 2.1, the input module extracts from the training set the start frame and end frame image pair of the pth action sequence, the two images each being of a given dimension; step 2.2, the feature extraction module feeds the start frame image and the end frame image into a pre-trained image encoder to extract their features, obtaining the initial motion state feature and the target motion state feature; step 2.3, the two features are sent to the one-stage condition generation sub-module to obtain a generated intermediate state sequence O1, and O1 is then sent to the two-stage condition generation sub-module, which outputs the final intermediate state sequence O2 of the motion; step 2.4, the concatenated features of O2 and the state features are input to the prediction module, which outputs a predicted sequence of action steps.
  5. The guided video process planning method based on cascade stack generator and flowsheet optimization of claim 4, wherein the workflow of the one-stage condition generation sub-module of step 2.3 is: the motion start state feature and the target state feature are concatenated to obtain the initial state feature, which is input to the conditional feature enhancement unit to output a sequence of intermediate visual action states; the action state sequence composed of these states is input into the flow graph planning generator, which outputs the one-stage generated ordered intermediate state sequence O1; O1 is then input into the discriminator D, which outputs the data source identifier of O1. The workflow of the two-stage condition generation sub-module is: the motion start state feature, the one-stage generated intermediate state sequence O1 and the action target state feature are concatenated to obtain the motion sequence state feature, which is input to the conditional feature enhancement unit to output a sequence of intermediate visual action states; these are input into the flow graph planning generator, which outputs the two-stage generated intermediate state sequence O2; O2 is input to the discriminator D, which outputs the data source identifier of O2.
  6. The guided video process planning method based on cascade stack generator and flowsheet optimization according to claim 5, wherein the specific working procedures of the one-stage and two-stage condition generation sub-modules in step 2.3 are as follows:

Step 2.3.1, the conditional feature enhancement unit proceeds as follows: (1) the input variable is fed into a multi-layer perceptron MLP consisting of fully connected layers and activation functions, which outputs a vector; the input variables in the one-stage and two-stage condition generation sub-modules are the initial state feature and the motion sequence state feature, respectively; (2) the mean, log variance and standard deviation of the output are computed; (3) the conditional variable is generated by multiplying Gaussian white noise, distributed according to the standard normal distribution, by the standard deviation and adding the mean, yielding the ith intermediate visual state.

Step 2.3.2, the flow graph planning generator comprises two steps, flow graph construction and flow graph optimization, specifically: (1) flow graph construction: the flow graph is built from T segments of sub-paths, the initial node and the target node of the total path being the start state and the target state, respectively; in the one-stage condition generation sub-module, the tth sub-path set (t = 1, …, T) is defined over candidate intermediate states, any path from the initial node to the target node via the T sub-path segments is recorded, and the sequence of nodes appearing along that path is recorded; the two-stage condition generation sub-module defines its sub-path sets, paths and node sequences in the same way, and the node sequences of both sub-modules are denoted uniformly below; (2) flow graph optimization: A. computing the rating score of a path: ① computing the differential attention score of the node sequence in the path: the node sequence corresponding to the path is fed into a differential attention network module, which fuses global context information with information on dynamic changes between features and outputs the enhanced feature sequence together with the differential attention score; the differential attention score is then normalized and averaged, the normalization calling the normalize() function and the averaging calling the mean() function of the PyTorch library, giving the processed differential attention score; the differential attention network module is realized with the MultiheadDiffAttn module of the code library published on GitHub; ② computing the mutual attention score between the middle nodes and the head and tail nodes of the path: the head and tail node features of the node sequence are assigned to the key and value variables K and V, the intermediate node feature sequence is assigned to the query variable Q, and Q, K and V are sent as inputs to a mutual attention network module, which outputs a dynamically weighted, relation-driven enhanced feature sequence rich in global context information together with the mutual attention score; the mutual attention score is normalized and averaged as in step ① and complemented to obtain the processed mutual attention score; the mutual attention network module is realized with the torch.nn.MultiheadAttention module of the open-source machine learning library PyTorch; ③ computing the smoothness score of the node features in the path: over the enhanced node feature sequence of the path, the cosine similarity function is first used to compute the correlation between each pair of adjacent node features, and the average of all similarity scores is then taken as the smoothness score of the path; ④ computing the composite score of the path from the scores above; B. selecting the best path in the path set by its composite score, and extracting the intermediate feature sequence of that path as the flow graph path planning result, recorded as the variable O, where O1 and O2 correspond to the one-stage and two-stage condition generation sub-modules, respectively.

Step 2.3.3, the flow graph path planning result O is input into the discriminator, which outputs a data source identifier, i.e., judges whether the input data was generated from the flow graph or comes from a real sample; the discriminator D consists of an MLP network module and a Sigmoid activation function, the MLP network module consisting of three fully connected layers, a ReLU activation layer and a 1×1 convolution layer.
  7. The guided video process planning method based on cascade stack generator and flowsheet optimization of claim 6, wherein step 2.4 specifically comprises: the initial state feature, the two-stage generated intermediate state sequence O2 and the end state feature are concatenated and fed as input to the prediction module, which outputs the predicted action sequence; the prediction module consists of an average pooling layer, a fully connected layer, a Dropout layer and a Softmax normalization layer.
  8. The guided video process planning method based on cascade stack generator and flowsheet optimization of claim 7, wherein the loss function of step 3 consists of a generation loss, a discrimination loss, a state supervision loss and an action classification loss; the generation loss consists of the generation loss of the flow graph planning generator in the one-stage condition generation sub-module and that of the flow graph planning generator in the two-stage condition generation sub-module; the discrimination loss consists of the discrimination loss of the discriminator in the one-stage condition generation sub-module and that of the discriminator in the two-stage condition generation sub-module; in the corresponding formulas, the loss of the ith-stage discriminator uses the discriminator mapping function of the ith stage; the expectation operator denotes the expected value of the intermediate state feature sequence sampled from the sample data distribution generated in each stage, and likewise the expected value of the condition-enhanced feature sequence sampled from the real data distribution; the remaining terms comprise the true-value representation of the intermediate-step text features, a KL divergence loss together with its weight coefficient, the feature sequence output by the conditional feature enhancement unit, the tth feature in the two-stage generated intermediate state feature sequence, the action category truth labels of the intermediate steps, and the predicted action probabilities of the intermediate steps.
  9. The guided video process planning method based on cascade stack generator and flowsheet optimization of claim 8, wherein step 4 specifically comprises: step 4.1, setting the training parameters of the guided video process planning network structure, the training parameters comprising the total number of iterations, the learning rate, the number of data samples used per training batch, the minimum change value of the target loss function, the training iteration counter and the network optimizer; step 4.2, randomly reading samples from the training data set and feeding them into the network structure for training; step 4.3, computing the absolute difference of the loss function between two consecutive training iterations; if this difference is below the minimum change value, or the total number of iterations has been reached, training ends and the network model is output; otherwise the optimizer reversely revises the weight coefficients of each network layer, and steps 4.2 and 4.3 are repeated.
  10. The guided video process planning method based on cascade stack generator and flowsheet optimization of claim 9, wherein step 5 specifically comprises: step 5.1, inputting the test set, defining the total number of videos, a video counting variable q and the process planning result set, and initializing q = 1 and the result set to the empty set; step 5.2, feeding the qth video Vq of the test set into the network model to obtain a prediction result; step 5.3, adding the prediction result into the result set; step 5.4, judging whether q is smaller than the total number of videos; if so, incrementing q and returning to step 5.2; otherwise, outputting the result set.
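The claims above disclose no source code. As a hedged illustration only, the Step 1 split of claim 2, where the data set and truth set A are divided by a common numerical proportion, can be sketched in plain Python; the function name and the 0.8 ratio are assumptions, not values from the patent:

```python
def split_dataset(sequences, truth_sets, train_ratio=0.8):
    """Split parallel lists of (start, end) frame pairs and their
    action-step truth sets into train/test portions by one ratio,
    mirroring the common numerical proportion of claim 2."""
    assert len(sequences) == len(truth_sets)
    n_train = int(len(sequences) * train_ratio)
    train = (sequences[:n_train], truth_sets[:n_train])
    test = (sequences[n_train:], truth_sets[n_train:])
    return train, test
```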
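The two-stage cascade of claims 4 and 5, in which stage one proposes an intermediate state sequence O1 from the start/goal features and stage two refines it into O2 conditioned additionally on O1, can be sketched as follows; the stage functions here are placeholders for the sub-modules, not the patent's implementation:

```python
def cascade_generate(stage1, stage2, start_feat, goal_feat):
    """Sketch of the conditional cascade of step 2.3: stage one sees
    (start, goal); stage two sees (start, O1, goal) and outputs O2."""
    o1 = stage1(start_feat, goal_feat)          # one-stage intermediate sequence
    o2 = stage2(start_feat, o1, goal_feat)      # two-stage refined sequence
    return o1, o2
```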
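Claim 6 describes, inside the conditional feature enhancement unit, sampling a conditional variable as standard-normal noise times the standard deviation plus the mean (the usual reparameterization trick), and, inside flow graph optimization, a smoothness score that averages cosine similarities of adjacent node features. A framework-free sketch of both, with all names assumed:

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """z_i = mu_i + sigma_i * eps with eps ~ N(0, 1), as in step 2.3.1(3)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def smoothness_score(nodes):
    """Average cosine similarity of adjacent node features along a path,
    as in step 2.3.2(2)③."""
    sims = [cosine_similarity(nodes[i], nodes[i + 1])
            for i in range(len(nodes) - 1)]
    return sum(sims) / len(sims)
```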
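The prediction module of claim 7 pools the concatenated sequence, applies a fully connected layer with Dropout, and normalizes with Softmax. The two numerical pieces that are fully determined by the claim, average pooling over the sequence and Softmax, can be sketched without a deep learning framework:

```python
import math

def average_pool(features):
    """Mean over the sequence dimension: a list of equal-length feature
    vectors is reduced to one vector (the average pooling layer)."""
    t = len(features)
    return [sum(f[d] for f in features) / t for d in range(len(features[0]))]

def softmax(logits):
    """Numerically stable Softmax normalization over class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]
```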
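Claim 8 includes a weighted KL divergence loss on the output of the conditional feature enhancement unit. The patent's exact formulas are not reproduced in this text; the closed form below is the standard Gaussian KL against N(0, 1) commonly paired with the reparameterization of claim 6, and is offered only as an assumed stand-in:

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form:
    -0.5 * sum(1 + log_var - mu^2 - exp(log_var))."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))
```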
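The stopping rule of claim 9, step 4.3 — end training when the absolute difference between consecutive losses falls below a minimum change value, or when the iteration budget is exhausted — can be sketched as a loop; `step_fn` stands in for one batch-training iteration and is assumed to return the current loss:

```python
def train_with_early_stop(step_fn, max_iters, min_delta):
    """Run step_fn up to max_iters times; stop early once
    |loss_t - loss_{t-1}| < min_delta. Returns (iteration, loss).
    Assumes max_iters >= 1."""
    prev_loss = None
    for it in range(1, max_iters + 1):
        loss = step_fn(it)
        if prev_loss is not None and abs(loss - prev_loss) < min_delta:
            return it, loss
        prev_loss = loss
    return max_iters, loss
```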
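Finally, the test-phase loop of claim 10 (initialize an empty result set, run each test video through the trained model in order, collect each prediction) amounts to the following sketch, where `model` is any callable standing in for the trained network:

```python
def plan_test_set(model, test_videos):
    """Step 5 loop: counter q walks the test videos; each prediction
    is appended to the (initially empty) planning result set."""
    results = []
    q = 1
    while q <= len(test_videos):
        results.append(model(test_videos[q - 1]))
        q += 1
    return results
```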

Description

Instructive video process planning method based on cascade stacking generator and flowsheet optimization

Technical Field

The invention belongs to the technical field of image processing methods, and relates to a guided video process planning method based on a cascade stacking generator and flow graph optimization.

Background

Instructional video is a unique carrier fusing temporal steps with rich visual information, and research on action prediction and process planning over it is important for improving the skill imitation capability and complex task execution capability of robots. Currently, machine learning agents still face challenges in such tasks, including high-dimensional noise interference in visual data, the flexible combinability of motion sequences, and deficiencies in modeling human decision logic, which make it difficult to distill general planning patterns. In addition, problems such as low differentiation among intermediate steps, weak state correlation and poor temporal consistency further limit planning precision. Against this background, process planning research oriented to instructional video breaks through the limitations of traditional action perception by introducing techniques such as cascade stack generators and flow graph optimization to model the inherent association between actions and states and to predict multi-path execution strategies. This research has important application value for promoting intelligent cooperation of robots in scenes such as industrial manufacturing and home services, and profound theoretical significance for reducing the human-machine cognition gap.
Disclosure of Invention

The invention aims to provide an instructive video process planning method based on cascade stack generator and flow graph optimization, which solves the problems of insufficient planning precision in prior guided video process planning caused by low differentiation among intermediate steps, weak correlation between states and poor temporal continuity.

The technical scheme adopted by the invention is a guided video process planning method based on cascade stacking generator and flow graph optimization, implemented according to the following steps: Step 1, acquiring a process planning data set and a corresponding action step truth-value set A, and dividing the data set into a training set and a test set; Step 2, constructing a guided video process planning network structure based on cascade stack generator and flow graph optimization; Step 3, constructing a loss function; Step 4, setting training parameters, feeding the training set into the network structure in batches, computing the loss function from the action step truth-value set corresponding to the training set, and thereby training to obtain a network model; Step 5, feeding the test set into the trained network model and outputting the corresponding planning results.

Further, step 1 specifically comprises: downloading, from the official website of a video action process planning data set, the data set of start and target states of video action sequences together with the corresponding action step truth-value set, where V contains the start frame and end frame images of each of its N action sequences; A contains the truth set of action steps in the nth action sequence, T is the total number of action steps, and the tth action step in the nth action sequence consists of a text description of the action step and a class label; the data set is divided by a numerical proportion into a training set and a test set, with start and end frame images of the pth action sequence in the training set and of the qth action sequence in the test set; A is divided by the same numerical proportion into the action step truth-value sets corresponding to the training set and to the test set, i.e., the truth sets of action steps in the pth and qth action sequences, respectively.

Further, in step 2, the guided video process planning network structure based on cascade stack generator and flow graph optimization comprises an input module, a feature extraction module, a conditional cascade generation module and a prediction module connected in sequence; the conditional cascade generation module comprises a one-stage condition generation sub-module and a two-stage condition generation sub-module which are sequentially connected