
KR-102961234-B1 - METHOD FOR ESTIMATING MOTION OF OBJECT BASED ON VISION SENSOR AND OBJECT MOTION ESTIMATING DEVICE USING THE SAME

KR 102961234 B1

Abstract

The present invention relates to a method for estimating the motion of an object based on a vision sensor, comprising: (a) a step in which, when vision sensor data, including a raw object relative distance, which is the distance from the vehicle to the object, and an object heading angle, which is the heading angle of the object, detected through a deep-learning-based detection model from at least one image captured by at least one camera installed on the vehicle, and vehicle system data detected by a system of the vehicle are sequentially acquired, the object motion estimation device inputs the t-th vision sensor data and the t-th vehicle system data corresponding to the t-th image frame, which is the current image frame, into a data preprocessing module, thereby causing the data preprocessing module to preprocess the t-th vision sensor data and the t-th vehicle system data to generate a t-th object motion estimation vector for estimating the motion of the object; and (b) a step in which the object motion estimation device inputs the t-th object motion estimation vector into a deep-learning-based sequential regression network and causes the sequential regression network to apply a recurrent learning operation to the t-th object motion estimation vector to generate t-th predicted object motion data that predicts the motion of the object corresponding to the t-th image frame.

Inventors

  • Il-hwa Kim (김일화)
  • Dong-geol Yang (양동걸)

Assignees

  • StradVision, Inc. (주식회사 스트라드비젼)

Dates

Publication Date
2026-05-07
Application Date
2025-05-20

Claims (20)

  1. In a method for estimating object motion based on a monocular camera, a method comprising: (a) a step in which, when camera acquisition data, including a raw object relative distance, which is the distance from the vehicle to the object, and an object heading angle, which is the heading angle of the object, detected through a deep-learning-based detection model from at least one image captured by a monocular camera installed on the vehicle, wherein the detection model is a deep learning model trained to detect an object in the image and output the bounding box, class information, raw object relative distance, and object heading angle of the detected object, and vehicle system data detected by the vehicle system are sequentially acquired, the object motion estimation device inputs the t-th camera acquisition data and the t-th vehicle system data corresponding to the t-th image frame, which is the current image frame, into a data preprocessing module, thereby causing the data preprocessing module to preprocess the t-th camera acquisition data and the t-th vehicle system data to generate a t-th object motion estimation vector for estimating the motion of the object; (b) a step in which the object motion estimation device inputs the t-th object motion estimation vector into a deep-learning-based sequential regression network and causes the sequential regression network to apply a recurrent learning operation to the t-th object motion estimation vector to generate t-th predicted object motion data that predicts the motion of the object corresponding to the t-th image frame; and (c) a step in which the object motion estimation device inputs the t-th predicted object motion data and the t-th vehicle system data into a sequential filtering network and causes the sequential filtering network to (i) generate t-th predicted corrected object motion data, which predicts the corrected object motion data in the t-th image frame, by applying a learning operation to the t-th vehicle system data and the (t-1)-th corrected object motion data through a state prediction model, wherein the (t-1)-th corrected object motion data is object motion data corrected by referencing the (t-1)-th predicted object motion data and the (t-1)-th vehicle system data, (ii) generate, through a state vector generation module, a t-th prediction difference state vector according to the difference between the t-th predicted object motion data and the (t-1)-th predicted object motion data, a t-th prediction-correction difference state vector according to the difference between the t-th predicted object motion data and the t-th predicted corrected object motion data, and a t-th correction difference state vector according to the difference between the t-th predicted corrected object motion data and the (t-1)-th corrected object motion data, (iii) generate a t_1 uncertainty probability value that estimates the uncertainty of the t-th predicted corrected object motion data by applying a recurrent operation to the t-th correction difference state vector through a first filtering model, and a t_2 uncertainty probability value that estimates the uncertainty of the t-th predicted object motion data by applying a recurrent operation to the t-th prediction difference state vector and the t-th prediction-correction difference state vector through a second filtering model, and (iv) generate the t-th corrected object motion data in the t-th image frame by correcting the t-th predicted corrected object motion data by applying a t-th feedback gain, generated by referencing the t_1 uncertainty probability value and the t_2 uncertainty probability value, to the t-th prediction-correction difference state vector.
  2. delete
  3. In paragraph 1, a method wherein, in step (c), the object motion estimation device causes the sequential filtering network to further generate a t-th object distance state vector according to the t-th predicted object relative distance included in the t-th predicted object motion data through the state vector generation module, to generate the t_1 uncertainty probability value, which estimates the uncertainty of the t-th predicted corrected object motion data, by applying a recurrent operation to the t-th correction difference state vector and the t-th object distance state vector through the first filtering model, and to generate the t_2 uncertainty probability value, which estimates the uncertainty of the t-th predicted object motion data, by applying a recurrent operation to the t-th prediction difference state vector, the t-th prediction-correction difference state vector, and the t-th object distance state vector through the second filtering model.
  4. In paragraph 1, a method wherein, in step (c), the t-th vehicle system data includes a t-th vehicle speed, which is the speed of the vehicle in the t-th image frame; a t-th vehicle acceleration, which is the acceleration of the vehicle in the t-th image frame; a t-th vehicle yaw rate, which is the yaw rate of the vehicle in the t-th image frame; and a t-th time difference, which is the difference from the time when the (t-1)-th image frame was acquired to the time when the t-th image frame was acquired.
  5. In paragraph 1, a method wherein, in step (c), the state prediction model is composed of a motion model including at least one of a constant velocity model and a constant acceleration model.
  6. In paragraph 1, a method wherein, in step (c), each of the first filtering model and the second filtering model is composed of any one of an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory), and a GRU (Gated Recurrent Unit).
  7. In paragraph 1, a method wherein, in step (a), the object motion estimation device causes the data preprocessing module to generate, by referencing the t-th camera acquisition data and the t-th vehicle system data, a t-th object relative distance estimation vector including t-th object relative distance estimation data, a t-th object absolute velocity estimation vector including t-th object absolute velocity estimation data, and a t-th object absolute acceleration estimation vector including t-th object absolute acceleration estimation data, as the t-th object motion estimation vector; and wherein, in step (b), the object motion estimation device causes the sequential regression network to generate a t-th predicted object relative distance by predicting the relative distance of the object from the vehicle through an object relative distance estimation model applied to the t-th object relative distance estimation vector, to generate a t-th predicted object absolute speed by predicting the absolute speed of the object through an object absolute speed estimation model applied to the t-th object absolute velocity estimation vector, and to generate a t-th predicted object absolute acceleration by predicting the absolute acceleration of the object through an object absolute acceleration estimation model applied to the t-th object absolute acceleration estimation vector, thereby generating the t-th predicted object motion data including the t-th predicted object relative distance, the t-th predicted object absolute speed, and the t-th predicted object absolute acceleration.
  8. In paragraph 1, a method wherein, in step (a), the object motion estimation device causes the data preprocessing module to generate, as the t-th object motion estimation vector, a vector including at least some of: the t-th raw object relative distance; a t-th raw object relative speed, which is the relative speed of the object with respect to the vehicle in the t-th image frame; a t-th raw object absolute speed, which is the absolute speed of the object in the t-th image frame; a t-th vehicle speed, which is the speed of the vehicle in the t-th image frame; a t-th vehicle acceleration, which is the acceleration of the vehicle in the t-th image frame; a t-th vehicle yaw rate, which is the yaw rate of the vehicle in the t-th image frame; the t-th object heading angle; and a t-th time difference, which is the difference from the time when the (t-1)-th image frame was acquired to the time when the t-th image frame was acquired, wherein the t-th raw object relative distance, the t-th raw object relative speed, the t-th raw object absolute speed, the t-th vehicle speed, the t-th vehicle acceleration, the t-th vehicle yaw rate, the t-th object heading angle, and the t-th time difference are included in, or calculated by referencing, the t-th camera acquisition data and the t-th vehicle system data.
  9. In paragraph 8, a method wherein, in step (a), the object motion estimation device causes the data preprocessing module to generate, by referencing the t-th camera acquisition data and the t-th vehicle system data, as the t-th object motion estimation vector: (i) a t-th object relative distance estimation vector including the t-th raw object relative distance, the t-th raw object relative speed, the t-th object heading angle, and the t-th time difference; (ii) a t-th object absolute velocity estimation vector including the t-th raw object relative distance, the t-th raw object absolute speed, the t-th vehicle speed, the t-th vehicle yaw rate, the t-th object heading angle, and the t-th time difference; and (iii) a t-th object absolute acceleration estimation vector including the t-th raw object relative distance, the t-th raw object absolute speed, the t-th vehicle speed, the t-th vehicle acceleration, the t-th vehicle yaw rate, the t-th object heading angle, and the t-th time difference; and wherein, in step (b), the object motion estimation device causes the sequential regression network to generate a t-th predicted object relative distance by predicting the relative distance of the object from the vehicle through an object relative distance estimation model applied to the t-th object relative distance estimation vector, to generate a t-th predicted object absolute speed by predicting the absolute speed of the object through an object absolute speed estimation model applied to the t-th object absolute velocity estimation vector, and to generate a t-th predicted object absolute acceleration by predicting the absolute acceleration of the object through an object absolute acceleration estimation model applied to the t-th object absolute acceleration estimation vector, thereby generating the t-th predicted object motion data including the t-th predicted object relative distance, the t-th predicted object absolute speed, and the t-th predicted object absolute acceleration.
  10. In Paragraph 9, a method wherein, in step (a), the object motion estimation device causes the data preprocessing module to generate the t-th object motion estimation vector such that each of the t-th object relative distance estimation vector, the t-th object absolute velocity estimation vector, and the t-th object absolute acceleration estimation vector further includes a t-th vehicle pitch, which is the pitch of the vehicle in the t-th image frame, and a t-th vehicle roll, which is the roll of the vehicle in the t-th image frame.
  11. In paragraph 1, a method wherein, in step (b), the sequential regression network is composed of any one of an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory), and a GRU (Gated Recurrent Unit).
  12. An object motion estimation device that estimates the motion of an object based on a monocular camera, comprising: a memory storing instructions for estimating object motion based on the monocular camera; and a processor that performs operations to estimate the motion of the object based on the monocular camera according to the instructions stored in the memory, wherein the processor performs: (I) a process in which, when camera acquisition data, including a raw object relative distance, which is the distance from the vehicle to the object, and an object heading angle, which is the heading angle of the object, detected through a deep-learning-based detection model from at least one image captured by the monocular camera installed on the vehicle, wherein the detection model is a deep learning model trained to detect an object in the image and output the bounding box, class information, raw object relative distance, and object heading angle of the detected object, and vehicle system data detected by the vehicle system are sequentially acquired, the t-th camera acquisition data and the t-th vehicle system data corresponding to the t-th image frame, which is the current image frame, are input to a data preprocessing module, thereby causing the data preprocessing module to preprocess the t-th camera acquisition data and the t-th vehicle system data to generate a t-th object motion estimation vector for estimating the motion of the object; (II) a process of inputting the t-th object motion estimation vector into a deep-learning-based sequential regression network and causing the sequential regression network to apply a recurrent learning operation to the t-th object motion estimation vector to generate t-th predicted object motion data that predicts the motion of the object corresponding to the t-th image frame; and (III) a process of inputting the t-th predicted object motion data and the t-th vehicle system data into a sequential filtering network and causing the sequential filtering network to (i) generate t-th predicted corrected object motion data, which predicts the corrected object motion data in the t-th image frame, by applying a learning operation to the t-th vehicle system data and the (t-1)-th corrected object motion data through a state prediction model, wherein the (t-1)-th corrected object motion data is object motion data corrected by referencing the (t-1)-th predicted object motion data and the (t-1)-th vehicle system data, (ii) generate, through a state vector generation module, a t-th prediction difference state vector according to the difference between the t-th predicted object motion data and the (t-1)-th predicted object motion data, a t-th prediction-correction difference state vector according to the difference between the t-th predicted object motion data and the t-th predicted corrected object motion data, and a t-th correction difference state vector according to the difference between the t-th predicted corrected object motion data and the (t-1)-th corrected object motion data, (iii) generate a t_1 uncertainty probability value that estimates the uncertainty of the t-th predicted corrected object motion data by applying a recurrent operation to the t-th correction difference state vector through a first filtering model, and a t_2 uncertainty probability value that estimates the uncertainty of the t-th predicted object motion data by applying a recurrent operation to the t-th prediction difference state vector and the t-th prediction-correction difference state vector through a second filtering model, and (iv) generate the t-th corrected object motion data in the t-th image frame by correcting the t-th predicted corrected object motion data by applying a t-th feedback gain, generated by referencing the t_1 uncertainty probability value and the t_2 uncertainty probability value, to the t-th prediction-correction difference state vector.
  13. delete
  14. In Paragraph 12, an object motion estimation device wherein the processor, in the process (III), causes the sequential filtering network to further generate a t-th object distance state vector according to the t-th predicted object relative distance included in the t-th predicted object motion data through the state vector generation module, to generate the t_1 uncertainty probability value, which estimates the uncertainty of the t-th predicted corrected object motion data, by applying a recurrent operation to the t-th correction difference state vector and the t-th object distance state vector through the first filtering model, and to generate the t_2 uncertainty probability value, which estimates the uncertainty of the t-th predicted object motion data, by applying a recurrent operation to the t-th prediction difference state vector, the t-th prediction-correction difference state vector, and the t-th object distance state vector through the second filtering model.
  15. In Paragraph 12, an object motion estimation device wherein, in the process (III), the t-th vehicle system data includes a t-th vehicle speed, which is the speed of the vehicle in the t-th image frame; a t-th vehicle acceleration, which is the acceleration of the vehicle in the t-th image frame; a t-th vehicle yaw rate, which is the yaw rate of the vehicle in the t-th image frame; and a t-th time difference, which is the difference from the time when the (t-1)-th image frame was acquired to the time when the t-th image frame was acquired.
  16. In Paragraph 12, an object motion estimation device wherein, in the process (III), the state prediction model is composed of a motion model including at least one of a constant velocity model and a constant acceleration model.
  17. In Paragraph 12, an object motion estimation device wherein, in the process (III), each of the first filtering model and the second filtering model is composed of any one of an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory), and a GRU (Gated Recurrent Unit).
  18. In Paragraph 12, an object motion estimation device wherein the processor, in the process (I), causes the data preprocessing module to generate, by referencing the t-th camera acquisition data and the t-th vehicle system data, a t-th object relative distance estimation vector including t-th object relative distance estimation data, a t-th object absolute velocity estimation vector including t-th object absolute velocity estimation data, and a t-th object absolute acceleration estimation vector including t-th object absolute acceleration estimation data, as the t-th object motion estimation vector; and, in the process (II), causes the sequential regression network to generate a t-th predicted object relative distance by applying a recurrent operation to the t-th object relative distance estimation vector through an object relative distance estimation model, to generate a t-th predicted object absolute speed by applying a recurrent operation to the t-th object absolute velocity estimation vector through an object absolute speed estimation model, and to generate a t-th predicted object absolute acceleration by applying a recurrent operation to the t-th object absolute acceleration estimation vector through an object absolute acceleration estimation model, thereby generating the t-th predicted object motion data including the t-th predicted object relative distance, the t-th predicted object absolute speed, and the t-th predicted object absolute acceleration.
  19. In Paragraph 12, an object motion estimation device wherein the processor, in the process (I), causes the data preprocessing module to generate, as the t-th object motion estimation vector, a vector including at least some of: the t-th raw object relative distance; a t-th raw object relative speed, which is the relative speed of the object with respect to the vehicle in the t-th image frame; a t-th raw object absolute speed, which is the absolute speed of the object in the t-th image frame; a t-th vehicle speed, which is the speed of the vehicle in the t-th image frame; a t-th vehicle acceleration, which is the acceleration of the vehicle in the t-th image frame; a t-th vehicle yaw rate, which is the yaw rate of the vehicle in the t-th image frame; the t-th object heading angle; and a t-th time difference, which is the difference from the time when the (t-1)-th image frame was acquired to the time when the t-th image frame was acquired, wherein the t-th raw object relative distance, the t-th raw object relative speed, the t-th raw object absolute speed, the t-th vehicle speed, the t-th vehicle acceleration, the t-th vehicle yaw rate, the t-th object heading angle, and the t-th time difference are included in, or calculated by referencing, the t-th camera acquisition data and the t-th vehicle system data.
  20. In Paragraph 19, an object motion estimation device wherein the processor, in the process (I), causes the data preprocessing module to generate, by referencing the t-th camera acquisition data and the t-th vehicle system data, as the t-th object motion estimation vector: (i) a t-th object relative distance estimation vector including the t-th raw object relative distance, the t-th raw object relative speed, the t-th object heading angle, and the t-th time difference; (ii) a t-th object absolute velocity estimation vector including the t-th raw object relative distance, the t-th raw object absolute speed, the t-th vehicle speed, the t-th vehicle yaw rate, the t-th object heading angle, and the t-th time difference; and (iii) a t-th object absolute acceleration estimation vector including the t-th raw object relative distance, the t-th raw object absolute speed, the t-th vehicle speed, the t-th vehicle acceleration, the t-th vehicle yaw rate, the t-th object heading angle, and the t-th time difference; and, in the process (II), causes the sequential regression network to generate a t-th predicted object relative distance by applying a recurrent operation to the t-th object relative distance estimation vector through an object relative distance estimation model, to generate a t-th predicted object absolute speed by applying a recurrent operation to the t-th object absolute velocity estimation vector through an object absolute speed estimation model, and to generate a t-th predicted object absolute acceleration by applying a recurrent operation to the t-th object absolute acceleration estimation vector through an object absolute acceleration estimation model, thereby generating the t-th predicted object motion data including the t-th predicted object relative distance, the t-th predicted object absolute speed, and the t-th predicted object absolute acceleration.
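As an illustrative aid only, and not part of the claims, the Kalman-filter-like feedback structure recited in step (c) of claim 1 can be sketched in a few lines of Python. The learned state prediction model is replaced here by a plain constant-acceleration propagation, and the two recurrent filtering models by a sigmoid over the difference-vector magnitudes; all function names and the placeholder arithmetic are assumptions made for illustration, not the patented implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def filtering_step(pred_t, pred_prev, corr_prev, dt):
    """One illustrative step of the claimed sequential filtering structure.

    Each state is [relative distance, absolute speed, absolute acceleration];
    pred_t / pred_prev are the t-th and (t-1)-th predicted object motion data,
    corr_prev is the (t-1)-th corrected object motion data, dt the time difference.
    """
    d, v, a = corr_prev
    # (i) state prediction: constant-acceleration propagation of the previous
    #     corrected state (a stand-in for the learned state prediction model)
    pred_corr_t = [d + v * dt + 0.5 * a * dt * dt, v + a * dt, a]

    # (ii) the three difference state vectors
    diff_pred = [p - q for p, q in zip(pred_t, pred_prev)]         # prediction difference
    diff_pred_corr = [p - q for p, q in zip(pred_t, pred_corr_t)]  # prediction-correction difference
    diff_corr = [p - q for p, q in zip(pred_corr_t, corr_prev)]    # correction difference

    # (iii) uncertainty probability values; the recurrent filtering models are
    #       replaced by a sigmoid over the difference magnitudes
    u1 = sigmoid(norm(diff_corr))                         # uncertainty of predicted corrected data
    u2 = sigmoid(norm(diff_pred) + norm(diff_pred_corr))  # uncertainty of predicted data

    # (iv) feedback gain from the two uncertainties, applied to the
    #      prediction-correction difference state vector
    gain = u1 / (u1 + u2)
    return [pc + gain * dv for pc, dv in zip(pred_corr_t, diff_pred_corr)]
```

Because the gain lies strictly between 0 and 1, the corrected state is a blend of the model-propagated state and the network's prediction, which is the usual way such a feedback gain trades off the two uncertainty estimates.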

Description

Method for estimating the motion of an object based on a vision sensor, and object motion estimating device using the same.

The present invention relates to a method for estimating the motion of an object based on a vision sensor and an object motion estimation device using the same.

Generally, an ADAS (Advanced Driver Assistance System) assists and supports driving to enable safe and convenient operation of a vehicle. Vehicles are equipped with various recognition sensors, such as radar, lidar, cameras, and ultrasonic sensors, and safety technologies are being applied that use these sensors to detect approaching vehicles, pedestrians, or obstacles, warning the driver of danger in advance or helping the driver actively avoid accidents. For example, vehicles include a Forward Collision Warning (FCW) system, which monitors the vehicle speed and the relative speed to the vehicle ahead and warns the driver of a collision risk if the distance becomes too close; a Lane Departure Warning (LDW) system, which monitors conditions ahead, detects when the vehicle departs from its driving lane, and warns the driver; a Blind Spot Warning (BSW) system, which detects other vehicles in the blind spots on either side that are difficult to see through the side mirrors and warns the driver; an Automatic Emergency Braking (AEB) system, which automatically applies the brakes to prevent a collision or mitigate its impact when a collision risk with a vehicle ahead is detected while driving; a Lane Keeping Assist System (LKAS), which warns the driver in advance to prevent the vehicle from leaving its lane and assists in returning to the lane if it does; Adaptive Cruise Control (ACC), which keeps the vehicle centered in its lane while maintaining a distance from the vehicle ahead up to the maximum speed set by the driver; a low-speed Parking Collision-Avoidance Assist (PCAA), which warns of anticipated collisions with other vehicles, pedestrians, or obstacles behind the vehicle and supports emergency braking; and Smart Parking Assist (SPA), which uses cameras and ultrasonic sensors to search for parking spaces and automatically controls steering, gear shifting, and vehicle speed to assist the driver with parking and exiting.

Meanwhile, in the ADAS market there is growing demand for vision-based perception systems that can operate in real time even on lean platforms with minimal hardware and computational resources. In particular, as attempts to implement safety and convenience features such as automatic emergency braking and adaptive cruise control using only vision sensors become more active, improving single-camera-based perception performance has become an important task. However, conventional filtering techniques alone have technical limitations in inferring higher-order physical quantities, such as velocity and acceleration, from distance information estimated with a single camera: minute errors or noise in the estimated distance to an object, obtained by analyzing images captured by a single camera, are amplified when computing speed and acceleration, making it difficult to obtain the stable physical quantities required for ADAS control. The applicant therefore proposes a technology that enables the stable estimation of physical quantities such as speed and acceleration from distance information acquired based on a vision sensor.
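The noise amplification described above is easy to demonstrate numerically: each finite-difference differentiation of a noisy distance signal scales the measurement noise by roughly 1/dt. The sketch below uses illustrative values only (a 10 Hz camera, 0.3 m distance noise, an object closing at a constant 2 m/s); none of the numbers are taken from the patent.

```python
import math
import random

random.seed(0)
dt = 0.1   # 10 Hz camera frame interval (illustrative assumption)
n = 100
true_dist = [30.0 - 2.0 * i * dt for i in range(n)]          # closing at 2 m/s
meas_dist = [d + random.gauss(0.0, 0.3) for d in true_dist]  # 0.3 m distance noise

# First and second finite differences: each differentiation divides the noise
# by dt, so sub-metre distance error becomes metres-per-second velocity error
# and tens of m/s^2 of acceleration error.
vel = [(meas_dist[i + 1] - meas_dist[i]) / dt for i in range(n - 1)]
acc = [(vel[i + 1] - vel[i]) / dt for i in range(n - 2)]

def std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

print(std([m - d for m, d in zip(meas_dist, true_dist)]))  # distance noise, roughly 0.3 m
print(std([v + 2.0 for v in vel]))                         # velocity error, several m/s
print(std(acc))                                            # acceleration error, tens of m/s^2
```

This is exactly the failure mode the sequential regression and filtering networks are meant to suppress: the raw per-frame distance is usable, but naively differentiated velocity and acceleration are not.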
The drawings attached below for use in describing embodiments of the present invention are merely some of the embodiments of the present invention, and other drawings can be obtained based on these drawings without inventive work by a person skilled in the art to which the present invention pertains (hereinafter "person skilled in the art"). FIG. 1 schematically illustrates an object motion estimation device that estimates the motion of an object based on a vision sensor according to an embodiment of the present invention; FIG. 2 schematically illustrates the configuration of a deep learning network of the object motion estimation device that estimates the motion of an object based on a vision sensor according to an embodiment of the present invention; FIG. 3 schematically illustrates a method for estimating the motion of an object based on a vision sensor according to an embodiment of the present invention; and FIG. 4 schematically illustrates the detailed process of preprocessing data and generating the t-th predicted object motion data in a method for estimating the motion of an object based on a vision sensor according to an embodiment of the present invention. FIG. 5 illustrates, by way of example, a deep learning model used for motion estimation of an object in a method for estimating the motion of