CN-117042927-B - Method and apparatus for optimizing a monocular vision-inertial positioning system
Abstract
The methods and devices disclosed herein comprise: capturing, by an optical sensor disposed on a device moving in an environment, a plurality of optical data (1002) at respective locations within a portion of the environment; capturing, by a wheel encoder (258, 602) disposed on the device, an encoder dataset (1004) corresponding to the plurality of optical data at the respective locations; determining a first relative motion (1006) based on the plurality of optical data; determining a corresponding second relative motion (1008) based on the encoder dataset; and incrementing a counter indicative of a slip event of the wheel encoder (258, 602) in accordance with a determination that a difference between the first relative motion and the corresponding second relative motion is greater than a first threshold (1010), wherein the slip event corresponds to wheel advancement of the device while the corresponding second relative motion is less than a second threshold (1012).
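For illustration only, the slip-event test described above can be read as the following minimal Python sketch. The threshold values are assumptions, and the translated claim wording is ambiguous about which motion is gated by the second threshold, so the condition below follows the wording literally rather than claiming to be the patent's exact algorithm:

```python
# Minimal sketch of the slip-event counter from the abstract. Threshold
# values are illustrative assumptions; the condition follows the claim
# wording literally (disagreement above a first threshold while the
# encoder-derived second relative motion stays below a second threshold).

FIRST_THRESHOLD = 0.05   # assumed disagreement threshold, meters
SECOND_THRESHOLD = 0.01  # assumed small-motion threshold, meters

class SlipDetector:
    def __init__(self):
        self.slip_counter = 0  # number of detected slip events

    def check(self, first_relative_motion: float,
              second_relative_motion: float) -> bool:
        """Increment the slip counter when the optical (first) and
        encoder (second) relative motions disagree by more than the
        first threshold and the second relative motion is below the
        second threshold."""
        if (abs(first_relative_motion - second_relative_motion) > FIRST_THRESHOLD
                and second_relative_motion < SECOND_THRESHOLD):
            self.slip_counter += 1
            return True
        return False
```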
Inventors
- CHEN YI
- HUANG KE
- XI WEI
Assignees
- Midea Group Co., Ltd. (美的集团股份有限公司)
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-01-14
- Priority Date: 2021-06-29
Claims (9)
- 1. A method, comprising: capturing, by an optical sensor disposed on a device moving in an environment, a plurality of optical data at respective locations within a portion of the environment; capturing, by a wheel encoder disposed on the device, an encoder dataset corresponding to the plurality of optical data at the respective locations; determining a first relative motion based on the plurality of optical data; determining a corresponding second relative motion based on the encoder dataset; and, in accordance with a determination that a difference between the first relative motion and the corresponding second relative motion is greater than a first threshold, incrementing a counter indicative of a slip event of the wheel encoder, wherein the slip event corresponds to wheel advancement of the device while the corresponding second relative motion is less than a second threshold; wherein the optical sensor includes a camera and the optical data include image frames captured by the camera; the method further comprising: determining whether two adjacent ones of the image frames captured by the camera qualify as valid measurements prior to calculating the first relative motion; obtaining wheel encoder readings between two adjacent ones of the image frames captured by the camera; determining a movement mode based on the wheel encoder readings; determining a feasible movement range according to the movement mode; and, in accordance with a determination that the first relative motion between the two adjacent image frames is outside of the feasible movement range: disqualifying the two adjacent image frames as valid measurements, and capturing additional image frames by the camera until two adjacent ones of the captured image frames exhibit relative motion within the feasible movement range (a sketch of this validation step follows the claims).
- 2. The method of claim 1, wherein a state of the device is set to a first state when the counter is above a second threshold, the method further comprising: when the state of the device is set to the first state, excluding the plurality of optical data from further processing.
- 3. The method of claim 1, further comprising determining whether a jump in position of the device occurs between two adjacent ones of the image frames captured by the camera.
- 4. The method of claim 3, further comprising, in accordance with a determination that the jump occurs, capturing additional image frames with the camera until two adjacent ones of the captured image frames do not present a jump in position of the device, before determining the first relative motion.
- 5. The method of claim 1, wherein the movement mode comprises one or more of a forward mode, a clockwise mode, a reverse mode, and a counter-clockwise mode.
- 6. The method of claim 1, wherein the optical sensor comprises an optical tracking sensor and determining the first relative motion comprises integrating measurements captured by the optical tracking sensor.
- 7. The method of claim 1, further comprising capturing, by a camera disposed on the device, a plurality of image frames corresponding to the optical data sequence and the encoder datasets recorded at the respective locations within the portion of the environment.
- 8. An electronic device, comprising: one or more processing units; memory; and a plurality of programs stored in the memory that, when executed by the one or more processing units, cause the one or more processing units to perform the method of any of claims 1-7.
- 9. A non-transitory computer readable storage medium storing a plurality of programs for execution by an electronic device with one or more processing units, wherein the plurality of programs, when executed by the one or more processing units, cause the one or more processing units to perform the method of any of claims 1-7.
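The frame-validation step recited in claim 1 (movement mode from encoder readings, feasible movement range, frame qualification) might be sketched as follows. The mode-classification rules, the differential-drive assumption, and the range bounds are hypothetical illustrations, not values taken from the patent:

```python
from enum import Enum

# Illustrative sketch of the frame-validation step in claim 1. The mode
# classification rules and range bounds are assumptions; the patent does
# not specify concrete values.

class MovementMode(Enum):
    FORWARD = "forward"
    REVERSE = "reverse"
    CLOCKWISE = "clockwise"
    COUNTER_CLOCKWISE = "counter_clockwise"

def classify_mode(left_ticks: int, right_ticks: int) -> MovementMode:
    """Infer the movement mode from wheel-encoder ticks accumulated
    between two adjacent image frames (differential-drive assumption)."""
    if left_ticks > 0 and right_ticks > 0:
        return MovementMode.FORWARD
    if left_ticks < 0 and right_ticks < 0:
        return MovementMode.REVERSE
    if left_ticks > right_ticks:
        return MovementMode.CLOCKWISE
    return MovementMode.COUNTER_CLOCKWISE

# Hypothetical feasible translation ranges (meters) per mode: pure
# rotations should produce almost no translation between frames.
FEASIBLE_RANGE = {
    MovementMode.FORWARD: (0.0, 0.30),
    MovementMode.REVERSE: (0.0, 0.30),
    MovementMode.CLOCKWISE: (0.0, 0.02),
    MovementMode.COUNTER_CLOCKWISE: (0.0, 0.02),
}

def frames_are_valid(visual_translation: float,
                     left_ticks: int, right_ticks: int) -> bool:
    """Qualify two adjacent frames as valid measurements only if the
    visually estimated translation lies inside the feasible range for
    the encoder-derived movement mode; otherwise the caller keeps
    capturing frames until a qualifying pair is found."""
    mode = classify_mode(left_ticks, right_ticks)
    low, high = FEASIBLE_RANGE[mode]
    return low <= visual_translation <= high
```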
Description
Method and apparatus for optimizing a monocular vision-inertial positioning system

Technical Field

The present disclosure relates generally to simultaneous localization and mapping (SLAM) techniques, and more particularly to systems and methods for characterizing a physical environment and using image data to locate a mobile robot relative to its environment.

Background

Positioning, location recognition, and environmental understanding enable mobile robots to operate as fully autonomous or semi-autonomous systems in an environment. Simultaneous localization and mapping (SLAM) is a method of constructing a map of an environment while estimating the pose of a mobile robot within it (e.g., using the estimated pose of the mobile robot's camera). SLAM algorithms enable mobile robots to build a map of an unknown environment and determine their own location in that environment in order to perform tasks such as path planning and obstacle avoidance.

Disclosure of Invention

Monocular camera-based positioning techniques extract information, such as features (points and lines) or raw pixel values, from successive frames of the captured ambient environment and solve for the relative pose (e.g., rotation and translation) between those frames by solving a 3D geometric problem using, for example, epipolar geometry constraints or perspective-n-point (PnP). Since a single RGB camera cannot measure the depth of a scene (e.g., cannot directly measure the distance of objects captured in a camera frame), the distance from an associated feature to the camera center in two related frames is unknown when solving with epipolar geometry constraints. Without scale calibration, the solved translation between two related frames is valid only up to scale (e.g., the solved translation must be multiplied by an unknown scale factor to recover metric units). Prior to scale calibration, monocular camera-based positioning techniques therefore suffer from a scale ambiguity problem. In some embodiments, the scale refers to the physical distance between two frame poses. In the absence of accurate scale estimation, monocular camera-based positioning methods may not provide accurate location information to their host devices. Thus, there is a particular need for more efficient methods and systems for providing scale information to visual data collected from monocular cameras.

The methods and systems described herein do not involve formulating a factor-graph-based optimization problem and using the relative pose changes between frames measured by an inertial measurement unit to solve it. As a result, the methods and systems herein are less susceptible to numerical instability and have lower computational cost, yielding more accurate positioning solutions and faster responses from mobile robots. As disclosed herein, one solution relies on the use of monocular cameras, MEMS inertial sensors, wheel encoders, and optical flow sensors. Such a solution uses a multi-sensing scheme to cross-check poses from different odometry modules. The methods and systems also detect and reject data collected during wheel slip events, which are common in mobile robotic applications involving traversal of different terrain (e.g., carpets, wooden floors, tile floors, etc.). By doing so, the methods and systems improve positioning accuracy by rejecting accumulated errors in wheel odometry data collected during a slip event.
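As a minimal illustration of such a multi-sensing cross-check, the sketch below flags an odometry source that disagrees with the consensus of the others. The sensor names, tolerance value, and median-consensus rule are assumptions for illustration; the patent does not fix a specific scheme:

```python
import numpy as np

# Minimal sketch of cross-checking relative poses from multiple odometry
# sources. The sensor names, tolerance, and median-consensus rule are
# illustrative assumptions, not the patent's exact algorithm.

def cross_check_poses(relative_translations: dict[str, np.ndarray],
                      tolerance: float = 0.05) -> dict[str, bool]:
    """Mark each odometry source as consistent (True) or inconsistent
    (False) depending on whether its relative translation deviates from
    the element-wise median of all sources by more than `tolerance` (m).
    Inconsistent sources (e.g., a slipping wheel odometer) can then be
    excluded from the subsequent pose fusion."""
    stacked = np.stack(list(relative_translations.values()))
    consensus = np.median(stacked, axis=0)
    return {
        name: bool(np.linalg.norm(t - consensus) <= tolerance)
        for name, t in relative_translations.items()
    }

# Example: the wheel odometer over-reports motion during a slip event.
readings = {
    "visual": np.array([0.010, 0.000]),
    "optical_flow": np.array([0.012, 0.001]),
    "wheel": np.array([0.200, 0.000]),  # wheel spins, robot barely moves
}
print(cross_check_poses(readings))  # the wheel source is flagged False
```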
In addition, the positioning back end also performs scale calibration and online optimization on the accurately computed poses from the various odometry sources, to recover and dynamically adjust the scale of the visual odometry, and asynchronously fuses the poses to obtain a robust and accurate pose estimate for the robot. The methods and systems described herein have several advantages. First, they are computationally efficient and stable, because they do not involve formulating factor-graph-based optimization problems, which makes them better suited for real-time applications. Second, the scale (associated with the images recorded by the monocular camera) can be more accurately recovered and dynamically adjusted. The overall positioning algorithm is made more adaptable to scenes with different scales by excluding data measured during wheel slip events (e.g., data measured when the wheels attached to the mobile robot are rotating but the displacement of the mobile robot is substantially unchanged). Finally, the multi-sensing scheme used in the systems described herein allows different types of sensors to be substituted and deployed, making the systems and methods flexible and scalable for different applications. The systems and methods described herein provide stable, accurate online scale calibration (e.g., performing online scale calibration while the mobile robot is operating).
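One way such online scale calibration could look is sketched below, assuming the metric scale is estimated as the ratio between the encoder-measured metric distance and the up-to-scale visual translation, smoothed by an exponential moving average. The smoothing factor and the slip-gating rule are illustrative assumptions, not the patent's exact algorithm:

```python
# Hedged sketch of online scale calibration for monocular visual
# odometry: estimate the metric scale as encoder distance divided by the
# up-to-scale visual translation norm, then smooth it with an exponential
# moving average. Alpha and the slip gating are assumed, not specified.

class OnlineScaleEstimator:
    def __init__(self, alpha: float = 0.1):
        self.alpha = alpha   # EMA smoothing factor (assumed value)
        self.scale = None    # current metric scale estimate

    def update(self, visual_translation_norm: float,
               encoder_distance: float, slip_detected: bool) -> float | None:
        """Update the scale from one frame pair. Measurements taken during
        a slip event (or with negligible visual motion) are rejected so
        that slip-corrupted encoder distances never bias the scale."""
        if slip_detected or visual_translation_norm < 1e-6:
            return self.scale
        sample = encoder_distance / visual_translation_norm
        if self.scale is None:
            self.scale = sample
        else:
            self.scale = (1 - self.alpha) * self.scale + self.alpha * sample
        return self.scale

# Usage: multiply up-to-scale visual translations by estimator.scale to
# express them in meters once enough valid samples have been fused.
estimator = OnlineScaleEstimator()
print(estimator.update(0.8, 0.12, slip_detected=False))  # 0.15
```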