
US-20260127752-A1 - System and method for search space reduction for identifying an item

US 20260127752 A1

Abstract

A plurality of images of a first item are captured, and a plurality of cropped images are generated based on the images. For each cropped image, a first encoded vector is generated and compared to encoded vectors in an encoded vector library that are tagged as front images. Based on the comparison, a second encoded vector that most closely matches the first encoded vector is selected from the encoded vector library, and an item identifier associated with the second encoded vector is identified. A particular item identifier, identified for a particular cropped image, is then selected and associated with the first item.
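The matching step described in the abstract is essentially a nearest-neighbor search restricted to front-tagged library vectors. Below is a minimal sketch in Python, assuming cosine similarity as the numerical similarity value; the item names, tags, and two-element vectors are illustrative only, as the patent does not specify a similarity metric or vector dimensionality.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def identify_item(cropped_vectors, library):
    """For each cropped-image vector, search only the library entries tagged
    as front images and keep the closest match; return the item identifier
    whose match had the highest similarity overall, with that similarity.
    `library` is a list of (item_id, tag, vector) tuples; all names here
    are illustrative, not taken from the patent."""
    # Search space reduction: compare only against front-tagged vectors.
    front_entries = [(iid, vec) for iid, tag, vec in library if tag == "front"]
    best_id, best_sim = None, -1.0
    for cropped in cropped_vectors:
        for iid, vec in front_entries:
            sim = cosine_similarity(cropped, vec)
            if sim > best_sim:
                best_id, best_sim = iid, sim
    return best_id, best_sim
```

Because back-tagged vectors are skipped, each cropped image is compared against only a fraction of the library, which is the "search space reduction" the title refers to.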

Inventors

  • Sumedh Vilas Datar
  • Sailesh Bharathwaaj Krishnamurthy
  • Shashipal Reddy Masini

Assignees

  • 7-ELEVEN, INC.

Dates

Publication Date
May 7, 2026
Application Date
Dec. 18, 2025

Claims (20)

  1. An item tracking system, comprising: a plurality of cameras, wherein each camera is configured to capture images of at least a portion of a platform; a memory configured to store an encoded vector library comprising a plurality of encoded vectors, wherein each encoded vector describes one or more attributes of a particular item, is associated with an item identifier for the particular item, and is tagged as a front image or a back image of the particular item; and one or more processors communicatively coupled to the memory and configured to: capture a plurality of images of a first item on the platform using two or more cameras of the plurality of cameras; generate a cropped image of the first item based on each of the images of the first item by editing the image to isolate at least a portion of the first item, wherein the cropped images correspond to the first item depicted in the respective images; for each cropped image: generate a first encoded vector for the cropped image, wherein the first encoded vector describes one or more attributes of the first item based on the cropped image; compare the first encoded vector to the encoded vectors in the encoded vector library tagged as a front image; select, based on the comparison, a second encoded vector from the encoded vector library that most closely matches the first encoded vector, wherein a numerical similarity value indicates a degree of similarity between the first encoded vector and the selected second encoded vector; and identify an item identifier in the encoded vector library that is associated with the second encoded vector; select a particular item identifier that is identified for a particular cropped image; associate the particular item identifier with the first item; and display an indicator of the particular item identifier on a user interface device.
  2. The item tracking system of claim 1, wherein each of the plurality of images of the first item is captured by a different camera.
  3. The item tracking system of claim 1, wherein the one or more processors are further configured to: determine a first set of item identifiers from a plurality of item identifiers that are identified for a respective plurality of the cropped images, wherein each item identifier from the first set of item identifiers is identified for a respective cropped image based on a similarity value that equals or exceeds a threshold similarity value; and determine whether a same item identifier from the first set of item identifiers is identified for a majority of the respective cropped images.
  4. The item tracking system of claim 3, wherein the one or more processors are further configured to select the particular item identifier by: determining that the same item identifier from the first set of item identifiers is identified for the majority of the respective cropped images; and in response to determining that the same item identifier from the first set of item identifiers is identified for the majority of the respective cropped images, selecting the same item identifier as the particular item identifier for association with the first item.
  5. The item tracking system of claim 3, wherein the one or more processors are further configured to: determine that the same item identifier from the first set of item identifiers is not identified for a majority of the respective cropped images; and in response to determining that the same item identifier from the first set of item identifiers is not identified for the majority of the respective cropped images: determine a second item identifier from the first set of item identifiers that was identified based on a highest similarity value among the similarity values corresponding to the item identifiers in the first set of item identifiers; determine a third item identifier from the first set of item identifiers that was identified based on a next highest similarity value among the similarity values corresponding to the item identifiers in the first set of item identifiers; and determine whether a difference between the highest similarity value and the next highest similarity value equals or exceeds a threshold.
  6. The item tracking system of claim 5, wherein the one or more processors are further configured to select the particular item identifier by: determining that the difference between the highest similarity value and the next highest similarity value equals or exceeds a threshold; and in response to determining that the difference between the highest similarity value and the next highest similarity value equals or exceeds a threshold, selecting the second item identifier as the particular item identifier for association with the first item.
  7. The item tracking system of claim 5, wherein the one or more processors are further configured to select the particular item identifier by: determining that the difference between the highest similarity value and the next highest similarity value does not equal or exceed a threshold; and in response to determining that the difference between the highest similarity value and the next highest similarity value does not equal or exceed a threshold: displaying, on the user interface device, a plurality of different item identifiers from the first set of item identifiers; receiving, from the user interface device, a user selection of a third item identifier from the plurality of different item identifiers; and assigning the third item identifier as the particular item identifier for association with the first item.
  8. The item tracking system of claim 1, wherein: each encoded vector from the encoded vector library is associated with a particular cropped image; and the one or more processors are further configured to: input the particular cropped image associated with each encoded vector from the encoded vector library into a machine learning model, wherein the machine learning model is configured to output whether the particular cropped image is a back image of an item or a front image of an item; obtain the output from the machine learning model indicating whether the particular cropped image is a back image of an item or a front image of the item; and tag the encoded vector in the encoded vector library as a back image or a front image based on the output of the machine learning model.
  9. A method for identifying an item, comprising: capturing a plurality of images of a first item on a platform using two or more cameras of a plurality of cameras, wherein each camera is configured to capture images of at least a portion of the platform; generating a cropped image of the first item based on each of the images of the first item by editing the image to isolate at least a portion of the first item, wherein the cropped images correspond to the first item depicted in the respective images; for each cropped image: generating a first encoded vector for the cropped image, wherein the first encoded vector describes one or more attributes of the first item based on the cropped image; comparing the first encoded vector to encoded vectors in an encoded vector library tagged as a front image, wherein the encoded vector library comprises a plurality of encoded vectors, and wherein each encoded vector describes one or more attributes of a particular item, is associated with an item identifier for the particular item, and is tagged as a front image or a back image of the particular item; selecting, based on the comparison, a second encoded vector from the encoded vector library that most closely matches the first encoded vector, wherein a numerical similarity value indicates a degree of similarity between the first encoded vector and the selected second encoded vector; and identifying an item identifier in the encoded vector library that is associated with the second encoded vector; selecting a particular item identifier that is identified for a particular cropped image; associating the particular item identifier with the first item; and displaying an indicator of the particular item identifier on a user interface device.
  10. The method of claim 9, wherein each of the plurality of images of the first item is captured by a different camera.
  11. The method of claim 9, further comprising: determining a first set of item identifiers from a plurality of item identifiers that are identified for a respective plurality of the cropped images, wherein each item identifier from the first set of item identifiers is identified for a respective cropped image based on a similarity value that equals or exceeds a threshold similarity value; and determining whether a same item identifier from the first set of item identifiers is identified for a majority of the respective cropped images.
  12. The method of claim 11, wherein selecting the particular item identifier comprises: determining that the same item identifier from the first set of item identifiers is identified for the majority of the respective cropped images; and in response to determining that the same item identifier from the first set of item identifiers is identified for the majority of the respective cropped images, selecting the same item identifier as the particular item identifier for association with the first item.
  13. The method of claim 11, further comprising: determining that the same item identifier from the first set of item identifiers is not identified for a majority of the respective cropped images; and in response to determining that the same item identifier from the first set of item identifiers is not identified for the majority of the respective cropped images: determining a second item identifier from the first set of item identifiers that was identified based on a highest similarity value among the similarity values corresponding to the item identifiers in the first set of item identifiers; determining a third item identifier from the first set of item identifiers that was identified based on a next highest similarity value among the similarity values corresponding to the item identifiers in the first set of item identifiers; and determining whether a difference between the highest similarity value and the next highest similarity value equals or exceeds a threshold.
  14. The method of claim 13, wherein selecting the particular item identifier comprises: determining that the difference between the highest similarity value and the next highest similarity value equals or exceeds a threshold; and in response to determining that the difference between the highest similarity value and the next highest similarity value equals or exceeds a threshold, selecting the second item identifier as the particular item identifier for association with the first item.
  15. The method of claim 13, wherein selecting the particular item identifier comprises: determining that the difference between the highest similarity value and the next highest similarity value does not equal or exceed a threshold; and in response to determining that the difference between the highest similarity value and the next highest similarity value does not equal or exceed a threshold: displaying, on the user interface device, a plurality of different item identifiers from the first set of item identifiers; receiving, from the user interface device, a user selection of a third item identifier from the plurality of different item identifiers; and assigning the third item identifier as the particular item identifier for association with the first item.
  16. The method of claim 9, wherein each encoded vector from the encoded vector library is associated with a particular cropped image, the method further comprising: inputting the particular cropped image associated with each encoded vector from the encoded vector library into a machine learning model, wherein the machine learning model is configured to output whether the particular cropped image is a back image of an item or a front image of an item; obtaining the output from the machine learning model indicating whether the particular cropped image is a back image of an item or a front image of the item; and tagging the encoded vector in the encoded vector library as a back image or a front image based on the output of the machine learning model.
  17. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to: capture a plurality of images of a first item on a platform using two or more cameras of a plurality of cameras, wherein each camera is configured to capture images of at least a portion of the platform; generate a cropped image of the first item based on each of the images of the first item by editing the image to isolate at least a portion of the first item, wherein the cropped images correspond to the first item depicted in the respective images; for each cropped image: generate a first encoded vector for the cropped image, wherein the first encoded vector describes one or more attributes of the first item based on the cropped image; compare the first encoded vector to encoded vectors in an encoded vector library tagged as a front image, wherein the encoded vector library comprises a plurality of encoded vectors, and wherein each encoded vector describes one or more attributes of a particular item, is associated with an item identifier for the particular item, and is tagged as a front image or a back image of the particular item; select, based on the comparison, a second encoded vector from the encoded vector library that most closely matches the first encoded vector, wherein a numerical similarity value indicates a degree of similarity between the first encoded vector and the selected second encoded vector; and identify an item identifier in the encoded vector library that is associated with the second encoded vector; select a particular item identifier that is identified for a particular cropped image; associate the particular item identifier with the first item; and display an indicator of the particular item identifier on a user interface device.
  18. The non-transitory computer-readable medium of claim 17, wherein each of the plurality of images of the first item is captured by a different camera.
  19. The non-transitory computer-readable medium of claim 17, wherein the instructions further cause the one or more processors to: determine a first set of item identifiers from a plurality of item identifiers that are identified for a respective plurality of the cropped images, wherein each item identifier from the first set of item identifiers is identified for a respective cropped image based on a similarity value that equals or exceeds a threshold similarity value; and determine whether a same item identifier from the first set of item identifiers is identified for a majority of the respective cropped images.
  20. The non-transitory computer-readable medium of claim 19, wherein selecting the particular item identifier comprises: determining that the same item identifier from the first set of item identifiers is identified for the majority of the respective cropped images; and in response to determining that the same item identifier from the first set of item identifiers is identified for the majority of the respective cropped images, selecting the same item identifier as the particular item identifier for association with the first item.
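The fallback logic of claims 3 through 7 (threshold filtering, majority vote, top-two similarity gap, then deferral to a user selection) can be sketched as follows. This is an illustrative reading of the claims, not the patented implementation; the function name, input shape, and both threshold values are invented for the example.

```python
from collections import Counter

def select_item_identifier(identifications, sim_threshold=0.8, gap_threshold=0.1):
    """Resolve one item identifier from per-cropped-image identifications.
    `identifications` is a list of (item_id, similarity) pairs, one per
    cropped image. Returns (item_id, None) on a confident match, or
    (None, candidates) when a user selection is needed."""
    # Claim 3: keep only identifications whose similarity meets the threshold.
    qualified = [(iid, s) for iid, s in identifications if s >= sim_threshold]
    if not qualified:
        return None, []
    # Claim 4: if one identifier wins a majority of the cropped images, use it.
    counts = Counter(iid for iid, _ in qualified)
    top_id, top_count = counts.most_common(1)[0]
    if top_count > len(qualified) / 2:
        return top_id, None
    # Claims 5-6: otherwise compare the two highest similarity values.
    ranked = sorted(qualified, key=lambda pair: pair[1], reverse=True)
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= gap_threshold:
        return ranked[0][0], None
    # Claim 7: the gap is too small; defer to a user selection on the UI device.
    return None, sorted({iid for iid, _ in qualified})
```

The staged fallbacks mirror the claim dependencies: claim 4 resolves the common case cheaply, and claims 5 through 7 only apply when the majority vote is inconclusive.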

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 18/476,511 filed Sep. 28, 2023, entitled "SYSTEM AND METHOD FOR SEARCH SPACE REDUCTION FOR IDENTIFYING AN ITEM," which is a continuation-in-part of U.S. patent application Ser. No. 18/366,155 filed Aug. 7, 2023, entitled "SYSTEM AND METHOD FOR IDENTIFYING A SECOND ITEM BASED ON AN ASSOCIATION WITH A FIRST ITEM," which is a continuation-in-part of U.S. patent application Ser. No. 17/455,903 filed Nov. 19, 2021, entitled "ITEM LOCATION DETECTION USING HOMOGRAPHIES," now U.S. Pat. No. 12,217,441 issued Feb. 4, 2025, which is a continuation-in-part of U.S. patent application Ser. No. 17/362,261 filed Jun. 29, 2021, entitled "ITEM IDENTIFICATION USING DIGITAL IMAGE PROCESSING," now U.S. Pat. No. 11,887,332 issued Jan. 30, 2024, all of which are incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates generally to digital image processing, and more specifically to a system and method for search space reduction for identifying an item.

BACKGROUND

Identifying and tracking objects within a space poses several technical challenges. For example, identifying different features of an item that can later be used to identify the item in an image is computationally intensive when the image includes several items. This process may involve isolating an individual item within the image and then comparing the item's features against every item in a database that may contain thousands of items. In addition to being computationally intensive, this process requires a significant amount of time, which means it is not compatible with real-time applications. The problem becomes intractable when trying to simultaneously identify and track multiple items.
SUMMARY

The system disclosed in the present application provides a technical solution to the technical problems discussed above by using a combination of cameras and three-dimensional (3D) sensors to identify and track items that are placed on a platform. The disclosed system provides several practical applications and technical advantages, which include a process for selecting a combination of cameras on an imaging device to capture images of items that are placed on a platform, identifying the items that are placed on the platform, and assigning the items to a user.

Requiring a user to scan or manually identify items creates a bottleneck in the system's ability to quickly identify items. In contrast, the disclosed process is able to identify items from images of the items and assign the items to a user without requiring the user to scan or otherwise identify the items. This process provides a practical application of image detection and tracking by improving the system's ability to quickly identify multiple items. These practical applications not only improve the system's ability to identify items but also improve the underlying network and the devices within the network. For example, the disclosed process allows the system to serve a larger number of users by reducing the amount of time it takes to identify items and assign them to a user, while improving the throughput of image detection processing. In other words, this process improves hardware utilization without requiring additional hardware resources, which increases the number of hardware resources available for other processes and increases the throughput of the system. Additionally, these technical improvements allow the item identification and tracking functionality described herein to scale.

In one embodiment, the item tracking system comprises an item tracking device that is configured to detect a triggering event at a platform of an imaging device. The triggering event may correspond to a user approaching or interacting with the imaging device by placing items on the platform. The item tracking device is configured to capture a depth image of items on the platform using a 3D sensor and to determine an object pose for each item on the platform based on the depth image. The pose corresponds to the location and the orientation of an item with respect to the platform. The item tracking device is further configured to identify one or more cameras from among a plurality of cameras on the imaging device based on the object pose for each item on the platform. This process allows the item tracking device to select the cameras with the best views of the items on the platform, which reduces the number of images that must be processed to identify the items. The item tracking device is further configured to capture images of the items on the platform using the identified cameras and to identify the items within the images based on features of the items. The item tracking device is further configured to identify a user associated with the identified items on the platform, to identify an account that is associated with the user, and to add the items to the
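The camera-selection step described above can be illustrated with a simplified sketch. The patent ranks cameras using the object pose (location and orientation) derived from a depth image; this example uses camera-to-item distance on the platform as a crude stand-in for view quality, and the camera names and coordinates are invented for the example.

```python
import math

def select_cameras(item_position, cameras, k=2):
    """Choose the k cameras nearest to an item on the platform.
    `item_position` is an (x, y) platform coordinate and `cameras` is a
    list of (name, (x, y)) pairs. Illustrative only: a real ranking would
    also weigh the item's orientation and occlusion by other items."""
    def distance(cam):
        name, (cx, cy) = cam
        ix, iy = item_position
        return math.hypot(ix - cx, iy - cy)
    # Rank all cameras by distance to the item and keep the k best views.
    return [name for name, _ in sorted(cameras, key=distance)[:k]]
```

Selecting a subset of cameras per item in this way is what lets the device capture and process only the most informative images, rather than running identification on every camera's output.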