US-12620228-B2 - System and method for space search reduction in identifying items from images via item height
Abstract
A device detects a triggering event that corresponds to a placement of a first item on a platform. In response, the device captures an image of the first item and generates a first encoded vector for the image. The first encoded vector describes one or more attributes of the first item. The device determines a height of the first item. The device identifies one or more items in an encoded vector library that are associated with average heights within a threshold range from the determined height of the first item. The device compares the first encoded vector with a second encoded vector associated with a second item from among the one or more items. The device determines that the first encoded vector corresponds to the second encoded vector. In response, the device determines that the first item corresponds to the second item.
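The two-stage lookup described in the abstract can be sketched as follows. This is a minimal illustration and not the patented implementation: the `LibraryEntry` structure, the `identify_item` function, the use of cosine similarity, and the `min_similarity` cutoff are all assumptions introduced here, since the abstract does not say how encoded vectors are compared.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class LibraryEntry:
    """Hypothetical library record: one encoded vector plus its average height."""
    item_id: str
    vector: Sequence[float]
    avg_height: float  # same unit as the measured height


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure; the source does not name one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_item(
    query_vector: Sequence[float],
    measured_height: float,
    library: List[LibraryEntry],
    height_threshold: float,
    min_similarity: float = 0.9,  # assumed cutoff, not from the source
) -> Tuple[Optional[str], int]:
    """Return (matched item id or None, number of vector comparisons made)."""
    # Stage 1: keep only entries whose average height lies within the
    # threshold range of the measured height.
    candidates = [
        e for e in library
        if abs(e.avg_height - measured_height) <= height_threshold
    ]
    # Stage 2: compare the query vector against the reduced set only.
    scored = [(cosine_similarity(query_vector, e.vector), e) for e in candidates]
    if not scored:
        return None, 0
    best_score, best_entry = max(scored, key=lambda pair: pair[0])
    if best_score >= min_similarity:
        return best_entry.item_id, len(candidates)
    return None, len(candidates)
```

The second return value makes the search-space reduction visible: only entries that pass the height filter are ever compared against the query vector.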
Inventors
- Crystal Maung
- Sailesh Bharathwaaj Krishnamurthy
- Nithya Thyagarajan
- Hiranya Garbha Kumar
Assignees
- 7-ELEVEN, INC.
Dates
- Publication Date: 2026-05-05
- Application Date: 2023-09-28
Claims (20)
- 1 . An object tracking system, comprising: a plurality of cameras, wherein each camera is configured to capture images of at least a portion of a platform; a memory configured to store an encoded vector library comprising a plurality of encoded vectors, wherein: each encoded vector describes one or more attributes of a respective item; and each encoded vector is associated with a respective average height and a standard deviation from the respective average height associated with the respective item; and one or more processors communicatively coupled to the memory, and configured to: detect a triggering event at the platform, wherein the triggering event corresponds to a placement of a first item on the platform; in response to detecting the triggering event, capture an image of the first item using a camera from among the plurality of cameras; generate a first encoded vector for the image, wherein the first encoded vector describes one or more attributes of the first item; determine a height associated with the first item; identify one or more items in the encoded vector library that are associated with average heights within a threshold range from the determined height of the first item; compare the first encoded vector with a second encoded vector associated with a second item from among the one or more items; determine that the first encoded vector corresponds to the second encoded vector; and in response to determining that the first encoded vector corresponds to the second encoded vector, determine that the first item corresponds to the second item.
- 2 . The object tracking system of claim 1 , wherein the one or more processors are further configured to add the first item to a virtual shopping cart associated with a user.
- 3 . The object tracking system of claim 1 , wherein to determine the height associated with the first item, the one or more processors are further configured to: determine a first distance between the camera and a surface on top of the first item; determine a second distance between the camera and the platform; and determine a difference between the first distance and the second distance, wherein the height associated with the first item corresponds to the difference between the first distance and the second distance.
- 4 . The object tracking system of claim 1 , wherein for the respective item, the respective average height is determined by: capturing a plurality of images of the respective item placed on different parts of the platform, wherein each of the plurality of images shows the respective item placed on a different part of the platform; determining a plurality of heights of the respective item, wherein each of the plurality of heights is determined based at least in part upon a respective distance between the camera and a top surface of the respective item; and determining an average of the plurality of heights, wherein the average of the plurality of heights corresponds to the respective average height of the respective item.
- 5 . The object tracking system of claim 1 , wherein the standard deviation corresponds to the threshold range from the determined height of the first item.
- 6 . The object tracking system of claim 1 , wherein to determine that the first encoded vector corresponds to the second encoded vector, the one or more processors are further configured to: identify a first set of attributes associated with the first item, wherein the first set of attributes is indicated in the first encoded vector; identify a second set of attributes associated with the second item, wherein the second set of attributes is indicated in the second encoded vector; compare each attribute from among the first set of attributes with a counterpart attribute from among the second set of attributes; and determine that more than a threshold percentage of the first set of attributes correspond to counterpart attributes from among the second set of attributes.
- 7 . The object tracking system of claim 1 , wherein to determine that the first encoded vector corresponds to the second encoded vector, the one or more processors are further configured to: identify a first set of attributes associated with the first item, wherein the first set of attributes is indicated in the first encoded vector; identify a second set of attributes associated with the second item, wherein the second set of attributes is indicated in the second encoded vector; compare each attribute from among the first set of attributes with a counterpart attribute from among the second set of attributes; and determine that more than a threshold percentage of the first set of attributes correspond to counterpart attributes from among the second set of attributes.
- 8 . A method comprising: detecting a triggering event at a platform, wherein the triggering event corresponds to a placement of a first item on the platform; in response to detecting the triggering event, capturing an image of the first item using a camera from among a plurality of cameras wherein each camera is configured to capture images of at least a portion of a platform; generating a first encoded vector for the image, wherein the first encoded vector describes one or more attributes of the first item; determining a height associated with the first item; identifying one or more items in an encoded vector library that are associated with average heights within a threshold range from the determined height of the first item, wherein: the encoded vector library comprises a plurality of encoded vectors; each encoded vector describes one or more attributes of a respective item; and each encoded vector is associated with a respective average height and a standard deviation from the respective average height associated with the respective item; comparing the first encoded vector with a second encoded vector associated with a second item from among the one or more items; determining that the first encoded vector corresponds to the second encoded vector; and in response to determining that the first encoded vector corresponds to the second encoded vector, determining that the first item corresponds to the second item.
- 9 . The method of claim 8 , further comprising adding the first item to a virtual shopping cart associated with a user.
- 10 . The method of claim 8 , wherein determining the height associated with the first item is in response to: determining a first distance between the camera and a surface on top of the first item; determining a second distance between the camera and the platform; and determining a difference between the first distance and the second distance, wherein the height associated with the first item corresponds to the difference between the first distance and the second distance.
- 11 . The method of claim 8 , wherein for the respective item, the respective average height is determined by: capturing a plurality of images of the respective item placed on different parts of the platform, wherein each of the plurality of images shows the respective item placed on a different part of the platform; determining a plurality of heights of the respective item, wherein each of the plurality of heights is determined based at least in part upon a respective distance between the camera and a top surface of the respective item; and determining an average of the plurality of heights, wherein the average of the plurality of heights corresponds to the respective average height of the respective item.
- 12 . The method of claim 8 , wherein the standard deviation corresponds to the threshold range from the determined height of the first item.
- 13 . The method of claim 8 , wherein determining that the first encoded vector corresponds to the second encoded vector is in response to: identifying a first set of attributes associated with the first item, wherein the first set of attributes is indicated in the first encoded vector; identifying a second set of attributes associated with the second item, wherein the second set of attributes is indicated in the second encoded vector; comparing each attribute from among the first set of attributes with a counterpart attribute from among the second set of attributes; and determining that more than a threshold percentage of the first set of attributes correspond to counterpart attributes from among the second set of attributes.
- 14 . The method of claim 8 , wherein: the image shows a top-view of the first item; and the camera is a top-view camera placed above the platform.
- 15 . A non-transitory computer-readable medium storing instructions that when executed by one or more processors, cause the one or more processors to: detect a triggering event at a platform, wherein the triggering event corresponds to a placement of a first item on the platform; in response to detecting the triggering event, capture an image of the first item using a camera from among a plurality of cameras wherein each camera is configured to capture images of at least a portion of a platform; generate a first encoded vector for the image, wherein the first encoded vector describes one or more attributes of the first item; determine a height associated with the first item; identify one or more items in an encoded vector library that are associated with average heights within a threshold range from the determined height of the first item, wherein: the encoded vector library comprises a plurality of encoded vectors; each encoded vector describes one or more attributes of a respective item; and each encoded vector is associated with a respective average height and a standard deviation from the respective average height associated with the respective item; comparing the first encoded vector with a second encoded vector associated with a second item from among the one or more items; determine that the first encoded vector corresponds to the second encoded vector; and in response to determining that the first encoded vector corresponds to the second encoded vector, determine that the first item corresponds to the second item.
- 16 . The non-transitory computer-readable medium of claim 15 , wherein the instructions further cause the one or more processors to add the first item to a virtual shopping cart associated with a user.
- 17 . The non-transitory computer-readable medium of claim 15 , wherein to determine the height associated with the first item, the instructions further cause the one or more processors to: determine a first distance between the camera and a surface on top of the first item; determine a second distance between the camera and the platform; and determine a difference between the first distance and the second distance, wherein the height associated with the first item corresponds to the difference between the first distance and the second distance.
- 18 . The non-transitory computer-readable medium of claim 15 , wherein for the respective item, the respective average height is determined by: capturing a plurality of images of the respective item placed on different parts of the platform, wherein each of the plurality of images shows the respective item placed on a different part of the platform; determining a plurality of heights of the respective item, wherein each of the plurality of heights is determined based at least in part upon a respective distance between the camera and a top surface of the respective item; and determining an average of the plurality of heights, wherein the average of the plurality of heights corresponds to the respective average height of the respective item.
- 19 . The non-transitory computer-readable medium of claim 15 , wherein the standard deviation corresponds to the threshold range from the determined height of the first item.
- 20 . The non-transitory computer-readable medium of claim 15 , wherein to determine that the first encoded vector corresponds to the second encoded vector, the instructions further cause the one or more processors to: identify a first set of attributes associated with the first item, wherein the first set of attributes is indicated in the first encoded vector; identify a second set of attributes associated with the second item, wherein the second set of attributes is indicated in the second encoded vector; compare each attribute from among the first set of attributes with a counterpart attribute from among the second set of attributes; and determine that more than a threshold percentage of the first set of attributes correspond to counterpart attributes from among the second set of attributes.
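The height and matching steps recited in the dependent claims can be sketched as follows. This is a hedged illustration only: the function names, the choice of population standard deviation, and the representation of an encoded vector's attributes as a name-to-value dictionary are assumptions made here, not details given in the claims.

```python
import statistics
from typing import Dict, Sequence, Tuple


def item_height(dist_camera_to_item_top: float, dist_camera_to_platform: float) -> float:
    """Height as the difference between the two camera distances (claims 3, 10, 17)."""
    return dist_camera_to_platform - dist_camera_to_item_top


def height_profile(
    top_distances: Sequence[float], dist_camera_to_platform: float
) -> Tuple[float, float]:
    """Average height and standard deviation over several placements
    of the same item on different parts of the platform (claims 4, 11, 18).

    Population standard deviation is an assumption; the claims do not
    say which estimator is used.
    """
    heights = [item_height(d, dist_camera_to_platform) for d in top_distances]
    return statistics.mean(heights), statistics.pstdev(heights)


def vectors_correspond(
    first_attrs: Dict[str, object],
    second_attrs: Dict[str, object],
    threshold_pct: float,
) -> bool:
    """True when more than threshold_pct percent of the first item's
    attributes equal their counterparts in the second item (claims 6, 13, 20).
    """
    if not first_attrs:
        return False
    matched = sum(
        1
        for name, value in first_attrs.items()
        if name in second_attrs and second_attrs[name] == value
    )
    return 100.0 * matched / len(first_attrs) > threshold_pct
```

Under claims 5, 12, and 19, the standard deviation returned by `height_profile` could serve directly as the threshold range when filtering the library by height.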
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation-in-part of U.S. patent application Ser. No. 18/366,155 filed on Aug. 7, 2023, entitled “SYSTEM AND METHOD FOR IDENTIFYING A SECOND ITEM BASED ON AN ASSOCIATION WITH A FIRST ITEM”, which is a continuation-in-part of U.S. patent application Ser. No. 17/455,903 filed on Nov. 19, 2021, entitled “ITEM LOCATION DETECTION USING HOMOGRAPHIES,” which is a continuation-in-part of U.S. patent application Ser. No. 17/362,261 filed Jun. 29, 2021, entitled “ITEM IDENTIFICATION USING DIGITAL IMAGE PROCESSING,” which are all incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates generally to digital image processing, and more specifically to a system and method for space search reduction in identifying items from images via item height.
BACKGROUND
Identifying and tracking objects within a space poses several technical challenges. For example, identifying different features of an item that can be used to later identify the item in an image is computationally intensive when the image includes several items. This process may involve identifying an individual item within the image and then comparing the features for an item against every item in a database that may contain thousands of items. In addition to being computationally intensive, this process requires a significant amount of time which means that this process is not compatible with real-time applications. This problem becomes intractable when trying to simultaneously identify and track multiple items.
SUMMARY
The system disclosed in the present application provides a technical solution to the technical problems discussed above by using a combination of cameras and three-dimensional (3D) sensors to identify and track items that are placed on a platform.
The disclosed system provides several practical applications and technical advantages which include a process for selecting a combination of cameras on an imaging device to capture images of items that are placed on a platform, identifying the items that are placed on the platform, and assigning the items to a user. Requiring a user to scan or manually identify items creates a bottleneck in the system's ability to quickly identify items. In contrast, the disclosed process is able to identify items from images of the items and assign the items to a user without requiring the user to scan or otherwise identify the items. This process provides a practical application of image detection and tracking by improving the system's ability to quickly identify multiple items. These practical applications not only improve the system's ability to identify items but also improve the underlying network and the devices within the network. For example, this disclosed process allows the system to service a larger number of users by reducing the amount of time that it takes to identify items and assign items to a user, while improving the throughput of image detection processing. In other words, this process improves hardware utilization without requiring additional hardware resources which increases the number of hardware resources that are available for other processes and increases the throughput of the system. Additionally, these technical improvements allow for scaling of the item identification and tracking functionality described herein.
In one embodiment, the item tracking system comprises an item tracking device that is configured to detect a triggering event at a platform of an imaging device. The triggering event may correspond with when a user approaches or interacts with the imaging device by placing items on the platform.
The item tracking device is configured to capture a depth image of items on the platform using a 3D sensor and to determine an object pose for each item on the platform based on the depth image. The pose corresponds with the location and the orientation of an item with respect to the platform. The item tracking device is further configured to identify one or more cameras from among a plurality of cameras on the imaging device based on the object pose for each item on the platform. This process allows the item tracking device to select the cameras with the best views of the items on the platform which reduces the number of images that are processed to identify the items. The item tracking device is further configured to capture images of the items on the platform using the identified cameras and to identify the items within the images based on features of the items. The item tracking device is further configured to identify a user associated with the identified items on the platform, to identify an account that is associated with the user, and to add the items to the account that is associated with the user.
In another embodiment, the item tracking system comprises an item tracking device that is configured to capture a first overhead depth image of the platform using a 3D sensor at a first time instance and