US-12626483-B2 - Reducing false positive identifications during video conferencing tracking and detection
Abstract
A method including detecting, in a digital image, a set of sub-images matching a selected object type. The method also includes generating a first confidence score that a first sub-image in the set of sub-images matches the selected object type. The method also includes generating a second confidence score that a second sub-image in the set of sub-images matches the selected object type. The method also includes generating a similarity measure by comparing the first sub-image to the second sub-image. The method also includes removing, responsive to the similarity measure exceeding a similarity threshold value and the first confidence score exceeding the second confidence score, the second sub-image from the set of sub-images. The method also includes processing, after removing, the digital image using the set of sub-images.
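The removal step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Detection` structure, the `features` vector, and the choice of cosine similarity as the similarity measure are all assumptions made for the example, since the abstract does not say how sub-images are compared.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Sequence


@dataclass
class Detection:
    """A detected sub-image (hypothetical structure for illustration)."""
    label: str
    confidence: float          # score that the sub-image matches the object type
    features: Sequence[float]  # assumed feature vector describing the sub-image


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure; the source does not name a specific one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def remove_similar(detections: List[Detection],
                   similarity_threshold: float) -> List[Detection]:
    """Drop the lower-confidence member of every sufficiently similar pair."""
    removed = set()
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            if i in removed or j in removed:
                continue
            sim = cosine_similarity(detections[i].features,
                                    detections[j].features)
            if sim > similarity_threshold:
                # Remove only when one score strictly exceeds the other,
                # mirroring "the first confidence score exceeding the second".
                if detections[i].confidence > detections[j].confidence:
                    removed.add(j)
                elif detections[j].confidence > detections[i].confidence:
                    removed.add(i)
    return [d for k, d in enumerate(detections) if k not in removed]
```

With a threshold such as 0.95, a person and a near-identical but lower-scoring reflection collapse to the single higher-confidence detection, while dissimilar detections are left untouched.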
Inventors
- Raghavendra Balavalikar Krishnamurthy
- Rajen Bhatt
- David Bryan
- Yong Yan
Assignees
- HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P.
Dates
- Publication Date
- 20260512
- Application Date
- 20220502
Claims (12)
- 1. A controller comprising: a processor coupled to a non-transitory computer readable storage medium, wherein the processor is to: execute an image processing controller to: detect, in a digital image of a video stream, that a plurality of sub-images correspond to a selected object type, and assign, to the plurality of sub-images, a plurality of confidence scores corresponding to the plurality of sub-images, wherein the plurality of confidence scores comprise measures that the plurality of sub-images are of the selected object type; execute a first filter to: block use of a first subset of the plurality of sub-images when modifying the video stream, the first subset comprising first ones of the plurality of sub-images having confidence scores below a confidence threshold value; execute a second filter to: delay, by a threshold time interval, use of a second subset of the plurality of sub-images when modifying the video stream, wherein the second subset comprises second ones of the plurality of sub-images detected before the threshold time interval; execute a third filter to: block use of a selected sub-image in the plurality of sub-images when modifying the video stream, wherein the selected sub-image is selected from one of a first sub-image having a first similarity score within a similarity threshold value of a second similarity score of a second sub-image, and wherein the selected sub-image comprises a lower confidence score in the plurality of confidence scores; and wherein a video controller is to: modify the video stream using the first filter, the second filter, and the third filter; wherein the selected object type comprises heads of people; wherein the video controller is further configured to modify the video stream by performing one of: i) zooming in on and framing of the heads after the first filter, ii) the second filter, and iii) the third filter; and wherein the first subset, the second subset, and the selected sub-image comprise reflections of the heads from a reflective object in the video stream.
- 2. The controller of claim 1, wherein the controller further comprises: a tracking and detection controller configured to recognize and track the heads in the video stream.
- 3. The controller of claim 1, further comprising: a communication device, in communication with the processor, and configured to receive the video stream.
- 4. The controller of claim 1, wherein the processor is to execute the second filter to: determine whether continual detection of newly detected sub-images exists in subsequent digital images of the video stream.
- 5. The controller of claim 1, wherein delay by the second filter of use of the second subset distinguishes between sub-images created from reflected objects and the sub-images created from physical objects based on detection consistency over time.
- 6. The controller of claim 1, wherein the processor is to execute the third filter to: generate a similarity matrix containing similarity measures between pairs of sub-images in the plurality of sub-images.
- 7. The controller of claim 6, wherein the processor is to execute the third filter to: identify, using the similarity matrix, the first sub-image having the first similarity score within the similarity threshold value of the second similarity score of the second sub-image.
- 8. The controller of claim 7, wherein the processor is to execute the third filter to: block the use of a selected sub-image in the plurality of sub-images when modifying the video stream.
- 9. The controller of claim 8, wherein the selected sub-image is selected from one of the first sub-image and the second sub-image.
- 10. The controller of claim 8, wherein the selected sub-image comprises a lower confidence score in the plurality of confidence scores.
- 11. The controller of claim 1, wherein the video controller is to modify the video stream using the first filter, the second filter, and the third filter to adjust a zoom level of the video stream to include only non-blocked sub-images of the plurality of sub-images.
- 12. A controller comprising: a processor coupled to a non-transitory computer readable storage medium, wherein the processor is to: execute an image processing controller to: detect, in a digital image of a video stream, that a plurality of sub-images correspond to a selected object type, and assign, to the plurality of sub-images, a plurality of confidence scores corresponding to the plurality of sub-images, wherein the plurality of confidence scores comprise measures that the plurality of sub-images are of the selected object type; execute a first filter to: block use of a first subset of the plurality of sub-images when modifying the video stream, the first subset comprising first ones of the plurality of sub-images having confidence scores below a confidence threshold value; execute a second filter to: delay, by a threshold time interval, use of a second subset of the plurality of sub-images when modifying the video stream, wherein the second subset comprises second ones of the plurality of sub-images detected before the threshold time interval; and execute a third filter to: generate a similarity matrix containing similarity measures between pairs of sub-images in the plurality of sub-images, identify, using the similarity matrix, a first sub-image having a first similarity score within a similarity threshold value of a second similarity score of a second sub-image, and block use of a selected sub-image in the plurality of sub-images when modifying the video stream, wherein the selected sub-image is selected from one of the first sub-image having the first similarity score within the similarity threshold value of the second similarity score of the second sub-image, wherein the first sub-image and the second sub-image represent a physical object and a reflection of the physical object from a reflective surface in the digital image, wherein the selected sub-image comprises a lower confidence score in the plurality of confidence scores, and wherein a video controller is to: modify the video stream using the first filter, the second filter, and the third filter.
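The three filters recited in the claims can be read as a pipeline over the detected sub-images. The sketch below is one possible interpretation under stated assumptions, not the claimed implementation: the `Track` structure, the `first_seen` timestamp, the inverse-distance similarity function, and the reading of "similar" as a similarity-matrix entry above a threshold are all hypothetical choices made for illustration. It also assumes an upstream tracker resets `first_seen` whenever continual detection is lost.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Track:
    """A tracked sub-image (hypothetical structure for illustration)."""
    track_id: int
    confidence: float
    first_seen: float          # seconds; when continual detection began
    features: Sequence[float]  # assumed descriptor of the sub-image


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed measure: 1.0 for identical vectors, approaching 0 with distance."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)))


def first_filter(tracks: List[Track], confidence_threshold: float) -> List[Track]:
    """Block sub-images whose confidence is below the threshold."""
    return [t for t in tracks if t.confidence >= confidence_threshold]


def second_filter(tracks: List[Track], now: float, delay: float) -> List[Track]:
    """Hold back sub-images until they have persisted for the delay interval."""
    return [t for t in tracks if now - t.first_seen >= delay]


def similarity_matrix(tracks: List[Track]) -> List[List[float]]:
    """Pairwise similarity measures between all sub-images."""
    return [[similarity(a.features, b.features) for b in tracks] for a in tracks]


def third_filter(tracks: List[Track], similarity_threshold: float) -> List[Track]:
    """For each similar pair, block the member with the lower confidence."""
    matrix = similarity_matrix(tracks)
    blocked = set()
    for i in range(len(tracks)):
        for j in range(i + 1, len(tracks)):
            if matrix[i][j] > similarity_threshold:
                # On a tie this sketch arbitrarily blocks the later entry.
                lower = i if tracks[i].confidence < tracks[j].confidence else j
                blocked.add(lower)
    return [t for k, t in enumerate(tracks) if k not in blocked]


def usable_tracks(tracks: List[Track], now: float, confidence_threshold: float,
                  delay: float, similarity_threshold: float) -> List[Track]:
    """Apply the three filters in sequence; the result would drive framing."""
    kept = first_filter(tracks, confidence_threshold)
    kept = second_filter(kept, now, delay)
    return third_filter(kept, similarity_threshold)
```

A video controller could then compute its zoom and framing region from only the tracks returned by `usable_tracks`, so that blocked or still-delayed sub-images do not influence the modified stream.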
Description
BACKGROUND
Video conferencing systems may use tracking and detection software to identify sub-images of objects shown in an image or a video stream. However, the tracking and detection software may undesirably detect a sub-image of a reflection of a person as a sub-image of a real person. Thus, for example, if a camera is capturing an image or a video stream of a conference room having a glass wall, glass window, or any reflective surface, then the tracking and detection software may undesirably treat images of a person's reflection in the glass as images of a real person.
SUMMARY
The one or more embodiments provide for a method. The method includes detecting, in a digital image, a set of sub-images matching a selected object type. The method also includes generating a first confidence score that a first sub-image in the set of sub-images matches the selected object type. The method also includes generating a second confidence score that a second sub-image in the set of sub-images matches the selected object type. The method also includes generating a similarity measure by comparing the first sub-image to the second sub-image. The method also includes removing, responsive to the similarity measure exceeding a similarity threshold value and the first confidence score exceeding the second confidence score, the second sub-image from the set of sub-images. The method also includes processing, after removing, the digital image using the set of sub-images.
The one or more embodiments provide for another method. The method includes detecting, at a first time, a sub-image of an object matching an object type in a first digital image in a video stream. The method also includes determining, based on detecting, whether continual detection of the sub-image of the object exists in digital images that are subsequent to the first digital image in the video stream. The method also includes blocking a use of the sub-image of the object for a modification of the video stream, at least until a second time has passed after the first time.
The one or more embodiments also provide for a controller. The controller includes an image processing controller executable by a processor to detect, in a digital image of a video stream, that sub-images correspond to a selected object type. The image processing controller is also executable by the processor to assign, to the sub-images, confidence scores corresponding to the sub-images. The confidence scores include measures that the sub-images are of the selected object type. The controller also includes a first filter executable by the processor to block use of a first subset of the sub-images when modifying the video stream. The first subset includes first ones of the sub-images having confidence scores below a confidence threshold value. The controller also includes a second filter executable by the processor to delay, by a threshold time interval, use of a second subset of the sub-images when modifying the video stream. The second subset includes second ones of the sub-images detected before the threshold time interval. The controller also includes a third filter executable by the processor to block use of a selected sub-image in the sub-images when modifying the video stream. The selected sub-image is selected from one of a first sub-image having a first similarity score within a similarity threshold value of a second similarity score of a second sub-image. The selected sub-image includes a lower confidence score in the confidence scores. The controller also includes a video controller configured to modify the video stream using the first filter, the second filter, and the third filter.
Other aspects of the one or more embodiments will be apparent from the following description and the appended claims.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 shows a computing system, in accordance with one or more embodiments.
FIG. 2 and FIG. 3 show flow diagrams illustrating a set of steps of a method for filtering a video stream, in accordance with one or more embodiments.
FIG. 4, FIG. 5, and FIG. 6 show an example of filtering a video stream, in accordance with one or more embodiments.
FIG. 7 shows another method for filtering a video stream, in accordance with one or more embodiments.
FIG. 8 and FIG. 9 show examples of matrices used with respect to filtering a video stream, in accordance with one or more embodiments.
DETAILED DESCRIPTION
In general, the one or more embodiments relate to filtering a video stream. In particular, the one or more embodiments are useful for preventing video software or image tracking and detection software from undesirably detecting an image of a reflection of a person as an image of a physical person. In an example, reflections of people off glass walls, windows, or other reflective surfaces may be common in an indoor video conferencing environment. Reflections also may be amplified depending on camera placement and lighting conditions. The reflectio