US-12620203-B2 - Data collection and classifier training in edge video devices
Abstract
A digital video camera architecture for updating an object identification and tracking model deployed with the camera is disclosed. The invention comprises optics, a processor, a memory, and artificial intelligence logic, which may further comprise artificial neural networks. The architecture may identify objects according to a confidence threshold of a model. The model's confidence may be monitored over time, and the model may be updated if confidence drops below an acceptable level. The data for retraining is ideally generated substantially internally to the camera. A classifier is generated to process the entire field data set stored on the camera and create a field data subset, also stored on the camera. The field data subset may be run through the model to generate cases that may be used in further monitoring, training, and updating of the model. Classifiers may also be generated for images in different domains (e.g., lighting, weather, surveillance area, indoor, outdoor, urban, rural, etc.). These classifiers can be used to train the model to accurately identify objects and features independent of the domain of origin of the image being evaluated.
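The on-camera data-winnowing loop summarized above can be sketched as follows. This is a hypothetical illustration, not code from the patent: every name (`classifier`, `detector`, `winnow`, `CONF_FLOOR`) and every threshold value is an assumption chosen for the sketch. A lightweight classifier reduces the entire field data set to a subset, the deployed detector turns that subset into retraining cases, and mean confidence over those cases is compared against an acceptable floor to decide whether a model update is triggered.

```python
"""Illustrative sketch of the on-camera data-winnowing loop:
entire field data set -> field data subset -> final cases,
with a confidence floor triggering retraining on model decay."""
import random

CONF_FLOOR = 0.75          # hypothetical acceptable confidence level


def classifier(frame):
    # Cheap relevance filter: keeps frames the surrogate-trained
    # classifier scores as likely to contain objects of interest.
    return frame["relevance"] > 0.5


def detector(frame):
    # Stand-in for the deployed detection model: returns a confidence.
    return frame["true_conf"]


def winnow(field_data):
    """Entire field data set -> field data subset -> final field data subset."""
    subset = [f for f in field_data if classifier(f)]          # classifier pass
    return [f for f in subset if detector(f) > 0.5]            # detector pass


random.seed(0)
field_data = [{"relevance": random.random(), "true_conf": random.random()}
              for _ in range(1000)]
cases = winnow(field_data)
mean_conf = sum(detector(f) for f in cases) / len(cases)
retrain = mean_conf < CONF_FLOOR   # trigger a model update on decay
print(len(field_data), len(cases), retrain)
```

The point of the sketch is the two-stage reduction: the inexpensive classifier touches every stored frame, while the more expensive detector only runs on the surviving subset.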
Inventors
- Damien Kah
- Qian Zhong
- Shaomin Xiong
- Toshiki Hirano
Assignees
- WESTERN DIGITAL TECHNOLOGIES, INC.
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2022-04-12
Claims (20)
- 1 . A device, comprising: an artificial intelligence logic configured to: evaluate a source domain validation data set with a first detector, use annotations of an output of the first detector to generate a source domain surrogate data set, train a source domain classifier with the source domain surrogate data set, evaluate a source domain entire field data set with the source domain classifier to generate a source domain field data subset, evaluate the source domain field data subset with the first detector to generate a source domain final field data subset, update the source domain validation data set from the source domain final field data subset, and train, using the source domain classifier, a target domain classifier to: minimize an error function at an output of a label classification head of the target domain classifier, and maximize the error function at an output of a domain classification head of the target domain classifier to filter features that are unique to the source domain classifier.
- 2 . The device of claim 1 , wherein: the target domain classifier comprises a feature extractor derived from a first portion of the source domain classifier; the label classification head is derived from a second portion of the source domain classifier; and the domain classification head is coupled in parallel to the label classification head.
- 3 . The device of claim 2 , wherein: the artificial intelligence logic is further configured to: extract first features from the source domain validation data set using the feature extractor, extract second features from a target domain validation data set using the feature extractor, run the first features through both the label classification head and the domain classification head, and run the second features only through the domain classification head.
- 4 . The device of claim 3 , wherein: the artificial intelligence logic is further configured to: apply positive feedback to the output of the label classification head, and apply negative feedback to the output of the domain classification head.
- 5 . The device of claim 4 , wherein the negative feedback is adversarial training.
- 6 . The device of claim 4 , wherein the positive feedback is backpropagation to reinforce features similar to the source domain classifier.
- 7 . The device of claim 4 , wherein: the artificial intelligence logic is further configured to: train the target domain classifier to: label objects in images with substantially the same accuracy at the output of the label classification head with images from either the source domain validation data set or the target domain validation data set, update a main validation data set from the source domain validation data set, update the main validation data set from the target domain validation data set, and train a second detector using the updated main validation data set.
- 8 . The device of claim 1 , further comprising: an optics module.
- 9 . A system, comprising: a computational system configured to: evaluate a source domain validation data set with a first detector, use annotations of an output of the first detector to generate a source domain surrogate data set, and train a source domain classifier with the source domain surrogate data set; and a device, comprising: an artificial intelligence logic configured to: run a source domain entire field data set through the source domain classifier to generate a source domain field data subset, evaluate the source domain field data subset with the first detector to generate a source domain final field data subset, update the source domain validation data set from the source domain final field data subset, and train, using the source domain classifier, a target domain classifier to: minimize an error function at an output of a label classification head of the target domain classifier, and maximize the error function at an output of a domain classification head of the target domain classifier to filter features that are unique to the source domain classifier.
- 10 . The system of claim 9 , further comprising: an optics module.
- 11 . The system of claim 9 , wherein: the target domain classifier comprises a feature extractor derived from a first portion of the source domain classifier; the label classification head is derived from a second portion of the source domain classifier; and the domain classification head is coupled in parallel to the label classification head.
- 12 . The system of claim 11 , wherein: the artificial intelligence logic is further configured to: extract first features from the source domain validation data set using the feature extractor, extract second features from a target domain validation data set using the feature extractor, run the first features through both the label classification head and the domain classification head, and run the second features only through the domain classification head.
- 13 . The system of claim 12 , wherein: the artificial intelligence logic is further configured to: train the target domain classifier to: label objects in images with substantially the same accuracy at the output of the label classification head with images from either the source domain validation data set or the target domain validation data set, update a main validation data set from the source domain validation data set, update the main validation data set from the target domain validation data set, and train a second detector using the updated main validation data set.
- 14 . The system of claim 12 , wherein: the artificial intelligence logic is further configured to: apply positive feedback to the output of the label classification head, and apply negative feedback to the output of the domain classification head.
- 15 . The system of claim 14 , wherein the positive feedback is backpropagation to reinforce features similar to the source domain classifier.
- 16 . The system of claim 14 , wherein the negative feedback is adversarial training.
- 17 . A system, comprising: means for evaluating a source domain validation data set with a first detector; means for using annotations of an output of the first detector to generate a source domain surrogate data set; means for training a source domain classifier with the source domain surrogate data set; means for running a source domain entire field data set through the source domain classifier to generate a source domain field data subset; means for evaluating the source domain field data subset with the first detector to generate a source domain final field data subset; means for updating the source domain validation data set from the source domain final field data subset; and means for training, using the source domain classifier, a target domain classifier to: minimize an error function at an output of a label classification head of the target domain classifier, and maximize the error function at an output of a domain classification head of the target domain classifier to filter features that are unique to the source domain classifier, wherein: the target domain classifier comprises a feature extractor derived from a first portion of the source domain classifier, the label classification head is derived from a second portion of the source domain classifier, and the domain classification head is coupled in parallel to the label classification head.
- 18 . The system of claim 17 , further comprising: means for extracting first features from the source domain validation data set using the feature extractor; means for extracting second features from a target domain validation data set using the feature extractor; means for running the first features through both the label classification head and the domain classification head; means for running the second features only through the domain classification head; means for applying positive feedback to the output of the label classification head; and means for applying negative feedback to the output of the domain classification head.
- 19 . The system of claim 18 , further comprising: means for updating a main validation data set from the source domain validation data set; means for updating the main validation data set from the target domain validation data set; and means for training a second detector using the updated main validation data set.
- 20 . The system of claim 18 , further comprising: a digital video camera.
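The training recited in claims 1 and 17 — minimize the error at the label classification head while maximizing it at the domain classification head — matches what is commonly called domain-adversarial training with gradient reversal: both heads descend their own loss, but the domain head's gradient is sign-flipped before it reaches the shared feature extractor, pushing the extractor toward domain-invariant features. The following is a minimal NumPy sketch under that reading; all layer sizes, the reversal strength `LAM`, the learning rate, and every variable name are illustrative assumptions, not details from the patent.

```python
"""Minimal NumPy sketch of gradient-reversal (domain-adversarial)
training: a shared feature extractor feeds a label head and a
parallel domain head; the domain gradient is reversed at the
extractor so domain-specific features are filtered out."""
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_FEAT = 4, 3
Wf = 0.1 * rng.normal(size=(D_IN, D_FEAT))   # shared feature extractor
wl = 0.1 * rng.normal(size=D_FEAT)           # label classification head
wd = 0.1 * rng.normal(size=D_FEAT)           # domain classification head
LAM, LR = 1.0, 0.1                           # reversal strength, step size


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def step(x, y_label, y_domain):
    """One update: each head minimizes its own binary cross-entropy,
    but the domain gradient is sign-flipped before reaching Wf, so
    the extractor is pushed to MAXIMIZE the domain head's error."""
    global Wf, wl, wd
    f = x @ Wf
    p_label, p_dom = sigmoid(f @ wl), sigmoid(f @ wd)
    g_label, g_dom = p_label - y_label, p_dom - y_domain   # dBCE/dlogit
    # Gradient w.r.t. features: label branch as usual, domain branch reversed.
    grad_f = g_label * wl - LAM * g_dom * wd
    wl = wl - LR * g_label * f        # heads themselves train normally
    wd = wd - LR * g_dom * f
    Wf = Wf - LR * np.outer(x, grad_f)
    return p_label, p_dom


x = rng.normal(size=D_IN)
p_label, p_dom = step(x, y_label=1.0, y_domain=0.0)
```

In claim terms: source-domain features pass through both heads, the label head receives "positive feedback" (ordinary backpropagation), and the domain head's reversed gradient is the "negative feedback" or adversarial training of claims 4, 5, and 16.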
Description
RELATED APPLICATIONS
This application is related to application Ser. Nos. 17/515,977 and 17/516,188, both of which were filed Nov. 1, 2021, are owned by the same assignee, and are incorporated herein by reference in their entireties.
FIELD
The present disclosure relates to image processing. More particularly, the present disclosure relates to efficient training and retraining of artificial neural networks for video data processing in edge video devices.
BACKGROUND
As technology has advanced over the last decade, the quantity of time-series data such as video content has increased dramatically. This increase in time-series data has generated a greater demand for automatic object identification and classification. In response, neural networks and other artificial intelligence methods have been increasingly utilized to generate automatic classifications, specific detections, and segmentations. In the case of video processing, computer vision trends have progressively focused on object detection, image classification, and other segmentation tasks to parse semantic meaning from video content. In particular, there is a need to improve the models used for object detection. It is desirable for this process to be automated as much as possible and to be as independent of capture conditions as possible.
BRIEF DESCRIPTION OF DRAWINGS
The above and other aspects, features, and advantages of several embodiments of the present disclosure will be more apparent from the following description as presented in conjunction with the following several figures of the drawings.
- FIG. 1 is a conceptual illustration of a video image data processing system in accordance with an embodiment of the disclosure;
- FIG. 2 is a conceptual illustration of an artificial neural network in accordance with an embodiment of the disclosure;
- FIG. 3 is a schematic block diagram of a digital video camera in accordance with an embodiment of the disclosure;
- FIG. 4 is a schematic block diagram of a video processing and storage system in accordance with an embodiment of the disclosure;
- FIG. 5 is a graph of a Precision and Recall (PR) curve in accordance with an embodiment of the disclosure;
- FIG. 6 is a graph of machine learning model decay in accordance with an embodiment of the disclosure;
- FIG. 7A is a conceptual diagram of a video processing and storage system in accordance with an embodiment of the disclosure;
- FIG. 7B is a conceptual diagram of annotating a frame for a surrogate data set in accordance with an embodiment of the disclosure;
- FIG. 8 is a flowchart of a process for collecting data and retraining a model in accordance with an embodiment of the disclosure;
- FIG. 9 is a comparison of two similar pictures illustrating the effects of picture quality on image recognition in accordance with an embodiment of the disclosure;
- FIG. 10 is a conceptual block diagram of different groups of cameras in accordance with an embodiment of the disclosure;
- FIG. 11 is a conceptual diagram of the latent space encoding of two sorts of auto-encoders in accordance with an embodiment of the disclosure;
- FIG. 12A is a conceptual block diagram of a source domain classifier in accordance with an embodiment of the disclosure;
- FIG. 12B is a conceptual block diagram of a target domain classifier in accordance with an embodiment of the disclosure;
- FIG. 12C is a conceptual block diagram of the training of a target domain classifier in accordance with an embodiment of the disclosure;
- FIG. 13A is the first portion of a conceptual diagram of a video processing and storage computational system in accordance with an embodiment of the disclosure;
- FIG. 13B is the second portion of a conceptual diagram of a video processing and storage computational system in accordance with an embodiment of the disclosure;
- FIG. 14A is a first portion of a flowchart of a process for training a classifier in accordance with an embodiment of the disclosure;
- FIG. 14B is a second portion of a flowchart of a process for training a classifier in accordance with an embodiment of the disclosure; and
- FIG. 15 is a flowchart of a process for training a classifier in accordance with an embodiment of the disclosure.
Corresponding reference characters indicate corresponding components throughout the several figures of the drawings. Elements in the several figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures might be emphasized relative to other elements to facilitate understanding of the various presently disclosed embodiments. In addition, common but well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted in order to facilitate a less obstructed view of these various embodiments of the present disclosure.
DETAILED DESCRIPTION
In response to the issues described above, systems and methods are discussed herein that describe processes for streamli