EP-4738281-A1 - METHODS, SYSTEMS, ARTICLES OF MANUFACTURE AND APPARATUS TO LABEL DATA


Abstract

Systems, apparatus, articles of manufacture, and methods are disclosed to label data. An example apparatus includes interface circuitry, machine-readable instructions, and at least one processor circuit to be programmed by the machine-readable instructions to separate label data in a first data set from portions of an image, generate candidate labeled data based on associated ones of unlabeled portions of the image and optical character recognition (OCR) data, generate key performance indicator (KPI) metric values based on a comparison between the candidate labeled data and a second data set, and adjust weights of a model based on the KPI metric values.
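As an illustrative sketch only, the pipeline summarized in the abstract (separate label data from image portions, associate unlabeled portions with OCR output to form candidate labels, score the candidates against a held-out data set, and adjust model weights from the resulting metric) might look as follows. Every function, field name, and value here is a hypothetical assumption, not drawn from the patent:

```python
def separate_labels(dataset):
    """Split a labeled data set into its labels and its unlabeled image portions."""
    labels = [sample["label"] for sample in dataset]
    portions = [sample["portion"] for sample in dataset]
    return labels, portions

def associate_with_ocr(portions, ocr_texts):
    """Pair each unlabeled portion with its OCR text to form candidate labeled data."""
    return [{"portion": p, "label": t} for p, t in zip(portions, ocr_texts)]

def kpi_accuracy(candidates, reference):
    """KPI metric: fraction of candidate labels matching the held-out reference set."""
    hits = sum(c["label"] == r["label"] for c, r in zip(candidates, reference))
    return hits / len(reference)

def adjust_weight(weight, kpi, target=1.0, lr=0.1):
    """Nudge a scalar model weight in proportion to the KPI shortfall."""
    return weight + lr * (target - kpi)

# Hypothetical usage: a two-sample first data set and a matching reference set.
first = [{"portion": "img_a", "label": "milk"}, {"portion": "img_b", "label": "bread"}]
labels, portions = separate_labels(first)
candidates = associate_with_ocr(portions, ["milk", "eggs"])  # OCR got one wrong
kpi = kpi_accuracy(candidates, first)   # one of two candidates matches -> 0.5
new_w = adjust_weight(1.0, kpi)
```

The single-scalar weight update stands in for the (unspecified) model-training step the claims describe only abstractly.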

Inventors

  • MARTÍNEZ CEBRIÁN, Javier
  • MARTINEZ, Elena
  • CORRALES SÁNCHEZ, HÉCTOR
  • YEBES TORRES, JOSE JAVIER

Assignees

  • Nielsen Consumer LLC

Dates

Publication Date
2026-05-06
Application Date
2025-10-29

Claims (15)

  1. An apparatus comprising: interface circuitry; machine-readable instructions; and at least one processor circuit to be programmed by the machine-readable instructions to: separate label data in a first data set from portions of an image; generate candidate labeled data based on associated ones of unlabeled portions of the image and optical character recognition (OCR) data; generate key performance indicator (KPI) metric values based on a comparison between the candidate labeled data and a second data set; and adjust weights of a model based on the KPI metric values.
  2. The apparatus as defined in claim 1, wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to generate the first data set and the second data set based on labeled image data associated with the image.
  3. The apparatus as defined in claim 1 or 2, wherein the second data set retains the label data, the retained label data unmodified from an original format.
  4. The apparatus as defined in claim 3, wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to compare the candidate labeled data with the retained label data associated with the second data set.
  5. The apparatus as defined in claim 1 or 2, wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to: generate first polygons corresponding to the unlabeled portions; and generate second polygons corresponding to the OCR data.
  6. The apparatus as defined in claim 5, wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to identify the associated ones of the unlabeled portions and the OCR data based on respective intersections of the first polygons and the second polygons.
  7. The apparatus as defined in claim 1, wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to unlink label data in the first data set, the second data set including originally labeled data associated with the portions of the image.
  8. The apparatus as defined in any of claims 1-7, wherein the model is a machine-learning model, and wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to adjust weights of the machine-learning model.
  9. The apparatus as defined in any of claims 1-8, wherein the portions of the image represent separate product images within the image.
  10. At least one machine-readable medium comprising machine-readable instructions to cause at least one processor circuit to at least: unlink label data in a first data set from portions of an image; generate candidate labeled portion data based on associated ones of unlabeled portions of the image and optical character recognition (OCR) data; generate key performance indicator (KPI) metric values based on a comparison between the candidate labeled portion data and a second data set; and adjust weights of a model based on the KPI metric values.
  11. The at least one machine-readable medium as defined in claim 10, wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to generate the first data set and the second data set based on labeled image data associated with the image.
  12. The at least one machine-readable medium as defined in claim 10 or 11, wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to retain the label data in an unmodified format.
  13. The at least one machine-readable medium as defined in claim 12, wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to compare the candidate labeled data with the retained label data associated with the second data set.
  14. The at least one machine-readable medium as defined in claim 10 or 11, wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to: generate first polygons corresponding to the unlabeled portions; and generate second polygons corresponding to the OCR data.
  15. The at least one machine-readable medium as defined in claim 14, wherein the machine-readable instructions are to cause one or more of the at least one processor circuit to identify the associated ones of the unlabeled portions and the OCR data based on respective intersections of the first polygons and the second polygons.
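Claims 5-6 and 14-15 associate unlabeled image portions with OCR results by intersecting their polygons. A minimal sketch of that idea, simplified from general polygons to axis-aligned bounding boxes (the coordinates, function names, and overlap rule are illustrative assumptions, not the patent's method):

```python
def box_intersection_area(a, b):
    """Intersection area of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def associate(portion_boxes, ocr_boxes):
    """Map each portion index to the OCR box indices whose regions intersect it."""
    return {
        i: [j for j, o in enumerate(ocr_boxes) if box_intersection_area(p, o) > 0]
        for i, p in enumerate(portion_boxes)
    }

# Hypothetical boxes: two product-image portions, three OCR text regions.
portions = [(0, 0, 10, 10), (20, 0, 30, 10)]
ocr = [(5, 5, 15, 15), (25, 2, 28, 8), (100, 100, 110, 110)]
links = associate(portions, ocr)  # the third OCR box intersects nothing
```

A production system would likely use true polygon intersection (e.g., a computational-geometry library) and an area threshold rather than any overlap at all, but the association logic is the same.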

Description

FIELD OF THE DISCLOSURE

This disclosure relates generally to machine learning training and, more particularly, to methods, systems, articles of manufacture and apparatus to label data.

BACKGROUND

Artificial intelligence (AI) and machine learning (ML) techniques enable new insights to be learned from data sources. Industries that benefit from such techniques include medical initiatives to determine cancer-related proteins, pharmaceutical treatment projections, consumer retail activities, and farming crop management.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of an example environment in which example data label circuitry operates to generate labeled data.
FIG. 2 is a block diagram of an example implementation of the data label circuitry of FIG. 1.
FIG. 3 is a block diagram of an example distant supervision pipeline generated by the data label circuitry of FIGS. 1 and 2 to label data.
FIG. 4 is a block diagram of an example active learning pipeline generated by the data label circuitry of FIGS. 1 and 2 to label data.
FIG. 5 is a block diagram of an example merged pipeline generated by the data label circuitry of FIGS. 1 and 2 to label data.
FIG. 6 is a block diagram of an example implementation of the data label circuitry of FIGS. 1 and 2 to label data.
FIGS. 7-11 are flowcharts representative of example machine readable instructions and/or example operations that may be executed, instantiated, and/or performed by example programmable circuitry to implement the data label circuitry of FIGS. 1, 2 and 6.
FIG. 12 is a block diagram of an example processing platform including programmable circuitry structured to execute, instantiate, and/or perform the example machine readable instructions and/or perform the example operations of FIGS. 7-11 to implement the data label circuitry of FIGS. 2 and 6.
FIG. 13 is a block diagram of an example implementation of the programmable circuitry of FIG. 12.
FIG. 14 is a block diagram of another example implementation of the programmable circuitry of FIG. 12.
FIG. 15 is a block diagram of an example software/firmware/instructions distribution platform (e.g., one or more servers) to distribute software, instructions, and/or firmware (e.g., corresponding to the example machine readable instructions of FIGS. 7-11) to client devices associated with end users and/or consumers (e.g., for license, sale, and/or use), retailers (e.g., for sale, re-sale, license, and/or sub-license), and/or original equipment manufacturers (OEMs) (e.g., for inclusion in products to be distributed to, for example, retailers and/or to other end users such as direct buy customers).

In general, the same reference numbers will be used throughout the drawing(s) and accompanying written description to refer to the same or like parts. The figures are not necessarily to scale.

DETAILED DESCRIPTION

Artificial Intelligence (AI) and machine learning (ML) are applicable to a broad landscape of use cases related to text and image understanding. Developing solutions to address such use cases requires training data that is relevant to them. Additionally, such relevant training data should have a sufficient number of data samples to permit AI/ML training operations to result in model tuning with fewer errors.

Artificial intelligence (AI), including machine learning (ML), deep learning (DL), and/or other artificial machine-driven logic, enables machines (e.g., computers, logic circuits, etc.) to use a model to process input data to generate an output based on patterns and/or associations previously learned by the model via a training process. For instance, the model may be trained with data to recognize patterns and/or associations and follow such patterns and/or associations when processing input data such that other input(s) result in output(s) consistent with the recognized patterns and/or associations.
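The train-then-infer behavior described above can be sketched, purely as an illustration (none of this code appears in the patent), with a one-parameter model fit by gradient descent, where the learning rate and epoch count play the role of hyperparameters:

```python
def train(xs, ys, lr=0.05, epochs=100):
    """Learning/training phase: fit y ~ w * x by gradient descent on mean squared error."""
    w = 0.0  # internal parameter adjusted during training
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # lr is a hyperparameter set before training
    return w

def infer(w, x):
    """Inference phase: apply the learned parameter to new input."""
    return w * x

# Hypothetical data generated by y = 2x; training should recover w close to 2.
w = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

After training, `infer(w, x)` produces outputs consistent with the learned pattern for inputs it never saw, which is the training/inference split the surrounding text describes.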
Many different types of machine learning models and/or machine learning architectures exist. In some examples disclosed herein, self-supervised models, semi-supervised models, transformer models and distant supervision models are used. However, other types of machine learning models could additionally or alternatively be used.

In general, implementing an ML/AI system involves two phases: a learning/training phase and an inference phase. In the learning/training phase, a training algorithm is used to train a model to operate in accordance with patterns and/or associations based on, for example, training data. In general, the model includes internal parameters that guide how input data is transformed into output data, such as through a series of nodes and connections within the model. Additionally, hyperparameters are used as part of the training process to control how the learning is performed (e.g., a learning rate, a number of layers to be used in the machine learning model, etc.). Hyperparameters are defined to be training parameters that are determined prior to initia