CN-121980220-A - Learning model-oriented data intelligent labeling and management method and system

Abstract

The invention relates to the technical field of data annotation, and discloses a learning model-oriented data intelligent labeling and management method and system. The method comprises: determining, for each track of the media data, a tag sequence containing time information; acquiring labeled media data and its labeling results, and clustering the labeled media data according to the labeling results to obtain media data sets indexed by labeling result; for each media data set, comparing the tag sequences of the different tracks of the media data and extracting feature tag tuples; and labeling the media data stream in real time based on the feature tag tuples. In subsequent labeling, the real-time tag sequence of the media data is traversed and matched against the feature tag tuples.

Inventors

LIANG LAN

Assignees

浙江联合永道信息技术有限公司

Dates

Publication Date: 20260505
Application Date: 20260121

Claims (10)

1. A learning model-oriented data intelligent labeling and management method, characterized by comprising the following steps: extracting an image track, an audio track and a text track from media data, and identifying the three tracks independently to determine, for each track, a tag sequence containing time information; acquiring labeled media data and its labeling results, and clustering the labeled media data according to the labeling results to obtain media data sets indexed by labeling result; for each media data set, comparing the tag sequences of the different tracks of the media data and extracting feature tag tuples; and labeling the media data stream in real time based on the feature tag tuples, verifying the labeling results at the labeling end, and recursively updating the labeled media data and the feature tag tuple extraction process according to the verification results.
2. The learning model-oriented data intelligent labeling and management method according to claim 1, wherein the step of extracting the image track, the audio track and the text track from the media data and identifying the three tracks independently to determine, for each track, a tag sequence containing time information comprises: for any media data, extracting its image track, audio track and text track; dividing the image track, the audio track and the text track along a common time axis to obtain data elements, wherein the data elements of the image track are image frames, the data elements of the audio track are audio segments, and the data elements of the text track are text contents, every data element contains time information, the time information of an image frame is the moment of that frame, the time information of an audio segment is the time interval of that segment, and the time information of a text content is the time interval from the current text content to the next text content; extracting the content of the image frames, the audio segments and the text contents to obtain tag sets; and aggregating the tag sets according to the time information to obtain, for each track, a tag sequence containing time information.
3. The learning model-oriented data intelligent labeling and management method according to claim 1, wherein the step of acquiring labeled media data and its labeling results, and clustering the labeled media data according to the labeling results to obtain media data sets indexed by labeling result, comprises: intercepting a media data stream, randomly extracting media data from the media data stream, sending the selected media data to a labeling end, and receiving the labeling result fed back by the labeling end, wherein the labeling result has a tree structure; comparing the labeling results and calculating the labeling distance between them, wherein the formula for the labeling distance is not reproduced in this text and its terms are described as: the labeling distance; the maximum number of layers of the two labeling results; a preset weight corresponding to each layer; the minimum word-vector distance between a given label in a given layer of the first labeling result and each label in the same layer of the second labeling result; and a per-layer quantity of the two labeling results whose description is incomplete in this text; and wherein, during the comparison, when a label has no corresponding label, a preset default distance is used; clustering the labeled media data based on the labeling distance to obtain the media data sets; and, for any media data set, computing a combined result from the labeling results of its media data to serve as the index.
4. The learning model-oriented data intelligent labeling and management method according to claim 1, wherein the step of comparing, for each media data set, the tag sequences of the different tracks of the media data and extracting feature tag tuples comprises: constructing a time point sequence based on the time interval of the image frames; sequentially reading the tag sets of the different tracks corresponding to each time point to obtain a tag tuple for each time point; for any two media data in the same media data set, comparing their tag tuples at each time point, and marking the corresponding tag tuple when the similarity reaches a preset similarity threshold; and counting the number of times each tag tuple is marked, selecting the tag tuples whose number of marks reaches a preset frequency threshold, and combining them to obtain a feature tag tuple, wherein the feature tag tuples are indexed by frequency threshold, one labeling result corresponds to one media data set, and one labeling result corresponds to a plurality of feature tag tuples indexed by frequency threshold.
5. The learning model-oriented data intelligent labeling and management method according to claim 4, wherein the step of labeling the media data stream in real time based on the feature tag tuples, verifying the labeling results at the labeling end, and recursively updating the labeled media data and the feature tag tuple extraction process according to the verification results comprises: executing the tag sequence extraction process on the received media data stream in real time to obtain the tag sequence of the real-time media data, referred to as the real-time tag sequence; querying the feature tag tuples in ascending order of frequency threshold, traversing the real-time tag sequence with each feature tag tuple, and calculating a matching degree; querying the length of the feature tag tuple and correcting the matching degree according to the length; selecting a feature tag tuple whose corrected matching degree reaches a preset matching degree threshold, querying the corresponding labeling result, and labeling the media data in real time; and verifying the labeling results at the labeling end, and recursively updating the labeled media data and the feature tag tuple extraction process according to the verification results.
6. The learning model-oriented data intelligent labeling and management method according to claim 5, wherein the step of verifying the labeling results at the labeling end and recursively updating the labeled media data and the feature tag tuple extraction process according to the verification results comprises: sending the media data that has been labeled in real time to the labeling end, and receiving the labeling result fed back by the labeling end as the true labeling result; comparing the true labeling result with the real-time labeling result and calculating a degree of difference; and, when the degree of difference reaches a preset difference threshold, increasing the similarity threshold used in the tag tuple comparison by a preset first step size, and increasing all frequency thresholds by a preset second step size.
7. A learning model-oriented data intelligent labeling and management system, the system comprising: a track extraction and identification module, configured to extract an image track, an audio track and a text track from media data, and to identify the three tracks independently to determine, for each track, a tag sequence containing time information; a media data clustering module, configured to acquire labeled media data and its labeling results, and to cluster the labeled media data according to the labeling results to obtain media data sets indexed by labeling result; a feature tag extraction module, configured to compare, for each media data set, the tag sequences of the different tracks of the media data and to extract feature tag tuples; and a recursive updating module, configured to label the media data stream in real time based on the feature tag tuples, verify the labeling results at the labeling end, and recursively update the labeled media data and the feature tag tuple extraction process according to the verification results.
8. The learning model-oriented data intelligent labeling and management system of claim 7, wherein the track extraction and identification module comprises: a track extraction unit, configured to extract the image track, audio track and text track of any media data; a data element generation unit, configured to divide the image track, the audio track and the text track into bins along a common time axis to obtain data elements, wherein the data elements of the image track are image frames, the data elements of the audio track are audio segments, and the data elements of the text track are text contents, every data element contains time information, the time information of an image frame is the moment of that frame, the time information of an audio segment is the time interval of that segment, and the time information of a text content is the time interval from the current text content to the next text content; a content extraction unit, configured to extract the content of the image frames, the audio segments and the text contents to obtain tag sets; and a tag set statistics unit, configured to aggregate the tag sets according to the time information to obtain, for each track, a tag sequence containing time information.
9. The learning model-oriented data intelligent labeling and management system of claim 7, wherein the media data clustering module comprises: a data interception unit, configured to intercept a media data stream, randomly extract media data from the media data stream, send the selected media data to a labeling end, and receive the labeling result fed back by the labeling end; a labeling distance calculation unit, configured to compare the labeling results and calculate the labeling distance between them, wherein the formula for the labeling distance is not reproduced in this text and its terms are described as: the labeling distance; the maximum number of layers of the two labeling results; a preset weight corresponding to each layer; the minimum word-vector distance between a given label in a given layer of the first labeling result and each label in the same layer of the second labeling result; and a per-layer quantity of the two labeling results whose description is incomplete in this text; and wherein, during the comparison, when a label has no corresponding label, a preset default distance is used; a clustering execution unit, configured to cluster the labeled media data based on the labeling distance to obtain the media data sets; and an index generation unit, configured to compute, for any media data set, a combined result from the labeling results of its media data to serve as the index.
10. The learning model-oriented data intelligent labeling and management system of claim 7, wherein the feature tag extraction module comprises: a sequence construction unit, configured to construct a time point sequence based on the time interval of the image frames; a tuple generation unit, configured to sequentially read the tag sets of the different tracks corresponding to each time point to obtain a tag tuple for each time point; a tuple comparison unit, configured to compare, for any two media data in the same media data set, their tag tuples at each time point, and to mark the corresponding tag tuple when the similarity reaches a preset similarity threshold; and a selection and combination unit, configured to select the tag tuples whose number of marks reaches a preset frequency threshold and combine them to obtain a feature tag tuple, wherein the feature tag tuples are indexed by frequency threshold, one labeling result corresponds to one media data set, and one labeling result corresponds to a plurality of feature tag tuples indexed by frequency threshold.
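
Illustrative note (not part of the claims): the tuple comparison and selection steps recited in claims 4 and 10 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation. The claims do not name a similarity measure, so the mean per-track Jaccard similarity used here is an assumption, as is the choice to mark both compared tuples. All function and variable names (`tag_tuple`, `tuple_similarity`, `extract_feature_tuples`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tag_tuple(image_tags, audio_tags, text_tags):
    """Build a hashable tag tuple from the three tracks' tag sets at one time point."""
    return (frozenset(image_tags), frozenset(audio_tags), frozenset(text_tags))


def jaccard(a, b):
    """Jaccard similarity of two tag sets; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tuple_similarity(t1, t2):
    """Mean per-track Jaccard similarity (an assumed measure, not named in the claims)."""
    return sum(jaccard(x, y) for x, y in zip(t1, t2)) / len(t1)


def extract_feature_tuples(media_set, similarity_threshold, frequency_thresholds):
    """Map each frequency threshold to the tag tuples marked at least that often.

    media_set is a list of media items; each item is a list of tag tuples,
    one per time point, aligned on a shared time point sequence.
    """
    marks = Counter()
    for item_a, item_b in combinations(media_set, 2):
        # zip stops at the shorter item, so only shared time points are compared.
        for t_a, t_b in zip(item_a, item_b):
            if tuple_similarity(t_a, t_b) >= similarity_threshold:
                # Assumption: both compared tuples are marked, once per distinct tuple.
                for t in {t_a, t_b}:
                    marks[t] += 1
    return {
        threshold: {t for t, n in marks.items() if n >= threshold}
        for threshold in frequency_thresholds
    }


# Hypothetical toy data: three media items from one media data set, two time points each.
cat = tag_tuple({"cat"}, {"meow"}, {"kitten"})
sofa = tag_tuple({"sofa"}, {"silence"}, {"home"})
tree = tag_tuple({"tree"}, {"wind"}, {"park"})
media_set = [[cat, sofa], [cat, tree], [cat, sofa]]

features = extract_feature_tuples(media_set, similarity_threshold=0.8,
                                  frequency_thresholds=[1, 3])
```

With this toy data the `cat` tuple is marked three times and the `sofa` tuple once, so the set indexed by frequency threshold 1 holds both tuples while the set indexed by threshold 3 holds only `cat`. A higher frequency threshold therefore yields a smaller, stricter feature tag tuple for the same labeling result.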

Description

Learning model-oriented data intelligent labeling and management method and system

Technical Field

The invention relates to the technical field of data annotation, and in particular to a learning model-oriented data intelligent labeling and management method and system.

Background

Intelligent data annotation takes artificial intelligence and automation technology as its core and classifies, marks, annotates and structures multi-modal raw data such as text, images, speech, video and point clouds, automatically or semi-automatically, to generate high-quality annotated data for training machine learning models. Demand for processing visual data is currently very strong, and a large number of learning models for visual data have emerged accordingly. Training these models requires a large amount of labeled data, yet labeling is still done mainly by hand, which is inefficient when the volume of visual data is very large. How to increase the labeling speed of visual data is therefore the technical problem to be solved by the technical solution of the invention.

Disclosure of Invention

The invention aims to provide a learning model-oriented data intelligent labeling and management method and system that solve the problems described in the background.
To achieve the above purpose, the invention provides the following technical solution: a learning model-oriented data intelligent labeling and management method comprising the following steps: extracting an image track, an audio track and a text track from media data, and identifying the three tracks independently to determine, for each track, a tag sequence containing time information; acquiring labeled media data and its labeling results, and clustering the labeled media data according to the labeling results to obtain media data sets indexed by labeling result; for each media data set, comparing the tag sequences of the different tracks of the media data and extracting feature tag tuples; and labeling the media data stream in real time based on the feature tag tuples, verifying the labeling results at the labeling end, and recursively updating the labeled media data and the feature tag tuple extraction process according to the verification results.

The step of extracting the image track, the audio track and the text track from the media data and identifying the three tracks independently to determine, for each track, a tag sequence containing time information comprises: for any media data, extracting its image track, audio track and text track; dividing the image track, the audio track and the text track along a common time axis to obtain data elements, wherein the data elements of the image track are image frames, the data elements of the audio track are audio segments, and the data elements of the text track are text contents, every data element contains time information, the time information of an image frame is the moment of that frame, the time information of an audio segment is the time interval of that segment, and the time information of a text content is the time interval from the current text content to the next text content; extracting the content of the image frames, the audio segments and the text contents to obtain tag sets; and aggregating the tag sets according to the time information to obtain, for each track, a tag sequence containing time information.

The step of acquiring labeled media data and its labeling results, and clustering the labeled media data according to the labeling results to obtain media data sets indexed by labeling result, comprises: intercepting a media data stream, randomly extracting media data from the media data stream, sending the selected media data to a labeling end, and receiving the labeling result fed back by the labeling end, wherein the labeling result has a tree structure; comparing the labeling results and calculating the labeling distance between them, wherein the formula for the labeling distance is not reproduced in this text and its terms are described as: the labeling distance; the maximum number of layers of the two labeling results; a preset weight corresponding to each layer; the minimum word-vector distance between a given label in a given layer of the first labeling result and each label in the same layer of the second labeling result; and a per-layer quantity of the two labeling results whose description is incomplete in this text; and wherein, during the comparison, when a label has no corresponding label, a preset default distance is used; clustering the labeled media data based on the labeling distance to obtain the media data sets; and for any media dat