CN-122023786-A - Fine-grained e-commerce category mining method based on a multimodal pre-training model
Abstract
The invention discloses a fine-grained e-commerce category mining method based on a multimodal pre-training model. First, transfer learning is used to improve the commodity segmentation accuracy of the image segmentation algorithm Sem-Fpn on the e-commerce image set, and the multimodal pre-training model is then pre-trained on the image-segmentation-enhanced dataset. Next, the richness of the text information is increased through text semantic alignment and large-language-model-based text expansion, and the multimodal pre-training model is pre-trained on the text-enhanced dataset. The encoders of the resulting multimodal pre-training models are then used to re-encode the image and text features of the e-commerce dataset, and a clustering algorithm divides the commodity categories at fine granularity. Finally, the resulting commodity categories are labeled by a large-model-based category optimization labeling algorithm. The method achieves fine-grained category mining on the MEP-3M dataset, enriches the information in the dataset, and refines the category granularity of the e-commerce dataset.
Inventors
- Guan Zhangqingyun
- LIU FAN
- ZHANG XUEJIE
Assignees
- Hohai University (河海大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20241112
Claims (7)
- 1. A fine-grained e-commerce category mining method based on a multimodal pre-training model, characterized by comprising the following steps: Step 1, implementing image segmentation through transfer from the multimodal pre-training model, training a semantic segmentation model with a pre-training and fine-tuning paradigm; Step 2, fine-tuning the multimodal pre-training model with image segmentation enhancement: performing image segmentation enhancement on the multimodal dataset MEP-3M, and fine-tuning the multimodal pre-training model CLIP on the MEP-3M-SEG dataset; Step 3, applying a large-model-based two-stage text fusion enhancement method to handle, respectively, the missing and the insufficient text-modality data in the e-commerce dataset; Step 4, performing text enhancement on the multimodal dataset MEP-3M with the large-model-based two-stage text fusion enhancement method to obtain the text-enhanced multimodal dataset MEP-3M-TEXT, and fine-tuning the multimodal pre-training model CLIP on the MEP-3M-TEXT dataset; Step 5, dividing the dataset categories at fine granularity by performing feature extraction and feature encoding on the data of each modality in the e-commerce dataset; Step 6, optimizing category labeling: generating labels with the RAM large model from the text and image information of each new subclass divided from an initial class, setting an optimization function, and selecting the most reasonable characteristic label as the new label for category mining.
- 2. The fine-grained e-commerce category mining method based on a multimodal pre-training model, characterized in that the specific process of step 1 is as follows: 1-1, initializing the backbone network parameters of the image segmentation model with the weights of the CLIP pre-trained image encoder, so that image features of e-commerce data can be extracted better after initialization; 1-2, freezing the backbone network during training, so that the feature extraction part remains unchanged and only the decoder part is fine-tuned.
- 3. The fine-grained e-commerce category mining method based on a multimodal pre-training model, characterized by comprising the following specific processes: 2-1, training a semantic segmentation model with a pre-training and fine-tuning paradigm, and then performing image segmentation enhancement on the multimodal dataset MEP-3M to obtain the image-segmentation-enhanced multimodal dataset MEP-3M-SEG; 2-2, aligning the text labels of the e-commerce dataset: comparing and aligning the text-modality information in the original multimodal commodity dataset with the information generated in the first step, automatically screening the generated text labels according to their rationality, and passing the reasonable results to the subsequent steps for overall generation; 2-3, fine-tuning the multimodal pre-training model CLIP on the MEP-3M-SEG dataset; 2-4, using the InfoNCE loss for CLIP pre-training. The loss uses multi-class classification over negative samples in place of single-class classification of negative samples, so that the difference between the true value and the predicted value can be estimated. Given a text set $K = \{k_0, \ldots, k_{N-1}\}$ and an image $q$, forming one positive sample $(q, k_+)$ and $N-1$ negative samples, the InfoNCE loss is calculated as $$\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp\left(F(q) \cdot G(k_+) / \tau\right)}{\sum_{i=0}^{N-1} \exp\left(F(q) \cdot G(k_i) / \tau\right)}$$ where $F(\cdot)$ and $G(\cdot)$ denote the image encoder and the text encoder, which output the image features and the text features respectively, and $\tau$ is the temperature coefficient. The magnitude of $\tau$ controls how strongly the model discriminates against negative samples.
- 4. The fine-grained e-commerce category mining method based on a multimodal pre-training model, characterized by comprising the following specific processes: 3-1, first generating a basic tag description of each commodity image with the large model RAM, thereby supplementing the text modality of the e-commerce dataset with image information about the commodity, which may specifically include the commodity's color, shape, and other characteristic information shown in the image; 3-2, aligning the text labels of the e-commerce dataset: comparing and aligning the text-modality information in the original multimodal commodity dataset with the information generated in the first step, automatically screening the generated text labels according to their rationality, and passing the reasonable results to the subsequent steps for overall generation; 3-3, inputting the obtained text information, comprising the original sample text and the generated labels obtained in the first step, into the ChatGLM large model on an equal footing to supplement peripheral information about the commodity labels, thereby expanding the text semantics and improving the utilization efficiency of the dataset in the text modality.
- 5. The fine-grained e-commerce category mining method based on a multimodal pre-training model, characterized by comprising the following specific processes: 4-1, first applying the text fusion enhancement method to the image data and text data in the multimodal e-commerce dataset to achieve text enhancement; 4-2, using the enhanced text data together with the original image data to fine-tune the multimodal pre-training model.
- 6. The fine-grained e-commerce category mining method based on a multimodal pre-training model, characterized by comprising the following specific processes: 5-1, classifying the feature encodings of the commodities with a self-supervised method; 5-2, classifying according to the image features and the text features, and aligning the classification results to achieve fine-grained subclass division.
- 7. The fine-grained e-commerce category mining method based on a multimodal pre-training model, characterized by comprising the following specific processes: 6-1, generating labels with the RAM large model from the text and image information of each new subclass divided from an initial class; 6-2, setting an optimization function and selecting the most reasonable characteristic label as the new label for category mining, wherein the subclass label must satisfy two requirements: the label information of commodities within the same subclass is as consistent as possible, and the label is as different as possible from the other labels within the broad class to which the subclass belongs.
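The InfoNCE loss named in claim 3 (step 2-4) can be illustrated with a minimal sketch. It assumes the standard InfoNCE form, that is, the negative log of the softmax probability assigned to the positive image-text pair among N candidate texts. The function names, the L2 normalization of features (common in CLIP-style training but not stated in the claims), and the default temperature of 0.07 are illustrative assumptions, and the feature vectors stand in for the encoder outputs F(q) and G(k).

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (assumption: cosine similarity is used)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce(image_feat, text_feats, positive_index, tau=0.07):
    """InfoNCE loss for one image against N candidate texts.

    image_feat: image feature vector, standing in for F(q).
    text_feats: list of N text feature vectors, standing in for G(k_0..k_{N-1}).
    positive_index: index of the matching text k_+ in text_feats.
    tau: temperature coefficient (0.07 is only an illustrative default).
    """
    q = l2_normalize(image_feat)
    logits = [dot(q, l2_normalize(k)) / tau for k in text_feats]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(exp(pos) / sum(exp(all))) == log_denominator - pos
    return log_denominator - logits[positive_index]


if __name__ == "__main__":
    image = [1.0, 0.0]
    texts = [[1.0, 0.0], [0.0, 1.0]]  # texts[0] is the positive
    print(info_nce(image, texts, positive_index=0, tau=1.0))
    print(info_nce(image, texts, positive_index=0, tau=0.1))
```

With a well-matched positive, lowering the temperature sharpens the softmax and reduces the loss, which is one way to see how the temperature controls the separation from negative samples.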
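Claim 6 (step 5-2) describes classifying by image features and by text features and then aligning the results, without fixing the alignment rule. The sketch below shows one possible rule, chosen here purely as an assumption: commodities of one broad class that share both their image-based and their text-based cluster assignment form a fine-grained subclass. The function name `align_subclasses`, the `min_size` threshold, and the use of precomputed cluster ids as input are all hypothetical.

```python
from collections import defaultdict


def align_subclasses(image_clusters, text_clusters, min_size=2):
    """Group items that agree in both image and text cluster assignments.

    image_clusters: cluster id per item, from clustering image features.
    text_clusters: cluster id per item, from clustering text features.
    min_size: smallest group kept as a subclass (hypothetical threshold).

    Returns (subclasses, unassigned), where subclasses maps an
    (image_cluster, text_cluster) pair to the list of item indices in it.
    """
    if len(image_clusters) != len(text_clusters):
        raise ValueError("cluster lists must have the same length")

    groups = defaultdict(list)
    for item_id, pair in enumerate(zip(image_clusters, text_clusters)):
        groups[pair].append(item_id)

    subclasses = {}
    unassigned = []
    for pair, items in groups.items():
        if len(items) >= min_size:
            subclasses[pair] = items
        else:
            # Items whose two modalities disagree with every larger group.
            unassigned.extend(items)
    return subclasses, sorted(unassigned)


if __name__ == "__main__":
    image_ids = [0, 0, 1, 1, 0]
    text_ids = [0, 0, 1, 1, 1]
    print(align_subclasses(image_ids, text_ids))
```

Intersecting the two partitions keeps only groups on which both modalities agree. Other alignment rules, such as matching clusters by overlap, would fit the claim wording equally well.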
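Claim 7 (step 6-2) states two requirements for a subclass label but does not give the optimization function itself. The following sketch uses a hypothetical objective that encodes both requirements: a candidate tag scores highly when it appears on most commodities of the subclass (consistency) and on few commodities of the sibling subclasses in the same broad class (distinctiveness). The function name, the `weight` parameter, and the representation of generated tags as sets of strings are assumptions made for illustration.

```python
def select_subclass_label(subclass_tags, sibling_tags, weight=1.0):
    """Pick the tag that is frequent inside the subclass and rare among siblings.

    subclass_tags: list of tag sets, one per commodity in the subclass.
    sibling_tags: list of tag sets, one per commodity in the other
        subclasses of the same broad class.
    weight: trade-off between consistency and distinctiveness (hypothetical).

    Returns (best_tag, scores).
    """
    candidates = set().union(*subclass_tags) if subclass_tags else set()
    if not candidates:
        raise ValueError("the subclass has no candidate tags")

    def fraction(tag, items):
        """Share of items whose tag set contains the tag."""
        return sum(tag in tags for tags in items) / len(items) if items else 0.0

    scores = {
        tag: fraction(tag, subclass_tags) - weight * fraction(tag, sibling_tags)
        for tag in candidates
    }
    # Sorting first makes tie-breaking deterministic (alphabetical order).
    best = max(sorted(scores), key=scores.get)
    return best, scores


if __name__ == "__main__":
    subclass = [{"shoe", "red", "leather"}, {"shoe", "red"}, {"shoe", "red", "lace"}]
    siblings = [{"shoe", "blue"}, {"shoe", "black"}, {"shoe", "red"}]
    label, all_scores = select_subclass_label(subclass, siblings)
    print(label, all_scores)
```

In this example the tag shared by every commodity in the broad class scores zero, because it is consistent but not distinctive, while a tag concentrated in the subclass is preferred.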
Description
Fine-grained e-commerce category mining method based on a multimodal pre-training model
Technical Field
The invention relates to fine-grained mining of e-commerce categories, in particular to a fine-grained e-commerce category mining method based on a multimodal pre-training model, and belongs to the field of computer technology.
Background
In recent years, with the development of artificial intelligence technologies, machine learning and its applications have achieved important results in speech recognition, computer vision, natural language processing, and other areas. Classifying objects according to descriptive information is a fundamental and important machine learning task. Classification work can generally be divided into coarse-grained, fine-grained, and instance-level classification according to the level of features used to distinguish the objects. Coarse-grained classification typically separates broad classes such as dogs, cars, and flowers, and is characterized by large inter-class differences. Fine-grained classification builds on coarse-grained classification and further divides a broad class into subclasses, such as dog breeds, vehicle models, and flower varieties. Instance-level classification goes further and distinguishes individual instances, as in face recognition and its applications. As artificial intelligence research has advanced, work on coarse-grained and instance-level classification has progressed rapidly. However, this work has had difficulty meeting the needs of artificial intelligence algorithms in certain real-world scenarios, especially those such as e-commerce where fine-grained classification is urgently needed.
Early general-purpose object classification datasets such as COCO and VOC contain 80 and 20 categories, respectively. Their categories are generally at the level of broad classes such as dogs, cars, and flowers. They therefore cannot meet the commodity classification needs of e-commerce, such as classification by brand or by commodity characteristics. Early commodity datasets were constructed in much the same way as general object datasets, usually from retail store scenes. They contained relatively few commodity types and items, and the richness and hierarchy required for commodity classification and retrieval were low, so coarse-grained classification was sufficient for application research in that setting. With the development of the internet and information technology, the scale of e-commerce keeps growing, and the variety and number of commodities keep increasing. Early classification levels increasingly fail to meet practical needs, and the demand for fine-grained classification in the e-commerce industry has become increasingly urgent. Building fine-grained, e-commerce-oriented datasets for these new scenarios is therefore increasingly important. Constructing an e-commerce dataset from scratch requires a series of complex steps such as data collection and data cleaning. This work is time-consuming and labor-intensive, especially given the data scale that an e-commerce dataset requires. Moreover, starting work on an entirely new dataset hurts continuity with earlier work and incurs higher algorithm design and training costs.
Therefore, designing methods that upgrade an existing dataset to a finer granularity both saves cost and lowers the difficulty of applying the dataset to downstream tasks. With the rapid growth of internet technology, the information carriers for commodities have also diversified: images, text, audio, and video can all carry information. By mining the relationships and information among modalities more deeply, multimodal techniques can be used to perform fine-grained category mining on existing e-commerce datasets, thereby upgrading and expanding them. The multimodal pre-training model CLIP is combined with fine-grained category mining: an e-commerce multimodal pre-training model trained on the large-scale e-commerce dataset MEP-3M learns semantically meaningful commodity image representations, and this model is transferred to the fine-grained e-commerce category mining task. Thus, by mea