CN-116244630-B - Commodity material classification method, system, medium and equipment based on deep learning
Abstract
The invention provides a commodity material classification method, system, medium and equipment based on deep learning. The method comprises: step S1, acquiring material data and preprocessing the data; step S2, designing and constructing a deep learning algorithm model and training it with the processed data; step S3, optimizing the architecture and data features of the deep learning algorithm model and performing a comparison test to demonstrate its effectiveness; and step S4, classifying given materials with the optimized deep learning algorithm model. Because a feature vector is constructed independently for the text of each field and features are learned per field, the invention avoids the situation in which the vector allotted to the text of each field is too short and the semantic information is fragmented.
Inventors
- YAO ZEKUN
- ZHU JUN
- LI YANBEI
- SHEN DAFENG
- XIA JINGXIANG
- YAN CHENGUANG
- SUN ZHIQIANG
- DAI ZHIXIN
Assignees
- 欧冶工业品股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20230306
Claims (4)
- 1. A commodity material classification method based on deep learning, characterized by comprising the following steps:
  Step S1, acquiring material data and preprocessing the data;
  Step S2, designing and constructing a deep learning algorithm model and training it with the processed data;
  Step S3, optimizing the architecture and data features of the deep learning algorithm model, and performing a comparison test to demonstrate the effectiveness of the model;
  Step S4, classifying given materials with the optimized deep learning algorithm model;
  In step S1: acquiring material data, and preprocessing the material description, the material technical attributes, the material specification and the material name in the material data;
  Step S1.1, extracting the text of the material description field, cleaning it to remove irrelevant symbols, obtaining the code of each word in the text by looking it up in a dictionary, and preliminarily converting each line of text into an index vector that serves as a feature and as input to the Bi-LSTM;
  Step S1.2, cleaning and segmenting the material technical attribute data, building a dictionary, obtaining the code of each word in the processed text by looking it up in the dictionary, and preliminarily converting each line of text into an index vector as input to the Bi-LSTM;
  Step S1.3, processing the text of the material specification field, removing symbols that carry no textual meaning, retaining letters and digits as defined by the writing conventions for material specifications, obtaining the code of each word in the processed text by looking it up in a dictionary, and preliminarily converting each line of text into a vector that serves as a feature and as input to the Bi-LSTM;
  Step S1.4, the text of the material name field already contains material type and specification data; retaining the Chinese characters, removing redundant letters, digits, punctuation and stop words, obtaining the code of each word in the processed text by looking it up in a dictionary, and preliminarily converting each line of text into an index vector as input to the Bi-LSTM;
  In step S2:
  Step S2.1, designing a word embedding layer that takes the vectorized short text as input and outputs word vectors; the word embedding layers take the four index vectors obtained in steps S1.1 to S1.4 as input and encode them into word vectors according to formulas (1) to (4), one formula per field, wherein, for each of the four fields, the output for a piece of data consists of the embedded representations of the individual characters of that field, and the number of such representations is the text embedding length of that field for that piece of data;
  Step S2.2, designing four parallel bidirectional long short-term memory (Bi-LSTM) networks that take the word vectors of the embedding layer as input and capture long-distance feature dependencies in the text;
  Step S2.3, designing a one-dimensional convolution layer that takes the output of the Bi-LSTM layer as input and further captures local feature relations;
  Step S2.4, designing a fully connected layer, passing the word vectors processed successively by the Bi-LSTM and the one-dimensional convolution layer to the fully connected layer for classification, and outputting the classification result;
  In step S3:
  Step S3.1, for the material technical attribute text, cleaning special characters, calculating the word frequency of all entries, manually extracting preset features, and converting the material technical attribute description into a matrix of 0 and 1 values, in which a binary variable indicates whether each material has a given sub-attribute and thereby represents the text information corresponding to that sub-attribute;
  Step S3.2, convolving the output of the Bi-LSTM layer of the original model with four one-dimensional convolution layers, and finally feeding the convolved output together with the manually extracted features into a Gaussian-kernel support vector machine to obtain the classification result;
  Step S3.3, running experiments with machine learning models including a random forest and a support vector machine as baselines, comparing the models before and after optimization, and evaluating the models with four metrics, namely accuracy, precision, recall and F1 value, to demonstrate the effectiveness of the model.
- 2. A commodity material classification system based on deep learning, comprising:
  Module M1, for acquiring material data and preprocessing the data;
  Module M2, for designing and constructing a deep learning algorithm model and training it with the processed data;
  Module M3, for optimizing the architecture and data features of the deep learning algorithm model, and performing a comparison test to demonstrate the effectiveness of the model;
  Module M4, for classifying given materials with the optimized deep learning algorithm model;
  In module M1: acquiring material data, and preprocessing the material description, the material technical attributes, the material specification and the material name in the material data;
  Module M1.1, extracting the text of the material description field, cleaning it to remove irrelevant symbols, obtaining the code of each word in the text by looking it up in a dictionary, and preliminarily converting each line of text into an index vector that serves as a feature and as input to the Bi-LSTM;
  Module M1.2, cleaning and segmenting the material technical attribute data, building a dictionary, obtaining the code of each word in the processed text by looking it up in the dictionary, and preliminarily converting each line of text into an index vector as input to the Bi-LSTM;
  Module M1.3, processing the text of the material specification field, removing symbols that carry no textual meaning, retaining letters and digits as defined by the writing conventions for material specifications, obtaining the code of each word in the processed text by looking it up in a dictionary, and preliminarily converting each line of text into a vector that serves as a feature and as input to the Bi-LSTM;
  Module M1.4, the text of the material name field already contains material type and specification data; retaining the Chinese characters, removing redundant letters, digits, punctuation and stop words, obtaining the code of each word in the processed text by looking it up in a dictionary, and preliminarily converting each line of text into an index vector as input to the Bi-LSTM;
  In module M2:
  Module M2.1, designing a word embedding layer that takes the vectorized short text as input and outputs word vectors; the word embedding layers take the four index vectors obtained in modules M1.1 to M1.4 as input and encode them into word vectors according to formulas (1) to (4), one formula per field, wherein, for each of the four fields, the output for a piece of data consists of the embedded representations of the individual characters of that field, and the number of such representations is the text embedding length of that field for that piece of data;
  Module M2.2, designing four parallel bidirectional long short-term memory (Bi-LSTM) networks that take the word vectors of the embedding layer as input and capture long-distance feature dependencies in the text;
  Module M2.3, designing a one-dimensional convolution layer that takes the output of the Bi-LSTM layer as input and further captures local feature relations;
  Module M2.4, designing a fully connected layer, passing the word vectors processed successively by the Bi-LSTM and the one-dimensional convolution layer to the fully connected layer for classification, and outputting the classification result;
  In module M3:
  Module M3.1, for the material technical attribute text, cleaning special characters, calculating the word frequency of all entries, manually extracting preset features, and converting the material technical attribute description into a matrix of 0 and 1 values, in which a binary variable indicates whether each material has a given sub-attribute and thereby represents the text information corresponding to that sub-attribute;
  Module M3.2, convolving the output of the Bi-LSTM layer of the original model with four one-dimensional convolution layers, and finally feeding the convolved output together with the manually extracted features into a Gaussian-kernel support vector machine to obtain the classification result;
  Module M3.3, running experiments with machine learning models including a random forest and a support vector machine as baselines, comparing the models before and after optimization, and evaluating the models with four metrics, namely accuracy, precision, recall and F1 value, to demonstrate the effectiveness of the model.
- 3. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the deep learning-based commodity material classification method according to claim 1.
- 4. A commodity material classification device based on deep learning, characterized by comprising a controller, wherein the controller comprises the computer-readable storage medium storing a computer program according to claim 3, the computer program, when executed by a processor, implementing the steps of the deep learning-based commodity material classification method according to claim 1; or the controller comprises the deep learning-based commodity material classification system according to claim 2.
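The field-wise preprocessing of steps S1.1 to S1.4 in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the regular expressions, the character-level tokenization, the reserved codes 0 and 1 for padding and unknown tokens, the fixed output length and all function names are assumptions introduced here for illustration only.

```python
import re

PAD, UNK = 0, 1  # assumed reserved codes; the claims do not specify them


def clean_description(text):
    """S1.1 (sketch): replace irrelevant symbols with spaces."""
    return re.sub(r"[^0-9A-Za-z\u4e00-\u9fff]+", " ", text).strip()


def clean_specification(text):
    """S1.3 (sketch): keep letters and digits only."""
    return re.sub(r"[^0-9A-Za-z]+", " ", text).strip()


def clean_name(text):
    """S1.4 (sketch): keep the Chinese characters only."""
    return "".join(re.findall(r"[\u4e00-\u9fff]", text))


def tokenize(text):
    """Assumed tokenization: single CJK characters, whole alphanumeric runs."""
    return re.findall(r"[\u4e00-\u9fff]|[0-9A-Za-z]+", text)


def build_dictionary(token_lists):
    """Assign each distinct token the next free integer code."""
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_index_vector(tokens, vocab, length):
    """Look up each token's code, then truncate or pad to a fixed length."""
    ids = [vocab.get(token, UNK) for token in tokens][:length]
    return ids + [PAD] * (length - len(ids))


# Hypothetical material names, used only to exercise the functions.
names = ["不锈钢螺栓M8x20", "螺栓-A2"]
name_tokens = [tokenize(clean_name(n)) for n in names]
name_vocab = build_dictionary(name_tokens)
print(to_index_vector(name_tokens[1], name_vocab, 4))  # [5, 6, 0, 0]
```

Each of the four fields would get its own cleaning rule, dictionary and index vector, which matches the claim's point that a separate feature vector is built per field before the four parallel Bi-LSTM branches.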
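The binary feature construction of step S3.1 in claim 1 can likewise be sketched in Python. The claim says the preset features are extracted manually after computing word frequencies; the sketch below substitutes a simple "most frequent sub-attribute names" rule for that manual step. The `name:value;name:value` layout of the attribute text, the `top_k` parameter and all function names are hypothetical.

```python
import re
from collections import Counter


def extract_entries(attribute_text):
    """Return the sub-attribute names found in one attribute string.

    Assumes a hypothetical 'name:value;name:value' layout and accepts both
    ASCII and full-width separators.
    """
    entries = []
    for part in re.split(r"[;；]", attribute_text):
        name = re.split(r"[:：]", part, maxsplit=1)[0]
        # Clean special characters from the sub-attribute name.
        name = re.sub(r"[^0-9A-Za-z\u4e00-\u9fff]", "", name)
        if name:
            entries.append(name)
    return entries


def select_features(all_entries, top_k):
    """Stand-in for manual selection: the top_k most frequent names."""
    freq = Counter(name for entries in all_entries for name in entries)
    return sorted(freq, key=lambda name: (-freq[name], name))[:top_k]


def binary_matrix(all_entries, features):
    """One row per material, one 0/1 column per selected sub-attribute."""
    matrix = []
    for entries in all_entries:
        present = set(entries)
        matrix.append([1 if f in present else 0 for f in features])
    return matrix


# Hypothetical attribute strings, used only to exercise the functions.
rows = ["材质:304;直径:8mm;长度:20mm", "材质:碳钢;长度:30mm", "电压：220V；材质：铜"]
entries = [extract_entries(r) for r in rows]
features = select_features(entries, top_k=2)
print(features)
print(binary_matrix(entries, features))
```

In the claimed method, a matrix of this kind is what gets combined with the convolved Bi-LSTM output before the Gaussian-kernel support vector machine in step S3.2.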
Description
Commodity material classification method, system, medium and equipment based on deep learning
Technical Field
The invention relates to the field of material detection, in particular to a commodity material classification method, system, medium and equipment based on deep learning.
Background
Many classification methods and applications exist at present, but existing methods perform poorly in the industrial field: they do not use the information associated with the materials and cannot capture the characteristics of materials in the industrial field, so the material classification results are poor.
Patent document CN115410131A (application number: CN202211121406.6) discloses a method for intelligent classification of short videos, which comprises the following steps: a, loading an original video; b, preprocessing the data; c, extracting features from the video data; d, BertModel semantic tag feature fusion training; and e, automatic intelligent classification of the video. That method classifies videos, so its classification target is a different kind of object; in addition, it applies no specific treatment to industrial materials, so there is no guarantee that it performs well when applied to materials.
Patent document CN114011750A (application number: CN202111253062.X) discloses a material visual inspection classification device and a classification method. The device includes a residue dust removal component, a finished product classification box, a discharge air nozzle component, a front size inspection camera, a side size inspection camera, a bottom size inspection camera, a first appearance inspection camera, a second appearance inspection camera, a feeding guide component and a rotary feeding component. However, that invention applies no specific treatment to industrial materials, so there is no guarantee that it performs well in the field of materials.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a commodity material classification method, system, medium and equipment based on deep learning.
The commodity material classification method based on deep learning provided by the invention comprises the following steps:
Step S1, acquiring material data and preprocessing the data;
Step S2, designing and constructing a deep learning algorithm model and training it with the processed data;
Step S3, optimizing the architecture and data features of the deep learning algorithm model, and performing a comparison test to demonstrate the effectiveness of the model;
Step S4, classifying given materials with the optimized deep learning algorithm model.
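The comparison test of step S3 is scored, according to claim 1, with accuracy, precision, recall and F1 value. A minimal Python sketch of these four metrics follows. The document does not say how per-class scores are aggregated in the multi-class case; macro averaging is assumed here, and the function name and the sample labels are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Macro averaging (unweighted mean over classes) is an assumption; the
    source only names the four metrics.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical category labels, used only to exercise the function.
truth = ["bolt", "bolt", "valve", "valve"]
predicted = ["bolt", "valve", "valve", "valve"]
print(evaluate(truth, predicted))
```

Running the same function on the predictions of each baseline and of the models before and after optimization gives directly comparable scores, which is the purpose of the comparison test.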
Preferably, in said step S1: acquiring material data, and preprocessing the material description, the material technical attributes, the material specification and the material name in the material data;
Step S1.1, extracting the text of the material description field, cleaning it to remove irrelevant symbols, obtaining the code of each word in the text by looking it up in a dictionary, and preliminarily converting each line of text into an index vector serving as feature Item A, to be used as input to the Bi-LSTM;
Step S1.2, cleaning and segmenting the material technical attribute data, building a dictionary, obtaining the code of each word in the processed text by looking it up in the dictionary, and preliminarily converting each line of text into an index vector Item B, to be used as input to the Bi-LSTM;
Step S1.3, processing the text of the material specification field, removing symbols that carry no textual meaning, retaining letters and digits as defined by the writing conventions for material specifications, obtaining the code of each word in the processed text by looking it up in a dictionary, and preliminarily converting each line of text into a vector serving as feature Item C, to be used as input to the Bi-LSTM;
Step S1.4, the text of the material name field already contains material type and specification data; retaining the Chinese characters, removing redundant letters, digits, punctuation and stop words, obtaining the code of each word in the processed text by looking it up in a dictionary, and preliminarily converting each line of text into an index vector Item D, to be used as input to the Bi-LSTM.
Preferably, in said step S2: step S2.1, designing a word embedding layer, taking the vectorized short text as input, and outputting wor