CN-116070122-B - Method and device for identifying context scene data
Abstract
The invention discloses a method and a device for identifying context scene data. The method comprises: inputting the preceding data and the following data in context text data separately into a single-sentence topic model to obtain a preceding-text topic representation and a following-text topic representation; inputting the context text data into a context topic model to obtain a context topic representation; calculating the similarity between any two of the preceding-text topic representation, the following-text topic representation and the context topic representation; judging whether the similarity meets a threshold condition; and, if it does, determining the preceding data and the following data to be context scene data. This two-round context data discovery process makes the discovery of context scene data more thorough and reduces the manpower, time and cost consumed in labeling context scene data.
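The decision step summarised above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the three topic representations are assumed to be non-negative vectors of equal length (for example LDA topic distributions), cosine similarity stands in for the unspecified "similarity between vectors" of claim 6, and the weights (0.4, 0.3, 0.3) and threshold 0.6 are hypothetical values chosen only for the example. For non-negative vectors, cosine similarity lies in [0, 1], close to the (0, 1) range stated in the claims.

```python
import math
from typing import Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; for non-negative topic vectors it lies in [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def total_similarity(preceding: Sequence[float], following: Sequence[float],
                     context: Sequence[float],
                     weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three pairwise similarities (hypothetical weights)."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    s1 = cosine(preceding, following)  # first similarity
    s2 = cosine(preceding, context)    # second similarity
    s3 = cosine(following, context)    # third similarity
    w1, w2, w3 = weights
    return w1 * s1 + w2 * s2 + w3 * s3


def is_context_scene(preceding: Sequence[float], following: Sequence[float],
                     context: Sequence[float], threshold: float = 0.6,
                     weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> bool:
    """True if the total similarity meets the (hypothetical) threshold."""
    return total_similarity(preceding, following, context, weights) >= threshold
```

For a pair that fails the threshold, claim 4's fallback (retraining both topic models with the preceding and following data) would sit in the caller's `else` branch.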
Inventors
- MA JIAN
- DUAN QINGLONG
- ZENG SHUIFEI
- KONG LINGLEI
- ZHANG JINGRUI
- LI MIN
- LIU WEIQIANG
Assignees
- Qingdao Haier Refrigerator Co., Ltd. (青岛海尔电冰箱有限公司)
- Haier Smart Home Co., Ltd. (海尔智家股份有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20230103
Claims (9)
- 1. A method for identifying context scene data, comprising the steps of: inputting the preceding data and the following data in context text data separately into a single-sentence topic model to obtain a preceding-text topic representation and a following-text topic representation, wherein the single-sentence topic model is a model obtained by training on a large amount of single-sentence text data; inputting the context text data into a context topic model to obtain a context topic representation, wherein the context topic model is a model obtained by training on a large amount of context data; calculating the similarity between any two of the preceding-text topic representation, the following-text topic representation and the context topic representation; and judging whether the similarity meets a threshold condition and, if so, determining the preceding data and the following data to be context scene data; wherein calculating the similarity between any two of the preceding-text topic representation, the following-text topic representation and the context topic representation comprises: calculating a first similarity between the preceding-text topic representation and the following-text topic representation, with a value range of (0, 1); calculating a second similarity between the preceding-text topic representation and the context topic representation, with a value range of (0, 1); calculating a third similarity between the following-text topic representation and the context topic representation, with a value range of (0, 1); and computing a weighted sum of the first, second and third similarities as the total similarity.
- 2. The method of claim 1, wherein the single-sentence topic model and the context topic model are both built on an LDA (latent Dirichlet allocation) topic model.
- 3. The method of claim 1, wherein the single-sentence topic model and the context topic model are trained on different text data.
- 4. The method of claim 1, wherein the step of judging whether the similarity meets the threshold condition further comprises: if not, retraining the single-sentence topic model and the context topic model with the preceding data and the following data.
- 5. The method of claim 1, wherein the single-sentence topic model and the context topic model are both semantic representation frameworks that encode text content with networks of identical structure and shared parameters to obtain a vector representation of the text.
- 6. The method according to claim 5, wherein the first similarity, the second similarity and the third similarity are calculated by the same method, namely as a similarity between vectors, and the weights of the first, second and third similarities sum to 1.
- 7. A context scene data identifying apparatus, comprising: a single-sentence representation module, configured to input the preceding data and the following data in context text data separately into a single-sentence topic model to obtain a preceding-text topic representation and a following-text topic representation, wherein the single-sentence topic model is a model obtained by training on a large amount of single-sentence text data; a context representation module, configured to input the context text data into a context topic model to obtain a context topic representation, wherein the context topic model is a model obtained by training on a large amount of context data; a similarity calculation module, configured to calculate the similarity between any two of the preceding-text topic representation, the following-text topic representation and the context topic representation, including calculating a first similarity between the preceding-text and following-text topic representations, a second similarity between the preceding-text and context topic representations, and a third similarity between the following-text and context topic representations, each with a value range of (0, 1), and computing a weighted sum of the three similarities as the total similarity; and a judging module, configured to judge whether the similarity meets a threshold condition and, if so, to determine the preceding data and the following data to be context scene data, or, if not, to retrain the single-sentence topic model and the context topic model with the preceding data and the following data.
- 8. An electronic device, comprising: a storage module storing a computer program; and a processing module which, when executing the computer program, implements the steps of the method for identifying context scene data according to any one of claims 1 to 6.
- 9. A readable storage medium storing a computer program which, when executed by a processing module, performs the steps of the method for identifying context scene data according to any one of claims 1 to 6.
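Claim 5 describes encoding all inputs with networks of identical structure and shared parameters. The toy Python class below only illustrates the parameter-sharing idea under loudly stated assumptions: the hashed bag-of-words features, the random weight matrix, mean pooling, and the names `SharedEncoder`, `vocab_buckets`, `dim` and `seed` are hypothetical stand-ins for a trained semantic model, which the text does not specify.

```python
import math
import random
import zlib
from typing import List


class SharedEncoder:
    """Hypothetical toy encoder illustrating shared parameters.

    One weight matrix is created once and reused for every input, so the
    preceding sentence, the following sentence and the concatenated context
    are all mapped into the same vector space by identical parameters.
    """

    def __init__(self, vocab_buckets: int = 64, dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.buckets = vocab_buckets
        self.dim = dim
        # Shared parameters: one row of weights per hashed token bucket.
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                        for _ in range(vocab_buckets)]

    def _bucket(self, token: str) -> int:
        # crc32 is stable across runs, unlike the built-in hash() for str.
        return zlib.crc32(token.encode("utf-8")) % self.buckets

    def encode(self, text: str) -> List[float]:
        tokens = text.lower().split()
        vec = [0.0] * self.dim
        for tok in tokens:
            row = self.weights[self._bucket(tok)]
            for i in range(self.dim):
                vec[i] += row[i]
        # Mean pooling followed by L2 normalisation.
        n = max(len(tokens), 1)
        vec = [x / n for x in vec]
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec
```

Because one `SharedEncoder` instance encodes the preceding sentence, the following sentence and their concatenation, all three vectors live in the same space and can be compared directly, for instance with cosine similarity.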
Description
Method and device for identifying context scene data
Technical Field
The present invention relates to the field of deep learning, and in particular to a method and a device for identifying context scene data.
Background
Training a human-machine interaction model requires feeding it a large dialogue corpus, and the intent relations between context utterances in that corpus usually have to be labeled manually, which consumes considerable manpower and material resources because of the heavy workload. Some existing schemes cluster similar context texts, but clustering can only group texts by the surface meanings of their words. It therefore keeps corpora that are literally similar and misses corpus data that are not literally related yet share a contextual relation in topic, so the trained interaction model cannot fully understand utterances that are topically related but literally unrelated. A more intelligent way of obtaining more complete context dialogue data, and thus of building a more suitable model, is needed.
Disclosure of Invention
To solve at least one of the above problems, an object of the present invention is to provide a method and a device for identifying context scene data that recognize contextual relations more accurately.
In order to achieve the above object, an embodiment of the present invention provides a method for identifying context scene data, comprising the steps of: inputting the preceding data and the following data in context text data separately into a single-sentence topic model to obtain a preceding-text topic representation and a following-text topic representation, wherein the single-sentence topic model is a model obtained by training on a large amount of single-sentence text data; inputting the context text data into a context topic model to obtain a context topic representation, wherein the context topic model is a model obtained by training on a large amount of context data; calculating the similarity between any two of the preceding-text topic representation, the following-text topic representation and the context topic representation; and judging whether the similarity meets a threshold condition and, if so, determining the preceding data and the following data to be context scene data.
As a further improvement of the invention, the single-sentence topic model and the context topic model are built on an LDA topic model.
As a further improvement of the invention, the single-sentence topic model and the context topic model are trained on different text data.
As a further improvement of the invention, the step of judging whether the similarity meets the threshold condition further comprises: if not, retraining the single-sentence topic model and the context topic model with the preceding data and the following data.
As a further improvement of the invention, the single-sentence topic model and the context topic model are both semantic representation frameworks that encode text content with networks of identical structure and shared parameters to obtain a vector representation of the text.
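Once an LDA model has been trained, obtaining a topic representation for a new sentence amounts to inferring its topic proportions with the topic-word distributions held fixed ("folding in"). The Python sketch below does this with a simple smoothed EM loop; it is one possible inference procedure, not necessarily the one used in the invention. The two-topic vocabularies in `SINGLE_SENTENCE_PHI` and `CONTEXT_PHI` are hypothetical, as is the assumption that both models index their topics in the same order; the text does not say how topic vectors from two separately trained models are aligned before they are compared.

```python
from typing import Dict, List, Sequence

# Hypothetical topic-word distributions (topic 0: appliance, topic 1: weather).
SINGLE_SENTENCE_PHI: List[Dict[str, float]] = [
    {"fridge": 0.4, "door": 0.3, "temperature": 0.2, "cold": 0.1},
    {"rain": 0.4, "sunny": 0.3, "temperature": 0.2, "cold": 0.1},
]
CONTEXT_PHI: List[Dict[str, float]] = [
    {"fridge": 0.35, "door": 0.25, "temperature": 0.25, "cold": 0.15},
    {"rain": 0.35, "sunny": 0.35, "temperature": 0.15, "cold": 0.15},
]


def fold_in(tokens: Sequence[str], phi: List[Dict[str, float]],
            alpha: float = 0.1, iters: int = 50) -> List[float]:
    """Infer a text's topic distribution with the topic-word matrix fixed."""
    k = len(phi)
    theta = [1.0 / k] * k
    # Ignore tokens that no topic knows about (out-of-vocabulary words).
    known = [t for t in tokens if any(t in row for row in phi)]
    if not known:
        return theta
    for _ in range(iters):
        counts = [alpha] * k  # smoothing pseudo-counts per topic
        for t in known:
            # E-step: responsibility of each topic for this token.
            resp = [theta[j] * phi[j].get(t, 1e-12) for j in range(k)]
            z = sum(resp)
            for j in range(k):
                counts[j] += resp[j] / z
        # M-step: renormalise expected counts into topic proportions.
        total = sum(counts)
        theta = [c / total for c in counts]
    return theta
```

In this sketch the single-sentence model would be applied to each utterance, e.g. `fold_in("is the fridge door closed".split(), SINGLE_SENTENCE_PHI)`, which puts most of the mass on topic 0, and the context model to the concatenated tokens with `CONTEXT_PHI`; the three resulting vectors can then be passed to a vector similarity such as cosine.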
As a further improvement of the invention, the step of calculating the similarity between any two of the preceding-text topic representation, the following-text topic representation and the context topic representation comprises: calculating a first similarity between the preceding-text topic representation and the following-text topic representation, with a value range of (0, 1); calculating a second similarity between the preceding-text topic representation and the context topic representation, with a value range of (0, 1); calculating a third similarity between the following-text topic representation and the context topic representation, with a value range of (0, 1); and computing a weighted sum of the first, second and third similarities as the total similarity.
As a further improvement of the invention, the first, second and third similarities are calculated by the same method, namely as a similarity between vectors, and their weights sum to 1.
To achieve one of the above objects, an embodiment of the present invention provides a context scene data recognition apparatus, comprising: the single-sentence representation module is used for inputting the preceding data and the following data in context text data separately into a single-sentence topic model to obtain a preceding-text topic representation and a following-text topic representation, wherein the single-sentence topic model is a model obtained by training on a large amount of single-sentence text data; the context representation module is used for inputting the context text data into a context topic model to obtai