CN-120874541-B - Real world model training method and system based on local physical environment


Abstract

The invention relates to the technical field of artificial intelligence, and in particular to a real-world model training method and system based on a local physical environment. A local rule knowledge base provides the commonality rules of the second modal features of a target physical place over a historical time period. The second modal features of the target physical place in the target time period are classified against this knowledge base, given category labels, and added to the training set of a cloud world model. The cloud world model is then subjected to model distillation together with the local rule knowledge base, the first modal features and the second modal features, so as to define the personalized features corresponding to the target physical place. As a result, the real world model can dynamically adapt to the personalized rules and dynamic changes of the target physical place, which improves how well the real world model represents and suits the target physical place.

Inventors

HAN XIAO
GONG HAIBIN
LIU QIANYA
Gu Zhuolun
LU QI
WANG XIAO

Assignees

北京奇岱松科技有限公司

Dates

Publication Date: 20260508
Application Date: 20250714

Claims (7)

1. A real world model training method based on a local physical environment, the training method comprising the following steps: S10, acquiring a local rule knowledge base corresponding to a target physical place before a target time period, and a first modal feature and a second modal feature corresponding to the target physical place in the target time period, wherein the first modal feature comprises a content feature vector, a space structure feature vector and a time sequence feature vector, the second modal feature comprises an entity feature vector and a relation feature vector, and S10 comprises the following steps: S101, obtaining, for each second modality, feature vectors of each reference physical place in M preset time periods before the target time period, wherein the second modalities comprise an entity modality and a relation modality, the feature vectors of the entity modality are entity feature vectors, the feature vectors of the relation modality are relation feature vectors, and M is a positive integer; S102, for any second modality, clustering all feature vectors of the current second modality corresponding to all reference physical places to obtain a plurality of feature cluster sets corresponding to the current second modality; S103, storing all feature cluster sets corresponding to the current second modality into the local rule knowledge base corresponding to the target physical place; S20, classifying the second modal feature in the target time period according to the local rule knowledge base to obtain a target classification result corresponding to each feature vector in the second modal feature, wherein the target classification result comprises known feature data and unknown feature data, and S20 comprises the following steps: S201, for any second modality, taking any feature vector corresponding to the current second modality in the target time period of the target physical place as a target feature vector; S202, calculating a first distance between the target feature vector and each feature cluster set corresponding to the current second modality according to the target feature vector and each feature cluster set corresponding to the current second modality; S203, obtaining a target classification result corresponding to the target feature vector according to the first distance between the target feature vector and each feature cluster set corresponding to the current second modality; S204, traversing all feature vectors corresponding to all second modalities of the target physical place in the target time period to obtain a target classification result corresponding to each feature vector in the second modal feature corresponding to the target physical place in the target time period; S30, performing model distillation on a cloud world model in the cloud according to the local rule knowledge base, the first modal feature, the second modal feature and the target classification result corresponding to each feature vector in the second modal feature to obtain a real world model corresponding to the target physical place, wherein the real world model is used for executing a target application task according to the multi-modal features corresponding to the target physical place.
2. The real world model training method based on the local physical environment according to claim 1, wherein S101 comprises the following steps: S1011, acquiring a plurality of pieces of history acquisition data corresponding to each reference physical place in the M preset time periods, wherein each piece of history acquisition data is a history acquisition image or a history acquisition text, and each piece of history acquisition data represents the states of a plurality of target objects in the corresponding reference physical place in the current time period; S1012, for any preset time period corresponding to any reference physical place, encoding each piece of history acquisition data corresponding to the current reference physical place in the current preset time period according to a preset content encoder to obtain a content feature vector corresponding to each piece of history acquisition data; S1013, encoding each history acquisition image corresponding to the current reference physical place in the current preset time period according to a preset space structure encoder and a preset character entity encoder to obtain a space structure feature vector corresponding to each history acquisition image and an entity feature vector corresponding to each target object in each history acquisition image; S1014, splicing all content feature vectors, space structure feature vectors and entity feature vectors corresponding to the current reference physical place in the current preset time period to obtain a scene current feature vector corresponding to the current reference physical place in the current preset time period; S1015, splicing and compressing the scene current feature vectors corresponding to the current reference physical place in the current preset time period and in the N preset time periods before the current preset time period to obtain a time sequence feature vector corresponding to the current reference physical place in the current preset time period, wherein N+1 is the preset vector splicing number, and the scene current feature vector before the first preset time period is a preset feature vector; S1016, inputting the content feature vector, the space structure feature vector and the entity feature vector corresponding to the current reference physical place in the current preset time period into a first initial world model to obtain a plurality of relation feature vectors corresponding to the current reference physical place in the current preset time period.
3. The real world model training method based on the local physical environment according to claim 2, wherein S10 comprises the following steps: S110, acquiring a plurality of pieces of target acquisition data corresponding to the target physical place in the target time period, wherein each piece of target acquisition data is a target acquisition image or a target acquisition text, and each piece of target acquisition data represents the states of a plurality of target objects in the target physical place in the target time period; S120, encoding each piece of target acquisition data corresponding to the target time period according to the preset content encoder to obtain a content feature vector corresponding to each piece of target acquisition data; S130, encoding each target acquisition image according to the preset space structure encoder and the preset character entity encoder to obtain a space structure feature vector corresponding to each target acquisition image and an entity feature vector corresponding to each target object in each target acquisition image; S140, splicing all content feature vectors, space structure feature vectors and entity feature vectors corresponding to the target physical place in the target time period to obtain a scene current feature vector corresponding to the target physical place in the target time period; S150, splicing and compressing the scene current feature vector corresponding to the target physical place in the target time period and the scene current feature vectors corresponding to N-M+1 to M preset time periods before the target time period to obtain a time sequence feature vector corresponding to the target physical place in the target time period; S160, inputting the content feature vector, the space structure feature vector and the entity feature vector corresponding to the target physical place in the target time period into a second initial world model to obtain a plurality of relation feature vectors corresponding to the target physical place in the target time period.
4. The real world model training method based on the local physical environment according to claim 1, wherein S203 comprises the following steps: S2031, acquiring a preset first distance threshold; S2032, if the first distance between the target feature vector and any feature cluster set corresponding to the current second modality is smaller than or equal to the preset first distance threshold, determining that the target classification result corresponding to the target feature vector is known feature data; S2033, if the first distances between the target feature vector and all feature cluster sets corresponding to the current second modality are larger than the preset first distance threshold, determining that the target classification result corresponding to the target feature vector is unknown feature data.
5. The real world model training method based on the local physical environment according to claim 1, wherein the number of parameters corresponding to the real world model is smaller than the number of parameters corresponding to the cloud world model.
6. A real-world model training system based on a local physical environment, the training system comprising: a data acquisition module, configured to acquire a local rule knowledge base corresponding to a target physical place before a target time period, and a first modal feature and a second modal feature corresponding to the target physical place in the target time period, wherein the first modal feature comprises a content feature vector, a space structure feature vector and a time sequence feature vector, the second modal feature comprises an entity feature vector and a relation feature vector, and the data acquisition module comprises: a feature vector acquisition sub-module, configured to acquire, for each second modality, feature vectors of each reference physical place in M preset time periods before the target time period, wherein the second modalities comprise an entity modality and a relation modality, the feature vectors of the entity modality are entity feature vectors, the feature vectors of the relation modality are relation feature vectors, and M is a positive integer; a feature vector clustering sub-module, configured to, for any second modality, cluster all feature vectors of the current second modality corresponding to all reference physical places to obtain a plurality of feature cluster sets corresponding to the current second modality; a data storage sub-module, configured to store all feature cluster sets corresponding to the current second modality into the local rule knowledge base corresponding to the target physical place; a feature classification module, configured to classify the second modal feature in the target time period according to the local rule knowledge base to obtain a target classification result corresponding to each feature vector in the second modal feature, wherein the target classification result comprises known feature data and unknown feature data, and the feature classification module comprises: a target feature vector determining sub-module, configured to, for any second modality, take any feature vector corresponding to the current second modality in the target time period of the target physical place as a target feature vector; a first distance calculation sub-module, configured to calculate a first distance between the target feature vector and each feature cluster set corresponding to the current second modality according to the target feature vector and each feature cluster set corresponding to the current second modality; a first target classification result acquisition sub-module, configured to obtain a target classification result corresponding to the target feature vector according to the first distance between the target feature vector and each feature cluster set corresponding to the current second modality; a second target classification result acquisition sub-module, configured to traverse all feature vectors corresponding to all second modalities of the target physical place in the target time period to obtain a target classification result corresponding to each feature vector in the second modal feature corresponding to the target physical place in the target time period; and a model training module, configured to perform model distillation on a cloud world model in the cloud according to the local rule knowledge base, the first modal feature, the second modal feature and the target classification result corresponding to each feature vector in the second modal feature to obtain a real world model corresponding to the target physical place, wherein the real world model is used for executing a target application task according to the multi-modal features corresponding to the target physical place.
7. The real-world model training system based on the local physical environment according to claim 6, wherein the feature vector acquisition sub-module comprises: a history acquisition data acquisition unit, configured to acquire a plurality of pieces of history acquisition data corresponding to each reference physical place in the M preset time periods, wherein each piece of history acquisition data is a history acquisition image or a history acquisition text, and each piece of history acquisition data represents the states of a plurality of target objects in the corresponding reference physical place in the current time period; a first feature encoding unit, configured to, for any preset time period corresponding to any reference physical place, encode each piece of history acquisition data corresponding to the current reference physical place in the current preset time period according to a preset content encoder to obtain a content feature vector corresponding to each piece of history acquisition data; a second feature encoding unit, configured to encode each history acquisition image corresponding to the current reference physical place in the current preset time period according to a preset space structure encoder and a preset character entity encoder to obtain a space structure feature vector corresponding to each history acquisition image and an entity feature vector corresponding to each target object in each history acquisition image; a first feature splicing unit, configured to splice all content feature vectors, space structure feature vectors and entity feature vectors corresponding to the current reference physical place in the current preset time period to obtain a scene current feature vector corresponding to the current reference physical place in the current preset time period; a second feature splicing unit, configured to splice and compress the scene current feature vectors corresponding to the current reference physical place in the current preset time period and in the N preset time periods before the current preset time period to obtain a time sequence feature vector corresponding to the current reference physical place in the current preset time period, wherein N+1 is the preset vector splicing number, and the scene current feature vector before the first preset time period is a preset feature vector; and a relation feature acquisition unit, configured to input the content feature vector, the space structure feature vector and the entity feature vector corresponding to the current reference physical place in the current preset time period into a first initial world model to obtain a plurality of relation feature vectors corresponding to the current reference physical place in the current preset time period.
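
The knowledge-base construction (S101 to S103) and the distance-based classification (S201 to S204, with the threshold rule of claim 4) can be illustrated with a minimal sketch. Several choices here are assumptions that the claims do not state: plain k-means stands in for the unspecified clustering algorithm, each feature cluster set is represented by its centroid, the "first distance" is taken to be the Euclidean distance to that centroid, and the function names, `k` and `threshold` are hypothetical.

```python
import numpy as np


def build_knowledge_base(vectors, k, iters=20, seed=0):
    """Cluster historical feature vectors of one second modality (S102).

    Assumption: plain k-means; the claims do not name a clustering algorithm.
    Returns k centroids that stand in for the feature cluster sets (S103).
    """
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign every vector to its nearest centroid.
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if its cluster is empty.
        for j in range(k):
            members = vectors[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids


def classify(target_vectors, centroids, threshold):
    """Label each target feature vector as 'known' or 'unknown' (S201-S204)."""
    results = []
    for v in np.asarray(target_vectors, dtype=float):
        # S202: first distance to every cluster (assumed Euclidean-to-centroid).
        first_distances = np.linalg.norm(centroids - v, axis=1)
        # S2033: unknown only when all distances exceed the threshold.
        results.append("unknown" if first_distances.min() > threshold else "known")
    return results


# Hypothetical entity feature vectors from the M historical time periods.
history = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
knowledge_base = build_knowledge_base(history, k=2)
print(classify([[0.05, 0.05], [10.0, 10.0]], knowledge_base, threshold=1.0))
```

In this sketch one knowledge base would be built per second modality (entity and relation), and the same `classify` call would be repeated for each modality to cover the traversal in S204.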

Description

Real world model training method and system based on local physical environment

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a real-world model training method and system based on a local physical environment.

Background

With the development of the Internet of Things, computer vision and sensor technology, training a digital model that accurately represents a physical place has become key to deploying intelligent applications in real scenes. Existing real-world model training methods rely on learning general rules from features in a large number of general scenarios in order to build widely applicable models. However, such methods only capture the learned general rules and struggle to capture the personalized modal features of a target scene. As a result, the model lacks specificity when modeling the target scene in real time and cannot be trained efficiently in cooperation with an existing general world model in the cloud. The model therefore cannot fully learn the personalized rules of the target scene, its suitability for the target scene is low, and it has difficulty accurately understanding the entity relationships and behavior patterns in the target scene, which degrades performance on actual tasks.

Therefore, how to train a real world model according to the features of the local physical environment, so as to improve the fit between the real world model and the target physical place, is a problem to be solved.
Disclosure of Invention

To address the above technical problems, the technical solution adopted by the invention is a real-world model training method based on a local physical environment, which comprises the following steps:

S10, acquiring a local rule knowledge base corresponding to the target physical place before the target time period, and a first modal feature and a second modal feature corresponding to the target physical place in the target time period, wherein the first modal feature comprises a content feature vector, a space structure feature vector and a time sequence feature vector, and the second modal feature comprises an entity feature vector and a relation feature vector.

S20, classifying the second modal feature in the target time period according to the local rule knowledge base to obtain a target classification result corresponding to each feature vector in the second modal feature, wherein the target classification result comprises known feature data and unknown feature data.

S30, performing model distillation on a cloud world model in the cloud according to the local rule knowledge base, the first modal feature, the second modal feature and the target classification result corresponding to each feature vector in the second modal feature to obtain a real world model corresponding to the target physical place, wherein the real world model is used for executing a target application task according to the multi-modal features corresponding to the target physical place.
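
To make step S30 more concrete, the following is a minimal sketch of a distillation objective in which the cloud world model acts as the teacher and the real world model as the student. It is only an illustration under stated assumptions: the text above does not specify the loss, so a standard temperature-softened KL divergence between teacher and student outputs is assumed, and using the known/unknown classification result as a per-sample weight (`unknown_weight`) is a hypothetical choice rather than something the source describes. All names and values are illustrative.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with temperature scaling."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def distillation_loss(teacher_logits, student_logits, labels,
                      temperature=2.0, unknown_weight=2.0):
    """Weighted KL(teacher || student) over temperature-softened outputs.

    Assumption: samples classified as 'unknown' in S20 are weighted more
    heavily so the student focuses on place-specific behavior. The source
    does not state how the classification result enters the distillation.
    """
    eps = 1e-12
    p = softmax(teacher_logits, temperature)  # cloud world model (teacher)
    q = softmax(student_logits, temperature)  # real world model (student)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    weights = np.array([unknown_weight if label == "unknown" else 1.0
                        for label in labels])
    return float(np.sum(weights * kl) / np.sum(weights))


# Hypothetical outputs of both models for two feature vectors.
teacher = np.array([[2.0, 0.0, -1.0], [0.5, 0.5, 0.0]])
student = np.array([[0.0, 2.0, -1.0], [0.5, 0.5, 0.0]])
print(distillation_loss(teacher, student, ["unknown", "known"]))
```

In a training loop this loss would be minimized with respect to the student's parameters only. Claim 5's condition that the real world model has fewer parameters than the cloud world model matches the usual teacher-student setup.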
The invention also provides a real-world model training system based on the local physical environment, which comprises:

The data acquisition module is used for acquiring a local rule knowledge base corresponding to the target physical place before the target time period and a first modal feature and a second modal feature corresponding to the target physical place in the target time period, wherein the first modal feature comprises a content feature vector, a space structure feature vector and a time sequence feature vector, and the second modal feature comprises an entity feature vector and a relation feature vector.

The feature classification module is used for classifying the second modal features in the target time period according to the local rule knowledge base, and obtaining a target classification result corresponding to each feature vector in the second modal features, wherein the target classification result comprises known feature data and unknown feature data.

The model training module is used for carrying out model distillation on the cloud world model of the cloud according to the local rule knowledge base, the first modal feature, the second modal feature and the target classification result corresponding to each feature vector in the second modal feature to obtain a real world model corresponding to the target physical place, wherein the real world model is used for executing the target application task according to the multi-modal feature corresponding to the target physical place.

The invention has at least the following beneficial effects: the method comprises the steps of providing a local rule knowledge base corresponding to a target physical place before a target time period, providing a commonality rule corresponding to second mode features of different reference physical places in the history time period, obtaining a content feature vector, a space structure feature vector, a physical feature vector, a time sequen