CN-121980062-A - User portrait construction method and system based on multi-source heterogeneous data fusion
Abstract
The invention relates to the field of big data analysis and discloses a user portrait construction method and system based on multi-source heterogeneous data fusion. The method comprises: obtaining an original log stream from a multi-source heterogeneous environment, extracting features to obtain feature representation vectors, analyzing weights, and weighting the vectors to obtain a weighted feature sequence; calculating mutual information values of the weighted feature sequence and performing feature fusion; performing matching degree verification against a historical reference portrait, then ranking and binding tags to the user to obtain a binding behavior tag set; parsing the binding behavior tag set and inputting it into a pre-trained incremental learning model to obtain a core tag value, which is verified by variance; mapping the verified core tag value to generate an optimized recommendation sequence; and executing the optimized recommendation sequence and collecting an interactive behavior log to realize dynamic evolution of the user portrait. The invention enables efficient integration of multi-source data and dynamic updating of portrait tags, thereby improving the precision and timeliness of the user portrait.
Inventors
- ZHANG XIAOHUA
- CHEN RUI
- Liao Weina
Assignees
- 上海微创软件股份有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260407
Claims (9)
- 1. A user portrait construction method based on multi-source heterogeneous data fusion, characterized by comprising the following steps: obtaining an original log stream from a multi-source heterogeneous environment, mapping the original log stream to construct a dynamic topological structure graph, and inputting the dynamic topological structure graph into a pre-trained graph neural network to generate a feature representation vector; analyzing the source importance of the original log stream with a preset attention mechanism to obtain an attention weight coefficient, and weighting the feature representation vector according to the attention weight coefficient to obtain a weighted feature sequence; calculating a mutual information value between every two dimension features in the weighted feature sequence to obtain a first mutual information value, determining the two dimension features to be high-correlation features when the first mutual information value exceeds a preset correlation threshold, and aggregating all the high-correlation features to obtain a unified data structure; querying a historical reference portrait of a user, inputting the unified data structure and the historical reference portrait into a pre-trained matching degree verification model to obtain a matching score, and performing tag inference and binding according to the matching score to obtain a binding behavior tag set; parsing the binding behavior tag set with a preset feature mapping table to obtain a current feature vector, inputting the current feature vector into a pre-trained incremental learning model to obtain a core tag value, and verifying the core tag value to obtain a stable core tag value; indexing the stable core tag value to obtain an initial candidate commodity set, and verifying and re-ranking the commodities in the initial candidate commodity set with a preset feature matching degree model to generate an optimized recommendation sequence; and executing the optimized recommendation sequence, collecting an interactive behavior log of the user for the optimized recommendation sequence, and readjusting the binding behavior tag set according to the interactive behavior log to realize dynamic evolution of the user portrait.
- 2. The user portrait construction method based on multi-source heterogeneous data fusion according to claim 1, wherein the mapping the original log stream to construct a dynamic topological structure graph and inputting it into a pre-trained graph neural network to generate a feature representation vector comprises: performing standardized mapping on the original log stream through a predefined semantic parsing rule to generate an intermediate state data sequence; constructing, according to the intermediate state data sequence, a dynamic topological relation graph reflecting connection relations, and converting the dynamic topological relation graph into graph structure data by vectorization; and inputting the graph structure data into the pre-trained graph neural network and updating the node embedding representations of the graph structure data to obtain the feature representation vector.
- 3. The user portrait construction method based on multi-source heterogeneous data fusion according to claim 1, wherein the aggregating all the high-correlation features to obtain a unified data structure comprises: collecting all the high-correlation features to obtain a high-correlation feature set, mapping the high-correlation feature set to the feature representation vector, and concatenating them to obtain a feature fusion tensor; and concatenating the feature fusion tensor and the high-correlation feature set to obtain the unified data structure.
- 4. The user portrait construction method based on multi-source heterogeneous data fusion according to claim 1, wherein the querying a historical reference portrait of a user, inputting the unified data structure and the historical reference portrait into a pre-trained matching degree verification model to obtain a matching score, and performing tag inference and binding according to the matching score to obtain a binding behavior tag set comprises: extracting specific key-value pairs in the unified data structure as a user behavior sequence; querying the historical reference portrait of the user, calculating the matching score between the user behavior sequence and the historical reference portrait with the pre-trained matching degree verification model, and, if the matching score exceeds a preset abnormality judgment threshold, adding corresponding behavior tags according to a preset behavior-tag mapping table and collecting all the behavior tags to obtain an atomic binding tag group; calculating a mutual information value for every two tags in the atomic binding tag group, ordering the atomic binding tag group by priority according to the magnitude of the mutual information values, mapping every two tags to a preset inference tag mapping table to obtain corresponding inference tags, and collecting all the inference tags to obtain an inference binding tag group; and extracting the inference tag with the maximum mutual information value in the inference binding tag group, mapping the inference tag to a preset decision tag mapping table to obtain a decision tag, and binding all tags in the mapping chain to the current user ID in real time to obtain the binding behavior tag set.
- 5. The user portrait construction method based on multi-source heterogeneous data fusion according to claim 1, wherein the parsing the binding behavior tag set with a preset feature mapping table to obtain a current feature vector, inputting the current feature vector into a pre-trained incremental learning model to obtain a core tag value, and verifying the core tag value to obtain a stable core tag value comprises: performing directed semantic decomposition on the binding behavior tag set through the preset feature mapping table to obtain preference dimension data; performing standardized conversion on the preference dimension data to obtain the current feature vector, and inputting the current feature vector into the pre-trained incremental learning model to obtain the core tag value; and querying historical calculation results of the core tag value, calculating the variance of the historical calculation results, and obtaining the stable core tag value when the variance is smaller than a preset stability threshold.
- 6. The user portrait construction method based on multi-source heterogeneous data fusion according to claim 1, wherein the indexing the stable core tag value to obtain an initial candidate commodity set, and verifying and re-ranking the commodities in the initial candidate commodity set with a preset feature matching degree model to generate an optimized recommendation sequence comprises: mapping the stable core tag value to a preset association rule set to obtain an associated commodity category set; indexing a preset commodity feature library according to the associated commodity category set to obtain the initial candidate commodity set; verifying the commodities in the initial candidate commodity set with the preset feature matching degree model to obtain an effective commodity set and a matching degree score for each commodity; and sorting the commodities in the effective commodity set in descending order of matching degree score to generate the optimized recommendation sequence.
- 7. The method of claim 6, wherein the verifying the commodities in the initial candidate commodity set with the preset feature matching degree model to obtain an effective commodity set and a matching degree score for each commodity comprises: querying the attributes of the commodities in the effective commodity set in a preset commodity attribute library to obtain a commodity attribute set; calculating a degree of association between the commodity attribute set and the binding behavior tag set to obtain the matching degree score of each commodity; marking a commodity as a non-purchasable commodity when its commodity attribute set contains an attribute that conflicts with the binding behavior tag set; and subtracting the set of all non-purchasable commodities from the initial candidate commodity set to obtain the effective commodity set.
- 8. The user portrait construction method based on multi-source heterogeneous data fusion according to claim 1, wherein the executing the optimized recommendation sequence, collecting an interactive behavior log of the user for the optimized recommendation sequence, and readjusting the binding behavior tag set according to the interactive behavior log to realize dynamic evolution of the user portrait comprises: executing the optimized recommendation sequence, acquiring the interactive behavior log of the user for the optimized recommendation sequence, and parsing the interactive behavior log through a predefined semantic parsing rule to obtain a feedback data stream; calculating a loss deviation value of the feedback data stream with the preset attention mechanism, and adjusting the attention weight coefficient with the loss deviation value to obtain an enhanced attention weight coefficient; and recalculating the first mutual information value and rebinding the tags according to the enhanced attention weight coefficient, and, for conflicting binding tags, replacing the old tag with the new tag to realize continuous updating and dynamic evolution of the user portrait.
- 9. A user portrait construction system based on multi-source heterogeneous data fusion, characterized by comprising: a feature extraction module, configured to obtain an original log stream from a multi-source heterogeneous environment, map the original log stream to construct a dynamic topological structure graph, and input the dynamic topological structure graph into a pre-trained graph neural network to generate a feature representation vector; a feature weighting module, configured to analyze the source importance of the original log stream with a preset attention mechanism to obtain an attention weight coefficient, and to weight the feature representation vector according to the attention weight coefficient to obtain a weighted feature sequence; a feature fusion module, configured to calculate a mutual information value between every two dimension features in the weighted feature sequence to obtain a first mutual information value, determine the two dimension features to be high-correlation features when the first mutual information value exceeds a preset correlation threshold, and aggregate all the high-correlation features to obtain a unified data structure; a user portrait module, configured to query a historical reference portrait of a user, input the unified data structure and the historical reference portrait into a pre-trained matching degree verification model to obtain a matching score, and perform tag inference and binding according to the matching score to obtain a binding behavior tag set; a variance verification module, configured to parse the binding behavior tag set with a preset feature mapping table to obtain a current feature vector, input the current feature vector into a pre-trained incremental learning model to obtain a core tag value, and verify the core tag value to obtain a stable core tag value; a commodity recommendation module, configured to index the stable core tag value to obtain an initial candidate commodity set, and to verify and re-rank the commodities in the initial candidate commodity set with a preset feature matching degree model to generate an optimized recommendation sequence; and a portrait evolution module, configured to execute the optimized recommendation sequence, collect an interactive behavior log of the user for the optimized recommendation sequence, and readjust the binding behavior tag set according to the interactive behavior log to realize dynamic evolution of the user portrait.
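Three of the claimed checks are concrete enough to illustrate: the pairwise mutual information threshold of claim 1, the variance-based stability check of claim 5, and the conflict filtering and descending ranking of claims 6 and 7. The following is a minimal sketch only, not the patented implementation. The claims do not say how mutual information is estimated or how the matching degree score is computed, so the histogram estimator, the bin count, the tag-overlap score, and every function and parameter name below are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import pvariance


def mutual_information(xs, ys, bins=4):
    """Histogram estimate of mutual information (nats) between two samples.

    Assumption: equal-width binning; the claims name no estimator.
    """
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # constant feature: fall back to one bin
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    # Sum over observed cells of p(a,b) * log(p(a,b) / (p(a) * p(b))).
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def high_correlation_pairs(features, threshold):
    """Claim 1: index pairs of dimension features whose MI exceeds the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(features)), 2)
        if mutual_information(features[i], features[j]) > threshold
    ]


def is_stable(history, stability_threshold):
    """Claim 5: the core tag value is stable when the variance of its
    historical results is below the threshold (population variance assumed)."""
    return pvariance(history) < stability_threshold


def rank_candidates(candidates, user_tags, conflicting_attrs):
    """Claims 6-7: drop commodities carrying a conflicting attribute, then
    sort the remainder by score, highest first.

    Assumption: the score is the number of attributes shared with the
    user's tags; the claims only call it an association degree.
    """
    valid = [
        (item, len(attrs & user_tags))
        for item, attrs in candidates.items()
        if not (attrs & conflicting_attrs)
    ]
    return [item for item, _ in sorted(valid, key=lambda p: p[1], reverse=True)]
```

In this sketch, `features` is a list of equal-length numeric samples, one per feature dimension, and `candidates` maps a commodity name to its set of attributes. A production system would more likely use a library estimator for mutual information and a learned matching model in place of the overlap count.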
Description
User portrait construction method and system based on multi-source heterogeneous data fusion
Technical Field
The invention relates to the field of big data analysis, and in particular to a user portrait construction method and system based on multi-source heterogeneous data fusion.
Background
As digital transformation deepens, a key research question in the field is how to use big data mining and analysis to break down data barriers, integrate heterogeneous data sources, and construct comprehensive, accurate, and dynamic user portraits. Conventional user portrait construction generally handles multi-source heterogeneous data in a fixed processing mode: data from different business systems is connected through fixed interfaces, converted to a preset format, stored uniformly, and matched to user portrait tags through static association logic, in some cases with manual collation or an intermediate database assisting the integration. This approach was widely used in the early stage of multi-source data processing, but its defects are prominent. In summary, existing user portrait construction technology lacks dynamic self-adaptation and real-time association mining capability for multi-source heterogeneous data, which leads directly to rigid data access, obstructed value fusion, and lagging portrait updates; portrait accuracy and timeliness suffer severely, and efficient, accurate application in large-scale scenarios is difficult to support.
Disclosure of Invention
The invention provides a user portrait construction method and system based on multi-source heterogeneous data fusion, which realize efficient integration of multi-source data and dynamic updating of portrait tags, thereby improving the accuracy and timeliness of user portraits and supporting the core need for data-driven decision making in the digital upgrading of each industry.
To solve the above technical problems, in a first aspect the invention provides a user portrait construction method based on multi-source heterogeneous data fusion, including: obtaining an original log stream from a multi-source heterogeneous environment, mapping the original log stream to construct a dynamic topological structure graph, and inputting the dynamic topological structure graph into a pre-trained graph neural network to generate a feature representation vector; analyzing the source importance of the original log stream with a preset attention mechanism to obtain an attention weight coefficient, and weighting the feature representation vector according to the attention weight coefficient to obtain a weighted feature sequence; calculating a mutual information value between every two dimension features in the weighted feature sequence to obtain a first mutual information value, determining the two dimension features to be high-correlation features when the first mutual information value exceeds a preset correlation threshold, and aggregating all the high-correlation features to obtain a unified data structure; querying a historical reference portrait of a user, inputting the unified data structure and the historical reference portrait into a pre-trained matching degree verification model to obtain a matching score, and performing tag inference and binding according to the matching score to obtain a binding behavior tag set; parsing the binding behavior tag set with a preset feature mapping table to obtain a current feature vector, inputting the current feature vector into a pre-trained incremental learning model to obtain a core tag value, and verifying the core tag value to obtain a stable core tag value; indexing the stable core tag value to obtain an initial candidate commodity set, and verifying and re-ranking the commodities in the initial candidate commodity set with a preset feature matching degree model to generate an optimized recommendation sequence; and executing the optimized recommendation sequence, collecting an interactive behavior log of the user for the optimized recommendation sequence, and readjusting the binding behavior tag set according to the interactive behavior log to realize dynamic evolution of the user portrait.
In a second aspect, the invention provides a user portrait construction system based on multi-source heterogeneous data fusion, including: a feature extraction module, configured to obtain an original log stream from a multi-source heterogeneou