CN-122019757-A - E-commerce data integration analysis method and system
Abstract
The invention relates to the field of computer technology, and in particular to an e-commerce data integration analysis method and system. The method comprises: obtaining structured business data and unstructured text data of an e-commerce platform and associating the two; cleaning, word segmentation, and word-form normalization of the unstructured text data to obtain a term sequence; generating term weight features and semantic embedding features from the term sequence, where the term weight features are determined by how strongly a term occurs in a single text together with how well it distinguishes texts across the corpus, and the semantic embedding features are trained by having a center word predict its context words; inputting the term weight features, the semantic embedding features, and the structured business data into a deep learning semantic model that outputs a probability distribution over emotions or intents together with a confidence; constructing physiological signal features from the probability distribution; constructing interaction state features from time-sequence information, message direction information, and interaction event sequences, to characterize how message volume changes over time, which party dominates the conversation, and the moment of escalation; standardizing and fusing the multi-source features and establishing a perception-context matrix that maps key terms to service quality dimensions; training a set of isolation forest trees for each service quality dimension, in which features and split points are selected at random and samples are divided recursively until they are isolated or a preset termination condition is met; obtaining the expected path length and calculating an anomaly score from it; generating anomaly flags and anomaly interpretations from the decision paths; and outputting handling suggestions whose execution results are used to update the perception-context matrix and the parameters of the isolation forest and the deep learning model. The invention addresses the problems that structured business data and unstructured text data on an e-commerce platform are difficult to associate and fuse effectively, that service quality anomalies during a conversation are difficult to identify in time, and that the sources and key triggers of anomalies are difficult to explain.
Inventors
- LUO WEIWEI
- YANG MENG
- WU YUHAO
Assignees
- 深圳市伟跃电子商务有限公司 (Shenzhen Weiyue E-commerce Co., Ltd.)
Dates
- Publication Date
- 20260512
- Application Date
- 20260130
Claims (10)
- 1. An e-commerce data integration analysis method, characterized by comprising the following steps: obtaining structured business data and unstructured text data of an e-commerce platform, and associating the structured business data with the unstructured text data; performing cleaning, word segmentation, and word-form normalization on the unstructured text data to obtain a term sequence; generating a term weight feature and a semantic embedding feature based on the term sequence, wherein the term weight feature is determined jointly by the degree of occurrence of a term in a single text and its degree of distinction in the corpus, and the semantic embedding feature is obtained by training in a manner in which a center word predicts its context words; inputting the term weight feature, the semantic embedding feature, and the structured business data into a deep learning semantic model, outputting a probability distribution and a confidence of emotion or intent, constructing physiological signal features according to the probability distribution, and constructing interaction state features based on time-sequence information, message direction information, and interaction event sequences in the structured business data, wherein the interaction state features represent the trend of message volume over time, the conversation dominance relationship, and the escalation moment during the conversation; normalizing the structured business data, the physiological signal features, the interaction state features, and the output of the deep learning semantic model and fusing them into a feature matrix, and establishing a perception-context matrix as a mapping from key terms to quality of service dimensions; training a set of isolation forest trees for each quality of service dimension, wherein each isolation tree in the set randomly selects a feature, randomly selects a split point within the value range of the selected feature, and recursively divides the input samples until a sample is isolated or a preset termination condition is met; calculating an anomaly score according to the expected path length of the sample, and generating an anomaly flag and an anomaly interpretation by combining the decision paths of the isolation forest tree set, wherein the anomaly interpretation comprises the corresponding quality of service dimension and the triggered key terms; and outputting a handling suggestion corresponding to the anomaly flag, wherein the execution result of the handling suggestion is used to update the perception-context matrix and the parameters of the deep learning semantic model.
- 2. The e-commerce data integration analysis method according to claim 1, wherein the structured business data comprise transaction records, access logs, interaction event sequences, service request metadata, device communication indicators, and physiological acquisition indicators collected by a terminal or wearable device, the physiological acquisition indicators comprising at least one of a heart rate indicator, a body surface temperature indicator, and an exercise activity indicator; the unstructured text data comprise product reviews, customer service dialogues, complaint texts, return reason descriptions, and security-related statements; and associating the structured business data with the unstructured text data comprises aggregating the data of the same interaction link based on a session identifier, a user identifier, or a transaction identifier, while retaining the time-sequence information and the message direction information.
- 3. The method according to claim 1, wherein cleaning the unstructured text data comprises deleting account identifiers, topic tags, and link identifiers from the unstructured text data, and normalizing character encoding, letter case, repeated symbols, and noise segments; the word segmentation and word-form normalization comprise word segmentation, part-of-speech recognition, and lemmatization of the text, so that synonyms or morphological variants are mapped to a unified term; and when the term sequence is generated, the position index of each term in the original text is retained to support subsequent interpretation.
- 4. The e-commerce data integration analysis method according to claim 1, wherein after the term sequence is obtained, the occurrence frequency of each term in the term sequence is counted to obtain a term frequency feature; and constructing the interaction state features further comprises calculating, according to the time-sequence information, the change in the number of messages per unit time to characterize the communication intensity and fluctuation between the customer and the agent, and associating the fluctuation with the quality of service dimensions.
- 5. The e-commerce data integration analysis method according to claim 4, wherein generating the term weight feature comprises: calculating, for each candidate keyword in a candidate keyword set, its occurrence intensity in a single text to form a term local importance; calculating the coverage of the candidate keyword across the corpus to form a term global scarcity; and combining the term local importance with the term global scarcity to obtain the term weight feature; and generating the semantic embedding feature comprises: taking the current term as the center term and selecting its adjacent terms to form a context set, and training a model so that the center term predicts the terms in the context set, thereby iteratively updating the term vector parameters.
- 6. The e-commerce data integration analysis method according to claim 1, wherein the deep learning semantic model comprises a pre-trained encoder based on a self-attention mechanism and a classification head; the pre-trained encoder generates a context representation that takes word-order relations into account; the classification head maps the context representation to scores for emotion categories or intent categories and converts the scores into a probability distribution through a normalizing mapping; the confidence is determined from the maximum value, the entropy, or the inter-category difference of the probability distribution; and when the confidence does not satisfy a preset condition, a manual review or a fallback rule engine is triggered to reduce the impact of misjudgments on the service flow.
- 7. The method according to claim 1, wherein constructing the physiological signal features comprises: determining an emotional intensity state based on the probability distribution, the emotional intensity state characterizing the degree of negativity, urgency, or uncertainty of the text feedback; using the emotional intensity state as a driving quantity to assign a trend to each physiological dimension, the physiological dimensions comprising at least a heart rate dimension, a body surface temperature dimension, and an activity state dimension, and the trend assignment indicating a rising, falling, or fluctuating trend of the physiological dimension relative to a reference state; generating a physiological dimension change vector based on the trend assignments, and normalizing the vector and aligning it in time to eliminate differences in scale and sampling between dimensions; and correlating the processed physiological dimension change vector with the message direction information to distinguish the response patterns of the customer side and the agent side, thereby obtaining the physiological signal features used for anomaly detection.
- 8. The e-commerce data integration analysis method according to claim 1, wherein establishing the perception-context matrix comprises: determining a key term set related to each quality of service dimension, the key term set comprising any combination of expressions related to quality defects, response delays, system faults, trust and uncertainty, guarantee statements, and value and loyalty, and forming a keyword-dimension association structure based on the occurrence relationships of the key terms in the text; converting whether each text contains each key term into a perception vector of true/false flags, and associating the perception vector with the anomaly flags to construct the perception-context matrix for explaining the source of anomalies, so that the perception-context matrix characterizes the influence of the key terms on anomaly discrimination; determining the anomaly flags by the isolation forest algorithm according to the expected path length and the anomaly score; and outputting the isolation forest tree set, which displays in a tree structure the correspondence between the key terms and the anomaly flags under each quality of service dimension, thereby providing an interpretable presentation of the anomaly detection results.
- 9. The e-commerce data integration analysis method according to claim 1, wherein obtaining the expected path length comprises: in each isolation tree of the isolation forest, starting from the root node, dividing the samples in sequence according to randomly selected features and randomly selected split points until each sample is separated on its own or a preset depth is reached, recording the number of splits each sample undergoes in each isolation tree, and averaging these counts to obtain the expected path length; the method further comprises applying dimensionality-reduction visualization to the feature matrix to present the distribution of normal sample clusters and anomalous sample points in a low-dimensional space; and the handling suggestion comprises at least any combination of refund, exchange, compensation, ticket escalation, risk interception, content-review priority adjustment, or customer service ticket recommendation.
- 10. An e-commerce data integration analysis system, comprising: an acquisition module, configured to obtain structured business data and unstructured text data of an e-commerce platform and to associate the structured business data with the unstructured text data; a preprocessing module, configured to perform cleaning, word segmentation, and word-form normalization on the unstructured text data to obtain a term sequence; a feature generation module, configured to generate a term weight feature and a semantic embedding feature based on the term sequence, wherein the term weight feature is determined jointly by the degree of occurrence of a term in a single text and its degree of distinction in the corpus, and the semantic embedding feature is obtained by training in a manner in which a center word predicts its context words; an emotion recognition module, configured to input the term weight feature, the semantic embedding feature, and the structured business data into a deep learning semantic model, output a probability distribution and a confidence of emotion or intent, construct physiological signal features according to the probability distribution, and construct interaction state features based on time-sequence information, message direction information, and interaction event sequences in the structured business data, wherein the interaction state features represent the trend of message volume over time, the conversation dominance relationship, and the escalation moment during the conversation; a mapping module, configured to standardize the structured business data, the physiological signal features, the interaction state features, and the output of the deep learning semantic model, fuse them into a feature matrix, and establish a perception-context matrix as a mapping from key terms to quality of service dimensions; an anomaly detection training module, configured to train a set of isolation forest trees for each quality of service dimension, wherein each isolation tree in the set recursively divides the input samples by randomly selecting a feature and randomly selecting a split point within the value range of the selected feature until a sample is isolated or a preset termination condition is met; and an anomaly judgment module, configured to calculate an anomaly score according to the expected path length, generate an anomaly flag and an anomaly interpretation by combining the decision paths of the isolation forest tree set, wherein the anomaly interpretation comprises the corresponding quality of service dimension and the triggered key terms, and output a handling suggestion corresponding to the anomaly flag, wherein the execution result of the handling suggestion is used to update the perception-context matrix and the parameters of the deep learning semantic model.
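Claim 1 normalizes the structured business data, physiological signal features, interaction state features, and semantic model output before fusing them into one feature matrix, but does not name the normalization. The sketch below assumes per-column z-score standardization inside each feature block, followed by per-sample concatenation; the function names `zscore_columns` and `fuse_blocks` and the example values are hypothetical.

```python
from itertools import chain
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column of a block (list of equal-length rows) to zero mean, unit variance."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    # A constant column carries no information after centering, so it maps to 0.0.
    return [[(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]
            for row in rows]

def fuse_blocks(*blocks):
    """Standardize each feature block separately, then concatenate the blocks sample by sample."""
    standardized = [zscore_columns(block) for block in blocks]
    return [list(chain.from_iterable(parts)) for parts in zip(*standardized)]

# Hypothetical two-sample example: [order amount, message count] and [negative-emotion probability].
business = [[100.0, 1.0], [200.0, 3.0]]
semantic = [[0.9], [0.1]]
matrix = fuse_blocks(business, semantic)
```

Standardizing each block on its own keeps, for example, order amounts in the hundreds from dominating probabilities in [0, 1] once the columns sit side by side.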
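Claim 2 associates the two data types by aggregating records of the same interaction link under a session, user, or transaction identifier while keeping time order and message direction. A minimal sketch, assuming every record already carries one such link identifier; the `Record` fields, the direction labels, and the sample data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Record:
    link_id: str      # session, user, or transaction identifier (the claim allows any of these)
    timestamp: float  # assumed: seconds since epoch
    direction: str    # assumed labels: "customer" or "agent"; empty for non-message events
    kind: str         # hypothetical source tag, e.g. "message", "order", "access_log"
    content: str

def aggregate_links(records):
    """Group records by interaction link and sort each group by time.

    Sorting preserves the time-sequence information; each record keeps its own direction field.
    """
    links = defaultdict(list)
    for rec in records:
        links[rec.link_id].append(rec)
    for group in links.values():
        group.sort(key=lambda rec: rec.timestamp)
    return dict(links)

records = [
    Record("s1", 30.0, "agent", "message", "We are checking your order."),
    Record("s1", 5.0, "customer", "message", "My parcel has not arrived."),
    Record("s1", 0.0, "", "order", "order paid"),
    Record("s2", 12.0, "customer", "message", "How do I return this?"),
]
links = aggregate_links(records)
```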
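Claim 3 lists the cleaning operations (delete account identifiers, topic tags, and links; normalize encoding, case, and repeated symbols) and asks that term positions be kept for later interpretation. The sketch below assumes social-media style syntax (`@name`, `#topic#`, `http(s)://` links) and a simple regex tokenizer; the patterns and function names are hypothetical, and a real system would substitute a language-specific segmenter and lemmatizer.

```python
import re
import unicodedata

# Hypothetical patterns: claim 3 names the categories to delete but not their concrete syntax.
LINK_RE = re.compile(r"https?://\S+|www\.\S+")
ACCOUNT_RE = re.compile(r"@\w+")
TOPIC_RE = re.compile(r"#[^#\s]+#?")
REPEAT_RE = re.compile(r"([!?.~])\1+")  # collapse runs such as "!!!" into "!"
SPACE_RE = re.compile(r"\s+")

def clean_text(text):
    text = unicodedata.normalize("NFKC", text)  # unify full-width and half-width forms
    for pattern in (LINK_RE, ACCOUNT_RE, TOPIC_RE):  # links first, since URLs may contain "@" or "#"
        text = pattern.sub(" ", text)
    text = text.lower()
    text = REPEAT_RE.sub(r"\1", text)
    return SPACE_RE.sub(" ", text).strip()

def tokens_with_positions(text):
    """Return (term, start, end) triples; offsets refer to the string passed in.

    This regex split is only a stand-in for a real segmenter and lemmatizer.
    """
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]
```

Offsets point into whatever string is tokenized, so to keep positions in the original text as claim 3 requires, a pipeline would either tokenize the original text or record the edits made during cleaning.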
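Claim 4 derives communication intensity and fluctuation from the change in the number of messages per unit time. A minimal sketch, assuming fixed-width windows counted from the first message, mean count as intensity, and population standard deviation as fluctuation; the window length and function names are hypothetical choices, not taken from the patent.

```python
from collections import Counter
from statistics import mean, pstdev

def messages_per_window(timestamps, window=60.0):
    """Count messages in consecutive fixed windows (seconds) starting at the first message."""
    if not timestamps:
        return []
    start = min(timestamps)
    counts = Counter(int((t - start) // window) for t in timestamps)
    return [counts[i] for i in range(max(counts) + 1)]  # empty windows count as 0

def intensity_and_fluctuation(counts):
    """Mean messages per window (intensity) and population standard deviation (fluctuation)."""
    return mean(counts), pstdev(counts)

def rate_changes(counts):
    """Window-to-window change in message count; a sharp rise may hint at an escalation moment."""
    return [b - a for a, b in zip(counts, counts[1:])]

counts = messages_per_window([0, 10, 20, 70, 130, 135], window=60.0)
```

Here the six messages fall into three one-minute windows with counts 3, 1, and 2.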
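Claim 5 combines a term's local importance (occurrence intensity in one text) with its global scarcity (coverage across the corpus). TF-IDF is the standard instance of this idea, and the sketch below assumes it: term frequency normalized by text length, multiplied by the smoothed inverse document frequency ln((1 + N) / (1 + df)) + 1, which is also the default smoothed form used by scikit-learn's `TfidfVectorizer`. The patent does not state the exact combination, so the product form and the name `term_weights` are assumptions.

```python
import math
from collections import Counter

def term_weights(docs, candidates):
    """Return one {term: weight} dict per document, restricted to candidate keywords.

    docs: list of token lists; candidates: set of candidate keywords.
    """
    n_docs = len(docs)
    # Document frequency: in how many texts each candidate appears (corpus coverage).
    doc_freq = Counter(t for doc in docs for t in set(doc) if t in candidates)
    result = []
    for doc in docs:
        counts = Counter(t for t in doc if t in candidates)
        length = len(doc) or 1
        weights = {}
        for term, count in counts.items():
            local = count / length                                        # local importance (TF)
            scarcity = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1  # global scarcity (smoothed IDF)
            weights[term] = local * scarcity
        result.append(weights)
    return result

docs = [["late", "delivery", "late"], ["good", "delivery"], ["refund", "late"]]
w = term_weights(docs, {"late", "delivery", "refund"})
```

In the last text, "refund" outweighs "late" because it occurs in fewer texts of the corpus, even though both appear once.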
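Claim 5's semantic embedding, in which the center term predicts the terms in its context set, is the skip-gram formulation of word2vec. The sketch below generates (center, context) pairs and trains a full-softmax skip-gram with plain SGD so the iterative update of the term vectors is visible; practical word2vec implementations use negative sampling or hierarchical softmax instead. The window size, dimension, learning rate, epoch count, and function names are illustrative assumptions.

```python
import math
import random

def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: each term predicts the terms within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def train_skipgram(pairs, dim=8, lr=0.1, epochs=100, seed=0):
    """Full-softmax skip-gram trained with SGD; returns (vectors, per-epoch mean loss)."""
    rng = random.Random(seed)
    vocab = sorted({w for pair in pairs for w in pair})
    idx = {w: k for k, w in enumerate(vocab)}
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]  # center-term vectors
    w_out = [[0.0] * dim for _ in vocab]                                    # context-term vectors
    losses = []
    for _ in range(epochs):
        total = 0.0
        for center, context in pairs:
            h = w_in[idx[center]]
            scores = [sum(a * b for a, b in zip(h, row)) for row in w_out]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            z = sum(exps)
            total -= math.log(exps[idx[context]] / z)  # cross-entropy of the true context term
            grad_h = [0.0] * dim
            for k, row in enumerate(w_out):
                g = exps[k] / z - (1.0 if k == idx[context] else 0.0)  # d loss / d score_k
                for d in range(dim):
                    grad_h[d] += g * row[d]  # uses the pre-update output vector
                    row[d] -= lr * g * h[d]
            for d in range(dim):
                h[d] -= lr * grad_h[d]  # h aliases w_in[idx[center]], so this updates it in place
        losses.append(total / len(pairs))
    return {w: w_in[idx[w]] for w in vocab}, losses

tokens = "parcel arrived late customer asked refund parcel late again".split()
vectors, losses = train_skipgram(skipgram_pairs(tokens))
```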
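Claim 6 turns classification-head scores into a probability distribution and measures confidence by the maximum probability, the entropy, or the gap between categories, routing low-confidence cases to manual review or a fallback rule engine. A minimal sketch, assuming softmax as the normalizing map; the thresholds in `route` are hypothetical stand-ins for the claim's unspecified preset condition.

```python
import math

def softmax(scores):
    """Normalizing map from classification-head scores to a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by the max for numerical stability
    z = sum(exps)
    return [e / z for e in exps]

def confidence_measures(probs):
    """Maximum probability, top-two margin, and normalized entropy of a distribution."""
    ranked = sorted(probs, reverse=True)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return {
        "max_prob": ranked[0],
        "margin": ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0],
        # Entropy divided by its maximum log(K), so 1.0 means a uniform (least confident) output.
        "norm_entropy": entropy / math.log(len(probs)) if len(probs) > 1 else 0.0,
    }

def route(probs, min_prob=0.6, min_margin=0.2):
    """Hypothetical thresholds: send low-confidence predictions to review instead of the model path."""
    c = confidence_measures(probs)
    return "review" if c["max_prob"] < min_prob or c["margin"] < min_margin else "model"

p = softmax([2.0, 0.0, 0.0])  # one clearly dominant category
u = softmax([0.0, 0.0, 0.0])  # uniform scores
```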
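Claim 8 converts "does this text contain key term t" into a true/false perception vector and pairs it with the anomaly flag to build the perception-context matrix. A minimal sketch, assuming a small hypothetical lexicon per quality of service dimension and plain substring matching; the per-term hit rate among anomalous rows is one simple, assumed way to show which key terms are associated with flagged samples.

```python
# Hypothetical lexicon: claim 8 names the categories (quality defects, response
# delays, system faults, ...) but not the actual key terms.
KEY_TERMS = {
    "quality_defect": ["broken", "defective", "damaged"],
    "response_delay": ["waiting", "no reply", "slow"],
    "system_fault": ["error", "crash", "cannot pay"],
}
COLUMNS = [(dim, term) for dim, terms in KEY_TERMS.items() for term in terms]

def perception_vector(text):
    """True/False flag per (dimension, key term). Substring matching is a simplification."""
    lowered = text.lower()
    return [term in lowered for _, term in COLUMNS]

def perception_context_matrix(texts, anomaly_flags):
    """One row per text: the perception vector followed by the anomaly flag."""
    return [perception_vector(t) + [flag] for t, flag in zip(texts, anomaly_flags)]

def hit_rate_among_anomalies(matrix):
    """Share of anomalous rows in which each key term fired, as a crude source explanation."""
    anomalous = [row[:-1] for row in matrix if row[-1]]
    if not anomalous:
        return {col: 0.0 for col in COLUMNS}
    return {col: sum(hits) / len(anomalous) for col, hits in zip(COLUMNS, zip(*anomalous))}

texts = ["Parcel arrived broken, still waiting for a reply", "Great product, fast shipping"]
matrix = perception_context_matrix(texts, [True, False])
rates = hit_rate_among_anomalies(matrix)
```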
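Claims 1 and 9 describe the isolation forest procedure: pick a random feature and a random split point within its value range, recurse until a sample is alone or a preset depth is reached, and average the number of splits over all trees to get the expected path length. The claims do not give the score formula; the sketch below assumes the standard one from the original isolation forest paper by Liu, Ting, and Zhou (2008), s(x) = 2^(-E[h(x)] / c(ψ)), where c(ψ) is the average path length of an unsuccessful binary search tree lookup over ψ points and ψ is the subsample size. Claim 1 trains one forest per quality of service dimension, which would mean calling `fit_forest` once per dimension's feature columns; all names here are hypothetical. scikit-learn's `sklearn.ensemble.IsolationForest` provides a production implementation of the same algorithm.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(sample, depth, max_depth, rng):
    """Recursively split on a random feature at a random point within its value range."""
    if len(sample) <= 1 or depth >= max_depth:
        return {"size": len(sample)}
    f = rng.randrange(len(sample[0]))
    lo, hi = min(x[f] for x in sample), max(x[f] for x in sample)
    if lo == hi:  # simplification: stop instead of retrying another feature
        return {"size": len(sample)}
    split = rng.uniform(lo, hi)
    return {"feature": f, "split": split,
            "left": build_tree([x for x in sample if x[f] < split], depth + 1, max_depth, rng),
            "right": build_tree([x for x in sample if x[f] >= split], depth + 1, max_depth, rng)}

def path_length(x, node):
    """Splits passed on the way to a leaf, plus c(size) for samples left unresolved there."""
    depth = 0
    while "size" not in node:
        node = node["left"] if x[node["feature"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])

def path_features(x, node):
    """Feature indices tested along x's decision path, usable for anomaly interpretation."""
    feats = []
    while "size" not in node:
        feats.append(node["feature"])
        node = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return feats

def fit_forest(data, n_trees=100, sample_size=64, seed=0):
    """Assumes at least two samples; depth limit ceil(log2(psi)) follows the original paper."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    return [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)], psi

def anomaly_score(x, trees, psi):
    """s(x) = 2 ** (-E[h(x)] / c(psi)); values near 1 suggest an anomaly."""
    expected = sum(path_length(x, t) for t in trees) / len(trees)  # expected path length
    return 2.0 ** (-expected / c_factor(psi))
```

A far-out point is usually isolated after a few splits, so its expected path length is short and its score is close to 1, while a point in the middle of the data needs many splits and scores lower.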
Description
E-commerce data integration analysis method and system
Technical Field
The invention relates to the field of computer technology, and in particular to an e-commerce data integration analysis method and system.
Background
In e-commerce online service scenarios, indicators such as responsiveness, availability, and security usually have to be considered together, and they are often associated with the user's emotion and context in text interactions such as dialogues and reviews. Relying on a single data source therefore makes continuous monitoring and fine-grained optimization of the service process difficult. In existing anomaly detection tasks, anomalies may manifest as fraud, abnormal transactions, abnormal behavior, or misuse of the system; conventional methods (such as clustering, principal component analysis, or statistical thresholds) tend to be of limited effectiveness under high dimensionality, noise, and complex interactions, and their insufficient interpretability raises trust and compliance concerns. Meanwhile, an intelligent e-commerce QoS service architecture generally comprises perception, network, and application layers, and must coordinate text understanding, deep learning decisions, and emotional/physiological signals simulated from text in order to support real-time, context-aware service recommendation and quality improvement.
Disclosure of Invention
In view of the above technical problems, the invention provides an e-commerce data integration analysis method and system intended to solve the problems that structured business data and unstructured text data on an e-commerce platform are difficult to associate and fuse effectively, that service quality anomalies during a conversation are difficult to recognize in time, and that the sources and key triggers of anomalies are difficult to interpret.
Other features and advantages of the present disclosure will become apparent from the following detailed description, or may be learned in part by practice of the disclosure.
According to one aspect of the invention, an e-commerce data integration analysis method is provided, the method comprising: obtaining structured business data and unstructured text data of an e-commerce platform, and associating the structured business data with the unstructured text data; performing cleaning, word segmentation, and word-form normalization on the unstructured text data to obtain a term sequence; generating a term weight feature and a semantic embedding feature based on the term sequence, wherein the term weight feature is determined jointly by the degree of occurrence of a term in a single text and its degree of distinction in the corpus, and the semantic embedding feature is obtained by training in a manner in which a center word predicts its context words; inputting the term weight feature, the semantic embedding feature, and the structured business data into a deep learning semantic model, outputting a probability distribution and a confidence of emotion or intent, constructing physiological signal features according to the probability distribution, and constructing interaction state features based on time-sequence information, message direction information, and interaction event sequences in the structured business data, wherein the interaction state features represent the trend of message volume over time, the conversation dominance relationship, and the escalation moment during the conversation; normalizing the structured business data, the physiological signal features, the interaction state features, and the output of the deep learning semantic model and fusing them into a feature matrix, and establishing a perception-context matrix as a mapping from key terms to quality of service dimensions; training a set of isolation forest trees for each quality of service dimension, wherein each isolation tree in the set randomly selects a feature, randomly selects a split point within the value range of the selected feature, and recursively divides the input samples until a sample is isolated or a preset termination condition is met; calculating an anomaly score according to the expected path length of the sample, and generating an anomaly flag and an anomaly interpretation by combining the decision paths of the isolation forest tree set, wherein the anomaly interpretation comprises the corresponding quality of service dimension and the triggered key terms; and outputting a handling suggestion corresponding to the anomaly flag, wherein the execution result of the handling suggestion is used to update the perception-context matrix and the parameters of the deep learning semantic model. Further, the structured business data comprises a t