CN-121979965-A - Automatic grammar tree structure labeling and behavior pattern mining method and system based on player behavior log
Abstract
The invention discloses a method and a system for automatically marking grammar tree structures and mining behavior patterns based on player behavior logs. The method comprises the steps of cleaning player behavior logs, session segmentation and terminal symbol mapping, constructing a compressed analysis forest based on a behavior grammar, establishing a shared subtree index, constructing a residual fingerprint retrieval history convergence state to start inference at a temperature, adopting an alternate direction multiplier method to conduct multi-constraint inference such as generation type consistency, stage monotonicity and resource boundaries on the analysis forest, outputting a residual tensor of a node label posterior and aligned according to node-generation type-constraint types, and conducting normalized coding and frequent subtree mining on a stable grammar tree to generate a evidence-carrying behavior pattern library and failure positioning mapping. The scheme improves the consistency and the interpretability of the labels, reduces the cost of labor and iteration, and improves the controllability of on-line triggering of the modes.
Inventors
- Si Yihan
- SHEN XISHENG
- XU FANGLEI
- MAO YAOHUA
Assignees
- 杭州云欣谷网络科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260407
Claims (10)
- 1. A grammar tree structure automatic labeling and behavior pattern mining method based on player behavior log is characterized by comprising the following steps: s1, mapping player behavior log events into a terminal symbol sequence according to a session, and constructing a compressed analysis forest with a shared subtree index according to a behavior grammar; S2, establishing a constraint inference model in an extended Lagrangian form on the analysis forest, wherein the constraint at least comprises a production type consistency constraint, a stage monotonicity constraint and a resource flow conservation or resource boundary constraint, and adopting an alternate direction multiplier method ADMM to carry out iterative solution to obtain a node label posterior and generate a residual tensor, wherein the initial dual variable and the punishment parameter of the ADMM are started by the historical convergence state temperature obtained by residual fingerprint retrieval; S3, constructing a conflict traceability graph by using the residual tensor, solving a conflict dominant subtree with the minimum number of nodes, which covers the residual quality not lower than a threshold value, according to a fixed greedy coverage criterion, and performing manual verification only on the conflict dominant subtree; S4, only carrying out the update of a preset structured production type conversion template on the production type set associated with the conflict dominant subtree, judging whether the upper boundary of the residual error is monotonically reduced on the verification session set, and rolling back the update if the reduction condition is not met; And S5, normalizing the updated annotation grammar tree, and mining frequent subtrees to generate a evidence-carrying behavior pattern library.
- 2. The method of claim 1, wherein the residual fingerprint is generated and used for retrieval from a splice of three types of summary information: (1) N-gram statistics summary of session terminator sequence; (2) Analyzing the proportion of the shared subtrees of the forest or the density abstract of the shared structure; (3) The residual type distribution vector abstract at least reflects the duty ratio of three types of residual errors of the consistency of the generation type, the monotonicity of the stage and the resource constraint; when the splicing result is retrieved through hash or similarity to obtain a hit item, multiplexing a dual variable corresponding to the hit item and a punishment parameter as an ADMM initial value, adopting a default initial value when the hit item is missed, and recording a new fingerprint and a corresponding convergence state.
- 3. The method of claim 1, wherein the residual tensor has a "node-generator-constraint type" as an alignment dimension and is derived from a weighted combination of an original residual component and a dual residual component, wherein: (1) The calculation caliber of the original residual error component is that for each superside in the analysis forest, the local superlimit or unsatisfied degree of the superside under each constraint type is calculated according to the initial labeling and constraint rules, the local superlimit is expressed in a non-negative form, the local superlimit is weighted and accumulated according to the posterior probability or weight of the superside, and the accumulated result is collected to a head node-generation-constraint type item corresponding to the superside; (2) Calculating norms or absolute increments of update increments of dual variables in ADMM iteration, and decomposing and collecting according to node-generation-constraint types; (3) And merging residual error entries of the isomorphic structure at the shared subtree index, and returning the merged residual error to the father node according to the superb posterior weight or the structure contribution proportion to generate the formula entry.
- 4. The method of claim 1, wherein the conflict-traceability graph has "node-generative-constraint type" as vertices and a resolved forest superside connection and shared subtree index relationship as edges, and wherein the fixed greedy coverage criteria comprises: (1) Defining the total forest residual quality as the sum of all residual tensor entries; (2) Calculating residual quality which can be covered by the induced subtrees of each candidate node; (3) Iteratively selecting candidate nodes which can bring the maximum unit newly added node coverage residual error gain from the empty set to add a conflict dominant subtree set until the residual error quality covered by the set reaches or exceeds a preset proportion threshold; and outputting the conflict dominant subtrees, the associated generative sets and residual contribution sequences thereof for generating the manual verification task.
- 5. The method of claim 1, wherein the structured generative transformation template comprises only the following three classes and only allows for acting on the set of productions associated with the conflict dominant subtree: (1) Splitting the non-terminal symbol into a non-terminal symbol with a stage index according to the stage number, and limiting the generation type expansion to be only advanced between adjacent stages so as to eliminate the stage reverse sequence residual error; (2) The resource stream annotation splitting template is characterized in that a single-step generation formula is rewritten into a two-step generation formula, and an intermediate non-terminal symbol carrying a resource increase and decrease annotation is introduced, so that resource conservation or resource boundary constraint can be checked and satisfied at the intermediate non-terminal symbol; (3) Combining and template with prefix factor, namely performing factorization and combination on a plurality of production formulas with the same prefix, and introducing branch non-terminal symbols to accept different suffixes so as to reduce the inconsistent residual error of the production formulas; the template selection is determined by the type of residual dominant constraint inside the conflict dominant subtree.
- 6. The method of claim 1, wherein the rollback admission is based on verifying an upper bound for residuals on a session set, the upper bound for residuals being defined as aggregate values for a total amount of residual components for each constraint type, the updated grammar must satisfy: (1) Verifying that the residual upper bound on the session set falls not less than a preset threshold as compared to before updating, and (2) At least two types of constraint residuals fall down simultaneously or entropy of residual type distribution is reduced; Otherwise, the current generation type rewrite and the resolver write-back are cancelled; The evidence-carrying behavior pattern library outputs a certificate for each pattern, the certificate at least comprises a pattern subtree standardization representation, a stage guardrail inspection type set, a resource inspection type set, a dual upper bound abstract and a failure positioning mapping, and the failure positioning mapping at least indicates nodes, production formulas and constraint types which cause verification failure.
- 7. A grammar tree structure automatic labeling and behavior pattern mining system based on player behavior logs is used for realizing the method of any one of claims 1-6 and is characterized by comprising a session mapping module, a parsing forest construction module, a residual fingerprint retrieval module, an ADMM constraint inference and residual tensor generation module, a conflict tracing map construction module, a conflict subtree greedy solution module, a manual verification module, a templated generation type conversion and rollback admission module, a parser write-back module, a grammar tree standardization and frequent subtree mining module and a evidence-carrying pattern library output module.
- 8. The system of claim 7, wherein the residual fingerprint search module establishes a residual fingerprint-dual variable-penalty parameter index table, issues a warm start initial value to the ADMM constraint inference and residual tensor generation module when searching hits, and updates the index table to multiplex a convergence state and a penalty parameter adaptive result after each round of ADMM convergence, the templatization generating type transformation and rollback admission module comprises a template selection unit, a verification evaluation unit and a rollback unit, the verification evaluation unit calculates residual upper bound and residual type distribution entropy on a verification session set and judges whether admission conditions are met, the rollback unit cancels generating type rewrite, parameter write-back and resolver structure change caused by the residual upper bound and residual type distribution entropy when the admission conditions are not met, the rollback unit outputs certificates and failure positioning mapping when frequent subtree modes are output, so that an external policy engine can verify according to a stage guardrail checking type set and a resource checking type set in the certificates when the mode is triggered, and uses the failure positioning mapping to generate type-constraint type to trigger rollback, recheck or retrain flow when the verification fails.
- 9. A computer device comprising a processor, a graphics processing unit GPU and a memory, wherein the memory stores a computer program which, when executed by the processor and the graphics processing unit, causes the computer device to perform the method of any of claims 1 to 6.
- 10. A computer readable storage medium having stored thereon a computer program which, when executed by a computer device, causes the computer device to perform the method of any of claims 1-6.
Description
Automatic grammar tree structure labeling and behavior pattern mining method and system based on player behavior log Technical Field The invention relates to the field of computers, in particular to a method and a system for automatically marking grammar tree structures and mining behavior patterns based on player behavior logs. Background With the large-scale operation of network games, mobile games and cloud games, the platform side can continuously collect a large number of player behavior logs, such as log-in/log-out, task access and completion, stage entry/exit, stage acquisition and consumption, payment and refund, team and social interaction, abnormal disconnection and reconnection and other event sequences. Such logs often have the characteristic of "semi-structured + strongly context dependent" in that, on the one hand, event fields differ between different versions, different activity periods, different channel SDKs, and on the other hand, the semantics of player behavior and phase relationships (e.g., novice guidance phase, mainline propulsion phase, pay conversion phase, churn prophase, etc.) are not determinable by a single event, but need to be understood in combination with changes in session context and resource flows (e.g., gold coins, physical forces, props, tickets, etc.). To support anti-cheating, anomaly detection, player grouping, path analysis, and refined operations, the industry typically needs to further structure the raw behavioral logs into computable "behavioral templates/patterns" and develop pattern mining and decision triggering on the basis of these. However, the traditional mode of only relying on field splitting, regular templates, fixed rule base or simple clustering often has difficulty in considering generalization capability and semantic consistency of cross-source isomerism, and has high maintenance cost and insufficient interpretability when the service is iterated rapidly, so that the problems of pattern library drift, false triggering, missed triggering and the like are gradually highlighted. In the prior art, various schemes exist for log mode discovery and log structuring. For example, publication number CN108241658a proposes a log mode discovery method and system, the idea of which includes collecting log messages and forming a log content list, replacing according to entity characteristic values and replacement rules, merging the same log content to obtain a log content data set, further performing part-of-speech labeling on the log content, performing syntax analysis by using a probability context-free grammar to generate a syntax tree, defining the grammar, extracting a key information model according to the syntax tree and the grammar, merging logs belonging to the same key information model to obtain a log mode record and a static mode data table, and the like. The method has the advantages that the syntactic analysis in natural language processing is introduced into the log mode division process, so that semantic structure information can be reserved more completely than the traditional regular/field interception, and the log mode division is more scientific and complete. However, from the practical application point of facing the player behavior log, the player behavior is not a static text structure of a single log statement, but a cross-event and cross-stage sequence structure, and the problems of resolving ambiguity, labeling conflict and the like caused by multi-session context difference, cross-version behavior semantic drift and inconsistent constraint (such as stage reverse order, resource conservation and path breakage) can still be faced by only merging by a syntax tree or a key information model. Meanwhile, the scheme mainly emphasizes that 'generating a syntax tree-extracting a key information model-merging into a mode', and does not give a closed loop mechanism for accurately attributing conflict sources under a multi-constraint conflict scene, minimizing a manual check range and evolving a grammar rule on the premise of ensuring online safety. Furthermore, patent US8473300B1 discloses a method and system of log mining to modify grammar-based text processing, the general idea of which is to receive an activity log from a device, the activity log containing input instructions, functions determined based on matching of the input instructions with grammar-based text patterns, and a response based on confirmation of the determined functions, then compare the current activity log with a stored historical activity log to determine a correlation, and modify the grammar-based text patterns based on the determined correlation, thereby updating the grammar-based text patterns on the device side. The scheme emphasizes that the correlation among logs is utilized to drive the adjustment of grammar type modes, the closed-loop idea of 'log mining-mode correlation-grammar updating' is embodied, and the method can adapt to the change of input instructions and