CN-121999984-A - Personalized diet inquiry recommendation method based on reinforcement learning
Abstract
The invention discloses a personalized diet inquiry recommendation method based on reinforcement learning. The method comprises the following steps: collecting data and preprocessing it to generate a standardized input sequence; constructing a multi-modal causal dependency graph and forming a causal link set; generating a metabolic prediction quantity, a physiological conflict quantity, a multi-scale health target vector and a symptom backtracking chain; constructing an extended Option-Critic; inputting the metabolic prediction quantity, the physiological conflict quantity, the symptom backtracking chain and the multi-scale health target vector into the network to generate an inquiry action sequence and a diet decision action sequence; updating the user state and generating a user feedback sequence; feeding the new state items and the user feedback sequence into the network as states and rewards to execute policy iteration; and outputting a personalized diet inquiry recommendation result. By adopting the extended Option-Critic, the invention realizes dynamic diet inquiry and decision optimization, with strong personalization, real-time response and a high degree of health adaptation.
Inventors
- ZHOU LIHUI
Assignees
- Xinglin (Guangdong) Traditional Chinese Medicine Medical Co., Ltd. (杏林(广东)中医医疗有限公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20260119
Claims (8)
- 1. A personalized diet inquiry recommendation method based on reinforcement learning, characterized by comprising the following steps: collecting multi-source raw data comprising diet records, physiological monitoring data and symptom data, and preprocessing the multi-source raw data to generate a standardized input sequence; constructing a multi-modal causal dependency graph with multiple types of health-related nodes based on the standardized input sequence, and generating a causal link set according to node relevance and time-series recurrence relations; generating a metabolic prediction quantity, a physiological conflict quantity and a multi-scale health target vector based on the standardized input sequence, and performing reverse link search on the causal link set to generate a symptom backtracking chain; constructing an extended Option-Critic, setting an inquiry option, a symptom analysis option, a metabolic verification option, a diet decision option and a feedback integration option, and setting an option policy, an option initiation function and an option termination function for each option; inputting the metabolic prediction quantity, the physiological conflict quantity, the symptom backtracking chain and the multi-scale health target vector into the extended Option-Critic to generate an inquiry action sequence and a diet decision action sequence; updating the user state according to the inquiry action sequence to generate new state items, and generating a user feedback sequence based on the diet decision action sequence and the time-series change of the physiological monitoring indexes after the user executes the actions; and inputting the new state items and the user feedback sequence, as states and rewards, into the extended Option-Critic to execute policy iteration, and outputting a personalized diet inquiry recommendation result.
- 2. The reinforcement-learning-based personalized diet inquiry recommendation method of claim 1, wherein generating the standardized input sequence specifically comprises: unifying the time references of the acquired multi-source raw data comprising diet records, physiological monitoring data and symptom data, and generating time-aligned entries; performing data integrity processing on the time-aligned entries, regularizing the missing, abnormal and discontinuous information in the entries according to preset processing rules, and generating integrity-regularized entries; performing structure unification on the integrity-regularized entries, formatting the diet records, physiological monitoring data and symptom data into a unified entry structure according to a fixed schema, and generating structure-unified entries; performing data stabilization on the structure-unified entries, processing data segments with unstable characteristics according to preset stabilization rules, and generating stabilized entries; performing scale unification on the stabilized entries, mapping different types of data onto a uniform scale in terms of numerical interval, variation ratio and time span, and generating scale-unified entries; performing normalization on the scale-unified entries, mapping the entry values into a preset normalization interval, and generating normalized entries; and combining the normalized entries in unified time index order to generate the standardized input sequence.
- 3. The reinforcement-learning-based personalized diet inquiry recommendation method of claim 1, wherein generating the causal link set specifically comprises: sequentially reading each time-step entry from the standardized input sequence, and splicing the standardized diet records, standardized physiological monitoring data and standardized symptom data within each entry to generate node original units; directly converting the node original units into node entities with multiple types of health-related attributes, and constructing a node entity sequence in time index order; comparing health characteristics node by node over the node entity sequence, generating inter-node association description items, and combining them into a node association set; extracting the sequential relations of the node entity sequence node by node using its time indexes, generating node time-series recurrence records, and forming a time-series recurrence set; cross-pairing the node association set and the time-series recurrence set by node index to generate joint node item groups; establishing connection records for the corresponding nodes based on the joint node item groups, and integrating all connection records into a node connection set; combining the node entity sequence with the node connection set to generate the multi-modal causal dependency graph; screening all node connections on the multi-modal causal dependency graph according to node association and time-series recurrence relations to obtain node pairs with causal directionality, and generating causal node item groups; and arranging the node pairs in time order based on the causal node item groups, reconstructing the chain structure, and generating the causal link set.
- 4. The reinforcement-learning-based personalized diet inquiry recommendation method of claim 1, wherein generating the symptom backtracking chain specifically comprises: intercepting corresponding entries from the standardized input sequence by time index, and aggregating the standardized diet records, standardized physiological monitoring features and standardized symptom features of each entry into metabolic input groups; feeding the metabolic input groups one by one into a metabolism generation flow, generating a sequence of metabolic sub-entries in time order, and outputting the metabolic prediction quantity after the sequence is collected; taking the standardized input sequence as a carrier, independently extracting the health features of each time step, arranging them into conflict analysis unit groups, deriving a physiological conflict item set from the unit groups, and generating the physiological conflict quantity; reading the standardized input sequence, splitting its health features into target computing units by time span, and generating a target fragment set from the target computing units; integrating the target fragment set at multiple scales, combining the target fragments at different scales into target integration items, and generating the multi-scale health target vector; extracting causal node pairs item by item from the causal link set, and reversing the direction of each node pair to form reverse retrieval units; tracing the reverse paths of the causal link set segment by segment according to the reverse retrieval units, and collecting the node items obtained during tracing into a reverse node group in order of appearance; and re-ordering the node arrangement by time index based on the reverse node group, constructing a chain structure, and generating the symptom backtracking chain.
- 5. The reinforcement-learning-based personalized diet inquiry recommendation method of claim 1, wherein constructing the extended Option-Critic specifically comprises: constructing an option matrix of the extended Option-Critic, generating corresponding option nodes for the inquiry option, the symptom analysis option, the metabolic verification option, the diet decision option and the feedback integration option, and forming an option node set from all option nodes; configuring an action generation structure for each option node, writing action mapping parameters into the action mapping area of the option node to generate independent option policy nodes, and integrating all option policy nodes into an option policy set; establishing an initiation identification structure for each option node, loading the initiation parameters of an initiation trigger expression into the initiation parameter area of the option node to form option initiation nodes, and arranging all initiation nodes into an option initiation set; establishing a termination identification structure for each option node, writing the termination parameters of a termination trigger expression into the termination parameter area of the option node to form option termination nodes, and forming an option termination set from all termination nodes; binding the option node set, the option policy set, the option initiation set and the option termination set item by item according to option identifiers to generate an option execution layer; building an internal scheduling area for the option execution layer, and arranging all option execution nodes into a scheduling unit group in execution order; connecting the scheduling unit group to the action derivation area of the option execution layer; and packaging the option execution layer, the scheduling unit group and the action derivation area as a whole to form the extended Option-Critic.
- 6. The reinforcement-learning-based personalized diet inquiry recommendation method of claim 1, wherein generating the inquiry action sequence and the diet decision action sequence specifically comprises: combining the metabolic prediction quantity, the physiological conflict quantity, the symptom backtracking chain and the multi-scale health target vector into an input item group in a fixed order; reading the input item group item by item in the option execution layer of the extended Option-Critic, distributing each input item to the option node areas of the inquiry option, the symptom analysis option, the metabolic verification option, the diet decision option and the feedback integration option, and generating an option input set; triggering the action generation flow of each option in its option node area, generating an intra-option action item for each option, and collecting all action items into an action item set; arranging the action item set in generation order to produce an initial action sequence; splitting the initial action sequence by option type, extracting the action items corresponding to the inquiry option as inquiry action fragment groups, and extracting the action items corresponding to the diet decision option as diet action fragment groups; performing a sequence arrangement operation on the inquiry action fragment groups, arranging all inquiry action fragments into consecutive nodes in time order, and generating the inquiry action sequence; and performing a node reconstruction operation on the diet action fragment groups, reconstructing the diet action fragments into a chain of consecutive nodes in generation order, and generating the diet decision action sequence.
- 7. The reinforcement-learning-based personalized diet inquiry recommendation method of claim 1, wherein generating the user feedback sequence specifically comprises: sequentially reading the inquiry action sequence, combining each of its action items with the current user state item under the same time index, and generating state update items; writing the action content of each state update item into the corresponding position of the user state item to continuously generate new state items; arranging the continuously generated new state items in generation order to form a new state sequence; reading the diet decision action sequence in time order, matching its action items, under the same time index, with the physiological monitoring indexes recorded after the user executes the diet decision actions, and generating feedback processing items; combining the action content of each feedback processing item with the variation of the physiological monitoring indexes to continuously generate feedback items; and arranging the continuously generated feedback items in generation order to generate the user feedback sequence.
- 8. The reinforcement-learning-based personalized diet inquiry recommendation method of claim 1, wherein generating the personalized diet inquiry recommendation result specifically comprises: arranging the new state items in generation order to generate a state sequence, and arranging the user feedback sequence in generation order to generate a reward sequence; synchronously feeding the state sequence and the reward sequence into the extended Option-Critic, invoking the option update flow in the network, updating the corresponding option items item by item, and generating updated option item groups; writing the updated option item groups into the option positions of the extended Option-Critic to obtain an expanded option structure; on the basis of the expanded option structure, updating the corresponding action items item by item, and arranging the updated action item groups into an action policy sequence; recombining the action policy sequence and the expanded option structure into policy blocks to generate a continuous policy update sequence; rearranging the policy update sequence by policy index to generate a final policy sequence; and outputting the final policy sequence to a result generation flow to produce a decision basis, and generating the personalized diet inquiry recommendation result according to the decision basis.
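The preprocessing pipeline of claim 2 (time alignment, integrity regularization, normalization) can be sketched in Python. The field names (`calories`, `glucose`, `symptom_score`), the forward-fill integrity rule and the min-max normalization interval [0, 1] are illustrative assumptions; the patent does not fix concrete rules.

```python
from typing import Dict, List


def standardize(entries: List[Dict]) -> List[Dict]:
    """Toy version of claim 2: align by time, regularize missing
    values, and normalize each numeric field to [0, 1]."""
    # Time alignment: order entries by their (assumed) timestamp key "t".
    entries = sorted(entries, key=lambda e: e["t"])
    # Integrity regularization: forward-fill missing fields; when no
    # prior observation exists, default to 0.0 (an assumed rule).
    last: Dict[str, float] = {}
    for e in entries:
        for k in ("calories", "glucose", "symptom_score"):
            if e.get(k) is None:
                e[k] = last.get(k, 0.0)
            last[k] = e[k]
    # Scale unification + normalization: min-max map each field to [0, 1].
    for k in ("calories", "glucose", "symptom_score"):
        vals = [e[k] for e in entries]
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1.0  # avoid division by zero for flat fields
        for e in entries:
            e[k] = (e[k] - lo) / span
    return entries
```

The output list, in unified time index order, plays the role of the standardized input sequence.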
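The reverse link search of claim 4 is essentially an ancestor traversal over the causal links, re-ordered by time index. A sketch, assuming links are `(source, destination)` index pairs:

```python
from typing import List, Tuple


def backtrack_symptom(links: List[Tuple[int, int]], symptom_node: int) -> List[int]:
    """Reverse-link search (claim 4): walk the causal links backwards
    from the symptom node, collect every ancestor, and return the
    nodes ordered by time index as the symptom backtracking chain."""
    parents = {}
    for src, dst in links:
        parents.setdefault(dst, []).append(src)  # reverse retrieval units
    chain, stack, seen = [], [symptom_node], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        chain.append(node)
        stack.extend(parents.get(node, []))  # trace one segment backwards
    return sorted(chain)  # re-order along the time index
```

Here node indices are assumed to coincide with time order, so sorting the collected ancestors rebuilds the chain structure.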
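The option execution layer of claim 5 maps naturally onto the standard Option-Critic triple of intra-option policy, initiation set and termination function. A structural sketch with the five options named in the claim; the trivial policy, initiation and termination bodies are placeholders, not the patent's parameters:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Option:
    name: str
    policy: Callable[..., str]          # intra-option policy pi_o(s)
    initiation: Callable[[Any], bool]   # initiation set I_o
    termination: Callable[[Any], float] # termination function beta_o(s)


def make_option_layer() -> Dict[str, Option]:
    """Build the five-option execution layer of claim 5 with
    placeholder policies (always-on initiation, fixed termination)."""
    names = ["inquiry", "symptom_analysis", "metabolic_verification",
             "diet_decision", "feedback_integration"]
    return {n: Option(name=n,
                      policy=lambda s, n=n: f"{n}_action",
                      initiation=lambda s: True,
                      termination=lambda s: 0.1)
            for n in names}
```

Binding the three callables into one `Option` record per name mirrors the claim's item-by-item binding of the node, policy, initiation and termination sets.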
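The classification splitting step of claim 6 reduces to partitioning the initial action sequence by option type while preserving generation order. A minimal sketch, assuming action items are `(option_name, action)` pairs:

```python
from typing import List, Tuple


def split_action_sequences(action_items: List[Tuple[str, str]]):
    """Split the initial action sequence by option type (claim 6):
    inquiry-option actions become the inquiry action sequence, and
    diet-decision-option actions become the diet decision action
    sequence; other option types are dropped from both outputs."""
    inquiry = [a for opt, a in action_items if opt == "inquiry"]
    diet = [a for opt, a in action_items if opt == "diet_decision"]
    return inquiry, diet
```

Because list comprehensions preserve order, the two outputs are already the "consecutive node" arrangements the claim describes.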
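Claim 7's feedback items pair each diet decision action with the subsequent change in a physiological monitoring index. A sketch using blood glucose as the (assumed) monitored index and the negated change as the reward signal; the patent does not commit to a specific index or reward sign:

```python
from typing import Dict, List


def build_feedback(diet_actions: List[str], glucose_readings: List[float]) -> List[Dict]:
    """Toy version of claim 7: match action i with the glucose change
    between readings i and i+1; the signed change becomes one feedback
    item (later used as a reward). Requires len(readings) = len(actions)+1."""
    feedback = []
    for i, action in enumerate(diet_actions):
        delta = glucose_readings[i + 1] - glucose_readings[i]
        feedback.append({"action": action,
                         "delta": delta,
                         "reward": -delta})  # assumed: lower glucose -> positive reward
    return feedback
```

The resulting list, in generation order, is the user feedback sequence consumed as rewards in claim 8.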
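The policy iteration of claim 8 can be illustrated with a tabular option-value update driven by the state and reward sequences. This is a simplified stand-in for the actor-critic gradient updates an Option-Critic actually uses, and the two-option set in the code abbreviates the full five options:

```python
from typing import Dict, List, Tuple


def update_option_values(q: Dict, transitions: List[Tuple],
                         alpha: float = 0.5, gamma: float = 0.9) -> Dict:
    """One pass of tabular option-value updates: q[(state, option)]
    moves toward reward + gamma * best next-state option value.
    transitions: (state, option, reward, next_state) tuples built from
    the new state sequence and the user feedback (reward) sequence."""
    for state, option, reward, next_state in transitions:
        best_next = max((q.get((next_state, o), 0.0)
                         for o in ("inquiry", "diet_decision")), default=0.0)
        old = q.get((state, option), 0.0)
        q[(state, option)] = old + alpha * (reward + gamma * best_next - old)
    return q
```

Iterating this update over successive state/reward batches plays the role of the claim's item-by-item option and action updates; the greedy sequence over the final `q` table would then form the final policy sequence.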
Description
Personalized diet inquiry recommendation method based on reinforcement learning

Technical Field

The invention relates to the technical field of personalized inquiry, in particular to a personalized diet inquiry recommendation method based on reinforcement learning.

Background

Existing personalized diet inquiry and recommendation technologies rely on static rules, fixed inquiry paths or shallow statistical models, generally adopting manually compiled diet taboo lists, symptom rule trees or recommendation logic based on single indexes. They cannot dynamically adapt to changes in a user's dietary behavior, fluctuations in physiological monitoring sequences, or the evolution of symptoms. Traditional methods generally lack the capability to model the multi-modal associations among diet records, physiological monitoring data and symptom data, cannot build causal dependency structures across modalities and time spans, and thus have difficulty describing the progressive change of real health states. In addition, existing systems output recommendation results through fixed policies, making closed-loop updating among inquiry actions, diet decision actions and user feedback difficult to achieve. Meanwhile, reinforcement learning has not been introduced into medical-grade diet inquiry scenarios in the prior art: a policy optimization mechanism combining metabolic trends, physiological conflicts and health targets is lacking; the decision path cannot be dynamically adjusted according to changes in physiological monitoring indexes after user execution; symptom sources cannot be traced back along causal links; health targets cannot be generated at multiple time scales; and there is no hierarchical policy structure capable of simultaneously handling inquiry options, symptom analysis options, metabolic verification options and diet decision options.
Therefore, significant disadvantages remain in terms of real-time performance, personalization, dynamics and closed-loop optimization, and there is a need for a personalized diet inquiry recommendation method capable of whole-process policy updating and multi-modal causal reasoning.

Disclosure of Invention

The invention aims to provide a personalized diet inquiry recommendation method based on reinforcement learning, which combines extended Option-Critic reinforcement learning with multi-modal causal reasoning to realize dynamic diet inquiry and decision optimization, with strong personalization, real-time response and a high degree of health adaptation. According to an embodiment of the invention, the personalized diet inquiry recommendation method based on reinforcement learning comprises the following steps: collecting multi-source raw data comprising diet records, physiological monitoring data and symptom data, and preprocessing the multi-source raw data to generate a standardized input sequence; constructing a multi-modal causal dependency graph with multiple types of health-related nodes based on the standardized input sequence, and generating a causal link set according to node relevance and time-series recurrence relations; generating a metabolic prediction quantity, a physiological conflict quantity and a multi-scale health target vector based on the standardized input sequence, and performing reverse link search on the causal link set to generate a symptom backtracking chain; constructing an extended Option-Critic, setting an inquiry option, a symptom analysis option, a metabolic verification option, a diet decision option and a feedback integration option, and setting an option policy, an option initiation function and an option termination function for each option; inputting the metabolic prediction quantity, the physiological conflict quantity, the symptom backtracking chain and the multi-scale health target vector into the extended Option-Critic to generate an inquiry action sequence and a diet decision action sequence; updating the user state according to the inquiry action sequence to generate new state items, and generating a user feedback sequence based on the diet decision action sequence and the time-series change of the physiological monitoring indexes after the user executes the actions; and inputting the new state items and the user feedback sequence, as states and rewards, into the extended Option-Critic to execute policy iteration, and outputting a personalized diet inquiry recommendation result. Optionally, generating the standardized input sequence specifically includes: unifying the time references of the acquired multi-source raw data comprising diet records, physiological monitoring data and symptom data, and generating time-aligned entries; performing data integrity processing on the time-aligned entries, and regularizing the missing, abnormal and discontinuous information in the entries according to a preset proc