CN-121981689-A - University student personalized occupation planning method based on reinforcement learning


Abstract

The invention provides a personalized career planning method for college students based on reinforcement learning. The method first acquires and standardizes multi-source individual data of a college student and constructs a capability vector from multimodal information such as text and video; it then acquires post portrait data, generates a post capability demand vector, and calculates a capability gap characterization. On this basis, a reinforcement learning environment is constructed whose state comprises the capability vector, the capability gap vector, and a time constraint, and whose actions are micro-intervention task packages consisting of micro-course tasks, project training tasks, and simulated interview training tasks. A reward function jointly evaluates the capability gap reduction and the task cost, and a career planning strategy is obtained through training. Finally, a personalized career planning result is generated from the trained planning strategy, and the plan is continuously updated and dynamically corrected during execution according to changes in time and in the capability gap, improving the scientific rigor and suitability of the career planning.
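
As an illustration of the gap and reward ideas in the abstract, the following is a minimal sketch only. The abstract does not give formulas, so the clipped per-dimension difference, the linear cost penalty, the function names, and the `cost_weight` parameter are all assumptions made here for illustration.

```python
from typing import List, Sequence


def capability_gap(capability: Sequence[float], demand: Sequence[float]) -> List[float]:
    """Per-dimension shortfall of a capability vector against a post demand vector."""
    if len(capability) != len(demand):
        raise ValueError("capability and demand vectors must have the same length")
    # Assumption: only shortfalls count; surplus capability is clipped to zero.
    return [max(d - c, 0.0) for c, d in zip(capability, demand)]


def reward(gap_before: Sequence[float], gap_after: Sequence[float],
           task_cost: float, cost_weight: float = 0.1) -> float:
    """Assumed reward form: total gap reduction minus a weighted task cost."""
    reduction = sum(gap_before) - sum(gap_after)
    return reduction - cost_weight * task_cost


# Hypothetical three-dimensional example (values are made up).
demand = [0.8, 0.6, 0.9]
before = capability_gap([0.5, 0.7, 0.4], demand)  # gap before a task package
after = capability_gap([0.6, 0.7, 0.6], demand)   # gap after the task package
r = reward(before, after, task_cost=1.0)
print(before, after, r)
```

Clipping at zero is one possible design choice: it keeps a surplus in one capability dimension from masking a shortfall in another when the gaps are summed.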

Inventors

GAO YUN
Luo Rujiu
ZHANG PING
LIU SHUHAN

Assignees

连云港职业技术学院

Dates

Publication Date: 20260505
Application Date: 20260204

Claims (10)

1. A personalized career planning method for college students based on reinforcement learning, characterized by comprising the following steps: S1, acquiring individual data of a target college student and performing structuring, cleaning, time alignment, and unified coding on the individual data to form a multi-source input data set of the target college student; S2, performing multimodal feature extraction and fusion on the multi-source input data set, wherein text features are extracted from resume text, visual and voice behavior features are extracted from simulated interview video data, and the text features, visual features, and voice behavior features are fused to obtain a capability vector representing the comprehensive capability of the target college student; S3, acquiring post portrait data related to the career planning of the target college student, and extracting post features from the post portrait data to obtain a post capability demand vector; S4, constructing a reinforcement learning environment for career planning, defining the environment state as a state vector formed by the capability vector and a capability gap characterization, defining the environment action as a micro-intervention task package for improving the capability vector, and setting a reward function related to capability improvement for the reinforcement learning environment; and S5, training and updating a planning strategy based on the reinforcement learning environment, wherein in each training round the planning strategy outputs a micro-intervention task package according to the state vector, updated individual data of the target college student are re-acquired after the micro-intervention task package is executed, the capability vector is updated, the reward function is calculated according to the change in the capability vector before and after the update, and the planning strategy is iteratively updated using the reward function.
2. The personalized career planning method for college students based on reinforcement learning according to claim 1, further comprising S6: after the planning strategy converges or a preset training stop condition is met, outputting, by the planning strategy, a personalized career planning result based on the current state vector of the target college student, wherein the personalized career planning result comprises a micro-intervention task package sequence arranged in time order, together with a post target and a capability improvement target corresponding to the micro-intervention task package sequence.
3. The personalized career planning method for college students based on reinforcement learning according to claim 1, wherein S1 specifically comprises: acquiring individual data of the target college student, wherein the individual data at least comprises resume text, course and score records, project and practice records, and simulated interview video data, and binding the individual data to the same target college student identifier; unifying the format specifications and fields of the individual data, and converting the resume text, course and score records, project and practice records, and simulated interview video data into preset data structures to obtain normalized individual data; performing missing-value handling, outlier removal, and duplicate merging on the normalized individual data, and performing time segmentation and quality screening on the simulated interview video data to obtain cleaned individual data; and performing time alignment and unified coding on the cleaned individual data to form the multi-source input data set of the target college student, the multi-source input data set serving as the input for subsequent capability representation.
4. The personalized career planning method for college students based on reinforcement learning according to claim 1, wherein S2 specifically comprises: taking the resume text in the multi-source input data set as input and extracting capability-related text features to obtain a text feature vector; taking the simulated interview video data in the multi-source input data set as input and extracting visual behavior features and voice behavior features respectively to generate a corresponding visual feature vector and voice feature vector; performing semantic consistency alignment on the text feature vector, the visual feature vector, and the voice feature vector to obtain an aligned multimodal feature set; and fusing the aligned multimodal feature set to generate the capability vector representing the comprehensive capability of the target college student, and writing the capability vector into a capability field corresponding to the multi-source input data set for use in subsequent steps.
5. The personalized career planning method for college students based on reinforcement learning according to claim 1, wherein S3 specifically comprises: acquiring post portrait data related to the career planning, wherein the post portrait data at least comprises a post name, post skill requirements, post capability requirements, and post evaluation indexes; extracting post features from the post portrait data to generate the post capability demand vector corresponding to the post; calculating a post matching difference based on the capability vector and the post capability demand vector to obtain the capability gap characterization, and storing the capability gap characterization in association with the post portrait data; and determining at least one post target according to the capability gap characterization and determining a corresponding capability improvement target for each post target, the post target and the capability improvement target together serving as target constraint inputs to the subsequent reinforcement learning environment.
6. The personalized career planning method for college students based on reinforcement learning according to claim 1, wherein S4 specifically comprises: constructing the reinforcement learning environment and defining a state vector consisting of the capability vector and the capability gap characterization as the environment state; defining a micro-intervention task package for improving the capability vector as the environment action, wherein the micro-intervention task package at least comprises one or more of a micro-course task, a project training task, and a simulated interview training task, and setting executable parameters for the micro-intervention task package; defining an environment state transition rule under which execution of a micro-intervention task package triggers re-acquisition of updated individual data of the target college student, and an updated capability vector is regenerated from the updated individual data to obtain an updated state vector; and setting the reward function, calculating the reward function at least according to the magnitude of improvement in the capability vector and the magnitude of reduction in the capability gap characterization, and using the reward function as the feedback signal for updating the planning strategy.
7. The personalized career planning method for college students based on reinforcement learning according to claim 1, wherein S5 specifically comprises: initializing the planning strategy and establishing an interaction interface between the planning strategy and the reinforcement learning environment, so that the planning strategy can receive the state vector and output a micro-intervention task package; in each training round, inputting the current state vector into the planning strategy, the planning strategy outputting the micro-intervention task package for the current round; and acquiring updated individual data after the micro-intervention task package is executed, updating the multi-source input data set and the capability vector based on the updated individual data to obtain an updated state vector, and calculating the reward function according to the capability vector and the capability gap characterization before and after the update.
8. The personalized career planning method for college students based on reinforcement learning according to claim 7, wherein S5 further comprises iteratively updating the planning strategy using the reward function and ending training when a preset stop condition is met, the preset stop condition at least comprising the number of training rounds reaching a threshold or the reward function converging stably.
9. The personalized career planning method for college students based on reinforcement learning according to claim 2, wherein S6 specifically comprises: acquiring the current multi-source input data set of the target college student, generating a current capability vector and a current capability gap characterization from the multi-source input data set, and constructing the current state vector; inputting the current state vector into the trained planning strategy, the planning strategy outputting a micro-intervention task package sequence oriented to the post target; and arranging the micro-intervention task package sequence in time order to generate the personalized career planning result, wherein the personalized career planning result at least comprises the micro-intervention task package sequence arranged in time order and the capability improvement target corresponding to the micro-intervention task package sequence.
10. The personalized career planning method for college students based on reinforcement learning according to claim 9, wherein S6 further comprises periodically triggering the re-acquisition of updated individual data in the process of executing the personalized career planning result, and updating the multi-source input data set and the capability vector accordingly, thereby updating the capability gap characterization and the state vector to realize dynamic correction of the personalized career planning result.
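
The environment, training loop, and plan output described in claims 1 and 6 to 9 can be sketched as follows. This is a toy illustration under stated assumptions, not the claimed implementation: the claims do not name a learning algorithm, so tabular Q-learning is swapped in here; the re-acquisition of student data is replaced by a fixed, made-up capability gain per package; and the class names, integer capability levels, package effects, and the 0.5 cost weight are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskPackage:
    name: str
    gain: tuple  # assumed capability gain per dimension (integer levels)
    weeks: int   # assumed time cost


# Hypothetical action set built from the three task types named in claim 6.
PACKAGES = (
    TaskPackage("micro_course", (2, 0), 1),
    TaskPackage("project_training", (0, 3), 2),
    TaskPackage("mock_interview", (1, 1), 1),
)


class PlanningEnv:
    """State = (capability vector, capability gap, remaining weeks)."""

    def __init__(self, capability, demand, weeks):
        self.initial = (tuple(capability), weeks)
        self.demand = tuple(demand)
        self.reset()

    def reset(self):
        self.capability, self.weeks_left = self.initial
        return self.state()

    def gap(self):
        return tuple(max(d - c, 0) for c, d in zip(self.capability, self.demand))

    def state(self):
        return (self.capability, self.gap(), self.weeks_left)

    def step(self, action):
        pkg = PACKAGES[action]
        gap_before = sum(self.gap())
        if pkg.weeks <= self.weeks_left:
            # Stand-in for re-acquiring data and regenerating the capability vector.
            self.capability = tuple(c + g for c, g in zip(self.capability, pkg.gain))
            self.weeks_left -= pkg.weeks
        else:
            self.weeks_left = 0  # package does not fit: no gain, time runs out
        gap_after = sum(self.gap())
        reward = (gap_before - gap_after) - 0.5 * pkg.weeks  # assumed cost weight
        done = gap_after == 0 or self.weeks_left == 0
        return self.state(), reward, done


def greedy(values):
    return max(range(len(values)), key=values.__getitem__)


def train(env, episodes=500, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng, q = random.Random(seed), {}
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            values = q.setdefault(state, [0.0] * len(PACKAGES))
            explore = rng.random() < epsilon
            action = rng.randrange(len(PACKAGES)) if explore else greedy(values)
            next_state, r, done = env.step(action)
            future = 0.0 if done else max(q.setdefault(next_state, [0.0] * len(PACKAGES)))
            values[action] += alpha * (r + gamma * future - values[action])
            state = next_state
    return q


def plan(env, q):
    """Roll out the greedy policy to get a time-ordered task package sequence."""
    state, done, sequence = env.reset(), False, []
    while not done:
        action = greedy(q.get(state, [0.0] * len(PACKAGES)))
        state, _, done = env.step(action)
        sequence.append(PACKAGES[action].name)
    return sequence


env = PlanningEnv(capability=(4, 5), demand=(8, 7), weeks=6)
policy = train(env)
print(plan(env, policy))
```

Integer capability levels are used so that states are exact dictionary keys; a continuous capability vector would need discretization or a function approximator instead of a table.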

Description

University student personalized occupation planning method based on reinforcement learning

Technical Field

The invention relates to the technical field of intelligent decision making and data processing, in particular to a personalized career planning method for college students based on reinforcement learning.

Background

As university majors become more finely subdivided and the capability requirements of posts evolve rapidly, the problem college students face in career planning is no longer limited to "which industry or post to choose"; it has gradually become how to continuously reduce the dynamic gap between their own capability structure and that of the target post under limited time and resources. Traditional career planning relies on questionnaire evaluation, expert experience, or static rule-based recommendation. It generally reduces a student's capability to a small number of discrete labels or scores and ignores the implicit capability features contained in multimodal information such as resume text, project experience, and interview performance. It also lacks quantitative modeling of the structure of post capability requirements, so capability matching stays at the level of correlation judgment and can hardly characterize the dimension, magnitude, and evolution of a specific capability gap.

In addition, most existing career planning schemes adopt static recommendation or staged adjustment mechanisms. They do not treat career planning as a long-term sequential decision problem and lack systematic modeling of the interrelations among learning tasks, training tasks, and time budgets. When external conditions change, or when a student's capability improvement is inconsistent with expectations, such schemes generally cannot trigger re-planning in time, and the planned path easily becomes invalid.
Therefore, a personalized career planning method for college students based on reinforcement learning is proposed. The above information disclosed in the Background section is only intended to enhance understanding of the background of the disclosure, and it may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a personalized career planning method for college students based on reinforcement learning, which solves the technical problems described in the Background. To achieve the above purpose, the invention provides the following technical solution: a personalized career planning method for college students based on reinforcement learning, comprising the following steps: S1, acquiring individual data of a target college student and performing structuring, cleaning, time alignment, and unified coding on the individual data to form a multi-source input data set of the target college student; S2, performing multimodal feature extraction and fusion on the multi-source input data set, wherein text features are extracted from resume text, visual and voice behavior features are extracted from simulated interview video data, and the text features, visual features, and voice behavior features are fused to obtain a capability vector representing the comprehensive capability of the target college student; S3, acquiring post portrait data related to the career planning of the target college student, and extracting post features from the post portrait data to obtain a post capability demand vector; S4, constructing a reinforcement learning environment for career planning, defining the environment state as a state vector formed by the capability vector and a capability gap characterization, defining the environment action as a micro-intervention task package for improving the capability vector, and setting a reward function related to capability improvement for the reinforcement learning environment; S5, training and updating a planning strategy based on the reinforcement learning environment, wherein in each training round the planning strategy outputs a micro-intervention task package according to the state vector, updated individual data of the target college student are re-acquired and the capability vector is updated after the micro-intervention task package is executed, the reward function is calculated according to the change in the capability vector before and after the update, and the planning strategy is iteratively updated using the reward function; and S6, after the planning strategy converges or a preset training stop condition is met, outputting, by the planning strategy, a personalized career planning result based on the current state vector of the target college student, wherein the personalized career planning result comprises a micro-intervention task packag