CN-121998506-A - Digital portrait construction method based on multi-source heterogeneous data
Abstract
The invention discloses a digital portrait construction method based on multi-source heterogeneous data. The method comprises: obtaining structured and unstructured employee data and establishing a multi-source heterogeneous data set; constructing four types of quantization indexes after standardized processing; designing an adaptive weight distribution model based on an attention mechanism that fuses subjective and objective weights; obtaining a comprehensive capability score through feature fusion; and, in combination with real-time data updates, dynamically generating an employee digital portrait comprising core capabilities, strengths and weaknesses, and development potential, which is displayed in multi-dimensional charts.
Inventors
- WU YIXI
- DU BIYU
- LI MINGTING
- WANG MENGLAN
- YE LIANG
- CHEN XI
- WANG MIAO
- BAI XUE
- ZHANG XINGXIA
- GUO JIA
- LI JIEKE
- LIU FUPENG
- YUAN YONGQING
Assignees
- 四川大学华西医院
- 成都布鲁奥森信息科技有限责任公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-28
Claims (8)
- 1. A digital portrait construction method based on multi-source heterogeneous data, characterized in that the method comprises the following steps: S1, obtaining structured data and unstructured data of employees, wherein the structured data comprises annual assessment results, scientific research project information, job history and training records, and the unstructured data comprises paper achievements, academic appointment certificates and commendation documents; S2, carrying out standardized processing on the multi-source heterogeneous employee data set to obtain a standardized data set; S3, constructing quantization indexes from the standardized data set, wherein the quantization indexes comprise an assessment dimension index, a scientific research dimension index, a career history dimension index and an honor dimension index; S4, designing an adaptive weight distribution model based on an attention mechanism, and combining subjective weights and objective weights to obtain comprehensive weights of the various indexes; and S5, dynamically generating an employee digital portrait according to the comprehensive capability score, in combination with real-time updates of the index data, wherein the employee digital portrait comprises core capability dimension scores, a strengths-and-weaknesses analysis and a development potential evaluation, and displaying the employee digital portrait in multi-dimensional charts comprising a comprehensive capability radar chart, an index weight histogram, a dynamic change trend chart and an attendance status quantization chart.
- 2. The digital portrait construction method based on multi-source heterogeneous data according to claim 1, wherein the multi-source heterogeneous employee data set is established as follows: assigning a unique identifier to each employee, and binding the obtained structured data and unstructured data to the unique identifier; normalizing the structured data by organizing the annual assessment results, scientific research project information, job history and training records into standardized field formats and storing them, classified by unique identifier, in structured data tables; extracting core metadata from the unstructured data using natural language processing to form metadata lists; establishing association mappings between the structured data tables and the metadata lists with the unique identifier as the index, thereby defining the correspondence between the fields and the metadata of each individual employee; naming the original unstructured data files according to the employee's unique identifier, data type and time sequence, and forming an association index between the storage paths of the original files and the metadata; and integrating the structured data tables, the metadata lists, the association mappings and the original-file storage index to construct a multi-source heterogeneous employee data set comprising data content, association relations and storage paths.
- 3. The digital portrait construction method based on multi-source heterogeneous data according to claim 1, characterized in that the standardized processing comprises data cleaning, data alignment and data normalization of the multi-source heterogeneous employee data set; the data cleaning comprises missing value filling and outlier rejection, wherein missing values are filled with the median of the data of similar employees and outliers are rejected using the 3σ criterion; and the data alignment is based on the unique identifier of each employee, so that the time dimension and attribute dimension of the different data sources are aligned.
- 4. The digital portrait construction method based on multi-source heterogeneous data according to claim 1, characterized in that the quantization formula of the assessment dimension index is: $A = \sum_{k=1}^{m} a_k t_k$; wherein $A$ is the assessment dimension index; $a_k$ is the level quantization coefficient of the k-th assessment; $t_k$ is the time decay coefficient of the k-th assessment; and m is the number of assessments; the quantization formula of the scientific research dimension index is: $R = \sum_{p=1}^{q} l_p c_p f_p$; wherein $R$ is the scientific research dimension index; $l_p$ is the level coefficient of the p-th scientific research project; $c_p$ is the participation coefficient of the p-th scientific research project; $f_p$ is the completion quality coefficient of the p-th scientific research project; and q is the number of scientific research projects; the quantization formula of the career history dimension index is: $L = d \cdot y + e \cdot g$; wherein $L$ is the career history dimension index; d is the tenure weight coefficient; y is the cumulative tenure; e is the post importance coefficient; and g is the number of key-post appointments; the quantization formula of the honor dimension index is: $H = \sum_{r=1}^{n} h_r \tau_r$; wherein $H$ is the honor dimension index; $h_r$ is the rank coefficient of the r-th honor; $\tau_r$ is the time decay coefficient of the r-th honor; and n is the number of honors.
- 5. The digital portrait construction method based on multi-source heterogeneous data according to claim 1, wherein the adaptive weight distribution model is composed of an attention mechanism calculation unit, a subjective weight generation unit, an objective weight calculation unit and a weight fusion unit; the attention mechanism calculation unit is used for mining the association between the various quantization indexes and employee core capabilities based on the multi-head attention mechanism of a Transformer; the subjective weight generation unit is used for processing information about the evaluation requirements based on the analytic hierarchy process; the objective weight calculation unit is used for analyzing the index distribution characteristics of the standardized data set based on the entropy weight method; and the weight fusion unit is used for integrating the subjective weights, the objective weights and the attention association results.
- 6. The digital portrait construction method based on multi-source heterogeneous data according to claim 4, wherein the comprehensive weight is expressed as: $W_j = \alpha\, w_j^{s} + (1-\alpha)\, w_j^{o}$; wherein $W_j$ is the comprehensive weight of the j-th class of index; $\alpha$ is the subjective weight adjustment coefficient; $w_j^{s}$ is the subjective weight based on the analytic hierarchy process; and $w_j^{o}$ is the objective weight based on the entropy weight method; the subjective weight is obtained by constructing a judgment matrix through the analytic hierarchy process and calculating the eigenvector corresponding to the maximum eigenvalue; the objective weight is obtained by calculating the information entropy of each class of index and applying the entropy weight formula: $w_j^{o} = \dfrac{1 - E_j}{\sum_{i} (1 - E_i)}$; wherein $E_j$ is the information entropy of the j-th class of index.
- 7. The digital portrait construction method based on multi-source heterogeneous data according to claim 1, wherein the feature fusion formula is expressed as: $S = \sum_{j} W_j\, x_j$; wherein $S$ is the employee comprehensive capability score; $W_j$ is the comprehensive weight of the j-th class of index; and $x_j$ is the quantized value of the j-th class of index.
- 8. The digital portrait construction method based on multi-source heterogeneous data according to claim 1, wherein the dynamic update of the employee digital portrait is driven by data changes, and the update trigger conditions comprise the addition of structured data to the multi-source heterogeneous employee data set, the supplementation of unstructured data, and the adjustment of quantization index calculation parameters; in the updating process, consistency verification is first carried out on the changed data to confirm that the data is correctly bound to the employee's unique identifier and that the data attributes reasonably match the corresponding quantization index; based on the changed data that passes verification, the quantization index value of the corresponding dimension is recalculated, the comprehensive weights of the various indexes are synchronously updated through the adaptive weight distribution model, and the employee's comprehensive capability score, together with the core capability dimension scores, strengths-and-weaknesses analysis and development potential evaluation contained in the employee digital portrait, is refreshed according to the feature fusion formula; each update of the employee digital portrait follows the data change track and records the original information of the changed data, the quantization index values before and after the change, the comprehensive weights and the details of the change in the comprehensive capability score, forming a traceable evolution track of the employee digital portrait, so that the state of the employee digital portrait at different stages can be queried retrospectively along the time dimension.
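The index, weight-fusion and scoring steps of claims 4, 6 and 7 can be illustrated with a short Python sketch. It is not the patented implementation: it assumes the simplest reading of the claim definitions, namely that each dimension index is a sum of products of its coefficients, that the comprehensive weight is a convex combination of the analytic hierarchy process (AHP) weight and the entropy weight controlled by the subjective weight adjustment coefficient, and that feature fusion is a weighted sum. All function names, the sample numbers, the judgment matrix and the value of `alpha` are hypothetical, and the attention mechanism calculation unit of claim 5 is omitted.

```python
import math


def assessment_index(level_coeffs, decay_coeffs):
    # Assumed form: sum over assessments of level coefficient x time decay.
    return sum(a * t for a, t in zip(level_coeffs, decay_coeffs))


def research_index(projects):
    # Assumed form: sum over projects of level x participation x quality.
    return sum(level * part * quality for level, part, quality in projects)


def career_index(d, y, e, g):
    # Assumed form: tenure weight x cumulative tenure
    # plus post importance x number of key-post appointments.
    return d * y + e * g


def honor_index(rank_coeffs, decay_coeffs):
    # Assumed form: sum over honors of rank coefficient x time decay.
    return sum(h * t for h, t in zip(rank_coeffs, decay_coeffs))


def ahp_weights(judgment, iterations=100):
    # Principal eigenvector of a positive pairwise judgment matrix, found by
    # power iteration and normalized so that the weights sum to 1.
    n = len(judgment)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(judgment[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    return w


def entropy_weights(rows):
    # Entropy weight method. `rows` holds one list of non-negative index
    # values per employee; at least two employees are required and at least
    # one index must vary across them.
    n, m = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two employees")
    spreads = []
    for j in range(m):
        column = [row[j] for row in rows]
        total = sum(column)
        if total <= 0:
            raise ValueError("each index column needs a positive sum")
        entropy = -sum(
            (x / total) * math.log(x / total) for x in column if x > 0
        ) / math.log(n)
        spreads.append(1.0 - entropy)
    norm = sum(spreads)
    if norm <= 1e-12:
        raise ValueError("no index varies across employees")
    return [s / norm for s in spreads]


def fuse_weights(subjective, objective, alpha):
    # Assumed convex combination; alpha is the subjective weight
    # adjustment coefficient.
    return [alpha * s + (1.0 - alpha) * o for s, o in zip(subjective, objective)]


def comprehensive_score(weights, values):
    # Feature fusion read as a weighted sum of the quantized index values.
    return sum(w * x for w, x in zip(weights, values))


if __name__ == "__main__":
    # Hypothetical (assessment, research, career, honor) values, 3 employees.
    indexes = [[1.4, 2.0, 11.0, 0.9], [1.1, 3.5, 7.0, 1.6], [1.8, 0.5, 9.0, 0.3]]
    # Hypothetical reciprocal 4x4 AHP judgment matrix.
    judgment = [[1, 2, 3, 2], [1 / 2, 1, 2, 1], [1 / 3, 1 / 2, 1, 1 / 2], [1 / 2, 1, 2, 1]]
    weights = fuse_weights(ahp_weights(judgment), entropy_weights(indexes), alpha=0.5)
    for row in indexes:
        print(round(comprehensive_score(weights, row), 3))
```

The demo feeds raw index values straight into the score for brevity. Under claim 3 the data would first be normalized, so that indexes on different scales contribute comparably.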
Description
Digital portrait construction method based on multi-source heterogeneous data
Technical Field
The invention belongs to the field of intelligent data processing, and in particular relates to a digital portrait construction method based on multi-source heterogeneous data.
Background
Employee management has entered a stage of digital transformation, and the accuracy demanded of comprehensive capability assessment and development potential evaluation keeps rising. Employee data is multi-source and heterogeneous: it covers structured data such as annual assessments and scientific research projects as well as unstructured data such as paper achievements and commendation documents. Integrating this data effectively to build comprehensive, objective employee digital portraits has become a key requirement for making employee management more efficient and more scientific.
Existing digital portrait construction methods mostly process a single type of data or simply concatenate multi-source data. Their quantization index design is insufficient; weights are mostly assigned by fixed rules or by a single subjective or objective method; they do not dynamically mine how strongly the data relates to employees' core capabilities; the portrait update mechanism lags; and the display forms are limited. Such schemes therefore suffer from incomplete data integration, one-sided index quantification, weight distribution that does not adapt, and insufficiently dynamic portraits, so the resulting digital portrait struggles to reflect an employee's actual capabilities and development track and cannot provide accurate, real-time data support for employee management decisions.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a digital portrait construction method based on multi-source heterogeneous data. It addresses the following problems of the prior art: insufficient integration of multi-source heterogeneous data, one-sided quantization index design, weight distribution that lacks dynamic adaptability, lagging portrait updates and limited display forms. Because of these problems, employee digital portraits are insufficiently precise, cannot comprehensively reflect employees' actual capabilities and development tracks, and give little scientific support to employee management decisions.
To achieve the above purpose, an embodiment of the invention discloses a digital portrait construction method based on multi-source heterogeneous data, which comprises the following steps:
S1, obtaining structured data and unstructured data of employees, wherein the structured data comprises annual assessment results, scientific research project information, job history and training records, and the unstructured data comprises paper achievements, academic appointment certificates and commendation documents;
S2, carrying out standardized processing on the multi-source heterogeneous employee data set to obtain a standardized data set;
S3, constructing quantization indexes from the standardized data set, wherein the quantization indexes comprise an assessment dimension index, a scientific research dimension index, a career history dimension index and an honor dimension index;
S4, designing an adaptive weight distribution model based on an attention mechanism, and combining subjective weights and objective weights to obtain comprehensive weights of the various indexes;
S5, dynamically generating an employee digital portrait according to the comprehensive capability score, in combination with real-time updates of the index data, wherein the employee digital portrait comprises core capability dimension scores, a strengths-and-weaknesses analysis and a development potential evaluation, and displaying the employee digital portrait in multi-dimensional charts comprising a comprehensive capability radar chart, an index weight histogram, a dynamic change trend chart and an attendance status quantization chart.
Further, the multi-source heterogeneous employee data set is established as follows: assigning a unique identifier to each employee, and binding the obtained structured data and unstructured data to the unique identifier; normalizing the structured data by organizing the annual assessment results, scientific research project information, job history and training records into standardized field formats and storing them, classified by unique identifier, in structured data tables; extracting core metadata from the unstructured data using natural language processing to form metadata lists; establishing association mapping relations between the structured data tables and t