CN-121982606-A - Personality recognition method in asynchronous video and program product
Abstract
The invention discloses a personality recognition method in asynchronous video and a corresponding program product. The method adopts multiple rounds of simulated annealing search to form a plurality of face action unit optimization candidate subsets from an original set of face action units; calculates multiple indexes for each candidate subset; screens a partial subset of all candidate subsets according to these indexes to form a Pareto front set; and selects the optimal subset of face action units from the Pareto front set according to a minimum-distance principle. A plurality of key frames is then extracted from the asynchronous video, and for each key frame and several adjacent frames, the face action units belonging to the optimal subset are extracted to serve as a recognition window sample. All recognition window samples are input into a pre-trained large language model and semantically fused to generate a joint global semantic representation, which is then fused with the text answer and input into the model for personality recognition. The method improves the stability and accuracy of personality recognition.
Inventors
- ZHANG TIANYI
- SHAN WEI
- ZHENG WENMING
- LU CHENG
Assignees
- SOUTHEAST UNIVERSITY
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-21
Claims (10)
- 1. A personality recognition method in asynchronous video, characterized by comprising the following steps: (1) extracting a plurality of face action units from the asynchronous video to form an original set of face action units; (2) forming a plurality of face action unit optimization candidate subsets from the original set by adopting multiple rounds of simulated annealing search, wherein in each round, an initial set of face action units is formed based on the original set, and a single round of simulated annealing is executed with the initial set as the search basis and an energy value as the comparison criterion, finally obtaining one face action unit optimization candidate subset; (3) calculating a prediction effectiveness index, a running stability index and a scale index for each face action unit optimization candidate subset; (4) screening a partial subset of all face action unit optimization candidate subsets according to the three indexes to form a Pareto front set, and selecting the optimal subset of face action units from the Pareto front set according to a minimum-distance principle; (5) extracting a plurality of key frames from the asynchronous video, and for each key frame and a plurality of adjacent frames, extracting the face action units belonging to the optimal subset to serve as a recognition window sample; (6) inputting all recognition window samples into a pre-trained large language model and carrying out semantic fusion to generate a joint global semantic representation; (7) fusing the global semantic representation with the text answer, inputting the result into the pre-trained large language model, and performing personality recognition.
- 2. The personality recognition method in asynchronous video according to claim 1, wherein step (2) specifically comprises the steps of: (2.1) setting the annealing round number i=1; (2.2) randomly selecting one of a plurality of initialization strategies, and forming an initial set of face action units from the original set according to the selected strategy; (2.3) randomly changing the initial set to form a face action unit candidate subset, and performing a single round of simulated annealing to obtain a face action unit optimization candidate subset, wherein during the single round of simulated annealing, whether the candidate subset is accepted as the current solution is judged by taking an energy value calculated by an LSTM-based energy function as the comparison criterion; (2.4) setting i=i+1, and judging whether i is less than or equal to the maximum annealing round number; if yes, returning to step (2.2), otherwise executing step (2.5); (2.5) outputting all face action unit optimization candidate subsets.
- 3. The personality recognition method in asynchronous video according to claim 2, wherein the plurality of initialization strategies specifically comprises three strategies: a first initialization strategy, directly using the original set of face action units as the initial set; a second initialization strategy, randomly selecting one face action unit from the original set as the initial set; and a third initialization strategy, randomly selecting k face action units from the original set as the initial set, wherein k is generated by a random number generator.
- 4. The personality recognition method in asynchronous video according to claim 2, wherein step (2.3) specifically comprises the steps of: (2.3.1) taking the initial set of face action units as the initial value of the current solution; (2.3.2) randomly selecting a face action unit and flipping its selected state to form a face action unit candidate subset; (2.3.3) calculating the energy value of the candidate subset using the LSTM-based energy function; (2.3.4) accepting the candidate subset as the new current solution when its energy value is smaller than that of the current solution, and otherwise accepting it as the new current solution with probability exp(-ΔE/T), where ΔE represents the difference between the energy values of the candidate subset and the current solution, and T represents the temperature; (2.3.5) updating the temperature according to a preset temperature attenuation coefficient, and judging whether a termination condition is met; when it is met, ending the single round of simulated annealing and outputting the current solution as a face action unit optimization candidate subset, otherwise returning to (2.3.2).
- 5. The personality recognition method in asynchronous video according to claim 1, wherein step (3) specifically comprises: (3.1) for each face action unit optimization candidate subset, calculating its prediction error on personality recognition as its prediction effectiveness index; (3.2) for each face action unit optimization candidate subset, calculating a comprehensive similarity from its similarities to the other candidate subsets as its running stability index; (3.3) for each face action unit optimization candidate subset, counting the number of face action units it contains as its scale index.
- 6. The personality recognition method in asynchronous video according to claim 1, wherein step (4) specifically comprises: (4.1) constructing dominance relationships among all face action unit optimization candidate subsets according to the prediction effectiveness index, the running stability index and the scale index, and screening out all candidate subsets that are not dominated by any other candidate subset to form a Pareto front set; (4.2) calculating the mean value of the three indexes of each candidate subset in the Pareto front set, selecting the candidate subset with the minimum mean value as the ideal point, calculating the distance from each candidate subset to the ideal point, and selecting the candidate subset with the minimum distance as the optimal subset of face action units.
- 7. The personality recognition method in asynchronous video according to claim 1, wherein step (6) specifically comprises: (6.1) inputting each recognition window sample into a pre-trained large language model, and obtaining a natural-language semantic description of the sample based on a preset first structured instruction template; (6.2) inputting the natural-language semantic description of each recognition window sample into the pre-trained large language model, sequentially and iteratively fusing it with the descriptions of the other recognition window samples based on a preset second structured instruction template, and generating a global semantic representation from the iterative semantic fusion.
- 8. The personality recognition method in asynchronous video according to claim 1, wherein step (7) specifically comprises: (7.1) obtaining the global semantic representation and the text-format answers of the person to be identified in the asynchronous video to the evaluation questions, and jointly encoding them into a high-dimensional semantic vector; (7.2) inputting the high-dimensional semantic vector into the pre-trained large language model to perform personality recognition.
- 9. The personality recognition method in asynchronous video according to claim 1, further comprising, after step (7): (8) applying LoRA fine-tuning to perform low-rank incremental updates of the attention-layer weights of the large language model.
- 10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any of claims 1-9.
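The single-round simulated annealing of claims 2 and 4 can be sketched as follows. The patent's LSTM-based energy function is replaced here by a simple weighted-sum stub (an assumption for illustration only); the flip move, the Metropolis acceptance probability exp(-ΔE/T), and the geometric temperature decay follow the claimed steps:

```python
import math
import random

def energy(subset, weights):
    # Stand-in for the patent's LSTM-based energy function: the score of a
    # candidate subset is the sum of the weights of its selected face action
    # units (lower is better). Any real scorer could be dropped in here.
    return sum(w for sel, w in zip(subset, weights) if sel)

def anneal_once(weights, t0=1.0, decay=0.95, t_min=1e-3, seed=0):
    """One round of simulated annealing over a binary AU-selection mask
    (steps (2.3.1)-(2.3.5) of claim 4)."""
    rng = random.Random(seed)
    n = len(weights)
    # (2.3.1) initial current solution: a random selection mask
    current = [rng.random() < 0.5 for _ in range(n)]
    cur_e = energy(current, weights)
    t = t0
    while t > t_min:                      # (2.3.5) termination condition
        cand = current[:]
        i = rng.randrange(n)
        cand[i] = not cand[i]             # (2.3.2) flip one AU's selected state
        e = energy(cand, weights)         # (2.3.3) energy of the candidate
        d_e = e - cur_e
        # (2.3.4) Metropolis criterion: always accept improvements, accept
        # worse candidates with probability exp(-dE/T)
        if d_e < 0 or rng.random() < math.exp(-d_e / t):
            current, cur_e = cand, e
        t *= decay                        # (2.3.5) geometric temperature decay
    return current, cur_e
```

Running several rounds with different seeds and initialization strategies, as in claim 2, would yield the pool of optimization candidate subsets to be screened in step (4).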
Description
Personality recognition method in asynchronous video and program product
Technical Field
The present invention relates to large language models, and more particularly to a personality recognition method and program product in asynchronous video.
Background
With the wide application of Large Language Models (LLMs) in natural language understanding and personality recognition, text-based large-model personality recognition methods have become mainstream. However, in an Asynchronous Video Interview (AVI) scenario, a single text modality often fails to adequately reflect the non-linguistic behavioral characteristics of the candidate, limiting the accuracy of personality assessment. Existing multi-modal methods generally rely on full-face image features or sparsely sampled information, but such processing breaks the continuity of facial motion in the time dimension, making it difficult to capture the subtle dynamic changes related to personality. In addition, facial expressions in interview scenes are often restrained, so visual features based on overall facial characterization suffer from noise and insufficient discriminative power. Therefore, how to effectively preserve the local temporal variation of facial motion, reduce the interference caused by irrelevant visual features, and improve recognition accuracy is a key problem to be solved in the prior art.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a personality recognition method and program product in asynchronous video with higher recognition accuracy.
In order to achieve the above object, the present invention provides the following technical solutions. A personality recognition method in asynchronous video comprises the following steps: (1) extracting a plurality of face action units from the asynchronous video to form an original set of face action units; (2) forming a plurality of face action unit optimization candidate subsets from the original set by adopting multiple rounds of simulated annealing search, wherein in each round, an initial set of face action units is formed based on the original set, and a single round of simulated annealing is executed with the initial set as the search basis and an energy value as the comparison criterion, finally obtaining one face action unit optimization candidate subset; (3) calculating a prediction effectiveness index, a running stability index and a scale index for each face action unit optimization candidate subset; (4) screening a partial subset of all face action unit optimization candidate subsets according to the three indexes to form a Pareto front set, and selecting the optimal subset of face action units from the Pareto front set according to a minimum-distance principle; (5) extracting a plurality of key frames from the asynchronous video, and for each key frame and a plurality of adjacent frames, extracting the face action units belonging to the optimal subset to serve as a recognition window sample; (6) inputting all recognition window samples into a pre-trained large language model and carrying out semantic fusion to generate a joint global semantic representation; (7) fusing the global semantic representation with the text answer, inputting the result into the pre-trained large language model, and performing personality recognition.
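Steps (3)–(4) above can be sketched as follows, treating each candidate subset as a triple of indexes (prediction error, instability, size), all oriented so that smaller is better — a normalization assumption, since the claims do not fix the orientation of the stability index. Note also that the claims designate the minimum-mean front member as the ideal point; this sketch instead uses the component-wise minimum over the front, a standard ideal-point construction, because the literal minimum-mean member would trivially be its own nearest neighbour:

```python
import math

def dominates(a, b):
    # a dominates b when a is no worse on every index and strictly better
    # on at least one (all indexes minimized)
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    # (4.1) keep the candidate subsets not dominated by any other subset
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

def select_optimal(points):
    # (4.2) ideal point and minimum-distance selection; the ideal point here
    # is the component-wise minimum over the front (an assumption, see the
    # lead-in), not the minimum-mean member named in the claims
    front = pareto_front(points)
    ideal = tuple(min(p[i] for p in front) for i in range(len(front[0])))
    return min(front, key=lambda p: math.dist(p, ideal))
```

For example, with index triples `[(0.2, 0.5, 3), (0.1, 0.6, 4), (0.3, 0.4, 2), (0.5, 0.9, 5)]`, the last triple is dominated and is excluded from the front.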
Further, step (2) specifically includes the following steps: (2.1) setting the annealing round number i=1; (2.2) randomly selecting one of a plurality of initialization strategies, and forming an initial set of face action units from the original set according to the selected strategy; (2.3) randomly changing the initial set to form a face action unit candidate subset, and performing a single round of simulated annealing to obtain a face action unit optimization candidate subset, wherein during the single round of simulated annealing, whether the candidate subset is accepted as the current solution is judged by taking an energy value calculated by an LSTM-based energy function as the comparison criterion; (2.4) setting i=i+1, and judging whether i is less than or equal to the maximum annealing round number; if yes, returning to step (2.2), otherwise executing step (2.5); (2.5) outputting all face action unit optimization candidate subsets. Further, the plurality of initialization strategies specifically comprises three strategies: a first initialization strategy, directly using the original set of face action units as the initial set; a second initialization strategy, randomly selecting one face action unit from the original set as the initial set; and a third initialization strategy, namely