DE-102024132748-A1 - System and method for computer-aided pattern recognition and sequential-adaptive question generation based on user answers to data-based classification questions for a compatibility comparison between user data and reference datasets


Abstract

System (10) and method for computer-aided pattern recognition and sequential-adaptive question generation based on user responses (22) to data-based classification questions (23) for a compatibility check between user data and reference datasets, the system comprising: a data input interface (1) that transmits the user responses (22) to all data-based classification questions (23) and a classification result (24a, 24b, 24c) that results from all user responses (22) and belongs to these user responses (22); a processor unit (2) that transforms the user responses (22) into data points (D), wherein the data points (D) take on values in a predefined range; receives the data points (D) and the classification result (24a, 24b, 24c) belonging to each of these data points (D); and, in a training phase of the system (10), executes a decision tree learning-based machine learning model (3) that automatically induces decision trees (20) from the data points (D), (...)

Inventors

Larissa Leitner
Annika von Mutius
Further inventor(s) not named upon request.

Assignees

Empion GmbH

Dates

Publication Date: 20260513
Application Date: 20241109

Claims (10)

System (10) for computer-aided pattern recognition and sequential-adaptive question generation based on user responses (22) to data-based classification questions (23) for compatibility comparison between user data and reference datasets, the system comprising: a data input interface (1) that transmits the user responses (22) to all data-based classification questions (23) and a classification result (24a, 24b, 24c) that results from all user responses (22) and belongs to these user responses (22); a processor unit (2) that: • transforms the user responses (22) into data points (D), wherein the data points (D) take on values in a predefined range; • receives the data points (D) and the classification result (24a, 24b, 24c) belonging to each of these data points (D); • in a training phase of the system (10), executes a decision tree learning-based machine learning model (3) that automatically induces decision trees (20) from the data points (D), wherein one of the data-based classification questions (23) is used for each classification of the data points (D) into nodes (21) of the decision trees (20), and the individual classifications are based on a distance of the respective data point (D) to a threshold value (S) that is specific to the respective node (21) and calculated by the processor unit (2); • wherein the machine learning model (3) is trained on the data points (D) and the associated classification result (24a, 24b, 24c), and the machine learning model (3) learns to determine the node-specific threshold values (S) by optimizing a predefined metric and, at each induction step, based on a data point (D) for one of the data-based classification questions (23), derives the next data-based classification question (23) from the data-based classification questions (23) such that this next data-based classification question (23) increases the separation quality of the data points (D) with respect to the classification, and further data-based classification questions (23) that no longer increase the separation quality are not used for the compatibility check, in order to obtain a reduced number of data-based classification questions (23); a data output interface (4) that outputs the reduced number of data-based classification questions (23) belonging to the respective classification result (24a, 24b, 24c).
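The induction step of the system claim above can be illustrated with a minimal sketch. It assumes that the answers are already encoded as values in [0, 1], uses Gini impurity as the "predefined metric", and treats the reduced question set as the questions that appear in at least one split. A data point goes to the left branch when its value lies at or below the node-specific threshold S, i.e. when its signed distance to S is not positive. The function names `gini`, `best_split` and `induce` are hypothetical and not taken from the patent.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(points, labels, questions):
    """Return (gain, question, threshold) of the split with the largest impurity decrease."""
    n = len(labels)
    parent = gini(labels)
    best = (0.0, None, None)
    for q in questions:
        values = sorted({p[q] for p in points})
        # Candidate thresholds S: midpoints between neighbouring observed answer values.
        for lo, hi in zip(values, values[1:]):
            s = (lo + hi) / 2
            left = [y for p, y in zip(points, labels) if p[q] <= s]
            right = [y for p, y in zip(points, labels) if p[q] > s]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if parent - child > best[0]:
                best = (parent - child, q, s)
    return best


def induce(points, labels, questions, min_gain=1e-9):
    """Greedily induce a tree; return (tree, set of questions used in any split)."""
    gain, q, s = best_split(points, labels, questions)
    if q is None or gain < min_gain:
        # Leaf: no remaining question increases the separation quality.
        return Counter(labels).most_common(1)[0][0], set()
    branches, used = {}, {q}
    for side, keep in (("left", lambda v: v <= s), ("right", lambda v: v > s)):
        idx = [i for i, p in enumerate(points) if keep(p[q])]
        subtree, sub_used = induce(
            [points[i] for i in idx], [labels[i] for i in idx], questions, min_gain
        )
        branches[side] = subtree
        used |= sub_used
    return {"question": q, "threshold": s, **branches}, used


# Four users, three questions; question 0 alone separates the two classes.
points = [(0.1, 0.5, 0.9), (0.2, 0.4, 0.8), (0.8, 0.5, 0.2), (0.9, 0.6, 0.1)]
labels = ["24a", "24a", "24b", "24b"]
tree, used = induce(points, labels, questions=[0, 1, 2])
print(sorted(used))  # [0]: questions 1 and 2 are not needed
```

Because the split search is greedy, a question that ties with one already chosen (question 2 here separates the data equally well) adds no separation quality and is dropped, which is the effect the claim describes as obtaining a reduced number of questions.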
System (10) according to one of the preceding claims, wherein the threshold values (S) specific to the respective nodes (21) are determined by optimizing the Gini coefficient, the entropy, or the mean squared deviation.
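The three optimization criteria named in this claim can be compared on a single question with a short sketch. The claim speaks of the Gini coefficient; in decision-tree learning the corresponding split criterion is usually called Gini impurity. The sketch assumes that each criterion is minimized over midpoints between observed answer values; the mean squared deviation applies to numeric targets (regression trees), whereas Gini impurity and entropy apply to class labels. All function names are illustrative.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mse(values):
    """Mean squared deviation from the mean (for numeric targets)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def weighted_child_impurity(xs, ys, threshold, criterion):
    """Size-weighted impurity of the two child nodes after splitting answers xs at S."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return sum(len(part) / len(ys) * criterion(part) for part in (left, right) if part)


def best_threshold(xs, ys, criterion):
    """Node-specific threshold S that minimizes the chosen criterion."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda s: weighted_child_impurity(xs, ys, s, criterion))


xs = [0.1, 0.3, 0.6, 0.9]  # answers of four users to one question
classes = ["24a", "24a", "24b", "24b"]
print(round(best_threshold(xs, classes, gini), 2))  # 0.45
print(round(best_threshold(xs, classes, entropy), 2))  # 0.45
print(round(best_threshold(xs, [0.0, 0.0, 1.0, 1.0], mse), 2))  # 0.45
```

On cleanly separable answers all three criteria agree; they can choose different thresholds when the classes overlap, which is why the claim leaves the choice of metric open.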
System (10) according to one of the preceding claims, comprising a user interaction interface (5) that displays to the user the position of the respective data point (D) relative to the threshold value (S) specific to the respective node (21), wherein, if the data point (D) lies relatively close to the threshold value (S) so that the classification of the data point (D) is uncertain, the processor unit (2) calculates an uncertainty range (25) around the threshold value (S), the system (10) interacts with the user via the user interaction interface (5) and queries the user's tendency with regard to the data point (D) and the uncertainty range (25), and, depending on this tendency, the processor unit (2) uses the smallest value (25a) or the largest value (25b) of the uncertainty range (25) as the value for a more certain classification.
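The handling of uncertain data points in the preceding claim can be sketched as follows, assuming a symmetric uncertainty range of fixed half-width around S. The `ask_tendency` callback is a hypothetical stand-in for the query through the user interaction interface (5); the claim does not specify how the width of the range is chosen.

```python
def classify_with_uncertainty(d, threshold, margin, ask_tendency):
    """Route the value d of a data point at a node with threshold S.

    margin: assumed half-width of the uncertainty range (25) around S (must be > 0).
    ask_tendency: hypothetical callback for the user query; returns "lower" or "upper".
    Returns (branch, value_used).
    """
    if margin <= 0:
        raise ValueError("margin must be positive")
    low, high = threshold - margin, threshold + margin  # values 25a and 25b
    value = d
    if low <= d <= high:
        # Uncertain classification: replace d by the boundary the user leans towards.
        value = low if ask_tendency(d, low, high) == "lower" else high
    branch = "left" if value <= threshold else "right"
    return branch, value


# A user answers 0.52 at a node with S = 0.5 but, when asked, leans towards the lower side.
print(classify_with_uncertainty(0.52, 0.5, 0.1, lambda d, lo, hi: "lower"))
```

Values outside the range are classified directly, so the user is only queried where the model's decision would otherwise hinge on a small difference.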
System (10) according to one of the preceding claims, wherein the decision tree learning-based machine learning model (3) is a random forest-based machine learning model.
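The random forest variant can be illustrated with a deliberately simplified sketch: each tree is only a depth-1 stump, trained on a bootstrap sample with a random subset of questions, and the forest predicts by majority vote. Bagging and random feature subsets are the two ingredients of a random forest; a real implementation grows deeper trees and typically samples questions at every split. The parameters `n_trees`, `max_features` and `seed` are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def fit_stump(points, labels, questions):
    """Depth-1 tree: best (question, threshold) by weighted Gini, plus leaf labels."""
    best = None
    for q in questions:
        values = sorted({p[q] for p in points})
        for a, b in zip(values, values[1:]):
            s = (a + b) / 2
            left = [y for p, y in zip(points, labels) if p[q] <= s]
            right = [y for p, y in zip(points, labels) if p[q] > s]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, q, s, majority(left), majority(right))
    if best is None:  # every sampled question was constant: predict the majority class
        return None, None, majority(labels), majority(labels)
    return best[1:]


def fit_forest(points, labels, n_trees=25, max_features=2, seed=0):
    """Bootstrap samples plus random question subsets, one stump per sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(points)) for _ in points]  # bootstrap sample
        qs = rng.sample(range(len(points[0])), max_features)  # random questions
        forest.append(fit_stump([points[i] for i in idx], [labels[i] for i in idx], qs))
    return forest


def predict(forest, point):
    """Majority vote over all stumps."""
    votes = [left if q is None or point[q] <= s else right for q, s, left, right in forest]
    return majority(votes)


a = [(0.02 * i, 0.03 * i, 0.01 * i) for i in range(10)]
b = [(0.6 + 0.04 * i, 0.65 + 0.03 * i, 0.7 + 0.02 * i) for i in range(10)]
forest = fit_forest(a + b, ["24a"] * 10 + ["24b"] * 10)
print(predict(forest, (0.0, 0.0, 0.0)), predict(forest, (1.0, 1.0, 1.0)))
```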
System (10) according to one of the preceding claims, wherein the data input interface (1) transmits the user responses (22) to all data-based classification questions (23) in the form of a dense vector (26) and the user responses (22) to the reduced number of data-based classification questions (23) in the form of a sparse vector (27) with the same dimensionality as the dense vector (26); the processor unit (2) receives the dense vectors (26) and the associated sparse vectors (27) as training data pairs and executes an encoder-decoder machine learning model, wherein the encoder-decoder machine learning model is trained on the training data pairs in a training phase, the encoder (28) learns to map the sparse vectors (27) to codes in a feature space (L), wherein the codes in the feature space (L) are determined by the respective sparse vectors (27) with respect to characteristics of the sparse vectors (27), and the encoder (28) is executed to recognize the characteristics of the sparse vectors (27) and to map the sparse vectors (27) accordingly, and the decoder (29) learns to map the codes to generated dense vectors (26g) by minimizing a reconstruction loss of the generated dense vectors (26g) relative to the dense vectors (26); and the data output interface (4), in an operational phase of the encoder-decoder machine learning model in which the system (10) receives the user responses (22) to the reduced number of data-based classification questions (23), outputs the generated dense vectors (26g) for the compatibility check.
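The preceding claim leaves both the encoder-decoder architecture and the exact reconstruction loss open. The sketch below therefore covers only the data side: building (sparse, dense) training pairs and measuring a reconstruction loss. It assumes mean squared error as the loss and a fill value of 0.0 for unasked questions; since 0.0 is also a valid answer on the encoded scale, a mask of asked positions is returned as well. `mean_fill_baseline` is a hypothetical reference decoder, not the claimed model.

```python
def to_sparse(dense, asked, fill=0.0):
    """Sparse vector (27) with the dimensionality of the dense vector (26).

    Only answers to the reduced question set are kept; a mask marks those positions.
    """
    values = [v if i in asked else fill for i, v in enumerate(dense)]
    mask = [1.0 if i in asked else 0.0 for i in range(len(dense))]
    return values, mask


def training_pairs(dense_vectors, asked):
    """((sparse input, mask), dense target) pairs for an encoder-decoder model."""
    return [(to_sparse(d, asked), d) for d in dense_vectors]


def reconstruction_loss(generated, dense):
    """Mean squared error between a generated dense vector (26g) and the true one (26)."""
    if len(generated) != len(dense):
        raise ValueError("generated and dense vectors must have the same dimensionality")
    return sum((g - t) ** 2 for g, t in zip(generated, dense)) / len(dense)


def mean_fill_baseline(sparse, mask, column_means):
    """Trivial reference decoder: fill unasked positions with training means."""
    return [v if m else mu for v, m, mu in zip(sparse, mask, column_means)]


dense = [[0.0, 1.0, 0.5], [1.0, 0.0, 0.5]]  # full questionnaires of two users
(sparse, mask), target = training_pairs(dense, asked={0})[0]
means = [sum(col) / len(col) for col in zip(*dense)]
generated = mean_fill_baseline(sparse, mask, means)
print(reconstruction_loss(generated, target))
```

A trained encoder-decoder model is useful only if its loss on held-out pairs falls clearly below such a naive baseline.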
System (10) according to any of the preceding claims, wherein the data-based classification questions (23) comprise questions of a culture test and the classification results (24a, 24b, 24c) are culture classes, wherein the culture test measurably represents aspects of corporate cultures of companies and the culture classes of job seekers, the compatibility check is a matching between the culture classes of the job seekers and the corporate cultures of the companies, and the user data refers to data from companies or job seekers and the reference data sets refer accordingly to data from job seekers or companies.
System (10) according to Claim 1 or 2, comprising a user interaction interface (5), or system (10) according to one of Claims 3 to 6, wherein the system (10) receives the user responses (22) to the reduced number of data-based classification questions (23) via the user interaction interface (5), the processor unit (2) performs the compatibility check between the user data and the reference datasets based on these user responses (22) by determining matches between the user data and the reference datasets, and the data output interface (4) and/or the user interaction interface (5) output the result of the compatibility check.
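The claims state only that the compatibility check determines matches between the user data and the reference datasets; they do not define a matching score. The following sketch assumes one simple choice, 1 minus the mean absolute difference over the reduced question set, with answers encoded in [0, 1]. The score, the function names and the company identifiers are all hypothetical.

```python
def compatibility(user, reference, questions):
    """Score in [0, 1]; 1.0 means identical answers on the given questions."""
    diffs = [abs(user[q] - reference[q]) for q in questions]
    return 1.0 - sum(diffs) / len(diffs)


def rank_matches(user, references, questions):
    """Reference datasets sorted from most to least compatible."""
    scored = [(name, compatibility(user, ref, questions)) for name, ref in references.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


job_seeker = [0.2, 0.5, 0.8]
companies = {"company_x": [0.1, 0.9, 0.7], "company_y": [0.9, 0.5, 0.1]}
# Only the reduced questions 0 and 2 were asked, so only they enter the score.
for name, score in rank_matches(job_seeker, companies, questions=[0, 2]):
    print(name, round(score, 2))
```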
System (10) according to one of the preceding claims, wherein the system (10) is a distributed computing environment, wherein the processor unit (2) comprises storage and/or processing resources and the users communicate with the system (10) via terminal devices.
A method for computer-aided pattern recognition and sequential-adaptive question generation based on user responses (22) to data-based classification questions (23) for compatibility matching between user data and reference datasets, the method comprising the steps of: - obtaining, via a data input interface (1), the user responses (22) to all data-based classification questions (23) and a classification result (24a, 24b, 24c) corresponding to these user responses (22) (V1); - transforming the user responses (22) into data points (D), wherein the data points (D) take on values within a predefined range (V2); - configuring a decision tree learning-based machine learning model (3) such that the machine learning model (3) automatically induces decision trees (20) from the data points (D), uses one of the data-based classification questions (23) for each classification of the data points (D) into nodes (21) of the decision trees (20), and performs the individual classifications based on a calculated distance of the respective data point (D) to a threshold value (S) specific to the respective node (21) (V3); - training, using a processor unit (2), the machine learning model (3) on the data points (D) and the respective associated classification result (24a, 24b, 24c), wherein the machine learning model (3) learns to determine the node-specific threshold values (S) by optimizing a predefined metric and, at each induction step, based on a data point (D) for one of the data-based classification questions (23), generates the next data-based classification question (23) from the data-based classification questions (23) such that this next data-based classification question (23) increases the separation quality of the data points (D) with respect to the classification, and further data-based classification questions (23) that no longer increase the separation quality are not used for the compatibility check, in order to obtain a reduced number of data-based classification questions (23) (V4); - outputting, via a data output interface (4), the reduced number of data-based classification questions (23) belonging to the respective classification result (24a, 24b, 24c) for carrying out the compatibility check (V5).
Method according to Claim 9, wherein the data-based classification questions (23) comprise questions of a culture test and the classification results (24a, 24b, 24c) are culture classes, wherein the culture test measurably represents aspects of a company's corporate culture and a culture type of job seekers, the compatibility matching is a matching between the cultural profiles of the companies and of the job seekers, and the user data refer to data from companies or job seekers and the reference datasets refer accordingly to data from job seekers or companies.

Description

The invention relates to a system and a method for computer-aided pattern recognition and sequential-adaptive question generation based on user answers to data-based classification questions for a compatibility comparison between user data and reference datasets. The following definitions apply to the entire disclosure content.

AU 2021 105 094 A4 discloses a method for executing a predictive model to predict a student's career-related goals after graduation, wherein a decision tree and a random tree process are selected from a prediction and classification model operator for different target variables. A decision tree is a decision-support structure that uses a tree-like model of decisions and their possible consequences. Decision tree learning is a supervised learning approach used in machine learning. In this formalism, a classification decision tree is used as a predictive model to draw inferences about a set of observations. Well-known algorithms include Chi-square Automatic Interaction Detection (CHAID), Classification and Regression Trees (CART), and Iterative Dichotomiser 3 (ID3). US 6 616 458 B1 discloses a method and a device for managing a survey, wherein an online service provider computer receives a survey with survey questions from a customer who wants to conduct a survey.

A problem with such surveys, especially those that measure and quantify aspects of corporate culture and of job seekers and aim to determine a culture match between companies and job seekers, is how to design the data features efficiently. The invention is based on the objective of optimizing the classification of user data in this field, in particular of achieving data-efficient classification, and of improving the efficiency of compatibility matching. Data-efficient classification is high-performance classification based on a relatively small number of data points. This objective is achieved by the subject matter of each of the independent claims.
Further developments and advantageous embodiments are described in the dependent claims, the drawings, and the description of preferred embodiments. In one aspect, the invention provides a system for computer-aided pattern recognition and sequential-adaptive question generation based on user responses to data-based classification questions for compatibility comparison between user data and reference datasets. The system can be a computer system. Computer-aided pattern recognition means that the system automatically recognizes and classifies patterns, structures, or regularities in the user responses.

In sequential adaptation, the adjustment is carried out in a specific order, with each adjustment step based on previous steps or results. The system, and in particular its behavior, not only adapts but does so in a logical, successive, and in particular deterministic sequence. Each step directly influences the next, resulting in a chain of adjustments that build on one another. For example, the system poses a question, and based on the answer, the next question is selected so as to further improve the differentiation in the sense of classification. The questions are thus posed successively in a predetermined logical sequence with continuous adaptation. At the same time, the user or operator can configure how many questions the system should ask, which allows, for example, a trade-off between classification performance and data efficiency.

Data-based classification questions are questions that aim to assign users to specific, predefined classes based on their answers. These questions can be part of a machine learning process in which the answers are used to perform classification or pattern recognition. The questions are designed so that the answers serve as data points for a machine learning model. A data-based classification question could be, for example: "How would you describe your preferred way of working: structured or flexible?"
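The sequential adaptation described in this section, where each answer selects the next question and the operator may cap the number of questions, can be sketched as a walk down an already-induced decision tree. The sketch assumes answers encoded as values in [0, 1] and a nested-dict tree format; the `"majority"` key used as a fallback when the question budget runs out, and the question identifiers such as `"working_style"` and `"team_size"`, are hypothetical.

```python
def ask_sequentially(tree, answer_fn, max_questions=None):
    """Walk an induced decision tree, asking only the questions on the path.

    tree: nested dicts with keys "question", "threshold", "left", "right" and a
        hypothetical "majority" label; leaves are class labels.
    answer_fn: returns the user's encoded answer in [0, 1] to a question.
    max_questions: optional operator-defined question budget.
    Returns (classification result, list of (question, answer) pairs asked).
    """
    asked = []
    node = tree
    while isinstance(node, dict):
        if max_questions is not None and len(asked) >= max_questions:
            # Budget exhausted: stop early with the majority class of this node.
            return node["majority"], asked
        answer = answer_fn(node["question"])
        asked.append((node["question"], answer))
        # The answer decides the branch and therefore which question comes next.
        node = node["left"] if answer <= node["threshold"] else node["right"]
    return node, asked


tree = {
    "question": "working_style", "threshold": 0.5, "majority": "24b",
    "left": "24a",
    "right": {"question": "team_size", "threshold": 0.3, "majority": "24b",
              "left": "24b", "right": "24c"},
}
answers = {"working_style": 0.8, "team_size": 0.6}
print(ask_sequentially(tree, answers.get))
print(ask_sequentially(tree, answers.get, max_questions=1))
```

A smaller budget trades classification performance for fewer questions, which mirrors the configurable trade-off mentioned above.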
The answer "structured" could be assigned the value 0 and the answer "flexible" the value 1. A value of 0.5 then signifies neutrality with respect to "structured" and "flexible." Values from 0 up to (but excluding) 0.5 indicate a tendency towards "structured," and values above 0.5 up to 1 indicate a tendency towards "flexible"; the further a value lies from 0.5, the more pronounced the tendency. The system includes a data input interface. The data input interface is a technical device. The data input interface transmits the user responses to all data-based classification questions and a classification result that is derived from all user responses and belongs to those user responses. For example, a complete questionnaire comprises 12 classification questions. The classification result could be, for instance, a culture class, such as a job seeker's culture type, with specific classes, for example three each. The classification result could also be a company culture, where the data-based classification questions are directed at companies, and the users i