DE-102024132749-A1 - Device, system and method for classifying user data for a compatibility check between user data and reference datasets using sequential-adaptive question generation


Abstract

Device (10) for classifying user data for compatibility matching between user data and reference datasets by means of sequential-adaptive question generation, the device (10) comprising: a user interaction interface (1) through which the device (10) transmits classification questions to a user (U) in an iterative process, receives user responses to each of the classification questions, and transmits the next classification question to the user (U) depending on each user response; a hardware unit (2) that executes a pre-trained foundation model (LLM); a database programming interface (3) through which the hardware unit (2) retrieves information from a database (VDB), wherein sample user responses for the classification questions are stored in the database (VDB), and each sample user response is assigned a value in a predefined range; (...)

Inventors

Larissa Leitner
Annika von Mutius
Further inventor(s) not named at their own request.

Assignees

Empion GmbH

Dates

Publication Date: 20260513
Application Date: 20241109

Claims (10)

Device (10) for classifying user data for compatibility matching between user data and reference datasets using sequential-adaptive question generation, the device (10) comprising:
• a user interaction interface (1) through which the device (10) transmits classification questions to a user (U) in an iterative process, receives user responses to each of the classification questions, and transmits the next classification question to the user (U) depending on each user response;
• a hardware unit (2) that executes a pre-trained foundation model (LLM);
• a database programming interface (3) through which the hardware unit (2) retrieves information from a database (VDB), wherein sample user responses for the classification questions are stored in the database (VDB), and each sample user response is assigned a value within a predefined range;
• a classifier programming interface (4) to a classifier (RF) through which the user responses are each assigned to a class attribute, wherein the classifier (RF) generates the next classification question at each iteration step based on the respective user response in such a way that this next classification question increases the discrimination quality of the user responses with regard to classification, and the iteration process ends when a class assignment from all class attributes is achieved;
wherein the foundation model (LLM)
• receives each of the user responses,
• performs a semantic similarity calculation of the respective user response to the sample user responses stored in the database (VDB) via the database programming interface (3) and transforms the respective user response into a current data point with a value in the predefined range based on the similarity calculation,
• receives the next classification question based on the current data point via the classifier programming interface (4);
• transmits the received next classification question to the user (U) via the user interaction interface (1).
Device (10) according to Claim 1, wherein the database programming interface (3) is a RAG API, that is, a Retrieval-Augmented Generation application programming interface, the database (VDB) is a vector database, and the semantic similarity calculation is performed based on vector-based representations of content.
Device (10) according to one of the preceding claims, wherein the classifier programming interface (4) is a programming interface to a trained machine learning model based on decision tree learning, wherein the assignment of the class attributes is based on a distance, calculated by the hardware unit (2), of the respective data point to a threshold specific to the respective decision tree node.
Device (10) according to one of the preceding claims, wherein the device (10) determines the compatibility check and provides this as a result to the user (U) via the user interaction interface (1).
Device (10) according to one of the preceding claims, wherein the classification questions comprise questions of a culture test and the class assignment is a culture class, wherein the culture test represents aspects of corporate cultures of companies and the culture class is measurable, the compatibility check is a matching between the culture classes of the job seeker and the corporate cultures of the companies, and the user data refers to data from companies or job seekers and the reference data sets refer accordingly to data from job seekers or companies.
System (20) for classifying user data for compatibility matching between user data and reference datasets using sequential-adaptive question generation, the system (20) comprising:
• a user interaction interface (1) through which classification questions are transmitted to a user (U) in an iterative process, user responses to each of the classification questions are received, and depending on each user response, the next classification question is transmitted to the user (U);
• a hardware unit (2) that executes a pre-trained foundation model (LLM);
• a database (VDB) in which sample user responses are stored for the classification questions and each sample user response is assigned a value in a predefined range;
• a classifier module (21) that executes a classifier (RF), wherein at each iteration step the classifier (RF) generates the next classification question based on the respective user response such that this next classification question increases the separation quality of the user responses with regard to the classification, and the iteration process ends when a class assignment is achieved from all class attributes;
wherein the foundation model (LLM)
• via the user interaction interface (1) receives each of the user responses,
• via a first programming interface (22) to the database (VDB) performs a semantic similarity calculation of the respective user response to the sample user responses stored in the database (VDB) and transforms the respective user response into a current data point with a value in the predefined range based on the similarity calculation,
• via a second programming interface (23) to the classifier module (21) transmits the current data point to the classifier module (21), whereupon the classifier module (21) generates the next classification question based on the current data point and transmits the generated next classification question to the foundation model (LLM) via the second programming interface (23);
• via the user interaction interface (1) transmits the received next classification question to the user (U).
System (20) according to Claim 6, wherein the classification questions comprise questions of a culture test and the class assignment is a culture class, wherein the culture test represents aspects of corporate cultures of companies and the culture class is measurable, the compatibility check is a matching between the culture classes of the job seeker and the corporate cultures of the companies, and the user data refers to data from companies or job seekers and the reference datasets refer accordingly to data from job seekers or companies.
A computer-implemented method for classifying user data for compatibility matching between user data and reference datasets using sequential-adaptive question generation, comprising the following steps:
• via a user interaction interface (1), transmitting classification questions to a user (U) in an iterative process, wherein user responses to each of the classification questions are received via the user interaction interface (1), and depending on each user response, the next classification question is transmitted to the user (U) via the user interaction interface (1) (V1);
• on a hardware unit (2), executing a pre-trained foundation model (LLM) (V2), wherein the foundation model (LLM)
  ◯ receives each of the user responses via the user interaction interface (1) (V2a),
  ◯ via a first programming interface (22) to a database (VDB) in which sample user responses are stored for the classification questions and each of the sample user responses is assigned a value in a predefined range, performs a semantic similarity calculation of the respective user response to the sample user responses stored in the database (VDB) and transforms the respective user response into a current data point with a value in the predefined range based on the similarity calculation (V2b),
  ◯ via a second programming interface (23) to a classifier module (21) transmits the current data point to the classifier module (21), whereupon the classifier module (21) generates the next classification question based on the current data point and transmits the generated next classification question to the foundation model (LLM) via the second programming interface (23), wherein the classifier module (21) generates the next classification question at each iteration step based on the respective user response in such a way that this next classification question increases the separation quality of the user responses with regard to the classification, and the iteration process ends when a class assignment from all class attributes is achieved (V2c),
  ◯ via the user interaction interface (1) transmits the next classification question to the user (U) (V2d).
Method according to Claim 8, wherein the foundation model (LLM) is retrained on historical classification questions, user responses and the resulting class assignments (V3).
Method according to Claim 8 or 9, wherein the classification questions comprise questions of a culture test and the class assignment is a culture class, wherein the culture test represents aspects of corporate cultures of companies and the culture class is measurable, the compatibility check is a matching between the culture classes of the job seeker and the corporate cultures of the companies, and the user data refers to data from companies or job seekers and the reference datasets refer accordingly to data from job seekers or companies.

Description

The invention relates to a device, a system and a method for classifying user data for a compatibility comparison between user data and reference datasets by means of sequential-adaptive question generation. The following definitions apply to the entire disclosure content.

AU 2021 105 094 A4 discloses a method for executing a predictive model to predict a student's career-related goals after graduation, wherein a decision tree and a random tree process are selected from a prediction and classification model operator for different target variables.

In the state of the art, machine learning models such as BERT, DALL-E, and the GPT series are known, which are trained on large datasets and can be adapted to a variety of downstream tasks. Such models are called foundation models; see, for example, arXiv:2108.07258v3 [cs.LG] 12 Jul 2022. Language models, also called large language models, are an example of a foundation model. A language model is a machine learning model trained in a data-driven training process to model a sequence of elements, such as letters or words in natural language texts (see, for example, arXiv:1706.03762v7 [cs.CL] 2 Aug 2023). A publicly available language model is, for example, LLaMA, a collection of foundation language models by Meta AI (see, for example, arXiv:2302.13971v1 [cs.CL] 27 Feb 2023). Using prompt engineering, language models can process natural language and demonstrate logical relationships. For example, a task, such as a question, is provided to the language model in text form via an interface, such as an input field.

US 6 616 458 B1 discloses a method and a device for managing a survey, wherein an online service provider computer receives a survey with survey questions from a customer who wants to conduct a survey.
A problem with such surveys, especially those that measure and quantify aspects of corporate culture and aspects of job seekers and aim to determine a culture match between companies and job seekers, is the efficient design of the data features. The invention was based on the objective of optimizing the classification of user data in this field, particularly data-efficient classification, and improving the efficiency of compatibility matching. Data-efficient classification is a high-performance classification based on a relatively small number of data points. The subject matter of each of the independent claims solves this problem. Further developments and advantageous embodiments are described in the dependent claims, the drawings, and the description of preferred embodiments.

In one aspect, the invention provides a device for classifying user data for compatibility checks between user data and reference datasets using sequential-adaptive question generation. With sequential adaptation, the adjustment is performed in a specific order, with each adjustment step based on previous steps or results. The system not only adapts but does so in a logical, successive, and in particular deterministic, sequence. Each step directly influences the next, resulting in a chain of adjustments that build upon one another. For example, the device poses a question, and based on the answer, the next question is selected to further optimize differentiation in the sense of classification. The questions are thus posed sequentially in a predetermined logical order with continuous adaptation. At the same time, the user or operator can configure how many questions the system should pose. This allows, for example, a trade-off between performance and data efficiency.

The device includes a user interaction interface. The user interaction interface (UI) is a technical device or software component that allows the user to input data into the device, which is then processed.
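
The sequential-adaptive questioning described above can be illustrated with a minimal Python sketch. It is not the implementation disclosed here, only an illustration under stated assumptions: a toy bag-of-words cosine similarity stands in for the foundation model and the vector database, a single hand-built decision tree stands in for the trained classifier (RF), and all question texts, sample responses, thresholds and class names (`SAMPLES`, `ROOT`, `to_data_point`, `classify`) are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

# Hypothetical sample responses per question, each with a value in [0, 1].
SAMPLES = {
    "q_team": [
        ("I prefer working closely with a team", 1.0),
        ("I prefer working alone", 0.0),
    ],
    "q_structure": [
        ("I like clear rules and fixed processes", 1.0),
        ("I like flexibility and freedom", 0.0),
    ],
}


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def to_data_point(question_id: str, response: str) -> float:
    """Map a free-text response to a value in [0, 1] as the
    similarity-weighted average of the sample response values."""
    vec = embed(response)
    scored = [(cosine(vec, embed(s)), v) for s, v in SAMPLES[question_id]]
    total = sum(sim for sim, _ in scored)
    if total == 0:
        return 0.5  # assumption: no overlap with any sample -> midpoint
    return sum(sim * v for sim, v in scored) / total


@dataclass
class Node:
    """Decision node: one question plus a node-specific threshold."""
    question_id: str
    question: str
    threshold: float
    low: Union["Node", str]   # subtree or class label if value < threshold
    high: Union["Node", str]  # subtree or class label otherwise


# Hypothetical hand-built tree; a trained model would supply this structure.
ROOT = Node(
    "q_team", "How do you prefer to work?", 0.5,
    low="independent",
    high=Node("q_structure", "How much structure do you want?", 0.5,
              low="collaborative-flexible", high="collaborative-structured"),
)


def classify(root: Node, ask: Callable[[Node], str]) -> Tuple[str, List[str]]:
    """Ask questions one at a time; each answer selects the next question.
    Stops as soon as a class label (a leaf) is reached."""
    node: Union[Node, str] = root
    asked: List[str] = []
    while isinstance(node, Node):
        value = to_data_point(node.question_id, ask(node))
        asked.append(node.question_id)
        node = node.high if value >= node.threshold else node.low
    return node, asked


if __name__ == "__main__":
    scripted = {
        "q_team": "I really enjoy working with a team",
        "q_structure": "I like flexibility and freedom in my work",
    }
    label, asked = classify(ROOT, lambda n: scripted[n.question_id])
    print(label, asked)
```

In this sketch the order of questions is deterministic for a given sequence of answers, and a user whose first answer already separates the classes is asked fewer questions, which is the data-efficiency idea behind the sequential adaptation.
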
The user interaction interface transmits the entered data to the device for further analysis. For example, the user interaction interface can be a graphical user interface. It can also be a web form, that is, a web-based interface used, for example, on a user's computer, tablet, or smartphone to enter answers via text fields, drop-down menus, or radio buttons. The user interaction interface can also be an audio interface. Speech-to-text technology can be used to... A text, for example a text input, can be obtained from this. Via the user interaction interface, the device iteratively transmits classification questions to a user, receives user responses to each classification question, and transmits the next classification question to the user depending on each user response. The user responses can be, for example, free text input from the user. This creates, for instance, a chat history between the user and the foundation model. In one aspect, these classification questions are data-driven classification questions. Data-driven classification questions are questions that aim to categorize users into specific, predefined classes based on their answers. These questions can be part of a machine learning process where the answers are used to perfo