US-12619430-B2 - System and method for evidencing developer domain specific skills
Abstract
A method and system for evidencing skills of a developer are disclosed. The method includes extracting, from various databases, a list of terms and definitions for identifying skills, and grouping the identified skills for forming at least one ontology based on similarity. The method further includes acquiring raw data and performing data cleaning on the raw data for identifying at least one domain-specific skill. Subsequently, skill matching is performed by comparing the at least one domain-specific skill against the at least one ontology for determining that the at least one ontology is evidenced for the developer. The method further includes performing analytics to display the evidenced at least one ontology and corresponding level, and automatically assigning the at least one task based on the performed analytics.
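The abstract leaves the analytics and assignment steps abstract. The sketch below is one hedged reading of claims 7 and 9: a developer's level for an ontology is the number of tasks in which that ontology was evidenced, and a task is assigned in the ontology where that level is highest. The function names, the input formats, and the tie-breaking rule are hypothetical; the patent does not say how levels are bucketed or how ties are resolved.

```python
from collections import Counter


def ontology_levels(evidenced_ontologies):
    """Return a per-ontology level for one developer.

    Hypothetical input: one ontology label per completed task in which the
    skill-matching step evidenced that ontology. Following claim 7, the level
    here is simply the number of such tasks.
    """
    return Counter(evidenced_ontologies)


def assign_task(open_tasks, levels):
    """Pick the open task in the ontology where the developer's level is highest.

    Hypothetical input: `open_tasks` maps a task id to the ontology it needs.
    Returns None if the developer has no evidenced level for any open task.
    Ties keep the first task encountered.
    """
    best_task, best_level = None, 0
    for task_id, ontology in open_tasks.items():
        level = levels.get(ontology, 0)
        if level > best_level:
            best_task, best_level = task_id, level
    return best_task


if __name__ == "__main__":
    levels = ontology_levels(
        ["payments", "risk", "payments", "payments", "risk", "data-engineering"]
    )
    print(levels.most_common())
    print(assign_task({"T-1": "risk", "T-2": "payments", "T-3": "cloud"}, levels))  # T-2
```

A raw count is the simplest level consistent with claim 7; a real deployment might map counts onto bands (for example beginner, intermediate, expert) before display.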
Inventors
- Rares DOLGA
- Yulong PEI
- Vali TAWOSI
- Salwa Husam Alamir
- Sameena Shah
Assignees
- JPMORGAN CHASE BANK, N.A.
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2023-10-12
Claims (18)
- 1. A method for evidencing skills of a developer, the method comprising: extracting, by a processor and from a plurality of databases, a list of terms and corresponding definitions; removing, by the processor, at least one term from the extracted list of terms and corresponding definitions; identifying, by the processor, a plurality of skills based on the extracted list of terms and corresponding definitions after the removing of the at least one term; grouping, by the processor, the identified plurality of skills for forming at least one ontology based on similarity, wherein the grouping is performed using affinity clustering, and wherein the affinity clustering includes extracting a sentence embedding using a language model, the sentence embedding includes a process of representing variable-length sentences as fixed-length vectors; acquiring, by the processor and from one or more data sources, raw data; performing, by the processor, data cleaning on the acquired raw data; identifying, by the processor, at least one domain-specific skill from the cleaned raw data; performing, by the processor, skill matching by comparing the at least one domain-specific skill against the at least one ontology, wherein a similarity between the at least one domain-specific skill and the at least one ontology is measured based on a cosine of an angle formed between a first embedding vector of the at least one domain-specific skill and a second embedding vector of the at least one ontology to determine whether or not the first embedding vector and the second embedding vector are pointing in a same direction within a deviation below a reference threshold; when the at least one domain-specific skill matches with the at least one ontology, determining that the at least one ontology is evidenced for the developer; performing, by the processor, analytics to identify and display, on a display, the evidenced at least one ontology and corresponding level; and automatically assigning, by the processor, at least one task based on the performed analytics.
- 2. The method according to claim 1, wherein the plurality of databases includes a glossary storage and a document storage.
- 3. The method according to claim 1, wherein the at least one ontology is formed independently from the identifying of the at least one domain-specific skill from the cleaned raw data.
- 4. The method according to claim 1, wherein the one or more data sources include at least a task storage.
- 5. The method according to claim 1, wherein the one or more data sources include at least a message storage.
- 6. The method according to claim 1, wherein the raw data is text data.
- 7. The method according to claim 1, wherein the corresponding level is determined by a number of tasks performed associated with the at least one ontology.
- 8. The method according to claim 1, wherein the at least one ontology includes a plurality of ontologies, and wherein the corresponding level is displayed for each of the plurality of ontologies.
- 9. The method according to claim 8, wherein the automatic assigning is performed based on highest level for an ontology among the plurality of ontologies.
- 10. The method according to claim 8, wherein the plurality of ontologies and the corresponding levels for the developer are displayed as a singular bar.
- 11. The method according to claim 10, wherein the plurality of ontologies and corresponding levels for the developer are displayed in different colors.
- 12. The method according to claim 1, wherein the data cleaning includes removal of extraneous information or noise.
- 13. The method according to claim 1, further comprising: automatically recommending, by the processor, a new position based on the evidenced at least one ontology and corresponding level.
- 14. The method according to claim 1, wherein the skill matching includes executing at least two machine learning models, and wherein the at least two machine learning models includes a rule based model and a large language model.
- 15. The method according to claim 14, wherein the rule based model utilizes the at least one ontology to detect a presence of a skill term corresponding to the at least one ontology, and wherein the skill term includes an exact term, a synonym, an acronym or a misspelling of the skill term.
- 16. The method according to claim 14, wherein the large language model further processes an output provided by the rule based model, and wherein the large language model utilizes in-context learning to extract a list of skill terms from the raw data.
- 17. A system for evidencing skills of a developer, the system comprising: a memory; and a processor, wherein the system is configured to perform: extracting, from a plurality of databases, a list of terms and corresponding definitions; removing at least one term from the extracted list of terms and corresponding definitions; identifying a plurality of skills based on the extracted list of terms and corresponding definitions after the removing of the at least one term; grouping the identified plurality of skills for forming at least one ontology based on similarity, wherein the grouping is performed using affinity clustering, and wherein the affinity clustering includes extracting a sentence embedding using a language model, the sentence embedding includes a process of representing variable-length sentences as fixed-length vectors; acquiring, from one or more data sources, raw data; performing data cleaning on the acquired raw data; identifying at least one domain-specific skill from the cleaned raw data; performing skill matching by comparing the at least one domain-specific skill against the at least one ontology, wherein a similarity between the at least one domain-specific skill and the at least one ontology is measured based on a cosine of an angle formed between a first embedding vector of the at least one domain-specific skill and a second embedding vector of the at least one ontology to determine whether or not the first embedding vector and the second embedding vector are pointing in a same direction within a deviation below a reference threshold; when the at least one domain-specific skill matches with the at least one ontology, determining that the at least one ontology is evidenced for the developer; performing analytics to identify and display, on a display, the evidenced at least one ontology and corresponding level; and automatically assigning at least one task based on the performed analytics.
- 18. A non-transitory computer readable storage medium that stores a computer program for evidencing skills of a developer, the computer program, when executed by a processor, causing a system to perform a plurality of processes comprising: extracting, from a plurality of databases, a list of terms and corresponding definitions; removing at least one term from the extracted list of terms and corresponding definitions; identifying a plurality of skills based on the extracted list of terms and corresponding definitions after the removing of the at least one term; grouping the identified plurality of skills for forming at least one ontology based on similarity, wherein the grouping is performed using affinity clustering, and wherein the affinity clustering includes extracting a sentence embedding using a language model, the sentence embedding includes a process of representing variable-length sentences as fixed-length vectors; acquiring, from one or more data sources, raw data; performing data cleaning on the acquired raw data; identifying at least one domain-specific skill from the cleaned raw data; performing skill matching by comparing the at least one domain-specific skill against the at least one ontology, wherein a similarity between the at least one domain-specific skill and the at least one ontology is measured based on a cosine of an angle formed between a first embedding vector of the at least one domain-specific skill and a second embedding vector of the at least one ontology to determine whether or not the first embedding vector and the second embedding vector are pointing in a same direction within a deviation below a reference threshold; when the at least one domain-specific skill matches with the at least one ontology, determining that the at least one ontology is evidenced for the developer; performing analytics to identify and display, on a display, the evidenced at least one ontology and corresponding level; and automatically assigning at least one task based on the performed analytics.
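Claim 15 states that the rule-based model uses the ontology to detect a skill term appearing as an exact term, a synonym, an acronym, or a misspelling. A minimal standard-library sketch of such a detector follows. The ontology entries, the `detect_skill_terms` name, and the use of `difflib` string similarity with a 0.85 cutoff to catch misspellings are illustrative assumptions, not the patented implementation; the large language model stage of claim 16 is not shown.

```python
import difflib
import re

# Hypothetical ontology: canonical skill terms with their known variants.
ONTOLOGY = {
    "kubernetes": {"synonyms": {"container orchestration"}, "acronyms": {"k8s"}},
    "continuous integration": {"synonyms": {"build automation"}, "acronyms": {"ci"}},
}


def detect_skill_terms(text, ontology=ONTOLOGY, cutoff=0.85):
    """Return {canonical_term: how_matched} for ontology terms found in `text`.

    Checks run in order: exact phrase, synonym phrase, acronym token, and
    finally a fuzzy match for misspellings (single-word terms only).
    """
    lowered = text.lower()
    tokens = re.findall(r"[a-z0-9]+", lowered)
    found = {}
    for term, variants in ontology.items():
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            found[term] = "exact"
        elif any(re.search(rf"\b{re.escape(s)}\b", lowered) for s in variants["synonyms"]):
            found[term] = "synonym"
        elif any(a in tokens for a in variants["acronyms"]):
            found[term] = "acronym"
        elif " " not in term and difflib.get_close_matches(term, tokens, n=1, cutoff=cutoff):
            # e.g. "kubernets" is close enough to "kubernetes" to count as a misspelling
            found[term] = "misspelling"
    return found


if __name__ == "__main__":
    print(detect_skill_terms("Deployed the service to kubernets and fixed the CI pipeline."))
```

Running the example flags `kubernetes` as a misspelling and `continuous integration` through its acronym. Acronym matching is token-exact, so very short acronyms such as `ci` can produce false positives; in the claimed arrangement, the downstream language model stage would be the place to filter those.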
Description
TECHNICAL FIELD
This disclosure generally relates to data processing. More specifically, the present disclosure generally relates to performing developer domain skill extraction for authenticating an indicated skillset.
BACKGROUND
The developments described in this section are known to the inventors. However, unless otherwise indicated, it should not be assumed that any of the developments described in this section qualify as prior art merely by virtue of their inclusion in this section, or that those developments are known to a person of ordinary skill in the art.
Extracting the skills of a developer may be beneficial for performing job matching, as well as for recommending training and prioritizing work. Although external tools that may be utilized to extract the skills of a developer may be available in the public domain, such skills are listed based on the subjective view of the authoring developer. In other words, skills listed via such external tools may not be vetted. Accordingly, discrepancies may exist between the skills listed in the public domain and the actual skills possessed by the respective author. Similarly, because a skill appears only if the developer chooses to list it, some skills possessed by the developer may not be listed on the external tools and may therefore remain hidden from project managers.
SUMMARY
According to an aspect of the present disclosure, a method for evidencing skills of a developer is provided.
The method includes extracting, by a processor and from multiple databases, a list of terms and corresponding definitions; removing, by the processor, at least one general term from the extracted list of terms and corresponding definitions for identifying multiple skills; grouping, by the processor, the identified skills for forming at least one ontology based on similarity; acquiring, by the processor and from one or more data sources, raw data; performing, by the processor, data cleaning on the acquired raw data; identifying, by the processor, at least one domain-specific skill from the cleaned raw data; performing, by the processor, skill matching by comparing the at least one domain-specific skill against the at least one ontology; when the at least one domain-specific skill matches with the at least one ontology, determining that the at least one ontology is evidenced for the developer; performing, by the processor, analytics to identify and display, on a display, the evidenced at least one ontology and corresponding level; and automatically assigning, by the processor, at least one task based on the performed analytics. According to another aspect of the present disclosure, the multiple databases include a glossary storage and a document storage. According to another aspect of the present disclosure, the grouping is performed using affinity clustering. According to yet another aspect of the present disclosure, the skill matching is performed using cosine similarity of embeddings with all of the terms in the at least one ontology. According to another aspect of the present disclosure, the at least one ontology is formed independently from the identifying of the at least one domain-specific skill from the cleaned raw data. According to a further aspect of the present disclosure, the one or more data sources include at least a task storage.
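The summary says skill matching uses the cosine similarity of embeddings with all of the terms in an ontology, and claim 1 frames the test as the two embedding vectors pointing in the same direction within a deviation below a reference threshold. The sketch below compares a skill embedding against every term embedding of one ontology, keeps the best cosine, converts it to an angle, and treats the ontology as evidenced when that angle is below a threshold. The toy 3-dimensional vectors stand in for sentence embeddings from a language model, and the 30-degree threshold and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between vectors u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_ontology(skill_vec, ontology_term_vecs, max_angle_deg=30.0):
    """Compare a skill embedding with every term embedding of one ontology.

    Returns (best_cosine, evidenced), where evidenced is True when the smallest
    angle to any ontology term is below `max_angle_deg` (hypothetical threshold).
    An empty ontology is never evidenced.
    """
    best = max((cosine(skill_vec, t) for t in ontology_term_vecs), default=0.0)
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    angle = math.degrees(math.acos(max(-1.0, min(1.0, best))))
    return best, angle < max_angle_deg


if __name__ == "__main__":
    skill = [1.0, 0.0, 1.0]                      # toy embedding of an extracted skill
    ontology = [[1.0, 0.1, 0.9], [0.0, 1.0, 0.0]]  # toy embeddings of ontology terms
    print(match_ontology(skill, ontology))
```

Thresholding on the angle is equivalent to thresholding the cosine at `cos(max_angle_deg)` (about 0.866 for 30 degrees); the angle form is used here only because it mirrors the "deviation below a reference threshold" wording of claim 1.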
According to yet another aspect of the present disclosure, the one or more data sources include at least a message storage. According to a further aspect of the present disclosure, the raw data is text data. According to another aspect of the present disclosure, the corresponding level is determined by a number of tasks performed associated with the identified at least one ontology. According to a further aspect of the present disclosure, the at least one ontology includes multiple ontologies, and the corresponding level is displayed for each of the multiple ontologies. According to a further aspect of the present disclosure, the automatic assigning is performed based on highest level for an ontology among the multiple ontologies. According to a further aspect of the present disclosure, the multiple ontologies and the corresponding levels for the developer are displayed as a singular bar. According to a further aspect of the present disclosure, the multiple ontologies and corresponding levels for the developer are displayed in different colors. According to a further aspect of the present disclosure, the data cleaning includes removal of extraneous information or noise. According to a further aspect of the present disclosure, the method further includes automatically recommending, by the processor, a new position based on the evidenced at least one ontology and corresponding level. According to a further aspect of the present disclosure, the skill matching executes at least two machine learning models, and the at least two machine learning models include a rule based model and a large language model. According to