
US-12625890-B2 - Interactive dataset preparation


Abstract

Methods, computer program products, and systems are presented. The methods, computer program products, and systems can include, for instance, discovering semantic operations within a table based dataset, wherein the discovering includes examining logging data that specifies atomical physical changes applied to the table based dataset in response to change specifying input data received from one or more users.
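The abstract does not say how a logged atomical physical change is mapped to a semantic operation. The Python sketch below shows one possible rule-based tagger under loudly stated assumptions: the `AtomicChange` log-entry shape, the quantity pattern, the edit-distance threshold of 2, the synonym table, and the pairing of each operation with a reason tag are all hypothetical choices, not the disclosed method. The tag names loosely follow the operation and reason-for-change categories listed in claim 8.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicChange:
    """One logged cell edit: (row, column) changed from old to new (hypothetical log shape)."""
    row: int
    column: str
    old: str
    new: str


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


# Hypothetical pattern for a number followed by a unit, e.g. "12 kg" or "3.5in".
QUANTITY = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def normalize(s: str) -> str:
    """Drop case, whitespace and punctuation so pure formatting edits compare equal."""
    return re.sub(r"[\W_]+", "", s.lower())


def tag_change(change: AtomicChange, synonyms: dict[str, str]) -> frozenset[str]:
    """Map one atomic change to hypothetical (operation, reason) tags; first matching rule wins.

    `synonyms` is assumed to map lowercase values to their lowercase preferred form.
    """
    old, new = change.old, change.new
    q_old, q_new = QUANTITY.match(old), QUANTITY.match(new)
    if q_old and q_new and q_old.group(2).lower() != q_new.group(2).lower():
        return frozenset({"unit change", "consistency in representation"})
    if synonyms.get(old.lower()) == new.lower():
        return frozenset({"synonym substitution", "consistency in domain"})
    if old != new and normalize(old) == normalize(new):
        return frozenset({"format change", "consistency in representation"})
    if old.isalpha() and new.isalpha() and 0 < edit_distance(old.lower(), new.lower()) <= 2:
        return frozenset({"value correction", "misspelling"})
    return frozenset({"value change"})
```

For example, a logged edit of `Bostn` to `Boston` in a `city` column is tagged as a value correction with a misspelling reason, while `new york` to `New York` is tagged as a format change. Because the rules are checked in order, a unit change is never also reported as a format change.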

Inventors

  • Anupam SANGHI
  • Rajmohan Chandrahasan
  • Arvind Agarwal

Assignees

  • INTERNATIONAL BUSINESS MACHINES CORPORATION

Dates

Publication Date
2026-05-12
Application Date
2023-12-28

Claims (20)

  1. A computer implemented method comprising: in response to receiving query data from a data source, discovering, by activating a data process executed on a computing node based system, semantic operations within a table based dataset, wherein the discovering semantic operations within the table based dataset includes examining logging data, wherein the logging data specifies atomical physical changes that have been applied to the table based dataset responsively to receipt of change specifying input data from one or more users, wherein the table based dataset includes at least one table having rows and columns that define cells; generating, by the computing node based system, prompting data in dependence on the discovering semantic operations within the table based dataset; sending, by the computing node based system, the prompting data to at least one user device; and interacting, via a user interface, with at least one user in dependence on the discovering semantic operations within the table based dataset, wherein the interacting with the at least one user in dependence on the discovering semantic operations within the table based dataset includes presenting on the user interface the generated prompting data to the at least one user, wherein the prompting data prompts the at least one user to perform at least one action with respect to the table based dataset, wherein the method includes identifying a semantic operation group based on first and second cells of the table based dataset having common semantic operation tags resulting from the discovering, and wherein the prompting data presented on the user interface includes an action specifying text string mapping to the semantic operation group, and wherein generation of the action specifying text string mapping to the semantic operation group has included querying a trained predictive model that has been trained to return strings of dataset action specifying text in response to being queried with query data specifying characteristics of the semantic operation group.
  2. The computer implemented method of claim 1, wherein the discovering semantic operations within the table based dataset includes determining a semantic operation defined by an atomical physical change of the atomical physical changes.
  3. The computer implemented method of claim 1, wherein the discovering semantic operations within the table based dataset includes determining a semantic operation defined by an atomical physical change of the atomical physical changes, wherein the semantic operation specifies a reason for performance of the atomical physical change.
  4. The computer implemented method of claim 1, wherein the method includes iteratively performing the discovering semantic operations within the table based dataset, the generating prompting data in dependence on the discovering semantic operations within the table based dataset, and the presenting on the user interface the generated prompting data to the at least one user.
  5. The computer implemented method of claim 1, wherein the method includes identifying a second semantic operation group based on third and fourth cells of the table based dataset having matching semantic operation tags resulting from the discovering, and wherein the prompting data presented on the user interface includes a second action specifying text string mapping to the second semantic operation group.
  6. The computer implemented method of claim 1, wherein the method includes iteratively performing the discovering semantic operations within the table based dataset, the generating prompting data in dependence on the discovering semantic operations within the table based dataset, and the presenting on the user interface the generated prompting data to the at least one user, wherein the prompting data presented on the user interface includes text based data specifying a predicted impact on data quality of the table based dataset associated with a group of semantic operations identified responsively to the discovering.
  7. The computer implemented method of claim 1, wherein the method includes receiving approval from the at least one user to implement a group of semantic operations identified responsively to the discovering of the semantic operations within the table based dataset, assessing a data quality of the table based dataset responsively to the receiving, and repeating the discovering semantic operations within the table based dataset.
  8. The computer implemented method of claim 1, wherein the discovering semantic operations within the table based dataset includes determining a semantic operation defined by an atomical physical change of the atomical physical changes, wherein the semantic operation is selected from the group consisting of a representation change, a value change, a format change, a unit change, a value correction, a synonym substitution, a misspelling reason for change, a representation correction reason for change, a domain correction reason for change, a value correction reason for change, a consistency in representation reason for change, and a consistency in domain reason for change.
  9. The computer implemented method of claim 1, wherein the discovering semantic operations within the table based dataset includes determining a semantic operation defined by an atomical physical change of the atomical physical changes, wherein the semantic operation is selected from the group consisting of a representation correction reason for change, a domain correction reason for change, a value correction reason for change, a consistency in representation reason for change, and a consistency in domain reason for change.
  10. The computer implemented method of claim 1, wherein the method includes receiving approval from the at least one user to implement a group of semantic operations identified responsively to the discovering of the semantic operations within the table based dataset, assessing a data quality of the table based dataset responsively to the receiving, and repeating the discovering semantic operations within the table based dataset, the generating prompting data in dependence on the discovering semantic operations within the table based dataset, and the presenting on the user interface the generated prompting data to the at least one user responsively to a determination based on the assessing that the table based dataset includes insufficient data quality.
  11. A system comprising: a memory; at least one processor in communication with the memory; and program instructions executable by one or more processors via the memory to perform a method comprising: in response to receiving query data from a data source, discovering, by activating a data process executed on a computing node based system, semantic operations within a table based dataset, wherein the discovering semantic operations within the table based dataset includes examining logging data, wherein the logging data specifies atomical physical changes that have been applied to the table based dataset responsively to receipt of change specifying input data from one or more users, wherein the table based dataset includes at least one table having rows and columns that define cells; generating prompting data in dependence on the discovering semantic operations within the table based dataset; sending the prompting data to at least one user device; and interacting, via a user interface, with at least one user in dependence on the discovering semantic operations within the table based dataset, wherein the interacting with the at least one user in dependence on the discovering semantic operations within the table based dataset includes presenting on the user interface the generated prompting data to the at least one user, wherein the prompting data prompts the at least one user to perform at least one action with respect to the table based dataset, wherein the method includes identifying a semantic operation group based on first and second cells of the table based dataset having common semantic operation tags resulting from the discovering, and wherein the prompting data presented on the user interface includes an action specifying text string mapping to the semantic operation group, and wherein generation of the action specifying text string mapping to the semantic operation group has included querying a trained predictive model that has been trained to return strings of dataset action specifying text in response to being queried with query data specifying characteristics of the semantic operation group.
  12. The system of claim 11, wherein the method includes iteratively performing the discovering semantic operations within the table based dataset, the generating prompting data in dependence on the discovering semantic operations within the table based dataset, and the presenting on the user interface the generated prompting data to the at least one user.
  13. The system of claim 11, wherein the discovering semantic operations within the table based dataset includes determining, for a first atomical physical change to a first cell of the table based dataset, first semantic operation tags, determining, for a second atomical physical change to a second cell of the table based dataset, second semantic operation tags, determining, for a third atomical physical change to a third cell of the table based dataset, third semantic operation tags, determining, for a fourth atomical physical change to a fourth cell of the table based dataset, fourth semantic operation tags, matching the first semantic operation tags to the second semantic operation tags, performing matching of the third semantic operation tags to the fourth semantic operation tags, identifying a first group of semantic operations based on the matching, identifying a second group of semantic operations based on the performing matching, wherein the generating the prompting data in dependence on the performing of the semantic operation discovery of the table based dataset includes performing the generating in dependence on an evaluating of the first group and the second group, wherein the evaluating includes scoring the first group based on multiple factors, performing scoring of the second group based on the multiple factors, and ranking the first group and the second group in dependence on the scoring and the performing scoring, wherein the method includes iteratively performing the discovering semantic operations within the table based dataset, the generating prompting data in dependence on the discovering semantic operations within the table based dataset, and the presenting on the user interface the generated prompting data to the at least one user, wherein the prompting data presented on the user interface includes text based data specifying a predicted impact on data quality of the table based dataset associated with a group of semantic operations identified responsively to the discovering, wherein the method includes receiving approval from the at least one user to implement a group of semantic operations identified responsively to the discovering of the semantic operations within the table based dataset, assessing a data quality of the dataset responsively to the receiving, and performing an iteration of the discovering semantic operations within the table based dataset, the generating prompting data in dependence on the discovering semantic operations within the table based dataset, and the presenting on the user interface the generated prompting data to the at least one user responsively to a determination based on the assessing that the dataset includes insufficient data quality.
  14. A computer implemented method comprising: in response to receiving query data from a data source, discovering, by activating a data process executed on a computing node based system, semantic operations within a table based dataset, wherein the discovering semantic operations within the table based dataset includes examining logging data, wherein the logging data specifies atomical physical changes that have been applied to the table based dataset responsively to receipt of change specifying input data from one or more users, wherein the table based dataset includes at least one table having rows and columns that define cells; generating, by the computing node based system, prompting data in dependence on the discovering semantic operations within the table based dataset; sending, by the computing node based system, the prompting data to at least one user device; and interacting, via a user interface, with at least one user in dependence on the discovering semantic operations within the table based dataset, wherein the interacting with the at least one user in dependence on the discovering semantic operations within the table based dataset includes presenting on the user interface the generated prompting data to the at least one user, wherein the prompting data prompts the at least one user to perform at least one action with respect to the table based dataset, wherein the method includes receiving approval from the at least one user to implement a group of semantic operations identified responsively to the discovering of the semantic operations within the table based dataset, assessing a data quality of the table based dataset responsively to the receiving, and repeating the discovering semantic operations within the table based dataset.
  15. The computer implemented method of claim 14, wherein the method includes identifying a semantic operation group based on first and second cells of the table based dataset having common semantic operation tags resulting from the discovering, and wherein the prompting data presented on the user interface includes an action specifying text string mapping to the semantic operation group.
  16. The computer implemented method of claim 14, wherein the method includes identifying a semantic operation group based on first and second cells of the table based dataset having common semantic operation tags resulting from the discovering, and wherein the prompting data presented on the user interface includes an action specifying text string mapping to the semantic operation group, and wherein generation of the action specifying text string mapping to the semantic operation group has included querying a trained predictive model that has been trained to return strings of dataset action specifying text in response to being queried with query data specifying characteristics of the semantic operation group.
  17. The computer implemented method of claim 14, wherein the discovering semantic operations within the table based dataset includes determining, for a first atomical physical change to a first cell of the table based dataset, a first one or more semantic operation tag, determining, for a second atomical physical change to a second cell of the table based dataset, a second one or more semantic operation tag, determining, for a third atomical physical change to a third cell of the table based dataset, a third one or more semantic operation tag, determining, for a fourth atomical physical change to a fourth cell of the table based dataset, a fourth one or more semantic operation tag, matching the first one or more semantic operation tag to the second one or more semantic operation tag, performing matching of the third one or more semantic operation tag to the fourth one or more semantic operation tag, identifying a first group of semantic operations based on the matching, identifying a second group of semantic operations based on the performing matching, wherein the generating the prompting data in dependence on the performing of the semantic operation discovery of the table based dataset includes performing the generating in dependence on an evaluating of the first group and the second group.
  18. The computer implemented method of claim 14, wherein the discovering semantic operations within the table based dataset includes determining, for a first atomical physical change to a first cell of the table based dataset, first semantic operation tags, determining, for a second atomical physical change to a second cell of the table based dataset, second semantic operation tags, determining, for a third atomical physical change to a third cell of the table based dataset, third semantic operation tags, determining, for a fourth atomical physical change to a fourth cell of the table based dataset, fourth semantic operation tags, matching the first semantic operation tags to the second semantic operation tags, performing matching of the third semantic operation tags to the fourth semantic operation tags, identifying a first group of semantic operations based on the matching, identifying a second group of semantic operations based on the performing matching, wherein the generating the prompting data in dependence on the performing of the semantic operation discovery of the table based dataset includes performing the generating in dependence on an evaluating of the first group and the second group, wherein the evaluating includes scoring the first group based on multiple factors, performing scoring of the second group based on the multiple factors, and ranking the first group and the second group in dependence on the scoring and the performing scoring.
  19. The computer implemented method of claim 14, wherein the discovering semantic operations within the table based dataset includes determining, for a first atomical physical change to a first cell of the table based dataset, first semantic operation tags, determining, for a second atomical physical change to a second cell of the table based dataset, second semantic operation tags, determining, for a third atomical physical change to a third cell of the table based dataset, third semantic operation tags, determining, for a fourth atomical physical change to a fourth cell of the table based dataset, fourth semantic operation tags, matching the first semantic operation tags to the second semantic operation tags, performing matching of the third semantic operation tags to the fourth semantic operation tags, identifying a first group of semantic operations based on the matching, identifying a second group of semantic operations based on the performing matching, wherein the generating the prompting data in dependence on the performing of the semantic operation discovery of the table based dataset includes performing the generating in dependence on an evaluating of the first group and the second group, wherein the evaluating includes scoring the first group based on multiple factors, performing scoring of the second group based on the multiple factors, and ranking the first group and the second group in dependence on the scoring and the performing scoring, wherein the method includes iteratively performing the discovering semantic operations within the table based dataset, the generating prompting data in dependence on the discovering semantic operations within the table based dataset, and the presenting on the user interface the generated prompting data to the at least one user, wherein the prompting data presented on the user interface includes text based data specifying a predicted impact on data quality of the table based dataset associated with a group of semantic operations identified responsively to the discovering, wherein the method includes receiving approval from the at least one user to implement the group of semantic operations identified responsively to the discovering of the semantic operations within the table based dataset, assessing a data quality of the table based dataset responsively to the receiving, and performing an iteration of the discovering semantic operations within the table based dataset, the generating prompting data in dependence on the discovering semantic operations within the table based dataset, and the presenting on the user interface the generated prompting data to the at least one user responsively to a determination based on the assessing that the table based dataset includes insufficient data quality.
  20. The computer implemented method of claim 14, wherein the method includes identifying a semantic operation group based on first and second cells of the table based dataset having common semantic operation tags resulting from the discovering, and wherein the prompting data presented on the user interface includes an action specifying text string that specifies an action to be taken by the one or more users with respect to the semantic operation group.
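Claims 1, 7, 10, and 13 through 19 describe grouping cells whose changes carry matching semantic operation tags, scoring the groups on multiple factors, ranking them, prompting the user with action text, and repeating discovery until data quality is sufficient. The Python sketch below illustrates that flow under stated assumptions: it takes pairs of a cell and its tag set as input, treats "matching" as identical tag sets, and uses hypothetical scoring factors and weights, a hypothetical `TAG_PRIORITY` table, a hypothetical quality threshold, and hypothetical callbacks (`discover`, `approve`, `apply`, `quality`). `action_text` is a fixed template standing in for the trained predictive model the claims recite.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable

Cell = tuple[int, str]  # (row index, column name)

# Hypothetical per-tag priorities; the claims say only that groups are scored on "multiple factors".
TAG_PRIORITY = {"value correction": 1.0, "misspelling": 1.0, "unit change": 0.8,
                "format change": 0.5, "synonym substitution": 0.5}


@dataclass
class OperationGroup:
    tags: frozenset[str]
    cells: list[Cell]
    score: float = 0.0


def group_by_tags(tagged: list[tuple[Cell, frozenset[str]]]) -> list[OperationGroup]:
    """Put cells whose changes carry identical tag sets into one group of two or more cells."""
    buckets: dict[frozenset[str], list[Cell]] = defaultdict(list)
    for cell, tags in tagged:
        buckets[tags].append(cell)
    return [OperationGroup(tags, cells) for tags, cells in buckets.items() if len(cells) >= 2]


def score_and_rank(groups: list[OperationGroup], w_size: float = 1.0,
                   w_focus: float = 2.0, w_tag: float = 3.0) -> list[OperationGroup]:
    """Score each group on three hypothetical factors, then sort best first."""
    for g in groups:
        n_columns = len({col for _, col in g.cells})
        priority = max((TAG_PRIORITY.get(t, 0.1) for t in g.tags), default=0.0)
        g.score = (w_size * len(g.cells)   # how many cells share the operation
                   + w_focus / n_columns   # edits concentrated in fewer columns score higher
                   + w_tag * priority)     # highest-priority tag in the group
    return sorted(groups, key=lambda g: g.score, reverse=True)


def action_text(group: OperationGroup) -> str:
    """Template stand-in for the trained model that returns action specifying text."""
    columns = ", ".join(sorted({col for _, col in group.cells}))
    kinds = " / ".join(sorted(group.tags))
    return f"Review {len(group.cells)} cells in column(s) {columns}: apply '{kinds}' consistently?"


def prepare(discover: Callable[[], list[tuple[Cell, frozenset[str]]]],
            approve: Callable[[str], bool],
            apply: Callable[[OperationGroup], None],
            quality: Callable[[], float],
            threshold: float = 0.9, max_rounds: int = 5) -> bool:
    """Discover, prompt, apply approved groups, assess quality; repeat while quality is insufficient."""
    for _ in range(max_rounds):
        for group in score_and_rank(group_by_tags(discover())):
            if approve(action_text(group)):
                apply(group)
        if quality() >= threshold:
            return True
    return False
```

Grouping on exact tag sets keeps the sketch simple; a looser notion of matching, such as sharing any single tag, would yield larger and possibly overlapping groups that the ranking step would then have to reconcile.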

Description

Embodiments herein relate to datasets generally, and particularly to interactive dataset preparation. A data structure refers to an organization of data in a computer environment. Data structure types include containers, lists, stacks, queues, tables, and graphs. Data structures have been employed to improve computer system operation, e.g., in terms of algorithm efficiency, memory usage efficiency, maintainability, and reliability. Artificial intelligence (AI) refers to intelligence exhibited by machines. AI research includes search and mathematical optimization, neural networks, and probability. AI solutions draw on research in a variety of science and technology disciplines, including computer science, mathematics, psychology, linguistics, statistics, and neuroscience. Machine learning has been described as the field of study that gives computers the ability to learn without being explicitly programmed.

SUMMARY

Shortcomings of the prior art are overcome, and additional advantages are provided, through the provision, in one aspect, of a method.
The method can include, for example: discovering semantic operations within a table based dataset, wherein the discovering semantic operations within the table based dataset includes examining logging data, wherein the logging data specifies atomical physical changes that have been applied to the table based dataset responsively to receipt of change specifying input data from one or more users, wherein the table based dataset includes at least one table having rows and columns that define cells; generating prompting data in dependence on the discovering semantic operations within the table based dataset; and interacting with at least one user in dependence on the discovering semantic operations within the table based dataset, wherein the interacting with the at least one user in dependence on the discovering semantic operations within the table based dataset includes presenting on a user interface the generated prompting data to the at least one user, wherein the prompting data prompts the at least one user to perform at least one action with respect to the table based dataset. In another aspect, a computer program product can be provided. The computer program product can include a computer readable storage medium readable by one or more processing circuits and storing instructions for execution by one or more processors for performing a method.
The method can include, for example: discovering semantic operations within a table based dataset, wherein the discovering semantic operations within the table based dataset includes examining logging data, wherein the logging data specifies atomical physical changes that have been applied to the table based dataset responsively to receipt of change specifying input data from one or more users, wherein the table based dataset includes at least one table having rows and columns that define cells; generating prompting data in dependence on the discovering semantic operations within the table based dataset; and interacting with at least one user in dependence on the discovering semantic operations within the table based dataset, wherein the interacting with the at least one user in dependence on the discovering semantic operations within the table based dataset includes presenting on a user interface the generated prompting data to the at least one user, wherein the prompting data prompts the at least one user to perform at least one action with respect to the table based dataset. In a further aspect, a system can be provided. The system can include, for example, a memory. In addition, the system can include one or more processors in communication with the memory. Further, the system can include program instructions executable by the one or more processors via the memory to perform a method.
The method can include, for example: discovering semantic operations within a table based dataset, wherein the discovering semantic operations within the table based dataset includes examining logging data, wherein the logging data specifies atomical physical changes that have been applied to the table based dataset responsively to receipt of change specifying input data from one or more users, wherein the table based dataset includes at least one table having rows and columns that define cells; generating prompting data in dependence on the discovering semantic operations within the table based dataset; and interacting with at least one user in dependence on the discovering semantic operations within the table based dataset, wherein the interacting with the at least one user in dependence on the discovering semantic operations within the table based dataset includes presenting on a user interface the generated prompting data to the at le