JP-7856371-B2 - System, computer-implemented method, and computer program (associating disturbance events with accidents or tickets)
Inventors
- ヤシチン エマニュエル
- ゾウ ニアンジュン
- バミディパティ アヌラダ
- パテル ダヴァルクマール シー.
- アイアンガー アルン クワンギル
- シリヴァスタヴァ シレイ
Assignees
- インターナショナル・ビジネス・マシーンズ・コーポレーション
Dates
- Publication Date: 2026-05-11
- Application Date: 2022-08-12
- Priority Date: 2021-08-12
Claims (20)
- A system comprising: a memory; and a processor including hardware, the processor being configured to communicate with the memory and to: receive, from a data source, a set of service records that includes one or more service records corresponding to events representing multiple disturbances in a specified area occurring over a period, and one or more service records that are mislabeled or that lack a label associated with the disturbances; determine an observed actual rate of events representing disturbances during the period; determine a periodically aggregated baseline mean rate of expected service-related records under non-disturbance conditions for the period; determine a set of standardized, periodically aggregated scores as a function of the periodically aggregated baseline mean rate and the observed actual event rate during the period; identify a disturbance time window based on changes in the standardized, periodically aggregated scores detected during the period; generate disturbance-related probabilities for the service record corresponding to the identified disturbance time window; and relabel service tickets as being associated with the identified disturbance time window based on the generated disturbance-related probabilities.
- The system according to claim 1, wherein, to determine the periodically aggregated baseline mean rate of expected service-related records under non-disturbance conditions for the period, the processor is further configured to: obtain a vector of random variables containing a large number of periodically aggregated rate values from the service-related records; trim the vector by removing a predetermined number of the highest periodically aggregated rate values and a predetermined number of the lowest periodically aggregated rate values from a predetermined period; calculate a trimmed-mean estimate of the periodically aggregated rate by averaging the remaining data points in the vector; apply an additional bias adjustment factor to the trimmed-mean estimate; and apply a lower threshold so that the trimmed-mean estimate does not fall below that threshold.
- The system according to claim 2, wherein, to determine the set of standardized daily scores, the processor is further configured to: determine the difference between the periodically aggregated baseline mean rate of expected service-related tickets and the observed actual event rate; and obtain each standardized, periodically aggregated score by applying a standardizing normalization to the determined difference.
- The system according to claim 2, wherein, to identify the disturbance time window, the processor is further configured to separate non-disturbance conditions of the period from disturbance conditions of the period by applying a calibrated change point analysis.
- The system according to claim 4, wherein, to apply the calibrated change point analysis, the processor is further configured to: generate an automatic-restart cumulative sum chart based on the set of standardized, periodically aggregated scores, reference parameters, restart points, and sensitivity threshold parameters; determine an end-of-episode condition that establishes a bound on the end time of the disturbance condition; and determine the end time of the disturbance time window based on the value of the automatic-restart cumulative sum chart.
- The system according to claim 5, wherein, to apply the calibrated change point analysis, the processor is further configured to: determine a start-of-episode condition that establishes a bound on the start time of the disturbance condition; and determine the start time of the disturbance time window based on the value of the automatic-restart cumulative sum chart.
- The system according to any one of claims 1 to 6, wherein the processor is further configured to: determine a labeling coverage for each service ticket in the set of service tickets during the period, the labeling coverage including the percentage of missing labels or of labels with temporal or spatial inconsistencies; and check the data quality of the service tickets generated during the period based on the labeling coverage.
- The system according to claim 6, wherein the processor is further configured to establish a data quality check using a metric that defines coverage of the labeled events included in a set of known disturbance periods.
- The system according to claim 6, wherein, to relabel, the processor is further configured to assign an environmental disturbance identifier (ID) to the service record.
- A computer-implemented method comprising the steps of: receiving, by a hardware processor, a set of service records from a data source, the set including one or more service records corresponding to events representing multiple disturbances in a specified area occurring during a period, and one or more service records that are mislabeled or that lack a label associated with the disturbances; determining, by the hardware processor, an observed actual rate of events representing disturbances during the period; determining, by the hardware processor, a periodically aggregated baseline mean rate of expected service-related records under non-disturbance conditions for the period; determining, by the hardware processor, a set of standardized, periodically aggregated scores as a function of the periodically aggregated baseline mean rate and the observed actual event rate during the period; identifying, by the hardware processor, a disturbance time window based on changes in the standardized, periodically aggregated scores detected during the period; generating, by the hardware processor, disturbance-related probabilities for the service record corresponding to the identified disturbance time window; and relabeling a service ticket as being associated with the identified disturbance time window based on the generated disturbance-related probabilities.
- The computer-implemented method according to claim 10, wherein the step of determining the periodically aggregated baseline mean rate of expected service-related records under non-disturbance conditions for the period comprises the steps of: obtaining, by the hardware processor, a vector of random variables containing a large number of periodically aggregated rate values of the service-related records; trimming, by the hardware processor, the vector by removing a predetermined number of the highest periodically aggregated rate values and a predetermined number of the lowest periodically aggregated rate values from a predetermined period; calculating, by the hardware processor, a trimmed-mean estimate of the periodically aggregated rate by averaging the remaining data points in the vector; applying, by the hardware processor, an additional bias adjustment factor to the trimmed-mean estimate; and applying, by the hardware processor, a lower threshold so that the trimmed-mean estimate does not fall below that threshold.
- The computer-implemented method according to claim 11, wherein the step of determining the set of standardized, periodically aggregated scores comprises the steps of: determining, by the hardware processor, the difference between the periodically aggregated baseline mean rate of expected service-related tickets and the observed actual event rate; and obtaining, by the hardware processor, each standardized, periodically aggregated score by applying a standardizing normalization to the determined difference.
- The computer-implemented method according to claim 11 or 12, wherein the step of identifying the disturbance time window further comprises the step of separating, by the hardware processor, non-disturbance conditions of the period from disturbance conditions of the period by applying a calibrated change point analysis.
- The computer-implemented method according to claim 13, wherein the step of applying the calibrated change point analysis further comprises the steps of: generating, by the hardware processor, an automatic-restart cumulative sum chart based on the set of standardized, periodically aggregated scores, reference parameters, restart points, and sensitivity threshold parameters; determining an end-of-episode condition that establishes a bound on the end time of the disturbance condition; and determining, by the hardware processor, the end time of the disturbance time window based on the value of the automatic-restart cumulative sum chart.
- The computer-implemented method according to claim 14, wherein the step of applying the calibrated change point analysis further comprises the steps of: determining a start-of-episode condition that establishes a bound on the start time of the disturbance condition; and determining the start time of the disturbance time window based on the value of the automatic-restart cumulative sum chart.
- A computer program for probabilistic labeling, wherein the processor... a procedure for receiving a set of service records from a data source, the set including one or more service records corresponding to events representing multiple disturbances in a specified area occurring over a period, and one or more service records that are mislabeled or that lack a label associated with the disturbances; a procedure for determining an observed actual rate of events representing disturbances during the period; a procedure for determining a periodically aggregated baseline mean rate of expected service-related records under non-disturbance conditions for the period; a procedure for determining a set of standardized, periodically aggregated scores as a function of the periodically aggregated baseline mean rate and the observed actual event rate during the period; a procedure for identifying a disturbance time window based on changes in the standardized, periodically aggregated scores detected during the period; a procedure for generating disturbance-related probabilities for the service record corresponding to the identified disturbance time window; and a procedure for relabeling service tickets as being associated with the identified disturbance time window based on the generated disturbance-related probabilities.
- The computer program according to claim 16, wherein, to determine the periodically aggregated baseline mean rate of expected service-related records under non-disturbance conditions for the period, the computer program causes the processor to perform: a procedure for obtaining a vector of random variables containing a large number of periodically aggregated rate values from the service-related records; a procedure for trimming the vector by removing a predetermined number of the highest periodically aggregated rate values and a predetermined number of the lowest periodically aggregated rate values from a predetermined period; a procedure for calculating a trimmed-mean estimate of the periodically aggregated rate by averaging the remaining data points in the vector; a procedure for applying an additional bias adjustment factor to the trimmed-mean estimate; and a procedure for applying a lower threshold so that the trimmed-mean estimate does not fall below that threshold.
- The computer program according to claim 17, wherein, to determine the set of standardized daily scores, the computer program causes the processor to perform: a procedure for determining the difference between the periodically aggregated baseline mean rate of expected service-related tickets and the observed actual event rate; and a procedure for obtaining each standardized, periodically aggregated score by applying a standardizing normalization to the determined difference.
- The computer program according to claim 17 or 18, wherein, to identify the disturbance time window, the computer program causes the processor to perform a procedure for separating non-disturbance conditions of the period from disturbance conditions of the period by applying a calibrated change point analysis, the procedure for applying the calibrated change point analysis including: a procedure for generating an automatic-restart cumulative sum chart based on the set of standardized daily scores, reference parameters, restart points, and sensitivity threshold parameters; a procedure for determining an end-of-episode condition that establishes a bound on the end time of the disturbance condition; and a procedure for determining the end time of the disturbance time window based on the value of the automatic-restart cumulative sum chart.
- The computer program according to claim 19, wherein, to apply the calibrated change point analysis, the computer program causes the processor to perform: a procedure for determining a start-of-episode condition that establishes a bound on the start time of the disturbance condition; and a procedure for determining the start time of the disturbance time window based on the value of the automatic-restart cumulative sum chart.
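As an illustration only, the baseline and scoring steps recited in claims 2 and 3 (trim the highest and lowest rate values, average the remainder, apply a bias adjustment factor, enforce a lower threshold, then standardize the difference from the observed rate) might be sketched as below. The function names, default parameter values, the sign convention (observed minus baseline), and the choice of the square root of the baseline as the normalizing scale are all assumptions made for this sketch; the claims do not specify them.

```python
from statistics import mean


def trimmed_baseline(rates, n_trim_low=1, n_trim_high=1,
                     bias_factor=1.0, floor=0.1):
    """Trimmed-mean baseline rate with bias adjustment and a lower threshold.

    All default values are hypothetical placeholders, not values from the patent.
    """
    if len(rates) <= n_trim_low + n_trim_high:
        raise ValueError("not enough rate values left after trimming")
    ordered = sorted(rates)
    # Drop the lowest n_trim_low and the highest n_trim_high values.
    kept = ordered[n_trim_low:len(ordered) - n_trim_high]
    estimate = mean(kept) * bias_factor
    # The lower threshold keeps the estimate from collapsing toward zero.
    return max(estimate, floor)


def standardized_scores(rates, baseline):
    """Standardize the difference between observed rates and the baseline.

    Assumption: counts are Poisson-like, so sqrt(baseline) is used as the scale.
    """
    scale = baseline ** 0.5
    return [(r - baseline) / scale for r in rates]


daily_rates = [4, 4, 4, 4, 100, 0]          # made-up daily ticket counts
baseline = trimmed_baseline(daily_rates)     # trims 0 and 100, averages the rest
scores = standardized_scores(daily_rates, baseline)
```

With these made-up counts, the single extreme day stands out as a large positive score while ordinary days score zero, which is the property the later change point step relies on.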
Description
This application relates generally to computers and computer applications, and more specifically to systems and methods for generating data useful for training machine learning models and for performing predictions and forecasts. Environmental disturbances such as storms, blizzards, and electromagnetic hazards often cause asset failures or malfunctions, as well as associated power outages, leading to service quality issues. However, such failures can also occur during periods without environmental disturbances. Moreover, the data generally do not identify whether a failure was caused by a disturbance, for reasons such as limited information available when the data were compiled, time constraints, or insufficient staff training. For better business management, it is always desirable that all failures be appropriately labeled or classified, for example, as caused by a catastrophe. Examples of such disturbances are common among infrastructure companies, such as weather storms affecting the power infrastructure of electric utilities, or electromagnetic hazards affecting sensor networks deployed in the field for chemical manufacturing processes. Modern analyses depend critically on clean data for those malfunction or failure events. Therefore, events need to be labeled automatically to separate normal cases from environmental disturbance cases. From a data quality perspective, it is necessary to validate existing labels and fill in any missing labels. A general overview of a system framework for probabilistic labeling of events, and for the detection and identification of disturbances, is shown. A service ticket dataset used in data quality analysis before restoration is shown. An output of the restored and corrected dataset is shown, which includes records containing information on detected storms and also includes probabilistic labels.
A computer system for implementing a framework and methodology for improving a trouble ticket dataset according to the embodiments described herein is shown. A method implemented by a supervisory program is shown, which runs automated tasks, performed by the system in Figure 3, to label trouble ticket data with disturbances (e.g., storms) more accurately and probabilistically. A table is shown illustrating the characteristics of a ticket representing a storm in an exemplary embodiment. The overall architecture of the system shown in Figure 3 is illustrated, covering the identification of disturbances, probabilistic labeling, and the assignment of events (tickets) to previously known disturbances (known storms in specific sub-regions). A time series plot is shown of the period-aggregated SRT count rate (Y-axis) of a location-specific sub-region over time (X-axis). A table is shown summarizing the notation and exemplary default values used in the change point analysis according to the embodiments of this specification. A table is shown outlining the multiple time series used in the analysis according to the embodiments of this specification. The lifetime of a disturbance such as a storm is shown, which has a life cycle involving three states. An exemplary CUSUM control plot of calculated CUSUM values plotted against time is shown to illustrate an aspect of applying the calibrated change point algorithm used to identify the boundaries of local disturbances. An exemplary CUSUM control plot of the calculated CUSUM values in a first alternative instance is shown. An exemplary CUSUM control plot of the calculated CUSUM values in a second alternative instance is shown. An exemplary CUSUM control plot of the calculated CUSUM values in a third alternative instance is shown. Exemplary pseudocode of the calibrated change point method is shown, in which time points are acquired as shown in Figure 10A to establish the storm boundaries.
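To make the idea of a CUSUM chart with automatic restart more concrete, the following is a minimal sketch of a generic one-sided CUSUM applied to standardized scores. It is not the calibrated change point method of the embodiments, whose pseudocode is not reproduced in this text. The names `cusum_windows`, `k` (reference parameter), `h` (sensitivity threshold), and `restart` (restart level), their default values, and the specific start and end rules are assumptions: the window is taken to start just after the chart last sat at the restart level before crossing `h`, and to end at the last point before the chart returns to the restart level.

```python
def cusum_windows(scores, k=0.5, h=4.0, restart=0.0):
    """Return (start, end) index pairs of detected disturbance windows.

    Generic one-sided CUSUM with restart; all parameter defaults are hypothetical.
    """
    windows = []
    s = restart
    last_at_restart = -1   # most recent index where the chart sat at the restart level
    in_episode = False
    start = None
    for t, z in enumerate(scores):
        # Accumulate evidence above the reference value k, never dropping below restart.
        s = max(restart, s + z - k)
        if not in_episode:
            if s <= restart:
                last_at_restart = t
            elif s > h:
                # Threshold crossed: the episode began just after the last restart point.
                in_episode = True
                start = last_at_restart + 1
        elif s <= restart:
            # Chart fell back to the restart level: close the window and start afresh.
            windows.append((start, t - 1))
            in_episode = False
            last_at_restart = t
    if in_episode:
        # Episode still open at the end of the data.
        windows.append((start, len(scores) - 1))
    return windows


example_scores = [0, 0, 3, 3, 3, -5, -5, 0]   # made-up standardized scores
print(cusum_windows(example_scores))
```

Lowering `h` makes the sketch flag episodes sooner at the cost of more false alarms, while raising `k` makes it ignore small sustained elevations; a calibrated method would tune such parameters against known disturbance periods.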
A subset of the data used in the verification process is shown, specifically illustrating the categorization of tickets for a public utility company according to the embodiments of this specification. The key metrics used in the verification process according to the embodiments of this specification are shown. A flowchart is shown illustrating processes that may be executed by the processor to implement preventative measures using a machine learning model according to an embodiment. A schematic diagram is shown of an exemplary computer or processing system that may implement probabilistic labeling of service tickets in one embodiment of the present disclosure. In an embodiment, the system framework implements a method for probabilistic labeling of records or data related to the detection and identification of events (e.g., asset malfunctions, equipment failures, and power outages) and disturbances. The ability to reliably label asset malfunctions, failures, and power outages as being caused by external disturbances substantially improves data quality and opens the way to additional analytical techniques,