WO-2026092927-A1 - METHOD FOR PERSISTING DATA
Abstract
A method for persisting data for assessing a technical installation is proposed. This involves a plurality of data sets being recorded, each of which comprises a plurality of measured variables, each measured variable comprising a plurality of measured values. The recorded data sets are stored in a data memory. The recorded data sets are clustered, data sets that have a relatively high degree of similarity to one another being identified as belonging to a common cluster. At least one data set from each identified cluster is selected, and at least one data set from at least one identified cluster is deselected. All measured values in the selected data sets are retained in the data memory, and all measured values in the deselected data sets are erased from the data memory.
Inventors
- Köhler, Stefan
Assignees
- SEW-EURODRIVE GMBH & CO KG
Dates
- Publication Date
- 20260507
- Application Date
- 20250925
- Priority Date
- 20241031
Claims (15)
- 1. A method for persisting data for the evaluation of a technical system, wherein a plurality of data records (10) are recorded, each comprising a plurality of measured variables, wherein each measured variable comprises a plurality of measured values, and wherein the recorded data records (10) are stored in a data storage device (20), characterized in that a clustering of the recorded data records (10) is carried out, wherein data records (10) which exhibit a relatively high degree of similarity to each other are recognized as belonging to a common cluster, and that at least one data record (10) is selected from each recognized cluster, and that at least one data record (10) is deselected from at least one recognized cluster, and that all measured values of the selected data records (10) are retained in the data storage device (20), and that all measured values of the deselected data records (10) are deleted from the data storage device (20).
- 2. Method according to one of the preceding claims, characterized in that a dimensionality reduction of the recorded data sets (10) is carried out before the clustering of the recorded data sets (10) is carried out.
- 3. Method according to one of the preceding claims, characterized in that data records (10) which do not exhibit a sufficiently high degree of similarity to any of the other data records (10) are selected.
- 4. Method according to one of the preceding claims, characterized in that the data records (10) are recorded during a defined period of time and stored in the data storage (20), and that the clustering of the recorded data records (10) is carried out after the defined period of time has elapsed.
- 5. Method according to one of the preceding claims, characterized in that a compressed representation of the deselected data sets (10) is calculated from the measured values of the deselected data sets (10) and stored in the data storage (20).
- 6. Method according to one of the preceding claims, characterized in that a compressed representation of the selected data sets (10) is calculated from the measured values of the selected data sets (10) and stored in the data storage (20).
- 7. Method according to one of claims 5 to 6, characterized in that the compressed representation of the data sets (10) is calculated by lossy compression of the measured values of the data sets (10), or that the compressed representation of the data sets (10) is calculated using a machine learning model trained on the measured values of the data sets (10), or that the compressed representation of the data sets (10) is calculated by line simplification of the measured values of the data sets (10).
- 8. Method according to one of the preceding claims, characterized in that aggregated values are calculated from the measured values of the deselected data records (10) before their deletion, and that the aggregated values of the deselected data records (10) are stored in the data storage (20).
- 9. Method according to one of the preceding claims, characterized in that aggregated values are calculated from the measured values of the selected data sets (10), and that the aggregated values of the selected data sets (10) are stored in the data storage (20).
- 10. Method according to one of the preceding claims, characterized in that exactly one data record (10) is selected from at least one detected cluster, and that a number of data records (10) belonging to the cluster is determined, and that the determined number of data records (10) belonging to the cluster is assigned to the selected data record (10) and stored in the data storage (20).
- 11. Method according to one of the preceding claims, characterized in that exactly one data record (10) is selected from at least one identified cluster, and that a variance of the data records (10) belonging to the cluster is determined, and that the determined variance of the data records (10) belonging to the cluster is assigned to the selected data record (10) and stored in the data storage (20).
- 12. Method according to one of the preceding claims, characterized in that several data sets (10) are selected from at least one detected cluster, each of which is located at an edge of the respective cluster.
- 13. Method according to one of the preceding claims, characterized in that several data records (10) are selected from at least one detected cluster, wherein distances between the selected data records (10) are maximally large.
- 14. Method according to one of the preceding claims, characterized in that several data records (10) are selected from at least one detected cluster, wherein a clustering of the data records (10) of the cluster is carried out, wherein data records (10) which exhibit a relatively high similarity to each other are recognized as belonging to a common subcluster, and at least one data record (10) is selected from each recognized subcluster.
- 15. A method according to any of the preceding claims, characterized in that several data records (10) are selected from at least one identified cluster, wherein the diversity of the selected data records (10) is maximized, and wherein the diversity of the data records (10) is calculated on the basis of a probability by using sampling via a determinantal point process.
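The core steps of claims 1, 3, 10 and 11 can be illustrated with a minimal sketch. It is not the applicant's implementation: the claims do not prescribe a clustering algorithm, so a simple distance-threshold ("leader") clustering is assumed here, and each data record is assumed to have already been summarised as a numeric feature vector. The names `leader_clustering`, `persist`, `store`, `vectors` and `threshold` are hypothetical and chosen only for this illustration.

```python
import math
from statistics import fmean


def leader_clustering(vectors, threshold):
    """Group record ids by similarity (assumed technique: leader clustering).

    A record joins the first cluster whose leader lies within `threshold`
    (Euclidean distance); otherwise it starts a new cluster.
    """
    clusters = []  # list of (leader_vector, [record ids])
    for rid, vec in vectors.items():
        for leader, members in clusters:
            if math.dist(leader, vec) <= threshold:
                members.append(rid)
                break
        else:
            clusters.append((vec, [rid]))
    return [members for _, members in clusters]


def persist(store, vectors, threshold):
    """Retain one record per cluster in `store` and erase the others.

    `store` maps record id -> raw measured values (hypothetical layout).
    Returns cluster size and variance, keyed by the retained record id.
    """
    metadata = {}
    for members in leader_clustering(vectors, threshold):
        dims = len(vectors[members[0]])
        centroid = [fmean([vectors[r][d] for r in members]) for d in range(dims)]
        # Select the record closest to the cluster centroid.
        keep = min(members, key=lambda r: math.dist(vectors[r], centroid))
        # Variance taken as the mean squared distance to the centroid.
        variance = fmean([math.dist(vectors[r], centroid) ** 2 for r in members])
        metadata[keep] = {"count": len(members), "variance": variance}
        for r in members:
            if r != keep:
                del store[r]  # erase all measured values of deselected records
    return metadata


# Hypothetical example: three similar records and one dissimilar record.
vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [2.0, 0.0], "d": [50.0, 50.0]}
store = {rid: {"current": [1.0, 1.1, 0.9]} for rid in vectors}
metadata = persist(store, vectors, threshold=2.5)
print(sorted(store))  # records that remain in the data storage
```

In this example the record "d" is not similar to any other record, so it forms a cluster of its own and is retained, which corresponds to the behaviour described in claim 3. The count and variance stored for each retained record correspond to claims 10 and 11.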
Description
Method for persisting data

The invention relates to a method for persisting data for the evaluation of a technical system, wherein a plurality of data sets are recorded, each comprising a plurality of measured variables, wherein each measured variable comprises a plurality of measured values, and wherein the recorded data sets are stored in a data storage device.

Technical systems, such as drive systems, comprise a multitude of technical components, including an electric motor, a converter or inverter to generate three-phase alternating current for the electric motor, and a gearbox to reduce the motor's speed. Examples of such technical systems include rotary tables, conveyor belts, and stacker cranes. After extended operation, the components of these systems can malfunction due to wear and tear.

It is known to monitor such technical systems by recording and evaluating specific measurements at set times. If the recorded measurements deviate significantly from predefined target values, a fault in the system is assumed, and a corresponding message is sent to the operator. For example, a current generated by the converter or inverter that exceeds a defined limit may indicate stiffness in the gearbox due to wear.

Condition monitoring involves collecting a relatively large amount of data from a technical system for later analysis. This data consumes storage space and incurs costs. A proven strategy to reduce costs is to aggregate the raw data directly and store only the aggregated data. However, in certain cases, historical raw data is also relevant, for example, for training machine learning models or for visualizing the raw data to identify anomalies in the event of a fault.

Typical procedures are known from the following documents:
- PAN, Feng, et al. Finding representative set from massive data. In: Fifth IEEE International Conference on Data Mining (ICDM'05). IEEE, 2005. 8 pp.
- Autoencoder. In: Wikipedia, The Free Encyclopedia. URL: https://de.wikipedia.org/w/index.php?title=Autoencoder&oldid=238636189.
- JOSIGER, Marcus; KIRCHNER, Kathrin. Modern clustering algorithms: a comparative analysis on two-dimensional data. In: Proc. FGML Workshop (FGML 2003), 2003. p. SO- 84.
- Data compression. In: Wikipedia, The Free Encyclopedia. URL: https://de.wikipedia.org/w/index.php?title=Data Compression&oldid=240585389.
- LIU, Tong, et al. High-ratio lossy compression: Exploring the autoencoder to compress scientific data. IEEE Transactions on Big Data, 2021, Volume 9, No. 1, pp. 22-36.
- Envelope analysis. Last updated: February 27, 2024. URL: https://sensemore.io/de/hullkurvenanalyse/ archived in: https://web.archive.org/web/20240227231426/https://sensemore.io/de/hullkurvenanalyse/ on February 27, 2024.
- K-means algorithm. In: Wikipedia, The Free Encyclopedia. Revision as of 9. URL: https://de.wikipedia.org/w/index.php?title=K-MeansAlgorithm&oldid=225216308.
- Determinantal point process. In: Wikipedia, The Free Encyclopedia. Revision as of 26 May 2023. URL: https://de.wikipedia.org/w/index.php?title=Determinantal_point_process&oldid=234045844.

The object of the invention is to further develop a method for persisting data. This object is achieved by a method for persisting data with the features specified in claim 1. Advantageous embodiments and further developments are the subject of the dependent claims.

A method for persisting data for the evaluation of a technical system is proposed. This involves recording multiple data sets, each comprising multiple measured variables, and each measured variable comprising multiple measured values. The recorded data sets are stored in a data storage device. The recorded data sets are then clustered, with data sets exhibiting a relatively high degree of similarity to one another being identified as belonging to a common cluster. At least one data set is selected from each identified cluster, and at least one data set is deselected from at least one identified cluster. All measured values of the selected data sets are retained in the data storage device, and all measured values of the deselected data sets are deleted from the data storage device.

The method according to the invention is particularly suitable for persisting data for the evaluation of a technical system. It allows relevant historical measured values to be persisted while redundant or less relevant data are discarded. This results in a significant reduction in the amount of data to be stored, with virtually no loss of information.

According to an advantageous embodiment of the invention, a dimensionality reduction of the recorded data sets is carried out before the clustering of the recorded data sets is performed. The dimensionality reduction is performed, for example, using principal component analysis. Principal component analysis is a mathematical method for approximating a large number of statistical variables by using a sm