EP-3948577-B1 - AUTOMATED MACHINE LEARNING ON THE BASIS OF STORED DATA


Inventors

WILKE, ANDREAS
KOMAROV, ILYA

Dates

Publication Date: 20260506
Application Date: 20200331

Claims (14)

1. A computer-implemented method for automated machine learning for motor vehicle monitoring, the method comprising:
• providing a pre-trained learning module (120) for machine learning,
• providing a database (104) managed by a multi-model database management system (118), wherein the database comprises a plurality of data records (108; DS1, ..., DS3) containing measurement data of a function of the motor vehicle, which are acquired by a motor vehicle computer system of the motor vehicle using sensors of the motor vehicle to acquire status data of the motor vehicle and are stored in a document-oriented data model (106), wherein the measurement data comprise: engine speed, vehicle speed, fuel consumption, exhaust emissions, transmission gear, error messages, and identifiers of electronic components of the motor vehicle, wherein the stored data records each comprise one or more field values, wherein the individual field values of the stored data records are each stored in a field (F1, ..., F8), wherein the database further comprises a searchable index (112) stored in a further data model (110), wherein the index comprises a plurality of tokens (109) generated from the field values of the stored data records, wherein each of the tokens in the index is linked to one or more pointers (115) to one or more of the data records stored in the document-oriented data model from whose field values the corresponding token was generated, wherein the pointers each provide access to the raw data relevant to the tokens, which are used to evaluate the corresponding tokens, wherein the tokens in the index are each assigned to one or more token types (111), wherein at least a subset of the corresponding token assignments (113) are classified as confirmed facts, and wherein the remaining token assignments are classified as preliminary assumptions,
• receiving an additional data record, which is a data record captured by the motor vehicle computer system of the motor vehicle using the sensors of the motor vehicle,
• storing the additional data record, which comprises one or more additional field values, by the multi-model database management system in the document-oriented data model of the database,
• generating one or more tokens from the additional field values,
• assigning each of the additional tokens to one or more token types by the learning module,
• classifying the individual token assignments of the additional tokens respectively as a confirmed fact or a preliminary assumption, wherein the classifying comprises:
∘ comparing the token assignments of the additional tokens with the index,
∘ if one of the token assignments of one of the additional tokens is included in the index, classified there as a confirmed fact, classifying the corresponding token assignment of the corresponding additional token as a confirmed fact,
∘ if one of the token assignments of one of the additional tokens is not included in the index or is included in the index, classified there as a preliminary assumption, classifying the corresponding token assignment of the corresponding additional token as a preliminary assumption,
• supplementing the index by the multi-model database management system using the additional tokens, a pointer to the additional data record stored in the document-oriented data model, and the token assignments of the additional tokens,
wherein the method further comprises reclassifying a token assignment in the index that has been classified as a preliminary assumption as a confirmed fact if the corresponding token assignment satisfies a predefined criterion, wherein the predefined criterion comprises that a plurality of token assignments, which are based on field values each located in the stored data records within a predefined distance from the field values on which the token assignment to be reclassified is based, comprise a predefined minimum proportion of token assignments classified as confirmed facts,
wherein the learning module is configured for a consistency check for malfunction detection using the measurement data of the function of the motor vehicle and for enabling predictive maintenance, wherein the malfunction detection comprises identifying a cause of failure in the form of a failed vehicle component, wherein a requirement for a successful consistency check is that a checked data record comprising the measurement values comprises exclusively confirmed facts, and wherein, in the course of the consistency check, preliminary assumptions are identified and highlighted as such and require explicit approval.
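The classification step and the consistency check of claim 1 can be sketched as follows. This is a minimal, hypothetical illustration rather than the patented implementation: the index is reduced to a Python dictionary mapping a (token, token type) pair to `True` (confirmed fact) or `False` (preliminary assumption), and the names `classify_assignments` and `consistency_check` as well as the example tokens are assumptions.

```python
# Hypothetical index: (token, token_type) -> True for a confirmed fact,
# False for a preliminary assumption.
Assignment = tuple[str, str]
Index = dict[Assignment, bool]


def classify_assignments(new_assignments: list[Assignment], index: Index) -> dict[Assignment, bool]:
    """Classify the token assignments of an additional data record.

    An assignment is a confirmed fact only if the index already holds the
    same assignment as a confirmed fact; if it is missing from the index or
    held there as a preliminary assumption, it stays preliminary.
    """
    return {a: index.get(a, False) for a in new_assignments}


def consistency_check(classified: dict[Assignment, bool]) -> tuple[bool, list[Assignment]]:
    """Pass only if the checked record contains exclusively confirmed facts.

    The preliminary assumptions are returned so that they can be highlighted
    and submitted for explicit approval.
    """
    preliminary = [a for a, confirmed in classified.items() if not confirmed]
    return not preliminary, preliminary


# Usage with made-up vehicle tokens:
index: Index = {("3000", "engine_speed"): True, ("P0301", "error_message"): False}
record = [("3000", "engine_speed"), ("P0301", "error_message"), ("ECU-7", "component_id")]
passed, flagged = consistency_check(classify_assignments(record, index))
print(passed)   # False
print(flagged)  # [('P0301', 'error_message'), ('ECU-7', 'component_id')]
```

In this sketch a missing assignment and a preliminary one are treated identically, so a newly received record cannot promote its own assignments to confirmed facts.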
2. The computer-implemented method of claim 1, wherein the supplementing of the index comprises:
• comparing the additional tokens with the index,
• if any of the additional tokens is not included in the index, adding the corresponding additional token together with its token assignments to the index and linking the corresponding additional token in the index to the pointer to the additional data record stored in the document-oriented data model,
• if any of the token assignments of an additional token that is included in the index is not included in the index, adding the corresponding token assignment with the corresponding additional token to the index and linking the corresponding additional token in the index to the pointer to the additional data record stored in the document-oriented data model,
• if one of the additional tokens is included in the index along with its token assignments, linking the corresponding additional token in the index to the pointer to the additional data record stored in the document-oriented data model.
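The three cases of claim 2 collapse into a small update routine. The sketch below is hypothetical: `TokenIndex` keeps, per token, a dictionary of token types with a confirmed/preliminary flag (compare claim 3) and a set of record pointers; the record identifiers are placeholders.

```python
from collections import defaultdict


class TokenIndex:
    """Hypothetical in-memory index: token -> {token_type: confirmed_flag}
    and token -> set of pointers to data records."""

    def __init__(self) -> None:
        self.assignments: dict[str, dict[str, bool]] = {}
        self.pointers: defaultdict[str, set[str]] = defaultdict(set)

    def supplement(self, token: str, new_assignments: dict[str, bool], record_id: str) -> None:
        known = self.assignments.get(token)
        if known is None:
            # Case 1: the token is not yet in the index, so it is added
            # together with its token assignments.
            self.assignments[token] = dict(new_assignments)
        else:
            for token_type, confirmed in new_assignments.items():
                # Case 2: the token is known but this assignment is not, so
                # the assignment is added. Case 3 (token and assignment both
                # known) adds nothing and keeps the existing classification.
                known.setdefault(token_type, confirmed)
        # In every case the token is linked to the pointer of the new record.
        self.pointers[token].add(record_id)


idx = TokenIndex()
idx.supplement("3000", {"engine_speed": True}, "DS1")
idx.supplement("3000", {"engine_speed": False, "vehicle_speed": False}, "DS2")
print(idx.assignments["3000"])  # {'engine_speed': True, 'vehicle_speed': False}
```

Because `setdefault` leaves an existing flag untouched, supplementing the index in this sketch never overwrites an earlier classification.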
3. The computer-implemented method of any of the preceding claims, wherein the token assignments of the individual tokens in the index are each provided with a flag indicating whether the corresponding token assignment of the corresponding token is a confirmed fact or a preliminary assumption.
4. The computer-implemented method of any of the preceding claims, wherein the learning module, for assigning the additional tokens to token types, determines, in each case based on the index, which token assignments for the corresponding additional token are already included in the index, and uses the token types thus determined for the assigning, and/or wherein the learning module, for assigning the additional tokens to token types, determines, in each case based on the index, all stored data records in which the corresponding additional token is included, and uses the data records thus determined for the assigning, and/or wherein the learning module, for assigning one of the additional tokens to token types, draws upon further additional tokens, determines for each of the further additional tokens, based on the index, all further stored data records in which the corresponding further additional tokens are included, and uses the further data records thus determined for the assigning.
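Claim 4 leaves open how the learning module weighs the evidence it retrieves from the index. One hypothetical heuristic, shown only as an illustration of the first two alternatives, counts the token types the index already holds for a token and, assuming the pointers resolve to (record, field) pairs as claim 5 permits, adds one vote for each field name in which the token occurs. Treating field names as candidate token types, and the name `propose_types`, are assumptions.

```python
from collections import Counter


def propose_types(
    token: str,
    assignments: dict[str, dict[str, bool]],
    pointers: dict[str, set[tuple[str, str]]],
) -> list[str]:
    """Rank candidate token types for `token` (hypothetical heuristic).

    assignments: token -> {token_type: confirmed_flag}, taken from the index.
    pointers:    token -> {(record_id, field_name)}, taken from the index.
    """
    votes: Counter[str] = Counter()
    # Evidence 1: token assignments already included in the index.
    for token_type in assignments.get(token, {}):
        votes[token_type] += 1
    # Evidence 2: the stored data records (and fields) containing the token.
    for _record_id, field_name in pointers.get(token, set()):
        votes[field_name] += 1
    # Most frequently supported type first.
    return [token_type for token_type, _ in votes.most_common()]


a = {"3000": {"engine_speed": True}}
p = {"3000": {("DS1", "engine_speed"), ("DS2", "engine_speed"), ("DS3", "vehicle_speed")}}
print(propose_types("3000", a, p))  # ['engine_speed', 'vehicle_speed']
```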
5. The computer-implemented method of any of the preceding claims, wherein a change to a token assignment classified as a confirmed fact occurs only on the basis of one or more further token assignments classified as confirmed facts, and/or wherein the pointers to which the tokens in the index are linked each point to one or more of the field values in the stored data records.
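The first alternative of claim 5, that a confirmed fact may only be changed on the basis of other confirmed facts, amounts to a guard in front of any update. A minimal sketch, assuming a dictionary index that maps (token, token type) pairs to a confirmed flag; `can_change_confirmed_fact`, `retype_confirmed` and the example tokens are hypothetical.

```python
Assignment = tuple[str, str]  # (token, token_type)


def can_change_confirmed_fact(evidence: list[Assignment], index: dict[Assignment, bool]) -> bool:
    """A confirmed fact may change only if there is at least one piece of
    evidence and every piece is itself a confirmed fact in the index."""
    return bool(evidence) and all(index.get(e, False) for e in evidence)


def retype_confirmed(index: dict[Assignment, bool], old: Assignment, new_type: str,
                     evidence: list[Assignment]) -> None:
    """Move the confirmed assignment `old` to `new_type` if the guard allows it."""
    if not index.get(old, False):
        raise ValueError(f"{old} is not a confirmed fact")
    if not can_change_confirmed_fact(evidence, index):
        raise ValueError("changing a confirmed fact requires confirmed-fact evidence")
    del index[old]
    index[(old[0], new_type)] = True


idx = {("ECU-7", "error_message"): True, ("ECU-3", "component_id"): True}
retype_confirmed(idx, ("ECU-7", "error_message"), "component_id", [("ECU-3", "component_id")])
print(idx)  # {('ECU-3', 'component_id'): True, ('ECU-7', 'component_id'): True}
```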
6. The computer-implemented method of any of the preceding claims, wherein the generating of the tokens comprises applying a tokenization logic (120) to the field values of the additional data record, which comprises a full-text indexer configured to break down text into words and output the words as tokens, or wherein the generating of the tokens comprises applying a tokenization logic to the field values of the additional data record, which comprises a generic tokenizer configured to recognize data of different data types in the field values and to generate tokens of different data types therefrom.
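Claim 6 names two tokenization strategies. The sketch below shows one hypothetical version of each: a full-text indexer that splits text into words, and a generic tokenizer that recognizes integers, decimal numbers and ISO dates and tags each token with its data type. The type labels and the recognized formats are assumptions for illustration.

```python
import re
from datetime import date


def full_text_tokenize(text: str) -> list[str]:
    """Full-text indexer: break text into words and return them as tokens."""
    return re.findall(r"\w+", text)


def generic_tokenize(value: str) -> list[tuple[object, str]]:
    """Generic tokenizer: recognize data of different types in a field value
    and emit (token, data_type) pairs."""
    tokens: list[tuple[object, str]] = []
    for part in value.split():
        if re.fullmatch(r"[+-]?\d+", part):
            tokens.append((int(part), "int"))
        elif re.fullmatch(r"[+-]?\d+\.\d+", part):
            tokens.append((float(part), "float"))
        elif re.fullmatch(r"\d{4}-\d{2}-\d{2}", part):
            try:
                tokens.append((date.fromisoformat(part), "date"))
            except ValueError:  # looks like a date but is not a valid one
                tokens.append((part, "str"))
        else:
            tokens.append((part, "str"))
    return tokens


print(full_text_tokenize("Misfire detected, cylinder 1"))
# ['Misfire', 'detected', 'cylinder', '1']
print(generic_tokenize("3000 rpm 12.5 2020-03-31"))
# [(3000, 'int'), ('rpm', 'str'), (12.5, 'float'), (datetime.date(2020, 3, 31), 'date')]
```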
7. The computer-implemented method of any of the preceding claims, wherein the field values of the additional data record comprise text data, image data, audio data, and/or video data.
8. The computer-implemented method of any of the preceding claims, wherein the method further comprises:
• receiving a search query, wherein the search query includes a search value,
• searching the index for the search value,
• identifying a token within the index that is identical to the search value,
• analyzing pointers linked to the identified token to determine one or more of the data records containing one or more field values from which the indexed token was generated,
• returning the determined data records or one or more references to the determined data records in response to the search query.
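The search of claim 8 resolves a search value to an identical indexed token and follows that token's pointers back to the stored data records. A minimal sketch with hypothetical in-memory structures: `pointers` maps a token to record identifiers, and `records` stands in for the document-oriented data store.

```python
def search(search_value: str, pointers: dict[str, set[str]],
           records: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Return the data records from whose field values the token equal to
    `search_value` was generated (empty list if no such token exists)."""
    record_ids = pointers.get(search_value)
    if record_ids is None:
        return []
    # Sorted only to make the result order deterministic.
    return [records[rid] for rid in sorted(record_ids)]


records = {
    "DS1": {"F1": "3000", "F2": "P0301"},
    "DS2": {"F1": "2500", "F2": "P0301"},
}
pointers = {"3000": {"DS1"}, "2500": {"DS2"}, "P0301": {"DS1", "DS2"}}
print(search("P0301", pointers, records))
# [{'F1': '3000', 'F2': 'P0301'}, {'F1': '2500', 'F2': 'P0301'}]
```

Returning the record identifiers instead of the records themselves would correspond to the "references to the determined data records" variant of the claim.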
9. The computer-implemented method of claim 8, wherein the search value further comprises an assignment to a token type, and the identifying of the token within the index further requires that the identified token has the same token assignment.
10. The computer-implemented method of claim 8 or 9, wherein, when searching the index, only token assignments classified as confirmed facts, and only tokens having such token assignments, are considered.
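Claims 9 and 10 narrow that search: the token must also carry the requested token assignment, and under claim 10 only assignments classified as confirmed facts count. A hypothetical sketch; `typed_search` and its `confirmed_only` switch (illustrative, toggling between plain claim 9 behavior and the claim 10 restriction) are assumptions.

```python
from typing import Optional


def typed_search(
    search_value: str,
    token_type: Optional[str],
    assignments: dict[str, dict[str, bool]],
    pointers: dict[str, set[str]],
    confirmed_only: bool = True,
) -> list[str]:
    """Return the identifiers of the matching data records.

    assignments: token -> {token_type: confirmed_flag}
    pointers:    token -> {record_id}
    """
    types = assignments.get(search_value, {})
    if confirmed_only:
        # Claim 10: ignore assignments that are only preliminary assumptions.
        types = {t: c for t, c in types.items() if c}
    if not types:
        return []
    # Claim 9: the identified token must carry the requested assignment.
    if token_type is not None and token_type not in types:
        return []
    return sorted(pointers.get(search_value, set()))


a = {"7": {"transmission_gear": True, "error_message": False}}
p = {"7": {"DS2", "DS1"}}
print(typed_search("7", "transmission_gear", a, p))  # ['DS1', 'DS2']
print(typed_search("7", "error_message", a, p))      # []
```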
11. The computer-implemented method of any of the preceding claims, wherein the method further comprises pre-training the learning module, wherein the pre-training comprises:
• providing a plurality of initial data records which are stored by the multi-model database management system in the document-oriented data model, wherein the stored initial data records each comprise one or more initial field values,
• generating a plurality of initial tokens from the initial field values,
• assigning each of the initial tokens to one or more initial token types, wherein all initial token assignments are defined as confirmed facts,
• generating the searchable index using the plurality of initial tokens by the multi-model database management system in the further data model, wherein the generated index comprises the initial tokens, wherein each of the initial tokens in the index is linked to one or more pointers to one or more of the initial data records stored in the document-oriented data model from whose initial field values the corresponding initial token was generated, and wherein the initial tokens in the index each have one or more of the token assignments defined as confirmed facts.
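A minimal sketch of the pre-training of claim 11, under the assumption (one of the options claim 12 allows) that the initial token assignments are predefined, here simply the name of the field a token comes from. Whitespace tokenization and the function name `build_initial_index` are likewise hypothetical. Every initial assignment is stored as a confirmed fact.

```python
def build_initial_index(initial_records: dict[str, dict[str, str]]):
    """Build token -> {token_type: True} and token -> {record_id} from the
    initial data records; all initial assignments are confirmed facts."""
    assignments: dict[str, dict[str, bool]] = {}
    pointers: dict[str, set[str]] = {}
    for record_id, fields in initial_records.items():
        for field_name, value in fields.items():
            for token in value.split():
                # Predefined assignment (assumption): token type = field name.
                assignments.setdefault(token, {})[field_name] = True
                # Pointer back to the initial record the token came from.
                pointers.setdefault(token, set()).add(record_id)
    return assignments, pointers


initial = {
    "DS1": {"engine_speed": "3000", "error_message": "P0301 misfire"},
    "DS2": {"engine_speed": "3000"},
}
assignments, pointers = build_initial_index(initial)
print(assignments["3000"], sorted(pointers["3000"]))
# {'engine_speed': True} ['DS1', 'DS2']
```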
12. The computer-implemented method of claim 11, wherein one or more of the initial token assignments defined as confirmed facts are provided as predefined assignments for the pre-training of the learning module, and/or wherein one or more of the initial token assignments defined as confirmed facts are determined by the learning module during the pre-training process.
13. The computer-implemented method of any of the preceding claims, wherein the index stores all tokens generated from the field values of the stored data records such that the index contains each token exactly once for each of the token assignments of the corresponding token, and/or wherein the further data model is structured such that the tokens and token assignments of the index stored in the further data model satisfy the fifth and/or sixth normal form, and/or wherein at least the document-based data model used by the multi-model database management system to store the data records is a NoSQL data model.
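The first two alternatives of claim 13 can be illustrated with a narrow relational layout in which every fact is its own row: one table of tokens, one row per (token, token type) assignment carrying its classification flag, and one row per (token, record) pointer. The sketch uses Python's built-in `sqlite3` module; the table and column names and `add_occurrence` are hypothetical, and the layout is meant only to show the "each token exactly once per assignment" property, not to demonstrate a particular normal form.

```python
import sqlite3

SCHEMA = """
CREATE TABLE token (id INTEGER PRIMARY KEY, value TEXT NOT NULL UNIQUE);
CREATE TABLE assignment (
    token_id   INTEGER NOT NULL REFERENCES token(id),
    token_type TEXT    NOT NULL,
    confirmed  INTEGER NOT NULL CHECK (confirmed IN (0, 1)),
    PRIMARY KEY (token_id, token_type)   -- one row per token and assignment
);
CREATE TABLE pointer (
    token_id  INTEGER NOT NULL REFERENCES token(id),
    record_id TEXT    NOT NULL,
    PRIMARY KEY (token_id, record_id)
);
"""


def open_index() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def add_occurrence(con: sqlite3.Connection, value: str, token_type: str,
                   confirmed: bool, record_id: str) -> None:
    """Record one token occurrence; duplicates are ignored, so the first
    classification of an assignment is kept."""
    con.execute("INSERT OR IGNORE INTO token (value) VALUES (?)", (value,))
    (token_id,) = con.execute("SELECT id FROM token WHERE value = ?", (value,)).fetchone()
    con.execute("INSERT OR IGNORE INTO assignment VALUES (?, ?, ?)",
                (token_id, token_type, int(confirmed)))
    con.execute("INSERT OR IGNORE INTO pointer VALUES (?, ?)", (token_id, record_id))


con = open_index()
add_occurrence(con, "3000", "engine_speed", True, "DS1")
add_occurrence(con, "3000", "engine_speed", True, "DS2")
print(con.execute("SELECT COUNT(*) FROM assignment").fetchone()[0])  # 1
```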
14. A motor vehicle computer system (100) for automated machine learning for motor vehicle monitoring, wherein the motor vehicle computer system comprises one or more processors (114), a database (104) provided by one or more data storage media (102), a multi-model database management system (118) that manages the database (104) and is configured to store a plurality of data records (108; DS1, ..., DS2) containing measurement data of a function of the motor vehicle in a document-oriented data model (106) in the data storage media, which are acquired by the motor vehicle computer system of the motor vehicle using sensors of the motor vehicle to acquire status data of the motor vehicle, wherein the measurement data comprise: engine speed, vehicle speed, fuel consumption, exhaust emissions, transmission gear, error messages, and identifiers of electronic components of the motor vehicle, wherein the stored data records each comprise one or more field values, wherein the individual field values of the stored data records are each stored in a field (F1, ..., F8), wherein the field values of the stored data records are each assigned to one or more field types from a plurality of different field types, a pre-trained learning module (120) for machine learning, and a program logic (116), wherein the database further comprises a searchable index (112) stored in a further data model (110), wherein the index comprises a plurality of tokens generated from the field values of the stored data records, wherein each of the tokens in the index is linked to one or more pointers (115) to one or more of the data records stored in the document-oriented data model from whose field values the corresponding token was generated, wherein the pointers each provide access to the raw data relevant to the respective tokens, which are used to evaluate the corresponding tokens, wherein the tokens in the index are each assigned to one or more token types (111), wherein at least a subset of the corresponding token assignments (113) are classified as confirmed facts and the remaining token assignments are classified as preliminary assumptions, wherein the program logic (116) is configured to execute a method for automated machine learning, wherein the method comprises:
• receiving an additional data record, which is a data record captured by the motor vehicle computer system of the motor vehicle using the sensors of the motor vehicle,
• storing the additional data record, which comprises one or more additional field values, by the multi-model database management system in the document-oriented data model of the database,
• generating one or more tokens from the additional field values,
• assigning each of the additional tokens to one or more token types by the learning module,
• classifying the individual token assignments of the additional tokens respectively as a confirmed fact or a preliminary assumption, wherein the classifying comprises:
∘ comparing the token assignments of the additional tokens with the index,
∘ if one of the token assignments of one of the additional tokens is included in the index, classified there as a confirmed fact, classifying the corresponding token assignment of the corresponding additional token as a confirmed fact,
∘ if one of the token assignments of one of the additional tokens is not included in the index or is included in the index, classified there as a preliminary assumption, classifying the corresponding token assignment of the corresponding additional token as a preliminary assumption,
• supplementing the index by the multi-model database management system using the additional tokens, a pointer to the additional data record stored in the document-oriented data model, and the token assignments of the additional tokens,
wherein the method further comprises reclassifying a token assignment in the index that has been classified as a preliminary assumption as a confirmed fact if the corresponding token assignment satisfies a predefined criterion, wherein the predefined criterion comprises that a plurality of token assignments, which are based on field values each located in the stored data records within a predefined distance from the field values on which the token assignment to be reclassified is based, comprise a predefined minimum proportion of token assignments classified as confirmed facts,
wherein the learning module is configured for a consistency check for malfunction detection using the measurement data of the function of the motor vehicle and for enabling predictive maintenance, wherein the malfunction detection comprises identifying a cause of failure in the form of a failed vehicle component, wherein a requirement for a successful consistency check is that a checked data record containing the measurement values comprises exclusively confirmed facts, and wherein, in the course of the consistency check, preliminary assumptions are identified and highlighted as such and require explicit approval.
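Claims 1 and 14 both reclassify a preliminary assumption as a confirmed fact when enough nearby token assignments are confirmed. The claims do not say how the distance is measured, so the sketch below assumes, purely for illustration, that it is the difference in field position (F1, F2, ...) within one data record, that "a plurality" means at least two neighboring assignments, and that a single pass evaluates all positions against the flags as they stood before the pass. `meets_criterion`, `reclassify` and the thresholds are hypothetical.

```python
def meets_criterion(flags: list[bool], position: int, max_distance: int,
                    min_proportion: float, min_neighbors: int = 2) -> bool:
    """Check the reclassification criterion for the assignment at `position`.

    flags[i] is True if the token assignment based on field i of the record
    is a confirmed fact and False if it is a preliminary assumption.
    """
    lo = max(0, position - max_distance)
    hi = min(len(flags), position + max_distance + 1)
    neighbors = [flags[i] for i in range(lo, hi) if i != position]
    if len(neighbors) < min_neighbors:
        return False
    # Proportion of confirmed facts among the nearby assignments.
    return sum(neighbors) / len(neighbors) >= min_proportion


def reclassify(flags: list[bool], max_distance: int, min_proportion: float) -> list[bool]:
    """Promote every preliminary assumption that meets the criterion;
    confirmed facts are never demoted."""
    return [f or meets_criterion(flags, i, max_distance, min_proportion)
            for i, f in enumerate(flags)]


print(reclassify([True, True, False, True, False, False], max_distance=1, min_proportion=0.75))
# [True, True, True, True, False, False]
```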

Description

The invention relates to a method and a computer system for automated machine learning. Methods and systems for machine learning are known from the prior art. Such systems learn from examples and, once the learning phase is complete, can generalize from these examples and apply them to previously unknown data. The underlying examples are not memorized; instead, patterns and regularities are identified within the examples that serve as training data. This enables such systems, in the course of a learning transfer, to draw on the learned patterns and regularities to assess previously unknown data. Because of the data storage structures they use, known machine learning methods and systems generally do not operate on the entire available data set. Instead, a selection of examples is made, and the system is trained on these in the learning phase. The patterns and regularities captured from this limited selection during learning are then successively applied to parts of the remaining data set or to newly acquired data, with all data being treated as equivalent. DE 10 2016 226338 A1 describes a computer-implemented method for data classification.
The method comprises providing a token set containing tokens that were generated by tokenization from multiple field values of multiple data records, wherein the tokens were generated from field values of at least two different field types and are stored in the form of a bit sequence; analyzing one or more features of the tokens at the bit-sequence level in order to identify subsets of feature-similar tokens, wherein the features comprise the bit sequence of the tokens and/or the length of the bit sequence; and storing a copy of each of the subsets of feature-similar tokens in a form separated by subset, wherein each subset copy represents one class of feature-similar data. DE 196 27 472 A1 describes a method for performing operations in a database system in which a plurality of data records are stored in a memory of a computer, wherein each data record consists of an arbitrary number of fields, each consisting of a field description as metadata and an arbitrary number of field contents, and wherein, whenever a data record is stored in a memory of a computer, the field contents are stored together with the associated metadata as one data record. DE 10 2010 043265 A1 describes a computer-implemented method for indexing data for use by multiple applications, which comprises receiving a data object in a first application of the multiple applications. The data object is tokenized in a common form in order to extract tokens from the data object and to generate an index of the extracted tokens, wherein the index is formatted so that it can be used by each of the multiple applications. The index is then stored in a database that is accessible to the multiple applications, which comprise two or more application types.
DE 10 2017 208084 A1 describes a first data processing system with a first DBMS and a first real-time clock. The first DBMS comprises a first database with a plurality of data records. The data records each contain multiple field values and timestamps stored linked to them. The first DBMS is configured for: receiving a first write command for changing a field value of one of the data records of the first database; in response to receiving the first write command: receiving a current time from the first real-time clock, and storing in the first database a copy of the data record whose field value is to be changed by the write command, wherein the data record copy has a field value changed according to the first write command in place of the previous field value, wherein the changed field value is stored linked to a timestamp indicating the current time received from the first real-time clock, and wherein one or more of the other field values of the data record copy are each stored linked to a timestamp indicating a different time, or all other field values of the data record copy are each stored linked to a timestamp indicating the time determined by the first real-time clock; wherein the first DBMS includes an import interface designed to import into the first database further data records, each consisting of multiple field values and timestamps linked to those field values. The invention is based on the object of providing an improved method for automated machine learning. The object underlying the invention is achieved in each case by the