US-12626180-B2 - Computing device, method and computer program product for generating training data for a machine learning system

Abstract

A computing device, method and computer program product are provided to generate training data for a machine learning system, including training data representative of one or more edge scenes. In the context of a computing device, the computing device includes a simulator configured in accordance with a sampling algorithm to create a plurality of different scenes, including one or more edge scenes, within a scenario that is at least partially defined by one or more parametric attributes. The computing device also includes a physics engine configured to generate training data representative of the plurality of different scenes including the one or more edge scenes. The physics engine is configured to modify the one or more parametric attributes to generate additional and different training data based upon another plurality of different scenes created by the simulator within another scenario that is at least partially defined by one or more parametric attributes, as modified.
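
The sampling flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `ParametricAttribute`, the functions `create_scenes` and `modify_weather`, the attribute names, the clamped Gaussian distribution and the two-sigma edge threshold are all assumptions chosen for illustration. Each scene draws one value per parametric attribute from that attribute's probability distribution, a scene is flagged as an edge scene when any value is improbable, and a second scenario is produced by altering a weather attribute.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ParametricAttribute:
    """Hypothetical parametric attribute with a clamped Gaussian distribution."""
    name: str
    mean: float
    std: float
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        # Draw from a Gaussian, then clamp the value to the allowed range.
        return min(self.high, max(self.low, rng.gauss(self.mean, self.std)))


def create_scenes(attributes, n_scenes, rng, edge_sigma=2.0):
    """Sample n_scenes scenes; flag a scene as an edge scene when any value
    lies more than edge_sigma standard deviations from its mean (an assumed
    stand-in for 'low probability of occurrence')."""
    scenes = []
    for _ in range(n_scenes):
        values = {a.name: a.sample(rng) for a in attributes}
        is_edge = any(
            abs(values[a.name] - a.mean) > edge_sigma * a.std for a in attributes
        )
        scenes.append({"values": values, "edge": is_edge})
    return scenes


def modify_weather(attributes, name, new_mean):
    """Return a new attribute list in which the named (weather) attribute has
    a shifted mean, which defines another scenario."""
    return [replace(a, mean=new_mean) if a.name == name else a for a in attributes]


# Hypothetical scenario: attribute names and numbers are illustrative only.
scenario = [
    ParametricAttribute("vehicle_speed_mps", 25.0, 5.0, 0.0, 60.0),
    ParametricAttribute("fog_density", 0.1, 0.1, 0.0, 1.0),
]
scenes = create_scenes(scenario, 100, random.Random(0))

# Alter the weather characteristic and sample another plurality of scenes.
foggy_scenario = modify_weather(scenario, "fog_density", 0.7)
more_scenes = create_scenes(foggy_scenario, 100, random.Random(1))
```

In this sketch the sampled values would be handed to a rendering or physics component to produce the actual training data; that step is outside the scope of the example.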

Inventors

  • Patrick Daniel DEES
  • Helen Amelia HAWKINS
  • Taylor S. LOPER
  • Grant A. ROSARIO

Assignees

  • THE BOEING COMPANY

Dates

  • Publication Date: 2026-05-12
  • Application Date: 2019-11-01

Claims (18)

  1. A computing device configured to generate artificial training data for a machine learning system, the computing device comprising: a simulator configured in accordance with a sampling algorithm to create a plurality of different scenes within a scenario that is at least partially defined by one or more parametric attributes, wherein the plurality of different scenes created by the simulator include one or more edge scenes, and wherein the one or more parametric attributes define respective conditions of an environment so as to at least partially define the scenario within which the plurality of different scenes are created; and a physics engine configured to receive values of the one or more parametric attributes from the simulator and generate artificial training data representative of the plurality of different scenes within the scenario, including the one or more edge scenes, wherein objects represented by the artificial training data are configured to be labeled by one or more individuals to facilitate training of the machine learning system with labeled training data, and wherein the one or more edge scenes comprise an edge scene that is not reflected by real world data and that is based on photo-realistic models in association with the edge scene not being reflected by the real world data, wherein the physics engine is configured to modify the one or more parametric attributes, by altering a weather characteristic of the one or more parametric attributes, in order to generate additional and different artificial training data based upon another plurality of different scenes created by the simulator within another scenario that is at least partially defined by the one or more modified parametric attributes, wherein the physics engine is configured to modify the one or more parametric attributes by random sampling of a value for a respective parametric attribute of the scenario in accordance with a probability distribution of the respective parametric attribute, and wherein the random sampling of the value is based on a Box-Behnken algorithm or a Face-Centered Cubic algorithm.
  2. The computing device according to claim 1, wherein the physics engine is configured to generate artificial training data representative of the scenario based on the photo-realistic models to ensure accuracy of textures and physical dimensions.
  3. The computing device according to claim 1, wherein the one or more parametric attributes comprise one or more of: a speed of a vehicle in the environment, a heading of the vehicle, a location of the vehicle, a location, speed and heading of other nearby vehicles, a location and direction of travel of one or more nearby pedestrians, or characteristics of a roadway along which the vehicle is traveling including a curvature of the roadway.
  4. The computing device according to claim 1, wherein the edge scene is not reflected by the real world data based on a low probability of an occurrence of the edge scene in the real world data.
  5. The computing device according to claim 1, wherein the artificial training data representative of the plurality of different scenes within the scenario comprises artificial training data representative of flight of an aircraft through an environment.
  6. The computing device according to claim 1, wherein the physics engine configured to modify the one or more parametric attributes is configured to create another plurality of different scenes within the environment defined by a respective sample brought about by transitioning from one sample generated by the simulator to another sample.
  7. The computing device according to claim 1, wherein the edge scene relates to detection of an oncoming weapon.
  8. A method for generating artificial training data for a machine learning system, the method comprising: performing, by a simulator, a simulation in accordance with a sampling algorithm to create a plurality of different scenes within a scenario that is at least partially defined by one or more parametric attributes, wherein performing the simulation comprises creating one or more edge scenes, and wherein the one or more parametric attributes define respective conditions of an environment so as to at least partially define the scenario within which the plurality of different scenes are created; receiving, at a physics engine and from the simulator, values of the one or more parametric attributes; generating, by the physics engine, artificial training data representative of the plurality of different scenes within the scenario, including the one or more edge scenes, wherein objects represented by the artificial training data are configured to be labeled by one or more individuals to facilitate training of the machine learning system with labeled training data, and wherein the one or more edge scenes comprise an edge scene that is not reflected by real world data and that is based on photo-realistic models in association with the edge scene not being reflected by the real world data; and modifying, by the physics engine, the one or more parametric attributes, by altering a weather characteristic of the one or more parametric attributes, in order to generate additional and different artificial training data based upon another plurality of different scenes created by the simulation within another scenario that is at least partially defined by the one or more modified parametric attributes, wherein modifying the one or more parametric attributes comprises random sampling of a value for a respective parametric attribute of the scenario in accordance with a probability distribution of the respective parametric attribute, and wherein the random sampling of the value is based on a Box-Behnken algorithm or a Face-Centered Cubic algorithm.
  9. The method according to claim 8, wherein generating the artificial training data comprises generating artificial training data representative of the scenario based on the photo-realistic models to ensure accuracy of textures and physical dimensions.
  10. The method according to claim 8, further comprising training the machine learning system with the artificial training data representative of the plurality of different scenes within the scenario including the one or more edge scenes.
  11. The method according to claim 8, wherein the parametric attributes comprise one or more of properties of a camera, properties of a speed sensor, or properties of a heading sensor.
  12. The method according to claim 8, wherein the one or more parametric attributes comprise one or more of: a speed of a vehicle in the environment, a heading of the vehicle, a location of the vehicle, a location, speed and heading of other nearby vehicles, a location and direction of travel of one or more nearby pedestrians, or characteristics of a roadway along which the vehicle is traveling including a curvature of the roadway.
  13. A computer program product configured to generate artificial training data for a machine learning system, the computer program product comprising a non-transitory computer readable medium having program code stored thereon, the program code comprising program code instructions configured, upon execution, to: perform, by a simulator, a simulation in accordance with a sampling algorithm to create a plurality of different scenes within a scenario that is at least partially defined by one or more parametric attributes, wherein the simulation is performed so as to create one or more edge scenes, and wherein the one or more parametric attributes define respective conditions of an environment so as to at least partially define the scenario within which the plurality of different scenes are created; receive, at a physics engine and from the simulator, values of the one or more parametric attributes; generate, by the physics engine, artificial training data representative of the plurality of different scenes within the scenario, including the one or more edge scenes, wherein objects represented by the artificial training data are configured to be labeled by one or more individuals to facilitate training of the machine learning system with labeled training data, and wherein the one or more edge scenes comprise an edge scene that is not reflected by real world data and that is based on photo-realistic models in association with the edge scene not being reflected by the real world data; and modify, by the physics engine, the one or more parametric attributes, by altering a weather characteristic of the one or more parametric attributes, in order to generate additional and different artificial training data based upon another plurality of different scenes created by the simulation within another scenario that is at least partially defined by the one or more modified parametric attributes, wherein the program code instructions configured to modify the one or more parametric attributes comprise program code instructions configured to randomly sample a value for a respective parametric attribute of the scenario in accordance with a probability distribution of the respective parametric attribute, and wherein random sampling of the value is based on a Box-Behnken algorithm or a Face-Centered Cubic algorithm.
  14. The computer program product according to claim 13, wherein the program code instructions configured to generate the artificial training data comprise program code instructions configured to generate artificial training data representative of the scenario based upon real world data.
  15. The computer program product according to claim 13, wherein the program code instructions configured to generate the artificial training data comprise program code instructions configured to generate artificial training data representative of the scenario based on the photo-realistic models to ensure accuracy of textures and physical dimensions.
  16. The computer program product according to claim 13, wherein the one or more parametric attributes define one or more properties of one or more sensors.
  17. The computer program product according to claim 13, wherein the program code further comprises program code instructions configured to train the machine learning system with the artificial training data representative of the plurality of different scenes within the scenario including the one or more edge scenes.
  18. The computer program product according to claim 13, wherein the parametric attributes comprise one or more of properties of a camera, properties of a speed sensor, or properties of a heading sensor.
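
The independent claims name a Box-Behnken algorithm and a Face-Centered Cubic algorithm as the basis for sampling attribute values. The Python sketch below enumerates the coded design points of both under stated assumptions: the Box-Behnken function uses the pairwise construction, which matches the standard design for three to five factors, and "Face-Centered Cubic" is assumed here to mean a face-centered central composite design. The function names, attribute names and ranges are hypothetical, and the claims do not say how the design points are combined with each attribute's probability distribution, so that step is left out.

```python
from itertools import combinations, product


def box_behnken(k, center=1):
    """Coded Box-Behnken points (levels -1, 0, +1) via the pairwise
    construction: each pair of factors takes all four (+/-1, +/-1)
    combinations while the remaining factors stay at 0."""
    if k < 3:
        raise ValueError("Box-Behnken designs need at least 3 factors")
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            row = [0] * k
            row[i], row[j] = a, b
            runs.append(tuple(row))
    runs.extend([(0,) * k] * center)
    return runs


def face_centered_ccd(k, center=1):
    """Coded face-centered central composite points: the 2**k cube corners,
    the 2*k face centers, and the center point(s)."""
    corners = list(product((-1, 1), repeat=k))
    faces = []
    for i in range(k):
        for a in (-1, 1):
            row = [0] * k
            row[i] = a
            faces.append(tuple(row))
    return corners + faces + [(0,) * k] * center


def decode(run, ranges):
    """Map coded levels to physical values: -1 is the low bound, 0 the
    midpoint and +1 the high bound of each attribute's range."""
    return {
        name: lo + (c + 1) / 2 * (hi - lo)
        for c, (name, (lo, hi)) in zip(run, ranges.items())
    }


# Hypothetical attribute ranges, for illustration only.
ranges = {
    "vehicle_speed_mps": (0.0, 40.0),
    "fog_density": (0.0, 1.0),
    "road_curvature_per_m": (0.0, 0.1),
}
bbd_scenes = [decode(run, ranges) for run in box_behnken(len(ranges))]
ccf_scenes = [decode(run, ranges) for run in face_centered_ccd(len(ranges))]
```

For three attributes, the Box-Behnken sketch yields 13 scenes (12 edge midpoints plus one center point) and never places all attributes at an extreme simultaneously, whereas the face-centered design yields 15 scenes and includes the cube corners where every attribute is at a bound.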

Description

TECHNOLOGICAL FIELD

The present disclosure relates generally to a computing device, a method and computer program product configured to generate training data for a machine learning system and, more particularly, to a computing device, a method and a computer program product for generating training data for a machine learning system with the training data being representative of a plurality of different scenes within a scenario including one or more edge scenes.

BACKGROUND

Machine learning is utilized in a number of applications with many more applications anticipated to be reliant upon machine learning in the future. For example, applications that utilize machine learning algorithms include applications configured to predict customer purchases, applications configured to identify objects in a scene and applications configured to protect against cyber-attacks, to name but a few.

Machine learning systems must be trained in order to perform in an acceptable manner. As the minimum quantity of data required to train a machine learning algorithm to perform acceptably is unclear, many machine learning algorithms are trained upon a large quantity of data in order to increase the likelihood that the machine learning system will perform acceptably. As such, large quantities of data representative of the various scenarios that the machine learning system will encounter are required in order to train the machine learning systems.

A number of resources are available that provide data sets that may be used to train machine learning systems for various applications. For example, resources are available that provide data sets to train machine learning systems for object classification applications, for applications that must recognize humans in various poses and for applications that interact with complex urban environments. Additionally, data sets are available that contain sensor data to facilitate the development of autonomous vehicles.
However, each of these data sets is static. As such, the data sets are useful for training purposes so long as every scenario for which the machine learning system is to be trained is included in the data set, but the machine learning system will not be trained on any scenario that is not included within the data set. Thus, an application incorporating or otherwise dependent upon a machine learning system that has been trained with a static data set may be unable to identify or appropriately react to any such scenario that was not included in the training data set.

The training data that is available for certain applications may be based upon real-world data. However, tools, such as simulators, have also been developed to generate artificial data sets for the training of machine learning systems. For example, the Car Learning to Act (CARLA) open-source simulator is configured to create artificial data sets for autonomous driving research; the AirSim open-source simulator from Microsoft, of Redmond, WA, USA, is configured to create artificial data sets for autonomous vehicles including drones and automobiles; and the SynCity tool provided by CVEDIA PVE Ltd., of Singapore, is configured to create artificial data sets for other machine learning tasks. The artificial data sets created by these and other tools generally rely upon simulated input which, in at least some instances, may not be as detailed as the real-world data. This reduction in detail may be disadvantageous for machine learning systems that support certain applications, such as computer vision applications, that rely upon, and may make decisions dependent upon, the analysis and/or identification of a fine level of detail.
As a result, machine learning systems that have been trained utilizing artificial data, such as the artificial data generated by open-source simulators and similar tools, may not perform as well as corresponding machine learning systems that have been trained based upon real-world data. As such, reliance upon real-world data for the training of machine learning systems may be advantageous, but such real-world data may not be available for all scenarios that an application that includes or is otherwise reliant upon a machine learning system may encounter, such as scenarios that seldom occur, e.g., scenarios that may be dangerous, illegal or otherwise have a low probability of occurrence.

BRIEF SUMMARY

A computing device, method and computer program product are provided in accordance with an example in order to generate training data for a machine learning system. The computing device, method and computer program product are configured to generate training data representative of a plurality of different scenes within a scenario including one or more edge scenes, that is, those scenes that may not be represented by real-world data since the scenes may represent behavior that is dangerous or illegal or that otherwise have a low probability of occurrence, but for which