US-20260127702-A1 - COORDINATION AND INCREASED UTILIZATION OF GRAPHICS PROCESSORS DURING INFERENCE
Abstract
A mechanism is described for detecting, at training time, information related to one or more tasks to be performed by the one or more processors according to a training dataset for a neural network, analyzing the information to determine one or more portions of hardware of a processor of the one or more processors that is configurable to support the one or more tasks, configuring the hardware to pre-select the one or more portions to perform the one or more tasks, while other portions of the hardware remain available for other tasks, and monitoring utilization of the hardware via a hardware unit of the graphics processor and, via a scheduler of the graphics processor, adjusting allocation of the one or more tasks to the one or more portions of the hardware based on the utilization.
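As a rough illustration only (not the claimed implementation), the behavior the abstract describes — pre-selecting portions of hardware for a set of tasks, then rebalancing task allocation based on monitored utilization — can be sketched as a utilization-driven scheduler. The `HardwarePartition` and `Scheduler` names below are hypothetical stand-ins for the hardware unit and graphics-processor scheduler of the abstract.

```python
# Hypothetical sketch of utilization-driven task allocation across
# pre-selected hardware partitions; names are illustrative only.

class HardwarePartition:
    """A pre-selected portion of the hardware (e.g., a block of EUs)."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity  # max concurrent tasks this portion supports
        self.tasks = []

    @property
    def utilization(self):
        # Fraction of this portion's capacity currently in use.
        return len(self.tasks) / self.capacity


class Scheduler:
    """Assigns tasks to whichever pre-selected portion is least utilized."""

    def __init__(self, partitions):
        self.partitions = partitions

    def assign(self, task):
        # Monitor utilization and place the task on the least-loaded
        # portion, leaving other portions available for other tasks.
        target = min(self.partitions, key=lambda p: p.utilization)
        target.tasks.append(task)
        return target.name


sched = Scheduler([HardwarePartition("EU-block-0", 4),
                   HardwarePartition("EU-block-1", 2)])
placements = [sched.assign(f"inference-{i}") for i in range(3)]
print(placements)  # each task lands on the currently least-utilized block
```

A real implementation would of course read utilization from a hardware monitoring unit rather than a task count, but the balancing loop has this general shape.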
Inventors
- Abhishek R. Appu
- Liwei Ma
- Elmoustapha Ould-Ahmed-Vall
- Kamal Sinha
- Joydeep Ray
- Balaji Vembu
- Sanjeev Jahagirdar
- Vasanth Ranganathan
- Dukhwan Kim
- Altug Koker
- John C. Weast
- Mike B. MacPherson
- Linda L. Hurd
- Sara S. Baghsorkhi
- Justin E. Gottschlich
- Prasoonkumar Surti
- Chandrasekaran Sakthivel
Assignees
- Intel Corporation
Dates
- Publication Date: 2026-05-07
- Application Date: 2025-10-06
Claims (20)
- 1-20. (canceled)
- 21. A computing device comprising: sensor circuitry configured to receive data from a plurality of sensors including a camera sensor and a LiDAR sensor; and processing circuitry including a graphics processing unit, the processing circuitry configured to: extract a first feature from sensor data received from the camera sensor via a camera model and a second feature from sensor data received from the LiDAR sensor via a LiDAR model; combine the first feature and the second feature into a combined representation; and process the combined representation via a fusion neural network model to perform a computer vision task.
- 22. The computing device of claim 21, wherein the processing circuitry is configured to perform a fused object identification via the combined representation.
- 23. The computing device of claim 21, wherein the processing circuitry is configured to detect a sensor deficiency associated with at least one sensor of the plurality of sensors.
- 24. The computing device of claim 23, wherein in response to detecting the sensor deficiency, the processing circuitry is to prioritize data from remaining sensors.
- 25. The computing device of claim 21, wherein the processing circuitry is configured to utilize pre-analyzed training data to configure hardware associated with at least one sensor.
- 26. The computing device of claim 21, wherein the processing circuitry is further configured to schedule multiple inference processes on the graphics processing unit via a multi-context scheduler.
- 27. The computing device of claim 21, wherein the processing circuitry is configured to apply a filter to data received from at least one of the plurality of sensors to improve accuracy of the computer vision task.
- 28. A method for processing sensor data in a computing device, comprising: receiving, via sensor circuitry, data from a plurality of sensors including a camera sensor and a LiDAR sensor; extracting a first feature from camera sensor data using a camera model and a second feature from LiDAR sensor data using a LiDAR model; combining the first feature and the second feature into a combined representation; and processing the combined representation using a fusion neural network model executed on a graphics processing unit of the computing device to perform a computer vision task.
- 29. The method of claim 28, wherein combining the first feature and the second feature includes performing fused object identification based on the combined representation.
- 30. The method of claim 28, further comprising detecting a sensor deficiency associated with at least one sensor of the plurality of sensors.
- 31. The method of claim 30, further comprising, in response to detecting the sensor deficiency, prioritizing data from remaining sensors for processing.
- 32. The method of claim 28, further comprising utilizing pre-analyzed training data to configure hardware associated with at least one sensor prior to receiving data therefrom.
- 33. The method of claim 28, wherein processing the combined representation includes scheduling multiple inference processes on the graphics processing unit via a multi-context scheduler.
- 34. The method of claim 28, further comprising applying a filter to data received from at least one sensor to improve accuracy of the computer vision task.
- 35. A data processing system comprising: a memory device configured to store instructions; and processing circuitry including a graphics processing unit, the processing circuitry configured to: receive data from a plurality of sensors including a camera sensor and a LiDAR sensor; extract a first feature from sensor data received from the camera sensor via a camera model and a second feature from sensor data received from the LiDAR sensor via a LiDAR model; combine the first feature and the second feature into a combined representation; and process the combined representation via a fusion neural network model to perform a computer vision task.
- 36. The data processing system of claim 35, wherein the processing circuitry is configured to perform a fused object identification via the combined representation.
- 37. The data processing system of claim 35, wherein the processing circuitry is configured to detect a sensor deficiency associated with at least one sensor of the plurality of sensors.
- 38. The data processing system of claim 37, wherein in response to detecting the sensor deficiency, the processing circuitry is to prioritize data from remaining sensors.
- 39. The data processing system of claim 35, wherein the processing circuitry is configured to utilize pre-analyzed training data to configure hardware associated with at least one sensor.
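The fusion pipeline recited in claims 21, 28, and 35 — extract a feature per sensor modality, combine the features into one representation, then feed that representation to a fusion model — can be sketched as follows. This is a toy illustration under stated assumptions: `camera_model`, `lidar_model`, and `fusion_network` are placeholder functions standing in for the neural network models of the claims, not the actual implementation.

```python
# Toy sketch of the camera/LiDAR fusion pipeline from claims 21, 28, and 35.
# All functions are hypothetical placeholders for the claimed models.

def camera_model(frame):
    # Placeholder feature extractor: reduce an image (rows of pixel
    # intensities) to one mean value per row.
    return [sum(row) / len(row) for row in frame]

def lidar_model(points):
    # Placeholder feature extractor: reduce an (x, y, z) point cloud
    # to its minimum and maximum depth.
    zs = [p[2] for p in points]
    return [min(zs), max(zs)]

def fuse(camera_feat, lidar_feat):
    # "Combine the first feature and the second feature into a combined
    # representation" -- here, simple concatenation.
    return camera_feat + lidar_feat

def fusion_network(combined, weights):
    # Stand-in for the fusion neural network model: a single weighted sum
    # producing a score for the computer vision task.
    return sum(w * x for w, x in zip(weights, combined))


frame = [[1.0, 3.0], [2.0, 6.0]]               # tiny 2x2 "image"
cloud = [(0.0, 0.0, 1.5), (1.0, 0.0, 3.5)]     # tiny point cloud
combined = fuse(camera_model(frame), lidar_model(cloud))
score = fusion_network(combined, [1.0, 1.0, 0.5, 0.5])
print(combined, score)
```

The sensor-deficiency handling of claims 23-24 and 30-31 would sit in front of `fuse`: on detecting a failed or degraded sensor, the pipeline would drop (or down-weight) that modality's feature and prioritize the remaining sensors.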
Description
CROSS REFERENCE TO RELATED APPLICATIONS

The present patent application is a continuation application of U.S. application Ser. No. 18/351,898, filed Jul. 13, 2023, which is a continuation of U.S. application Ser. No. 17/871,781, filed Jul. 22, 2022, issued as U.S. Pat. No. 11,748,841 on Sep. 5, 2023, which is a continuation of U.S. Pat. No. 11,430,082, issued on Aug. 30, 2022, which is a continuation of U.S. Pat. No. 10,891,707, issued on Jan. 12, 2021, which claims priority from U.S. Pat. No. 10,304,154, issued on May 28, 2019, the contents of which are incorporated herein in their entirety by reference.

FIELD

Embodiments described herein relate generally to data processing and more particularly to a tool for facilitating coordination and increased utilization of graphics processors during inference.

BACKGROUND

Current parallel graphics data processing includes systems and methods developed to perform specific operations on graphics data such as, for example, linear interpolation, tessellation, rasterization, texture mapping, and depth testing. Traditionally, graphics processors used fixed-function computational units to process graphics data; more recently, however, portions of graphics processors have been made programmable, enabling such processors to support a wider variety of operations for processing vertex and fragment data. To further increase performance, graphics processors typically implement processing techniques such as pipelining that attempt to process, in parallel, as much graphics data as possible throughout the different parts of the graphics pipeline. Parallel graphics processors with single instruction, multiple thread (SIMT) architectures are designed to maximize the amount of parallel processing in the graphics pipeline. In an SIMT architecture, groups of parallel threads attempt to execute program instructions synchronously together as often as possible to increase processing efficiency.
A general overview of software and hardware for SIMT architectures can be found in Shane Cook, CUDA Programming, Chapter 3, pages 37-51 (2013) and/or Nicholas Wilt, CUDA Handbook, A Comprehensive Guide to GPU Programming, Sections 2.6.2 to 3.1.2 (June 2013). Machine learning has been successful at solving many kinds of tasks. The computations that arise when training and using machine learning algorithms (e.g., neural networks) lend themselves naturally to efficient parallel implementations. Accordingly, parallel processors such as general-purpose graphics processing units (GPGPUs) have played a significant role in the practical implementation of deep neural networks. The efficiency provided by parallel machine learning algorithm implementations allows the use of high-capacity networks and enables those networks to be trained on larger datasets. Conventional techniques do not provide for coordination between inference outputs and the sensors responsible for providing inputs; as a result, such conventional techniques do not provide for accuracy in inference output. Further, inference workloads typically make only light use of a graphics processor, while the rest of the graphics processor remains unutilized.

BRIEF DESCRIPTION OF THE DRAWINGS

So that the manner in which features described herein can be understood in detail, a more particular description may be had by reference to embodiments, some of which are illustrated in the appended drawings.
It is to be noted, however, that the appended drawings illustrate only typical embodiments and are therefore not to be considered limiting of their scope, for the drawings may illustrate other equally effective embodiments. Accordingly, embodiments are illustrated by way of example, and not by way of limitation. In the figures of the accompanying drawings, like reference numerals refer to similar elements.

FIG. 1 is a block diagram illustrating a computer system configured to implement one or more aspects of the embodiments described herein.

FIGS. 2A-2D illustrate parallel processor components, according to an embodiment.

FIGS. 3A-3B are block diagrams of graphics multiprocessors, according to embodiments.

FIGS. 4A-4F illustrate an exemplary architecture in which a plurality of graphics processing units are communicatively coupled to a plurality of multi-core processors.

FIG. 5 illustrates a graphics processing pipeline, according to an embodiment.

FIG. 6 illustrates a computing device hosting an inference coordination and processing utilization mechanism, according to one embodiment.

FIG. 7 illustrates an inference coordination and processing utilization mechanism, according to one embodiment.

FIG. 8A illustrates a transactio