KR-20260068063-A - Multi-resolution field representations in neural networks
Abstract
Certain aspects provide techniques and apparatuses for efficiently processing inputs in a neural network using multiple receptive field sizes. An exemplary method includes the step of partitioning a first input into a first set of channels and a second set of channels. In a first layer of the neural network, the first set of channels and the second set of channels are convolved into a first output having a dimensionality smaller than that of the first input. The first set of channels and the first output are concatenated into a second input. The second input is convolved into a second output through a second layer of the neural network, wherein the second output merges the first receptive field generated by the first layer with a larger second receptive field generated by the second layer. One or more operations are taken based on at least one of the first output and the second output.
Inventors
- Bhardwaj, Kartikeya
- Zappi, Piero
- Whatmough, Paul Nicholas
- Lott, Christopher
- Ganapathy, Viswanath
- Patel, Chirag Sureshbhai
- Soriaga, Joseph Binamira
Assignees
- Qualcomm Incorporated
Dates
- Publication Date
- 20260513
- Application Date
- 20240725
- Priority Date
- 20230915
Claims (20)
- A processing system, comprising: at least one memory storing executable instructions; and one or more processors communicatively coupled to the at least one memory and configured to execute the executable instructions to cause the processing system to: partition a first input into a first set of channels and a second set of channels; convolve, in a first layer of a neural network, the first set of channels and the second set of channels into a first output having a dimensionality smaller than that of the first input; concatenate the first set of channels and the first output as a second input to a second layer of the neural network; convolve the second input into a second output through the second layer of the neural network, wherein the second output merges a first receptive field generated by the first layer of the neural network with a second receptive field generated by the second layer of the neural network, and the second receptive field covers a larger receptive field in the first input than the first receptive field; and take one or more actions based on at least one of the first output and the second output.
- A processing system according to claim 1, wherein the first set of channels and the second set of channels comprise adjacent, equally sized portions of the first input.
- A processing system according to claim 1, wherein the first output has a size corresponding to the size of the first set of channels or the size of the second set of channels.
- A processing system according to claim 1, wherein the second output has a size corresponding to the size of the first set of channels or the size of the second set of channels.
- A processing system according to claim 1, wherein, to concatenate the first set of channels and the first output into the second input, the one or more processors are configured to cause the processing system to concatenate the first output with a reference to the first set of channels.
- A processing system according to claim 1, wherein the one or more processors are further configured to cause the processing system to discard at least a portion of the first input based at least in part on portions of the first input used to convolve the second input into the second output.
- A processing system according to claim 6, wherein at least the portion of the first input is discarded further based on portions of the first input used to perform one or more additional convolutions for layers of the neural network deeper than the second layer of the neural network.
- A processing system according to claim 1, wherein, for partitioning the first input, the one or more processors are configured to cause the processing system to partition the first input such that the first set of channels has a number of channels different from the second set of channels.
- A processing system according to claim 1, wherein, in order to convolve the second input into a second output through the second layer of the neural network, the one or more processors are configured to cause the processing system to process a first set of channels based on identity weights between the input and output of the second layer of the neural network and to process the second input based on convolution weights defined in the second layer of the neural network.
- A processing system according to claim 1, wherein the one or more processors are further configured to cause the processing system to: concatenate the first set of channels and the second output as a third input to a third layer of the neural network; and convolve the third input into a third output through the third layer of the neural network, wherein the third output merges the first receptive field generated by the first layer of the neural network, the second receptive field generated by the second layer of the neural network, and a third receptive field generated by the third layer of the neural network; the third receptive field covers a larger receptive field in the first input than the first receptive field and the second receptive field; and the one or more actions are taken further based at least in part on the third output.
- A processor-implemented method, comprising: partitioning a first input into a first set of channels and a second set of channels; convolving, in a first layer of a neural network, the first set of channels and the second set of channels into a first output having a dimensionality smaller than that of the first input; concatenating the first set of channels and the first output as a second input to a second layer of the neural network; convolving the second input into a second output through the second layer of the neural network, wherein the second output merges a first receptive field generated by the first layer of the neural network with a second receptive field generated by the second layer of the neural network, and the second receptive field covers a larger receptive field in the first input than the first receptive field; and taking one or more actions based on at least one of the first output and the second output.
- A method according to claim 11, wherein the first set of channels and the second set of channels comprise adjacent, equally sized portions of the first input.
- A method according to claim 11, wherein the first output has a size corresponding to the size of the first set of channels or the size of the second set of channels.
- A method according to claim 11, wherein the second output has a size corresponding to the size of the first set of channels or the size of the second set of channels.
- A method according to claim 11, wherein concatenating the first set of channels and the first output into the second input comprises concatenating the first output with a reference to the first set of channels.
- A method according to claim 11, further comprising discarding at least a portion of the first input based at least in part on portions of the first input used to convolve the second input into the second output.
- A method according to claim 16, wherein at least the portion of the first input is discarded further based on portions of the first input used to perform one or more additional convolutions for layers of the neural network deeper than the second layer of the neural network.
- A method according to claim 11, wherein the step of partitioning the first input comprises the step of unevenly partitioning the first input such that the first set of channels has a different number of channels than the second set of channels.
- A method according to claim 11, wherein the step of convolving the second input to the second output through the second layer of the neural network comprises: processing a first set of channels based on identity weights between the input and output of the second layer of the neural network; and processing the second input based on convolution weights defined in the second layer of the neural network.
- A method according to claim 11, further comprising: concatenating the first set of channels and the second output as a third input to a third layer of the neural network; and convolving the third input into a third output through the third layer of the neural network, wherein the third output merges the first receptive field generated by the first layer of the neural network, the second receptive field generated by the second layer of the neural network, and a third receptive field generated by the third layer of the neural network; the third receptive field covers a larger receptive field in the first input than the first receptive field and the second receptive field; and the one or more actions are taken further based at least in part on the third output.
Description
Multi-resolution field representations in neural networks

Cross-reference to related application(s)

This application claims priority to U.S. Patent Application No. 18/468,203, filed September 15, 2023, which is incorporated herein by reference.

Aspects of the present disclosure relate to neural networks, and more specifically to multi-resolution receptive fields in neural networks.

Neural networks, such as convolutional neural networks, are used for various tasks, including object detection in visual content, segmentation of visual content, processing of data containing objects with different dimensions (e.g., spatially and/or temporally), and similar tasks. To perform these tasks, these neural networks can be trained to recognize objects at different resolutions (e.g., different spatial and/or temporal resolutions). For example, when analyzing visual content, objects located at different distances from a reference plane (e.g., the surface of an imaging device capturing the visual content) may have different sizes in the captured visual content, even though these objects may be the same size in real life. Because similar objects may thus appear at different resolutions in the data provided as input to the neural network, neural networks are generally trained to analyze input data at multiple resolutions. Layers with small receptive fields can be used, for example, to recognize small objects close to the reference plane or larger objects located further away from it. Meanwhile, layers with larger receptive fields can be used to recognize larger objects close to the reference plane or still larger objects located further away. In doing so, data may be shared across different layers of the neural network, which can create various bottlenecks (e.g., due to memory access patterns) that increase the amount of time the neural network takes to perform a task.
Certain aspects of the present disclosure provide a method for efficiently processing inputs in a neural network using a plurality of receptive field sizes. An exemplary method generally comprises partitioning a first input into a first set of channels and a second set of channels. In a first layer of the neural network, the first set of channels and the second set of channels are convolved into a first output having a dimensionality smaller than that of the first input. The first set of channels and the first output are concatenated into a second input to a second layer of the neural network. The second input is convolved into a second output through the second layer of the neural network, wherein the second output merges a first receptive field generated by the first layer with a second receptive field generated by the second layer, and the second receptive field covers a receptive field larger than the first receptive field in the first input. One or more operations are taken based on at least one of the first output and the second output.

Other aspects provide: a processing system configured to perform the methods described above as well as those described in this specification; non-transitory computer-readable media comprising instructions that, when executed by one or more processors of a processing system, cause the processing system to perform the methods described above as well as those described in this specification; a computer program product embodied on a computer-readable storage medium comprising code for performing the methods described above as well as those further described in this specification; and a processing system comprising means for performing the methods described above as well as those further described in this specification.

The following description and related drawings set forth in detail certain illustrative features of one or more aspects.
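The partition, downsample, concatenate, and convolve flow summarized above can be sketched in plain NumPy. This is a hypothetical illustration, not the claimed implementation: the channel counts, weight shapes, and the 1x1 channel-mixing contractions standing in for the patent's spatial convolutions are all assumptions made for brevity.

```python
import numpy as np

def mix_channels(x, w):
    """1x1 convolution expressed as a channel-mixing contraction.
    x: (C_in, H, W) feature map; w: (C_out, C_in) weights.
    Stands in for the spatial convolutions of the actual layers."""
    return np.tensordot(w, x, axes=([1], [0]))

rng = np.random.default_rng(0)

# Hypothetical first input: 8 channels of 16x16 features.
first_input = rng.standard_normal((8, 16, 16))

# Partition into a first and second set of channels
# (here, equally sized adjacent halves).
first_set, second_set = first_input[:4], first_input[4:]

# Layer 1: convolve both sets into a first output whose dimensionality
# (channel count) is smaller than that of the first input.
w1 = rng.standard_normal((4, 8))
first_output = mix_channels(np.concatenate([first_set, second_set]), w1)

# Concatenate the first set of channels with the first output to form the
# second input; the disclosure notes this can be done by reference,
# avoiding a copy of the skip channels.
second_input = np.concatenate([first_set, first_output])

# Layer 2: convolve the second input into the second output. Because the
# second input carries both raw skip channels and layer-1 features, the
# second output combines information from both layers' receptive fields.
w2 = rng.standard_normal((4, 8))
second_output = mix_channels(second_input, w2)
```

With real spatial kernels (e.g., 3x3) in place of the 1x1 contractions, each successive layer would see a progressively larger receptive field in the first input, so the second output would merge the first layer's smaller receptive field with the second layer's larger one, as described above.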
The attached drawings depict only certain aspects of the present disclosure and are not to be construed as limiting its scope. FIG. 1 illustrates an exemplary pipeline for efficiently processing inputs in a neural network using a plurality of receptive field sizes, according to aspects of the present disclosure. FIG. 2 illustrates an exemplary layer in a neural network for processing inputs using a plurality of receptive field sizes, according to aspects of the present disclosure. FIGS. 3A, 3B, 3C, and 3D illustrate in-memory operations performed for depth-first processing of inputs in a neural network using a plurality of receptive field sizes, according to aspects of the present disclosure. FIG. 4 illustrates exemplary operations for efficiently processing inputs in a neural network using a plurality of receptive field sizes, according to aspects of the present disclosure. FIG. 5 illustrates an exemplary processing system configured to perform various aspects of the present disclosure. For ease of understanding, the same reference numerals have been used to designate identical elements common to the figures, where possible.