US-20260127043-A1 - METHOD AND SYSTEM FOR PROCESSING TASK IN PARALLEL
Abstract
A method for processing tasks in parallel is performed by at least one processor, and includes performing a first task associated with a first instruction, determining whether the first instruction is a burst load instruction, in response to determining that the first instruction is the burst load instruction, acquiring a second instruction, and performing a second task associated with the acquired second instruction, in which the first task and the second task are performed in parallel.
Inventors
- Hyunho Kim
- Jinseok Kim
- Jinwook Oh
Assignees
- REBELLIONS INC.
Dates
- Publication Date: 2026-05-07
- Application Date: 2025-12-19
- Priority Date: 2023-03-20
Claims (20)
- 1 . A method for processing tasks in parallel, the method being performed by at least one processor and comprising: performing a first task associated with a first instruction; determining whether the first instruction is a burst load instruction; in response to determining that the first instruction is the burst load instruction, acquiring a second instruction; and performing a second task associated with the acquired second instruction, wherein the first task and the second task are performed in parallel, wherein the second instruction is the burst load instruction, and wherein a difference between a burst size of the second instruction and a burst size of the first instruction is within a threshold range.
- 2 . The method according to claim 1 , wherein the performing the second task includes generating a plurality of requests based on the burst size of the second instruction.
- 3 . The method according to claim 2 , wherein the second task is generated in a pipeline structure that includes a plurality of instructions associated with the generating the plurality of requests and a plurality of instructions associated with executing the plurality of requests.
- 4 . The method according to claim 2 , wherein the generating the plurality of requests includes: identifying a destination associated with the second instruction; and storing the generated plurality of requests in a request queue, which is associated with the identified destination, of a plurality of request queues.
- 5 . The method according to claim 4 , further comprising, after the generating the plurality of requests: identifying a storage area, which is associated with the identified destination, of a plurality of storage areas; and storing, in the identified storage area, data issued based on the requests stored in the request queue associated with the destination.
- 6 . The method according to claim 1 , wherein the second task starts after a predetermined cycle from a cycle in which the first task starts.
- 7 . The method according to claim 1 , further comprising, after the performing the second task: acquiring a third instruction; and performing a third task associated with the acquired third instruction, wherein the first task and the third task are performed in parallel.
- 8 . The method according to claim 7 , wherein the acquiring the third instruction includes: in response to determining that a burst load instruction with a different destination from each of the first instruction and the second instruction is waiting, determining the waiting burst load instruction to be the third instruction; fetching the determined third instruction; and decoding the fetched third instruction.
- 9 . The method according to claim 7 , wherein each of the first instruction, the second instruction, and the third instruction is an instruction with a different destination from each other.
- 10 . The method according to claim 7 , wherein the second task and the third task start before a fourth task for modulating data written in a cache is performed.
- 11 . The method according to claim 1 , further comprising, after the performing the second task, in response to data being written to a cache, performing a fourth task to modulate the written data, wherein the second task and the fourth task are performed in parallel.
- 12 . A processing system comprising: a memory that stores data associated with at least one instruction; and at least one load unit configured to perform an access operation to the memory, wherein, in response to a first task associated with a burst load instruction being performed, the at least one load unit is configured to perform an additional second task, and perform the first task and the second task in parallel, wherein the second task is a task associated with the burst load instruction, and wherein a difference between a first size of a burst load instruction associated with the first task and a second size of a burst load instruction associated with the second task is within a threshold range.
- 13 . The processing system according to claim 12 , wherein, in response to the first task being a task associated with the burst load instruction, the at least one load unit is configured to fetch an instruction associated with the second task and decode the fetched instruction.
- 14 . The processing system according to claim 12 , wherein the at least one load unit is configured to generate a plurality of requests based on the second size.
- 15 . The processing system according to claim 14 , wherein the second task is generated in a pipeline structure that includes a plurality of instructions associated with the generating the plurality of requests and a plurality of instructions associated with executing the plurality of requests.
- 16 . The processing system according to claim 14 , wherein the at least one load unit is configured to identify a destination associated with the second task and store the generated plurality of requests in a request queue, which is associated with the identified destination, of a plurality of request queues.
- 17 . The processing system according to claim 16 , wherein the at least one load unit is configured to identify a storage area, which is associated with the identified destination, of a plurality of storage areas and store, in the identified storage area, data issued based on the requests stored in the request queue associated with the destination.
- 18 . The processing system according to claim 12 , wherein the at least one load unit is configured to additionally acquire an instruction and perform a third task associated with the acquired instruction, and perform the third task and the first task in parallel.
- 19 . The processing system according to claim 18 , wherein the at least one load unit is configured to start the second task and the third task before performing a fourth task for modulating data written to a cache.
- 20 . The processing system according to claim 12 , wherein the at least one load unit is configured to perform a fourth task to modulate written data in response to the data being written to a cache.
Description
CROSS-REFERENCE TO RELATED APPLICATION
This application is a continuation of U.S. application Ser. No. 18/389,680, filed on Sep. 19, 2023, which claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2023-0035788, filed in the Korean Intellectual Property Office on Mar. 20, 2023, the entire contents of which are hereby incorporated by reference.
TECHNICAL FIELD
The present disclosure relates to a method for processing tasks in parallel, and specifically, to a method and system for processing tasks in parallel in a processing system operating based on instructions.
BACKGROUND
A processing system operating based on instructions loads data and sends the results of execution/operation based on the loaded data to a designated destination. To increase the throughput of a processing system that operates based on instructions, pipelining may be used. Pipelining is a technique that improves the performance of a processing system by processing data continuously.
However, when a plurality of burst load instructions in a pipeline structure are fetched, the processing system must load multiple pieces of data associated with the burst load instructions over a plurality of cycles. Before the loading of all data associated with a burst load instruction completes, a stall may occur, and the processing system waits without processing subsequent processes (e.g., a modulation operation process) associated with the burst load instruction. If the stall occurs, the throughput of the processing system may decrease.
SUMMARY
In order to solve the problems described above, the present disclosure provides a method, a non-transitory computer-readable recording medium storing instructions, and an apparatus (system) for processing tasks in parallel. The present disclosure may be implemented in a variety of ways, including methods, apparatus (systems), and/or non-transitory computer-readable storage media storing instructions.
A method for processing tasks in parallel may be performed by at least one processor and may include performing a first task associated with a first instruction, determining whether the first instruction is a burst load instruction, in response to determining that the first instruction is the burst load instruction, acquiring a second instruction, and performing a second task associated with the acquired second instruction, in which the first task and the second task may be performed in parallel.
In addition, the second instruction may be the burst load instruction, a difference between a burst size of the second instruction and a burst size of the first instruction may be within a threshold range, and the performing the second task may include generating a plurality of requests based on the burst size of the second instruction.
In addition, the second task may be generated in a pipeline structure that includes a plurality of instructions associated with the generating the plurality of requests and a plurality of instructions associated with executing the plurality of requests.
In addition, the generating the plurality of requests may include identifying a destination associated with the second instruction, and storing the generated plurality of requests in a request queue, which is associated with the identified destination, of a plurality of request queues.
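The parallel dispatch summarized so far can be pictured with a small software model. The sketch below is purely illustrative: the disclosure describes a hardware processing system, and every name in the sketch (Instruction, BURST_SIZE_THRESHOLD, acquire_second_instruction, and so on) is an assumption introduced for illustration rather than terminology from the disclosure. It shows a first burst-load task being performed while a second burst load instruction, whose burst size differs from the first by no more than an assumed threshold, is acquired and executed alongside it.

```python
# Illustrative model of the parallel burst-load dispatch described above.
# All names and values here are assumptions made for this sketch; they are
# not taken from the disclosure.
from collections import deque
from dataclasses import dataclass
from typing import Optional

BURST_SIZE_THRESHOLD = 4  # assumed threshold range for the burst-size difference


@dataclass
class Instruction:
    opcode: str          # e.g., "BURST_LOAD"
    destination: str     # destination identifier for the loaded data
    burst_size: int      # number of data elements to load


def is_burst_load(inst: Instruction) -> bool:
    return inst.opcode == "BURST_LOAD"


def acquire_second_instruction(first: Instruction,
                               waiting: deque) -> Optional[Instruction]:
    """Pick a waiting burst load with a different destination whose burst
    size is within the threshold range of the first instruction's burst size."""
    for inst in list(waiting):
        if (is_burst_load(inst)
                and inst.destination != first.destination
                and abs(inst.burst_size - first.burst_size) <= BURST_SIZE_THRESHOLD):
            waiting.remove(inst)
            return inst
    return None


def perform_in_parallel(first: Instruction, waiting: deque) -> None:
    """Perform the first task; if the first instruction is a burst load,
    acquire a second instruction and interleave the two tasks cycle by cycle."""
    second = acquire_second_instruction(first, waiting) if is_burst_load(first) else None
    tasks = [t for t in (first, second) if t is not None]
    cycle = 0
    while any(t.burst_size > 0 for t in tasks):
        for t in tasks:
            if t.burst_size > 0:
                print(f"cycle {cycle}: issue request for {t.destination}")
                t.burst_size -= 1
        cycle += 1


if __name__ == "__main__":
    pending = deque([Instruction("BURST_LOAD", "dest_B", 3)])
    perform_in_parallel(Instruction("BURST_LOAD", "dest_A", 4), pending)
```

Interleaving per-cycle requests in software is only a stand-in for the hardware-level parallelism described in the claims; it is used here solely to make the ordering of the first and second tasks visible.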
In addition, the method for processing tasks in parallel may further include, after the generating the plurality of requests, identifying a storage area, which is associated with the identified destination, of a plurality of storage areas, and storing, in the identified storage area, data issued based on the requests stored in the request queue associated with the destination.
In addition, the second task may start after a predetermined cycle from a cycle in which the first task starts.
In addition, the method for processing tasks in parallel may further include, after the performing the second task, acquiring a third instruction, and performing a third task associated with the acquired third instruction, in which the first task and the third task may be performed in parallel.
In addition, the acquiring the third instruction may include, in response to determining that a burst load instruction with a different destination from each of the first instruction and the second instruction is waiting, determining the waiting burst load instruction to be the third instruction, fetching the determined third instruction, and decoding the fetched third instruction.
In addition, each of the first instruction, the second instruction, and the third instruction may be an instruction with a different destination from each other.
In addition, the second task and the third task may start before a fourth task for modulating data written in a cache is performed.
In addition, the method for processing tasks in parallel may further include, after the performing the second task, in response to data being written to the cache, performing a fourth task to modulate the written data, in which the second task and the fourth task may be performed in parallel.
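The per-destination request queues and storage areas described above also lend themselves to a brief sketch: requests are generated according to the burst size, placed in the request queue associated with the identified destination, and the data issued for those requests is stored in the storage area associated with the same destination. This is a minimal sketch under assumed names; the class RequestQueues, the dictionary-based memory, and the offset field are all illustrative inventions of this example, not structures prescribed by the disclosure.

```python
# Illustrative sketch of per-destination request queues and storage areas.
from collections import defaultdict, deque


class RequestQueues:
    """One request queue and one storage area per destination (assumed model)."""

    def __init__(self):
        self.queues = defaultdict(deque)         # destination -> queue of requests
        self.storage_areas = defaultdict(list)   # destination -> buffered data

    def generate_requests(self, destination: str, burst_size: int) -> None:
        # One request per element of the burst, stored in the queue that is
        # associated with the identified destination.
        for offset in range(burst_size):
            self.queues[destination].append({"destination": destination,
                                              "offset": offset})

    def issue(self, destination: str, memory: dict) -> None:
        # Data issued based on the queued requests is stored in the storage
        # area associated with the same destination.
        queue = self.queues[destination]
        while queue:
            request = queue.popleft()
            data = memory.get((destination, request["offset"]), 0)
            self.storage_areas[destination].append(data)


if __name__ == "__main__":
    unit = RequestQueues()
    unit.generate_requests("dest_A", burst_size=4)
    unit.issue("dest_A", memory={("dest_A", i): i * 10 for i in range(4)})
    print(unit.storage_areas["dest_A"])   # -> [0, 10, 20, 30]
```

Keying both the queue and the storage area on the same destination identifier mirrors the pairing described in the summary, so that data issued for one burst load instruction cannot be delivered into the storage area of another destination.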