US-12627592-B2 - Methods and systems for scheduling and routing on direct connect topologies
Abstract
Aspects of this disclosure relate to determining a first bandwidth of one or more intracluster links between one or more nodes of a first cluster; determining a second bandwidth of one or more intercluster links between the first cluster and one or more clusters; determining an equivalency between the one or more intracluster links and the one or more intercluster links based on the first bandwidth and the second bandwidth; and preparing a routing schedule based on the equivalency.
Inventors
- Prithwish Basu
- Joud Khoury
- Siddharth Pal
Assignees
- RTX BBN TECHNOLOGIES, INC.
Dates
- Publication Date
- 20260512
- Application Date
- 20240105
Claims (17)
- 1 . A method for managing links in a network, the method comprising: determining a first bandwidth of one or more intracluster links between one or more nodes of a first cluster; determining a second bandwidth of one or more intercluster links between the first cluster and one or more clusters; determining an equivalency between the one or more intracluster links and the one or more intercluster links based on the first bandwidth and the second bandwidth; preparing a routing schedule based on the equivalency; and sending a message from a first node to a second node in the network based on the routing schedule, wherein sending the message from the first node to the second node in the network includes: partitioning the message into a first portion and a second portion; sending, from the first node to the second node, the first portion via a first link at a first timestep; and sending, from the first node to the second node, the second portion via a second link at a second timestep, the second timestep following the first timestep.
- 2 . The method of claim 1 wherein preparing the routing schedule based on the equivalency includes: determining one or more virtual links between the one or more nodes of the first cluster, the one or more virtual links being based on the one or more intracluster links and the equivalency.
- 3 . The method of claim 2 wherein determining the equivalency between the one or more intracluster links and the one or more intercluster links includes: determining a baseline bandwidth corresponding to a bandwidth of at least one of the one or more intercluster links; and determining the first bandwidth as a multiple of the baseline bandwidth.
- 4 . The method of claim 3 wherein the equivalency is the multiple and a total quantity of the one or more virtual links is based on the multiple and a quantity of the one or more intracluster links.
- 5 . The method of claim 1 further comprising: determining that a link between a first node and a second node is not functioning properly; and responsive to determining that the link is not functioning properly, redetermining the routing schedule.
- 6 . The method of claim 5 wherein the link not functioning properly corresponds to the link not providing a bandwidth for which the link is rated.
- 7 . The method of claim 1 further comprising: determining a ratio of the first bandwidth to the second bandwidth; based on the ratio, determining a cost of sending a message via a link between a first node and a second node in the network; and determining a runtime of an operation on the network based on the cost.
- 8 . The method of claim 7 wherein determining the cost includes: determining a first cost of one or more intercluster links between the first node and the second node; determining a second cost of one or more intracluster links between the first node and the second node; and based on the first cost, the second cost, and a size of the message, determining the cost.
- 9 . The method of claim 1 further comprising: determining one or more first splitting coefficients; determining one or more second splitting coefficients; determining the first portion based on the one or more first splitting coefficients; determining the second portion based on the one or more first splitting coefficients; determining the first link based on the one or more second splitting coefficients; and determining the second link based on the one or more second splitting coefficients.
- 10 . The method of claim 9 wherein the one or more first splitting coefficients are determined based on one or more constraints including: a workload constraint limiting the amount of data received by a node to a maximum amount of data, a coefficient limiting constraint limiting a sum of the one or more first splitting coefficients and the one or more second splitting coefficients to one, and a third constraint limiting each splitting coefficient of the one or more first splitting coefficients and the one or more second splitting coefficients to be greater than or equal to zero and less than or equal to one.
- 11 . A non-transitory computer-readable medium containing thereon instructions for instructing at least one processor to manage a network, the instructions instructing the at least one processor to: determine a first bandwidth of one or more intracluster links between one or more nodes of a first cluster; determine a second bandwidth of one or more intercluster links between the first cluster and one or more clusters; determine an equivalency between the one or more intracluster links and the one or more intercluster links based on the first bandwidth and the second bandwidth, wherein the instructions instructing the at least one processor to determine the equivalency further instruct the at least one processor to: determine a baseline bandwidth corresponding to a bandwidth of at least one of the one or more intercluster links; and determine the first bandwidth as a multiple of the baseline bandwidth; prepare a routing schedule based on the equivalency, wherein the instructions instructing the at least one processor to prepare the routing schedule based on the equivalency further instruct the at least one processor to: determine one or more virtual links between the one or more nodes of the first cluster, the one or more virtual links being based on the one or more intracluster links and the equivalency; and send a message from a first node to a second node in the network based on the routing schedule.
- 12 . The non-transitory computer-readable medium of claim 11 wherein the instructions further instruct the at least one processor to: determine that a link between a first node and a second node is not functioning properly; and responsive to determining that the link is not functioning properly, redetermine the routing schedule.
- 13 . The non-transitory computer-readable medium of claim 11 wherein the instructions further instruct the at least one processor to: determine a ratio of the first bandwidth to the second bandwidth; based on the ratio, determine a cost of sending a message via a link between a first node and a second node in the network; and determine a runtime of an operation on the network based on the cost.
- 14 . The non-transitory computer-readable medium of claim 11 wherein the instructions further instruct the at least one processor to: send the message from the first node to the second node in the network based on the routing schedule by: partitioning the message into a first portion and a second portion; sending, from the first node to the second node, the first portion via a first link at a first timestep; and sending, from the first node to the second node, the second portion via a second link at a second timestep, the second timestep following the first timestep.
- 15 . The non-transitory computer-readable medium of claim 14 wherein the instructions further instruct the at least one processor to: determine one or more first splitting coefficients; determine one or more second splitting coefficients; determine the first portion based on the one or more first splitting coefficients; determine the second portion based on the one or more first splitting coefficients; determine the first link based on the one or more second splitting coefficients; and determining the second link based on the one or more second splitting coefficients.
- 16 . The non-transitory computer-readable medium of claim 14 wherein the instructions further instruct the at least one processor to determine one or more first splitting coefficients based on one or more constraints including: a workload constraint limiting the amount of data received by a node to a maximum amount of data, a coefficient limiting constraint limiting a sum of the one or more first splitting coefficients and one or more second splitting coefficients to one, and a third constraint limiting each splitting coefficient of the one or more first splitting coefficients and the one or more second splitting coefficients to be greater than or equal to zero and less than or equal to one.
- 17 . A system for managing links on a network, the system comprising: a first cluster; one or more clusters coupled to the first cluster; a controller configured to control the first cluster and the one or more clusters by: determining a first bandwidth of one or more intracluster links between one or more nodes of the first cluster; determining a second bandwidth of one or more intercluster links between the first cluster and the one or more clusters; determining an equivalency between the one or more intracluster links and the one or more intercluster links based on the first bandwidth and the second bandwidth; determining a ratio of the first bandwidth to the second bandwidth; based on the ratio, determining a cost of sending a message via a link between a first node and a second node in the network; determining a runtime of an operation on the network based on the cost; preparing a routing schedule based on the equivalency and the runtime; and sending a message from a first node to a second node in the network based on the routing schedule.
Description
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT This application was made with government support under Contract No. HR001120C0089, owned by the Department of Defense. The U.S. Government may have certain rights in this invention. BACKGROUND 1. Field of the Disclosure At least one example in accordance with the present disclosure relates generally to scheduling and routing of data in direct connect topologies. 2. Discussion of Related Art In machine learning, high end computation, and other computationally intensive applications, direct connect topologies may facilitate simultaneous transmission of data in a timely manner between nodes in the computational network, thereby allowing for efficient use of nodes in the computational network. SUMMARY According to at least one aspect of the present disclosure, a method for managing links in a network is presented, the method comprising: determining a first bandwidth of one or more intracluster links between one or more nodes of a first cluster; determining a second bandwidth of one or more intercluster links between the first cluster and one or more clusters; determining an equivalency between the one or more intracluster links and the one or more intercluster links based on the first bandwidth and the second bandwidth; and preparing a routing schedule based on the equivalency. In some examples, preparing a routing schedule based on the equivalency includes: determining one or more virtual links between the one or more nodes of the first cluster, the one or more virtual links being based on the one or more intracluster links and the equivalency. In some examples, determining an equivalency between the one or more intracluster links and the one or more intercluster links includes: determining a baseline bandwidth corresponding to a bandwidth of at least one of the one or more intercluster links; and determining the first bandwidth as a multiple of the baseline bandwidth. In some examples, the equivalency is the multiple and a total quantity of the one or more virtual links is based on the multiple and a quantity of the one or more intracluster links. In some examples, the method further comprises determining that a link between a first node and a second node is not functioning properly; and responsive to determining that the link is not functioning properly, redetermining the routing schedule. In some examples, the link not functioning properly corresponds to the link not providing a bandwidth for which the link is rated. In some examples, the method further comprises determining a ratio of the first bandwidth to the second bandwidth; based on the ratio, determining a cost of sending a message via a link between a first node and a second node in the network; and determining a runtime of an operation on the network based on the cost. In some examples, determining the cost includes: determining a first cost of one or more intercluster links between the first node and the second node; determining a second cost of one or more intracluster links between the first node and the second node; and based on the first cost, the second cost, and a size of the message, determining the cost. In some examples, the method further comprises sending a message from a first node to a second node in the network based on the routing schedule. In some examples, sending a message from the first node to the second node in the network includes: partitioning the message into a first portion and a second portion; sending, from the first node to the second node, the first portion via a first link at a first timestep; sending, from the first node to the second node, the second portion via a second link at a second timestep, the second timestep following the first timestep. In some examples, the method further comprises determining one or more first splitting coefficients; determining one or more second splitting coefficients; determining the first portion based on the one or more first splitting coefficients; determining the second portion based on the one or more first splitting coefficients; determining the first link based on the one or more second splitting coefficients; and determining the second link based on the one or more second splitting coefficients. In some examples, the one or more first splitting coefficients are determined based on one or more constraints including a workload constraint limiting the amount of data received by a node to a maximum amount of data, a coefficient limiting constraint limiting a sum of the one or more first splitting coefficients and the one or more second splitting coefficients to one, and a third constraint limiting each splitting coefficient of the one or more first splitting coefficients and the one or more second splitting coefficients to be greater than or equal to zero and less than or equal to one. According to at least one aspect of the present disclosure, a non-transitory computer-readable medium containing thereon instructions