CN-121997711-A - Model optimization method based on bare metal
Abstract
The invention discloses a bare-metal-based model optimization method comprising the following steps: S1, initializing and adapting the bare metal hardware environment; S2, scheduling and optimizing computing power; and S3, optimizing data transmission. The beneficial effects are that, by initializing and adapting the bare metal hardware environment and optimizing computing power and data transmission, GPU utilization is raised from 60% under existing schemes to above 90%, CPU utilization is raised from 55% to above 80%, and memory-bandwidth utilization is raised from 70% to above 95%.
Inventors
- YANG JIE
Assignees
- 甘肃燧弘绿色算力有限公司 (Gansu Suihong Green Computing Power Co., Ltd.)
Dates
- Publication Date
- 20260508
- Application Date
- 20251223
Claims (5)
- 1. A model optimization method based on bare metal, characterized by comprising the following steps: S1, initializing and adapting a bare metal hardware environment; S2, scheduling and optimizing computing power; and S3, optimizing data transmission.
- 2. The bare-metal-based model optimization method according to claim 1, wherein step S2 further comprises the steps of: S21, acquiring computing-power load data of the GPU/CPU in real time through a computing-power monitoring module; S22, constructing a computational-load prediction model based on a long short-term memory (LSTM) network, and inputting the load data to predict the future load change trend; and S23, performing scheduling optimization according to the load prediction result.
- 3. The bare-metal-based model optimization method according to claim 2, wherein step S22 further comprises the steps of: S22.1, preprocessing the collected raw load data to construct supervised learning samples; S22.2, constructing an LSTM network model and training it; S22.3, converting the data into tensor format and inputting it into the trained LSTM network model, which outputs a prediction sequence; S22.4, converting the normalized prediction sequence back into actual load values; and S22.5, applying moving-average filtering to the 60 predicted time steps of load values, analyzing the load change trend for the next 1 minute, and outputting the judgment result to the dynamic computing-power scheduling module.
- 4. The bare-metal-based model optimization method according to claim 3, wherein step S22.1 further comprises the steps of: S22.1.1, cleaning the raw load data by calculating the mean μ and standard deviation σ of each index, rejecting data falling outside the range [μ − 3σ, μ + 3σ], and complementing missing data by linear interpolation; S22.1.2, normalizing the preprocessed three-dimensional load data by mapping it to the interval [0, 1] as x' = (x − x_min) / (x_max − x_min), where x is the raw data to be processed, x_min is the historical minimum of the index, and x_max is the historical maximum of the index; and S22.1.3, constructing supervised learning samples to generate a training set, a validation set, and a test set.
- 5. The bare-metal-based model optimization method according to claim 4, wherein in step S22.2 the LSTM network model comprises an input layer, a hidden layer, a dropout layer, a fully connected layer, and an output layer.
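The preprocessing pipeline of claims 3 and 4 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the 3σ rejection threshold matches the reading of claim 4, but the window length, split ratios, historical bounds, and all function names are assumed values that the claims do not fix.

```python
import numpy as np

def clean_series(x, k=3.0):
    """Step S22.1.1 (sketch): mark points outside mean +/- k*std as outliers,
    then fill outliers and missing samples by linear interpolation."""
    x = np.asarray(x, dtype=float).copy()
    mu, sigma = np.nanmean(x), np.nanstd(x)
    x[np.abs(x - mu) > k * sigma] = np.nan
    idx = np.arange(len(x))
    bad = np.isnan(x)
    x[bad] = np.interp(idx[bad], idx[~bad], x[~bad])
    return x

def normalize(x, x_min, x_max):
    """Step S22.1.2: min-max mapping to [0, 1] using the historical
    minimum and maximum of the index."""
    return (x - x_min) / (x_max - x_min)

def make_supervised(x, window=10):
    """Step S22.1.3 (sketch): sliding-window samples; each input is `window`
    consecutive steps and the target is the next value."""
    X = np.stack([x[i:i + window] for i in range(len(x) - window)])
    return X, x[window:]

def split(X, y, train=0.7, val=0.15):
    """Chronological split into training, validation, and test sets."""
    n = len(X)
    i, j = round(n * train), round(n * (train + val))
    return (X[:i], y[:i]), (X[i:j], y[i:j]), (X[j:], y[j:])

# Synthetic GPU-load series (percent) with one outlier and one missing sample.
series = np.full(100, 50.0)
series[10] = 1e6
series[20] = np.nan
cleaned = clean_series(series)
scaled = normalize(cleaned, 0.0, 100.0)  # assumed historical bounds 0-100
X, y = make_supervised(scaled)
(Xtr, ytr), (Xv, yv), (Xte, yte) = split(X, y)
```

A chronological (rather than shuffled) split is used here because the samples are time-ordered load measurements; the claims do not state which splitting scheme the inventors intend.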
Description
Model optimization method based on bare metal

Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a model optimization method based on bare metal.

Background
With the rapid development of artificial intelligence technology, the parameter scale of large models (such as the GPT series, the LLaMA series, and others) keeps growing, and the demands on hardware computing power, storage bandwidth, and data transmission efficiency keep rising. Large-model deployment currently falls into two main modes, virtualized deployment and bare metal deployment. In bare metal deployment the large model is deployed directly on a bare metal server without a virtualization layer, so hardware performance can be exploited to the greatest extent. However, existing bare metal deployment schemes have the following problems: 1) large models place high compatibility requirements on CPU architectures (such as x86 and ARM), GPU models (such as NVIDIA A100 and AMD MI250), and memory types (such as DDR5 and HBM3), and existing schemes lack customized optimization strategies for different hardware combinations, so hardware resource utilization stays below 60%; 2) during training or inference of a large model the computational load is unbalanced, for example some GPUs run at full load while others sit idle, and existing scheduling algorithms such as round-robin scheduling and priority scheduling cannot dynamically adapt to the computing-power demands of large models, so overall computing efficiency is low; 3) existing bare metal schemes do not optimize the network protocol for the data transmission characteristics of large models, so data transmission latency accounts for more than 30% of total training time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a model optimization method based on bare metal. The technical scheme of the invention is as follows: the bare-metal-based model optimization method comprises the following steps: S1, initializing and adapting a bare metal hardware environment; S2, scheduling and optimizing computing power; and S3, optimizing data transmission.

Preferably, step S2 further comprises the steps of: S21, acquiring computing-power load data of the GPU/CPU in real time through a computing-power monitoring module; S22, constructing a computational-load prediction model based on a long short-term memory (LSTM) network, and inputting the load data to predict the future load change trend; and S23, performing scheduling optimization according to the load prediction result.

Preferably, step S22 further comprises the steps of: S22.1, preprocessing the collected raw load data to construct supervised learning samples; S22.2, constructing an LSTM network model and training it; S22.3, converting the data into tensor format and inputting it into the trained LSTM network model, which outputs a prediction sequence; S22.4, converting the normalized prediction sequence back into actual load values; and S22.5, applying moving-average filtering to the 60 predicted time steps of load values, analyzing the load change trend for the next 1 minute, and outputting the judgment result to the dynamic computing-power scheduling module.
Preferably, step S22.1 further comprises the steps of: S22.1.1, cleaning the raw load data by calculating the mean μ and standard deviation σ of each index, rejecting data falling outside the range [μ − 3σ, μ + 3σ], and complementing missing data by linear interpolation; S22.1.2, normalizing the preprocessed three-dimensional load data by mapping it to the interval [0, 1] as x' = (x − x_min) / (x_max − x_min), where x is the raw data to be processed, x_min is the historical minimum of the index, and x_max is the historical maximum of the index; and S22.1.3, constructing supervised learning samples to generate a training set, a validation set, and a test set.

Preferably, in step S22.2, the LSTM network model comprises an input layer, a hidden layer, a dropout layer, a fully connected layer, and an output layer.

The invention has the advantage that, by initializing and adapting the bare metal hardware environment and optimizing computing power and data transmission, GPU utilization is raised from 60% under existing schemes to above 90%, CPU utilization is raised from 55% to above 80%, and memory-bandwidth utilization is raised from 70% to above 95%.

Drawings
None.

Detailed Description
For the purpose of making the objects, technical solutions and a