
CN-121985381-A - Space-air-ground integrated Internet of Vehicles task offloading method based on RAG-enhanced DRL


Abstract

A task offloading method applied to a space-air-ground integrated Internet of Vehicles scenario consisting of ground base stations, unmanned aerial vehicle (UAV) small base stations and low-orbit satellites. The ground base stations and UAV small base stations provide local signal coverage, while the low-orbit satellite signal covers the entire urban road network. Depending on its real-time position and the communication network conditions, a vehicle dynamically performs local inference or offloads tasks to ground, UAV or satellite nodes for collaborative inference. An MEC controller deploys a DRL model for task scheduling decisions. A dynamic retrieval-augmented generation (RAG) system is built on the Model Context Protocol (MCP), in which a static road-communication knowledge base, a trajectory prediction service and the real-time traffic situation are packaged as retrievable knowledge sources. The MEC controller integrates the RAG mechanism, accesses these knowledge sources uniformly through an MCP gateway, and provides knowledge support for the DRL model's task scheduling decisions.

Inventors

  • SHEN HANG
  • GUO TONGLIANG
  • WANG TIANJING
  • BAI GUANGWEI

Assignees

  • Nanjing Tech University (南京工业大学)

Dates

Publication Date
2026-05-05
Application Date
2025-11-22

Claims (7)

  1. A task offloading method for a space-air-ground integrated Internet of Vehicles based on RAG-enhanced DRL, characterized in that the method is applied to a space-air-ground integrated vehicular network (SAGVN) scenario consisting of ground base stations, unmanned aerial vehicle (UAV) small base stations and low-orbit satellites, wherein the ground base stations and UAV small base stations provide local signal coverage and the low-orbit satellite signal covers the entire urban road network; an MEC controller deploys a deep reinforcement learning (DRL) model for task scheduling decisions; and the MEC controller further integrates a RAG mechanism, accesses knowledge sources uniformly through an MCP gateway, and provides knowledge support for the DRL model's task scheduling decisions in a non-stationary environment.
  2. The task offloading method for a space-air-ground integrated Internet of Vehicles based on RAG-enhanced DRL according to claim 1, wherein an MCP-assisted RAG framework is adopted in the SAGVN scenario. The set of ground base stations, the set of UAVs and the set of satellites are defined separately, as are the set of vehicles and the set of all base stations; in any time slot a vehicle can connect to only one base station. According to its timeliness and acquisition mode, SAGVN environmental knowledge is divided into three types:
1) Static knowledge: long-term stable expert knowledge about road terrain features, base station deployment topology and historical congestion patterns. Each knowledge item in the road-network knowledge base has an index and is associated with three elements. The first is the center position of its geographic area. The second is a terrain-communication description of how the terrain at that position interferes with base station communication performance, covering terrain features, base station deployment and communication propagation effects. The third is a peak-period congestion-communication description of the road segment's historical peak-hour congestion characteristics and the pressure they place on base station communication. Together these constitute the complete domain knowledge of the item.
2) Dynamic knowledge, which is divided by generation mode into two types:
a) Computational knowledge, i.e., prospective knowledge computed by a trajectory prediction model from a vehicle's historical trajectory data up to a given time. The trajectory prediction result provides a forward-looking basis for selecting the cooperative base station and ensures that the vehicle stays within the base station's coverage throughout task inference.
b) Perceptual knowledge, i.e., accident information perceived in real time by IoT sensors, roadside units (RSUs), vehicle terminals, satellite links and the like. The road emergency information acquired by the MEC controller at a given time through the edge-node sensing network has attributes including event type, location coordinates, severity and timestamp.
When an inference task generated by a vehicle at a given time is uploaded to a base station, the MEC controller obtains decision-support information by accessing the static and dynamic knowledge services through the MCP gateway. It retrieves the knowledge base for terrain-communication characteristics and historical congestion patterns, calls the trajectory prediction service for prospective prediction knowledge, and queries the real-time traffic situation service for emergency information.
Communication model: a time-varying attenuation factor is defined between each vehicle and each base station. At a given time, the Euclidean distance between the vehicle and the base station is known, and the channel gain between them is given by Eq. (1). In Eq. (1), the reference distance is the calibration point of the channel gain, the channel gain at the reference distance is a constant, and the path-loss exponent is determined by the base station type. The base station's position coordinates and the vehicle's predicted movement trajectory within a future time window are also known. From that trajectory, the vehicle's position at a future time is taken, and the channel gain between the vehicle and the base station at that time is predicted by Eq. (2). The spectrum of each base station is allocated to vehicles in units of mutually orthogonal sub-channels of equal bandwidth. With a given number of sub-channels allocated to a task, the uplink transmission rate when the vehicle submits the task to the base station is calculated by Eq. (3), which includes the average background noise. With a given number of sub-channels allocated to return the computation result, the downlink rate at which the base station returns the task result to the vehicle is predicted from the predicted channel gain by Eq. (4).
End-to-end delay modeling: each task is characterized by a parameter set comprising its data size (in bits) and its delay constraint.
A.
Task offloading delay: the offloading delay is the time required for the receiving base station to transfer the task to the controller's offloading queue and for the controller to forward it to the cooperative base station's processing queue. The tasks collected within a base station's coverage area form a set with a known number of elements. A binary indicator equals 1 if a vehicle is connected to a base station and 0 otherwise. The average time for uploading a task from the vehicle to the base station is calculated by Eq. (5). Task arrivals between a single vehicle and a base station are modeled as a Poisson process. Given each vehicle's task generation rate, the task arrival rate in the MEC controller's offloading queue is given by Eq. (6). Because the offloading queue processes only one task at a time, task offloading is modeled as an M/M/1 queue, and the service intensity of the MEC controller's offloading queue is defined by Eq. (7). Enqueueing is governed by the task arrival rate and dequeueing by the transmission rate. To keep the queue stable, the service intensity must satisfy Eq. (8), i.e., it must be strictly less than 1. Consider the set of tasks dequeued before a given task, and the delay of a vehicle's task from base-station upload to the MEC controller and from the controller to the cooperative base station. From these, the offloading delay of the task is calculated by Eq. (9).
B.
Task inference delay: the inference delay is the time a task spends being processed in the cooperative base station's processing queue or, for a locally processed task, in the vehicle's local processing queue. Two binary variables indicate, respectively, whether the task is offloaded to a cooperative base station or processed locally at the vehicle (1 if so, 0 otherwise). The time the cooperative base station spends on inference for an offloaded task and the time the vehicle spends on local inference are each defined. The average time of all tasks in the cooperative base station's processing queue is calculated by Eq. (10). The MEC controller distributes the tasks in the offloading queue to different cooperators for processing, and task arrivals at each cooperator's processing queue follow a Poisson process. A binary variable equals 1 if the controller assigns a vehicle's task to a given cooperative base station and 0 otherwise. The share of tasks the controller assigns to that base station is given by Eq. (11), and the task arrival rate of the cooperative base station's processing queue is defined accordingly. The processing queue is modeled as an M/M/1 queue; its service intensity is defined by Eq. (12) and must satisfy Eq. (13). Consider the index set of tasks ahead of a given task in the cooperative base station's queue, and the processing time the cooperative base station or the vehicle requires for the task. From these, the inference delay of the task is calculated by Eq. (14).
C.
Result return delay: the result return delay is the time required, after inference is complete, to transmit the result data from the cooperative base station back to the vehicle, and is calculated by Eq. (15). The total delay of a task comprises the task offloading delay, the task inference delay and the result return delay, as expressed by Eq. (16).
Problem modeling: a binary variable is defined by Eq. (17); it equals 1 if the task is completed within its delay constraint and 0 otherwise. The task scheduling policy set collects these scheduling decisions. Throughout the period from the completion of task processing to the receipt of the result, the vehicle must remain within the coverage area of the cooperative base station. If the vehicle leaves that coverage area before receiving the task result, the task is regarded as failed even if its delay constraint has not been exceeded. The candidate set of cooperative base stations for a vehicle's task is given by Eq. (18), using the predicted distance between the vehicle and the base station at the relevant time. A binary variable equals 1 if a base station belongs to this candidate set and 0 otherwise. Under the task delay constraints, the optimization goal of task scheduling is to maximize the number of successfully completed tasks, modeled as Eq. (19) subject to constraints (19a) to (19e). The constraints are as follows:
- Constraint (19a) ensures that each vehicle connects to exactly one base station for transmission and cannot connect to several base stations at the same time.
- Constraint (19b) ensures that each task is assigned to exactly one cooperative base station.
- Constraint (19c) ensures that each task selects exactly one processing mode.
- Constraint (19d) ensures that the selected cooperative base station covers the vehicle continuously during the entire task processing and result transmission process. This avoids task failures caused by the vehicle moving out of coverage, particularly in high-speed scenarios.
- Constraint (19e) ensures the stability of the controller's offloading queue and of each base station's processing queue.
  3. The task offloading method for a space-air-ground integrated Internet of Vehicles based on RAG-enhanced DRL according to claim 2, wherein task offloading in the SAGVN scenario is implemented with a RAG-DRL collaborative inference task offloading method. In offline training, the RAG accesses static and dynamic knowledge uniformly through the MCP gateway and converts heterogeneous traffic-domain knowledge into a decision basis for the DRL, realizing knowledge-driven task offloading decisions. First, the RAG acquires static knowledge and the real-time traffic situation through the MCP gateway, obtaining terrain-communication characteristics, historical congestion patterns and sudden accident information. An LLM interprets and reasons over this heterogeneous knowledge to generate quantized environment perception factors, which provide road-network situation enhancement information for DRL offline training and online inference. Meanwhile, the candidate cooperators that the trajectory knowledge predicts will maintain the connection are used to constrain the DRL's offloading actions.
  4. The task offloading method for a space-air-ground integrated Internet of Vehicles based on RAG-enhanced DRL according to claim 3, wherein, in the RAG-DRL collaborative inference task offloading method, the MCP-based trajectory knowledge acquisition method predicts vehicle trajectories with a spatio-temporal graph neural network (STGNN) model. The model is registered with the MCP gateway as a service to provide a unified trajectory prediction service for the RAG.
STGNN model pre-training: a local traffic graph centered on the target vehicle is constructed. The edge weights of its adjacency matrix are the reciprocals of the inter-vehicle distances, so closer vehicles receive larger weights, reflecting stronger spatial interaction. An LSTM network processes the time-series trajectory data to capture the temporal features of vehicle movement. A GCN aggregates neighboring-vehicle information on the constructed spatio-temporal graph to obtain spatial features reflecting inter-vehicle interactions. During feature fusion, the temporal and spatial features are linearly combined through a pair of learnable weight parameters to produce a comprehensive spatio-temporal representation. A multi-layer perceptron maps this representation to the predicted trajectory over a future time horizon, Eq. (20), whose learnable parameters are those of the temporal convolution and the graph convolution, together with a bias term. The loss function is the mean squared error between the predicted trajectory and the vehicle's real trajectory at each time, Eq. (22). During training, the STGNN parameters are updated with the AdamW optimizer. The parameters are initialized randomly and then optimized iteratively using the gradient of the loss function. The learning rate starts from an initial value, passes through a warm-up stage and then enters a decay period, during which the model gradually converges.
Trajectory prediction model service: in the offline training stage, the STGNN model completes pre-training on a historical trajectory dataset. In the service registration stage, the STGNN model is registered with the gateway as an MCP tool and exposes a standardized prediction interface. In the online calling stage, the MEC controller obtains trajectory prediction results within a future time window through the MCP gateway.
  5. The task offloading method for a space-air-ground integrated Internet of Vehicles based on RAG-enhanced DRL according to claim 3, wherein, in the RAG-DRL collaborative inference task offloading method, vector library construction and multi-source knowledge retrieval are performed as follows. A multi-source knowledge retrieval mechanism is constructed to acquire heterogeneous traffic knowledge. A unified vector representation space integrates the static knowledge, while the real-time traffic situation is acquired dynamically through the MCP gateway; together they provide comprehensive knowledge support for task scheduling decisions.
Each domain knowledge item is converted into a vector representation to support similarity calculation and fast retrieval. The minimum and maximum longitude and latitude bound the coverage area of the urban road network. Using these bounds, the longitude and latitude of a knowledge item's center position are min-max normalized into a coordinate vector, Eq. (22). Semantic features are extracted from the item's terrain-communication description text with a pre-trained Sentence-BERT model [13], Eq. (23). The position and terrain features are concatenated, Eq. (24), and L2-normalized, Eq. (25), so that heterogeneous knowledge items share a unified vector representation. Each normalized vector is paired with its original knowledge item to form the vector library set.
A query consists of the current vehicle's longitude and latitude coordinates and a terrain feature description obtained from an open-source geographic data platform. The coordinates are normalized, Eq. (26), and the description is embedded with the Sentence-BERT model, Eq. (27). The two are concatenated into a query vector and L2-normalized, Eqs. (28) and (29). The similarity between the query vector and each library vector is quantified by cosine similarity, Eq. (30). Based on the cosine similarity scores, the knowledge items closest to the query vector are selected from the traffic knowledge base, with their index set given by Eq. (31). The terrain-communication description set and the congestion-communication description set are then extracted from these items. A time-mapping variable maps the current time to a discrete time-period identifier, and the congestion-communication description set for that period is given by Eq. (32).
The MCP service interface continuously receives event streams reported by RSUs, traffic cameras and on-board terminals at the current time. From these it obtains the road construction information, accident reports and traffic control information for that time, which together form the road emergency information the controller obtains when accessing the current traffic situation through the MCP gateway. By integrating the terrain-communication description set, the peak-period congestion-communication description set and the road emergency information, the controller provides rich prior knowledge for subsequent DRL decisions.
  6. The task offloading method for a space-air-ground integrated Internet of Vehicles based on RAG-enhanced DRL according to claim 3, wherein, in the RAG-DRL collaborative inference task offloading method, a RAG-enhanced DRL algorithm is adopted to realize knowledge-driven optimization of collaborative inference task offloading. The algorithm fuses traffic-domain expert knowledge with the reinforcement learning decision process. Through environment perception factor generation and action space constraints, it guides the agent to make optimal task scheduling decisions in a complex vehicular network environment. The method specifically comprises the following steps:
a. Environment perception factor generation and caching: the LLM serves as the knowledge understanding and reasoning engine. It converts the heterogeneous knowledge (terrain-communication descriptions, peak-period congestion descriptions and road emergency information) into quantifiable decision factors used as state input to the DRL model. Based on this knowledge, the LLM generates a state enhancement triple, Eq. (33), consisting of three factors:
- A terrain influence factor: the comprehensive influence coefficient of terrain on a base station's communication quality. A smaller value indicates a greater negative impact of the terrain on that base station.
- A peak-period sensitivity factor: the congestion risk factor of a time-location combination. A larger value indicates a smaller negative impact of congestion on task offloading.
- A sudden-accident impact factor: the influence coefficient of an emergency on the communication environment. A larger value indicates a smaller negative impact of the emergency on task offloading.
Structured prompt engineering constrains the output format so that the generated factors must satisfy a prescribed condition. A multi-round consistency check queries the LLM several times for the same scenario and computes the variance; when the variance exceeds a threshold, an averaging or conservative-estimation strategy is adopted. The MEC controller uses a RAG result caching mechanism: within each period, the RAG performs knowledge retrieval and factor generation through the MCP gateway, and the result is cached and reused for the tasks of that period.
b. RAG-enhanced DRL offline training: the RAG-assisted DRL task offloading problem is modeled as a Markov decision process (MDP) with the following state space, action space and reward.
State space S: task scheduling must jointly consider vehicle position and speed, task parameters, base station load, prediction information and the state enhancement triple. The original state is expressed by Eq. (34) and the RAG-enhanced environment state by Eq. (35).
Action space A: to ensure task completion, action selection must satisfy the constraint that the cooperative base station provides continuous coverage, so only base stations in the candidate cooperative base station set may be selected. The action is defined by Eq. (36).
Reward R: an environment-risk-aware reward function incorporating the RAG-quantized environment factors is used. In each round, the reward for completing a task within its delay constraint is given by Eq. (37), and the penalty for failing to complete a task within its delay constraint is given by Eq. (38). These equations use the performance weight and loss weight of the executed action and the weight coefficients of the three environment factors. The reward obtained by executing an action in a given state is given by Eq. (39).
The DDQN algorithm iteratively optimizes the offloading policy to maximize the expected cumulative reward. DDQN maintains two neural networks: the policy (evaluation) network selects actions, and the target network computes the target Q value. Over the set of candidate scheduling policies, the action with maximum benefit in each state is given by Eq. (40), and the target Q value by Eq. (41). These equations use the parameters of the policy network, the parameters of the target network and the discount factor. The evaluation network selects the action that maximizes the Q value, and the target network calculates the Q value of that action.
  7. The task offloading method for a space-air-ground integrated Internet of Vehicles based on RAG-enhanced DRL according to claim 1, wherein task scheduling under different traffic events comprises:
a. Mountain-area communication shadowing: a vehicle travels within the coverage of a ground base station, a UAV and a satellite, and generates a task request. The request is forwarded to the MEC controller's inference queue through the UAV closest to the ground base station. Feedback from the trajectory prediction service indicates that the vehicle will leave the UAV's coverage during task processing while remaining continuously within the coverage of the satellite and the base station. The satellite therefore allocates physical resources to tasks on a first-come, first-served basis and returns the processed results to the vehicle.
b. Weekday evening peak: a vehicle generates a task request while within the coverage of a ground base station and a satellite during the weekday evening peak period. Road knowledge from the RAG shows that the ground base station load in this area remains saturated throughout the evening peak. Calling the trajectory prediction service shows that the vehicle will remain within the coverage of the UAV, the base station and the satellite during task processing.
c. Sudden traffic accident: a vehicle generates a task request while travelling within the coverage of a ground base station and a satellite. The RAG acquires real-time traffic situation knowledge and finds that a sudden traffic accident ahead has increased the load on surrounding base stations and caused strong signal interference, so the task is processed locally at the vehicle.
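Claim 2 names the ingredients of the communication model: a reference distance and the channel gain at that distance, a path-loss exponent that depends on the base station type, orthogonal sub-channels of equal bandwidth, and an average background-noise term. The formulas of Eqs. (1) to (4) themselves were lost in extraction. The Python sketch below assumes the common distance-based path-loss form and a Shannon-type rate over the allocated sub-channels. The transmit-power parameter, the omission of the time-varying attenuation factor and all function names are assumptions made for illustration, not the patent's own definitions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def channel_gain(distance_m: float, ref_gain: float, ref_distance_m: float,
                 path_loss_exp: float) -> float:
    """Assumed form of Eq. (1): ref_gain * (ref_distance / distance) ** exponent."""
    # Clamp to the reference distance so the gain never exceeds its calibrated value.
    d = max(distance_m, ref_distance_m)
    return ref_gain * (ref_distance_m / d) ** path_loss_exp


def predicted_gain(predicted_traj: Sequence[Point], step: int, bs_pos: Point,
                   ref_gain: float, ref_distance_m: float,
                   path_loss_exp: float) -> float:
    """Assumed form of Eq. (2): Eq. (1) evaluated at a predicted future position."""
    return channel_gain(math.dist(predicted_traj[step], bs_pos),
                        ref_gain, ref_distance_m, path_loss_exp)


def link_rate_bps(n_subchannels: int, subchannel_bw_hz: float, tx_power_w: float,
                  gain: float, noise_w: float) -> float:
    """Assumed Shannon-type form of Eqs. (3)/(4) over orthogonal sub-channels."""
    snr = tx_power_w * gain / noise_w
    return n_subchannels * subchannel_bw_hz * math.log2(1.0 + snr)
```

For example, with a reference gain of 1e-3 at 1 m and a path-loss exponent of 2, a vehicle 10 m from the base station sees a gain of about 1e-5.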
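The candidate cooperative base station set of Eq. (18) in claim 2 keeps only those stations that, according to the predicted trajectory, will still cover the vehicle until the result has been returned. Below is a minimal sketch. It assumes each station is described by a hypothetical circular coverage radius and that planar Euclidean distance is adequate, which is a simplification for satellites.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical station description: (position, coverage radius in metres).
Station = Tuple[Point, float]


def candidate_cooperators(predicted_traj: Sequence[Point],
                          stations: Dict[str, Station]) -> List[str]:
    """Sketch of Eq. (18): keep stations whose coverage disc contains every
    predicted position from task upload until the result is returned."""
    return sorted(
        name
        for name, (pos, radius) in stations.items()
        if all(math.dist(p, pos) <= radius for p in predicted_traj)
    )


def coverage_indicator(name: str, candidates: List[str]) -> int:
    """Binary variable: 1 if the station is in the candidate set, else 0."""
    return int(name in candidates)
```

A station that covers the vehicle now but not at the end of the predicted window is excluded, which is exactly the failure mode constraint (19d) is meant to prevent.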
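The queueing and objective parts of claim 2 (Eqs. (6) to (8), (11) to (13), (16), (17) and (19)) can be illustrated with a small Python sketch. It rests on three assumptions:
- The controller's arrival rate is the sum of the vehicles' Poisson rates.
- The task share from Eq. (11) thins that stream for each cooperator.
- The textbook M/M/1 mean time in system, 1/(mu - lambda), stands in for the per-task sums of Eqs. (9) and (14), which cannot be recovered from the text.

The task tuple layout is hypothetical.

```python
from typing import Iterable, List, Tuple


def offload_queue_intensity(vehicle_rates: Iterable[float], service_rate: float) -> float:
    """Sketch of Eqs. (6)-(7): total Poisson arrival rate over the service rate."""
    return sum(vehicle_rates) / service_rate


def cooperator_arrival_rate(total_rate: float, share: float) -> float:
    """Assumed reading of Eq. (11): thinning a Poisson stream with probability
    `share` yields another Poisson stream with rate share * total_rate."""
    return share * total_rate


def is_stable(intensity: float) -> bool:
    """Stability condition of Eqs. (8)/(13): intensity strictly below 1."""
    return intensity < 1.0


def mm1_mean_sojourn(arrival_rate: float, service_rate: float) -> float:
    """Textbook M/M/1 mean time in system, 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


# Hypothetical task record:
# (offload delay, inference delay, return delay, deadline, stays_covered), in seconds.
Task = Tuple[float, float, float, float, bool]


def successful_tasks(tasks: List[Task]) -> int:
    """Objective of Eq. (19): count tasks whose total delay (Eq. (16)) meets the
    deadline (Eq. (17)) while the vehicle stays in coverage (constraint 19d)."""
    return sum(
        1
        for off, inf, ret, deadline, covered in tasks
        if covered and off + inf + ret <= deadline
    )
```

The coverage flag is checked alongside the deadline because the claim counts a task as failed when the vehicle leaves coverage, even if the delay bound would have been met.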
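The spatial branch of the STGNN in claim 4 can be sketched without a deep-learning framework. The sketch covers four pieces: inverse-distance edge weights, one neighbourhood aggregation step, the learnable linear fusion of temporal and spatial features, and the mean-squared-error trajectory loss. The LSTM, the learnable GCN weight matrix and the MLP head are omitted. Row normalization of the adjacency matrix is an assumption, since the claim only fixes the reciprocal-distance weights. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def inverse_distance_adjacency(positions: Sequence[Point], eps: float = 1e-6) -> Matrix:
    """Edge weight = 1 / inter-vehicle distance; no self-loops."""
    n = len(positions)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                adj[i][j] = 1.0 / max(math.dist(positions[i], positions[j]), eps)
    return adj


def row_normalize(adj: Matrix) -> Matrix:
    """Assumption: rows sum to 1 so aggregation is a weighted mean of neighbours."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s > 0 else 0.0 for v in row])
    return out


def aggregate(adj_norm: Matrix, feats: Matrix) -> Matrix:
    """One GCN-style propagation step (A_norm @ X) without the learnable weight."""
    n, d = len(feats), len(feats[0])
    return [[sum(adj_norm[i][j] * feats[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


def fuse(h_temporal: Sequence[float], h_spatial: Sequence[float],
         w_t: float, w_s: float, bias: float) -> List[float]:
    """Linear fusion of temporal and spatial features with learnable scalars."""
    return [w_t * a + w_s * b + bias for a, b in zip(h_temporal, h_spatial)]


def trajectory_mse(pred: Sequence[Point], true: Sequence[Point]) -> float:
    """MSE loss: mean squared Euclidean error over the predicted positions."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, true)) / len(pred)
```

In a trained model the fusion weights and bias would be parameters updated by the optimizer rather than fixed arguments.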
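Claim 4 states that the STGNN learning rate passes through a warm-up stage and then a decay period, without naming the schedule. One common choice, assumed here, is linear warm-up followed by cosine decay. The hypothetical function below returns the learning rate for a given optimizer step.

```python
import math


def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int,
          min_lr: float = 0.0) -> float:
    """Assumed schedule: linear warm-up to base_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr after total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, dividing this value by the base learning rate gives a multiplicative factor that could be passed to `torch.optim.lr_scheduler.LambdaLR` wrapped around a `torch.optim.AdamW` optimizer.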
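The vector-library construction and retrieval steps of claim 5 reduce to four operations: min-max coordinate normalization, concatenation with a text embedding, L2 normalization, and cosine-similarity ranking. In the sketch below, `toy_text_embedding` is a hypothetical deterministic bag-of-words stand-in for the pre-trained Sentence-BERT model. It is used only so the example runs without external dependencies. The remaining function names are also assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Bounds = Tuple[float, float, float, float]  # (lon_min, lon_max, lat_min, lat_max)


def normalize_coords(lon: float, lat: float, b: Bounds) -> List[float]:
    """Min-max normalization of longitude/latitude into [0, 1] (cf. Eqs. (22)/(26))."""
    lon_min, lon_max, lat_min, lat_max = b
    return [(lon - lon_min) / (lon_max - lon_min), (lat - lat_min) / (lat_max - lat_min)]


def l2_normalize(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def toy_text_embedding(text: str, dim: int = 8) -> List[float]:
    """HYPOTHETICAL stand-in for Sentence-BERT: deterministic token hashing."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[sum(map(ord, tok)) % dim] += 1.0
    return l2_normalize(vec)


def knowledge_vector(lon: float, lat: float, text: str, b: Bounds) -> List[float]:
    """Concatenate position and text features, then L2-normalize (cf. Eqs. (24)-(25))."""
    return l2_normalize(normalize_coords(lon, lat, b) + toy_text_embedding(text))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (cf. Eq. (30))."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def top_k(query: Sequence[float], library: Dict[str, List[float]], k: int) -> List[str]:
    """Keys of the k library items most similar to the query (cf. Eq. (31))."""
    ranked = sorted(library, key=lambda key: cosine(query, library[key]), reverse=True)
    return ranked[:k]
```

After the concatenated vector is L2-normalized, the two-dimensional position block can be outweighed by a unit-norm text embedding of much higher dimension. A relative weighting of the two blocks is therefore a tunable design choice that the claim does not specify.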
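Step a of claim 6 combines a multi-round consistency check on LLM-generated factors with a per-period result cache. The sketch below makes three assumptions:
- Each factor lies in [0, 1]. The actual condition was lost in extraction.
- When the variance exceeds the threshold, the minimum is taken as the conservative estimate, because smaller factors denote larger negative impact. The claim permits either this or averaging.
- A hypothetical `query_llm` callable stands in for a real LLM client.

```python
import statistics
from typing import Callable, Dict, Hashable, Tuple

Triple = Tuple[float, float, float]  # (terrain, peak-sensitivity, incident) factors


def consistent_factors(query_llm: Callable[[], Triple], rounds: int = 3,
                       var_threshold: float = 0.01, lo: float = 0.0,
                       hi: float = 1.0) -> Triple:
    """Query the (hypothetical) LLM several times, clip each factor to [lo, hi],
    and use the minimum when the variance exceeds the threshold, else the mean."""
    samples = [query_llm() for _ in range(rounds)]
    result = []
    for k in range(3):
        vals = [min(max(s[k], lo), hi) for s in samples]
        if statistics.pvariance(vals) > var_threshold:
            result.append(min(vals))  # conservative: assume the worse impact
        else:
            result.append(statistics.mean(vals))
    return (result[0], result[1], result[2])


class FactorCache:
    """Per-period RAG result cache: compute once per (key, period) and reuse."""

    def __init__(self, period_s: float) -> None:
        self.period_s = period_s
        self._store: Dict[Tuple[Hashable, int], Triple] = {}

    def get(self, key: Hashable, now_s: float, compute: Callable[[], Triple]) -> Triple:
        slot = int(now_s // self.period_s)  # discrete period identifier
        if (key, slot) not in self._store:
            self._store[(key, slot)] = compute()
        return self._store[(key, slot)]
```

Caching per period means the LLM is queried at most once per key and period, so the DRL agent's per-task decision latency does not include LLM inference time.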
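The action constraint and the DDQN target of claim 6 (Eqs. (36), (40) and (41)) can be sketched on plain dictionaries of Q-values. The action set is local processing plus the candidate cooperators. The policy network chooses the best allowed next action, and the target network evaluates it. Several elements are assumptions for illustration: the `LOCAL` label, the dictionary representation of network outputs, and the inclusion of local processing in the action set (inferred from constraint (19c)).

```python
from typing import Dict, List, Sequence

LOCAL = "local"  # hypothetical label for local processing at the vehicle


def allowed_actions(candidates: Sequence[str]) -> List[str]:
    """Action space: local processing or one cooperator from the candidate set."""
    return [LOCAL, *candidates]


def masked_argmax(q: Dict[str, float], allowed: Sequence[str]) -> str:
    """Greedy action restricted to the allowed actions."""
    return max(allowed, key=lambda a: q[a])


def ddqn_target(reward: float, gamma: float, q_policy_next: Dict[str, float],
                q_target_next: Dict[str, float], allowed_next: Sequence[str],
                done: bool) -> float:
    """Double DQN target: the policy network picks a*, the target network scores it."""
    if done:
        return reward
    a_star = masked_argmax(q_policy_next, allowed_next)
    return reward + gamma * q_target_next[a_star]
```

Masking before the argmax keeps the bootstrap target consistent with the coverage constraint. Without it, the target could be computed from a cooperator that the trajectory prediction has already ruled out.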

Description

Space-air-ground integrated Internet of Vehicles task offloading method based on RAG-enhanced DRL
Technical Field
The invention relates to the application of artificial intelligence technology in the Internet of Vehicles, and in particular to a space-air-ground integrated Internet of Vehicles task offloading method based on RAG-enhanced DRL.
Background
As autonomous driving technology evolves rapidly, the deep learning tasks a vehicle runs for visual perception, path planning, object detection and similar functions place increasingly strict demands on computing resources and latency. On-board computing units have inherent limits in computing power and storage, which makes it difficult for them to support real-time execution of high-precision inference tasks stably over long periods. Space-air-ground integrated vehicular networks (SAGVN) jointly use multi-layer heterogeneous network resources such as ground base stations, UAV small base stations and low-orbit satellites. In this way they provide vehicles with continuous coverage and differentiated computing support. However, task offloading in SAGVN must cope with rapid topology evolution, base station load fluctuations and bursty traffic events on millisecond time scales, which makes conventional scheduling strategies inadequate. Owing to its adaptive optimization capability, deep reinforcement learning (DRL) has become a mainstream method for building real-time task offloading strategies. However, the highly dynamic and heterogeneous nature of the Internet of Vehicles environment exposes DRL to two key bottlenecks. First, DRL makes no explicit use of traffic-domain knowledge, so it needs extensive exploration to learn basic regularities and incurs high training cost. Second, it responds slowly to low-frequency but high-impact emergencies, which makes reliable decisions difficult.
Large language models (LLMs) offer new technical support for introducing structured knowledge and semantic reasoning. An LLM-based retrieval-augmented generation (RAG) mechanism can unify static expert knowledge (road terrain, communication coverage, historical congestion patterns) with dynamic real-time information (sudden accidents, short-term congestion, temporary construction). It can thereby provide continuously updatable knowledge supplements for DRL and improve the adaptability and robustness of offloading strategies in non-stationary and bursty scenarios. Although RAG offers significant advantages in knowledge fusion and intelligent reasoning, building a RAG-enhanced collaborative inference and scheduling framework still faces the following challenges:
1) Deep fusion of traffic knowledge. Knowledge in Internet of Vehicles scenarios is markedly heterogeneous, covering road structure, congestion patterns, communication coverage, sudden accidents and more, and it differs in source, timeliness and form of expression. Prior studies have verified the effectiveness of RAG: Ren et al. [1] build a user configuration vector database, Hussien et al. [2] predict behavior with a knowledge graph, and the EACO-RAG framework [3] shares knowledge across nodes through hierarchical retrieval. However, these studies focus mainly on relatively stable semantic knowledge. They lack systematic support for the unified modeling and deep fusion of multidimensional knowledge such as static expert knowledge, dynamic road events and time-varying communication conditions.
2) Efficient retrieval of heterogeneous knowledge. Traffic-related knowledge includes geographic coordinates, road topology, link-quality descriptions, semantic road-condition information, emergency records and more, and is markedly heterogeneous.
Park et al. [4] propose a keyword-matching retrieval method, Wang et al. [5] use place-name recognition to map text to coordinates, and Zhang et al. [6] use a dynamic retrieval mechanism to fuse historical and real-time traffic data. However, these methods struggle to capture the continuity of coordinates and fine-grained differences in location semantics. They also lack joint retrieval across three kinds of knowledge: geographic space, text semantics and dynamic events.
3) Environmental adaptability of agent decisions. DRL relies on extensive interaction and trial and error with the environment, and its decisions generalize poorly to low-frequency but high-impact events such as accidents and abrupt changes in communication links. Wang et al. [7] use the deep deterministic policy gradient (DDPG) algorithm to optimize Internet of Vehicles task offloading, Al-Tarawneh et al. [8] propose a context-aware online offloading method, li