CN-121978698-A - Markov decision-based space debris laser ranging method

Abstract

The invention relates to the technical field of laser ranging, and in particular to a space debris laser ranging method based on Markov decision. The method divides the observation time into discrete time slots, builds a Markov decision model, and trains it with a reinforcement learning algorithm to obtain an optimal task planning strategy. This enables efficient collaborative observation across the distributed space debris laser ranging network, markedly improves the overall efficiency and resource utilization of observation tasks, and overcomes the planning difficulty that traditional methods face in multi-station collaborative and dynamic environments. Maximizing the expected value of accumulated future rewards ensures the global optimality of the strategy, so the network can adapt to weather changes and fluctuations in target priority. The invention thereby solves the technical problem that, in existing Distributed-DLR scenarios, the geographic dispersion of the sites introduces additional variables and constraints, which makes prior-art laser ranging perform poorly.

Inventors

PI XIAOYU
LI ZHULIAN
ZHAI DONGSHENG
LI YUQIANG

Assignees

Yunnan Observatories, Chinese Academy of Sciences (中国科学院云南天文台)

Dates

Publication Date: 20260505
Application Date: 20260408

Claims (10)

1. A space debris laser ranging method based on Markov decision, applied to a distributed space debris laser ranging network consisting of a plurality of ground stations for space debris laser ranging, characterized by comprising the following steps: dividing the time for observing the space debris into a plurality of discrete time slots; constructing a Markov decision model for space debris laser ranging, wherein the Markov decision model comprises a state space, an action space, a state transition probability and a reward function for each time slot; training the Markov decision model with a reinforcement learning algorithm, iteratively updating a state-action value function until convergence by maximizing the expected value of accumulated future rewards, so as to obtain an optimal ranging task planning strategy; and generating an observation task sequence for each ground station in the distributed space debris laser ranging network according to the obtained optimal task planning strategy.
2. The space debris laser ranging method according to claim 1, wherein the state space comprises an observation state of each station at a plurality of time slots, the observation state comprising a ranging result of the target debris and an environmental factor relatively independent from the ranging result, the ranging result comprising ranging success and ranging failure and being represented by two different parameters, respectively.
3. The space debris laser ranging method of claim 2, wherein the environmental factors include a set of meteorological parameters, a set of site parameters, and a set of target parameters represented by numerical values.
4. The space debris laser ranging method according to claim 3, wherein the set of target parameters comprises an observation value positively correlated with the priority, size and orbital altitude of the target debris, as well as the observation completion degree and the remaining visible duration of the target debris.
5. The space debris laser ranging method of claim 1, wherein the action space is a set of actions selectable by each ground station in each time slot, the set of actions comprising continuing to observe the current target debris and switching to observe a new target debris that is currently visible.
6. The space debris laser ranging method according to claim 2, wherein the state transition probability is a ranging success probability at a next time calculated from an environmental factor at a previous time.
7. The space debris laser ranging method of claim 2, wherein the cumulative future reward is obtained by introducing a discount factor to adjust the instantaneous reward of each time slot from the current time to the end of the observation by the distributed space debris laser ranging network, and calculating the sum of the adjusted instantaneous rewards over the time slots.
8. The space debris laser ranging method according to claim 7, wherein the instantaneous reward of the distributed space debris laser ranging network in each time slot is calculated from the rewards of the individual stations, and the reward of each station is calculated according to whether ranging succeeds, specifically as follows: if ranging succeeds, the reward of the station is positively correlated with the observation value in the target parameter set and with the meteorological parameter set corresponding to the station; if ranging fails, the reward of the station is a preset failure reward, whose value is smaller than the minimum reward for successful ranging.
9. The space debris laser ranging method of claim 8, wherein the reward calculation for the station introduces a penalty factor positively correlated therewith, the penalty factor being proportional to the effective observation time period for the station from the current time to the end of the observation.
10. The space debris laser ranging method of claim 8, wherein the reward calculation for the station incorporates a reward factor positively correlated therewith, the reward factor taking effect when the available observation time of the target debris reaches an available time threshold.
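
As a hedged illustration of the reward structure in claims 7 and 8, the following Python sketch shows one possible form. The product `observation_value * weather_score`, the summation over stations, the constant `FAILURE_REWARD`, and the default discount factor are all assumptions made for illustration; the claims only require positive correlation on success and a failure reward below every success reward.

```python
from typing import Sequence

# Assumed preset failure reward; it must stay below every success reward.
FAILURE_REWARD = -1.0


def station_reward(success: bool, observation_value: float, weather_score: float) -> float:
    """Reward of one station in one time slot (illustrative form only).

    Assumes observation_value > 0 and weather_score in (0, 1], so the
    product is positive and increases with both inputs.
    """
    if not success:
        return FAILURE_REWARD
    return observation_value * weather_score


def slot_reward(station_rewards: Sequence[float]) -> float:
    """Network-level instantaneous reward, here the sum over stations (an assumption)."""
    return sum(station_rewards)


def discounted_return(slot_rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum of gamma**k * r_k from the current slot (k = 0) to the last slot."""
    total = 0.0
    for k, reward in enumerate(slot_rewards):
        total += (gamma ** k) * reward
    return total
```

With these assumed forms, a failed slot always scores lower than any successful one, and rewards further in the future contribute less to the cumulative total as `gamma` decreases.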

Description

Markov decision-based space debris laser ranging method

Technical Field

The invention relates to the technical field of laser ranging, and in particular to a space debris laser ranging method based on Markov decision.

Background

With the growing number of space debris, space debris laser ranging (Debris Laser Ranging, DLR), an extension of satellite laser ranging (Satellite Laser Ranging, SLR), is of great significance for safeguarding the space environment. One important direction in DLR research is to coordinate the observation resources of multiple ground stations into a distributed network: a laser is emitted from one station, and other stations receive the echo signals from the space debris, as shown in Fig. 1. This method, called distributed space debris laser ranging (Distributed-DLR), has two main advantages. First, using multiple telescopes increases the effective receiving area for echo signals. Second, multidimensional ranging data can be obtained from different directions, which improves the accuracy of orbit determination for non-cooperative targets.

In Distributed-DLR observation tasks, the main challenges are real-time target selection and multi-station cooperation. In traditional single-station observation, a station only needs to generate and execute its own observation task plan. In the multi-station case, however, the stations' different geographical locations give the same target different transit conditions, and several targets may appear at the same time, so it is necessary to determine which is the "best" target.
In addition, local weather affects station operation and adds further uncertainty. In existing Distributed-DLR scenarios, because the stations are located at different sites, more variables and constraints are introduced, the problem becomes extremely complex, and traditional methods give poor laser ranging results.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a space debris laser ranging method based on Markov decision, which solves the technical problem that, in existing Distributed-DLR scenarios, the distribution of stations across different sites introduces more variables and constraints and makes prior-art laser ranging perform poorly.

To solve the above technical problem, the invention provides a space debris laser ranging method based on Markov decision, applied to a distributed space debris laser ranging network consisting of a plurality of ground stations for space debris laser ranging, the method comprising the following steps: dividing the time for observing the space debris into a plurality of discrete time slots; constructing a Markov decision model for space debris laser ranging, wherein the Markov decision model comprises a state space, an action space, a state transition probability and a reward function for each time slot; training the Markov decision model with a reinforcement learning algorithm, iteratively updating a state-action value function until convergence by maximizing the expected value of accumulated future rewards, so as to obtain an optimal ranging task planning strategy; and generating an observation task sequence for each ground station in the distributed space debris laser ranging network according to the obtained optimal task planning strategy.
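
The four steps above can be sketched, under strong simplifying assumptions, as tabular Q-learning on a toy problem. The text does not name a specific reinforcement learning algorithm, so Q-learning is a stand-in here. The sketch models a single station, omits the ranging result and environmental factors from the state, and uses made-up numbers; `N_SLOTS`, `P_SUCCESS`, `VALUE`, `FAILURE_REWARD`, and the hyperparameters are all hypothetical. The action is the index of the target to observe in the slot, so choosing the current target corresponds to "continue" and choosing another corresponds to "switch".

```python
import random
from collections import defaultdict

N_SLOTS = 5                     # number of discrete time slots (assumed)
TARGETS = [0, 1]                # indices of currently visible debris (assumed)
P_SUCCESS = {0: 0.9, 1: 0.1}    # assumed ranging success probability per target
VALUE = {0: 1.0, 1: 1.5}        # assumed observation value per target
FAILURE_REWARD = -0.5           # assumed; below every success reward
GAMMA, ALPHA, EPSILON = 0.95, 0.05, 0.2


def step(slot, action, rng):
    """Simulate one slot observing target `action`; return (next_state, reward, done)."""
    success = rng.random() < P_SUCCESS[action]
    reward = VALUE[action] if success else FAILURE_REWARD
    next_slot = slot + 1
    return (next_slot, action), reward, next_slot == N_SLOTS


def train(episodes=5000, seed=0):
    """Tabular Q-learning over states (slot, current_target)."""
    rng = random.Random(seed)
    q = defaultdict(float)      # maps (state, action) to the estimated return
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            if rng.random() < EPSILON:      # explore a random target
                action = rng.choice(TARGETS)
            else:                           # exploit the current estimate
                action = max(TARGETS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state[0], action, rng)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in TARGETS)
            td_target = reward + GAMMA * best_next
            q[(state, action)] += ALPHA * (td_target - q[(state, action)])
            state = next_state
    return q


def plan(q):
    """Greedy rollout of the learned policy: one target index per time slot."""
    state, schedule = (0, 0), []
    for slot in range(N_SLOTS):
        action = max(TARGETS, key=lambda a: q[(state, a)])
        schedule.append(action)
        state = (slot + 1, action)
    return schedule
```

Calling `plan(train())` returns an observation task sequence with one target index per slot. With the assumed numbers, target 0 has the higher expected reward per slot (0.85 versus -0.3), so the learned schedule should favor it. A multi-station version would need a joint or per-station action and a state that carries the environmental factors.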
Preferably, the state space includes an observation state of each station at a plurality of time slots, the observation state includes a ranging result for the target debris and an environmental factor relatively independent of the ranging result, and the ranging result includes ranging success and ranging failure, represented by two different parameters respectively.

Preferably, the environmental factors include a set of meteorological parameters, a set of site parameters, and a set of target parameters represented by numerical values.

Preferably, the set of target parameters includes an observation value positively correlated with the priority, size and orbital altitude of the target debris, as well as the observation completion degree and the remaining visible duration of the target debris.

Preferably, the action space is a set of actions selectable by each ground station in each time slot, the set of actions including continuing to observe the current target debris and switching to observe a new target debris that is currently visible.

Preferably, the state transition probability is a ranging success probability at the next time calculated from an environmental factor at the previous time.

Preferably, the cumulative future rewards are obtained by introducing discount factors to adjust the instantaneous rewards of the distributed