
CN-122022044-A - Urban surrounding land utilization optimization method based on deep reinforcement learning

CN 122022044 A

Abstract

The invention relates to a method for optimizing urban surrounding land utilization based on deep reinforcement learning, belonging to the technical field of urban science. The method comprises: inputting the pixel features of an original land-type image into a constructed graph-sampling state encoder to obtain the feature information of the points to be optimized, which serves as their state information; setting reward items to decide the optimization type of the area to be optimized; calculating an advantage function from the value of the optimization type and adjusting the optimization policy based on the advantage function; and, after the optimization is finished, performing cooling-amplitude prediction with a trained spatial Durbin model so as to obtain a land utilization optimization result based on the cooling-amplitude prediction result and preset landscape indexes. The method aims to solve the technical problems that the prior art is difficult to apply directly to the high-dimensional continuous remote-sensing feature space and has obvious shortcomings in modeling multi-level spatial semantic relationships.

Inventors

  • WANG GUOZHU
  • ZHANG YAPING
  • CHEN XU

Assignees

  • Yunnan Normal University (云南师范大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-02

Claims (5)

  1. An urban surrounding land utilization optimization method based on deep reinforcement learning, characterized by comprising the following steps: S1, constructing a state encoder based on graph sampling; S2, inputting the pixel features of an original land-type image into the state encoder to obtain the aggregated feature information of the points to be optimized output by the state encoder, and taking this feature information as the state information of the points to be optimized; S3, according to the state information of a point to be optimized, setting reward items to decide the optimization type of the area to be optimized, calculating an advantage function from the value of the optimization type, and adjusting the optimization policy based on the advantage function; S4, finishing the optimization after repeating S2-S3 for all pixel features in the area to be optimized; and S5, after the optimization is finished, performing cooling-amplitude prediction with a trained spatial Durbin model so as to obtain a land utilization optimization result based on the cooling-amplitude prediction result and preset landscape indexes.
  2. The urban surrounding land utilization optimization method based on deep reinforcement learning according to claim 1, wherein S1 specifically is: the graph-sampling state encoder comprises four SAGEConv functions; two SAGEConv functions are followed by ReLU activation functions and are used to perform dimension transformation on the dynamic features and the static features respectively, and the dimension-transformed dynamic and static features are concatenated to obtain the aggregated state feature; the other two SAGEConv functions are the policy head and the value head, each comprising a linear transformation and a ReLU activation function, which receive the aggregated state feature and output, respectively, the optimization type and the value corresponding to that optimization type.
  3. The urban surrounding land utilization optimization method based on deep reinforcement learning according to claim 1, wherein S2 specifically is: S21, starting layer-by-layer outward optimization from the main urban area, and optimizing the land type in the area to be optimized by randomly selecting the optimization order of the points to be optimized within each layer; S22, after a point to be optimized is obtained, using graph sampling to aggregate the neighborhood-node feature information within the third order of that point, wherein the neighborhood-node feature information within the third order comprises static features and dynamic features, the static features consist of multi-channel feature information, and the dynamic features consist of dynamically changing land-type feature information; the static and dynamic features are combined to obtain the final feature of the point to be optimized, with the expression: x_i = Concat(Norm(x_i^s), Norm(x_i^d)), where x_i is the final feature of the i-th point to be optimized, x_i^s is the static feature, x_i^d is the dynamic feature, Norm(·) outputs the new vector expression of the normalized x_i^s or x_i^d, and Concat(·) splices the corresponding new vector expressions to obtain the final feature.
  4. The urban surrounding land utilization optimization method based on deep reinforcement learning as claimed in claim 3, wherein in the process of aggregating the neighborhood-node feature information within the third order of a point to be optimized by graph sampling, the aggregation expression of the k-th order is: h_v^(k) = σ( W^(k) · Concat( h_v^(k-1), AGG({ h_u^(k-1) : u ∈ N(v) }) ) ), where h_v^(k) is the aggregated feature of the k-th order, N(v) is the neighborhood node set of node v, representing the neighborhood range of node-feature aggregation, AGG(·) is the node aggregation function, σ(·) is the activation function, and W^(k) are the trainable parameters of the k-th order.
  5. The urban surrounding land utilization optimization method based on deep reinforcement learning according to claim 4, wherein S3 specifically is: S31, constructing the reward function by setting reward items, with the expression R = λ1·R_np + λ2·R_T + λ3·R_L, where R is the total reward function and λ1, λ2, λ3 are weight coefficients of the reward items. The reward item R_np is set from the angle of night light and population density: L̃ = L_c / L_max and P̃ = P_c / P_max, where L_c and L_max respectively denote the night-light value of the center pixel and the night-light maximum within the k-range valid-mask neighborhood of the center pixel, and P_c and P_max respectively denote the population density value of the center pixel and the population-density maximum within that neighborhood; L̃ and P̃ are the introduced normalized night-light and population-density values of the center pixel, from which a combined night-light and population-density intensity term S of the center pixel is formed, and an indicator δ is defined according to S and the type-change case: δ = 1 if the type changes and δ = 0 if it does not; a multiplicative gating G is constructed from the normalized values of the center pixel as G = L̃^γ1 · P̃^γ2, where γ1 and γ2 are the gating power exponents corresponding to night light and population density respectively. The reward item R_T is set based on the spatial Durbin model and gives a positive reward only when cooling occurs and the conversion is determined; the model takes the form ΔT = ρ·W·ΔT + ΔX·β + W·ΔX·θ + e, where ΔT_c is the temperature change of the center pixel, W is the spatial weight matrix with elements w_ij, ΔX is the land-use-type feature matrix, ΔT_j is the temperature change of a neighbor pixel of the center pixel, β and θ represent the direct-effect and spatial-spillover-effect parameters respectively, ρ represents the spatial autoregressive coefficient, e is the error term, n denotes the number of neighborhood pixels of the center pixel, Δx denotes the land-use-type feature change quantity, the direct and indirect effects brought to the temperature by changing the center pixel follow from β and θ respectively, and ω is the neighborhood response weight coefficient. The reward item R_L consists of three parts, namely the connectivity polymerization index CPI, the concentration-discrete index CDI, and the form regulation index FRI, where N_c denotes the number of connected patches of candidate type c within the valid-mask neighborhood Ω of the current center pixel, A_c denotes the total area of the connected patches of candidate type c, Per_c denotes the total perimeter of the connected patches of candidate type c, and |Ω| denotes the effective area of Ω. S32, calculating an advantage function from the value of the optimization type so as to adjust the optimization policy based on the advantage function, and transforming the positions of the points to be optimized in sequence according to the optimization order until the whole optimization process finishes; the policy adjustment specifically is: L_CLIP(θ) = E_t[ min( r_t(θ)·Â_t, clip(r_t(θ), 1-ε, 1+ε)·Â_t ) ], with r_t(θ) = π_θ(a_t | s_t) / π_θold(a_t | s_t), where L_CLIP(θ) is the objective function of the policy update, E_t denotes taking the expectation, Â_t is the GAE advantage-function estimate, ε is the hyperparameter controlling the update amplitude, π_θ represents the new policy used for type decisions and π_θold the old policy used for type decisions, s_t represents the state of the region to be optimized at step t, and a_t represents the type selection in state s_t; r_t(θ) denotes the probability ratio of the new policy to the old policy over the state space and the action space, where the state space is the vector space formed by superposing multi-channel remote-sensing feature data and ground data and the action space is the discrete selection of land utilization types; clip(·) limits the probability ratio to the interval [1-ε, 1+ε] to constrain the update amplitude of the policy.
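The spatial Durbin model underlying the cooling reward in claim 5 can be sketched in its reduced form: solving ΔT = ρ·W·ΔT + ΔX·β + W·ΔX·θ for ΔT gives ΔT = (I − ρW)⁻¹(ΔX·β + W·ΔX·θ). The following NumPy sketch and its toy coefficients are illustrative assumptions, not values from the patent:

```python
import numpy as np

def sdm_predict(W, dX, beta, theta, rho):
    """Reduced-form spatial Durbin prediction of temperature change.

    W     : (n, n) spatial weight matrix
    dX    : (n, p) land-use-type feature changes
    beta  : (p,)   direct-effect parameters
    theta : (p,)   spatial-spillover parameters
    rho   : spatial autoregressive coefficient
    Solves (I - rho*W) dT = dX @ beta + W @ (dX @ theta).
    """
    n = W.shape[0]
    rhs = dX @ beta + W @ (dX @ theta)
    return np.linalg.solve(np.eye(n) - rho * W, rhs)

# Toy example: two pixels, only pixel 0 changes its land-use feature.
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
dX = np.array([[1.0],
               [0.0]])
cooling = sdm_predict(W, dX, beta=np.array([-1.0]), theta=np.array([0.0]), rho=0.5)
```

With a negative direct effect, the changed pixel cools, and its neighbor also cools through the autoregressive spillover even though its own features are unchanged, which is the behavior the R_T reward item keys on.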
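The policy update in S32 of claim 5 is the standard clipped surrogate objective of proximal policy optimization combined with a GAE advantage estimate. A minimal NumPy sketch; the function names and hyperparameter defaults are illustrative assumptions:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one finite trajectory."""
    adv = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        next_v = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_v - values[t]   # TD residual
        last = delta + gamma * lam * last
        adv[t] = last
    return adv

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate: mean of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    ratio = np.exp(logp_new - logp_old)                   # r_t = pi_new / pi_old
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

Once the probability ratio drifts outside [1−ε, 1+ε], the clipped branch takes over for positive advantages, which is what constrains the update amplitude of the per-pixel type decisions.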

Description

Urban surrounding land utilization optimization method based on deep reinforcement learning

Technical Field

The invention relates to a method for optimizing urban surrounding land utilization based on deep reinforcement learning, belonging to the technical field of urban science.

Background

The distribution of land cover is essentially a multi-objective combinatorial optimization problem, usually solved by global search algorithms such as Simulated Annealing (SA), Genetic Algorithms (GA), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO). However, these methods keep the number of existing land utilization types unchanged, can only adjust the land utilization type distribution statically, can only optimize relatively small amounts of data, and therefore struggle to meet practical demands. To address this, a search algorithm based on the MACO algorithm has been proposed, but that research focuses more on surface features and ignores the comprehensive characteristics of urban dynamics; when constructing the objective function it mainly considers surface temperature and three landscape indexes, and lacks social and economic considerations. Another major challenge in land planning is the huge solution space: the number of pixels in a high-resolution raster image of a city can easily reach millions or even tens of millions, yielding an even larger land-type combination space, so the land planning problem becomes one of finding the best optimization positions and optimization types within this huge combinatorial space. In such complex spatial decisions, traditional optimization methods such as the discretized optimization mechanism of MACO are difficult to apply directly to the high-dimensional continuous remote-sensing feature space and have obvious shortcomings in modeling multi-level spatial semantic relationships.
Disclosure of the Invention

The invention aims to provide an urban surrounding land utilization optimization method based on deep reinforcement learning, so as to solve the technical problems that the prior art is difficult to apply directly to the high-dimensional continuous remote-sensing feature space and has obvious shortcomings in modeling multi-level spatial semantic relationships. To achieve this purpose, the technical scheme of the invention provides an urban surrounding land utilization optimization method based on deep reinforcement learning. The method introduces graph sampling to effectively capture global and local spatial information, designs the reward function by jointly considering cooling, layout, and coordinated economic development, and obtains the final optimization result by exploiting the stable optimization characteristic of the proximal policy optimization method. The method comprises the following steps: S1, constructing a state encoder based on graph sampling; S2, inputting the pixel features of an original land-type image into the state encoder to obtain the aggregated feature information of the points to be optimized output by the state encoder, and taking this feature information as the state information of the points to be optimized; S3, according to the state information of a point to be optimized, setting reward items to decide the optimization type of the area to be optimized, calculating an advantage function from the value of the optimization type, and adjusting the optimization policy based on the advantage function; S4, finishing the optimization after repeating S2-S3 for all pixel features in the area to be optimized; and S5, after the optimization is finished, performing cooling-amplitude prediction with a trained spatial Durbin model so as to obtain a land utilization optimization result based on the cooling-amplitude prediction result and preset landscape indexes.
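The graph-sampling state encoder of step S1 can be illustrated with a SAGEConv-style layer: each node's own feature and the mean of its neighbors' features pass through separate linear maps, are summed, and go through ReLU, after which the static and dynamic branches are concatenated as in the disclosure. This is a NumPy sketch under assumed shapes; `sage_layer`, `encode_state`, and the parameter layout are illustrative names, not the patent's implementation:

```python
import numpy as np

def sage_layer(feats, neighbors, W_self, W_neigh):
    """One SAGEConv-style aggregation step with mean pooling and ReLU.

    feats     : (N, d_in) node features
    neighbors : list of neighbor-index lists, one per node
    W_self, W_neigh : (d_out, d_in) trainable weights (self / neighborhood)
    """
    out = []
    for v, nbrs in enumerate(neighbors):
        agg = feats[nbrs].mean(axis=0) if nbrs else np.zeros(feats.shape[1])
        h = W_self @ feats[v] + W_neigh @ agg   # combine self and neighborhood
        out.append(np.maximum(h, 0.0))          # ReLU
    return np.stack(out)

def encode_state(static, dynamic, neighbors, params):
    """Encode point states: separate SAGE branches for static and dynamic
    features, then concatenation of the two transformed branches."""
    hs = sage_layer(static, neighbors, *params["static"])
    hd = sage_layer(dynamic, neighbors, *params["dynamic"])
    return np.concatenate([hs, hd], axis=1)
```

Stacking three such layers would gather feature information from the third-order neighborhood described in step S22.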
Optionally, S1 specifically is: the graph-sampling state encoder comprises four SAGEConv functions; two SAGEConv functions are followed by ReLU activation functions and are used to perform dimension transformation on the dynamic features and the static features respectively, and the dimension-transformed dynamic and static features are concatenated to obtain the aggregated state feature; the other two SAGEConv functions are the policy head and the value head, each comprising a linear transformation and a ReLU activation function, which receive the aggregated state feature and output, respectively, the optimization type and the value corresponding to that optimization type. Optionally, S2 specifically is: S21, starting layer-by-layer outward optimization from the main urban area, and optimizing the land type in the area to be optimized by randomly selecting the optimization order of the points to be optimized within each layer; S22, after a point to be optimized is obtained, using graph sampling to aggregate the neighborhood-node feature