CN-121981620-A - Multi-agent-based adaptive updating decision support method for industrial heritage region

Abstract

The invention discloses a multi-agent-based adaptive updating decision support method for industrial heritage regions, comprising: data acquisition and preprocessing; setting of agent attributes and behavior rules; multi-agent interactive simulation and result set generation; establishment of an adaptability evaluation system; reinforcement learning model training and optimal strategy extraction; and construction of a scheme database. By constructing building agents and developer agents, the method simulates the dynamic interaction of an industrial heritage region during functional transformation and development decision-making, so that a reinforcement learning model generates optimal updating strategy sequences under different states and external scenarios. Through the reinforcement learning mechanism, the invention achieves adaptive evolution and dynamic optimization of the updating process, improves the ability of updating schemes to adapt to changes in the spatial environment and in economic conditions, and provides a scientific decision basis for the reuse and transformation of different types of industrial heritage.

Inventors

  • SHI YI
  • ZHANG XUN
  • YANG JUNYAN
  • ZHANG YIYANG
  • HU SHUXIAN
  • SUN TONGXU
  • LIU YIFAN
  • JIA ZIHENG
  • LI QIUYING
  • CUI AO

Assignees

  • Southeast University (东南大学)

Dates

Publication Date
20260505
Application Date
20251231

Claims (7)

  1. A multi-agent-based adaptive updating decision support method for industrial heritage regions, characterized by comprising the following steps:

     S1, data acquisition and preprocessing: determining the data acquisition boundary according to the industrial heritage region updating range delimited by the local government or planning authority; extracting the geometric form, roof characteristics, number of floors and footprint area of the buildings in the area from unmanned aerial vehicle (UAV) oblique photography and high-resolution remote sensing images, and generating building height information in combination with a digital surface model (DSM); obtaining the construction date, structure type and use function of each building from the local building archive and the real estate registration database; analyzing facade integrity and main material characteristics from street view images with a computer vision recognition algorithm; obtaining the distribution of main functional facilities, the road network and surrounding service nodes in the area from a geographic information system (GIS) platform and POI data; retrieving enterprise registration records and lease operation data to extract use type, rental income, energy consumption level and resident enterprise category; obtaining building safety condition, pollution load and environmental suitability indexes from field investigation and environmental monitoring data; removing missing or erroneous records after data consistency checking, outlier identification and manual review; normalizing and standardizing the numerical variables; constructing a comprehensive attribute database with the individual building as the spatial unit; and forming a standardized data set organized along three dimensions: building physical attributes, functional operation state and economic benefit;

     S2, setting agent attributes and behavior rules: the agents comprise two types, building agents and developer agents. The building agent attributes comprise 9 indexes in 3 categories: physical characteristic attributes, economic operation attributes and functional state attributes. The simulation step is 1 quarter, and in each simulation step the building agent updates its comprehensive adaptation value according to its current state and external constraints. When the comprehensive adaptation value is ≥ 0.85, the building agent is defined as being in a high adaptation state, and the building retains its original function and is preserved; when 0.65 ≤ adaptation value < 0.85, it is defined as being in a medium adaptation state, and local reinforcement or facade repair is triggered; when 0.40 ≤ adaptation value < 0.65, it is defined as being in a low adaptation state and enters a functional transformation or composite reuse stage; when the adaptation value is < 0.40, it is defined as being in a failure state, and overall reconstruction or demolition is triggered.
     The developer agent attributes comprise 8 indexes in 3 categories: investment characteristic attributes, expected return attributes and risk preference attributes. In each simulation step the developer agent selects an investment strategy behavior according to the adaptation values of the building agents and the area-average adaptation value; the behaviors comprise continued investment and maintenance, structural renovation investment, function replacement investment, and divestment (exit). When the area-average adaptation value is ≥ 0.8 and the return on investment is ≥ 8%, continued investment and maintenance is executed; when the area-average adaptation value is ≥ 0.8 and the return on investment is ≥ 5% but below 8%, structural renovation investment is executed; when the area-average adaptation value is ≥ 0.8 and the return on investment is below 5%, function replacement or partial divestment is executed; when the area-average adaptation value is between 0.6 and 0.8, structural renovation investment is executed, or function replacement investment when the return on investment is below 3%; when the area-average adaptation value is between 0.4 and 0.6, function replacement investment is executed when the return on investment is ≥ 3%, and divestment is executed when it is below 3%.
     During the multi-agent simulation, building agents dynamically update their adaptation values according to the developers' investment behaviors, and developer agents adjust their investment strategies according to return feedback. The first simulation round is based on the building adaptation value thresholds. In the second and subsequent rounds, on the basis of inheriting the first-round results, a group decision is made by combining the proportions of buildings falling in the four adaptation intervals with the developers' return feedback: if buildings with adaptation value ≥ 0.85 form the largest proportion, the area keeps the status quo and maintains the existing investment strategy; if buildings with 0.65 ≤ adaptation value < 0.85 form the largest proportion, the area carries out local renovation and allocates corresponding maintenance investment; if buildings with 0.40 ≤ adaptation value < 0.65 form the largest proportion, the area carries out function replacement or composite utilization and adjusts investment; if buildings with adaptation value < 0.40 form the largest proportion, the area undergoes overall reconstruction or demolition; when several interval proportions are equal, the final scheme is determined by the priority order reconstruction > replacement > renovation > maintenance, and after the scheme is determined the investment intensity is adjusted according to the return feedback;

     S3, multi-agent interactive simulation and result set generation: modeling the attributes and behavior rules of the building agents and developer agents to construct a bidirectional interaction mechanism between building state evolution and investment strategy decisions. The simulation model is built on a Markov decision process (MDP) framework that defines a state set S, an action set A, a state transition rule P and a feedback signal R, which describe the dynamic relationship among building state change, investment behavior selection and external environment feedback. The state set S corresponds to the state variables of the building agent and comprises the comprehensive adaptation value, energy consumption level and return index. The action set A corresponds to the behavior variables of the developer agent and comprises continued investment and maintenance, structural renovation investment, function replacement investment and divestment. The state transition rule P describes how the comprehensive adaptation value, energy consumption level and return index of the building agent change after different investment behaviors are executed; when the investment amount, renovation mode or divestment strategy selected by the developer agent differs, the building state transitions under the set constraints, forming the evolution from the current state S_t to the next state S_{t+1}.
     A feedback mechanism based on reinforcement learning is introduced into the simulation, so that the building agent obtains a positive, negative or neutral feedback signal according to the change in its adaptation value after an investment behavior. The feedback result influences the investment tendency and strategy parameters of the developer agent in the subsequent period and forms environment response data that the reinforcement learning model can call. When an investment update behavior is executed on a building agent: if the adaptation value increases by ≥ 0.10, the investment strategy is judged a positive, effective behavior and reward feedback is given; if the adaptation value decreases by ≥ 0.10, it is judged a negative, ineffective behavior and penalty feedback is given; if the change lies between -0.10 and +0.10, neither reward nor penalty is triggered.
     Four constraints (morphology, economy, environment and time) are set in the simulation environment by combining the area's basic data from step S1 with the planning restrictions. Morphology constraint: the floor area ratio after renovation does not exceed 2.5, and the height increase of a single building does not exceed 20% of its original value. Economy constraint: the developers' total cumulative investment does not exceed 500 million yuan, and the investment in a single quarter does not exceed 25% of the total budget. Environment constraint: the increase in the area's energy consumption over the simulation period does not exceed 5%, and the pollution load index stays within ±10% of the baseline value. Time constraint: the total simulation period is 8 quarters, each quarter being one simulation step. The simulation uses Monte Carlo repeated simulation with random sampling and iterative simulation, repeated 1000 times, and each run records the time series of the indexes used to compute the comprehensive adaptability index (CAI) of the area;

     S4, establishing an adaptability evaluation system: constructing a comprehensive adaptability evaluation system for the industrial heritage region based on the simulation result set of step S3. The evaluation system comprises 4 primary indexes: morphological adaptability, economic adaptability, environmental adaptability and social adaptability. The morphological adaptability index reflects how well the spatial structure and land layout after building updating conform to the planning control boundary and floor area ratio; the economic adaptability index reflects the economic rationality and return stability of the investment strategy; the environmental adaptability index assesses the impact of updating behaviors on the area's energy use and environment; the social adaptability index reflects the combined impact of updating behaviors on the area's vitality and cultural continuity.
     After the index data are normalized, the weight of each index is determined by combining the entropy weight method with the analytic hierarchy process (AHP), subject to a consistency check CR ≤ 0.1. The comprehensive adaptability evaluation function is the weighted sum of the single indexes and outputs the comprehensive adaptability index CAI of the area, with range 0 to 1. When CAI ≥ 0.8, the area updating strategy is judged to have high adaptability; when 0.5 ≤ CAI < 0.8, medium adaptability; when CAI < 0.5, low adaptability, and the investment scheme must be re-optimized;

     S5, training the reinforcement learning model and extracting the optimal strategy: initializing a Q-learning reinforcement learning algorithm based on the Markov decision process (MDP). The state set S comprises the building agent state variables (comprehensive adaptation value, energy consumption level and return index), and the state vector S_t is the combination of these variables at time step t, with dimension d_s = 3. The action set A consists of the developer agent's investment strategy behaviors: a_1 is continued investment and maintenance, a_2 is structural renovation investment, a_3 is function replacement investment, and a_4 is divestment. The reward function R combines long-term and short-term rewards, that is, staged rewards are computed through steps S3 and S4. The state set S is discretized into 5 equal-width intervals; the Q-table is initialized to 0; the number of training episodes is 1000, consistent with the number of simulation repetitions; and the number of time steps per episode is 8, consistent with the total simulation period. A group of initial states is drawn at random from the result sets generated in step S3 for Q-learning training; the action a_t is selected from the current state S_t with an ε-greedy strategy; a_t is executed in the simulation environment and the system state is updated according to the state transition rule of step S3; and the reward CAI_t is calculated according to step S4;

     S6, constructing a scheme database: based on the result sets generated in step S3, extracting the building state matrix as a building state table, the investment strategy matrix as an investment strategy table, the energy consumption evolution curve as an energy consumption evolution table, and the comprehensive adaptability distribution result as an adaptability evaluation table; constructing a developer investment strategy sequence table from the optimal strategy sequence obtained by training the reinforcement learning model in step S5; associating the data tables through the building ID, and building indexes on the building state adaptation value, investment type, environmental suitability score and strategy type; importing the result tables extracted in steps S3 and S5 into the database in batches in CSV format through an ETL tool, with data cleaning and format verification; updating the strategy sequence table in real time through an API with the optimal strategy sequences newly generated during reinforcement learning training; outputting the scheme database and the reinforcement learning model synchronously, with building function transformation results distinguished by color coding on the GIS platform; and finally generating, from the scheme database, a standardized decision report containing the staged updating path, the investment and energy consumption curves, the spatial function adjustment results and the comprehensive adaptability evaluation indexes.
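The Q-learning setup of step S5 can be illustrated with a minimal tabular sketch. The numbers taken from the claim are the 3-dimensional state, 5 equal-width bins per variable, 4 actions, a zero-initialized Q-table, 1000 episodes of 8 steps and ε-greedy action selection. Everything else is an assumption made only so the example runs: the learning rate, discount factor and ε values, and in particular `toy_step`, a hypothetical stand-in for the simulation environment whose returned reward merely plays the role of CAI_t.

```python
import random

N_BINS = 5        # 5 equal-width intervals per state variable (from the claim)
N_ACTIONS = 4     # a_1..a_4: maintain, renovate, replace function, divest
EPISODES = 1000   # matches the number of simulation repetitions
STEPS = 8         # matches the 8-quarter simulation period
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed values, not given in the source


def discretize(state):
    """Map a state vector in [0, 1]^3 to a tuple of bin indices 0..4."""
    return tuple(min(int(x * N_BINS), N_BINS - 1) for x in state)


def toy_step(state, action, rng):
    """Hypothetical environment: not the patent's transition rule P.

    Each action shifts the adaptation value by an assumed amount plus noise;
    the new adaptation value is returned as a stand-in for the reward CAI_t.
    """
    adaptation, energy, gain = state
    drift = (0.00, 0.06, 0.04, -0.05)[action] + rng.uniform(-0.02, 0.02)
    adaptation = min(1.0, max(0.0, adaptation + drift))
    return (adaptation, energy, gain), adaptation


def train(seed=0):
    """Run tabular Q-learning and return the Q-table as a dict of rows."""
    rng = random.Random(seed)
    q = {}  # unseen states get a zero row, i.e. the Q-table starts at 0
    for _ in range(EPISODES):
        state = (rng.random(), rng.random(), rng.random())  # random initial state
        for _ in range(STEPS):
            row = q.setdefault(discretize(state), [0.0] * N_ACTIONS)
            if rng.random() < EPSILON:          # explore
                action = rng.randrange(N_ACTIONS)
            else:                               # exploit
                action = max(range(N_ACTIONS), key=row.__getitem__)
            next_state, reward = toy_step(state, action, rng)
            next_row = q.setdefault(discretize(next_state), [0.0] * N_ACTIONS)
            # Standard Q-learning update toward reward + discounted best next value.
            row[action] += ALPHA * (reward + GAMMA * max(next_row) - row[action])
            state = next_state
    return q


q_table = train()
```

A greedy policy can then be read from `q_table` by taking the highest-valued action in each visited state row; in the claimed method the initial states would instead be drawn from the S3 result sets.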
  2. The multi-agent-based adaptive updating decision support method for industrial heritage regions according to claim 1, wherein S1 specifically comprises the following steps:

     S1.1, importing the industrial heritage region boundary delimited by the local government or planning authority into the geographic information system platform, uniformly adopting the CGCS2000 coordinate reference and the same data time stamp; generating a unique building code building_id from the combination of the administrative division code, land parcel number and building serial number for subsequent data indexing and spatial operations; loading the UAV oblique photography images and overlaying the high-resolution remote sensing images, with the forward overlap of the oblique images set to not less than 70% and the side overlap to not less than 60%; extracting building outlines with image segmentation and edge detection algorithms; calculating the building height h as the difference between the digital surface model (DSM) and the digital elevation model (DEM), recorded as h = DSM - DEM; dividing the height by a reference storey height in the range 3.2 m to 3.6 m, according to a preset storey-height conversion standard, to obtain the number of floors; calculating the footprint area from the building outline polygon; and outputting a spatial data set comprising building_id, geometric form, roof features, height and number of floors;

     S1.2, mapping building attribute data to building_id by address matching rules and a geocoding algorithm through the interface services of the real estate registration system and the local building archive, and extracting the construction date, structure type and use function fields; loading street view images and using a computer vision recognition model to detect and analyze wall cracks, facade spalling and material texture, generating a facade integrity coefficient Fi with its range limited to 0 to 1; encoding the material category as a discrete classification field; and writing these fields synchronously into the building attribute data structure;

     S1.3, loading the POI layer and road centerline data in the GIS platform, constructing the road topology and generating a network data set; taking building centroids as origins and road nodes and functional facilities as destinations, with walking time and road grade set as impedance parameters, calculating the accessibility index from each building to the service nodes and forming an OD cost matrix; accessing enterprise registration records and lease management data, with the address merging and fuzzy matching threshold set to not lower than 0.80, to identify the resident enterprise type, rental income level and annual energy consumption intensity per unit area of each building, and establishing a building-enterprise correspondence table; combining field investigation records with environmental monitoring sensor data, extracting air pollutant concentrations and structural safety hazard records at a sampling frequency of at least once every 15 minutes, the air pollutants comprising PM2.5, NO2 and VOCs; calculating the building safety index and the environmental suitability parameter Ei; and matching the safety index and pollution load index to building_id to generate an extended attribute set;

     S1.4, performing field structure verification and format unification on the multi-source data generated in S1.1 to S1.3, with the attribute fill-rate threshold set to not lower than 95%; establishing logical consistency rules among storey height, construction date, use and enterprise type to screen out unreasonable records; detecting outliers by statistical distribution analysis, identifying abnormal data points in numerical attributes through the 3σ criterion or the interquartile range (IQR) method; removing, after manual review, records that do not conform to the distribution; applying Min-Max normalization or Z-Score standardization to the retained samples; constructing relational and spatial database structures with building_id as the primary key; and generating a standardized data set organized by building physical characteristics, functional operation state and economic benefit dimensions.
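Two of the calculations in S1.1 and S1.4 can be sketched briefly: deriving height and floor count from DSM and DEM values, and IQR-based outlier screening followed by Min-Max normalization. This is an illustrative sketch, not the patent's implementation. The default storey height of 3.4 m (the midpoint of the stated 3.2 m to 3.6 m range), the conventional 1.5 × IQR fence, and all function names and sample values are assumptions.

```python
import statistics


def building_height(dsm, dem):
    """Height as the difference between surface and terrain elevation (h = DSM - DEM)."""
    return dsm - dem


def estimate_floors(height, storey_height=3.4):
    """Estimate the floor count; 3.4 m is an assumed value within 3.2-3.6 m."""
    return max(1, round(height / storey_height))


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is a common convention."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def min_max(values):
    """Min-Max normalization to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample: one building and a small series of rent values.
h = building_height(dsm=45.5, dem=28.5)
floors = estimate_floors(h)
rents = [10, 12, 11, 13, 12, 11, 100]   # 100 is an obvious outlier
kept = iqr_filter(rents)
normalized = min_max(kept)
```

In the claimed workflow the flagged records go to manual review before removal; the filter here drops them directly only to keep the example short.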
  3. The multi-agent-based adaptive updating decision support method for industrial heritage regions according to claim 2, wherein S2 specifically comprises the following steps:

     S2.1, the building agent attributes comprise 9 indexes in 3 categories (physical characteristic attributes, economic operation attributes and functional state attributes), the specific indexes being listed in a table. The 9 attributes are written into the building agent state vector at simulation initialization. The simulation step is 1 quarter; before each simulation step starts, the fields are refreshed according to the consistency check, outlier identification and standardization rules of S1, with missing values inheriting the previous period's value and outliers removed before normalization. In each simulation step the building agent re-evaluates its comprehensive adaptation value from its current physical characteristics, economic operation and functional state, and triggers a behavioral response accordingly.
     The comprehensive adaptation value is computed as a weighted combination of the morphological, economic and functional indexes with weights w_P, w_E and w_F, where w_P + w_E + w_F = 1 and w_P, w_E, w_F ≥ 0. The morphological index is computed with coefficients γ_1, γ_2, γ_3, γ_4 ≥ 0, where norm(x) = (x - x_min)/(x_max - x_min) ∈ [0,1] denotes interval normalization, H_i is the building height, F_i is the number of floors, D_S ∈ {0.6, 0.7, 0.8, 0.9} is the structural durability coefficient, and the remaining term is the footprint area. The economic index is computed with coefficients η_1 + η_2 + η_3 = 1, η_1, η_2, η_3 ≥ 0, from three scores: a rental income score mapped by same-period percentile within the area (1.0 when R_i ≥ P75, 0.7 when P50 ≤ R_i < P75, 0.4 when R_i < P50); an energy consumption intensity score taking the tier values 1.0, 0.7 or 0.4, with energy consumption treated as a reverse index; and an operational load score taking the tier values 1.0, 0.7 or 0.4. The functional index is F_i = μ_1 m_U + μ_2 m_C, with μ_1 + μ_2 = 1 and μ_1, μ_2 ≥ 0, where m_U ∈ {1.0, 0.8, 0.6} is the matching coefficient between the building's use and the area's positioning, and m_C ∈ {1.0, 0.8, 0.6} is the matching coefficient between the resident industry and the target list.
     After the calculation, actions are executed by threshold: when the comprehensive adaptation value is ≥ 0.85, the building is defined as being in a high adaptation state, retains its original function and is preserved; when it is ≥ 0.65 and < 0.85, it is defined as being in a medium adaptation state, and local reinforcement or facade repair is triggered; when it is ≥ 0.40 and < 0.65, it is defined as being in a low adaptation state and enters the functional transformation or composite reuse stage; when it is < 0.40, it is defined as being in a failure state, and overall reconstruction or demolition is triggered. If structural reinforcement is carried out in the current period, D_S is raised by one grade and a gain of +0.05 to +0.10 is added; facade repair raises the facade integrity score to not lower than 0.7 and adds a gain of +0.03 to +0.06; when functional transformation is implemented, m_U and m_C are updated synchronously and the value is recalculated. To suppress short-term disturbances, time series such as H_i, F_i, R_i, E_i and O_i are smoothed with a three-period rolling mean. When the comprehensive adaptation value declines over two adjacent periods and no effective investment behavior is recorded, the building automatically enters a technical check-up process and upgrade-type actions are restricted in the current period;

     S2.2, setting the developer agent attributes and decision behavior rules: the developer agent attributes comprise 8 indexes in 3 categories (investment characteristic attributes, expected return attributes and risk preference attributes), the specific indexes being listed in a table. The 8 attributes are written into the developer agent state vector at simulation initialization. The simulation step is 1 quarter; before each simulation step starts, the fields are refreshed according to the consistency check, outlier identification and standardization rules of S1, with missing values inheriting the previous period's value and outliers removed before normalization. In each simulation step the developer agent selects an investment strategy behavior (continued investment and maintenance, structural renovation investment, function replacement investment, or divestment) according to the adaptation values of the building agents and the area-average adaptation value.
     The strategy readiness index is computed as a weighted combination of the investment ability, return sufficiency and risk preference indexes with weights w_I, w_B and w_R, where w_I + w_B + w_R = 1 and w_I, w_B, w_R ≥ 0. The investment ability index uses coefficients α_1 + α_2 + α_3 = 1, α_1, α_2, α_3 ≥ 0, where S_{I,t} = min{1, I_t/I_max} is the single-quarter affordability score, λ ∈ [0,1] is the capital flow ratio score, S_{T,t} = 1 - norm(T_pref) is the reverse score of the payback preference, and norm(x) = (x - x_min)/(x_max - x_min) ∈ [0,1]. The return sufficiency index uses coefficients β_1 + β_2 = 1, β_1, β_2 ≥ 0, where S_{r,t} is the yield tier score (1.0 when r_{act,t} ≥ r_exp, 0.7 when 0.8 r_exp ≤ r_{act,t} < r_exp, 0.4 when r_{act,t} < 0.8 r_exp) and S_{P,t} is the predicted payback period tier score (1.0 when T_pred ≤ τ, 0.7 when τ < T_pred ≤ 1.5τ, 0.4 when T_pred > 1.5τ); here r_{act,t} is the current predicted actual yield, r_exp is the expected yield, T_pred is the predicted payback period and τ is the payback period threshold. The risk preference index uses coefficients γ_1 + γ_2 + γ_3 = 1, γ_1, γ_2, γ_3 ≥ 0, where S_{ρ,t} is the return deviation tolerance score with Δr_t = r_{act,t} - r_exp (1.0 when |Δr_t| ≤ ρ, 0.7 when ρ < |Δr_t| ≤ 1.5ρ, 0.4 when |Δr_t| > 1.5ρ); S_{κ,t} is the decision sensitivity score (1.0 when 0.5 ≤ κ ≤ 0.8, 0.7 when 0.3 ≤ κ < 0.5 or 0.8 < κ ≤ 0.9, 0.4 when κ < 0.3 or κ > 0.9); and S_{ξ,t} is the divestment sensitivity score, which takes 1.0 when 0.3 ≤ ξ ≤ 0.6 and the lower tier values 0.7 or 0.4 as ξ moves further outside that range.
     After the calculation, actions are executed by threshold on the readiness index: high readiness executes continued investment and maintenance; medium readiness executes structural renovation investment; low readiness (readiness index of at least 0.40) executes function replacement investment; failure readiness executes divestment. To suppress short-term disturbances, r_{act,t} and T_pred are smoothed with a three-period rolling mean, and parameters such as Δr_t, I_t, λ, κ and ξ are written back to the state vector at the end of each simulation step. If the readiness index declines over two adjacent periods and Δr_t < 0, the developer agent automatically enters a conservative strategy in which only maintenance or divestment actions remain available, to reduce risk exposure;

     S2.3, in the multi-agent simulation, the first simulation round is based on the building comprehensive adaptation value thresholds: when the comprehensive adaptation value is ≥ 0.85, the building is defined as being in a high adaptation state, retains its original function and is preserved; when 0.65 ≤ adaptation value < 0.85, it is defined as being in a medium adaptation state, and local reinforcement or facade repair is triggered; when 0.40 ≤ adaptation value < 0.65, it enters the functional transformation or composite reuse stage; when the adaptation value is < 0.40, it is defined as being in a failure state, and overall reconstruction or demolition is triggered. In the second and subsequent rounds, on the basis of inheriting the first-round results, a group decision is made by combining the proportions of buildings in the four adaptation intervals with the developers' return feedback: if buildings with adaptation value ≥ 0.85 form the largest proportion, the area keeps the status quo and maintains the existing investment strategy; if buildings with 0.65 ≤ adaptation value < 0.85 form the largest proportion, the area carries out local renovation and allocates corresponding maintenance investment; if buildings with 0.40 ≤ adaptation value < 0.65 form the largest proportion, the area carries out function replacement or composite utilization; and if buildings with adaptation value < 0.40 form the largest proportion, the area undergoes overall reconstruction or demolition.
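The tiered scoring and threshold logic of S2.1 can be sketched as follows. The percentile tiers (1.0 / 0.7 / 0.4), the weighted combination with non-negative weights summing to 1, and the state thresholds 0.85 / 0.65 / 0.40 come from the claim. The default weight values, the function names and the sample inputs are hypothetical, since the claim constrains the weights but does not fix them.

```python
def tier_score(value, p50, p75):
    """Rental income score by percentile tier: >= P75 -> 1.0, >= P50 -> 0.7, else 0.4."""
    if value >= p75:
        return 1.0
    if value >= p50:
        return 0.7
    return 0.4


def comprehensive_adaptation(morph, econ, func, weights=(0.4, 0.3, 0.3)):
    """Weighted combination of the three indexes; the default weights are assumed."""
    w_p, w_e, w_f = weights
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return w_p * morph + w_e * econ + w_f * func


def adaptation_state(value):
    """Map a comprehensive adaptation value to the four states named in the claim."""
    if value >= 0.85:
        return "high"      # retain original function, preserve
    if value >= 0.65:
        return "medium"    # local reinforcement or facade repair
    if value >= 0.40:
        return "low"       # functional transformation or composite reuse
    return "failure"       # overall reconstruction or demolition


# Hypothetical building: index values are illustrative only.
value = comprehensive_adaptation(morph=0.9, econ=0.7, func=0.8)
state = adaptation_state(value)
```

With these sample inputs the building lands in the medium state, so the rule set would trigger local reinforcement or facade repair rather than a change of function.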
  4. 4. The multi-agent-based adaptive update decision support method for industrial legacy areas according to claim 3, wherein S3 specifically comprises the following steps:
S3.1, establishing a Markov decision process (MDP) framework <S, A, P, R> based on the attributes and behavior rules established in step S2, wherein the state set S consists of the state vectors of the building agents, $s_i(t) = [A^{(b)}_i(t), E_i(t), R_i(t)]$, whose components respectively represent the comprehensive adaptation value, the energy consumption level and the income index, all normalized to [0,1] by the range method; $A^{(b)}_i(t)$ is obtained by weighting the three classes of form, economy and function indexes of step S2, $E_i(t)$ is the energy consumption per unit area normalized after a three-period rolling mean, and $R_i(t)$ is the normalized lease rate of return or operating cash flow return; the behavior set A is the investment strategy set $\{a_1, a_2, a_3, a_4\}$ of the developer agent, wherein $a_1$ is continued investment and maintenance, $a_2$ is structural improvement investment, $a_3$ is function replacement investment, and $a_4$ is withdrawal of investment and exit; the state transition rule P describes the conditional probability $P(s_i(t+1) \mid s_i(t), a; \Theta)$ of transferring from the current state $s_i(t)$ to the next state $s_i(t+1)$ under a given behavior a, the evolution of which is subject to four types of constraints in the simulation environment, namely form, economy, environment and time; the constraint parameters are taken from the planning boundary, the fund and budget upper limits, the energy consumption baseline and the total construction period data of step S1 and are collected in the parameter set Θ; the feedback signal R describes the return of a single-step investment behavior and is used for reinforcement learning of the behavior;
S3.2, after each simulation step, grading the change in the building adaptation value to form the instant feedback $r_i(t)$, wherein $r^+ > 0$ and $r^- > 0$ are the positive and negative feedback coefficients; positive feedback corresponds to effective investment and gives a reward, negative feedback corresponds to ineffective investment and gives a penalty, and the neutral interval triggers neither reward nor penalty; $r_i(t)$ is written back to the strategy parameters of the developer agent to influence the investment tendency and proportion in the next simulation step;
S3.3, setting four constraint conditions of form, economy, environment and time in the simulation environment by combining the basic data of the patch area and the planning limits of step S1, wherein the form constraint is that the floor area ratio of the patch area after transformation is not more than 2.5 and the height increment in a single quarter is not more than 20% of the original value; the economic constraint is that the total accumulated investment is not more than 500 million yuan and the investment in a single quarter is not more than 25% of the total budget; the environment constraint is that the energy consumption increment of the patch area is not more than 5% and the pollution load index is kept within ±10% of the baseline value; and the time constraint is that the total simulation period is 8 quarters, each quarter being one simulation step;
S3.4, on the premise of satisfying the constraints of S3.3, performing random sampling and iterative evolution on the parameter set Θ and the external situation variables by repeated Monte Carlo simulation with 1000 repetitions, each simulation recording the following: on the building side, the $A^{(b)}_i(t)$ sequence, the energy consumption index $E_i(t)$ and the income index $R_i(t)$; on the developer side, the investment behavior sequence $\{a_t\}$ and the current-period investment return rate $r_{\mathrm{act},t}$; on the evaluation side, the time series of the indexes used for calculating the comprehensive adaptability index CAI of the patch area; a stopping criterion is set such that, when the fluctuation amplitude of the building adaptation values stays within the set threshold for two consecutive quarters and the adjustment rate of the developer's investment strategy is lower than 1%, the system judges that the steady-state convergence condition is reached and automatically terminates the simulation, generating a plurality of result sets corresponding to different parameter conditions, each comprising: a building state matrix, namely the time-series matrix of the state vectors of all buildings; an investment strategy matrix, namely the period-behavior four-category coding sequence and the investment quota allocation; an energy consumption evolution curve, namely the energy consumption trajectories and breakdown statistics of the patch area and of single buildings; and a comprehensive adaptability distribution, namely the quantity proportions counted over the intervals (≥ 0.85), [0.65, 0.85), [0.40, 0.65) and (< 0.40); wherein the result sets serve as the input of reinforcement learning model training and optimal strategy extraction, the instant feedback $r_i(t)$ labels the effect of single-step behaviors, and the CAI-related sequences are aggregated into the long-term target evaluation in step S4.
  5. 5. The multi-agent-based adaptive update decision support method for industrial legacy areas according to claim 4, wherein S4 specifically comprises the following steps:
S4.1, constructing a comprehensive adaptability assessment system for the industrial heritage region based on the simulation result sets of step S3, wherein the assessment system comprises four primary indexes, namely morphological adaptability, economic adaptability, environmental adaptability and social adaptability, and secondary indexes with their value definitions are configured under each primary index; the morphological adaptability index reflects how well the spatial structure and land layout after building update conform to the planning boundary and floor area ratio controls; the economic adaptability index reflects the economic rationality and income stability of the investment strategy; the environmental adaptability index assesses the degree to which updating behaviors affect the energy use and environment of the patch area; the social adaptability index reflects the comprehensive influence of updating behaviors on the vitality and cultural continuity of the patch area; for benefit-type indexes a larger value is better, and for cost-type indexes the opposite holds; the primary indexes, the secondary indexes and their attributes are shown in the following table; all secondary indexes are converted to the interval [0,1], wherein benefit-type indexes are linearly mapped according to the minimum and maximum of the samples in the same batch using the range standardization $z = (x - \min x)/(\max x - \min x)$, and cost-type indexes are mapped using the reverse range standardization $z = 1 - (x - \min x)/(\max x - \min x)$, so that a smaller raw value yields a higher score; variables with seasonal fluctuation, including energy consumption and occupancy rate, are first smoothed with a three-period rolling mean and then standardized; when the missing rate of a single index is not more than 10%, missing values are imputed with the median of the same category, and samples with a missing rate above 10% are excluded from the current round of aggregation;
S4.2, obtaining objective weights from the normalized secondary index matrix $Z = [z_{ij}]$ by the entropy weight method: defining $p_{ij} = z_{ij} / \sum_i z_{ij}$, calculating the entropy $e_j = -k \sum_i p_{ij} \ln p_{ij}$ with $k = 1/\ln n$, and obtaining the degree of divergence $d_j = 1 - e_j$, the objective weight being $w^{\mathrm{obj}}_j = d_j / \sum_j d_j$; on this basis, constructing a pairwise comparison matrix by the analytic hierarchy process to obtain the subjective weights $w^{\mathrm{sub}}_j$, and performing a consistency test on the judgment matrix, which is valid only when the consistency ratio CR ≤ 0.10; the final weights are fused by product normalization, the comprehensive weight being $w_j = w^{\mathrm{obj}}_j w^{\mathrm{sub}}_j / \sum_j w^{\mathrm{obj}}_j w^{\mathrm{sub}}_j$; the weights $W^{(1)}_k$ of the four primary indexes are determined by the same method, and their sum is ensured to be 1;
S4.3, in each statistics period, weighting and summing the standardized scores of the secondary indexes within each primary index according to the comprehensive weights to obtain the primary index score $S_k = \sum_m w_{k,m} z_{k,m}$, and then summing according to the primary index weights to obtain the comprehensive adaptability index of the patch area, $\mathrm{CAI} = \sum_k W^{(1)}_k S_k$, with CAI ∈ [0,1]; CAI ≥ 0.80 is judged as highly adaptive, 0.50 ≤ CAI < 0.80 is judged as moderately adaptive, and CAI < 0.50 is judged as low adaptive and triggers the scheme re-optimization flow; an evaluation report containing the standardized value of each secondary index, the comprehensive weights, the primary index scores and the CAI value is output, and a one-to-one mapping with the simulation sample numbers of step S3 is established, wherein CAI is used as the long-term reward in the reinforcement learning training of step S5 and the instant feedback of step S3 is used as the short-term return, so as to support subsequent optimal strategy extraction and path decision.
  6. 6. The multi-agent-based adaptive update decision support method for industrial legacy areas according to claim 5, wherein S5 specifically comprises the following steps:
S5.1, initializing a Q-learning reinforcement learning model based on the Markov decision process MDP, wherein the state set S consists of the building agent state variables, namely the comprehensive adaptation value, the energy consumption level and the income index; the comprehensive adaptation value and the income index are normalized, and the energy consumption level is measured in kWh/m² per year; the state vector $s_t$ is the combination of these variables at time step t, with dimension $d_s = 3$; the action set A is the investment strategy behavior of the developer agent, $A = \{a_1, a_2, a_3, a_4\}$, wherein $a_1$ is continued investment and maintenance, $a_2$ is structural transformation investment, $a_3$ is function replacement investment, and $a_4$ is withdrawal of investment and exit; the reward function combines long-term and short-term terms, namely $R_t = \lambda\,\mathrm{CAI}_t + (1-\lambda)\,r_t$, where $\mathrm{CAI}_t$ is the comprehensive adaptability index calculated in step S4 and $r_t$ is the instant feedback of step S3; each continuous state variable is discretized into five intervals, [0,0.2), [0.2,0.4), [0.4,0.6), [0.6,0.8) and [0.8,1.0], a Q-table over the discretized states and the action set is initialized, and the number of simulation training rounds is set to 1000;
S5.2, randomly extracting an initial state $s_0$ from the plurality of result sets generated in step S3 for training the Q-learning algorithm; for each time step t = 0 to T-1, selecting an action $a_t$ according to the current state $s_t$ with an ε-greedy strategy, namely exploring by selecting a random action with probability ε = 0.1 and exploiting by selecting the action with the largest Q-value, $a_t = \arg\max_a Q(s_t, a)$, with probability 1-ε; executing the developer investment behavior $a_t$ in the simulation environment, updating the comprehensive adaptation value, the energy consumption level and the income index of the building agent based on steps S3 and S4, and calculating the stage reward $r_t$ based on step S3; the Q-learning update rule is $Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[\,r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\,]$, where the learning rate α is 0.1, the discount factor γ is 0.9, and $\max_{a'} Q(s_{t+1}, a')$ is the largest Q-value of the next state $s_{t+1}$; the total reward of each round is calculated and the average reward is recorded; every time 100 rounds of training are completed, the exploration rate ε is multiplied by a decay factor of 0.99 so as to gradually reduce exploration and increase exploitation;
S5.3, calculating the rate of change of the average reward over 10 consecutive rounds of training to judge convergence, extracting the optimal strategy $\pi^*$ from the converged Q-table, selecting the action for each time step t according to $\pi^*$, and finally forming the developer investment strategy sequence.
  7. 7. The multi-agent-based adaptive update decision support method for industrial legacy areas according to claim 6, wherein S6 specifically comprises the following steps:
S6.1, from the plurality of result sets generated in step S3, extracting the building state matrix as a building state table, the investment strategy matrix as an investment strategy table, the energy consumption evolution curve as an energy consumption evolution table, and the comprehensive adaptability distribution results as an adaptability evaluation table, and constructing the optimal strategy sequence obtained by training the reinforcement learning model in step S5 as a developer investment strategy sequence table;
S6.2, reading the data tables through ETL, filling missing fields with the mean value within the same simulation ID and marking such records for manual review, identifying abnormal values that fall outside the value range and replacing them with NULL, automatically generating a data exception report for subsequent manual review by an administrator, importing the cleaned and verified data tables into the database in CSV format, and constructing a RESTful API such that, whenever the reinforcement learning model of step S5 completes a training round or generates a new set of optimal strategy sequences, the data are updated and the latest version of the decision report is generated automatically;
S6.3, performing a full backup of the whole scheme database with the mysqldump tool, uploading the compressed backup file to AWS S3, creating a version field in the data tables to record all major corrections to historical data or changes of model parameters, and finally generating, through JasperReports and based on the scheme database, a standardized decision report containing the staged updating path, the investment and energy consumption evolution curves, the spatial function adjustment results and the comprehensive adaptability evaluation indexes, and outputting the standardized decision report in PDF format.
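
The constraint limits of step S3.3 and the adaptation intervals of step S3.4 in claim 4 can be illustrated with a minimal Python sketch. The dictionary field names (`far`, `height_growth`, `pollution_ratio` and so on) and the function names are hypothetical and do not appear in the claims; the total budget is assumed here to equal the 500 million yuan cap.

```python
from collections import Counter

# Limits quoted in step S3.3. Field names used below are hypothetical.
MAX_FAR = 2.5              # floor area ratio after transformation
MAX_HEIGHT_GROWTH = 0.20   # single-quarter height increment, share of original
TOTAL_BUDGET = 5e8         # yuan; assumed equal to the accumulated-investment cap
MAX_QUARTER_SHARE = 0.25   # single-quarter investment, share of total budget
MAX_ENERGY_GROWTH = 0.05   # energy consumption increment of the patch area
POLLUTION_BAND = 0.10      # allowed deviation from the baseline value
MAX_QUARTERS = 8           # total simulation period

BANDS = (">=0.85", "[0.65,0.85)", "[0.40,0.65)", "<0.40")


def violated_constraints(step):
    """Return the names of the S3.3 constraint types that one step breaks."""
    violated = []
    if step["far"] > MAX_FAR or step["height_growth"] > MAX_HEIGHT_GROWTH:
        violated.append("form")
    if (step["cumulative_investment"] > TOTAL_BUDGET
            or step["quarter_investment"] > MAX_QUARTER_SHARE * TOTAL_BUDGET):
        violated.append("economy")
    # pollution_ratio is the pollution load index divided by its baseline.
    if (step["energy_growth"] > MAX_ENERGY_GROWTH
            or abs(step["pollution_ratio"] - 1.0) > POLLUTION_BAND):
        violated.append("environment")
    if step["quarter"] > MAX_QUARTERS:
        violated.append("time")
    return violated


def adaptation_band(value):
    """Classify one building adaptation value into the S3.4 intervals."""
    if value >= 0.85:
        return BANDS[0]
    if value >= 0.65:
        return BANDS[1]
    if value >= 0.40:
        return BANDS[2]
    return BANDS[3]


def band_shares(values):
    """Quantity proportion of buildings in each adaptation interval."""
    counts = Counter(adaptation_band(v) for v in values)
    return {band: counts[band] / len(values) for band in BANDS}
```

A simulation loop could call `violated_constraints` before accepting a sampled step and `band_shares` when assembling the comprehensive adaptability distribution of a result set.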
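
The weighting and aggregation of steps S4.1 to S4.3 in claim 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names are hypothetical, the subjective (analytic hierarchy process) weights are taken as given inputs, a constant indicator column is assumed to normalize to zero, and equal weights are assumed when no indicator shows any divergence.

```python
import math


def range_normalize(values, benefit=True):
    """Range-standardize one indicator column to [0, 1] (step S4.1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # assumption: constant column maps to 0
    scaled = [(x - lo) / (hi - lo) for x in values]
    # Cost-type indicators use the reverse mapping, so smaller is better.
    return scaled if benefit else [1.0 - z for z in scaled]


def entropy_weights(Z):
    """Objective weights from a normalized n x m matrix Z (rows = samples)."""
    n, m = len(Z), len(Z[0])
    k = 1.0 / math.log(n)
    divergence = []
    for j in range(m):
        column = [Z[i][j] for i in range(n)]
        total = sum(column)
        entropy = 0.0
        for z in column:
            p = z / total if total > 0 else 0.0
            if p > 0:  # the term p * ln(p) is taken as 0 when p = 0
                entropy -= k * p * math.log(p)
        divergence.append(1.0 - entropy)
    s = sum(divergence)
    if s == 0:
        return [1.0 / m] * m  # assumption: equal weights if nothing diverges
    return [d / s for d in divergence]


def fuse_weights(objective, subjective):
    """Product normalization of objective and subjective weights (S4.2)."""
    products = [a * b for a, b in zip(objective, subjective)]
    s = sum(products)
    return [p / s for p in products]


def composite_index(primary_weights, secondary_weights, scores):
    """CAI = sum_k W_k * S_k with S_k = sum_m w_km * z_km (step S4.3)."""
    cai = 0.0
    for W_k, w_k, z_k in zip(primary_weights, secondary_weights, scores):
        cai += W_k * sum(w * z for w, z in zip(w_k, z_k))
    return cai


def cai_grade(cai):
    """Grade thresholds of step S4.3."""
    if cai >= 0.80:
        return "high"
    if cai >= 0.50:
        return "moderate"
    return "low"  # triggers the scheme re-optimization flow
```

In this sketch an indicator whose values are identical across samples receives an entropy of 1 and therefore a weight of 0, which is the behavior the entropy weight method is chosen for: indicators that do not discriminate between samples do not influence the CAI.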
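
The tabular Q-learning loop of steps S5.1 and S5.2 in claim 6 can be sketched as below, using the stated values α = 0.1, γ = 0.9, ε = 0.1 with a 0.99 decay every 100 rounds, and five bins per state variable. Several points are assumptions made only for illustration: `toy_step` is a hypothetical stand-in for the simulation environment of steps S3 and S4, all three state variables are assumed to be normalized to [0, 1] before binning, the horizon is assumed to be 8 steps (one per quarter), λ = 0.5 is an arbitrary choice because the claim gives no value, and the sketch feeds the combined reward $R_t$ into the update.

```python
import random

ACTIONS = ("a1_maintain", "a2_structural", "a3_function_replace", "a4_withdraw")
N_BINS = 5
ALPHA, GAMMA = 0.1, 0.9


def discretize(state):
    """Map a normalized (adaptation, energy, income) vector to bin indices."""
    return tuple(min(int(v * N_BINS), N_BINS - 1) for v in state)


def choose_action(q, s, epsilon, rng):
    """Epsilon-greedy selection over the Q-table row of state s."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))          # explore
    row = q.get(s, [0.0] * len(ACTIONS))
    return max(range(len(ACTIONS)), key=row.__getitem__)  # exploit


def q_update(q, s, a, reward, s_next):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    row = q.setdefault(s, [0.0] * len(ACTIONS))
    next_row = q.setdefault(s_next, [0.0] * len(ACTIONS))
    row[a] += ALPHA * (reward + GAMMA * max(next_row) - row[a])


def combined_reward(cai, instant, lam=0.5):
    """R_t = lam * CAI_t + (1 - lam) * r_t; lam = 0.5 is an assumption."""
    return lam * cai + (1.0 - lam) * instant


def toy_step(state, action):
    """Hypothetical environment: each action shifts the adaptation value."""
    adapt, energy, income = state
    gain = (0.02, 0.06, 0.08, -0.05)[action]
    new_adapt = min(1.0, max(0.0, adapt + gain))
    delta = new_adapt - adapt
    instant = 1.0 if delta > 0 else (-1.0 if delta < 0 else 0.0)
    cai = new_adapt  # toy proxy for the CAI of step S4
    return (new_adapt, energy, income), cai, instant


def train(episodes=1000, horizon=8, epsilon=0.1, seed=0):
    """Run tabular Q-learning and return the Q-table and final epsilon."""
    rng = random.Random(seed)
    q = {}
    for episode in range(1, episodes + 1):
        state = (rng.random(), rng.random(), rng.random())
        for _ in range(horizon):
            s = discretize(state)
            a = choose_action(q, s, epsilon, rng)
            state, cai, instant = toy_step(state, a)
            q_update(q, s, a, combined_reward(cai, instant), discretize(state))
        if episode % 100 == 0:
            epsilon *= 0.99  # reduce exploration every 100 rounds
    return q, epsilon


def greedy_policy(q):
    """Extract the action with the largest Q-value for each visited state."""
    return {s: ACTIONS[max(range(len(ACTIONS)), key=row.__getitem__)]
            for s, row in q.items()}
```

With five bins per variable and three variables, the Q-table holds at most 125 states by 4 actions, which is small enough for a dictionary-backed table; a real deployment would replace `toy_step` with calls into the multi-agent simulation.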

Description

Multi-agent-based adaptive updating decision support method for industrial heritage region

Technical Field

The invention belongs to the technical field of intelligent planning and design, and particularly relates to a multi-agent-based adaptive updating decision support method for industrial heritage regions.

Background

As urban development in China shifts from incremental expansion to stock optimization, the industrial heritage areas spread across many old industrial cities, which serve as material witnesses of specific historical periods and as spatial carriers of industrial civilization, are receiving increasing attention for transformation and update. The transformation of an industrial heritage area is neither simple demolition nor closed protection, but an adaptive update of high complexity and comprehensiveness. Such an update can carry forward the historical context of the city and express cultural confidence, and can also revitalize inefficiently used land, release the potential of urban space, optimize the layout of urban functions and drive economic and social development, thereby turning an industrial "rust belt" into a "life belt".

At the level of practical operation, the current transformation of industrial heritage areas has two core limitations. First, industrial heritage update is a complex systems engineering task involving the interests of multiple parties, and existing decision modes struggle to balance the complex game relations among multiple stakeholders, especially the conflict between building heritage protection and commercial development demands; as a result the decision process is highly subjective and lacks transparency, and either development vitality is lost through over-protection or heritage value is destroyed through over-development. Second, the effect of updating an industrial heritage area spans comprehensive benefits across social, environmental, cultural and other dimensions, whereas existing assessment methods focus on a single dimension and lack the capability of system integration and coupled analysis.

To address these limitations, the method introduces a multi-agent system and a reinforcement learning mechanism, converts the subjective and fuzzy game relations into visualized and quantified simulation results, realizes the fusion of multi-source heterogeneous information and a comprehensive adaptability evaluation, provides data-driven and model-driven scientific and technical support for update decisions on industrial heritage areas and cities, and improves the scientific rigor and accuracy of industrial heritage protection and reuse in China.

Disclosure of Invention

In order to overcome the defects described in the background, the invention aims to provide a multi-agent-based adaptive updating decision support method for industrial heritage regions, which realizes the adaptive evolution and dynamic optimization of the updating process of an industrial heritage region through a reinforcement learning mechanism, improves the adaptability of an updating scheme to changes in the spatial environment and the economic state, and provides a scientific decision basis for the reuse and transformation of different types of industrial heritage.

The technical scheme adopted by the invention is a multi-agent-based adaptive updating decision support method for industrial heritage regions, comprising the following steps:

S1, data acquisition and preprocessing

A data acquisition boundary is determined according to the updating range of the industrial heritage region defined by the local government or planning authorities. The geometric form, roof features, number of storeys and footprint area of the buildings in the area are extracted from unmanned aerial vehicle oblique photography and high-resolution remote sensing images, and building height information is generated in combination with a digital surface model (DSM); the construction year, structure type and use function of each building are obtained from local building archives and the real estate registration database; the integrity of the outer facade and the main material features of each building are analyzed from street view images combined with a computer vision recognition algorithm; the distribution of the main functional facilities, the road network and the surrounding service nodes in the area is obtained from a geographic information system (GIS) platform and POI data; enterprise registration files and lease operation data are used to extract the use type, lease income, energy consumption level and resident enterprise category of each building; and the building safety condition, pollution load and environmental suitability indexes are obtained by combining field investigation with environmental monitoring data. After data consistency checking, abnormal value identification and manual rechecking, missing or erroneous records