CN-121981382-A - Urban green space recovery benefit evaluation method and system based on scene sensitive audio-visual fusion model
Abstract
The invention belongs to the technical field of urban environment perception and smart city management, and particularly relates to an urban green space restoration benefit evaluation method and system based on a scene-sensitive audio-visual fusion model. The method synchronously collects video and audio data of urban green spaces; extracts the pixel proportions of visual elements and the proportions of three classes of sound scene; constructs comprehensive audio-visual feature indices; establishes separate linear prediction sub-models for different space types such as parks, residential areas, and streets, forming a scene-sensitivity coefficient matrix; during evaluation, automatically retrieves the corresponding parameters according to the scene label to perform differentiated prediction; and finally, combined with geographic location, generates a restoration benefit distribution heat map and a diagnostic report. The method overcomes the limitation that traditional models cannot distinguish between scene types, and achieves scene-differentiated, high-precision intelligent evaluation of urban green space restoration benefits.
Inventors
- ZHANG HAONING
- SHENG QIANQIAN
- Ren Henan
- ZHANG QIANRU
- ZHU ZUNLING
- LI CHAODE
Assignees
- Nanjing Forestry University (南京林业大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-12
Claims (10)
- 1. An urban green space restoration benefit evaluation method based on a scene-sensitive audio-visual fusion model, characterized by comprising the following steps: S100, acquiring an audio-visual dataset of the urban green spaces to be evaluated, wherein the audio-visual dataset comprises video data, audio data, and corresponding space type labels; S200, extracting key frames from the video data and performing semantic segmentation on the key frames with a deep visual semantic segmentation model to obtain the pixel proportions of preset visual element classes; S300, performing spectrum analysis and sound source classification on the audio data to obtain the proportions of three sound scenes: natural sound, human sound, and mechanical sound; S400, inputting the visual comprehensive feature index and the sound-scene comprehensive feature index into a pre-trained scene-sensitive restoration prediction model, selecting the corresponding sub-model according to the space type, and calculating a predicted restoration score; and S500, combining the predicted restoration scores with their geographic location information, performing spatial mapping and rendering on an electronic map, and generating a city-scale green space restoration benefit distribution heat map.
- 2. The urban green space restoration benefit evaluation method based on the scene-sensitive audio-visual fusion model according to claim 1, wherein step S100 specifically comprises: S101, based on preset space types of the city to be evaluated, selecting typical green space sites within the city as target sample sites, wherein the space types comprise parks, residential areas, and streets; S102, synchronously acquiring video data and audio data at each target sample site with acquisition equipment; S103, associating the video data, the audio data, the annotated space type label, and the precise geographic location information acquired by GPS or a geographic information system for each sample site, thereby constructing the audio-visual dataset.
- 3. The urban green space restoration benefit evaluation method based on the scene-sensitive audio-visual fusion model according to claim 1, wherein step S200 specifically comprises: S201, extracting key-frame images from the video data at fixed time intervals; S202, processing each key-frame image with a deep-learning-based semantic segmentation model, classifying every pixel in the image, and identifying and counting the pixels belonging to the preset visual element categories, wherein the preset visual element categories comprise sky, plants, grassland, buildings, and hard pavement, and the semantic segmentation model is a DeepLab-series or SegFormer model; S203, for each key frame, calculating the ratio of the pixel count of each class of visual element to the total pixel count to obtain the initial proportion of each class of visual element in that key frame; S204, weighting and fusing the final pixel proportions of the visual element classes according to the contribution of each class to restoration benefit, generating a visual comprehensive feature index V that comprehensively characterizes the visual environment.
- 4. The urban green space restoration benefit evaluation method based on the scene-sensitive audio-visual fusion model according to claim 3, wherein step S300 specifically comprises: S301, performing a short-time Fourier transform on the audio data to acquire its time-frequency spectral features; S302, performing sound source classification on the audio data based on Mel-frequency cepstral coefficients or a deep neural network sound source classification model, identifying and separating the audio components belonging to the three sound scenes of natural sound, human sound, and mechanical sound; S303, respectively calculating the energy proportion or time proportion of the natural, human, and mechanical sound components over the whole audio duration to obtain the natural sound proportion, human sound proportion, and mechanical sound proportion; S304, weighting and combining the three sound-scene proportions to construct a sound-scene comprehensive feature index S, wherein a positive weight is assigned to sound-scene proportions with a positive effect on restoration benefit, and a negative weight is assigned to sound-scene proportions with a negative effect on restoration benefit.
- 5. The urban green space restoration benefit evaluation method based on the scene-sensitive audio-visual fusion model according to claim 4, wherein the pre-trained scene-sensitive restoration prediction model is obtained by the following steps: collecting audio-visual data of a plurality of urban green space sites and, for the audio-visual data of each site, organizing subjects to carry out subjective evaluation experiments to obtain ground-truth restoration scores, thereby forming a training sample set, wherein each sample comprises a visual comprehensive feature index V, a sound-scene comprehensive feature index S, a space type label T, and a ground-truth restoration score SRRS; dividing the training sample set into three subsets of parks, residential areas, and streets according to the space type label T; for each space type subset, constructing a corresponding restoration prediction equation as follows: SRRS_T = α_T + β_T·V + γ_T·S, wherein SRRS_T is the predicted restoration score under space type T, α_T is a scene-sensitive constant term, β_T is the visual feature weight coefficient, and γ_T is the sound-scene feature weight coefficient; taking V and S of all samples in the subset as independent variables and the corresponding SRRS as the dependent variable, substituting them into the restoration prediction equation and fitting with a linear mixed-effects model to estimate the optimal model parameters for that space type; and training the three space types to obtain three parameter groups (α_T, β_T, γ_T), which are integrated into a scene-sensitivity coefficient matrix M and stored for indexing and retrieval according to the input space type label in step S400.
- 6. The urban green space restoration benefit evaluation method based on a scene-sensitive audio-visual fusion model according to claim 5, wherein the linear mixed-effects model comprises the following formula: SRRS = α_T + β_T·V + γ_T·S + u_subject + ε, wherein SRRS is the ground-truth restoration score, u_subject is a random-effect term characterizing individual differences among subjects, and ε is the residual error term.
- 7. The urban green space restoration benefit evaluation method based on the scene-sensitive audio-visual fusion model according to claim 5, wherein integrating and storing the parameters of the three scene types as the scene-sensitivity coefficient matrix M comprises the formula: M = [(α_park, β_park, γ_park); (α_residential, β_residential, γ_residential); (α_street, β_street, γ_street)], wherein each row of the matrix M stores all parameters α_T, β_T, γ_T of the prediction sub-model corresponding to one space type.
- 8. An urban green space restoration benefit evaluation system based on a scene-sensitive audio-visual fusion model, applied to execute the steps of the urban green space restoration benefit evaluation method according to any one of claims 1-7, characterized in that it comprises: an audio-visual data acquisition module for acquiring an audio-visual dataset of the urban green spaces to be evaluated, the dataset comprising video data, audio data, space type labels, and geographic location information; an audio-visual feature extraction module for processing the video data to generate a visual comprehensive feature index and processing the audio data to generate a sound-scene comprehensive feature index; a scene-sensitive restoration prediction module for inputting the visual comprehensive feature index and the sound-scene comprehensive feature index into a pre-trained scene-sensitive restoration prediction model, selecting the corresponding sub-model according to the space type, and calculating a predicted restoration score; and a spatial visualization and heat map generation module for receiving the predicted restoration scores and their corresponding geographic location information, performing spatial mapping and rendering on an electronic map, and generating and outputting a city-scale green space restoration benefit distribution heat map.
- 9. The urban green space restoration benefit evaluation system based on the scene-sensitive audio-visual fusion model according to claim 8, wherein the scene-sensitive restoration prediction module stores a scene-sensitivity coefficient matrix M comprising three groups of model parameters corresponding to the three space types of parks, residential areas, and streets; the scene-sensitive restoration prediction module indexes the corresponding parameter group (α_T, β_T, γ_T) from the matrix M according to the input space type label T; and based on the indexed parameter group, combined with the input visual comprehensive feature index V and the sound-scene comprehensive feature index S, the predicted restoration score SRRS_T is calculated according to the following formula: SRRS_T = α_T + β_T·V + γ_T·S.
- 10. The urban green space restoration benefit evaluation system based on the scene-sensitive audio-visual fusion model according to claim 9, further comprising a model training module for: acquiring a training dataset comprising a plurality of samples, each sample having a visual comprehensive feature index V, a sound-scene comprehensive feature index S, a space type label T, and a ground-truth restoration score SRRS; dividing the training dataset into subsets according to the space type label T; for each space type subset, fitting a linear mixed-effects model with V and S as independent variables and SRRS as the dependent variable to obtain the model parameters (α_T, β_T, γ_T) of that space type; and updating the model parameters of all space types obtained by training into the scene-sensitivity coefficient matrix M.
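The per-scene training and scene-indexed prediction described in claims 5, 7, and 9 can be sketched in a few lines. This is a minimal illustration only: ordinary least squares stands in for the full linear mixed-effects fit (subject-level random effects are omitted), and all data and parameter values are synthetic, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed "true" sub-model parameters (alpha_T, beta_T, gamma_T) per space
# type, used only to generate synthetic training samples; values illustrative.
true_params = {"park": (2.0, 3.0, 1.5),
               "residential": (1.5, 2.0, 2.0),
               "street": (1.0, 1.2, 2.8)}

M = {}  # scene-sensitivity coefficient matrix: type -> (alpha, beta, gamma)
for T, (a, b, g) in true_params.items():
    n = 200
    V = rng.uniform(0, 1, n)           # visual comprehensive feature index
    S = rng.uniform(-1, 1, n)          # sound-scene comprehensive feature index
    SRRS = a + b * V + g * S + rng.normal(0, 0.1, n)  # noisy subjective scores
    # Ordinary least squares on the design matrix [1, V, S] approximates
    # the patent's linear mixed-effects fit (random effects omitted).
    X = np.column_stack([np.ones(n), V, S])
    M[T] = tuple(np.linalg.lstsq(X, SRRS, rcond=None)[0])

def predict(T, V, S):
    """Claim 9: index the sub-model for space type T and apply
    SRRS_T = alpha_T + beta_T * V + gamma_T * S."""
    alpha, beta, gamma = M[T]
    return alpha + beta * V + gamma * S

print({T: tuple(round(p, 1) for p in M[T]) for T in M})
print(predict("park", 0.5, 0.2))
```

The fitted parameter groups recover the generating values closely, and prediction reduces to a dictionary lookup followed by one linear evaluation, which is what makes the scene-sensitivity matrix cheap to apply at city scale.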
Description
Urban green space recovery benefit evaluation method and system based on scene sensitive audio-visual fusion model

Technical Field

The invention belongs to the technical field of urban environment perception and smart city management, and particularly relates to an urban green space restoration benefit evaluation method and system based on a scene-sensitive audio-visual fusion model.

Background

Urban green spaces serve as important public spaces for relieving the mental pressure of urban residents and promoting physical and mental health; accurate evaluation of their restoration benefits is a key link in optimizing green space layout and improving the quality of the urban living environment and residents' well-being. At present, traditional green space evaluation methods rely mainly on environmental data of a single dimension, such as visual landscape photographs or acoustic monitoring recordings, and evaluators predict the restoration potential of green spaces with a single universal analysis model. However, this approach has a significant limitation: the real restoration benefit of a green space is deeply modulated by the type of spatial scene to which it belongs (such as park, street, or residential area), and the weights with which visual and soundscape elements influence psychological restoration differ systematically across scenes. These differentiated response mechanisms have not been effectively characterized or exploited in conventional unified models. Therefore, relying only on single-modality data or a fusion model with fixed weights makes it difficult to perform comprehensive and accurate differentiated evaluation of the restoration benefits of diverse urban green space scenes, and the precision and spatial applicability of the evaluation results cannot meet the requirements of modern fine-grained urban planning and intelligent management.
Disclosure of Invention

The invention aims to provide an urban green space restoration benefit evaluation method and system based on a scene-sensitive audio-visual fusion model, so as to solve the problem that conventional methods apply the same evaluation standard to all types of green space (such as parks, streets, and residential areas), which makes evaluation results inaccurate and unable to reflect the real restoration benefits of different environments. The invention achieves this purpose through the following technical scheme.

The invention provides an urban green space restoration benefit evaluation method based on a scene-sensitive audio-visual fusion model, comprising the following steps: S100, acquiring an audio-visual dataset of the urban green spaces to be evaluated, wherein the audio-visual dataset comprises video data, audio data, and corresponding space type labels; S200, extracting key frames from the video data and performing semantic segmentation on the key frames with a deep visual semantic segmentation model to obtain the pixel proportions of preset visual element classes; S300, performing spectrum analysis and sound source classification on the audio data to obtain the proportions of three sound scenes: natural sound, human sound, and mechanical sound; S400, inputting the visual comprehensive feature index and the sound-scene comprehensive feature index into a pre-trained scene-sensitive restoration prediction model, selecting the corresponding sub-model according to the space type, and calculating a predicted restoration score; and S500, combining the predicted restoration scores with their geographic location information, performing spatial mapping and rendering on an electronic map, and generating a city-scale green space restoration benefit distribution heat map.
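The feature-index construction at the core of steps S200-S400 can be illustrated with a small numeric sketch. The element weights below are hypothetical: the patent specifies only their signs (positive for restorative elements such as plants and natural sound, negative for buildings, pavement, and mechanical sound), not concrete values.

```python
# S203-S204: visual element pixel counts -> pixel proportions -> visual index V.
pixel_counts = {"sky": 120_000, "plants": 300_000, "grassland": 150_000,
                "building": 80_000, "pavement": 50_000}
visual_weights = {"sky": 0.2, "plants": 0.5, "grassland": 0.3,
                  "building": -0.2, "pavement": -0.1}   # hypothetical weights
total = sum(pixel_counts.values())
V = sum(visual_weights[k] * pixel_counts[k] / total for k in pixel_counts)

# S303-S304: sound-scene energy proportions -> sound-scene index S
# (positive weight for natural sound, negative for human and mechanical sound).
sound_ratios = {"natural": 0.55, "human": 0.30, "mechanical": 0.15}
sound_weights = {"natural": 1.0, "human": -0.3, "mechanical": -0.8}  # hypothetical
S = sum(sound_weights[k] * sound_ratios[k] for k in sound_ratios)

print(round(V, 4), round(S, 4))  # -> 0.2829 0.34
```

The resulting pair (V, S) is exactly what step S400 feeds into the scene-selected linear sub-model.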
Further, step S100 specifically includes: S101, based on preset space types of the city to be evaluated, selecting typical green space sites within the city as target sample sites, wherein the space types comprise parks, residential areas, and streets; S102, synchronously acquiring video data and audio data at each target sample site with acquisition equipment; S103, associating the video data, the audio data, the annotated space type label, and the precise geographic location information acquired by GPS or a geographic information system for each sample site, thereby constructing the audio-visual dataset. Further, step S200 specifically includes: S201, extracting key-frame images from the video data at fixed time intervals; S202, processing each key-frame image with a deep-learning-based semantic segmentation model, classifying every pixel in the image, and identifying and counting the pixels belonging to the preset visual element categories, wherein the preset visual element categories comprise sky, plants, grassland, buildings, and hard pavement, and the semantic segmentation model is a DeepLab-series or SegFormer model
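Step S201 (fixed-interval key-frame extraction) reduces to sampling frame indices. A minimal sketch, assuming the video is described only by its frame rate and duration (no video library needed; a real pipeline would pass these indices to a decoder):

```python
def keyframe_indices(fps: float, duration_s: float, interval_s: float):
    """Return the frame indices sampled every `interval_s` seconds
    from a video with `fps` frames per second and `duration_s` seconds."""
    step = int(fps * interval_s)       # frames between consecutive key frames
    total = int(fps * duration_s)      # total frame count of the clip
    return list(range(0, total, step))

# A 10-second clip at 30 fps, sampled every 2 seconds:
print(keyframe_indices(fps=30, duration_s=10, interval_s=2))  # -> [0, 60, 120, 180, 240]
```

Each returned index identifies one key frame to be passed to the semantic segmentation model in step S202.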