CN-121982551-A - Estimation method and device for forest resource inventory
Abstract
The present disclosure relates to an estimation method and apparatus for forest resource inventory. The method comprises: generating a training data set based on multimodal data, wherein the multimodal data comprises a data set associated with a plurality of site types of a forest area and comprises at least hyperspectral image data; training a deep learning model based on a loss function, using the training data set as input data, to obtain a trained deep learning model; and processing the multimodal data using the trained deep learning model to output at least one inversion result associated with the forest resource inventory.
Inventors
- Gong Xuzheng
- Wu Jihao
- Zhu Bing
- Chen Dan
- Wang Xianhui
- Cheng Lulu
- Wan Meng
- Lin Jianzheng
- Geng Bin
- Feng Baidong
- Jia Xia
- Li Yuanyuan
- Mei Dandan
- Zhu Jiaqing
- Lin Fen
- Jiang Chen
Assignees
- Zhejiang Academy of Surveying and Mapping Science and Technology (浙江省测绘科学技术研究院)
- Zhejiang University of Finance and Economics (浙江财经大学)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-02-10
Claims (20)
- 1. An estimation method for forest resource inventory, comprising: generating a training data set based on multimodal data, wherein the multimodal data includes a data set associated with a plurality of site types of a forest area and includes at least hyperspectral image data; training a deep learning model based on a loss function, using the training data set as input data, to obtain a trained deep learning model; and processing the multimodal data using the trained deep learning model to output at least one inversion result associated with the forest resource inventory.
- 2. The method of claim 1, wherein the deep learning model comprises: a shallow feature fusion module configured to perform feature extraction on the input data to generate a plurality of shallow feature maps corresponding to different scales, and to fuse features in each of the plurality of shallow feature maps to generate a plurality of shallow fusion feature maps corresponding to the different scales; a three-dimensional feature extraction module configured to perform three-dimensional feature extraction on the hyperspectral image data in the input data to generate a plurality of three-dimensional feature maps corresponding to the different scales; an attention feature extraction module configured to sequentially apply two different attention layers to the fused input data to generate a dual-attention feature map; a deep feature fusion module configured to fuse the plurality of shallow fusion feature maps, the plurality of three-dimensional feature maps, and the dual-attention feature map to generate a deep fusion feature map; and a regression inversion module configured to perform regression inversion for the forest resource inventory based on the deep fusion feature map.
- 3. The method of claim 1, wherein generating a training data set based on multimodal data comprises: converting the multimodal data to generate multiple types of raster data sets having the same resolution, wherein the raster data sets represent information associated with the multiple site types of different geographic subregions in the forest area; and cutting the multiple types of raster data sets to generate the training data set.
- 4. The method according to claim 3, wherein the conversion process further comprises: performing raster conversion on the multimodal data to generate multiple types of raster data sets; performing geometric fine correction and pixel-level position registration of the other types of raster data sets against a reference type of raster data, wherein the reference type of raster data has the highest resolution among the multiple types of raster data sets; and upsampling the other types of raster data sets, after geometric fine correction and pixel-level position registration, to the resolution of the reference type of raster data using nearest-neighbor interpolation, to generate the multiple types of raster data sets with the same resolution.
- 5. The method of claim 4, wherein the cutting process further comprises: cutting the multiple types of raster data sets using a sliding window to generate the training data set, wherein the sliding window has a pixel size of x × x and a step size of 0.25x, and x is an integer multiple of 512.
- 6. The method of claim 2, wherein the different scales comprise a first scale, a second scale, and a third scale.
- 7. The method of claim 6, wherein the shallow feature fusion module comprises: a first residual convolution layer configured to perform a first convolution process on the input data at the first scale to generate a first shallow feature map SF1 corresponding to the first scale, and to fuse features in the first shallow feature map by adding and multiplying the features to generate a first shallow fusion feature map MSF1 corresponding to the first scale; a second residual convolution layer configured to perform a second convolution process on the input data at the second scale to generate a second shallow feature map SF2 corresponding to the second scale, and to fuse features in the second shallow feature map by adding and multiplying to generate a second shallow fusion feature map MSF2 corresponding to the second scale; and a third residual convolution layer configured to perform a third convolution process on the input data at the third scale to generate a third shallow feature map SF3 corresponding to the third scale, and to fuse features in the third shallow feature map by adding and multiplying to generate a third shallow fusion feature map MSF3 corresponding to the third scale; wherein the first scale is smaller than the second scale, the second scale is smaller than the third scale, and the plurality of shallow fusion feature maps corresponding to the different scales represent local and global relationships between the plurality of site types at different geographic scales.
- 8. The method of claim 7, wherein the shallow feature fusion module further comprises: a first maximum pooling layer connected after the first residual convolution layer and configured to perform a first maximum pooling process on the feature map subjected to the convolution process at the first scale to generate the first shallow feature map SF1; and a second maximum pooling layer connected after the second residual convolution layer and configured to perform a second maximum pooling process on the feature map subjected to the convolution process at the second scale to generate the second shallow feature map SF2.
- 9. The method of claim 8, wherein fusing features in each of the plurality of shallow feature maps in an adding and multiplying manner comprises: fusing the sum of each a types of features in the first shallow feature map with the product of those a types of features to generate the first shallow fusion feature map; fusing the sum of each b types of features in the second shallow feature map with the product of those b types of features to generate the second shallow fusion feature map; and fusing the sum of each c types of features in the third shallow feature map with the product of those c types of features to generate the third shallow fusion feature map; wherein 1 < a < b < c, and a, b, c are integers.
- 10. The method of claim 6, wherein the three-dimensional feature extraction module comprises: a first three-dimensional convolution layer configured to perform a convolution process at the first scale on the hyperspectral image data to generate a first three-dimensional feature map 3DSF1 corresponding to the first scale; a second three-dimensional convolution layer configured to perform a convolution process at the second scale on the hyperspectral image data to generate a second three-dimensional feature map 3DSF2 corresponding to the second scale; and a third three-dimensional convolution layer configured to perform a convolution process at the third scale on the hyperspectral image data to generate a third three-dimensional feature map 3DSF3 corresponding to the third scale; wherein the first scale is smaller than the second scale, the second scale is smaller than the third scale, and the plurality of three-dimensional feature maps represent spatial features and spectral features of the forest area at different geographic scales.
- 11. The method of claim 10, wherein the three-dimensional feature extraction module further comprises: a third maximum pooling layer connected after the first three-dimensional convolution layer and configured to perform a third maximum pooling process on the feature map subjected to the convolution process at the first scale to generate the first three-dimensional feature map 3DSF1; and a fourth maximum pooling layer connected after the second three-dimensional convolution layer and configured to perform a fourth maximum pooling process on the feature map subjected to the convolution process at the second scale to generate the second three-dimensional feature map 3DSF2.
- 12. The method of claim 6, wherein the attention feature extraction module comprises: a channel attention layer configured to model correlations between channels of the fused input data to generate a channel attention map; a fourth residual convolution layer connected after the channel attention layer and configured to perform a fourth convolution process on the channel attention map output by the channel attention layer; a spatial attention layer connected after the fourth residual convolution layer and configured to model correlations between spatial locations of the feature map output by the fourth residual convolution layer to generate a spatial attention map; a fifth residual convolution layer connected after the spatial attention layer and configured to perform a fifth convolution process on the spatial attention map output by the spatial attention layer; and a normalization layer connected after the fifth residual convolution layer and configured to perform normalization processing on the feature map output by the fifth residual convolution layer to generate the dual-attention feature map, wherein the dual-attention feature map is used to assign attention weights to features with high channel and spatial position correlation in the data set associated with the plurality of site types.
- 13. The method of claim 6, wherein the deep feature fusion module comprises: a first deconvolution layer configured to fuse the third shallow fusion feature map MSF3 corresponding to the third scale, the third three-dimensional feature map 3DSF3 corresponding to the third scale, and the dual-attention feature map as a first fusion input, and to perform a first deconvolution process on the first fusion input; a sixth residual convolution layer connected after the first deconvolution layer and configured to perform a sixth convolution process on the feature map output by the first deconvolution layer; a second deconvolution layer connected after the sixth residual convolution layer and configured to fuse the second shallow fusion feature map MSF2 corresponding to the second scale, the second three-dimensional feature map 3DSF2 corresponding to the second scale, and the feature map output by the sixth residual convolution layer as a second fusion input, and to perform a second deconvolution process on the second fusion input; a seventh residual convolution layer connected after the second deconvolution layer and configured to perform a seventh convolution process on the feature map output by the second deconvolution layer; and a third deconvolution layer connected after the seventh residual convolution layer and configured to fuse the first shallow fusion feature map MSF1 corresponding to the first scale, the first three-dimensional feature map 3DSF1 corresponding to the first scale, and the feature map output by the seventh residual convolution layer as a third fusion input, and to perform a third deconvolution process on the third fusion input to generate the deep fusion feature map, wherein the deep fusion feature map represents features associated with the plurality of site types of the forest area.
- 14. The method of claim 2, wherein the regression inversion module comprises: a first convolution regression layer configured to output a first inversion result associated with a first index of the forest resource inventory based on the deep fusion feature map; and a second convolution regression layer configured to output a second inversion result associated with a second index of the forest resource inventory based on the deep fusion feature map; wherein the first index is a physical quantity of forest resources and the second index is a value quantity of forest resources.
- 15. The method of claim 14, wherein training a deep learning model comprises: minimizing the value of the loss function by updating the weights and/or bias parameters of the deep learning model, wherein the loss function is as follows: Loss = (1/N) Σ_{n=1}^{N} [(p1_n − t1_n)² + (p2_n − t2_n)²], where N is the number of training samples in a batch; p1_n is the first inversion result output by the regression inversion module for the n-th set of training data; p2_n is the second inversion result output by the regression inversion module for the n-th set of training data; t1_n is the first label truth value corresponding to the first inversion result output for the n-th set of sample patches; t2_n is the second label truth value corresponding to the second inversion result output for the n-th set of sample patches; and the first label truth value and the second label truth value are generated based on the physical-quantity attribute field and the value-quantity attribute field of the forest resource inventory result vector data.
- 16. The method of claim 1, further comprising: generating a test data set based on the multimodal data; processing the test data set using the trained deep learning model to output test results; and determining an accuracy metric of the trained deep learning model based on the test results.
- 17. The method according to any one of claims 1 to 16, wherein the plurality of site types includes terrain, vegetation, and soil.
- 18. The method of claim 17, wherein the multimodal data further comprises: a high-resolution optical image HRO, a synthetic aperture radar image SAR, a digital elevation model image DEM, and a soil type data image Soil of the forest area; and wherein the acquisition phases of the multimodal data differ by no more than a predetermined time threshold, the predetermined time threshold comprising one month.
- 19. An apparatus for forest resource inventory, comprising: one or more processors; and one or more memories storing a computer-executable program which, when executed by the one or more processors, performs the estimation method for forest resource inventory according to any one of claims 1-17.
- 20. A computer program product comprising a computer program or instructions, wherein the computer program or instructions, when executed by a processor, implement the estimation method for forest resource inventory according to any one of claims 1-17.
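Claim 4's nearest-neighbor upsampling step can be illustrated with a short sketch (a hypothetical helper, not from the patent): integer-factor nearest-neighbor upsampling simply repeats each pixel, which preserves categorical raster values such as soil-type codes while bringing a coarse raster to the reference resolution.

```python
import numpy as np

def upsample_nearest(raster, factor):
    """Nearest-neighbor upsampling by an integer factor: each pixel is
    repeated factor x factor times, so categorical codes survive intact."""
    return np.repeat(np.repeat(raster, factor, axis=0), factor, axis=1)

soil = np.array([[1, 2],
                 [3, 4]])          # coarse soil-type raster
fine = upsample_nearest(soil, 2)   # now matches a 2x finer reference grid
```

Nearest-neighbor interpolation is the right choice here precisely because several of the rasters (soil type, site type) are categorical, and averaging interpolators would invent meaningless in-between codes.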
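Reading claim 5's "pixel size of x × x and 0.25x" as a window of x × x pixels advanced with a stride of 0.25x (an assumption; the translated claim wording is ambiguous), the cropping step might look like the following sketch, shown here with a toy x rather than the claim's multiples of 512:

```python
import numpy as np

def sliding_window_tiles(stack, x):
    """Cut a (channels, H, W) raster stack into x-by-x tiles, sliding the
    window by 0.25*x pixels (75% overlap). Assumes x is divisible by 4."""
    stride = x // 4
    _, h, w = stack.shape
    return [stack[:, r:r + x, c:c + x]
            for r in range(0, h - x + 1, stride)
            for c in range(0, w - x + 1, stride)]

stack = np.zeros((3, 12, 12))        # toy stack; the patent uses x = k*512
tiles = sliding_window_tiles(stack, 8)
```

The heavy overlap multiplies the number of training patches extracted from a fixed survey area, which matters when labeled forest plots are scarce.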
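The "adding and multiplying" fusion of claims 7-9 can be sketched as an elementwise sum plus an elementwise product over a group of same-shape feature maps. This is one minimal reading; the patent does not spell out the exact formula.

```python
import numpy as np

def add_multiply_fuse(feature_maps):
    """Fuse same-shape feature maps as (elementwise sum) + (elementwise
    product) -- one reading of the claims' add-and-multiply fusion."""
    stack = np.stack(feature_maps)
    return stack.sum(axis=0) + stack.prod(axis=0)

a = np.full((2, 2), 2.0)
b = np.full((2, 2), 3.0)
fused = add_multiply_fuse([a, b])    # 2 + 3 + 2*3 = 11 everywhere
```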
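Claim 10's three-dimensional convolution treats the hyperspectral cube as a (bands, height, width) volume, so a single kernel mixes spectral and spatial neighborhoods at once. A naive single-kernel sketch (deep-learning-style cross-correlation, no padding or stride; real implementations would use an optimized library layer):

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Single-channel 'valid' 3-D convolution over a hyperspectral cube."""
    kd, kh, kw = kernel.shape
    d, h, w = volume.shape
    out = np.empty((d - kd + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(
                    volume[i:i + kd, j:j + kh, k:k + kw] * kernel)
    return out

cube = np.ones((4, 5, 5))            # 4 spectral bands, 5x5 pixels
feat = conv3d_valid(cube, np.ones((2, 3, 3)))
```

The output shrinks along all three axes, which is why the different kernel scales of claims 10-11 yield feature maps at different geographic scales.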
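Claim 12's channel-then-spatial attention can be illustrated with a minimal numpy sketch. The gating functions below are assumptions (the patent does not specify how the correlations are modeled), and the residual convolutions between the attention layers are omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """Weight each channel by a gate from its global average (x: C,H,W)."""
    gate = sigmoid(x.mean(axis=(1, 2)))            # one weight per channel
    return x * gate[:, None, None]

def spatial_attention(x):
    """Weight each pixel by a gate from channel-wise mean and max."""
    gate = sigmoid(x.mean(axis=0) + x.max(axis=0))  # one weight per pixel
    return x * gate[None, :, :]

x = np.random.default_rng(0).normal(size=(8, 4, 4))
dual = spatial_attention(channel_attention(x))      # sequential, as claimed
```

Applying the two gates sequentially is what lets the module emphasize features that are salient in both the channel and the spatial dimension, matching the claim's "dual-attention" weighting.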
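Claim 15's loss can be read as a batch-mean squared error summed over the two inversion outputs; under that reading (the squared-error form is an interpretation, since the formula did not survive this translation cleanly), it evaluates as:

```python
import numpy as np

def inventory_loss(pred1, true1, pred2, true2):
    """Batch mean of squared errors, summed over the two regression heads
    (physical quantity and value quantity)."""
    n = len(pred1)
    return float(np.sum((np.asarray(pred1) - np.asarray(true1)) ** 2
                        + (np.asarray(pred2) - np.asarray(true2)) ** 2) / n)

loss = inventory_loss([1.0, 2.0], [0.0, 2.0],   # physical-quantity head
                      [1.0, 1.0], [1.0, 3.0])   # value-quantity head
```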
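Claim 16 does not name its accuracy metric; common choices for evaluating a regression inversion on a held-out test set are RMSE and the coefficient of determination R² (these specific metrics are assumptions, shown only to make the evaluation step concrete):

```python
import numpy as np

def rmse(pred, true):
    """Root mean squared error of predictions against truth."""
    pred, true = np.asarray(pred), np.asarray(true)
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def r_squared(pred, true):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    pred, true = np.asarray(pred), np.asarray(true)
    ss_res = np.sum((true - pred) ** 2)
    ss_tot = np.sum((true - true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

pred = [2.0, 4.0, 6.0]
true = [1.0, 4.0, 7.0]
```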
Description
Estimation method and device for forest resource inventory

Technical Field

The present disclosure relates to deep learning technology in the field of artificial intelligence, and in particular to an estimation method, apparatus, and computer program product for forest resource inventory based on deep learning technology.

Background

Forest resources are an important natural resource. Changes in forest resources have an important influence on the global carbon cycle, climate change, biodiversity, and the ecological environment. Forest resource inventory is significant for clarifying assets: it helps establish the property ownership and owner responsibility for forest resources owned by the whole people, and it optimizes the pattern of forest resource development, utilization, and systematic protection. However, complex mountain forest environments are characterized by large topographic relief, continuous forest cover, and pronounced vertical stratification of the forest. Because of the complexity of the ecosystem and terrain in such environments, human activity there is limited, and a large-scale monitoring and evaluation scheme is difficult to establish. Traditional forest resource investigation relies mainly on manual field surveys, such as topographic mapping and tree-species surveys of a forest area; it consumes substantial manpower and material resources, is inefficient, and clearly cannot meet the current demand of natural resource asset surveys for large-scale, high-frequency forest resource inventory. In addition, limited by information processing capability, few existing methods can comprehensively use data such as remote sensing images to invert the physical quantity and value quantity of forest resources.
Therefore, an intelligent forest resource inventory method based on multimodal data fusion is needed to improve the efficiency and accuracy of forest resource inventory.

Disclosure of Invention

In view of this, the present disclosure provides a method, apparatus, and computer program product for forest resource inventory. According to one aspect of the disclosure, an estimation method for forest resource inventory is provided, comprising: generating a training data set based on multimodal data, wherein the multimodal data comprises a data set associated with a plurality of site types of a forest area and comprises at least hyperspectral image data; training a deep learning model based on a loss function, using the training data set as input data, to obtain a trained deep learning model; and processing the multimodal data using the trained deep learning model to output at least one inversion result associated with the forest resource inventory.

In addition, according to an embodiment of the disclosure, the deep learning model includes: a shallow feature fusion module configured to perform feature extraction on the input data to generate a plurality of shallow feature maps corresponding to different scales and to fuse features in each of the plurality of shallow feature maps to generate a plurality of shallow fusion feature maps corresponding to the different scales; a three-dimensional feature extraction module configured to perform three-dimensional feature extraction on the hyperspectral image data in the input data to generate a plurality of three-dimensional feature maps corresponding to the different scales; an attention feature extraction module configured to sequentially apply two different attention layers to the fused input data to generate a dual-attention feature map; a deep feature fusion module configured to fuse the plurality of shallow fusion feature maps, the plurality of three-dimensional feature maps, and the dual-attention feature map to generate a deep fusion feature map; and a regression inversion module configured to perform regression inversion for the forest resource inventory based on the deep fusion feature map. For example, the different scales include a first scale, a second scale, and a third scale.

In addition, according to an embodiment of the disclosure, generating the training data set based on the multimodal data comprises: converting the multimodal data to generate multiple types of raster data sets with the same resolution, wherein the raster data sets represent information associated with the multiple site types of different geographic subregions in the forest area; and cutting the multiple types of raster data sets to generate the training data set. For example, the conversion process further includes: performing raster conversion on the multimodal data to generate multiple types of raster data sets; performing geometric fine correction and pixel-level position registration of the other types of raster data sets against a reference type of raster data having the highest resolution among the multiple types of raster data sets; and upsampling the other types of raster data sets to the resolution of the reference type of raster data using nearest-neighbor interpolation.