
CN-121980291-A - Atmospheric pollutant transmission track clustering analysis method and system


Abstract

The invention relates to the technical field of GIS, and in particular to a method and system for clustering analysis of atmospheric pollutant transmission trajectories. The method comprises the following steps. The sampling longitude, latitude and time and the toluene and benzene concentrations are extracted, and the toluene/benzene concentration ratio is calculated to construct trajectory points. A core point is selected and its spatial neighborhood is screened. Ratio differences are calculated and filtered against a threshold to form semantic neighbors. When the neighbor count condition is satisfied, a cluster is formed, its mean ratio is calculated, and the pollution source is identified by its similarity to a source spectrum. Using the toluene/benzene concentration ratio as a chemical fingerprint prevents concentration fluctuations caused by atmospheric dilution from interfering with the analysis. A double constraint combining spherical distance and ratio difference eliminates interference points that are physically adjacent but differ in composition. A semantically connected set built from the difference sequence yields trajectory clusters that reconstruct homologous transmission paths. The in-cluster mean is matched against source spectra using cosine similarity, which quantifies compositional similarity and pinpoints the pollution source. Relying on internal compositional patterns addresses the multi-source mixing problem, removes the dependence on meteorological parameters, and improves tracing accuracy.
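The dilution argument above can be checked with a short Python sketch. It assumes that mixing with pollutant-free air multiplies both species by the same factor. The function names, concentrations and dilution factors are hypothetical. The assumption deliberately ignores chemistry. Toluene reacts with the hydroxyl radical faster than benzene, so the ratio can drift as an air mass ages photochemically, even though dilution alone leaves it unchanged.

```python
import math


def toluene_benzene_ratio(toluene: float, benzene: float) -> float:
    """Chemical fingerprint: toluene concentration divided by benzene concentration."""
    if benzene <= 0:
        raise ValueError("benzene concentration must be positive")
    return toluene / benzene


def ratios_after_dilution(toluene: float, benzene: float, factors):
    """Assumption: dilution by clean air multiplies both concentrations by the same factor."""
    return [toluene_benzene_ratio(toluene * f, benzene * f) for f in factors]


# Hypothetical near-source concentrations (same units), diluted to 100%, 50% and 10%.
ratios = ratios_after_dilution(12.0, 4.0, [1.0, 0.5, 0.1])
assert all(math.isclose(r, 3.0) for r in ratios)  # absolute levels drop tenfold, the ratio does not
```

Because the same factor appears in the numerator and the denominator, it cancels. A raw concentration threshold, by contrast, would treat the 10% sample as a different air mass.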

Inventors

  • LIU QINGWEI
  • ZHANG YUAN
  • GU CHANGJU
  • WANG HUI
  • LI XIAOYAN

Assignees

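  • 北京众弈环境科技有限公司 (Beijing Zhongyi Environmental Technology Co., Ltd.)
  • 河南省焦作生态环境监测中心 (Jiaozuo Ecological Environment Monitoring Center, Henan Province)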

Dates

Publication Date
2026-05-05
Application Date
2026-01-27

Claims (10)

  1. An atmospheric pollutant transmission trajectory clustering analysis method, characterized by comprising the following steps: S1, acquiring discrete sampling records, extracting longitude values, latitude values, sampling timestamps, toluene concentration values and benzene concentration values, calculating the toluene/benzene concentration ratio, and constructing multidimensional trajectory data points; S2, selecting a core capture object from the multidimensional trajectory data points, calculating the spatial physical distance between the core capture object and the data points, screening the data points whose spatial physical distance is smaller than the neighborhood radius, and constructing a spatial neighborhood candidate set; S3, acquiring the toluene/benzene concentration ratios of the elements of the spatial neighborhood candidate set and of the core capture object, calculating the absolute difference in toluene/benzene concentration ratio between the core capture object and each element, and generating a characteristic ratio difference sequence; S4, comparing each value in the characteristic ratio difference sequence with a preset characteristic difference threshold, removing from the spatial neighborhood candidate set the candidate points whose values are larger than the threshold, retaining the candidate points whose values are smaller than or equal to the threshold, and constructing a semantically connected neighbor set; S5, when the number of points in the semantically connected neighbor set reaches the minimum number of included points, combining the core capture object and the semantically connected neighbor set into an atmospheric pollutant transmission trajectory cluster, calculating the arithmetic mean of the toluene/benzene concentration ratios in the cluster, comparing the arithmetic mean with a source component spectrum based on cosine similarity, and outputting a pollution source classification result.
  2. The atmospheric pollutant transmission trajectory clustering analysis method according to claim 1, wherein the multidimensional trajectory data points comprise longitude values, latitude values, sampling timestamps and toluene/benzene concentration ratios; the spatial neighborhood candidate set comprises the core capture object and the adjacent data points whose spatial physical distance is less than the neighborhood radius; the characteristic ratio difference sequence comprises the absolute differences in toluene/benzene concentration ratio between the core capture object and the candidate points; the semantically connected neighbor set comprises the retained candidate points whose characteristic difference values meet the characteristic difference threshold; and the pollution source classification result comprises the pollution source category identified by the source component spectrum comparison.
  3. The atmospheric pollutant transmission trajectory clustering analysis method according to claim 1, wherein the core capture object is the data point currently being processed, selected in sampling-time order from the multidimensional trajectory data points; the spatial physical distance is a spherical distance calculated from the longitude and latitude values; and the neighborhood radius is a predetermined spatial distance threshold that defines which data points are geospatially adjacent to the core capture object.
  4. The atmospheric pollutant transmission trajectory clustering analysis method according to claim 1, wherein the characteristic difference threshold is a preset upper limit on the difference in toluene/benzene concentration ratio and is determined from the statistical distribution of toluene/benzene concentration ratios in historical pollution source sample data; when the absolute difference in toluene/benzene concentration ratio between the core capture object and a data point in the spatial neighborhood candidate set is smaller than or equal to the characteristic difference threshold, that data point is judged to be connected to the core capture object in terms of pollution source semantics.
  5. The atmospheric pollutant transmission trajectory clustering analysis method according to claim 1, wherein S1 specifically comprises: S101, acquiring discrete sampling records from atmospheric environment monitoring sites, parsing the record fields to extract longitude values, latitude values, sampling timestamps, toluene concentration values and benzene concentration values, and standardizing and integrating the extracted data to generate an environmental discrete sampling feature set; S102, retrieving the toluene and benzene concentration values from the environmental discrete sampling feature set, calculating their concentration ratio as a characterization of the pollutant's chemical composition, and anchoring the calculated value to the index of the original data record to generate the toluene/benzene concentration ratio; and S103, looking up, for each toluene/benzene concentration ratio, the corresponding longitude value, latitude value and sampling timestamp in the environmental discrete sampling feature set, establishing a mapping between the concentration ratio and the spatiotemporal coordinates, and restructuring and encapsulating the data according to that mapping to construct the multidimensional trajectory data points.
  6. The atmospheric pollutant transmission trajectory clustering analysis method according to claim 1, wherein S2 specifically comprises: S201, traversing the set of multidimensional trajectory data points, selecting the data item currently being processed as the core capture object, indexing it against the other multidimensional trajectory data points in the set, extracting the geographic coordinates of both and associating them, and generating a core capture object coordinate pairing set; S202, using the core capture object coordinate pairing set, computing the spherical distance between the core capture object and each multidimensional trajectory data point according to their spatial distribution in the geographic coordinate system, thereby quantifying the spatial separation between data points and generating a spatial physical distance sequence; S203, obtaining a preset neighborhood radius threshold, comparing each distance in the spatial physical distance sequence with it, screening the data items whose distance is smaller than the neighborhood radius threshold, and aggregating the qualifying trajectory data points to construct the spatial neighborhood candidate set.
  7. The atmospheric pollutant transmission trajectory clustering analysis method according to claim 1, wherein S3 specifically comprises: S301, using the index identifiers in the spatial neighborhood candidate set to retrieve the toluene/benzene concentration ratios, extracting the ratio of the core capture object and of each candidate point, pairing the core object's ratio with each candidate point's ratio, and generating a core-candidate ratio pairing set; S302, computing the numerical difference for each pair in the core-candidate ratio pairing set, thereby quantifying how far the core capture object and the candidate point deviate in chemical pollutant composition, and generating a concentration ratio deviation measure; S303, arranging the deviation values in the original order of the data points in the spatial neighborhood candidate set and encapsulating them to construct the characteristic ratio difference sequence.
  8. The atmospheric pollutant transmission trajectory clustering analysis method according to claim 1, wherein S4 specifically comprises: S401, taking the characteristic ratio difference sequence, introducing a preset characteristic difference threshold as the screening reference, checking each difference value in the sequence against the threshold to determine whether it falls within the interval the threshold defines, generating a compliance mark for each element position that meets the condition, constructing a screening algorithm based on dynamic weights, and generating a candidate point compliance evaluation table; S402, performing index mapping and data cleaning on the spatial neighborhood candidate set based on the candidate point compliance evaluation table, removing the candidate points marked invalid in the logical mask, extracting the trajectory data items whose difference values satisfy the threshold constraint, and re-queuing the screened data entities to generate a homogeneity-retained point set; S403, based on the homogeneity-retained point set, verifying the attributes of the remaining data points that have passed both the spatial and the chemical attribute checks, integrating and encapsulating the trajectory points that share homogeneous characteristics, establishing the association set of the core capture object under the target semantic rule, and constructing the semantically connected neighbor set.
  9. The atmospheric pollutant transmission trajectory clustering analysis method according to claim 1, wherein S5 specifically comprises: S501, taking the semantically connected neighbor set, counting the total number of data items it contains, introducing a preset minimum inclusion point threshold as the reference boundary for the density clustering decision, and comparing the count with that threshold; once the count satisfies the density constraint, extracting the core capture object and merging it with the trajectory data items in the neighbor set, thereby integrating data entities that are spatially adjacent and convergent in attributes, to generate an atmospheric pollutant transmission trajectory cluster; S502, traversing all trajectory data points in the atmospheric pollutant transmission trajectory cluster, extracting the toluene/benzene concentration ratio associated with each data point to form a group of chemical characteristic values, and computing the arithmetic mean of this group to quantify the cluster's overall central tendency in chemical composition, which serves as a central measure of the transmission trajectory's chemical characteristics and yields the in-cluster mean concentration ratio; S503, taking the in-cluster mean concentration ratio and a source component spectrum comprising standard characteristic vectors for several types of pollution source, mapping the cluster mean and the standard vectors into the same feature space, calculating the cosine similarity, determining the pollution source category from the similarity values, and attaching the corresponding pollution source label to generate the pollution source classification result.
  10. An atmospheric pollutant transmission trajectory clustering analysis system for implementing the atmospheric pollutant transmission trajectory clustering analysis method according to any one of claims 1 to 9, the system comprising: a trajectory point construction module for executing S1, which acquires discrete sampling records containing atmospheric environment monitoring data from atmospheric environment monitoring sites, extracts longitude values, latitude values, sampling timestamps, toluene concentration values and benzene concentration values from the records, divides the toluene concentration value by the benzene concentration value to obtain a toluene/benzene concentration ratio characterizing the chemical composition of the pollutant, establishes a mapping among the toluene/benzene concentration ratio, longitude value, latitude value and sampling timestamp, and constructs multidimensional trajectory data points; a spatial neighborhood preliminary screening module for executing S2, which traverses the set of multidimensional trajectory data points to select the core capture object currently being processed, applies a spherical distance formula to calculate the spatial physical distance, in the geographic coordinate system, between the core capture object and the other multidimensional trajectory data points in the set, compares each distance with a preset neighborhood radius, screens the data points whose distance is smaller than the neighborhood radius, and constructs a spatial neighborhood candidate set; a characteristic difference calculation module for executing S3, which obtains the toluene/benzene concentration ratios of the core capture object and of the candidate points in the spatial neighborhood candidate set, calculates the absolute difference in ratio between the core capture object and each candidate point, and generates a characteristic ratio difference sequence; and a semantic connectivity construction module for executing S4, which compares each value in the characteristic ratio difference sequence with a preset characteristic difference threshold, removes from the spatial neighborhood candidate set the candidate points whose values are larger than the threshold, and retains the candidate points whose values are smaller than or equal to the threshold to construct a semantically connected neighbor set. S5 is executed by counting the total number of elements in the semantically connected neighbor set; when that number reaches the preset minimum inclusion point number, the core capture object and the semantically connected neighbor set are merged into an atmospheric pollutant transmission trajectory cluster, the arithmetic mean of the toluene/benzene concentration ratios of all data points in the cluster is calculated, the numerical difference between this in-cluster mean and the standard characteristic values in the source component spectrum is measured by calculating the absolute difference and normalizing it, and a pollution source classification result is output.
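Steps S1 to S5 of claims 1 and 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The record layout, the names (`TrackPoint`, `build_points`, `haversine_km`, `cluster_tracks`, `embed`, `classify`), the Earth radius and all parameter values are hypothetical. The claims leave several choices open, and the sketch settles each one explicitly:

- the core point is excluded from its own neighbor count;
- clusters are not expanded DBSCAN-style;
- identical clusters found from different cores are reported once.

A single mean ratio also cannot be compared meaningfully by cosine similarity, because any two positive scalars have similarity 1. The sketch therefore assumes a hypothetical two-dimensional embedding (1, r) before computing it.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption, the claims name no value


@dataclass(frozen=True)
class TrackPoint:
    """S1 output: one multidimensional trajectory data point."""
    lon: float
    lat: float
    t: float      # sampling timestamp (here: seconds)
    ratio: float  # toluene concentration / benzene concentration


def build_points(records):
    """S1: records are hypothetical (lon, lat, t, toluene, benzene) tuples.
    Rows without a positive benzene value are skipped to avoid division by zero."""
    return [TrackPoint(lon, lat, t, tol / ben)
            for lon, lat, t, tol, ben in records if ben > 0]


def haversine_km(a, b):
    """Spherical (great-circle) distance in km between two points (claim 3)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cluster_tracks(points, radius_km, ratio_tol, min_pts):
    """S2 to S5 (clustering part): one candidate cluster per qualifying core point."""
    clusters, seen = [], set()
    for core in sorted(points, key=lambda p: p.t):  # cores taken in sampling-time order
        # S2: spatial neighborhood candidate set (strictly inside the radius)
        candidates = [p for p in points
                      if p is not core and haversine_km(core, p) < radius_km]
        # S3: characteristic ratio difference sequence, in candidate order
        diffs = [abs(core.ratio - p.ratio) for p in candidates]
        # S4: keep candidates whose difference is <= the threshold
        neighbors = [p for p, d in zip(candidates, diffs) if d <= ratio_tol]
        # S5: merge core and neighbors if the minimum point count is met
        if len(neighbors) >= min_pts:
            members = [core] + neighbors
            key = frozenset(members)
            if key not in seen:  # assumption: identical clusters are reported once
                seen.add(key)
                clusters.append(members)
    return clusters


def embed(ratio):
    """Hypothetical 2-D feature vector for a ratio; not specified by the claims."""
    return (1.0, ratio)


def cosine(u, v):
    return sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))


def classify(cluster, source_spectrum):
    """S5 (source part): mean ratio of the cluster, matched by cosine similarity."""
    mean_ratio = sum(p.ratio for p in cluster) / len(cluster)
    scores = {name: cosine(embed(mean_ratio), embed(ref))
              for name, ref in source_spectrum.items()}
    return max(scores, key=scores.get), mean_ratio


if __name__ == "__main__":
    # Hypothetical records near 113.24 E, 35.21 N; concentrations in the same units.
    records = [
        (113.24, 35.21, 0, 6.0, 2.0),    # ratio 3.0
        (113.25, 35.21, 60, 3.1, 1.0),   # ratio 3.1
        (113.24, 35.22, 120, 5.8, 2.0),  # ratio 2.9
        (113.25, 35.22, 180, 0.9, 1.0),  # ratio 0.9: nearby, chemically different
    ]
    spectrum = {"source_A": 3.0, "source_B": 1.0}  # placeholder values, not real profiles
    for cluster in cluster_tracks(build_points(records), radius_km=2.0, ratio_tol=0.5, min_pts=2):
        print(classify(cluster, spectrum))
```

With these hypothetical records, all four points lie within 2 km of one another. The fourth point has a ratio of 0.9, however, so S4 removes it. The remaining three points form a single cluster whose mean ratio (about 3.0) is most similar to the placeholder `source_A`. Claim 10 describes the last part of S5 differently, as a normalized absolute difference between the cluster mean and the spectrum values. Replacing the cosine score in `classify` with such a measure would follow that variant, but the claims do not specify the normalization.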

Description

Atmospheric pollutant transmission track clustering analysis method and system
Technical Field
The invention relates to the technical field of geographic information systems (GIS), and in particular to a clustering analysis method and system for atmospheric pollutant transmission trajectories.
Background
The field of spatiotemporal trajectory data management and analysis in geographic information systems (GIS) concerns the storage, management, mining and analysis of continuous position data for objects that move through geographic space over time. Work in this field acquires spatiotemporal sequence data from global positioning system coordinates, remote sensing images and ground monitoring sites. It then uses spatiotemporal indexing and trajectory compression to build database systems that reflect the motion patterns and spatial distribution of those objects. Conventional clustering analysis of atmospheric pollutant transmission trajectories correlates the real-time concentration values collected by environmental monitoring units with the wind speed and wind direction parameters collected by meteorological monitoring equipment. Pollutants such as ozone are easily carried by air currents, so the detection point can deviate from the actual volatilization source. To address this, the prior art takes point readings from fixed ground monitoring sites and matches them against wind field data in a historical meteorological database. An atmospheric dispersion model then back-calculates pollutant diffusion paths under different weather conditions to estimate the initial ozone volatilization area.
In the prior art, atmospheric pollutant transmission analysis relies mainly on real-time concentration readings from ground monitoring stations, which are correlated with and interpreted through the wind speed and wind direction parameters collected by meteorological monitoring equipment. This inversion logic depends heavily on external meteorological parameters and atmospheric dispersion models. For pollutants such as ozone, which are easily affected by air flow and migrate over long distances, it often accumulates errors because wind field data have limited spatiotemporal resolution or local microclimates disturb the flow. Backtracking trajectories purely from concentration gradients and wind directions easily links high-concentration points that are spatially close but not homologous, and it ignores the fact that different pollution sources can overlap in the same area. Without effective identification of the pollutant's chemical composition, physical diffusion laws alone cannot distinguish whether the data at a monitoring site come from direct transport from a single emission source or from the mixing of several emission sources. When wind speed and direction change rapidly, or complex terrain blocks the flow, estimation models based on historical meteorological databases cannot accurately reconstruct the true transport path, and the located initial volatilization area shows an obvious spatial offset.
Using only spatial distribution and concentration values as the clustering basis makes the method highly susceptible to occasional high-concentration noise points. The resulting transmission trajectories contain a large amount of spuriously associated data and can hardly reflect the actual motion patterns and spatial distribution of specific pollutants, which misleads subsequent pollution control decisions.
Disclosure of Invention
To solve the technical problems in the prior art, an embodiment of the invention provides an atmospheric pollutant transmission trajectory clustering analysis method comprising the following steps: S1, acquiring discrete sampling records, extracting longitude values, latitude values, sampling timestamps, toluene concentration values and benzene concentration values, calculating the toluene/benzene concentration ratio, and constructing multidimensional trajectory data points; S2, selecting a core capture object from the multidimensional trajectory data points, calculating the spatial physical distance between the core capture object and the data points, screening the data points whose spatial physical distance is smaller than the neighborhood radius, and constructing a spatial neighborhood candidate set; S3, acquiring the toluene/benzene concentration ratios of the elements of the spatial neighborhood candidate set and of the core capture object, calculating the absolute differences in toluene/benzene concentration ratio between the core capture object and the elements, and generating a