CN-121979869-A - Enterprise ESG data processing method, device, equipment and medium
Abstract
The invention provides an enterprise ESG data processing method, device, equipment and medium, comprising the steps of: collecting ESG basic data of an enterprise from a set data source interface; governing the collected ESG basic data based on an AI large model; storing the governed data in a distributed tiered storage architecture; and displaying data according to user permissions. Through this full-pipeline design of automated collection, AI-based intelligent governance, distributed tiered storage and permission-based display, the invention addresses the technical problems of traditional ESG data processing: low collection efficiency, poor data quality, inconsistent statistical calibers, inflexible storage, weak data-security guarantees and limited support for decision-making.
Inventors
- LI NING
- XIE QI
- WANG ZHONGLING
- CHEN HONGHAO
Assignees
- 华福证券有限责任公司 (Huafu Securities Co., Ltd.)
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-12-29
Claims (10)
- 1. A method for processing enterprise ESG data, comprising the steps of: S1, collecting ESG basic data of an enterprise from a set data source interface; S2, governing the collected ESG basic data based on an AI large model, comprising: constructing basic resources and establishing an industry-enterprise-indicator three-level feature library, wherein industry-level features comprise industry codes, indicator codes, data distribution rules and anomaly-judgment thresholds, enterprise-level features comprise enterprise IDs, historical data sequences, trend features and personalized thresholds, and indicator-level features comprise indicator codes, calculation logic, associated-indicator lists and association constraint rules; performing hierarchical anomaly detection, namely industry-level, enterprise-level and indicator-level feature detection through the AI large model, and generating an anomaly data report containing anomaly level, anomaly type and judgment basis; completing and correcting abnormal and missing data, generating completion values and confidence scores through internal time-series reasoning completion, external industry benchmarking completion and policy-constraint correction; unifying data calibers by mapping nonstandard expressions to standard indicators through text-vector conversion and semantic matching, with the similarity threshold set to 0.85; and building a four-dimensional knowledge graph with an association structure of ESG data, business scene, policy standard and industry label, stored in a Neo4j graph database; S3, storing the governed data in a distributed tiered storage architecture; and S4, displaying data based on user permissions.
- 2. The method for processing enterprise ESG data of claim 1, wherein S1 specifically comprises: initializing interface authentication by reading authentication information from a configuration file, completing identity verification through an authentication interface, obtaining access credentials, storing them in a Redis cache with an expiration time, and starting a timed refresh task; constructing request parameters by defining a request-parameter structure containing basic parameters and filter parameters, checking parameter data types, value ranges and the completeness of required fields through a check function, and serializing the checked parameters into a JSON string; performing paged or batch collection, wherein for paged collection the total page count is calculated from the total record count totalCount and the page size pageSize, and requests are issued in a loop until collection is complete; receiving and parsing data by judging the data format from the response header Content-Type and extracting the core fields enterprise ID, indicator name, indicator value, statistics time and data source to form a standardized basic data structure; and executing a data integrity check with field-level and magnitude-level checks, generating a check report, and passing the checked data to data governance.
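The request-parameter construction and checking step of claim 2 can be sketched as follows. This is a minimal illustration, not the patented implementation; the field names, required-field set and value-range rule are assumptions for the example.

```python
import json

# Hypothetical required fields and their expected types (illustrative only).
REQUIRED_FIELDS = {"enterpriseId": str, "indicatorCode": str, "page": int, "pageSize": int}

def build_request_params(params: dict) -> str:
    """Check types, value ranges and required-field completeness,
    then serialize the validated parameters into a JSON string."""
    for field, ftype in REQUIRED_FIELDS.items():
        if field not in params:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(params[field], ftype):
            raise TypeError(f"{field} must be {ftype.__name__}")
    if not 1 <= params["pageSize"] <= 1000:  # assumed value-range rule
        raise ValueError("pageSize out of range")
    return json.dumps(params)

payload = build_request_params(
    {"enterpriseId": "E000123", "indicatorCode": "ESG-CO2", "page": 1, "pageSize": 100}
)
```

The serialized `payload` would then be sent to the data source interface, with the Redis-cached access credential attached as an authentication header.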
- 3. The method for processing enterprise ESG data as claimed in claim 2, wherein the paged collection specifically comprises: initiating an initial request carrying the parameters page=1 and pageSize=100 to obtain the first page of data and the total record count totalCount; calculating the total page count by the formula totalPage = ceil(totalCount / pageSize); issuing requests in a loop starting from page=2, with each request spaced 500 ms apart, until page > totalPage; and storing the current page number in a Redis cache after each page is collected, so that collection resumes from the cached page number after a restart. The data integrity check comprises: a field-level check, namely checking whether the core fields enterprise ID, indicator name, indicator value and statistics time are missing, whether the statistics time matches the set format, whether the indicator value is of numeric type, and whether the enterprise ID is a string of the set fixed length; and a magnitude-level check, namely calculating the difference rate as |collected record count − expected total count| / expected total count, triggering an alarm and automatically restarting collection if the difference rate exceeds 5%, and recording the difference information if it is at most 5%.
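The paging-and-resume loop of claim 3 can be sketched as below. `fetch_page` and `cache` are stand-ins for the real interface client and the Redis cache; the stub data source is invented for illustration.

```python
import math
import time

def fetch_all_pages(fetch_page, cache, page_size=100, interval_s=0.5):
    """Paged collection per claim 3: fetch page 1 to learn totalCount,
    compute totalPage = ceil(totalCount / pageSize), then loop from the
    cached resume page, checkpointing progress after each page and
    running a magnitude-level check at the end."""
    first = fetch_page(1, page_size)
    total_count = first["totalCount"]
    total_page = math.ceil(total_count / page_size)
    records = list(first["data"])
    start = max(2, cache.get("page", 1) + 1)   # resume point after a restart
    for page in range(start, total_page + 1):
        records.extend(fetch_page(page, page_size)["data"])
        cache["page"] = page                   # progress checkpoint
        time.sleep(interval_s)                 # 500 ms request spacing
    # Magnitude-level check: difference rate vs. the expected total.
    diff_rate = abs(len(records) - total_count) / total_count
    if diff_rate > 0.05:
        raise RuntimeError("difference rate > 5%, re-collection required")
    return records

# Toy data source: 250 records served in pages of 100.
DATA = list(range(250))
def fake_fetch(page, page_size):
    return {"totalCount": len(DATA),
            "data": DATA[(page - 1) * page_size : page * page_size]}

cache = {}
records = fetch_all_pages(fake_fetch, cache, page_size=100, interval_s=0)
```

Note the sketch re-reads page 1 on resume for simplicity; a production version would also persist already-collected records, as the claim implies.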
- 4. The method for processing enterprise ESG data of claim 1, wherein the hierarchical anomaly detection in S2 specifically comprises: industry-level feature detection, namely calculating, through a DeepSeek-67B model, the quantile of the data to be detected within the industry data distribution, and marking the data as an 'industry anomaly' if the quantile is below 5% or above 95%; enterprise-level feature detection, namely decomposing the historical data sequence into trend, seasonal and residual components through an LSTM algorithm and judging anomalies by comparing the deviation of the current data from the trend component; and indicator-level feature detection, namely verifying logical consistency among indicators through the AI large model and identifying logical conflicts and weak-association anomalies.
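The industry-level rule of claim 4 can be illustrated with a plain empirical-quantile computation; in the patent the quantile is produced by the DeepSeek-67B model, which this toy function merely approximates.

```python
def industry_level_flag(value, industry_samples):
    """Flag a value as 'industry anomaly' when its empirical quantile
    within the industry distribution is below 5% or above 95% (claim 4)."""
    below = sum(1 for v in industry_samples if v < value)
    quantile = below / len(industry_samples)
    return "industry anomaly" if quantile < 0.05 or quantile > 0.95 else "normal"

samples = list(range(100))                    # toy industry distribution 0..99
print(industry_level_flag(99, samples))       # quantile 0.99 -> "industry anomaly"
print(industry_level_flag(50, samples))       # quantile 0.50 -> "normal"
```

Enterprise-level detection would analogously compare current data against the LSTM-derived trend component, and indicator-level detection against cross-indicator constraint rules.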
- 5. The method for processing enterprise ESG data of claim 1, wherein the data completion and correction in S2 comprises: internal time-series reasoning completion, namely predicting a missing value through the AI large model based on at least 3 consecutive historical data points, calculating a 95% confidence interval, and marking the result as low-confidence if the interval span exceeds 50% of the historical fluctuation range; external industry benchmarking completion, namely screening 10-20 comparable enterprises through the DeepSeek-67B model based on multidimensional feature clustering over industry, scale, revenue, profit margin, region and patent count, and calculating a weighted-average suggested completion value; and policy-constraint correction, namely retrieving relevant policy clauses from a policy knowledge base and applying the policy constraints as hard rules for completion correction.
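Two pieces of claim 5 reduce to simple arithmetic and can be sketched directly; the peer screening and the confidence interval themselves are assumed to come from the upstream model, and the peer values and weights below are invented for illustration.

```python
def peer_completion(peer_values, weights):
    """External industry benchmarking completion (claim 5): weighted
    average of comparable-enterprise values as the suggested fill-in."""
    return sum(v * w for v, w in zip(peer_values, weights)) / sum(weights)

def confidence_label(ci_low, ci_high, hist_min, hist_max):
    """Mark low confidence when the 95% CI span exceeds 50% of the
    historical fluctuation range (claim 5)."""
    return "low" if (ci_high - ci_low) > 0.5 * (hist_max - hist_min) else "ok"

# Three hypothetical comparable enterprises with similarity weights.
suggested = peer_completion([12.0, 10.0, 14.0], [0.5, 0.3, 0.2])
print(round(suggested, 2))                 # 0.5*12 + 0.3*10 + 0.2*14 = 11.8
print(confidence_label(8.0, 14.0, 5.0, 15.0))   # span 6 > 0.5*10 -> "low"
```

Policy-constraint correction would then clamp or override `suggested` wherever a retrieved policy clause imposes a hard rule.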
- 6. The method for processing enterprise ESG data of claim 1, wherein unifying the data calibers in S2 comprises: slicing documents by chapter and paragraph using the AI large model, with each slice kept within 500 words; extracting five core elements, namely indicator name, value, unit, calculation-caliber description and statistical period, from slices containing values; converting the indicator name and the calculation-caliber description into text vectors through Hugging Face Transformers; and computing the similarity between each text vector and the standard-indicator vectors in the ESG semantic map, completing the mapping automatically when the similarity is at least 0.85, and marking the item as 'caliber fuzzy' and triggering manual confirmation when the similarity is below 0.85. Building the four-dimensional knowledge graph in S2 comprises: a data layer, storing enterprise IDs, indicator codes, values and statistics times as data nodes in Neo4j; a business scene layer, establishing business scene nodes such as software development and testing, manufacturing, and supply chain management, and associating them with ESG indicator nodes through business-scene-indicator relation edges; a policy standard layer, using policy clauses as policy nodes associated with data-layer indicator nodes through 'policy sets indicator' and 'policy sets constraint condition' relations; and an industry label layer, establishing industry standard nodes associated with data-layer indicator nodes through 'enterprise indicator vs. industry standard' relations.
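The matching step of claim 6 can be sketched with cosine similarity over toy vectors. Real vectors would come from a Hugging Face Transformers embedding model; the standard-indicator codes and three-dimensional vectors here are illustrative assumptions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def map_to_standard(vec, standard_vecs, threshold=0.85):
    """Caliber unification (claim 6): map a nonstandard-expression vector
    to the most similar standard indicator; auto-map at similarity >= 0.85,
    otherwise flag 'caliber fuzzy' for manual confirmation."""
    best_code, best_sim = None, -1.0
    for code, svec in standard_vecs.items():
        sim = cosine(vec, svec)
        if sim > best_sim:
            best_code, best_sim = code, sim
    if best_sim >= threshold:
        return {"mapped": best_code, "similarity": best_sim}
    return {"mapped": None, "flag": "caliber fuzzy", "similarity": best_sim}

standards = {"ESG-E-001": [1.0, 0.0, 0.0], "ESG-S-002": [0.0, 1.0, 0.0]}
print(map_to_standard([0.95, 0.1, 0.0], standards))   # auto-maps to ESG-E-001
```

The mapped indicator code would then become a data node in the Neo4j-backed four-dimensional knowledge graph.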
- 7. The method for processing enterprise ESG data of claim 1, wherein S3 specifically comprises: establishing a tiered storage system, wherein a hot data layer uses MySQL and Redis to store feature data from roughly the last year, a warm data layer uses MySQL partition tables to store process data from 1-3 years ago, and a cold data layer uses MinIO object storage for result data older than 3 years; executing data classification and storage mapping, storing feature data in a MySQL feature-data table, knowledge-graph data in Neo4j, process data in a MySQL process-data table, and result data in MinIO objects; and ensuring data-write consistency, wherein MySQL uses a transaction mechanism, Neo4j performs association verification, and MinIO generates MD5 checksums. S4 specifically comprises: establishing a permission system defining five roles, namely system administrator, operations and maintenance staff, business analyst, enterprise user and auditor, with permission control implemented through JWT tokens and role-based access control; and providing data query, visual analysis, early-warning management and data export functions, with a Vue 3.0 framework on the front end, ECharts 5.4.4 components for visualization, and a FastAPI framework on the back end.
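The tier routing of claim 7 is essentially an age test, sketched below with stdlib only; the exact boundary handling (365-day years, inclusive edges) is an assumption, and the MinIO write itself is out of scope here.

```python
import hashlib
from datetime import date

def storage_tier(stat_date: date, today: date) -> str:
    """Route a record by data age (claim 7): hot (MySQL/Redis) within ~1
    year, warm (MySQL partition tables) for 1-3 years, cold (MinIO) beyond."""
    age_days = (today - stat_date).days
    if age_days <= 365:
        return "hot"
    if age_days <= 3 * 365:
        return "warm"
    return "cold"

def minio_checksum(payload: bytes) -> str:
    """MD5 checksum generated on cold-layer writes for consistency checks."""
    return hashlib.md5(payload).hexdigest()

print(storage_tier(date(2025, 6, 1), date(2026, 1, 1)))   # "hot"
print(storage_tier(date(2022, 1, 1), date(2026, 1, 1)))   # "cold"
```

In practice the MD5 digest would be compared against the object-store ETag (or a stored digest) after each write to verify integrity.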
- 8. An enterprise ESG data processing device, comprising: a data collection module, which collects ESG basic data of an enterprise from a set data source interface; a data governance module, which governs the collected ESG basic data based on an AI large model, comprising: constructing basic resources and establishing an industry-enterprise-indicator three-level feature library, wherein industry-level features comprise industry codes, indicator codes, data distribution rules and anomaly-judgment thresholds, enterprise-level features comprise enterprise IDs, historical data sequences, trend features and personalized thresholds, and indicator-level features comprise indicator codes, calculation logic, associated-indicator lists and association constraint rules; performing hierarchical anomaly detection, namely industry-level, enterprise-level and indicator-level feature detection through the AI large model, and generating an anomaly data report containing anomaly level, anomaly type and judgment basis; completing and correcting abnormal and missing data, generating completion values and confidence scores through internal time-series reasoning completion, external industry benchmarking completion and policy-constraint correction; unifying data calibers by mapping nonstandard expressions to standard indicators through text-vector conversion and semantic matching, with the similarity threshold set to 0.85; and building a four-dimensional knowledge graph with an association structure of ESG data, business scene, policy standard and industry label, stored in a Neo4j graph database; a data storage module, which stores the governed data in a distributed tiered storage architecture; and a data display module, which displays data based on user permissions.
- 9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the program.
- 10. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any one of claims 1 to 7.
Description
Enterprise ESG data processing method, device, equipment and medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular to a method, an apparatus, a device, and a medium for processing enterprise ESG data.
Background
Current enterprise ESG data management has three major pain points. First, data sources are scattered, including enterprise annual reports, regulatory notices, third-party rating reports, sensor data, social-media public opinion, enterprise self-disclosed rewards and penalties, and the like; traditional collection methods struggle to achieve full coverage and a unified format. Second, data quality is uneven, with problems of missing values, redundancy, contradictions and semantic ambiguity; manual cleaning is inefficient and easily influenced by subjective factors. Third, data presentation is monolithic, consisting mostly of static reports, which cannot meet the personalized analysis needs of different users (management, regulators, investors), and the associated logic and latent risks behind the data are hard to mine. In the prior art, ESG data governance relies heavily on rule engines and traditional machine-learning algorithms, which suffer from poor adaptability to cross-source data, weak semantic-understanding capability and insufficient visual interactivity. AI large models, with strong natural-language processing, multimodal understanding and logical-reasoning capabilities, provide a technical possibility for solving these problems.
Disclosure of Invention
The invention aims to solve the technical problems of traditional enterprise ESG data processing, namely low collection efficiency, poor data quality, inconsistent statistical calibers, inflexible storage, weak data-security guarantees and limited support for decision-making, through a full-pipeline design of automated collection, AI-based intelligent governance, distributed tiered storage and permission-based display.
In a first aspect, the present invention provides a method for processing enterprise ESG data, including the following steps: S1, collecting ESG basic data of an enterprise from a set data source interface; S2, governing the collected ESG basic data based on an AI large model, comprising: constructing basic resources and establishing an industry-enterprise-indicator three-level feature library, wherein industry-level features comprise industry codes, indicator codes, data distribution rules and anomaly-judgment thresholds, enterprise-level features comprise enterprise IDs, historical data sequences, trend features and personalized thresholds, and indicator-level features comprise indicator codes, calculation logic, associated-indicator lists and association constraint rules; performing hierarchical anomaly detection, namely industry-level, enterprise-level and indicator-level feature detection through the AI large model, and generating an anomaly data report containing anomaly level, anomaly type and judgment basis; completing and correcting abnormal and missing data, generating completion values and confidence scores through internal time-series reasoning completion, external industry benchmarking completion and policy-constraint correction; unifying data calibers by mapping nonstandard expressions to standard indicators through text-vector conversion and semantic matching, with the similarity threshold set to 0.85; and building a four-dimensional knowledge graph with an association structure of ESG data, business scene, policy standard and industry label, stored in a Neo4j graph database; S3, storing the governed data in a distributed tiered storage architecture; and S4, displaying data based on user permissions.
In a second aspect, the present invention provides an enterprise ESG data processing apparatus, including: a data collection module, which collects ESG basic data of an enterprise from a set data source interface; a data governance module, which governs the collected ESG basic data based on the AI large model, comprising: constructing basic resources and establishing an industry-enterprise-indicator three-level feature library, wherein industry-level features comprise industry codes, indicator codes, data distribution rules and anomaly-judgment thresholds, enterprise-level features comprise enterprise IDs, historical data sequences, trend features and personalized thresholds, and indicator-level features comprise indicator codes, calculation logic, associated-indicator lists and association constraint rules; performing hierarchical anomaly detection, namely industry-level, enterprise-level and indicator-level feature detection through the AI large model, and generating an anomaly data report containing anomaly level, anomaly type and judgment basis; performin