
CN-121217756-B - Big data information acquisition processing method and system


Abstract

The invention relates to the technical field of electric digital data processing, and in particular discloses a big data information acquisition and processing method and system. The method first backtracks and extracts the running state of an internet data source end within a designated window, computes a source health degree from that state, and judges whether the source's data are trusted; this allows an abnormal source to be throttled or isolated in time, prevents the abnormality from slowing or stalling reconciliation, and provides a basis for subsequent parameter self-adaptation. Finally, adaptive data decryption is performed on the effective data set at the data computing port: only the required period and the necessary fields are decrypted, so decryption is kept to the minimum range while processable indexes are produced, completing the acquisition and processing of the big data information.

Inventors

  • LIU GUANGJUN

Assignees

  • 北京浩太同益科技发展有限公司

Dates

Publication Date
2026-05-08
Application Date
2025-09-10

Claims (7)

  1. A big data information acquisition and processing method, characterized by comprising the following steps:
S1, extracting running state data of an internet data source end under a backtracking designated window, evaluating the source health degree of the internet data source end, and judging the data trusted state of the internet data source end;
S2, based on the data trusted state, a data service port calls a produced data set of the internet data source end to execute big data processing, wherein the big data processing comprises big data encryption processing and idempotent production processing;
S3, marking the output data set after the big data processing as an effective data set, performing self-adaptive data decryption on the effective data set at a data computing port, and finally obtaining a processable index of the effective data set to finish the big data information acquisition and processing;
the specific process of evaluating the source health degree of the internet data source end comprises: the running state data of the internet data source end comprise the clock offset, the packet loss rate, the check error rate and the sampling intervals of the internet data source end within the backtracking designated window; performing standard deviation processing on the sampling intervals of the internet data source end and recording the result as the sampling rate stability of the internet data source end; respectively normalizing the clock offset, the packet loss rate, the check error rate and the sampling rate stability of the internet data source end, introducing weight factors into the normalization results, and performing weighted aggregation in sequence to obtain the source health degree of the internet data source end;
the specific process of judging the data trusted state of the internet data source end comprises: comparing the source health degree of the internet data source end with a predefined source health degree threshold value to obtain a source health comparison result, and judging, by the data service port, the data trusted state of the internet data source end based on the source health comparison result; the source health comparison result comprises a source health first comparison result and a source health second comparison result; the source health first comparison result indicates that the source health degree of the internet data source end is greater than or equal to the source health degree threshold value, and the source health second comparison result indicates that the source health degree of the internet data source end is smaller than the source health degree threshold value; if the source health comparison result is the source health first comparison result, the data service port judges that the data state of the internet data source end is data trusted, and if the source health comparison result is the source health second comparison result, the data service port judges that the data state of the internet data source end is data untrusted;
the data service port calls the produced data set of the internet data source end to execute big data processing as follows: when the data service port judges that the data state of the internet data source end is data trusted, the internet data source end transmits the produced data set to the data service port under the backtracking designated window for executing big data encryption processing;
the big data encryption processing is executed as follows: if data points in the output data set carry security labels, those data points are regarded as sensitive data points, big data encryption processing is carried out on the data points carrying the security labels, and the remaining data points are kept in a plaintext state; the data service port transmits the source health degree of the internet data source end to the affiliated central unit in real time, the central unit instantly generates a single key and derives a key first effective window according to the source health degree of the internet data source end, the single key and the key first effective window are transmitted to the data service port, and the data service port encapsulates the association information of the sensitive data points and the single key to form a random encapsulation set for encapsulating the sensitive data points;
the key first effective window is limited as follows: based on a mapping relation comparison set between the source health degree and an effective window correction element, the source health degree of the internet data source end is brought into the mapping relation comparison set to obtain an effective window correction element, and the effective window correction element is coupled with an initial key effective window in the central unit to obtain the key first effective window;
and when the data service port judges that the data state of the internet data source end is data untrusted, the data service port calls the produced data set of the internet data source end under the backtracking designated window to execute idempotent production processing.
  2. The big data information acquisition and processing method according to claim 1, wherein the idempotent production processing is specifically executed as follows:
performing difference processing on the source health degree of the internet data source end and the source health degree threshold value to obtain the source health degree deviation of the internet data source end;
comparing the source health degree deviation of the internet data source end with predefined source health degree deviation intervals, determining the specific interval to which the source health degree deviation of the internet data source end belongs, and executing idempotent production processing on the output data set accordingly; the source health degree deviation intervals comprise a source health degree first deviation interval, a source health degree second deviation interval, a source health degree third deviation interval and a source health degree fourth deviation interval;
when the source health degree deviation of the internet data source end belongs to the source health degree first deviation interval, performing difference processing on the source health degree deviation of the internet data source end and the minimum value of the source health degree first deviation interval to obtain a source health degree deviation first overflow quantity of the internet data source end, matching to obtain a weight parameter influence element, and coupling the weight parameter influence element with a weight parameter corresponding to a predefined output data set to obtain an adaptive weight parameter corresponding to the output data set, wherein the adaptive weight parameter is used for configuring the process in which the data computing port computes the output data set;
when the source health degree deviation of the internet data source end belongs to the source health degree second deviation interval or the source health degree third deviation interval, carrying out controllable data restoration on the output data set;
when the source health degree deviation of the internet data source end belongs to the source health degree fourth deviation interval, the data service port rejects the output data set of the internet data source end, pre-warns about the trusted state of the internet data source end, and enters a manual judgment waiting mode to wait for a manual decision on whether to reestablish the data transmission link of the internet data source end.
  3. The big data information acquisition and processing method according to claim 2, wherein, in the step of performing controllable data restoration on the output data set, if the source health degree deviation of the internet data source end belongs to the source health degree second deviation interval, the specific restoration process is as follows:
marking the data points that do not carry security labels in the output data set as non-sensitive data points, making an original copy of the non-sensitive data points and storing it in a storage unit of the data service port, performing repair actions on the non-sensitive data points, marking the non-sensitive data points that have finished the repair actions as gray data points, configuring a repair label, and having the data service port identify the repair label to perform quality detection;
performing cosine similarity comparison between the gray data points and the non-sensitive data points to obtain the repair similarity of the gray data points, and checking the repair similarity against a predefined repair similarity defined value; when the repair similarity of the gray data points is smaller than the repair similarity defined value, the data service port passes the gray data points, indicating that the gray data points satisfy the calculation process of the data computing port, and otherwise the data service port isolates the gray data points;
and the data service port simultaneously executes big data encryption processing on the sensitive data points, performs difference processing on the source health degree deviation of the internet data source end and the minimum value of the source health degree second deviation interval to obtain a source health degree deviation second overflow amount of the internet data source end, maps it to obtain an effective window correction element, couples the effective window correction element with the initial key effective window in the central unit to obtain a key second effective window, and configures the big data encryption processing process based on the key second effective window.
  4. The big data information acquisition and processing method according to claim 3, wherein the step of performing controllable data restoration on the output data set further comprises: if the source health degree deviation of the internet data source end belongs to the source health degree third deviation interval, determining that the output data set is to be retransmitted preferentially, the retransmission process comprising:
performing difference processing on the source health degree deviation of the internet data source end and the minimum value of the source health degree third deviation interval to obtain a source health degree deviation third overflow amount of the internet data source end, and, based on a mapping relation comparison set between the source health degree deviation third overflow amount and a backtracking window influence factor, bringing the source health degree deviation third overflow amount of the internet data source end into the mapping relation comparison set to obtain the backtracking window influence factor, and coupling it with the backtracking designated window to obtain a segmentation adaptation window;
extracting the running state data of the internet data source end under the segmentation adaptation window, re-evaluating the source health degree of the internet data source end by the data service port and recording it as the retransmission health degree of the internet data source end, comparing the retransmission health degree of the internet data source end with a predefined retransmission health degree threshold value, and re-judging the data trusted state of the internet data source end iteratively until the retransmission health degree of the internet data source end is smaller than the retransmission health degree threshold value, which triggers the iteration stopping condition;
and recording the output data of the internet data source end under the segmentation adaptation window into a retransmission data set, simultaneously executing, by the data service port, big data encryption processing on the data points carrying security tags in the retransmission data set, mapping the retransmission health degree of the internet data source end to obtain an effective window adjusting parameter, coupling it with the initial key effective window in the central unit to obtain a key third effective window, and configuring the big data encryption processing process based on the key third effective window.
  5. The big data information acquisition and processing method according to claim 1, wherein the self-adaptive data decryption comprises the following specific process:
the data service port uploads the effective data set carrying the single key and the key effective window to the data computing port for use and calculation;
the data computing port invokes the single key from the central unit and performs one-to-one data point calibration against the real-time single key received with the effective data set; if the single key data points are successfully calibrated one by one, the real-time key effective window of the effective data set is extracted and matched with the current time point, and if the current time point belongs to the real-time key effective window, the effective data set passes the closed check of the data computing port and decryption calculation is performed on it;
if the current time point does not belong to the real-time key effective window, the effective data set does not pass the closed check of the data computing port, the data computing port judges that the effective data set is invalid, difference value processing is carried out on the current time point and the termination time point of the real-time key effective window to obtain the effective delay difference of the effective data set, and the effective delay difference is transmitted to the central unit to configure the key generation process of the central unit;
and if any single key data point fails the one-to-one calibration, the data computing port rejects the effective data set and feeds back the rejection status to the data service port.
  6. The big data information acquisition and processing method according to claim 5, wherein the self-adaptive data decryption further comprises: the effective data set includes data points that carry a security tag and data points that do not carry a security tag, and self-adaptive data decryption is performed if and only if the data points that carry a security tag satisfy a decryption trigger condition:
extracting the source health degree of the internet data source end, matching it with the receiving window closing interval corresponding to each predefined source health degree interval, determining the specific interval to which the source health degree of the internet data source end belongs, obtaining the receiving window closing interval corresponding to that interval, and recording it as the receiving window reference closing interval;
acquiring the termination arrival time point of the effective data set received by the data computing port, marking it as the global watermark of the effective data set, and comparing the global watermark with the termination time point of the receiving window reference closing interval; if and only if the global watermark of the effective data set is greater than or equal to the termination time point of the receiving window reference closing interval, self-adaptive data decryption is executed on the effective data set in a hardware isolation area, and otherwise self-adaptive data decryption is not executed;
and pulling the data points carrying the security tag into the hardware isolation area for the decryption operation, and merging and aggregating the decryption result with the data points not carrying the security tag to obtain the processable index of the effective data set.
  7. A big data information acquisition and processing system applying the big data information acquisition and processing method according to any one of claims 1 to 6, characterized by comprising:
a trusted state judging module, used for extracting the running state data of the internet data source end under the backtracking designated window, evaluating the source health degree of the internet data source end, and judging the data trusted state of the internet data source end;
a big data processing module, used for calling, by the data service port and based on the data trusted state, the produced data set of the internet data source end to execute big data processing, wherein the big data processing comprises big data encryption processing and idempotent production processing;
and an effective data decryption module, used for marking the output data set after the big data processing as an effective data set, carrying out self-adaptive data decryption on the effective data set at the data computing port, finally obtaining the processable index of the effective data set, and completing the big data information acquisition and processing.
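The source health evaluation and trust judgment of claim 1 can be sketched in Python as follows. This is illustrative only: the patent does not disclose concrete weight factors, normalization bounds, or the threshold value, so `weights`, `bounds`, and `threshold` below are hypothetical parameters.

```python
import statistics

def source_health(clock_offset, packet_loss, check_error_rate, sampling_intervals,
                  weights=(0.25, 0.25, 0.25, 0.25), bounds=None):
    """Illustrative source health degree: four metrics normalized, then
    weight-aggregated. Lower raw metric values mean a healthier source."""
    # Sampling rate stability: standard deviation of the sampling intervals.
    stability = statistics.pstdev(sampling_intervals)
    metrics = [clock_offset, packet_loss, check_error_rate, stability]
    # Hypothetical per-metric normalization bounds (not given in the patent).
    bounds = bounds or [max(m, 1e-9) for m in metrics]
    # Normalize to [0, 1] and invert so that 1.0 means "healthy".
    normalized = [1.0 - min(m / b, 1.0) for m, b in zip(metrics, bounds)]
    # Weighted aggregation of the normalized results.
    return sum(w * n for w, n in zip(weights, normalized))

def is_trusted(health, threshold=0.7):
    # Source health first comparison result: health >= threshold -> trusted.
    return health >= threshold
```

A perfectly clean source (zero offset, loss, and error, constant sampling interval) scores 1.0 and is judged trusted; any metric approaching its bound pulls the score toward 0.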
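The deviation-interval dispatch of claim 2 can be sketched as below. The interval bounds are hypothetical (the patent gives no numeric values); the function only illustrates the claimed mapping from deviation interval to action.

```python
def dispatch_idempotent_action(health, threshold, intervals):
    """Map the source health degree deviation to the action named in claim 2.
    `intervals` is a hypothetical ordered list of (low, high) bands for the
    first..fourth deviation intervals."""
    # Difference processing against the predefined threshold.
    deviation = threshold - health
    for idx, (low, high) in enumerate(intervals, start=1):
        if low <= deviation < high:
            if idx == 1:
                # First overflow quantity drives the adaptive weight parameter.
                return ("adapt_weights", deviation - low)
            if idx in (2, 3):
                return ("controllable_repair", None)
            return ("reject_and_prewarn", None)
    # Deviation below every interval (healthy source): nothing to do.
    return ("no_action", None)
```

For example, with bands `[(0.0, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 1.0)]`, a health of 0.65 against a threshold of 0.7 falls in the first interval and yields a weight adaptation, while a health of 0.3 falls in the fourth and is rejected with a pre-warning.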
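The gray-data-point quality gate of claim 3 is a cosine similarity check against a defined value. A minimal sketch, with a hypothetical limit of 0.9; note that the claim as translated passes the gray point when similarity is *smaller* than the defined value, and the code follows that wording even though a higher similarity would more commonly indicate a better repair.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def gray_point_passes(non_sensitive, gray, limit=0.9):
    """Claim 3 gate: pass the gray data point when its repair similarity
    against the original non-sensitive point is below the defined value
    (per the claim's wording); otherwise it is isolated."""
    return cosine_similarity(non_sensitive, gray) < limit
```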
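The closed check of claim 5 can be sketched as a simple containment test of the current time point against the real-time key effective window, with the effective delay difference computed on failure. The tuple-based return shape is an assumption for illustration.

```python
from datetime import datetime

def check_effective_window(now, window_start, window_end):
    """Claim 5 closed check: decrypt only if `now` lies inside the real-time
    key effective window; otherwise report the effective delay difference
    (seconds past the window's termination point) back to the central unit."""
    if window_start <= now <= window_end:
        return ("decrypt", None)
    # Negative delay would mean arrival before the window opened.
    delay = (now - window_end).total_seconds()
    return ("invalid", delay)
```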
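The watermark gate and result aggregation of claim 6 can be sketched as follows. The `decrypt` callable and the `(value, tagged)` point shape are hypothetical interfaces; the claim only specifies that decryption runs when the global watermark reaches the end of the receiving window reference closing interval, and that decrypted sensitive points are merged with plaintext points into the processable index.

```python
def adaptive_decrypt(data_points, global_watermark, window_end, decrypt):
    """Claim 6 sketch: run self-adaptive decryption only once the global
    watermark (latest arrival time of the effective data set) has reached
    the reference closing interval's termination point; decrypt only the
    security-tagged points and merge with the untagged plaintext points."""
    if global_watermark < window_end:
        return None  # trigger condition not met; decryption is not executed
    merged = []
    for value, tagged in data_points:
        merged.append(decrypt(value) if tagged else value)
    return merged
```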

Description

Big data information acquisition processing method and system

Technical Field

The invention relates to the technical field of electric digital data processing, and in particular to a big data information acquisition and processing method and system.

Background

Existing big data information acquisition and processing methods generally acquire multi-source heterogeneous data at the edge or business end through data acquisition agents, event-tracking points or gateways; transmit and buffer the data to a raw data area through a message queue; conduct data governance and security control based on catalogs and data lineage; verify, de-duplicate, complete, time-align and standardize the raw data to generate a detail layer; conduct dimensional modeling and index solidification in subject areas; perform aggregation and state calculation through batch and stream processing, precipitating features or indexes that support model training and online inference; serve the results externally through online analytical processing and application programming interfaces; guarantee the full link through observability, service level agreements and cost management; and iterate acquisition, processing, service monitoring and optimization in a closed loop through comparison experiments and drift-monitoring feedback.

For example, the Chinese patent CN119621830B discloses a big-data-based visual data information acquisition system and medium, relating in particular to the field of electronic commerce, which comprises a data encryption acquisition module, a data cleaning and integration module, a data analysis module, a real-time data monitoring module, a visual template customization module and a dynamic data interaction module. The system encrypts target platform data through the data encryption acquisition module to reduce the hidden dangers of data acquisition, performs real-time data processing and sets data change rules and an early warning mechanism through the real-time data monitoring module to acquire and update data in time, receives third-party key information through the dynamic data interaction module and displays it on a visual interface in a preset mode, and provides users with autonomous exploration and predictive analysis of dynamic data.

For example, the Chinese patent application with publication number CN120086293A discloses a computer-based big data information acquisition system, relating to the technical field of data acquisition and processing. The system comprises a data acquisition module, a data fusion module, a data storage and management module, a data processing and analysis module, a data analysis and mining module and a data security and privacy protection module, wherein the data acquisition module is mainly responsible for collecting raw data from various data sources.

In the above technical schemes, most existing big data processing schemes contain encryption, decryption, cleaning or analysis links in which plaintext appears on the processing surface; given the huge data volume and the high complexity and diversity of data structures in big data processing, the plaintext exposure surface of existing schemes is too large, amplifying the risk of data leakage and of internal and external abuse. In addition, most big data processing schemes adopt real-time computing links driven by window aggregation and events, which means that the statistical definition and inclusion range of the same data information may deviate under different running or recovery scenarios, so index computation in existing schemes is prone to repetition or omission, amplifying risks such as cross-end reconciliation deviations, false alarms and missed alarms.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a big data information acquisition and processing method and system, which can effectively solve the problems described in the background art. The invention provides a big data information acquisition and processing method comprising the following steps: S1, extracting running state data of an internet data source end under a backtracking designated window, evaluating the source health degree of the internet data source end, and judging the data trusted state of the internet data source end; S2, calling, by a data service port and based on the data trusted state, a produced data set of the internet data source end to execute big data processing, wherein the big data processing comprises big data encryption processing and idempotent production processing