
CN-122001901-A - Big data sharing method and sharing system


Abstract

The invention provides a big data sharing method and a big data sharing system. The method comprises: S1, acquiring real-time service data and environment dynamic data of a target sharing scene, wherein the real-time service data comprises multi-source heterogeneous data to be shared and data interaction characteristics, and the environment dynamic data comprises scene service rule change data, data feature distribution change data and security level change data; S2, performing standardized preprocessing on the real-time service data, performing quantization coding on the environment dynamic data, and extracting service feature vectors and environment dynamic feature vectors, respectively; and S3, calculating a target parameter set adapted to the current scene through a dynamic parameter optimization algorithm, based on a preset initial parameter set, the service feature vectors and the environment dynamic feature vectors. The invention monitors the change rates of the service data and the environment data in real time and triggers an adaptive adjustment flow, ensuring that the system remains stably adapted to dynamic scenes over the long term.

Inventors

  • GU MINJUN
  • WU QINGDONG
  • LI PUYANG
  • YAO YU
  • WU XUCHEN
  • LIU JUN

Assignees

  • 南京安夏电子科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-01-15

Claims (10)

  1. A big data sharing method, comprising: S1, acquiring real-time service data and environment dynamic data of a target sharing scene, wherein the real-time service data comprises multi-source heterogeneous data to be shared and data interaction characteristics, and the environment dynamic data comprises scene service rule change data, data feature distribution change data and security level change data; S2, performing standardized preprocessing on the real-time service data, performing quantization coding on the environment dynamic data, and extracting service feature vectors and environment dynamic feature vectors, respectively; S3, calculating a target parameter set adapted to the current scene through a dynamic parameter optimization algorithm based on a preset initial parameter set, the service feature vector and the environment dynamic feature vector, wherein the target parameter set comprises a data sharing weight parameter, a model decision threshold parameter and an encryption strength parameter; S4, performing hierarchical encryption on the standardized, preprocessed real-time service data using an adaptive encryption algorithm according to the security level change data and the encryption strength parameter, to obtain encrypted data and plaintext data; S5, inputting the encrypted data, the plaintext data and the service feature vector into a preset sharing decision model, and driving the sharing decision model with the target parameter set to output a preliminary sharing result, wherein the preliminary sharing result comprises a shared-data candidate set, a data receiver list and a sharing interface identifier; S6, collecting feedback data on the preliminary sharing result, calculating real-time performance evaluation indexes of the sharing decision model, and judging whether the real-time performance evaluation indexes meet a preset performance threshold; S7, if the real-time performance evaluation indexes do not meet the preset performance threshold, updating the network parameters of the sharing decision model through a model self-iteration algorithm based on the environment dynamic feature vector and the error data of the preliminary sharing result, and returning to step S5; S8, outputting the shared-data candidate set to the corresponding receivers through a cross-scene compatible sharing protocol based on the data receiver list and the sharing interface identifier, to complete the data sharing; and S9, monitoring the service data change rate and the environment dynamic data change rate of the target sharing scene in real time, and, if either change rate exceeds a preset change threshold, returning to step S1 and re-executing the parameter optimization and model adaptation flow.
  2. The big data sharing method according to claim 1, wherein the calculation process of the dynamic parameter optimization algorithm in step S3 comprises: S31, extracting key influence factors from the service feature vector, the key influence factors comprising a data interaction frequency change rate Δf, a data interaction volume change rate ΔC and a data sharing demand intensity S, where Δf = (current-period interaction frequency − historical average interaction frequency) / historical average interaction frequency, ΔC = (current-period interaction volume − historical average interaction volume) / historical average interaction volume, and S is obtained by a weighted combination of the sharing request frequency and the urgency of the data receivers; S32, extracting scene adaptation factors from the environment dynamic feature vector, the scene adaptation factors comprising a business rule change coefficient (0 when the business rules are unchanged, 1 when the business rules are fully updated) and a data feature evolution coefficient calculated from the change in cosine similarity of the data feature distribution; S33, for each parameter in the initial parameter set, calculating the target parameter from the key influence factors and the scene adaptation factors through preset adjustment factors that satisfy a normalization constraint (the formula itself is not reproduced in this text).
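The key influence factors of claim 2 can be sketched as follows. This is a minimal illustration: the function names and the 0.6/0.4 weights in the demand-intensity combination are assumptions, since the patent only states that S is "obtained through weighted calculation".

```python
def interaction_change_rate(current, historical_avg):
    # Delta-f / Delta-C from claim 2: (current - historical average) / historical average
    return (current - historical_avg) / historical_avg

def demand_intensity(request_freq, urgency, w_freq=0.6, w_urg=0.4):
    # S: weighted combination of sharing-request frequency and receiver urgency.
    # The 0.6/0.4 weights are illustrative assumptions, not specified by the patent.
    return w_freq * request_freq + w_urg * urgency
```

A current-period interaction frequency of 120 against a historical average of 100 yields Δf = 0.2, i.e. a 20% increase.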
  3. The big data sharing method according to claim 1, wherein the specific implementation of step S2 comprises: S21, performing standardized preprocessing on the real-time service data: for numerical service data, applying the Z-score standardization formula z = (x − μ) / σ, where x is the original numerical data, μ is the historical mean of such data and σ is its historical standard deviation; for text service data, extracting text features with the TF-IDF algorithm and normalizing them into vector form, where the term frequency tf(t, d) = n(t, d) / Σ_k n(k, d) is the number of occurrences n(t, d) of term t in document d divided by the total number of term occurrences in d, and the inverse document frequency idf(t) = log(N / N_t), with N the total number of documents and N_t the number of documents containing term t; for image service data, applying normalization to map pixel values to the [0, 1] interval; S22, performing quantization coding on the environment dynamic data: mapping the scene service rule change data to discrete values (no change: 0, partial change: 0.5, complete change: 1), calculating a distribution difference value from the KL divergence of the data feature distribution change data and mapping it to the [0, 1] interval as the data feature evolution coefficient, and mapping the security level change data to quantized values of levels 1-5 corresponding to the encryption strength levels.
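The preprocessing and quantization steps of claim 3 can be sketched as small helper functions. The min-max pixel range of 0-255 is an assumption (the claim only says pixel values are mapped to [0, 1]), and the state labels in the rule-change mapping are illustrative.

```python
import math

def z_score(x, mu, sigma):
    # Z-score standardization from step S21: z = (x - mu) / sigma
    return (x - mu) / sigma

def tf(term_count, total_terms):
    # term frequency: occurrences of term t in document d / total term occurrences in d
    return term_count / total_terms

def idf(total_docs, docs_with_term):
    # inverse document frequency per step S21: log(N / N_t); assumes N_t >= 1
    return math.log(total_docs / docs_with_term)

def normalize_pixel(value, lo=0.0, hi=255.0):
    # maps a pixel value to [0, 1]; the 8-bit range is an assumption
    return (value - lo) / (hi - lo)

def quantize_rule_change(state):
    # step S22: no change -> 0, partial change -> 0.5, complete change -> 1
    return {"none": 0.0, "partial": 0.5, "complete": 1.0}[state]
```

For example, a value of 10 against a historical mean of 5 and standard deviation of 5 standardizes to z = 1.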
  4. The big data sharing method according to claim 1, wherein the adaptive encryption algorithm in step S4 comprises: S41, determining an encryption level L (L ∈ {1, 2, 3, 4, 5}) from the security level quantized value; S42, when L = 1, encrypting the sensitive fields with the AES-128 algorithm and storing the non-sensitive fields in plaintext; when L = 2, encrypting all service data with the AES-192 algorithm using a 192-bit key; when L = 3, encrypting with the AES-256 algorithm combined with a hash check (SHA-256) to ensure data integrity; when L = 4, using the AES-256 algorithm together with a homomorphic encryption algorithm to support limited computation in the ciphertext state; when L = 5, using the AES-256 algorithm together with a federated learning encryption protocol so that the data is usable but not visible; S43, optimizing the key distribution mechanism based on an Ethereum interaction protocol and generating a dynamic key from a base key, a timestamp T (accurate to the second) and the unique identifier of the data receiver (the generation formula is not reproduced in this text); the key validity period depends on the security level: it is 24 hours when L = 1 and is shortened by 50% for each additional level.
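The level-to-scheme mapping and key lifetime rule of claim 4 can be sketched directly. The dynamic-key construction shown here (SHA-256 over the concatenated inputs) is an assumed stand-in, since the patent's actual formula is not reproduced in the text.

```python
import hashlib

def encryption_policy(level):
    # Maps security level L in {1..5} to the scheme described in step S42.
    if level not in range(1, 6):
        raise ValueError("encryption level must be in 1..5")
    policies = {
        1: {"cipher": "AES-128", "scope": "sensitive fields only"},
        2: {"cipher": "AES-192", "scope": "all service data"},
        3: {"cipher": "AES-256", "scope": "all service data", "integrity": "SHA-256"},
        4: {"cipher": "AES-256", "extra": "homomorphic encryption"},
        5: {"cipher": "AES-256", "extra": "federated-learning encryption protocol"},
    }
    return policies[level]

def key_validity_hours(level):
    # Step S43: 24 h at L = 1, halved for each level above 1.
    return 24.0 * (0.5 ** (level - 1))

def dynamic_key(base_key, timestamp, receiver_id):
    # Assumed construction: hash of base key || timestamp || receiver identifier.
    material = base_key + str(timestamp).encode() + receiver_id.encode()
    return hashlib.sha256(material).hexdigest()
```

So a level-3 share uses AES-256 with a SHA-256 integrity check and a 6-hour key lifetime.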
  5. The big data sharing method according to claim 1, wherein the real-time performance evaluation indexes in step S6 comprise a sharing accuracy Acc, a sharing recall Rec, a sharing delay Lat and a data security rate Sec, each computed over the data entries in the preliminary sharing result (the individual formulas are not reproduced in this text); the preset performance thresholds are Acc ≥ 95%, Rec ≥ 90%, Lat ≤ 100 ms and Sec = 100%, and the preset performance threshold is judged to be met only when all four indexes satisfy their threshold requirements.
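The all-indexes-must-pass check of claim 5 reduces to a single conjunction. A minimal sketch, with accuracy, recall and security rate expressed as fractions of 1 and delay in milliseconds:

```python
def meets_performance_threshold(acc, rec, lat_ms, sec):
    # Claim 5 thresholds: Acc >= 95%, Rec >= 90%, Lat <= 100 ms, Sec == 100%.
    # All four must hold simultaneously for the threshold to be met.
    return acc >= 0.95 and rec >= 0.90 and lat_ms <= 100.0 and sec == 1.0
```

A result with 96% accuracy, 92% recall and 80 ms delay passes only if the security rate is exactly 100%.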
  6. The big data sharing method according to claim 1, wherein the model self-iteration algorithm in step S7 adopts an adaptive gradient descent algorithm, specifically comprising: S71, constructing a loss function L as a multi-objective optimization function whose terms are combined by loss weights satisfying a normalization constraint, with a maximum allowable sharing delay preset to 500 ms; S72, calculating the gradient ∇L of the loss function L with respect to the network parameters W of the sharing decision model; S73, updating the network parameters with an adaptive learning rate derived from an initial learning rate (preset to 0.001), an attenuation coefficient (preset to 0.1) and the initial loss value, yielding the updated network parameters W′; and S74, repeating steps S71-S73 until the loss function L is less than or equal to a preset loss threshold (preset to 0.05) or the number of iterations reaches a maximum iteration threshold (preset to 100), then stopping the iteration and storing the updated model parameters.
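The self-iteration loop of claim 6 can be sketched as a generic gradient-descent routine with the patent's preset constants as defaults. The decay schedule η_t = η₀ / (1 + γ·t) is an assumption, since the claim's exact learning-rate formula is not reproduced in the text.

```python
def self_iterate(w, loss_fn, grad_fn, lr0=0.001, decay=0.1,
                 loss_tol=0.05, max_iter=100):
    """Sketch of the S71-S74 loop. lr0, decay, loss_tol and max_iter
    default to the patent's preset values; the decayed-learning-rate
    schedule eta_t = lr0 / (1 + decay * t) is an assumed choice."""
    for t in range(max_iter):
        if loss_fn(w) <= loss_tol:       # S74: stop once loss <= threshold
            break
        eta = lr0 / (1.0 + decay * t)    # S73: adaptive learning rate
        grad = grad_fn(w)                # S72: gradient of the loss
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w
```

With a toy quadratic loss L(w) = w², a single step at a large learning rate already brings the loss under the 0.05 threshold.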
  7. The big data sharing method according to claim 1, wherein the cross-scene compatible sharing protocol in step S8 comprises: S81, determining the interface type (REST interface, WebSocket interface or blockchain smart contract interface) from the sharing interface identifier; S82, adopting the corresponding data packet format for each interface type: for the REST interface, encapsulating the shared data in JSON format with the fields data_id (unique data identifier), data_type (data type), data_content (data content, encrypted or plaintext), timestamp, and signature (digital signature); for the WebSocket interface, transmitting in binary stream format with a packet header comprising the interface identifier, the data length and a check code; for the blockchain smart contract interface, encapsulating the data according to the contract ABI specification and calling the contract's sharing execution function to complete the on-chain recording of the data and the receiver authorization; and S83, establishing interface adaptation middleware that automatically identifies the receiver's interface type and converts the data packet format, ensuring cross-region, cross-platform data interoperability.
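The REST/JSON packet of claim 7 can be sketched as follows. The signature scheme (SHA-256 over a shared secret plus the canonicalized payload) is an illustrative assumption; the patent names the signature field but not its algorithm.

```python
import hashlib
import json
import time

def build_rest_packet(data_id, data_type, data_content, secret=b"demo-secret"):
    """Builds the JSON packet of step S82 with the five fields named in
    claim 7. The keyed-hash signature is an assumed construction."""
    packet = {
        "data_id": data_id,          # unique data identifier
        "data_type": data_type,      # data type
        "data_content": data_content,  # encrypted data or plaintext
        "timestamp": int(time.time()),
    }
    # Canonicalize before signing so field order cannot change the digest.
    payload = json.dumps(packet, sort_keys=True).encode()
    packet["signature"] = hashlib.sha256(secret + payload).hexdigest()
    return packet
```

The interface adaptation middleware of step S83 would then translate this packet into the WebSocket binary frame or the contract ABI call as required by the receiver.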
  8. The big data sharing method according to claim 1, wherein the service data change rate R1 and the environment dynamic data change rate R2 in step S9 are calculated as R1 = ||F_cur − F̄||2 / ||F̄||2 and R2 = ||E_cur − Ē||2 / ||Ē||2, where F_cur is the current-period service feature vector, F̄ is the average of the service feature vectors of the three most recent historical periods, E_cur is the current-period environment dynamic feature vector, Ē is the average of the environment dynamic feature vectors of the three most recent historical periods, and ||·||2 is the L2 norm; the preset change thresholds are R1 > 30% or R2 > 20%, and when either condition is met the scene is judged to have changed significantly, triggering the parameter re-optimization and model adaptation flow.
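The L2-norm relative change of claim 8 and its trigger condition can be sketched directly:

```python
import math

def relative_change(current, historical_avg):
    # R = ||v_cur - v_hist||_2 / ||v_hist||_2, per claim 8.
    diff = math.sqrt(sum((c - h) ** 2 for c, h in zip(current, historical_avg)))
    base = math.sqrt(sum(h ** 2 for h in historical_avg))
    return diff / base

def scene_changed(r1, r2, t1=0.30, t2=0.20):
    # Claim 8 trigger: R1 > 30% or R2 > 20%.
    return r1 > t1 or r2 > t2
```

A service feature vector that doubles in magnitude gives R1 = 1.0 (a 100% change), well past the 30% threshold.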
  9. A big data sharing system, applied to the big data sharing method of any one of claims 1-8, comprising: a data acquisition module for acquiring real-time service data and environment dynamic data of a target sharing scene, wherein the real-time service data comprises multi-source heterogeneous data to be shared and data interaction characteristics, and the environment dynamic data comprises scene service rule change data, data feature distribution change data and security level change data; a data preprocessing module for performing standardized preprocessing on the real-time service data, performing quantization coding on the environment dynamic data, and extracting service feature vectors and environment dynamic feature vectors, respectively; a parameter optimization module for obtaining a target parameter set adapted to the current scene through a dynamic parameter optimization algorithm based on a preset initial parameter set, the service feature vector and the environment dynamic feature vector, wherein the target parameter set comprises a data sharing weight parameter, a model decision threshold parameter and an encryption strength parameter; an adaptive encryption module for performing hierarchical encryption on the standardized, preprocessed real-time service data using an adaptive encryption algorithm according to the security level change data and the encryption strength parameter, to obtain encrypted data and plaintext data; a model decision module for inputting the encrypted data, the plaintext data and the service feature vector into a preset sharing decision model and driving the sharing decision model with the target parameter set to output a preliminary sharing result, wherein the preliminary sharing result comprises a shared-data candidate set, a data receiver list and a sharing interface identifier; a performance evaluation module for collecting feedback data on the preliminary sharing result, calculating real-time performance evaluation indexes of the sharing decision model, and judging whether the real-time performance evaluation indexes meet a preset performance threshold; a model iteration module for updating the network parameters of the sharing decision model through a model self-iteration algorithm based on the environment dynamic feature vector and the error data of the preliminary sharing result when the real-time performance evaluation indexes do not meet the preset performance threshold; a sharing execution module for outputting the shared-data candidate set to the corresponding receivers through a cross-scene compatible sharing protocol based on the data receiver list and the sharing interface identifier when the real-time performance evaluation indexes meet the preset performance threshold, to complete the data sharing; and a dynamic monitoring module for monitoring the service data change rate and the environment dynamic data change rate of the target sharing scene in real time and, if either change rate exceeds a preset change threshold, triggering the data acquisition module to acquire data again and starting a new parameter optimization and model adaptation flow.
  10. The big data sharing system of claim 9, wherein the hardware architecture of said system comprises: an edge acquisition layer consisting of data acquisition terminals, sensors and an API gateway, wherein the data acquisition terminals comprise industrial-grade Internet-of-Things terminals and a server cluster and are used for acquiring the multi-source heterogeneous real-time service data of the target sharing scene; an edge computing layer deployed on computing servers at the edge nodes, configured with CPUs and GPUs and used for executing low-latency computing tasks such as data preprocessing and dynamic parameter optimization; a cloud platform layer comprising a cloud server cluster, a distributed database and a blockchain node cluster, wherein the cloud server cluster is used for deploying the sharing decision model and executing the model self-iteration training, the distributed database is used for storing the real-time service data, environment dynamic data, model parameters and sharing logs, and the blockchain node cluster is used for storing hash values of the encrypted data and sharing authorization records to ensure data traceability; and an application interface layer consisting of interface adaptation middleware and a load balancer, wherein the interface adaptation middleware supports adaptive conversion among the REST, WebSocket and blockchain smart contract interfaces, and the load balancer is used for distributing sharing requests and avoiding single points of failure.

Description

Big data sharing method and sharing system

Technical Field

The invention relates to the technical field of big data processing and information sharing, and in particular to a big data sharing method and a big data sharing system.

Background

With the rapid development of artificial intelligence and Internet-of-Things technology, big data has become a core production element for collaborative development across industries, and big data sharing can break down data silos, enabling optimized resource allocation and improved decision-making efficiency. At present, big data sharing technology is widely applied in scenarios such as intelligent supply chains and e-commerce user evaluation feedback, and various data sharing methods and systems have been formed.

For example, patent CN117235181A discloses a method and system for intelligent supply chain big data sharing, which realizes data storage and sharing among supply chain member nodes by constructing a blockchain network, designing smart contracts such as regional distribution contracts, data encryption contracts and sharing contracts, introducing an interest degree calculation model, and achieving automatic data sharing based on parameters such as transaction frequency and sharing frequency. This technology guarantees the transparency of data sharing through the distributed nature of the blockchain and improves data security through encryption contracts, but it has obvious limitations in practical application. Another patent, CN119917890A, discloses a method and system for sharing user information based on big data, which performs feature cluster analysis on false evaluation information from a target platform and similar platforms, fuses abnormal evaluation indexes, screens reliable evaluation data based on a dynamic credibility threshold, and realizes trusted sharing of user evaluation information. This technology focuses on authenticity filtering of evaluation data, but it does not solve the problems of parameter adaptation and model extension in dynamic scenarios.

Although the above prior art realizes big data sharing in specific scenarios, both approaches adopt a core architecture of preset parameters and fixed algorithms and lack adaptive capability for dynamic service scenarios. The specific defects are as follows. In reference document 1, the interest degree calculation model depends on preset weight parameters; when the transaction frequency and data sharing patterns of supply chain members change dynamically (such as transaction surges caused by promotional activities, or newly added long-term cooperative members), the preset weights cannot adapt to the scene change in real time, manual fine-tuning is required, responses are delayed, and sharing precision is easily degraded by human error. In reference document 2, the fusion weights of the abnormal evaluation indexes are fixed (the first abnormal evaluation index is weighted 0.7 and the associated abnormal evaluation index 0.3), and dynamic changes in the relevance of similar platforms (such as a similar platform whose data loses reference value because it is flooded with false evaluations) are not considered; the fixed weights cause the fused abnormal evaluation indexes to deviate from the actual scene requirements, affecting the screening accuracy of reliable evaluation data. The parameters depend on manual adjustment, and flexibility is extremely poor.

The smart contract logic of reference document 1 (including regional allocation rules, automatic sharing trigger conditions, and the like) is hard-coded in the blockchain network; if new service links are added to the supply chain (such as cross-border logistics tracking or after-sales service data synchronization), the smart contract code must be modified and redeployed to the blockchain nodes, resulting in high development cost, long deployment cycles and possible interruption of the existing sharing service. The evaluation trend analysis plug-in of reference document 2 is trained on a BP neural network whose training data covers only existing false-evaluation features (such as short-text positive reviews and low-quality picture reviews); when new features appear in false evaluations (such as AI-generated batches of homogeneous reviews, or reviews whose pictures and text are inconsistent), the original model cannot recognize them, sample data must be collected again for retraining, and the model cannot respond in real time to the evolution of false evaluations, so unreliable evaluation data is missed. The model's generalization capability is weak, and it struggles to cope with new scenarios and new features. The core defect of the prior art is that the intelligent algorithm lacks