CN-121979913-A - Big data calculation method, big data calculation device, electronic equipment, storage medium and program product
Abstract
The application relates to the technical field of computers and provides a big data computing method, a big data computing device, electronic equipment, a storage medium and a program product. The big data calculation method comprises: obtaining the maximum value of the identifiers of a plurality of data items queried last time, wherein each data item corresponds to a unique identifier and the data items are arranged in ascending order of identifier; performing a data query on the data of the current period according to a preset query batch quantity and the maximum identifier value; after target data is queried, updating the maximum identifier value based on the target data; issuing the target data in batches to a plurality of computing containers; and, after a computing container finishes the data calculation of its corresponding batch, clearing the data of that batch from the cache. In this way data overflow can be avoided, the query can be restarted when the calculation is interrupted, and the calculation recovery time is shortened.
Inventors
- JIANG YUANSHUN
- YE XINFA
- CHENG JINGTAO
- WANG HAOYI
- XIE BOFEI
- WU GUILU
- ZENG SHENGRONG
Assignees
- 金蝶软件(中国)有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251201
Claims (11)
- 1. A big data calculation method, characterized by comprising: obtaining the maximum value of the identifiers of a plurality of data items queried last time, wherein each data item corresponds to a unique identifier and the data items are arranged in ascending order of identifier; performing a data query on the data of the current period according to a preset query batch quantity and the maximum identifier value, and, after target data is queried, updating the maximum identifier value based on the target data; issuing the target data in batches to a plurality of computing containers; and, after a computing container finishes the data calculation of its corresponding batch, clearing the data of that batch from the cache.
- 2. The big data calculation method according to claim 1, wherein performing the data query on the data of the current period according to the preset query batch quantity and the maximum identifier value comprises: acquiring the number of data items not yet calculated among the data queried last time; and if the number of data items not yet calculated is less than or equal to a preset query trigger quantity, performing the data query on the data of the current period according to the preset query batch quantity and the maximum identifier value.
- 3. The big data calculation method according to claim 2, wherein acquiring the number of data items not yet calculated among the data queried last time comprises: reading, from a record table, the total number of data items queried last time and the number of data items already calculated, wherein the number of data items already calculated is updated by the computing container after each calculation is completed; and determining the number of data items not yet calculated from the total number of data items queried last time and the number of data items already calculated.
- 4. The big data calculation method according to claim 1, wherein, before obtaining the maximum value of the identifiers of the plurality of data items queried last time, the method further comprises: querying the total data of the current period and performing pre-verification on the total data, wherein the pre-verification is used for authority verification; if the total data of the current period passes the pre-verification, determining a maximum identifier in the total data, wherein the maximum identifier is used to limit the query range of the target data; and if the total data of the current period does not pass the pre-verification, ending the query.
- 5. The big data calculation method according to claim 1, wherein obtaining the maximum value of the identifiers of the plurality of data items queried last time comprises: acquiring state data recorded in a record table, wherein the state data comprises one or more of a calculation state, a recalculation frequency and a calculation identifier, the calculation identifier being used to record the identifiers of data whose calculation is complete; if it is determined from the state data that the current scenario is a normal calculation scenario, reading the maximum value of the identifiers of the plurality of data items queried last time from the record table; and if it is determined from the state data that the current scenario is an abnormal recalculation scenario, taking the maximum value of the identifiers of the data in the warehouse, as recorded in the record table, as the maximum value of the identifiers of the data queried last time, wherein the maximum value of the identifiers of the data in the warehouse is updated by the computing container after each calculation and represents the maximum value of the identifiers of the data that have already been calculated.
- 6. The big data calculation method according to claim 5, wherein the plurality of computing containers share one record table, and after the target data is issued to the plurality of computing containers in batches, the method further comprises: after a computing container finishes the data calculation of its corresponding batch, initiating a locking request according to the data identifiers of that batch; and after the lock is successfully acquired, recording the maximum value of the identifiers of the data in the warehouse in the record table.
- 7. The big data calculation method according to claim 1, wherein issuing the target data in batches to a plurality of computing containers comprises: distributing the data of each batch to the corresponding computing container according to the quantity of each batch after the target data is divided into batches, the load state of each computing container, and the distribution weight of each computing container, wherein the distribution weight is determined according to the historical computing efficiency of the computing container.
- 8. A big data computing device, comprising: an acquisition module for obtaining the maximum value of the identifiers of a plurality of data items queried last time, wherein each data item corresponds to a unique identifier and the data items are arranged in ascending order of identifier; a query module for performing a data query on the data of the current period according to a preset query batch quantity and the maximum identifier value, and for updating the maximum identifier value based on the target data after the target data is queried; a distribution module for issuing the target data in batches to a plurality of computing containers; and a calculation module for clearing the data of the corresponding batch from the cache after a computing container finishes the data calculation of that batch.
- 9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the big data calculation method according to any of claims 1 to 7 when executing the computer program.
- 10. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the big data calculation method according to any one of claims 1 to 7.
- 11. A computer program product, characterized in that the computer program product, when run on an electronic device, causes the electronic device to perform the big data calculation method of any of claims 1 to 7.
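The choice of restart point in claims 5 and 6 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the applicant's implementation: the class `RecordTable`, the function `resume_cursor`, the field names and the status strings `"normal"` and `"recalculating"` are all hypothetical, and an in-process `threading.Lock` stands in for whatever locking mechanism the shared record table actually uses. The sketch also assumes that batches finish in ascending identifier order.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class RecordTable:
    """Hypothetical shared record table; every field name is an assumption."""
    status: str = "normal"          # assumed values: "normal" or "recalculating"
    last_queried_max_id: int = 0    # maximum identifier returned by the last query
    stored_max_id: int = 0          # maximum identifier of data already calculated
    _lock: threading.Lock = field(
        default_factory=threading.Lock, repr=False, compare=False
    )

    def record_batch_done(self, batch_ids):
        """Called by a container after it finishes a batch (claim 6).

        The lock is taken before the shared value is written, so two
        containers cannot overwrite each other's update.
        """
        with self._lock:
            self.stored_max_id = max(self.stored_max_id, max(batch_ids))


def resume_cursor(table):
    """Pick the identifier from which the next query continues (claim 5)."""
    if table.status == "normal":
        # Normal scenario: continue after the last queried identifier.
        return table.last_queried_max_id
    # Abnormal recalculation: fall back to the last identifier known to be calculated.
    return table.stored_max_id
```

With `last_queried_max_id=400` and a finished batch whose largest identifier is 200, `resume_cursor` returns 400 in the normal scenario and 200 once the status is switched to `"recalculating"`, so only the data after identifier 200 is queried again.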
Description
Big data calculation method, big data calculation device, electronic equipment, storage medium and program product
Technical Field
The present application relates to the field of computer technologies, and in particular to a big data computing method, a big data computing device, an electronic device, a storage medium, and a program product.
Background
In the prior art, large-scale data calculation generally queries the data to be calculated first, stores all of it in a cache, and clears the cache in one pass after the calculation is completed. Because the data volume is large, storing all of the data to be calculated in the cache easily causes memory overflow due to insufficient memory. Moreover, because the amount of data to be calculated is so large, calculation congestion is also liable to occur during calculation. In addition, when a failure interrupts the calculation, the query and calculation of the data generally have to be re-executed, resulting in an excessively long calculation recovery time.
Disclosure of Invention
In view of the above, embodiments of the present application provide a big data computing method, apparatus, electronic device, storage medium, and program product, which can solve the prior-art problems that memory overflow easily occurs when data is cached, that calculation congestion easily occurs when data is calculated, and that recovery takes too long after a calculation is interrupted.
A first aspect of the embodiments of the present application provides a big data calculation method, including: obtaining the maximum value of the identifiers of a plurality of data items queried last time, wherein each data item corresponds to a unique identifier and the data items are arranged in ascending order of identifier; performing a data query on the data of the current period according to a preset query batch quantity and the maximum identifier value, and, after target data is queried, updating the maximum identifier value based on the target data; issuing the target data in batches to a plurality of computing containers; and, after a computing container finishes the data calculation of its corresponding batch, clearing the data of that batch from the cache.
In an embodiment, performing the data query on the data of the current period according to the preset query batch quantity and the maximum identifier value includes: acquiring the number of data items not yet calculated among the data queried last time; and if the number of data items not yet calculated is less than or equal to a preset query trigger quantity, performing the data query on the data of the current period according to the preset query batch quantity and the maximum identifier value.
In an embodiment, acquiring the number of data items not yet calculated among the data queried last time includes: reading, from a record table, the total number of data items queried last time and the number of data items already calculated, wherein the number of data items already calculated is updated by the computing container after each calculation is completed; and determining the number of data items not yet calculated from the total number of data items queried last time and the number of data items already calculated.
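The query, trigger and cache-clearing steps described above can be sketched in Python as follows. This is a simplified, single-process illustration rather than the implementation of the application. The table name `period_data`, every function name and the parameter values are assumptions. An in-memory SQLite database stands in for the real data source, and a plain sum stands in for the work a computing container performs. The counters are kept cumulatively across queries, which gives the same count of uncalculated data as the per-query totals read from the record table.

```python
import sqlite3


def demo_connection(values):
    """Build an in-memory table with auto-assigned ascending ids (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE period_data (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT INTO period_data (value) VALUES (?)", [(v,) for v in values])
    return conn


def fetch_next(conn, last_max_id, query_quantity):
    """Query only rows whose id exceeds the last queried maximum, in ascending order."""
    rows = conn.execute(
        "SELECT id, value FROM period_data WHERE id > ? ORDER BY id LIMIT ?",
        (last_max_id, query_quantity),
    ).fetchall()
    # Update the maximum identifier from the target data, if any was returned.
    new_max_id = rows[-1][0] if rows else last_max_id
    return rows, new_max_id


def should_query(queried, calculated, trigger_quantity):
    """Query again only when the uncalculated backlog is at or below the trigger."""
    return (queried - calculated) <= trigger_quantity


def run(conn, query_quantity=4, batch_size=2, trigger_quantity=0):
    cache = {}                  # batch number -> rows held until calculated
    last_max_id = 0
    queried = calculated = 0
    batch_no = 0
    totals = []
    while True:
        if should_query(queried, calculated, trigger_quantity):
            rows, last_max_id = fetch_next(conn, last_max_id, query_quantity)
            queried += len(rows)
            for start in range(0, len(rows), batch_size):
                cache[batch_no] = rows[start:start + batch_size]
                batch_no += 1
        if not cache:
            break               # nothing queried and nothing left to calculate
        key = min(cache)        # simulate one container finishing its batch
        batch = cache[key]
        totals.append(sum(value for _id, value in batch))
        calculated += len(batch)
        del cache[key]          # clear only the finished batch from the cache
    return totals, last_max_id
```

For example, `run(demo_connection([10, 20, 30, 40, 50]))` issues a first query for identifiers 1 to 4, calculates the two resulting batches, and issues the next query only once the backlog has dropped to the trigger quantity, returning `([30, 70, 50], 5)`. At most one query's worth of rows is held in the cache at any time.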
In an embodiment, before obtaining the maximum value of the identifiers of the plurality of data items queried last time, the method further includes: querying the total data of the current period and performing pre-verification on the total data, wherein the pre-verification is used for authority verification; if the total data of the current period passes the pre-verification, determining a maximum identifier in the total data, wherein the maximum identifier is used to limit the query range of the target data; and if the total data of the current period does not pass the pre-verification, ending the query.
In an embodiment, obtaining the maximum value of the identifiers of the plurality of data items queried last time includes: acquiring state data recorded in a record table, wherein the state data comprises one or more of a calculation state, a recalculation frequency and a calculation identifier, the calculation identifier being used to record the identifiers of data whose calculation is complete; if it is determined from the state data that the current scenario is a normal calculation scenario, reading the maximum value of the identifiers of the plurality of data items queried last time from the record table; and if it is determined from the state data that the current scenario is an abnormal recalculation scenario, taking the maximum value of the identifiers of the data in the warehouse recorded in the record table as the maximum value of the identifiers of the data queried last time, and updating the maximum value of the identifiers of the d