CN-121996726-A - Data consumption method, device, equipment and medium
Abstract
The application discloses a data consumption method, a device, equipment and a medium, which relate to the technical field of computers and are applied to a cross-cluster synchronization tool of a distributed event stream platform, wherein the platform also comprises a source cluster, a target cluster, a database and a consumption group; the method comprises the steps of synchronizing current data to a target cluster, determining a mapping relation between an original source offset of the current data in the source cluster and an original target offset of the target cluster, determining a predicted target offset of the current data in the target cluster based on the mapping relation, transferring the predicted target offset to the target cluster, determining next data consumed by a consumption group to obtain new current data, and jumping to a step of synchronizing the current data to the target cluster until synchronization is stopped, so that when the consumption group is converted from the data of the consumption source cluster to the data of the consumption target cluster, continuing to consume according to a consumption progress before conversion based on the predicted target offset. The consumption accuracy can be improved.
Inventors
- ZHAN YANAN
- WANG GANG
- WANG XINGEN
- WANG XINYU
- WANG LEI
Assignees
- 浙江邦盛科技股份有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20241108
Claims (10)
- 1. A method of data consumption, characterized by a cross-cluster synchronization tool applied to a distributed event stream platform, the distributed event stream platform further comprising a source cluster, a target cluster, a database, and a consumption group, the method comprising: Determining current data in the source cluster consumed by the consumption group based on configuration information; Synchronizing the current data to the target cluster, and determining a mapping relation between an original source offset of the current data in the source cluster and an original target offset of the current data in the target cluster; Determining a predicted target offset of the current data in the target cluster based on the mapping relation to replace the original target offset, and transferring the predicted target offset to an internal theme of the target cluster; Determining the next data in the source cluster consumed by the consumption group based on the configuration information to obtain new current data, and jumping to the step of synchronizing the current data to the target cluster until synchronization is stopped, so that when the consumption group is converted from consuming the data in the source cluster to consuming the data in the target cluster, continuing to consume according to the consumption progress before conversion based on the predicted target offset.
- 2. The data consumption method according to claim 1, wherein the mapping determining a predicted target offset of the current data in the target cluster to replace the original target offset comprises: Constructing a linear relation diagram between the original source offset and the original target offset based on the mapping relation; Dividing the linear relation line segments based on the slopes of all the partial line segments of the linear relation line segments in the linear relation graph to obtain all the divided line segments with different slopes; And determining a predicted target offset corresponding to each original source offset based on the divided line segments to replace the original target offset corresponding to the original source offset.
- 3. The data consumption method according to claim 2, wherein the determining, based on the divided line segments, a predicted target offset corresponding to each of the original source offsets to replace the original target offset corresponding to the original source offset includes: Determining a target line segment comprising a single original source offset in the divided line segments; Judging whether dirty data is written into the target cluster in the data synchronization process corresponding to the target line segment; if the original source offset exists, the original target offset in the target line segment is taken as a predicted target offset corresponding to the original source offset; if the predicted target offset corresponding to the original source offset is not found, the predicted target offset corresponding to the original source offset is calculated based on an original target offset interval and an original source offset interval corresponding to the target line segment.
- 4. The method for consuming data according to claim 3, wherein the determining whether dirty data is written into the target cluster in the data synchronization process corresponding to the target line segment includes: judging whether a first value corresponding to the target line segment is equal to a second value or not, wherein the first value is the absolute value of the difference value between the lowest source offset and the highest source offset corresponding to the target line segment; if the data synchronization processes are equal, dirty data are not written into the target cluster in the data synchronization process corresponding to the target line segment; if the data synchronization processes are not equal, dirty data are written into the target cluster in the data synchronization process corresponding to the target line segment.
- 5. The data consumption method according to claim 3, wherein the calculating the predicted target offset corresponding to the original source offset based on the original target offset interval and the original source offset interval corresponding to the target line segment includes: Calculating a model according to a predicted target offset, and calculating the predicted target offset corresponding to the original source offset based on an original target offset interval and an original source offset interval corresponding to the target line segment; wherein, the prediction target offset calculation model is: ; Wherein, the Representing the predicted target offset amount of the object, Representing the lowest original target offset of the original target offset interval, Representing the amount of source offset to be processed, Representing the lowest original source offset of the original source offset interval, Representing the highest original target offset of the original target offset interval, Representing the highest original source offset of the original source offset interval.
- 6. The data consumption method according to claim 1, wherein after determining the mapping relationship between the original source offset of the current data in the source cluster and the original target offset of the current data in the target cluster, further comprising: Storing the mapping relation into a source-target offset mapping table; Correspondingly, after determining the predicted target offset of the current data in the target cluster based on the mapping relationship to replace the original target offset, the method further includes: And removing the mapping relation in the source-target offset mapping table.
- 7. The data consumption method according to claim 1, wherein the determining a mapping relationship between an original source offset of the current data in the source cluster and an original target offset of the current data in the target cluster comprises: setting a target mark for a mapping relation between an original source offset of the current data in the source cluster and an original target offset of the current data in the target cluster, wherein the target mark is obtained by adding 1 to a previous target mark; and determining the target label and the corresponding mapping relation.
- 8. A data consumption apparatus characterized by a cross-cluster synchronization tool applied to a distributed event stream platform, the distributed event stream platform further comprising a source cluster, a target cluster, a database, and a consumption group, the apparatus comprising: a confirmation module for determining current data in the source cluster consumed by the consumption group based on configuration information; The storage module is used for synchronizing the current data to the target cluster and determining the mapping relation between the original source offset of the current data in the source cluster and the original target offset of the current data in the target cluster; A transfer module, configured to determine a predicted target offset of the current data in the target cluster based on the mapping relationship to replace the original target offset, and transfer the predicted target offset to an internal topic of the target cluster; And the jump module is used for determining the next data in the source cluster consumed by the consumption group based on the configuration information to obtain new current data, and jumping to the step of synchronizing the current data to the target cluster until synchronization is stopped, so that when the consumption group is converted from consuming the data in the source cluster to consuming the data in the target cluster, the consumption is continued according to the consumption progress before conversion based on the predicted target offset.
- 9. An electronic device, comprising: A memory for storing a computer program; A processor for executing the computer program to implement the data consumption method of any one of claims 1 to 7.
- 10. A computer-readable storage medium for storing a computer program, wherein the computer program when executed by a processor implements the data consumption method according to any one of claims 1 to 7.
Description
Data consumption method, device, equipment and medium Technical Field The present invention relates to the field of computer technologies, and in particular, to a data consumption method, apparatus, device, and medium. Background Currently, in a two-living, two-place, three-center construction scheme, the MirrorMaker (Kafka cross-cluster synchronization tool) technique of Kafka (an open-source distributed event stream platform) is commonly used for data synchronization in different places. However, if the kafka client consumption group is used in the downstream service, when the local kafka cluster fails, and the service is switched to the remote cluster to consume data, the consumption group cannot continue to execute according to the previous consumption progress because of no synchronization of the consumption group offset consumption information, so that the accuracy of data consumption is reduced or repeated consumption is caused. At present, the high version MirrorMaker provides a consumer group offset synchronization function, but the offset information of the consumer group needs to map historical synchronous data offset information of a local cluster and a different cluster in the synchronization process, and the current official network is that the mapping information is reserved in a topic (theme), so that a data deleting mode adopts a compression mode to ensure the performance, and after a certain time, the offset mapping information of the data among the clusters can be compressed and deleted. If the rate of the data image synchronization task is greater than the rate of consumer group consumption, the consumer offset mapping effort is stopped. Because only the latest offset mapping information of the local and the remote clusters is reserved in the topic, there is insufficient history information to map the offset of the consumption group history. In a real scene, the consumption progress of a consumption group is generally later than the progress of a data mirror synchronous task. Therefore, after the clusters are switched, the consumption groups cannot accurately complete continuous consumption of the data according to the original progress due to insufficient consumption offset mapping information, so that the consumption accuracy is reduced. In summary, how to improve the consumption accuracy is a current urgent problem to be solved. Disclosure of Invention In view of the above, the present invention aims to provide a data consumption method, device, apparatus and medium, capable of improving consumption accuracy, and the specific scheme is as follows: In a first aspect, the present application discloses a data consumption method, applied to a cross-cluster synchronization tool of a distributed event stream platform, where the distributed event stream platform further includes a source cluster, a target cluster, a database, and a consumption group, the method includes: Determining current data in the source cluster consumed by the consumption group based on configuration information; Synchronizing the current data to the target cluster, and determining a mapping relation between an original source offset of the current data in the source cluster and an original target offset of the current data in the target cluster; Determining a predicted target offset of the current data in the target cluster based on the mapping relation to replace the original target offset, and transferring the predicted target offset to an internal theme of the target cluster; Determining the next data in the source cluster consumed by the consumption group based on the configuration information to obtain new current data, and jumping to the step of synchronizing the current data to the target cluster until synchronization is stopped, so that when the consumption group is converted from consuming the data in the source cluster to consuming the data in the target cluster, continuing to consume according to the consumption progress before conversion based on the predicted target offset. Wherein the determining, based on the mapping relationship, a predicted target offset of the current data in the target cluster to replace the original target offset includes: Constructing a linear relation diagram between the original source offset and the original target offset based on the mapping relation; Dividing the linear relation line segments based on the slopes of all the partial line segments of the linear relation line segments in the linear relation graph to obtain all the divided line segments with different slopes; And determining a predicted target offset corresponding to each original source offset based on the divided line segments to replace the original target offset corresponding to the original source offset. The determining, based on the divided line segments, a predicted target offset corresponding to each original source offset to replace the original target offset corresponding to the original source offset i