EP-3792779-B1 - MANAGING DATASET EDITS
Inventors
- JIANG, JIAHUI
- ELLIOT, MARK
- Souza, Samuel
- Dalgleish, Alexander
- GOENKA, AAKASH
- Gupta, Vidit
- De Holanda, Diogo
- BAKER, JAMES
- INOUE, JIM
- DUFFIELD, BENJAMIN
Dates
- Publication Date
- 20260513
- Application Date
- 20191119
Claims (11)
- A method, performed by one or more processors, comprising: receiving (11.1), from a first user, a request to create a staging edit to a particular data object stored in a database to test the edit against one or more transformations of a processing pipeline; responsive to receiving the request, determining an available part of a memory space and reserving it for the first user, creating a user staging version of the particular data object, being a new version of the particular data object including the edit to change one or more data elements of the new version of the particular data object without changing the particular data object stored in the database and generating metadata indicating the changed one or more data elements; storing (11.3) the user staging version in the memory space reserved for the first user; indexing (11.4) the user staging version in an index for enabling user searching and retrieval of the user staging version responsive to the first user requesting the particular data object; receiving, from the first user or another user, a base edit, the base edit being an edit to be applied directly to one or more data elements of the particular data object stored in the database; updating the one or more data elements of the particular data object stored in the database with the base edit; wherein, by use of the metadata indicating the changed one or more data elements of the user staging version: if the base edit is for editing one or more data elements of the particular data object that was or were edited by the staging edit, the method comprises not updating the user staging version with the base edit, and if the base edit is for editing one or more data elements of the particular data object that was not or were not edited by the staging edit, the method comprises updating the user staging version with the base edit; executing one or more data transforms on the user staging version and producing staging output resulting from the execution which does not cause modification of the particular data object stored in the database; and storing the produced staging output in the memory space reserved for the first user.
- The method of claim 1, wherein indexing the user staging version comprises adding a document to an index already associated with the particular data object.
- The method of any preceding claim, further comprising: maintaining first, second and third queues for the particular data object, each queue comprising a sequence of slots, wherein received base edits and staging edits are respectively entered into the first and second queues in slots, the first queue maintaining a global view of a time-ordered sequence of edits made to the particular data object stored in the database, the second queue maintaining a user-specific view of staging edits offset in the second queue based on a number of prior base edits on the particular data object, wherein the third queue comprises a merged version of the first and second queues which is user-specific to the first user; and wherein the index is based on the third queue.
- The method of claim 3, wherein the third queue gives priority for staging edits in the second queue over base edits in the first queue in the corresponding slot, a said base edit in the corresponding slot being entered into the next slot of the third queue.
- The method of any preceding claim, further comprising: receiving a search request for the particular data object from the first user; determining from the index if there are any staging versions of the particular data object for the first user; and responsive to a positive determination, returning search results which include one or more staging versions of the particular data object for the first user.
- The method of claim 5, wherein responsive to a negative determination, the method comprises returning the particular data object, or a search result which includes the particular data object.
- The method of claim 5 or claim 6, further comprising: receiving a search request for the particular data object from a second user; and determining from the index if there are any staging versions of the particular data object for the second user, ignoring any staging versions for the first user; and responsive to a positive determination, returning search results which include one or more staging versions of the particular data object for the second user.
- The method of claim 7, wherein responsive to a negative determination, returning the particular data object, or a search result which includes the particular data object.
- The method of any preceding claim, further comprising generating metadata for the particular data object and its one or more staging versions including an identifier field, wherein the one or more staging versions comprise an identifier indicative of a staging version.
- A computer program, optionally stored on a non-transitory computer readable medium, which, when executed by one or more processors of a data processing apparatus, causes the data processing apparatus to carry out a method according to any preceding claim.
- Apparatus configured to carry out a method according to any of claims 1 to 9, the apparatus comprising one or more processors or special-purpose computing hardware.
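The conflict rule at the heart of claim 1 can be illustrated with a minimal sketch. It assumes a data object is a flat Python dict of data elements, that the "metadata indicating the changed one or more data elements" is a set of element names, and that the memory space reserved for the user is modelled as a per-user list in a dict. All names (`StagingVersion`, `create_staging_version`, `apply_base_edit`, `run_transform`) are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set


@dataclass
class StagingVersion:
    """Hypothetical in-memory model of a user staging version."""
    user_id: str
    elements: Dict[str, Any]
    # Metadata recording which data elements the staging edit changed.
    changed: Set[str] = field(default_factory=set)


def create_staging_version(base: Dict[str, Any], user_id: str,
                           edit: Dict[str, Any]) -> StagingVersion:
    """Copy the stored object and apply the staging edit to the copy only."""
    elements = dict(base)  # shallow copy: the stored object is left unchanged
    elements.update(edit)
    return StagingVersion(user_id, elements, set(edit))


def apply_base_edit(base: Dict[str, Any], staging: StagingVersion,
                    edit: Dict[str, Any]) -> None:
    """Apply a base edit to the stored object and, selectively, to the staging version."""
    base.update(edit)  # the base edit always updates the stored object
    for key, value in edit.items():
        # Elements changed by the staging edit keep the staged value;
        # untouched elements follow the base edit.
        if key not in staging.changed:
            staging.elements[key] = value


def run_transform(staging: StagingVersion,
                  transform: Callable[[Dict[str, Any]], Dict[str, Any]],
                  workspaces: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Run a transform on a copy of the staging version and keep the output in the
    user's workspace (a hypothetical stand-in for the reserved memory space)."""
    output = transform(dict(staging.elements))
    workspaces.setdefault(staging.user_id, []).append(output)
    return output


base = {"name": "pump-7", "status": "active", "rate": 10}
staging = create_staging_version(base, "alice", {"rate": 12})
apply_base_edit(base, staging, {"rate": 11, "status": "paused"})
print(base)              # {'name': 'pump-7', 'status': 'paused', 'rate': 11}
print(staging.elements)  # {'name': 'pump-7', 'status': 'paused', 'rate': 12}
```

In the usage lines, the base edit to `rate` does not reach Alice's staging version because her staging edit already changed that element, while the base edit to `status` is propagated. Transforms run on a copy, so neither the staging version nor the stored object is modified by them.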
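Claims 3 and 4 describe three queues per data object. The sketch below reflects one reading of them, under stated assumptions: edits are opaque values, a staging edit's offset is the number of base edits recorded before it, and in the merged (third) queue a staging edit takes the slot it is offset to, with the base edit for that slot moving to the next slot. The names `merge_queues` and `EditQueues` are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def merge_queues(base_queue: List[Any],
                 staging_queue: List[Tuple[int, Any]]) -> List[Any]:
    """Merge the global base-edit queue with one user's staging queue.

    base_queue: base edits in time order; slot i holds the i-th base edit.
    staging_queue: (offset, edit) pairs, where offset is the number of base
    edits that preceded the staging edit.
    """
    # sorted() is stable, so the user's order is kept within a slot.
    pending = sorted(staging_queue, key=lambda item: item[0])
    merged: List[Any] = []
    j = 0
    for slot, base_edit in enumerate(base_queue):
        # Staging edits offset to this slot take priority...
        while j < len(pending) and pending[j][0] <= slot:
            merged.append(pending[j][1])
            j += 1
        # ...so the base edit for this slot is pushed to the next slot.
        merged.append(base_edit)
    # Staging edits made after the latest base edit go at the end.
    merged.extend(edit for _, edit in pending[j:])
    return merged


class EditQueues:
    """Hypothetical per-object queues: one global base queue, one staging queue per user."""

    def __init__(self) -> None:
        self.base: List[Any] = []                            # first queue
        self.staging: Dict[str, List[Tuple[int, Any]]] = {}  # second queue(s)

    def add_base_edit(self, edit: Any) -> None:
        self.base.append(edit)

    def add_staging_edit(self, user: str, edit: Any) -> None:
        # Offset the staging edit by the number of base edits seen so far.
        self.staging.setdefault(user, []).append((len(self.base), edit))

    def merged_view(self, user: str) -> List[Any]:
        # Third queue: user-specific; other users' staging edits are ignored.
        return merge_queues(self.base, self.staging.get(user, []))


q = EditQueues()
q.add_base_edit("b0")
q.add_staging_edit("alice", "s0")
q.add_base_edit("b1")
q.add_base_edit("b2")
q.add_staging_edit("alice", "s1")
print(q.merged_view("alice"))  # ['b0', 's0', 'b1', 'b2', 's1']
print(q.merged_view("bob"))    # ['b0', 'b1', 'b2']
```

Keeping offsets rather than timestamps means the merged view for any user can be rebuilt from the shared base queue and that user's own staging queue alone.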
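The user-scoped search of claims 5 to 8, together with the index document of claim 2 and the identifier field of claim 9, might be sketched as follows. This is a minimal in-memory model, not a search engine. It assumes the identifier convention `<object_id>/staging/<user>` for staging versions, a purely hypothetical choice, as are the names `IndexDocument`, `ObjectIndex`, `base_document` and `staging_document`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

STAGING_MARKER = "/staging/"  # hypothetical identifier convention for staging versions


@dataclass
class IndexDocument:
    object_id: str
    doc_id: str            # identifier field; staging ids contain STAGING_MARKER
    owner: Optional[str]   # None for the shared (base) data object
    content: Dict[str, Any] = field(default_factory=dict)


def base_document(object_id: str, content: Dict[str, Any]) -> IndexDocument:
    return IndexDocument(object_id, object_id, None, content)


def staging_document(object_id: str, user: str, content: Dict[str, Any]) -> IndexDocument:
    return IndexDocument(object_id, f"{object_id}{STAGING_MARKER}{user}", user, content)


def is_staging(doc: IndexDocument) -> bool:
    return STAGING_MARKER in doc.doc_id


class ObjectIndex:
    """Hypothetical index mapping each data object to its documents."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[IndexDocument]] = {}

    def add(self, doc: IndexDocument) -> None:
        # A staging version is added as a further document under the
        # entry already associated with the data object.
        self._entries.setdefault(doc.object_id, []).append(doc)

    def search(self, object_id: str, user: str) -> List[IndexDocument]:
        docs = self._entries.get(object_id, [])
        # Only the requesting user's staging versions are considered;
        # other users' staging versions are ignored.
        mine = [d for d in docs if is_staging(d) and d.owner == user]
        if mine:
            return mine
        # Negative determination: fall back to the shared data object.
        return [d for d in docs if not is_staging(d)]


index = ObjectIndex()
index.add(base_document("obj-1", {"rate": 10}))
index.add(staging_document("obj-1", "alice", {"rate": 12}))
print([d.doc_id for d in index.search("obj-1", "alice")])  # ['obj-1/staging/alice']
print([d.doc_id for d in index.search("obj-1", "bob")])    # ['obj-1']
```

Because the staging documents sit under the same entry as the base object, a single lookup serves every user, and the owner filter alone decides which version each user sees.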
Description
Field of the disclosure
The present disclosure relates to methods and systems for managing dataset edits in relation to datasets in a database, which may include resolution of editing conflicts. Example embodiments may also relate to the indexing of datasets, including datasets visible to multiple users of the database and one or more staging versions of datasets visible to one user or a subset of users.
Background
Cloud computing is a computing infrastructure for enabling ubiquitous access to shared pools of servers, storage, computer networks, applications and other data resources, which can be rapidly provisioned, often over a network such as the Internet. As used herein, a "data resource" may include any item of data or code (e.g., a data object representing an entity) that can be used by one or more computer programs. In example embodiments, data resources may be stored in one or more network databases and may be accessed by applications hosted by servers that share common access to the network database. A data resource may, for example, be a data analysis application, a data transformation application, a report generating application, a machine learning process, a spreadsheet or a database, or part of a spreadsheet or part of a database, e.g. records or data objects. Some companies provide cloud computing services that allow registered organizations, such as service providers, to create, store, manage and execute their own resources via a network. Users within the organization's domain, and other users outside of it, e.g., support administrators of the provider company, may perform one or more actions on one or more data resources, such as reading, authoring, editing, transforming, merging or executing. Sometimes, these resources may interact with other resources, for example, those provided by the cloud platform provider.
Certain data resources may be used to control external systems. In the context of editing datasets in databases, some database management systems (DMSs) require that the relevant dataset be retrieved, edited and then written back before another user can edit that dataset. This can be resource-intensive and time-consuming if the datasets are large or numerous. Other DMSs may allow users to edit datasets directly in the database without these stages, but this can lead to problems if the same dataset is being edited by two users at the same time and/or if one of the users introduces an edit that adversely affects other processes, e.g. the operation of a technical process, manufacturing task or security system that depends on the data being edited. US2007/0124317 discloses how to allow the results of a query to an operational datastore to be augmented with relevant data stored in a staging area.
Summary
The invention is set forth in the claims. A first aspect provides a method according to claim 1 of the appended claims. A second aspect provides a computer program according to claim 10 of the appended claims. A third aspect provides an apparatus according to claim 11 of the appended claims.
Brief Description of the Drawings
Example embodiments will now be described by way of non-limiting example with reference to the accompanying drawings, in which:
- FIG. 1 is a block diagram illustrating a network system comprising a group of application servers of a data processing platform according to some embodiments of this specification;
- FIG. 2 is a block diagram of a computer system according to embodiments of this specification;
- FIG. 3 is a representational view of part of a database, comprising a dataset;
- FIG. 4 is a block diagram of functional elements of part of the FIG. 1 network system, including a database application according to example embodiments;
- FIG. 5 is a schematic diagram of a data object and a plurality of example edits that may be made to the data object through the database application according to example embodiments;
- FIG. 6 is a schematic diagram of a tree structure, indicative of how the FIG. 5 example edits may be managed and stored by the database application according to example embodiments;
- FIG. 7 is a schematic view of how properties of base and workstate versions of the data object may change, responsive to the FIG. 5 edits;
- FIG. 8 is a schematic view representing the status of the data object and workstate subsequent to the edits mentioned with regard to FIG. 7;
- FIG. 9 is a block diagram showing functional elements of the database application according to example embodiments;
- FIG. 10 is a schematic view of queues employed by the database application according to example embodiments; and
- FIG. 11 is a flow diagram indicating processing operations performed by the database application according to example embodiments.
Detailed Description of Certain Embodiments
Embodiments herein relate to methods and systems for managing dataset edits in relation to datasets in a database.