CN-117255992-B - Efficient storage and querying of schema-less data
Abstract
A method (300) of storing semi-structured data (12U) includes receiving user data (12), including semi-structured user data, from a user (10) of a query system (150). The method includes receiving an indication that the semi-structured user data does not include a fixed schema (14). In response, the method further includes parsing the semi-structured user data into a plurality of data paths (210) and extracting a data type (220) associated with each respective data path of the plurality of data paths. The method additionally includes storing the semi-structured user data as row entries in a table (204) of a database in communication with the query system, wherein each column value associated with a row entry corresponds to a respective one of the plurality of data paths and the data type associated with the respective data path.
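The parsing step summarized in the abstract can be illustrated with a minimal sketch. It assumes the semi-structured input is JSON, that a data path is written as dot-joined keys, and that the data type is simply the Python type name of the leaf value. The names `extract_paths` and `to_row` are hypothetical and do not come from the patent.

```python
import json


def extract_paths(value, prefix=""):
    """Yield (data_path, type_name, leaf_value) for one parsed JSON value.

    Assumption for illustration: nested objects extend the path with
    dot-joined keys; arrays, scalars, null, and empty objects are leaves.
    """
    if isinstance(value, dict) and value:
        for key, child in value.items():
            child_path = f"{prefix}.{key}" if prefix else key
            yield from extract_paths(child, child_path)
    else:
        yield prefix, type(value).__name__, value


def to_row(json_text):
    """Build one row entry whose columns are keyed by (data path, data type)."""
    record = json.loads(json_text)
    return {(path, type_name): leaf
            for path, type_name, leaf in extract_paths(record)}


row = to_row('{"user": {"id": 7, "name": "ada"}, "ok": true}')
for (path, type_name), leaf in row.items():
    print(path, type_name, leaf)
```

Keying each column value by both the path and the extracted type is one possible reading of "each column value ... corresponds to a respective one of the plurality of data paths and the data type associated with the respective data path"; the patent text shown here does not prescribe this particular layout.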
Inventors
- Luis Alonso
- Vladislav Grachev
- Hussein Ahmadi
- Francis orchid
- Srinidi Lagawan
- Vinay Barasulbramaniam
- Oleksandr Breynuchenko
- Sinagash Susala
Assignees
- Google LLC (谷歌有限责任公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20220428
- Priority Date
- 20210505
Claims (18)
- 1. A computer-implemented method which, when executed by data processing hardware, causes the data processing hardware to perform operations comprising: receiving user data from a user of a query system, the user data comprising semi-structured user data; receiving an indication that the semi-structured user data does not include a fixed schema; and, responsive to the indication that the semi-structured user data does not include the fixed schema: parsing the semi-structured user data into a plurality of data paths; extracting a data type associated with each respective data path of the plurality of data paths; and storing the semi-structured user data as row entries in a table of a database in communication with the query system, wherein each column value associated with a row entry corresponds to a respective one of the plurality of data paths and the data type associated with the respective data path, wherein the user data further comprises structured user data having a corresponding fixed schema, and wherein the table of the database includes one or more row entries corresponding to the structured user data.
- 2. The method of claim 1, wherein the semi-structured user data comprises JavaScript Object Notation (JSON).
- 3. The method of claim 2, wherein the respective column associated with the row entry comprises an explicit null.
- 4. The method of claim 2, wherein the respective column values of the respective columns associated with the row entries comprise a null array.
- 5. The method of claim 1, wherein storing the semi-structured user data as the row entries in the table of the database further comprises: identifying a first data type and a second data type associated with a first data path among the plurality of data paths parsed from the semi-structured user data; storing a first value of the semi-structured user data corresponding to the first data type of the first data path in a first row entry of a first column of the table; and storing a second value of the semi-structured user data corresponding to the second data type of the first data path in a second row entry of the first column of the table, wherein the second data type is a different data type than the first data type.
- 6. The method of claim 1, wherein the operations further comprise, during query run-time: receiving a query from the user of the query system for data associated with the stored semi-structured user data; determining a corresponding data path of the stored semi-structured user data responsive to the query; and generating, in response to the query, a query response that includes the respective column values of the row entries corresponding to the respective data path of the stored semi-structured user data.
- 7. The method of claim 1, wherein the user data corresponds to log data of cloud computing resources associated with the user.
- 8. The method of claim 1, wherein each data path of the plurality of data paths corresponds to a key of a key-value pair of the semi-structured user data.
- 9. The method of any of claims 1-8, wherein the respective column values of the row entries comprise nested arrays.
- 10. A system, comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that, when executed on the data processing hardware, cause the data processing hardware to perform operations comprising: receiving user data from a user of a query system, the user data comprising semi-structured user data; receiving an indication that the semi-structured user data does not include a fixed schema; and, responsive to the indication that the semi-structured user data does not include the fixed schema: parsing the semi-structured user data into a plurality of data paths; extracting a data type associated with each respective data path of the plurality of data paths; and storing the semi-structured user data as row entries in a table of a database in communication with the query system, wherein each column value associated with a row entry corresponds to a respective one of the plurality of data paths and the data type associated with the respective data path, wherein the user data further comprises structured user data having a corresponding fixed schema, and wherein the table of the database includes one or more row entries corresponding to the structured user data.
- 11. The system of claim 10, wherein the semi-structured user data comprises JavaScript Object Notation (JSON).
- 12. The system of claim 11, wherein the respective column associated with the row entry comprises an explicit null.
- 13. The system of claim 11, wherein the respective column values of the respective columns associated with the row entries comprise a null array.
- 14. The system of claim 10, wherein storing the semi-structured user data as the row entries in the table of the database further comprises: identifying a first data type and a second data type associated with a first data path among the plurality of data paths parsed from the semi-structured user data; storing a first value of the semi-structured user data corresponding to the first data type of the first data path in a first row entry of a first column of the table; and storing a second value of the semi-structured user data corresponding to the second data type of the first data path in a second row entry of the first column of the table, wherein the second data type is a different data type than the first data type.
- 15. The system of claim 10, wherein the operations further comprise, during query run-time: receiving a query from the user of the query system for data associated with the stored semi-structured user data; determining a corresponding data path of the stored semi-structured user data responsive to the query; and generating, in response to the query, a query response that includes the respective column values of the row entries corresponding to the respective data path of the stored semi-structured user data.
- 16. The system of claim 10, wherein the user data corresponds to log data of cloud computing resources associated with the user.
- 17. The system of claim 10, wherein each data path of the plurality of data paths corresponds to a key of a key-value pair of the semi-structured user data.
- 18. The system of any of claims 10-17, wherein the respective column values of the row entries comprise nested arrays.
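The dependent claims on mixed data types within one column, explicit nulls, and path-based querying can be made concrete with a small in-memory sketch. It assumes one column per data path, with each cell carrying its own extracted type, and a sentinel for paths that a row does not contain. `PathTable`, `MISSING`, and `_flatten` are hypothetical names introduced only for this illustration and are not the patent's storage format.

```python
import json

MISSING = object()  # hypothetical sentinel: this row has no value at the path


def _flatten(value, prefix=""):
    """Yield (data_path, (type_name, value)) pairs; nested keys are dot-joined."""
    if isinstance(value, dict) and value:
        for key, child in value.items():
            yield from _flatten(child, f"{prefix}.{key}" if prefix else key)
    else:
        yield prefix, (type(value).__name__, value)


class PathTable:
    """Toy table: one column per data path, one cell per row entry."""

    def __init__(self):
        self.columns = {}   # data path -> list of cells, one per row
        self.row_count = 0

    def insert(self, json_text):
        flat = dict(_flatten(json.loads(json_text)))
        # A path seen for the first time gets MISSING for all earlier rows.
        for path in flat:
            self.columns.setdefault(path, [MISSING] * self.row_count)
        for path, cells in self.columns.items():
            cells.append(flat.get(path, MISSING))
        self.row_count += 1

    def query(self, path):
        """Return the (type_name, value) cells stored under one data path."""
        return [cell for cell in self.columns.get(path, []) if cell is not MISSING]


table = PathTable()
table.insert('{"a": 1, "b": null}')      # "a" is an integer, "b" an explicit null
table.insert('{"a": "one", "c": []}')    # "a" is now a string, "c" an empty array
print(table.query("a"))
print(table.query("b"))
```

In this sketch an explicit JSON null is stored as a real cell, distinct from a path that is simply absent from a row, and the two values of `a` share one column despite having different types, which mirrors the first-row and second-row wording of claims 5 and 14.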
Description
Efficient storage and querying of schema-less data
Technical Field
The present disclosure relates to efficient storage and querying of schema-less data.
Background
As today's applications generate large amounts of data, query systems and other analysis tools continue to evolve to support data analysis. If a user is unable to analyze his or her data in an efficient and/or cost-effective manner, the value of generating large amounts of user data may be significantly reduced. To ensure that users are able to analyze their data, query processing systems have begun to operate in conjunction with data storage systems. Through this collaboration, user data may be stored in a storage structure that facilitates analysis operations, such as queries or other analyses. Unfortunately, data storage structures that facilitate these operations are often limited in their ability to support semi-structured or schema-less data.
Disclosure of Invention
One aspect of the present disclosure provides a computer-implemented method of storing semi-structured data. The method, when executed by data processing hardware, causes the data processing hardware to perform operations. The operations include receiving user data from a user of a query system, wherein the user data includes semi-structured user data. The operations also include receiving an indication that the semi-structured user data does not include a fixed schema. In response to the indication that the semi-structured user data does not include the fixed schema, the operations further include parsing the semi-structured user data into a plurality of data paths and extracting a data type associated with each respective data path of the plurality of data paths.
The operations additionally include storing the semi-structured user data as row entries in a table of a database in communication with the query system, wherein each column value associated with a row entry corresponds to a respective one of the plurality of data paths and the data type associated with the respective data path.
Another aspect of the present disclosure provides a system for storing semi-structured data. The system includes data processing hardware and memory hardware in communication with the data processing hardware. The memory hardware stores instructions that, when executed on the data processing hardware, cause the data processing hardware to perform operations. The operations include receiving user data from a user of a query system, wherein the user data includes semi-structured user data. The operations also include receiving an indication that the semi-structured user data does not include a fixed schema. In response to the indication that the semi-structured user data does not include the fixed schema, the operations further include parsing the semi-structured user data into a plurality of data paths and extracting a data type associated with each respective data path of the plurality of data paths. The operations additionally include storing the semi-structured user data as row entries in a table of a database in communication with the query system, wherein each column value associated with a row entry corresponds to a respective one of the plurality of data paths and the data type associated with the respective data path.
Embodiments of the methods or systems of the present disclosure may include one or more of the following optional features. In some implementations, the user data further includes structured user data having a corresponding fixed schema, and the table of the database includes one or more row entries corresponding to the structured user data. In some examples, the semi-structured user data includes JavaScript Object Notation (JSON).
In these examples, the respective column associated with the row entry may include an explicit null. Also in these examples, the respective column values of the respective columns associated with the row entries may include a null array. In some configurations, storing the semi-structured user data as row entries in the table of the database further includes identifying a first data type and a second data type associated with a first data path among the plurality of data paths parsed from the semi-structured user data. In these configurations, storing the semi-structured user data as row entries in the table of the database further includes storing a first value of the semi-structured user data, corresponding to the first data type of the first data path, in a first row entry of a first column of the table, and storing a second value of the semi-structured user data, corresponding to the second data type of the first data path, in a second row entry of the first column of the table. Here, the second data type is a different data type from the first data type. In some examples, the operations of the method or system further include, during query run-time, receiving a query from the user of the query system for data associated with the stored semi-structured user data, determining a respective data path of the stored semi-structure