US-12619403-B2 - Regular expression processor
Abstract
Apparatuses, systems, and techniques to process regular expressions. In at least one embodiment, a regular expression that includes at least one zero-length assert is refactored and processed as an equivalent plurality of regular expressions that do not contain zero-length asserts.
Inventors
- John Hurley
- Xiaolin Cao
- Rafiullah Khan
Assignees
- NVIDIA CORPORATION
Dates
- Publication Date
- 20260505
- Application Date
- 20220411
Claims (20)
- 1 . A non-transitory machine-readable medium having stored thereon a set of instructions, which if performed by one or more processors, cause the one or more processors to at least: obtain source code describing a first regular expression with a zero-length assert; determine a second regular expression that represents a condition associated with the zero-length assert; determine a third regular expression based at least in part on the first regular expression without the zero-length assert; generate processor-executable instructions to evaluate the second regular expression and the third regular expression, wherein the processor-executable instructions comprise an order of execution to perform the second regular expression and the third regular expression to generate a result corresponding to the first regular expression; modify the source code to replace the first regular expression with the zero-length assert with the instructions generated to evaluate the second regular expression and the third regular expression; and compile the modified source code without the first regular expression with the zero-length assert.
- 2 . The non-transitory machine-readable medium of claim 1 , wherein: the zero-length assert is a look-behind assert; and the third regular expression includes a portion of a match pattern of the zero-length assert followed by a portion of the first regular expression that follows the zero-length assert.
- 3 . The non-transitory machine-readable medium of claim 1 , wherein: the zero-length assert is a look-ahead assert; and the third regular expression includes a portion of the first regular expression that occurs before the zero-length assert followed by a portion of a match pattern of the zero-length assert.
- 4 . The non-transitory machine-readable medium of claim 1 , wherein the order of execution indicated by the processor-executable instructions to evaluate the second regular expression and the third regular expression comprises logic that causes the second regular expression and the third regular expression to be evaluated in parallel.
- 5 . The non-transitory machine-readable medium of claim 1 , wherein the processor-executable instructions to evaluate the second regular expression and the third regular expression allow the second regular expression and the third regular expression to be evaluated for a character sequence without retrieving the character sequence more than once.
- 6 . The non-transitory machine-readable medium of claim 1 , wherein the order of execution indicated by the processor-executable instructions further comprises a set of Boolean logic usable to combine a first result produced by evaluating the second regular expression and a second result produced by evaluating the third regular expression.
- 7 . The non-transitory machine-readable medium of claim 1 , wherein the source code is text data stored in a file on computer-readable media.
- 8 . The non-transitory machine-readable medium of claim 1 , wherein: the first regular expression includes a plurality of zero-length asserts; and the set of instructions further cause the one or more processors to determine and generate one or more fourth regular expressions that does do not include an assert for each zero-length assert of the plurality of asserts, wherein the processor-executable instructions further comprise additional logic indicating an order of execution to perform the one or more fourth regular expressions and generate the result corresponding to the first regular expression.
- 9 . A computer-implemented method comprising: obtaining source code describing a first regular expression with a zero-length assert; obtaining a first result of evaluating a second regular expression and a second result of evaluating a third regular expression, the second regular expression and the third regular expression generated from a first regular expression that includes a zero-length assert; combining the first result and the second result to produce a third result for the third regular expression, the combination based at least in part on processor-executable instructions that indicate an order of execution to perform the second regular expression and the third regular expression to generate a result corresponding to the first regular expression that represents a value associated with the zero-length assert; and modifying the source code to replace the first regular expression that includes the zero-length assert with the instructions to obtain the first result and the second result and to combine the first result and result to produce the third result; and compiling the modified source code without the first regular expression that includes the zero-length assert.
- 10 . The computer-implemented method of claim 9 , wherein: the zero-length assert is a look-behind assert; and the third regular expression includes a portion of a match pattern of the zero-length assert followed by a portion of the first regular expression that follows the zero-length assert.
- 11 . The computer-implemented method of claim 9 , wherein: the zero-length assert is a look-ahead assert; and the third regular expression includes a portion of the first regular expression that occurs before the zero-length assert followed by a portion of a match pattern of the zero-length assert.
- 12 . The computer-implemented method of claim 9 , wherein the processor-executable instructions indicate the order of execution comprises the first regular expression and the second regular expression being evaluated in parallel.
- 13 . The computer-implemented method of claim 9 , wherein the processor-executable instructions indicate the third regular expression and the second regular expression are evaluated for a character sequence without reading the character sequence more than once.
- 14 . The computer-implemented method of claim 9 , wherein the first result and the second result are binary values that are combined using logic indicated by the processor-executable instructions to produce the third result.
- 15 . The computer-implemented method of claim 9 , wherein the second regular expression and the third regular expression are evaluated using source data obtained over a computer network.
- 16 . The computer-implemented method of claim 9 , further comprising evaluating one or more fourth regular expressions for each zero-length assert in the first regular expression.
- 17 . A system comprising one or more circuits to: obtain source code describing a first regular expression with a zero-length assert; identify a second regular expression that represents a condition associated with the zero-length assert; identify a third regular expression based at least in part on the first regular expression without the zero-length assert; evaluate the second regular expression to produce a first result; at least partly in parallel with the evaluation of the second regular expression, evaluate the third regular expression to produce a second result; and use the first result and the second result to determine a value of the first regular expression, according to processor-executable instructions that indicate an order of execution to perform the second regular expression and the third regular expression and combine the first result and the second result to generate the value corresponding to the first regular expression; modify the source code to replace the first regular expression with the zero-length assert with the instructions to obtain the first result and the second result and to combine the first result and result to produce the third result; and compile the modified source code without the first regular expression with the zero-length assert.
- 18 . The system of claim 17 , wherein the system provides the first result and the second result in association with an identifier of the first regular expression.
- 19 . The system of claim 17 , wherein: the zero-length assert is a look-behind assert; and the third regular expression includes a portion of a match pattern of the zero-length assert followed by a portion of the third regular expression that follows the zero-length assert.
- 20 . The system of claim 17 , wherein: the zero-length assert is a look-ahead assert; and the third regular expression includes a portion of the first regular expression that occurs before the zero-length assert followed by a portion of a match pattern of the zero-length assert.
Description
FIELD At least one embodiment pertains to processing resources used to evaluate a regular expression. For example, at least one embodiment pertains to processors or computing systems used to evaluate a regular expression that includes one or more zero-length assertions. BACKGROUND Regular expressions are a useful tool used by programmers, system administrators, and Information Technology professionals to configure devices and process data. Some applications process large amounts of data with complex expressions, requiring substantial computing resources. In response, various hardware acceleration techniques have been created to speed the processing of regular expressions. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 illustrates an example of a regular expression with a zero-length assert element, in accordance with at least one embodiment; FIG. 2 illustrates an example of a system that processes a regular expression, in accordance with at least one embodiment; FIG. 3 illustrates an example of a system that processes a regular expression with one or more zero-length asserts, in accordance with at least one embodiment; FIG. 4 illustrates an example of generating a number of regular expressions from a regular expression with two zero-length assert elements, in accordance with at least one embodiment; FIG. 5 illustrates an example of a process that, as a result of being performed by a computer system, evaluates a regular expression that includes one or more zero-length assertions, in accordance with at least one embodiment; FIG. 6 illustrates an exemplary data center, in accordance with at least one embodiment; FIG. 7 illustrates a processing system, in accordance with at least one embodiment; FIG. 8 illustrates a computer system, in accordance with at least one embodiment; FIG. 9 illustrates a system, in accordance with at least one embodiment; FIG. 10 illustrates an exemplary integrated circuit, in accordance with at least one embodiment; FIG. 11 illustrates a computing system, according to at least one embodiment; FIG. 12 illustrates an APU, in accordance with at least one embodiment; FIG. 13 illustrates a CPU, in accordance with at least one embodiment; FIG. 14 illustrates an exemplary accelerator integration slice, in accordance with at least one embodiment; FIGS. 15A-15B illustrate exemplary graphics processors, in accordance with at least one embodiment; FIG. 16A illustrates a graphics core, in accordance with at least one embodiment; FIG. 16B illustrates a GPGPU, in accordance with at least one embodiment; FIG. 17A illustrates a parallel processor, in accordance with at least one embodiment; FIG. 17B illustrates a processing cluster, in accordance with at least one embodiment; FIG. 17C illustrates a graphics multiprocessor, in accordance with at least one embodiment; FIG. 18 illustrates a graphics processor, in accordance with at least one embodiment; FIG. 19 illustrates a processor, in accordance with at least one embodiment; FIG. 20 illustrates a processor, in accordance with at least one embodiment; FIG. 21 illustrates a graphics processor core, in accordance with at least one embodiment; FIG. 22 illustrates a PPU, in accordance with at least one embodiment; FIG. 23 illustrates a GPC, in accordance with at least one embodiment; FIG. 24 illustrates a streaming multiprocessor, in accordance with at least one embodiment; FIG. 25 illustrates a software stack of a programming platform, in accordance with at least one embodiment; FIG. 26 illustrates a CUDA implementation of a software stack of FIG. 25, in accordance with at least one embodiment; FIG. 27 illustrates a ROCm implementation of a software stack of FIG. 25, in accordance with at least one embodiment; FIG. 28 illustrates an OpenCL implementation of a software stack of FIG. 25, in accordance with at least one embodiment; FIG. 29 illustrates software that is supported by a programming platform, in accordance with at least one embodiment; FIG. 30 illustrates compiling code to execute on programming platforms of FIGS. 25-28, in accordance with at least one embodiment; FIG. 31 illustrates in greater detail compiling code to execute on programming platforms of FIGS. 25-28, in accordance with at least one embodiment; FIG. 32 illustrates translating source code prior to compiling source code, in accordance with at least one embodiment; FIG. 33A illustrates a system configured to compile and execute CUDA source code using different types of processing units, in accordance with at least one embodiment; FIG. 33B illustrates a system configured to compile and execute CUDA source code of FIG. 33A using a CPU and a CUDA-enabled GPU, in accordance with at least one embodiment; FIG. 33C illustrates a system configured to compile and execute CUDA source code of FIG. 33A using a CPU and a non-CUDA-enabled GPU, in accordance with at least one embodiment; FIG. 34 illustrates an exemplary kernel translated by CUDA-to-HIP translation tool of FIG. 33C, in accordance with a