US-20260127187-A1 - TIME EFFICIENT DECODING OF SERIES-VARIANT DATA SEQUENCE
Abstract
A time-efficient data sequence decode process is provided. The process includes decoding, by a decoder, a series-variant data sequence, where the decoding includes storing, by the decoder in intermediate storage, received timestep values of the series-variant data sequence, and lookahead data processing, by the decoder, a specified number of future timestep values of the series-variant data sequence from a particular timestep. The lookahead data processing for the particular timestep proceeds based on the specified number of future timestep values from the particular timestep being stored by the decoder in the intermediate storage.
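The buffering condition described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the function name `lookahead_windows`, the use of a `deque` as the intermediate storage, and the handling of the end of the sequence (flushing with shorter windows) are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple


def lookahead_windows(
    values: Iterable[float], lookahead: int
) -> Iterator[Tuple[int, float, List[float]]]:
    """Yield (timestep, value, future_values) for each timestep.

    A timestep is released only once `lookahead` future values have been
    stored, mirroring the condition stated in the abstract.
    """
    # Hypothetical intermediate storage: never holds more than lookahead + 1 values.
    buffer: Deque[float] = deque()
    t = 0  # timestep index of the value at the front of the buffer
    for value in values:
        buffer.append(value)
        if len(buffer) == lookahead + 1:
            current = buffer.popleft()
            # The buffer now holds exactly the `lookahead` future values of timestep t.
            yield t, current, list(buffer)
            t += 1
    # Assumption: at the end of the sequence the remaining timesteps are
    # flushed with whatever (shorter) set of future values is available.
    while buffer:
        current = buffer.popleft()
        yield t, current, list(buffer)
        t += 1
```

With `lookahead=2`, the first window becomes available only after the third value has been received, which illustrates why both the storage size and the time-to-first-output grow with the specified number of future timestep values.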
Inventors
- William Andrew Simon
- Irem Boybat Kara
- Elena Ferro
- Riselda KODRA
Assignees
- INTERNATIONAL BUSINESS MACHINES CORPORATION
Dates
- Publication Date: 20260507
- Application Date: 20241106
Claims (20)
- 1. A method of enhancing data processing, the method comprising: decoding, by a decoder, a series-variant data sequence, the decoding comprising: storing, by the decoder in intermediate storage, received timestep values of the series-variant data sequence; and lookahead data processing, by the decoder, a specified number of future timestep values of the series-variant data sequence from a particular timestep, wherein the lookahead data processing for the particular timestep proceeds based on the specified number of future timestep values from the particular timestep being stored by the decoder in the intermediate storage.
- 2. The method of claim 1, wherein the specified number of future timestep values of the series-variant data sequence being processed by the lookahead data processing of the decoder is less than all future timestep values of the series-variant data sequence from the particular timestep.
- 3. The method of claim 1, wherein the decoding, by the decoder, further comprises lookbehind data processing prior timestep values from the particular timestep to produce a lookbehind data processing result for the particular timestep, and storing, by the decoder in the intermediate storage, the lookbehind data processing result.
- 4. The method of claim 3, wherein the decoding further comprises outputting, by the decoder, a stream of discrete output values based, in part, on the lookbehind data processing result and a lookahead data processing result produced by the lookahead data processing, wherein a time-to-first output from the decoder for the series-variant data sequence is dependent on a size of the specified number of future timestep values to be processed by the lookahead data processing of the decoder.
- 5. The method of claim 1, wherein a size of the intermediate storage is dependent on a size of the specified number of future timestep values to be processed by the lookahead data processing of the decoder.
- 6. The method of claim 1, wherein timestep values of the series-variant data sequence are obtained by the decoder sequentially, and wherein the storing comprises storing the timestep values sequentially in the intermediate storage for access by the lookahead data processing.
- 7. The method of claim 6, wherein the lookahead data processing comprises lookahead data processing the specified number of future timestep values from the particular timestep in descending timestep order.
- 8. The method of claim 1, wherein the specified number of future timestep values being processed by the lookahead data processing is a number specified for a particular decode process implemented in the decoding of the series-variant data sequence.
- 9. The method of claim 8, wherein the particular decode process comprises a Conditional Random Fields (CRF) process.
- 10. A computer program product comprising: a set of one or more computing readable storage media; and program instructions, collectively stored in the set of one or more storage media, for causing at least one processor to perform operations comprising: decoding, by a decoder, a series-variant data sequence, the decoding comprising: storing, by the decoder in intermediate storage, timestep values of the series-variant data sequence; and lookahead data processing, by the decoder, a specified number of future timestep values of the series-variant data sequence from a particular timestep, wherein the lookahead data processing for the particular timestep proceeds based on the specified number of future timestep values from the particular timestep being stored by the decoder in the intermediate storage.
- 11. The computer program product of claim 10, wherein the specified number of future timestep values of the series-variant data sequence being processed by the lookahead data processing of the decoder is less than all future timestep values of the series-variant data sequence from the particular timestep.
- 12. The computer program product of claim 10, wherein the decoding, by the decoder, further comprises lookbehind data processing prior timestep values from the particular timestep to produce a lookbehind data processing result, and storing, by the decoder in the intermediate storage, the lookbehind data processing result.
- 13. The computer program product of claim 11, wherein the decoding further comprises outputting, by the decoder, a stream of discrete output values based, in part, on the lookbehind data processing result and a lookahead data processing result produced by the lookahead data processing, wherein a time-to-first output from the decoder based on the series-variant data sequence is dependent on a size of the specified number of future timestep values to be processed by the lookahead data processing of the decoder.
- 14. The computer program product of claim 10, wherein the size of the intermediate storage is dependent on a size of the specified number of future timestep values to be processed by the lookahead data processing of the decoder.
- 15. The computer program product of claim 10, wherein timestep values of the series-variant data sequence are obtained by the decoder sequentially, and wherein the storing comprises storing the timestep values sequentially in the intermediate storage for access by the lookahead data processing.
- 16. The computer program product of claim 14, wherein the lookahead data processing comprises lookahead data processing the specified number of future timestep values from the particular timestep in descending timestep order.
- 17. A system comprising: at least one processor set; a set of one or more computing-readable storage media; and program instructions, collectively stored in the set of one or more storage media, for causing the at least one processor set to perform operations comprising: decoding, by a decoder, a series-variant data sequence, the decoding comprising: storing, by the decoder in intermediate storage, received timestep values of the series-variant data sequence; and lookahead data processing, by the decoder, a specified number of future timestep values of the series-variant data sequence from a particular timestep, wherein the lookahead data processing for the particular timestep proceeds based on the specified number of future timestep values from the particular timestep being stored by the decoder in the intermediate storage.
- 18. The system of claim 17, wherein the specified number of future timestep values of the series-variant data sequence being processed by the lookahead data processing of the decoder is less than all future timestep values of the series-variant data sequence from the particular timestep, and wherein the decoding, by the decoder, further comprises: lookbehind data processing prior timestep values from the particular timestep to produce a lookbehind data processing result for the particular timestep, and storing, by the decoder in the intermediate storage, the lookbehind data processing result; and outputting, by the decoder, a stream of discrete output values based, in part, on the lookbehind data processing result and a lookahead data processing result produced by the lookahead data processing, wherein a time-to-first output from the decoder for the series-variant data sequence is dependent on a size of the specified number of future timestep values to be processed by the lookahead data processing of the decoder.
- 19. The system of claim 17, wherein a size of the intermediate storage is dependent on a size of the specified number of future timestep values to be processed by the lookahead data processing of the decoder.
- 20. The system of claim 17, wherein: timestep values of the series-variant data sequence are obtained by the decoder sequentially; the storing comprises storing the timestep values sequentially in the intermediate storage for access by the lookahead data processing; and the lookahead data processing comprises lookahead data processing the specified number of future timestep values from the particular timestep in descending timestep order.
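The interplay of lookbehind processing, bounded lookahead processing in descending timestep order, and the resulting discrete outputs recited in the claims can be sketched in Python as follows. This is a hedged illustration, not the claimed decoder: it assumes a simple linear-chain model with per-timestep state scores (`emit`) and a state transition score matrix (`trans`), uses max-sum recursions, and runs over an already complete list rather than a live stream. The function name `fixed_lookahead_decode` and all parameter names are hypothetical.

```python
from typing import List, Sequence


def fixed_lookahead_decode(
    emit: Sequence[Sequence[float]],
    trans: Sequence[Sequence[float]],
    lookahead: int,
) -> List[int]:
    """Pick one state per timestep using all prior timesteps (lookbehind)
    and at most `lookahead` future timesteps (lookahead).

    emit[t][s]  : assumed score of state s at timestep t
    trans[p][s] : assumed score of moving from state p to state s
    """
    n_states = len(trans)
    states = range(n_states)
    outputs: List[int] = []
    alpha: List[float] = []  # lookbehind result, carried from timestep to timestep
    for t in range(len(emit)):
        # Lookbehind: best score of any path ending in state s at timestep t.
        if t == 0:
            alpha = list(emit[0])
        else:
            alpha = [
                emit[t][s] + max(alpha[p] + trans[p][s] for p in states)
                for s in states
            ]
        # Lookahead: walk the future window in descending timestep order,
        # from timestep t + lookahead (or the end of the data) down to t + 1.
        last = min(t + lookahead, len(emit) - 1)
        beta = [0.0] * n_states
        for k in range(last, t, -1):
            beta = [
                max(trans[s][n] + emit[k][n] + beta[n] for n in states)
                for s in states
            ]
        # Combine both results and emit a discrete output for timestep t.
        total = [alpha[s] + beta[s] for s in states]
        outputs.append(max(states, key=total.__getitem__))
    return outputs
```

For example, with `emit = [[1.0, 0.9], [0.0, 0.0], [0.0, 5.0]]` and `trans = [[0.0, -10.0], [-10.0, 0.0]]`, a lookahead of 0 yields `[0, 0, 1]`, while a lookahead of 2 yields `[1, 1, 1]`, because the strong evidence at the last timestep becomes visible to the earlier decisions. In a streaming setting, the output for timestep t could only be produced once timestep t + lookahead has been stored.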
Description
BACKGROUND
One or more aspects relate, in general, to facilitating data processing, and more particularly, to improving decode processing of series-variant data, such as series-variant datastream sequences.
Series-variant signals, or datastreams, occur in a variety of data processing applications today, such as in the domain of natural language processing (e.g., processing of audio data, handwriting data, etc.), as well as medical processing (e.g., genome sequencing, electrocardiogram processing, etc.). As part of the data processing, the series-variant signal is often converted, or decoded, into discrete units.
Processing of a series-variant datastream can take various forms, such as a Conditional Random Fields (CRF) process, which in embodiments can be paired with Connectionist Temporal Classification (CTC) to predict, for instance, when a data sequence transitions between characters or remains on a current character. CRFs are often applied to part-of-speech tagging and named-entity recognition, while CTC is useful in domains involving transforming a time-variant or analog signal to a series of classifications, such as audio or handwriting classification, or genome sequencing. CRF-CTC decoders have recently been applied to genome sequencing to good effect. In practice, Conditional Random Fields (CRF) decoding with higher state lengths improves the decoder's ability to differentiate between states, particularly when a single timestep can be composed of multiple tokens (e.g., characters), and Connectionist Temporal Classification (CTC) processing facilitates the decoder's ability to differentiate between successive timesteps that refer to the same token.
SUMMARY
Certain shortcomings of the prior art are overcome, and additional advantages are provided herein through the provision of a method of enhancing data processing. The method includes decoding, by a decoder, a series-variant data sequence.
The decoding includes storing, by the decoder in intermediate storage, received timestep values of the series-variant data sequence. Further, the decoding includes lookahead data processing, by the decoder, a specified number of future timestep values of the series-variant data sequence from a particular timestep, where the lookahead data processing for the particular timestep proceeds based on the specified number of future timestep values from the particular timestep being stored by the decoder in the intermediate storage.
Computer program products and systems relating to one or more aspects are also described and claimed herein. Further, services relating to one or more aspects are also described and may be claimed herein.
Additional features and advantages are realized through the techniques described herein. Other embodiments and aspects are described in detail herein and are considered a part of the disclosed inventive aspects.
BRIEF DESCRIPTION OF THE DRAWINGS
One or more aspects are particularly pointed out and distinctly claimed as examples in the claims at the conclusion of the specification. The foregoing and other objects, features, and advantages of one or more aspects are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 depicts one example of a computing environment to include and/or use one or more aspects of the present disclosure;
FIG. 2 depicts one embodiment of a program product with decode code, in accordance with one or more aspects of the present disclosure;
FIG. 3 depicts one embodiment of a decode process, in accordance with one or more aspects of the present disclosure;
FIG. 4 depicts one example of a portion of a series-variant data sequence to be decoded into discrete text, in accordance with one or more aspects of the present disclosure;
FIG. 5 depicts one embodiment of a Conditional Random Fields (CRF) process, a variation of which can be implemented by decode code or a decode process, in accordance with one or more aspects of the present disclosure;
FIG. 6 depicts a schematic of one embodiment of a decoder, in accordance with one or more aspects of the present disclosure;
FIG. 7 depicts a simplified state transition step as a potential input timestep value to a decoder, such as the decoder of FIG. 6, in accordance with one or more aspects of the present disclosure;
FIG. 8A is a block diagram of a lookbehind data process of a decoder, such as the decoder of FIG. 6, in accordance with one or more aspects of the present disclosure;
FIG. 8B depicts further details of one embodiment of the lookbehind data process of FIG. 8A, in accordance with one or more aspects of the present disclosure;
FIG. 9A depicts a block diagram of one embodiment of intermediate storage of a decoder, such as the decoder of FIG. 6, in accordance with one or more aspects of the present disclosure;
FIG. 9B depicts further details of one embodiment of intermediate storage of a decoder with (in part) received input timestep values for forwarding to (or retrieval by) a l