CN-122021911-A - Manifold geometry-based large language model semantic data processing method, manifold geometry-based large language model semantic data processing device, storage medium and application
Abstract
The invention provides a manifold geometry-based large language model semantic data processing method, a corresponding device, a storage medium and an application. The method comprises the steps of: receiving, by a processor, discrete symbol signals of an input text sequence; lifting the real-valued features to manifold-space feature signals through low-rank projection; constructing evolution operators in the manifold space through the combined action of semantic depth parameters and direction parameters; performing nonlinear spatial transformations, comprising radial scaling and angular rotation, on the feature signals; introducing a numerical-stability protection mechanism into the evolution computation to suppress floating-point overflow and division-by-zero exceptions near singular points; and finally fusing the transformed features back into the data stream of the backbone neural network by residual superposition. The invention also constructs an inter-layer recursive position-coding data stream to relieve the timing-dependency deadlock of same-layer computation. By exploiting the exponential capacity of non-Euclidean space, the accuracy of the model on hierarchical logic and long-chain reasoning is significantly improved while memory overhead is reduced.
Inventors
- JI YANDA
- ZHANG XIXIN
- GENG HAO
- YANG HAO
Assignees
- Nanjing University of Aeronautics and Astronautics (南京航空航天大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-02-03
Claims (15)
- 1. A manifold geometry-based large language model semantic data processing method for adapting to any existing large language model, characterized by comprising the following steps: a feature reception step, in which a processor receives a discrete symbol signal of an input text sequence and converts it into a real-valued eigenvector in Euclidean space; a feature lifting step, in which a tensor transformation module is called to project the real-valued feature vector into a feature signal on a manifold space; a hierarchy prediction step, in which a hierarchy prediction network generates a corresponding semantic depth parameter from the current real-valued feature vector; a geometric evolution step, in which the processor executes manifold-based linear or nonlinear transformation logic, builds an evolution operator from the combined action of the semantic depth parameter and learnable direction parameters, and performs a spatial transformation comprising radial scaling and angular rotation on the feature signal to generate an evolution matrix; a numerical stability protection step, in which, in neighborhoods of singular points or where the modulus is smaller than a preset threshold, the exponential mapping or hyperbolic function computation of the evolution operator is protected so as to suppress division-by-zero exceptions or floating-point overflow; and a projection fusion step, in which the evolved feature signal is mapped back to a real-space feature vector and merged into the data flow of the backbone neural network of the large language model by residual superposition, generating target feature data containing hierarchical geometric constraints.
- 2. The manifold geometry-based large language model semantic data processing method according to claim 1, wherein the generation of the evolution matrix in the geometric evolution step comprises: the processor computes the modulus of the complex angle parameter formed by the rotation operator and the boost operator; judges whether the modulus is smaller than a preset machine-precision threshold; if it is smaller than the threshold, calls preset Taylor-expansion approximation logic or normalization logic to replace the division involving the hyperbolic sine function, so as to suppress division-by-zero overflow in floating-point arithmetic; and if it is not smaller than the threshold, performs the standard exponential mapping to generate the evolution matrix.
- 3. The manifold geometry-based large language model semantic data processing method according to claim 1, further comprising a hierarchy-aware recursive position-coding step, specifically comprising: before the N-th layer of the neural network executes its attention computation, reading the posterior semantic depth parameter output by layer N-1; taking the posterior semantic depth parameter of layer N-1 as the prior hierarchy parameter of layer N; constructing a hyperbolic position-coding matrix from the prior hierarchy parameter and applying a coordinate transformation to the query and key vectors of layer N; thereby forming, along the depth direction of the neural network, a recursively updated data stream of semantic hierarchy information and relieving the computational timing deadlock within a single layer.
- 4. A manifold geometry-based large language model semantic data processing method according to claim 1 or 3, wherein the generation of the semantic depth parameters is regulated by a hyperbolic entailment geometry topology constraint mechanism: in the model training stage, computing the hyperbolic distance difference between two semantic objects having a logical entailment relation; and forcibly constraining, through the loss function, the modulus of the semantic depth parameter of the entailing (more specific) object to be larger than that of the entailed (more general) object, so that the feature signals of abstract concepts converge toward the origin of the geometric space.
- 5. The manifold geometry-based large language model semantic data processing method according to claim 1, wherein the manifold used in the geometric evolution step is the special linear group manifold, or an equivalent representation thereof over the complex domain; the feature signal is representable as a complex vector, and the spatial transformation comprises a radial scaling transformation and an angular rotation transformation.
- 6. The method of claim 1, wherein the geometric evolution step further comprises a real-domain approximation mode that, when hardware computing resources are limited or efficient complex arithmetic is unsupported, degenerates the feature signal into a real vector and replaces the complex-domain manifold transform with a restricted Lorentz transform or equivalent radial-transform logic.
- 7. The manifold geometry-based large language model semantic data processing method according to claim 1, wherein, in the feature lifting step, the high-dimensional real feature vector is compressed and mapped into a low-dimensional manifold-space feature signal through a low-rank linear projection, and the manifold-space feature signal is restored to the original dimension after the geometric evolution step is performed.
- 8. A semantic data processing computing device, comprising: a memory for storing computer program instructions and structured semantic data; and a processor electrically connected to the memory and configured with a tensor processing unit (TPU) or a graphics processing unit (GPU); the processor being configured to perform the steps of the manifold geometry-based large language model semantic data processing method according to any one of claims 1 to 7.
- 9. The semantic data processing computing device of claim 8, wherein the processor integrates or is configured with a geometric-logic adapter execution unit, which may be: a physical hardware circuit module, including but not limited to an ASIC, an FPGA, or a special-purpose acceleration unit; software execution logic implemented by microcode, an instruction set, or an operator-fusion kernel; or a general-purpose GPU/TPU compute unit configured to execute the geometric logic; the geometric-logic adapter execution unit being configured to run in parallel with, or as a plug-in independent of, the attention mechanism of the backbone network, for executing the feature lifting step, the hierarchy prediction step, the geometric evolution step and the projection fusion step.
- 10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the manifold geometry-based large language model semantic data processing method according to any one of claims 1 to 7.
- 11. Use of the manifold geometry-based large language model semantic data processing method according to any one of claims 1 to 7 in large language model inference, characterized in that: the method is used to suppress logic errors caused by directional confusion of semantic relations during text generation or semantic reasoning, the suppression being realized by the following technical means: constraining the hierarchical position of a semantic object in the manifold space by the semantic depth parameter, and distinguishing different semantic relation types by the direction parameters; and fusing the feature signals containing the hierarchy constraint and the direction constraint back into the backbone neural network in residual form, so that the probability of relation-inversion errors is reduced without changing the backbone attention computation mechanism.
- 12. The use according to claim 11, wherein the method suppresses logical hallucinations by jointly constraining the radial position and angular distribution of semantic objects in the non-Euclidean manifold space, enabling the model to geometrically distinguish containment relations from mere correlation relations.
- 13. Use of the manifold geometry-based large language model semantic data processing method according to any one of claims 1 to 7 in hierarchical knowledge reasoning, characterized in that: the method is used to process semantic objects having parent-child hierarchical relations or partial-order relations, the hierarchical relations being expressed through the ordered distribution of semantic depth parameters in the manifold space, so as to improve the consistency of long-chain logical reasoning.
- 14. The use according to claim 13, wherein the method uses, through a hierarchy-aware recursive position-coding mechanism, the posterior hierarchy parameters output by a previous network layer as the prior hierarchy parameters of the next network layer, so as to maintain dynamic consistency of the hierarchy parameters across a multi-layer reasoning process, and is applicable to at least one of the following scenarios: legal clause reasoning, program code structure generation, or medical knowledge graph reasoning.
- 15. Use of the manifold geometry-based large language model semantic data processing method according to any one of claims 1 to 7 in large language model inference deployment, characterized in that: the method confines the recursive updating of hierarchy parameters to the depth direction of the network layers, so that dynamic hierarchy awareness is achieved without introducing cross-time-step cyclic dependencies; the method can cooperate with a key-value caching mechanism and is suitable for incremental decoding or streaming inference scenarios.
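The pipeline of claim 1 (lift, predict depth, evolve, guard, fuse) can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the parameter names (`W_down`, `W_up`, `w_depth`, `theta`), the elementwise complex operator, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, R = 64, 8  # hypothetical backbone width and manifold rank

# Stand-ins for learnable parameters (randomly initialised)
W_down = rng.standard_normal((D, 2 * R)) * 0.1   # lift: real -> complex coords
W_up = rng.standard_normal((2 * R, D)) * 0.1     # project back to real space
w_depth = rng.standard_normal(D) * 0.1           # hierarchy-prediction head
theta = rng.standard_normal(R) * 0.1             # learnable direction (phase) params

def evolve(x, eps=1e-6):
    """One pass: lift -> predict depth -> radial scale + rotate -> fuse back."""
    # Hierarchy prediction: a scalar semantic depth from the real feature
    depth = np.tanh(x @ w_depth)
    # Feature lifting: low-rank projection viewed as R complex coordinates
    h = x @ W_down
    z = h[:R] + 1j * h[R:]
    # Geometric evolution: radial scaling (real part) and angular rotation (phase)
    z = z * np.exp(depth + 1j * theta)
    # Numerical guard: cap exploding moduli instead of letting them overflow
    mod = np.abs(z)
    cap = 1.0 / eps
    z = np.where(mod > cap, z * (cap / np.maximum(mod, eps)), z)
    # Projection fusion: back to real space, residual superposition
    h2 = np.concatenate([z.real, z.imag])
    return x + h2 @ W_up

x = rng.standard_normal(D)
y = evolve(x)
```

The plug-in character of claim 9 corresponds to `evolve` being a residual side-path: the backbone data flow `x` passes through unchanged except for the added correction term.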
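The numerical guard of claim 2 is a standard pattern: the ratio sinh(z)/z is singular as written but analytic at z = 0, so a Taylor branch replaces the division below a machine-precision threshold. A sketch under that assumption (the threshold value and two-term expansion are illustrative):

```python
import numpy as np

EPS = 1e-6  # assumed machine-precision threshold from claim 2

def safe_sinh_ratio(z):
    """Compute sinh(z)/z with a Taylor fallback near z = 0.

    Near the origin, sinh(z)/z = 1 + z^2/6 + z^4/120 + ..., so the
    truncated series avoids the 0/0 division entirely.
    """
    z = np.asarray(z, dtype=np.complex128)
    small = np.abs(z) < EPS
    out = np.empty_like(z)
    out[small] = 1.0 + z[small] ** 2 / 6.0      # Taylor branch
    zs = z[~small]
    out[~small] = np.sinh(zs) / zs              # standard branch
    return out
```

The same guard applies to any exp-map coefficient of the form sinh(|v|)/|v| that multiplies a direction vector.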
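The recursion of claim 3 runs along layer depth, not along time: layer N's position coding consumes the depth posterior of layer N-1, so no same-layer circular dependency arises. A toy sketch, with `depth_head` and the radial Q/K scaling as stand-ins for the actual hierarchy network and hyperbolic position-coding matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
T, D, LAYERS = 5, 16, 3  # toy sequence length, width, layer count

def depth_head(h):
    """Posterior semantic depth per token (illustrative stand-in)."""
    return np.tanh(h.mean(axis=-1))              # shape (T,)

def hyperbolic_pos_encode(q, k, prior_depth):
    """Radially rescale Q/K by a per-token depth prior (simplified
    stand-in for the hyperbolic position-coding matrix of claim 3)."""
    scale = np.exp(prior_depth)[:, None]
    return q * scale, k / scale

h = rng.standard_normal((T, D))
prior_depth = np.zeros(T)                        # layer 0 has no posterior yet
for layer in range(LAYERS):
    q = k = h                                    # toy attention inputs
    q, k = hyperbolic_pos_encode(q, k, prior_depth)
    attn = q @ k.T / np.sqrt(D)
    h = h + 0.01 * (attn @ h)                    # stand-in for the layer update
    prior_depth = depth_head(h)                  # posterior N feeds layer N+1
```

Because `prior_depth` is fixed before each layer's attention runs, the loop is strictly feed-forward, which is also why claim 15 can combine it with key-value caching.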
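The norm-ordering constraint of claim 4 can be realized as a margin hinge loss: the more specific (entailing) object must sit farther from the origin than the more general (entailed) one. A sketch assuming Euclidean norms as a proxy for the hyperbolic modulus, with an illustrative margin:

```python
import numpy as np

def entailment_norm_loss(specific, general, margin=0.1):
    """Hinge loss pushing the specific concept farther from the origin
    than the general one, so abstract concepts converge toward 0."""
    n_spec = np.linalg.norm(specific, axis=-1)
    n_gen = np.linalg.norm(general, axis=-1)
    return np.maximum(0.0, n_gen - n_spec + margin).mean()

bird = np.array([[0.2, 0.1]])      # general concept, near the origin
sparrow = np.array([[0.9, 0.4]])   # specific concept, farther out
loss_ok = entailment_norm_loss(sparrow, bird)    # ordering satisfied -> 0
loss_bad = entailment_norm_loss(bird, sparrow)   # ordering violated -> positive
```

The asymmetry of the loss is what encodes the direction of "sparrow is a bird" versus "bird is a sparrow".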
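Claim 7's low-rank lift-and-restore trades a dense D x D map for two thin factors. A sketch with assumed sizes (D = 4096, R = 64 are illustrative, not from the claims):

```python
import numpy as np

rng = np.random.default_rng(2)
D, R = 4096, 64  # hypothetical backbone width and manifold rank (D >> R)

W_down = rng.standard_normal((D, R)) / np.sqrt(D)  # compression
W_up = rng.standard_normal((R, D)) / np.sqrt(R)    # restoration

x = rng.standard_normal(D)
z = x @ W_down          # low-dimensional manifold coordinates
# ... the geometric evolution step would act on z here ...
y = z @ W_up            # restored to the original dimension
# Parameter cost: 2*D*R = 524288 weights vs D*D = 16777216 for a dense map.
```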
Description
Manifold geometry-based large language model semantic data processing method, device, storage medium and application
Technical Field
The invention relates to the technical field of artificial intelligence and computer data processing, and more particularly to a manifold geometry-based large language model semantic data processing method, apparatus, storage medium and application.
Background
In existing Large Language Model (LLM) techniques, such as the Transformer architecture, semantic features are typically embedded in a flat Euclidean space (R^n). Although this approach performs well in general text-generation tasks, it faces a double bottleneck in hardware resources and logical expressiveness when processing data with strict logical structure:
The "crowding problem" of storage and computing efficiency. Concepts in natural language present a tree-like hierarchical structure (e.g., entity → animal → mammal → canine → a specific breed). In graph theory and topology, the number of nodes of a tree structure grows exponentially with depth, whereas the volume of Euclidean space grows only polynomially with radius (V ∝ r^n). To distinguish deep concepts within a finite Euclidean space, the prior art has to greatly increase the feature-vector dimension (for example, from 4096 to 16384 dimensions), so that video memory (VRAM) occupation becomes too high and the processor's floating-point operation count (FLOPs) grows quadratically, causing a huge waste of hardware resources.
The "anisotropy" deficit of logical reasoning (logical anisotropy). Existing models compute attention scores mainly by dot product. The dot product is essentially a cosine-similarity computation and is symmetric (i.e., "A is similar to B" implies "B is similar to A").
However, logical implication in human language tends to be asymmetric (e.g., "a sparrow is a bird" is true, but "a bird is a sparrow" is not necessarily true). Relying on Euclidean similarity alone, the model more easily mistakes "correlation" for "implication", producing directionality errors in long-chain (transitive) inference or relation extraction. For example, among kinship relations, "A is the parent of B" implies strict direction and role constraints. Euclidean similarity can only express that "A is strongly related to B", and is easily confused with "B is the parent of A" or other similar relations during generation or reasoning, resulting in factual hallucinations. The invention gives hierarchy, role and relation type separable geometric characterizations by introducing a radial (boost) and angular (rotation/phase) decomposition, making it easier to suppress directional hallucinations.
Timing-dependency deadlock (temporal paradox). To introduce hierarchical information, the prior art attempts to incorporate semantic depth parameters into position coding. However, semantic depth depends on context understanding, which depends on the attention computation, which in turn depends on position coding. This chicken-and-egg circular dependency causes a "deadlock" in the computation timing, so that existing serial processors cannot effectively execute dynamic hierarchical coding; they can only degenerate to static coding and lose hierarchy awareness under dynamic context.
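The crowding argument above contrasts polynomial with exponential growth. A small sketch makes it concrete: the Euclidean n-ball volume grows as r^n, while hyperbolic volume grows roughly like sinh(r)^(n-1), i.e., exponentially in the radius. The two helper functions below are illustrative formulas, not part of the patented method:

```python
import math

def euclidean_ball_volume(r, n):
    """Volume of a Euclidean n-ball of radius r: V ∝ r^n (polynomial)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

def hyperbolic_growth(r, n):
    """Hyperbolic volume growth factor, ~ sinh(r)^(n-1): exponential in r."""
    return math.sinh(r) ** (n - 1)

# Doubling the radius in 3D multiplies Euclidean volume by only 2^3 = 8,
# while the hyperbolic growth factor multiplies by roughly e^(2*5) ≈ 22000
# over the same doubling from r = 5 to r = 10.
eu_ratio = euclidean_ball_volume(2.0, 3) / euclidean_ball_volume(1.0, 3)
hy_ratio = hyperbolic_growth(10.0, 3) / hyperbolic_growth(5.0, 3)
```

This exponential capacity is why a tree whose node count doubles per level embeds with low distortion in low-dimensional hyperbolic space but forces dimension blow-up in Euclidean space.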
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a data processing scheme based on manifold geometry (especially the SL(2, C) group manifold). By constructing a hybrid architecture of a Euclidean host and a hyperbolic guest, and exploiting the exponential capacity of hyperbolic space, the processing precision and robustness of the model for hierarchical logic are improved without increasing, and even while reducing, the feature dimension. To achieve the above technical object, a first aspect of the present invention provides a manifold geometry-based semantic data processing method for a large language model, which mainly comprises executing the following signal processing procedure by a processor: a processor receives a discrete symbol signal of an input text sequence and converts it into a real-valued eigenvector in Euclidean space; a tensor transformation module is called to project the real-valued feature vector into a feature signal on a manifold space; a hierarchy prediction step generates a corresponding semantic depth parameter from the current real-valued feature vector using a hierarchy prediction network; the processor executes manifold-based linear or nonlinear transformation logic, builds an evolution operator from the combined action of the semantic depth parameter and the learnable direction parameters, and executes space tra