CN-122018853-A - Subset relation judging method based on logarithmic prime number coding

CN122018853ACN 122018853 ACN122018853 ACN 122018853ACN-122018853-A

Abstract

A subset relation judging method based on logarithmic prime number coding relates to the field of large-scale set operation and data processing. In order to solve the problems of data overflow risk, high calculation complexity, low parallelization efficiency and difficult realization under a floating point system existing in a prime number product type subset judging method in the prior art, the method comprises the steps of establishing a prime number mapping table, distributing a unique prime number for each basic element in a whole set, carrying out logarithmic coding on the set based on the mapping table, calculating a difference value of logarithmic coding of the two sets, carrying out index reduction on the difference value and obtaining an integer candidate value, and judging a subset relation through comparison with dynamic self-adaptive tolerance. The method effectively avoids integer overflow, remarkably improves the calculation speed and the parallelism performance, and is suitable for reliability analysis, cut set solving, data mining and rapid judgment of the set inclusion relation in large-scale set calculation.

Inventors

DING MING
YANG YONGYONG
CAO XIAXIN
MENG ZHAOMING
GUO ZEHUA
WANG YANKAI
HAO XIAOTIAN

Assignees

哈尔滨工程大学

Dates

Publication Date: 20260512
Application Date: 20251223

Claims (10)

1. A subset relation judging method based on logarithmic prime number coding is characterized by comprising the following steps: determining a complete set of the problem to be processed, assigning a unique prime number to each basic element in the complete set, establishing a prime number mapping table and outputting prime number index data for subsequent set coding; Searching corresponding prime numbers for each basic element in an input set according to the prime number mapping table, calculating natural logarithms of each prime number, accumulating to obtain logarithmic codes representing set characteristics, and outputting logarithmic code data; Receiving the logarithmic coding data of the two sets and calculating the difference value of the logarithmic coding data to obtain a logarithmic difference value representing the ratio of the prime numbers of the two sets and outputting the difference value data; Performing exponential reduction operation on the logarithmic difference value to obtain a reduction result, performing rounding operation on the result to obtain an integer candidate value, and outputting the integer candidate value for subsequent judgment; and calculating the deviation between the logarithm difference value and the logarithm corresponding to the integer candidate value, comparing the deviation with the dynamic self-adaptive tolerance, judging that the previous set is a subset of the next set when the deviation is smaller than the tolerance threshold value, and outputting a judging result.
2. The method for judging the subset relation based on the logarithmic prime number coding according to claim 1, wherein when the prime number mapping table is established, the basic elements in the whole set are ordered according to a preset sequence, and are sequentially distributed from the minimum prime number, so that each basic element and the unique prime number form a one-to-one correspondence.
3. The method for judging the subset relation based on the logarithmic prime number coding according to claim 1, wherein natural logarithmic results of prime numbers are pre-calculated and cached into high-precision floating point numbers in the logarithmic coding process, and cached data are directly called through indexes in the coding process.
4. A subset relation judging method based on logarithmic prime number codes according to claim 1, wherein a pair-wise summation algorithm or a compensation summation algorithm is adopted in calculating the logarithmic codes of the sets.
5. The method according to claim 1, wherein the integer candidate obtained by rounding the exponent reduction result of the logarithmic difference is used as a theoretical estimate of the multiple relationship between sets, and is used in the subsequent step to determine the set inclusion relationship.
6. The method of claim 1, wherein the dynamic adaptive tolerance is automatically adjusted according to the magnitude of the exponent return result, the tolerance threshold decreases when the exponent result is larger, and the tolerance threshold increases when the exponent result is smaller.
7. A subset relation judgment device based on logarithmic prime number coding, comprising: A module for determining a complete set of the problem to be processed and assigning a unique prime number to each basic element in the complete set, establishing a prime number mapping table and outputting prime number index data for subsequent set coding; according to the prime number mapping table, searching the corresponding prime number for each basic element in the input set, calculating the natural logarithm of each prime number, accumulating to obtain the logarithm code representing the set characteristic, and outputting the logarithm code data; A module for receiving the logarithmic coding data of the two sets and calculating the difference value thereof to obtain a logarithmic difference value representing the ratio of the prime numbers of the two sets and outputting the difference value data; The module is used for carrying out exponential reduction operation on the logarithmic difference value to obtain a reduction result, carrying out rounding operation on the result to obtain an integer candidate value and outputting the integer candidate value for subsequent judgment; and the module is used for calculating the deviation between the logarithmic difference value and the corresponding logarithmic of the integer candidate value, comparing the deviation with the dynamic self-adaptive tolerance, judging that the previous set is a subset of the next set when the deviation is smaller than the tolerance threshold value, and outputting a judging result.
8. Computer storage medium for storing a computer program, characterized in that the computer performs the method of claim 1 when the computer program is read by the computer.
9. A computer comprising a processor and a storage medium, characterized in that the computer performs the method of claim 1 when the processor reads a computer program stored in the storage medium.
10. Computer program product, as a computer program, characterized in that the method of claim 1 is implemented when the computer program is executed.

Description

Subset relation judging method based on logarithmic prime number coding Technical Field The method relates to the field of large-scale set operation and data processing, in particular to a high-efficiency subset relation judging method. Background In the technical fields of reliability engineering, system security analysis, data mining, artificial intelligence and the like, collective operation is one of the most basic calculation forms. The rapid judgment of the containing relation (namely, subset relation) among the sets is a core step of various computing tasks such as system modeling, feature extraction, cut-and-gather simplification, pattern recognition, database retrieval and the like. Particularly, in the scenes of Fault Tree Analysis (FTA), cut set solving, minimum cut set screening, set feature compression, high-dimensional feature combination analysis and the like, the system often needs to process the judgment of the inclusion relation among thousands of sets, and the performance of the whole algorithm is directly determined by the calculation efficiency. In the prior art, researchers have proposed various set coding and subset judgment schemes, mainly including the following categories: Set representation method based on Boolean vector The method sequentially maps the whole set elements into all bits of the Boolean vector, and judges the set relation through logical operations such as AND, OR, NOT and the like. The method has clear structure and easy realization, but when the set scale is increased, the Boolean vector dimension is increased sharply, a large amount of storage space is occupied, vector comparison and bitwise operation are difficult to perform efficiently in large-scale parallel calculation, and the calculation efficiency is obviously reduced. Encoding method based on hash or bit set compression To reduce the storage overhead of boolean vectors, some studies have employed hash mapping or bit set compression techniques to quickly determine the set relationship by computing element hash values or bit masks. However, the hash algorithm may result in collision, does not have strict uniqueness, and cannot realize accurate inclusion judgment among sets, and the bit set compression can reduce space complexity to a certain extent, but decoding and comparison between different platforms (CPU/GPU) are relatively large. Unique coding method based on prime number product In the field of reliability engineering, prime number product method is a classical set coding scheme. The basic principle is that each element in the whole set is assigned a unique prime number by utilizing the arithmetic basic theorem, and one set is expressed as the product value of the prime numbers. And judging whether the set A is a subset of the set B, and judging only by verifying whether the codes of the set B can be divided by the codes of the set A. The method has good mathematical uniqueness and logic simplicity, and is widely used in applications such as cut set analysis, combination reliability calculation, data dependence judgment and the like. Although prime number product encoding has elegant mathematical properties in theory, significant computational bottlenecks and accuracy problems are exposed in engineering implementations. The concrete steps are as follows: The prime number product increases extremely fast, and when the number of elements contained in the set is large, the product result exceeds the representation range of the conventional integer data type (such as 64-bit integer), so that the numerical value overflow problem is caused; In order to solve overflow, a high-precision large database is adopted to carry out multiply-divide operation, but the large database is completely dependent on software layer simulation, the execution efficiency is far lower than that of a hardware original instruction, and the performance is greatly reduced especially in a high concurrency or GPU environment; The integer divide judgment itself belongs to high-complexity operation, and the time required by a single instruction is often tens times of that of multiplication, addition and the like, so that the integer divide judgment can greatly reduce the calculation efficiency of an algorithm. ; In order to improve efficiency, some researches try to reduce the calculated amount by means of module partitioning, segment integer dividing, hash checking and the like, but none of the methods fundamentally solve the overflow and performance bottleneck problems caused by product coding. Meanwhile, the balance still exists between the mathematical uniqueness and the calculation precision, and the speed, the precision and the realization simplicity cannot be considered. In summary, the prime number product type subset judging method in the prior art has the defects of data overflow risk, high calculation complexity, low parallelization efficiency and difficulty in efficient implementation under a floating point system. Disclosure