KR-20260063657-A - APPARATUS AND METHODS FOR PREDICTING CODE COMPLEXITY USING CONTRASTIVE LEARNING

KR20260063657A

Abstract

The present invention relates to an apparatus and method for predicting code complexity using contrastive learning. The apparatus comprises: a problem description contrastive learning unit that performs first contrastive learning by forming problem description and solution code pairs; a reference code contrastive learning unit that performs second contrastive learning by selecting one of the solution codes for a given problem as a reference code and forming pairs between the reference code and each of the other solution codes; an integrated learning unit that trains an integrated learning model to cluster the solution codes by learning similarity with the contrastive losses obtained through the first and second contrastive learning, and to determine the time complexity of the clustered solution codes with a cross-entropy loss; and a code time complexity prediction unit that predicts the time complexity of a new solution code through the integrated learning model.
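The first contrastive stage described above can be sketched as follows. This is a minimal illustration, not the patent's actual implementation: it assumes description and code embeddings already exist (e.g. from some encoder, not shown) and uses an InfoNCE-style loss in which matched description/code pairs are positives and other batch members are negatives.

```python
import numpy as np

def info_nce_loss(desc_emb, code_emb, temperature=0.1):
    """Symmetric InfoNCE-style contrastive loss.

    Row i of desc_emb (a problem-description embedding) and row i of
    code_emb (its solution-code embedding) form a positive pair; all
    other rows in the batch act as in-batch negatives.
    """
    # L2-normalise so dot products become cosine similarities
    d = desc_emb / np.linalg.norm(desc_emb, axis=1, keepdims=True)
    c = code_emb / np.linalg.norm(code_emb, axis=1, keepdims=True)
    logits = d @ c.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(d))         # positives sit on the diagonal

    def xent(lg):
        # numerically stable cross-entropy against the diagonal targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average the description-to-code and code-to-description directions
    return (xent(logits) + xent(logits.T)) / 2
```

The temperature parameter here plays the role of the "similarity sensitivity" of claim 7: a smaller value sharpens the softmax and pushes dissimilar codes further apart.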

Inventors

  • 한요섭
  • 박신우
  • 한중혁

Assignees

  • 연세대학교 산학협력단 (Yonsei University Industry-Academic Cooperation Foundation)

Dates

Publication Date
2026-05-07
Application Date
2024-10-30

Claims (11)

  1. A code complexity prediction device using contrastive learning, comprising: a problem description contrastive learning unit that performs first contrastive learning by forming problem description and solution code pairs; a reference code contrastive learning unit that selects one of the solution codes for a given problem as a reference code and forms pairs between the reference code and each of the other solution codes to perform second contrastive learning; an integrated learning unit that trains an integrated learning model to cluster the solution codes by learning similarity with the contrastive losses obtained through the first and second contrastive learning, and to determine the time complexity of the clustered solution codes with a cross-entropy loss; and a code time complexity prediction unit that predicts the time complexity of a new solution code through the integrated learning model.
  2. The code complexity prediction device of claim 1, wherein the problem description contrastive learning unit generates a first embedding for the problem description and a second embedding for the solution code, and learns the first and second embeddings as a positive pair.
  3. The code complexity prediction device of claim 1, wherein the reference code contrastive learning unit calculates the internal similarity between the solution codes for the given problem and determines the solution code with the highest internal similarity as the reference code.
  4. The code complexity prediction device of claim 3, wherein the reference code contrastive learning unit generates embeddings of the solution codes for the given problem and determines the reference code through dot products between the embeddings.
  5. The code complexity prediction device of claim 3, wherein the reference code contrastive learning unit performs the second contrastive learning by forming pairs of the reference code with the other solution codes for the same problem as positive examples, and pairs of the reference code with solution codes for different problems as negative examples.
  6. The code complexity prediction device of claim 1, wherein the integrated learning unit performs clustering by placing similar solution codes relatively close together and dissimilar solution codes relatively far apart.
  7. The code complexity prediction device of claim 6, wherein the integrated learning unit controls the relative distances between the solution codes by adjusting a similarity sensitivity.
  8. The code complexity prediction device of claim 1, wherein the integrated learning unit predicts a time complexity class by calculating a probability distribution over the predicted time complexities of the clustered solution codes using a softmax function.
  9. The code complexity prediction device of claim 8, wherein the integrated learning unit optimizes the integrated learning model to reduce the difference between the predicted time complexity class of the clustered solution codes and the actual time complexity.
  10. The code complexity prediction device of claim 1, wherein the code time complexity prediction unit generates an embedding for the new solution code and predicts the time complexity through multi-class classification.
  11. A method for predicting code complexity using contrastive learning, performed in a code complexity prediction device using contrastive learning, the method comprising: a problem description contrastive learning step of performing first contrastive learning by forming problem description and solution code pairs; a reference code contrastive learning step of selecting one of the solution codes for a given problem as a reference code and forming pairs between the reference code and each of the other solution codes to perform second contrastive learning; an integrated learning step of training an integrated learning model to cluster the solution codes by learning similarity with the contrastive losses obtained through the first and second contrastive learning steps, and to determine the time complexity of the clustered solution codes with a cross-entropy loss; and a code time complexity prediction step of predicting the time complexity of a new solution code through the integrated learning model.
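The reference-code mechanism of claims 3 through 5 can be sketched as follows; this is an illustrative reading under stated assumptions (embeddings are given, similarity is a plain dot product, and the pair-building helper and its index-based representation are hypothetical), not the patented implementation.

```python
import numpy as np

def select_reference_code(code_embs):
    """Reference-code selection in the manner of claims 3-4: compute
    pairwise dot products between the solution-code embeddings for one
    problem and pick the code with the highest total similarity to the
    others (i.e. the most 'central' solution)."""
    sim = code_embs @ code_embs.T        # pairwise dot products
    np.fill_diagonal(sim, 0.0)           # exclude self-similarity
    return int(np.argmax(sim.sum(axis=1)))

def build_pairs(ref_idx, same_problem, other_problem):
    """Pair construction in the manner of claim 5: the reference code
    paired with each other solution of the same problem is a positive
    example; the reference paired with solutions of different problems
    gives negative examples. Pairs are returned as index tuples."""
    positives = [(ref_idx, j) for j in range(len(same_problem)) if j != ref_idx]
    negatives = [(ref_idx, k) for k in range(len(other_problem))]
    return positives, negatives
```

Picking the most central solution as the reference means the second contrastive learning pulls each cluster toward a representative implementation rather than an outlier.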
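The classification head of claims 8 through 10 amounts to a softmax over complexity classes trained with cross-entropy. A minimal sketch follows; the class label set and the linear projection (W, b) are hypothetical stand-ins, since the patent does not specify them here.

```python
import numpy as np

# Hypothetical label set; the patent does not enumerate its classes.
COMPLEXITY_CLASSES = ["O(1)", "O(log n)", "O(n)", "O(n log n)", "O(n^2)"]

def softmax(z):
    z = z - z.max()      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_complexity(code_emb, W, b):
    """Multi-class head in the manner of claims 8 and 10: project the
    solution-code embedding to class logits and take the softmax
    probability distribution over time-complexity classes."""
    probs = softmax(code_emb @ W + b)
    return COMPLEXITY_CLASSES[int(np.argmax(probs))], probs

def cross_entropy(probs, true_class):
    """Training objective in the manner of claim 9: penalise divergence
    between the predicted distribution and the actual complexity class."""
    return -np.log(probs[COMPLEXITY_CLASSES.index(true_class)])
```

In training, this cross-entropy term would be combined with the two contrastive losses to form the integrated objective of claim 1.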

Description

Apparatus and Methods for Predicting Code Complexity Using Contrastive Learning

The present invention relates to a technique for predicting the time complexity of a given source code, and more specifically, to an apparatus and method for predicting code complexity using contrastive learning, capable of clustering solution codes and determining the time complexity of the clustered solution codes using a cross-entropy loss.

Time complexity is a crucial metric for measuring algorithm efficiency. Calculating the time complexity of a given piece of code is a fundamental task for assessing efficiency in various situations. In particular, worst-case time complexity is essential in scenarios where system reliability and predictability are critical. However, the problem of determining the time complexity of arbitrary code is known to be theoretically undecidable; this follows by reduction from the halting problem. Therefore, recent studies attempt to develop approximate solutions based on machine learning and deep learning.

FIG. 1 is a diagram illustrating the functional configuration of a code complexity prediction device according to one embodiment of the present invention. FIG. 2 is a diagram illustrating the system configuration of the code complexity prediction device. FIG. 3 is a flowchart illustrating a method for predicting code complexity according to the present invention.

The description of the present invention is merely an example for structural or functional explanation, and the scope of the present invention should not be interpreted as limited to the examples described in the text. That is, since the examples are subject to various modifications and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing the technical concept.
Furthermore, the objectives or effects presented in the present invention do not imply that a specific example must include all of them or produce only such effects; the scope of the present invention should not be understood as limited by them.

Meanwhile, the meaning of the terms used in this application should be understood as follows.

Terms such as "first" and "second" are intended to distinguish one component from another, and the scope of rights shall not be limited by these terms. For example, a first component may be named a second component, and similarly, a second component may be named a first component.

When one component is described as "connected" to another component, it may be directly connected to that other component, or intervening components may be present. Conversely, when one component is described as "directly connected" to another component, no intervening components are present. Other expressions describing relationships between components, such as "between" and "immediately between," or "adjacent to" and "directly adjacent to," should be interpreted in the same way.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "include" or "have" specify the existence of stated features, numbers, steps, actions, components, parts, or combinations thereof, and do not preclude the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

Identifiers used in each step (e.g., (a), (b), (c)) are for convenience of explanation and do not prescribe an order; the steps may occur in an order different from the one specified unless a specific order is clearly indicated by the context.
That is, the steps may occur in the order specified, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention may be implemented as computer-readable code on a computer-readable recording medium, which includes all types of recording devices that store data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed across networked computer systems, so that the computer-readable code can be stored and executed in a distributed manner.

Unless otherwise defined, all terms used herein have the same meaning as generally understood by those skilled in the art to which this invention pertains. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with the context of the relevant technology, and should not be interpreted as having an ideal or overly formal meaning unless explicitly defined in this application.

FIG. 1 is a diagram illustrating the functional configuration of a code complexity prediction device according to one embodiment of the present invention.