CN-122018906-A - Method and system for converting natural language instruction into operating system command
Abstract
The invention discloses a method and system for converting natural language instructions into operating system commands, belonging to the technical field of natural language processing and operating-system interaction. The method realizes instruction conversion through a layered architecture: first, a lightweight model and a knowledge base containing regular-expression and semantic templates are deployed to provide basic support for the architecture; user input is obtained and preprocessed; instruction classification, hybrid-mode matching and routing are completed by an intent coarse-screening layer and a quick-matching layer; ambiguous instructions are resolved by a fuzzy arbitration layer; after three-level security verification, instructions are mapped into executable commands of the target operating system; and finally results are fed back to the user, forming an interactive closed loop. The method is suitable for resource-constrained terminals and avoids the latency and privacy risks of cloud dependence; hybrid matching covers both structured and colloquial instructions; fuzzy arbitration improves accuracy in ambiguous scenarios; three-level verification provides full-link security protection; and low-latency, safe and controllable natural-language instruction interaction is achieved.
Inventors
- JI BIN
- HUANG JIANGJIE
- GAN ZHIYI
- ZHANG MENGLIN
- LIU XIAODONG
- YU JIE
- LI HANHUA
- PENG LONG
- GAO LONG
- ZHANG YI
- LI ZHUO
- LI LINBO
Assignees
- National University of Defense Technology of the Chinese People's Liberation Army (中国人民解放军国防科技大学)
Dates
- Publication Date: 20260512
- Application Date: 20260414
Claims (10)
- 1. A method for converting natural language instructions into operating system commands, the method comprising: step S1, deploying a lightweight model and constructing a task knowledge base: deploying a lightweight large model in a local environment, and constructing offline a normalized instruction template library containing a regular expression template set and a semantic vector template set, to provide basic support for a layered processing architecture; step S2, obtaining user input, executing preprocessing, and outputting normalized preprocessed text, the preprocessed text serving as input data of the layered processing architecture; step S3, executing processing by means of an intent coarse-screening layer and a quick-matching layer of the layered processing architecture, specifically comprising: performing intent classification on the preprocessed text based on the lightweight model; if the preprocessed text is judged to be a system operation request, converting it into a normalized instruction string, executing quick matching in the task knowledge base through a hybrid mode, and executing routing according to matching confidence; step S4, executing processing by means of a fuzzy arbitration layer and a safety execution layer of the layered processing architecture, comprising: starting dual-channel parallel reasoning arbitration on candidate instructions falling in a fuzzy interval, mapping them into specific executable commands of a target operating system after three-level security verification, and executing the commands; step S5, feeding back to the user according to the execution result of the safety execution layer or the classification result of the intent coarse-screening layer, forming an interactive closed loop; wherein the layered processing architecture comprises the intent coarse-screening layer, the quick-matching layer, the fuzzy arbitration layer and the safety execution layer, and the information flow of each level is transmitted through steps S2 to S4: the preprocessed text output by step S2 is classified by the intent coarse-screening layer and then passed into the quick-matching layer; the matching result of the quick-matching layer is routed to the fuzzy arbitration layer according to confidence or directly enters the safety execution layer; and both the execution result of the safety execution layer and the arbitration result of the fuzzy arbitration layer feed into the feedback link of step S5.
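The routing among the four layers of claim 1 can be sketched as below. This is a minimal illustration, not the patented implementation: the layer internals are stubs, the function names are hypothetical, and the 0.95/0.65 confidence thresholds are the ones the claims state for the quick-matching layer.

```python
def coarse_screen(text: str) -> str:
    """Intent coarse-screening layer (stub): returns one of
    'system_op', 'qa', or 'chitchat'."""
    return "system_op" if text.startswith("open") else "qa"

def quick_match(text: str) -> tuple:
    """Quick-matching layer (stub): hybrid match against the task
    knowledge base, returning (normalized instruction, confidence)."""
    return ("OPEN_FILE_MANAGER", 0.97)

def dispatch(text: str) -> str:
    """Route one preprocessed input through the layered architecture."""
    intent = coarse_screen(text)
    if intent != "system_op":
        return f"feedback:{intent}"      # non-operation intents go to step S5
    instr, conf = quick_match(text)
    if conf > 0.95:
        return f"execute:{instr}"        # straight to the safety execution layer
    if conf >= 0.65:
        return f"arbitrate:{instr}"      # fuzzy interval -> fuzzy arbitration layer
    return "confirm"                     # low confidence -> interactive confirmation
```

For example, `dispatch("open the file manager")` routes directly to execution because the stubbed confidence exceeds 0.95.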
- 2. The method of claim 1, wherein in step S1, the lightweight model has 1-4 billion parameters, its parameters are converted into INT4 or INT8 format by GPTQ or AWQ post-training quantization, and the model is deployed via the llama.cpp framework. The lightweight model adopts a dynamic on-demand loading and unloading mechanism: the model is loaded when a complex understanding task is executed for the first time, and unloaded after processing enters the safety execution layer stage.
- 3. The method of claim 1, wherein the task knowledge base constructed in step S1 comprises a plurality of normalized instruction strings, each normalized instruction string corresponding to a unique instruction ID and a function description and associated with two types of matching templates: the regular expression template set is generated by manual writing or expansion from a small number of samples; the semantic vector template set is constructed through manual writing, large-model generation, synonym expansion and real-corpus collection, encoded into 768-dimensional semantic vectors by a DmetaSoul semantic vector model, and stored in a Milvus 2.0 vector database.
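A knowledge-base entry of the kind claim 3 describes might be modeled as follows. The entry shown is invented for illustration, and plain Python lists stand in for the Milvus 2.0 vector store and the 768-dimensional DmetaSoul embeddings the patent specifies.

```python
from dataclasses import dataclass

@dataclass
class InstructionEntry:
    instruction_id: str            # unique instruction ID
    description: str               # function description
    regex_patterns: list           # regular expression template set
    semantic_templates: list       # natural-language variants; in the patent these
                                   # are encoded to 768-d vectors and stored in Milvus

# Hypothetical knowledge base with one entry.
kb = [
    InstructionEntry(
        instruction_id="FS_LIST_001",
        description="list files in a directory",
        regex_patterns=[r"^(list|show) (the )?files"],
        semantic_templates=["show me my files", "what's in this folder"],
    )
]
```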
- 4. The method of claim 1, wherein in step S2, the user input is obtained via text input, real-time voice input or file input; the real-time voice input is converted into text by whisper.cpp automatic speech recognition and submitted after user confirmation or correction, and the file input is obtained by parsing PDF documents with MuPDF or by extracting image text with OpenCV combined with OCR; the preprocessing comprises: removing redundant words, filler words and stop words; unifying full-width characters to half-width; filtering illegal control characters, special characters and code fragments; merging consecutive whitespace and removing leading and trailing whitespace; and removing redundant repeated punctuation; the resulting normalized preprocessed text serves as the input of the intent coarse-screening layer.
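The preprocessing steps of claim 4 can be sketched with the standard library alone; the filler-word list here is a tiny hypothetical stand-in, and NFKC normalization is used as one common way to fold full-width characters to half-width.

```python
import re
import unicodedata

# Hypothetical filler/stop-word list; the patent does not enumerate one.
STOP_WORDS = {"please", "um", "uh"}

def preprocess(text: str) -> str:
    # Full-width -> half-width (NFKC folds full-width forms to ASCII).
    text = unicodedata.normalize("NFKC", text)
    # Filter control characters (Unicode category "C*").
    text = "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")
    # Remove filler and stop words.
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    text = " ".join(tokens)
    # Collapse repeated punctuation, merge whitespace, trim the ends.
    text = re.sub(r"([!?.,])\1+", r"\1", text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `preprocess("please  list   files!!!")` yields `"list files!"`.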
- 5. The method of claim 1, wherein in step S3, the intent classification executed by the intent coarse-screening layer is a ternary classification covering system operation requests, knowledge/information question answering, and unclear intent/chit-chat, and only system operation requests enter the processing flow of the quick-matching layer; the hybrid mode executes regular matching and vector similarity matching in parallel, and performing quick matching in the task knowledge base through the hybrid mode and routing according to matching confidence comprises: preferentially matching the regular expression template set through a regular expression engine, and assigning the highest confidence if the match succeeds; when regular matching fails, encoding the preprocessed text into a semantic vector and executing approximate nearest-neighbor search in the semantic vector template set to obtain the Top-K most similar templates and the corresponding normalized instruction strings; when the matching confidence is greater than 0.95, directly outputting the instruction and passing it to the safety execution layer; when the confidence falls within the range 0.65-0.95, starting processing by the fuzzy arbitration layer; and when the confidence is less than 0.65, transferring to the interactive confirmation of step S5.
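The regex-first, vector-fallback hybrid matching of claim 5 can be sketched as below. All template data is invented, and a brute-force cosine search over a toy letter-count "embedding" stands in for the patent's 768-dimensional DmetaSoul encoder and Milvus ANN index.

```python
import re

# Hypothetical template stores: instruction ID -> patterns / vector.
REGEX_TEMPLATES = {"FS_LIST": [r"^(list|show)\b.*files"]}

def embed(text: str) -> list:
    """Toy stand-in encoder: letter/space counts instead of a real model."""
    return [text.count(c) for c in "abcdefghijklmnopqrstuvwxyz "]

VECTOR_TEMPLATES = {"FS_LIST": embed("show me the files")}

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def hybrid_match(text: str, k: int = 3) -> list:
    # Stage 1: the regex engine takes priority; a hit gets the top confidence.
    for instr, patterns in REGEX_TEMPLATES.items():
        if any(re.search(p, text) for p in patterns):
            return [(instr, 1.0)]
    # Stage 2: nearest-neighbour search over semantic templates
    # (brute-force cosine here in place of Milvus ANN search).
    q = embed(text)
    scored = sorted(((i, cosine(q, v)) for i, v in VECTOR_TEMPLATES.items()),
                    key=lambda t: -t[1])
    return scored[:k]
```

A regex hit such as `hybrid_match("list my files")` returns confidence 1.0; a paraphrase falls through to the vector stage with a lower score.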
- 6. The method of claim 1, wherein in step S4 the dual channel comprises: a channel A for quantitatively scoring the Top-K candidate instructions one by one according to a four-dimensional scoring system and outputting the first instruction X whose score exceeds 80, the weights of the four-dimensional scoring system being 30% action consistency, 40% object matching degree, 20% core intent coverage and 10% expression equivalence; and judging logic for outputting the instruction and passing it into the safety execution layer if X is consistent with the instruction Y output by the other channel; if X is inconsistent with Y, executing the four-dimensional scoring verification on Y, adopting Y only when Y scores no less than 80 and its similarity to the original semantics is close to that of X, and otherwise selecting X; in step S4, the three-level security verification comprises: first-level permission verification, binding user roles and operation permissions based on an RBAC model, requiring secondary authorization for sensitive or high-risk operations, and returning error code E401 on verification failure; second-level parameter filtering and sanitization, performing boundary correction and unit standardization on numeric parameters, filtering injection-attack special characters from string parameters, and returning error code E402 on abnormal input; and third-level blacklist interception, performing dynamic risk assessment by dual-mode detection of regular matching and semantic analysis combined with the real-time running state of the device, blocking high-risk operation sequences in real time, recording logs, and supporting hot updating of interception rules.
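The four-dimensional weighting and judging logic of claim 6, plus a skeletal three-level check, can be sketched as below. The weights, the 80-point threshold and the E401/E402 codes come from the claim; the injection character set and the "E403" label for the third level are assumptions, and the semantic-similarity test is reduced to a boolean flag.

```python
# Weights from claim 6: action 30%, object 40%, intent 20%, equivalence 10%.
WEIGHTS = {"action": 0.30, "object": 0.40, "intent": 0.20, "equivalence": 0.10}

def four_dim_score(subscores: dict) -> float:
    """Weighted sum of per-dimension scores on a 0-100 scale."""
    return sum(WEIGHTS[d] * subscores[d] for d in WEIGHTS)

def arbitrate(x: str, y: str, score_y: float, sim_close: bool) -> str:
    """Judging logic: keep X unless Y also scores >= 80 and stays
    semantically close to the original input (sim_close)."""
    if x == y:
        return x
    return y if (score_y >= 80 and sim_close) else x

def security_check(user_roles, required_role, params, blacklist, command) -> str:
    if required_role not in user_roles:
        return "E401"                       # level 1: RBAC permission check
    if any(c in "|;`$" for p in params for c in p):
        return "E402"                       # level 2: injection-char filtering
    if any(b in command for b in blacklist):
        return "E403"                       # level 3: blacklist interception
    return "OK"
```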
- 7. The method of claim 1, wherein in step S4, after the fuzzy arbitration layer outputs a final instruction, the method further comprises: invoking a platform-specific instruction mapping table, whose fields comprise instruction ID, function description, target-platform command, parameter range, required permission, execution timeout and remarks, the target platforms comprising openKylin, Windows and Ubuntu; converting the normalized instruction string into an executable command or API call of the corresponding platform by querying the mapping table; and triggering the preset information feedback of step S5 if the mapping table has no corresponding entry.
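A lookup against the mapping table of claim 7 might look like the following; the rows and field subset shown are hypothetical examples, not the patent's actual table.

```python
# Hypothetical rows of the platform mapping table:
# (instruction ID, target platform) -> command plus a few of the claimed fields.
MAPPING_TABLE = {
    ("FS_LIST", "ubuntu"):    {"command": "ls -l", "permission": "user", "timeout_s": 5},
    ("FS_LIST", "windows"):   {"command": "dir",   "permission": "user", "timeout_s": 5},
    ("FS_LIST", "openKylin"): {"command": "ls -l", "permission": "user", "timeout_s": 5},
}

def map_instruction(instr_id: str, platform: str):
    """Resolve a normalized instruction to a platform command,
    or None to trigger the preset feedback of step S5."""
    row = MAPPING_TABLE.get((instr_id, platform))
    return None if row is None else row["command"]
```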
- 8. The method of claim 1, wherein feeding back to the user in step S5 according to the execution result of the safety execution layer or the classification result of the intent coarse-screening layer to form an interactive closed loop comprises: when the instruction is successfully executed by the safety execution layer, feeding back the execution result, and adding the desensitized mapping pair of the user's original input and the normalized instruction to a local semantic vector library or rule template library; when the instruction fails to execute, feeding back the specific failure reason, the failure reasons comprising security verification failure, no corresponding entry in the mapping table, and an error in the underlying executor; and when the intent coarse-screening layer judges that the intent confidence is too low, generating a confirmation question with explicit options through the lightweight model, interacting with the user in text, voice or graphical form, and after user confirmation, jumping back to the fuzzy arbitration layer or directly entering the safety execution layer.
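The write-back step of claim 8 (storing a desensitized input/instruction pair after a successful execution) can be sketched as below; the masking rules are invented placeholders, since the patent does not specify how desensitization is performed.

```python
import re

def desensitize(text: str) -> str:
    """Mask path-like and numeric fragments before storing (hypothetical rules)."""
    text = re.sub(r"(/[\w.-]+)+", "<path>", text)
    return re.sub(r"\d+", "<num>", text)

# Stand-in for the local semantic vector / rule template library.
semantic_library = []

def on_success(raw_input: str, normalized: str) -> None:
    # Claim 8: add the desensitized (original input, normalized instruction)
    # pair back into the local library, closing the interactive loop.
    semantic_library.append((desensitize(raw_input), normalized))
```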
- 9. The method of claim 3, wherein the regular expression template set is constructed for each normalized instruction string and comprises patterns with fixed structures; and after the natural language expressions of the semantic vector template set are expanded with synonyms, millisecond-level approximate nearest-neighbor search is performed through the hybrid index of Milvus 2.0.
- 10. A system for converting natural language instructions into operating system commands, the system comprising: a first system module for deploying a lightweight model and constructing a task knowledge base: deploying a lightweight large model in a local environment, and constructing offline a normalized instruction template library containing a regular expression template set and a semantic vector template set, to provide basic support for a layered processing architecture; a second system module for obtaining user input, executing preprocessing, and outputting normalized preprocessed text, the preprocessed text serving as input data of the layered processing architecture; a third system module for executing processing by means of the intent coarse-screening layer and the quick-matching layer of the layered processing architecture, specifically comprising: performing intent classification on the preprocessed text based on the lightweight model; if the preprocessed text is judged to be a system operation request, converting it into a normalized instruction string, executing quick matching in the task knowledge base through a hybrid mode, and executing routing according to matching confidence; a fourth system module for executing processing by means of the fuzzy arbitration layer and the safety execution layer of the layered processing architecture, specifically comprising: starting dual-channel parallel reasoning arbitration on candidate instructions falling in a fuzzy interval, mapping them into specific executable commands of a target operating system after three-level security verification, and executing the commands; and a fifth system module for feeding back to the user according to the execution result of the safety execution layer or the classification result of the intent coarse-screening layer, forming an interactive closed loop; wherein the layered processing architecture comprises the intent coarse-screening layer, the quick-matching layer, the fuzzy arbitration layer and the safety execution layer, and the information flows of all levels are transmitted through the second to fourth system modules: the preprocessed text output by the second system module is classified by the intent coarse-screening layer and then passed into the quick-matching layer; the matching results of the quick-matching layer are routed to the fuzzy arbitration layer according to confidence or directly enter the safety execution layer; and both the execution results of the safety execution layer and the arbitration results of the fuzzy arbitration layer feed into the feedback link of the fifth system module.
Description
Method and system for converting natural language instruction into operating system command

Technical Field

The embodiments of the present application relate to the technical field of natural language processing and operating-system interaction, and in particular to a method and a system for converting natural language instructions into operating system commands.

Background

With the rapid development of artificial intelligence technology, users place ever higher demands on the naturalness and efficiency of human-machine interaction, expecting to control a desktop operating system directly through fuzzy and diverse natural language to execute complex tasks such as file management, application control and system settings. The core of this interaction is to accurately and efficiently parse and map a user's natural language expression into a structured, semantically accurate operating system command (such as a shell command or system API call), realizing seamless conversion from natural language instruction to system-executable operation. However, in desktop environments and resource-constrained local terminals (e.g., edge computing devices, Internet-of-Things terminals, low-end AI PCs), achieving reliable, responsive and secure natural language interaction still faces the technical bottlenecks exemplified below.

Most existing schemes rely on cloud-hosted large models for complex semantic understanding. Although this guarantees a degree of understanding precision, network latency and the risk of user privacy leakage are non-negligible, such schemes are entirely unavailable offline, and they cannot serve local control scenarios with high real-time and data-security requirements; if large models are instead deployed directly on the terminal, they are limited by the terminal's computing capacity, memory and power budget, making low-latency response difficult. To adapt to terminal resources, the prior art often adopts a heavily pruned lightweight model, but the significant reduction in parameter scale leads to a significant reduction in semantic understanding and generalization capability: intent recognition accuracy on diverse, colloquial and ambiguous natural language expressions is low, rich user instruction scenarios cannot be covered effectively, and user experience suffers. Another class of schemes adopts rule matching based on keywords or fixed templates, which is fast and low-power but can only process a limited set of preset expressions; its flexibility and adaptability to synonymous substitution and colloquial expression are extremely poor, its coverage (recall) is extremely low, it cannot handle complex or multi-step compound instructions, and it struggles with users' personalized expression habits.

In summary, prior-art solutions fail to systematically address the combined requirements of efficiency, accuracy, low power consumption and local security for natural language instruction conversion in resource-constrained environments, and a method for converting natural language instructions into operating system commands that is designed for such environments and combines high-precision semantic understanding, low-latency response and high security is needed.

Disclosure of Invention

The present invention addresses the above problems of the prior art, namely that existing solutions fail to systematically address the combined demands of efficiency, accuracy, low power consumption and local security for natural language instruction conversion in resource-constrained environments. The technical scheme is as follows: in one aspect, a method for converting natural language instructions into operating system commands is provided, the method comprising: step S1, deploying a lightweight model and constructing a task knowledge base: deploying a lightweight large model in a local environment, and constructing offline a normalized instruction template library containing a regular expression template set and a semantic vector template set, to provide basic support for a layered processing architecture; step S2, obtaining user input, executing preprocessing, and outputting normalized preprocessed text, the preprocessed text serving as input data of the layered processing architecture; step S3, executing processing by means of an intent coarse-screening layer and a quick-matching layer of the layered processing architecture, specifically comprising: performing intent classification on the preprocessed text based on the lightweight model; if the preprocessed text is judged to be a system operation r