CN-121979805-A - Mobile terminal application automatic dial testing method and system based on intelligent agent
Abstract
The invention provides a mobile terminal application automatic dial testing method and system based on an intelligent agent, belonging to the technical field of automatic testing of mobile terminal applications. The method is executed by an intelligent agent at the control terminal and comprises the following steps: acquiring the current interface image of the mobile terminal in real time; performing multi-modal analysis on the interface image, detecting the UI components in the interface and extracting the bounding box and category label of each component, extracting the text content in the interface, and associating and fusing the UI components with the text content into a structured UI representation; inputting this representation, a preset task instruction and the history operation memory into a large language model, which generates the next operation instruction in a sequential decision-making manner; executing the next operation instruction by the mobile terminal to complete the automatic interaction operation; and then returning to the step of acquiring the current interface image and executing the next round of operation until the task completion or termination condition is met. Based on the method, a corresponding system is also provided. The invention realizes high-coverage and robust automatic dial testing of mobile terminal applications.
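The perceive-decide-act loop summarized in the abstract can be sketched as follows. This is a minimal illustration, not the disclosed implementation: all names (`capture`, `parse_ui`, `decide`, `execute`, the `"DONE"` sentinel, the step limit) are hypothetical placeholders for the components the patent describes.

```python
# Sketch of the patent's loop: capture screen -> multi-modal parse ->
# LLM decision -> execute on the mobile terminal -> repeat until done.
# All callables are hypothetical stand-ins supplied by the caller.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class AgentLoop:
    capture: Callable[[], str]                 # current interface image
    parse_ui: Callable[[str], Dict]            # image -> structured UI representation
    decide: Callable[[Dict, str, List[str]], str]  # (UI repr, task, memory) -> action
    execute: Callable[[str], None]             # send action to the mobile terminal
    max_steps: int = 20                        # termination condition (timeout stand-in)

    def run(self, task: str) -> List[str]:
        memory: List[str] = []                 # history operation memory
        for _ in range(self.max_steps):
            ui = self.parse_ui(self.capture())
            action = self.decide(ui, task, memory)
            if action == "DONE":               # task completion condition
                break
            self.execute(action)
            memory.append(action)
        return memory
```

A caller would wire in real screenshot capture, detection/OCR fusion, and an LLM client; with stubs, the loop simply accumulates the decided actions until the model signals completion.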
Inventors
- ZHANG BEICHEN
- ZHANG WEIGANG
- QI ZHAOBO
Assignees
- 哈尔滨工业大学(威海) (Harbin Institute of Technology, Weihai)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-04-09
Claims (10)
- 1. A mobile terminal application automatic dial testing method based on an intelligent agent, characterized in that it is executed by an intelligent agent of a control terminal and comprises the following steps: acquiring a current interface image of the mobile terminal in real time; performing local multi-modal analysis on the current interface image: detecting the UI components in the interface, extracting the bounding box and category label of each component, extracting the text content in the interface through optical character recognition, and associating and fusing the UI components with the text content to obtain a structured UI representation; inputting the structured UI representation, a preset task instruction and the history operation memory together into a large language model, which generates the next operation instruction in a sequential decision-making manner; executing the next operation instruction by the mobile terminal to complete the automatic interaction operation; and then returning to the step of acquiring the current interface image of the mobile terminal in real time, and continuing with the next round of operation until the task completion condition or the trigger termination condition is met.
- 2. The intelligent agent-based mobile terminal application automatic dial testing method according to claim 1, wherein the local multi-modal analysis of the current interface image is specifically: performing lightweight preprocessing on the current interface image to obtain a preprocessed interface image; performing UI component detection on the preprocessed interface image with a target detection algorithm, identifying each UI component in the interface, and extracting the bounding box and category label of each component; performing optical character recognition on the preprocessed interface image, and extracting the text content and corresponding text position information in the interface; and, according to the bounding boxes of the components and the text position information, associating and fusing the recognized text content with the corresponding UI components, attaching to each UI component the text content that belongs to it, so as to form a structured UI representation.
- 3. The intelligent agent-based mobile terminal application automatic dial testing method according to claim 2, wherein the structured UI representation is expressed as: S_t = {e_i = (b_i, c_i, t_i, a_i)}_{i=1}^{N}; wherein S_t denotes the structured UI representation of the interface at time t; e_i denotes the i-th UI element of the current interface; b_i denotes the center coordinates of the element's bounding box; c_i denotes its semantic category label; t_i denotes its text content; a_i denotes its interactability attribute; and N denotes the total number of UI elements.
- 4. The intelligent agent-based mobile terminal application automatic dial testing method according to claim 1, wherein the large language model generates the next operation instruction in a sequential decision-making manner, specifically: analyzing the category label and text content of each UI element in the current structured UI representation, and identifying the functional attribute of each UI element, so as to form a semantic understanding result of the current interface; taking the semantic understanding result of the current interface, the preset task instruction T and the UI representation S_t at time t together as contextual input, and computing with the large language model the conditional probability distribution P(a | S_t, T) of each candidate operation a in the action space A; and selecting the candidate operation a* with the maximum conditional probability as the output next operation instruction, satisfying: a* = argmax_{a ∈ A} P(a | S_t, T); wherein S_t is the UI representation at the current time t.
- 5. The intelligent agent-based mobile terminal application automatic dial testing method according to claim 4, wherein the operation instruction comprises one or more of: clicking a designated coordinate, sliding from a start point to an end point, inputting designated text, returning to the previous interface, and returning to the main interface.
- 6. The intelligent agent-based mobile terminal application automatic dial testing method according to claim 4, further comprising a state anomaly detection step before the mobile terminal executes the next operation instruction: inputting the current interface image into a multi-modal large model for state judgment, and detecting whether an abnormal popup window or interface freeze exists in the current interface, to obtain an anomaly detection result d_t; and determining the operation instruction to be executed according to the anomaly detection result: a_t = a_rec if d_t indicates an anomaly, and a_t = a_next otherwise; wherein a_rec denotes the recovery operation instruction and a_next denotes the generated next operation instruction.
- 7. The intelligent agent-based mobile terminal application automatic dial testing method according to claim 6, wherein the state anomaly detection step further comprises task timeout control, specifically: when the execution duration exceeds a preset threshold, the current task is forcibly terminated and the next task is switched to.
- 8. The intelligent agent-based mobile terminal application automatic dial testing method according to claim 1, further comprising: recording the operation type, operation parameters, timestamp, interface change information and execution result of each operation to form an operation record; when an anomaly occurs, recording the anomaly type and the corresponding recovery operation information to form an anomaly record; and generating a dial testing report from the operation record and the anomaly record.
- 9. The intelligent agent-based mobile terminal application automatic dial testing method according to claim 1, wherein the task completion condition is completion of all operation steps corresponding to the preset task instruction; and the trigger termination condition includes a task execution timeout, detection of an unrecoverable anomaly, or receipt of an external stop instruction.
- 10. A mobile terminal application automatic dial testing system based on an intelligent agent, characterized by comprising: a control terminal, on which an intelligent agent is deployed, the intelligent agent being used to acquire the current interface image of the mobile terminal in real time; perform local multi-modal analysis on the current interface image, detecting the UI components in the interface and extracting the bounding box and category label of each component, while extracting the text content in the interface through optical character recognition and associating and fusing the UI components with the text content into a structured UI representation; input the structured UI representation, a preset task instruction and the history operation memory together into a large language model, which generates the next operation instruction in a sequential decision-making manner; have the mobile terminal execute the next operation instruction to complete the automatic interaction operation; and return to the step of acquiring the current interface image of the mobile terminal in real time, continuing with the next round of operation until the task completion condition or the trigger termination condition is met; and a mobile terminal, in communication connection with the control terminal, used to execute automatic interaction operations in response to the operation instructions sent by the intelligent agent and to provide interface images for the intelligent agent to acquire.
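As a rough illustration of the association-fusion step of claim 2 and the action selection of claim 4, the sketch below attaches OCR text to detected components by bounding-box containment of the text center, then picks the action with maximum probability (a* = argmax P(a | S_t, T)). The data shapes, the containment rule, and the pre-computed score dictionary are illustrative assumptions, not the claimed method.

```python
# Illustrative fusion of detected UI components with OCR text (claim 2)
# and argmax action selection over a candidate action space (claim 4).
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def fuse(components: List[Dict], ocr: List[Dict]) -> List[Dict]:
    """Attach each OCR fragment to the component whose box contains its center."""
    fused = []
    for comp in components:
        x1, y1, x2, y2 = comp["box"]
        texts = [
            t["text"] for t in ocr
            if x1 <= (t["box"][0] + t["box"][2]) / 2 <= x2
            and y1 <= (t["box"][1] + t["box"][3]) / 2 <= y2
        ]
        # component + its associated text = one element of the structured UI repr
        fused.append({**comp, "text": " ".join(texts)})
    return fused

def select_action(scores: Dict[str, float]) -> str:
    """Pick the candidate action with maximum conditional probability."""
    return max(scores, key=scores.get)
```

In the full system the scores would come from the large language model conditioned on the fused representation and the task instruction; here they are passed in directly to keep the sketch self-contained.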
Description
Mobile terminal application automatic dial testing method and system based on intelligent agent

Technical Field

The invention belongs to the technical field of automatic testing of mobile terminal applications, and particularly relates to a mobile terminal application automatic dial testing method and system based on an intelligent agent.

Background

With the rapid development of the mobile internet, the number and complexity of smartphone applications continue to rise: application interaction logic is more diversified, interface structures are more complex, and the frequency of version iteration has increased markedly. In order to test applications and collect application traffic, the demand for automated testing of mobile-side applications is becoming increasingly pressing. However, the mainstream automated testing schemes in the current industry still have significant limitations: they struggle to meet the testing requirements of multi-scenario, multi-type applications, and to support the high-frequency dial testing and continuous monitoring tasks brought about by the rapid evolution of the application ecosystem. Traditional script-based automation relies on manually programmed test scripts or recorded operation steps. Such methods are highly dependent on a specific interface structure; once the application is upgraded, its layout adjusted, or its interaction logic changed, the scripts are prone to failure, require extensive manual maintenance, are costly, and are difficult to scale. UI automation frameworks based on the control tree can operate controls directly, but still depend on explicit control IDs, hierarchy information and the like; for reasons such as interface performance, many commercial applications exhibit dynamically changing controls, control obfuscation, and even custom-drawn controls, which such frameworks struggle to identify, causing dial testing to fail.
Furthermore, such methods still rely on static rules and lack the ability to understand task intent or autonomously plan operation steps. The graphical user interface (Graphical User Interface, GUI) has long been the core of human-computer interaction, providing users with an intuitive, visually driven way to access and operate digital systems. Conventional automatic GUI interaction relies on script-based or rule-based methods, such as Monkey testing, which finds potential problems by generating random input operations on the interface and is commonly used for robustness testing of mobile apps. Rule-based methods, in turn, generate goal-directed operation sequences by explicitly modeling GUI state-transition logic. Although such systems improve the degree of test automation, they suffer from poor flexibility, weak generalization capability and heavy reliance on manual effort, and are difficult to adapt to dynamically changing interface content and complex user tasks. With the wide application of large language models and multi-modal models, intelligent interactive systems based on intelligent agents have gradually become an important direction of intelligent automation. An agent is an intelligent entity capable of sensing its environment, understanding task intent, making autonomous decisions and executing operations. A multi-modal intelligent agent can recognize the visual elements of an interface, understand text and task intent, and generate executable operation steps by reasoning, so that it can operate a mobile phone application as a person would. Its essential advantage is that it frees itself from dependence on control trees and script rules, achieving a higher level of intelligence across the four links of perception, understanding, decision and operation.
A GUI agent is an intelligent agent system that simulates the operation behavior of a human user and completes intelligent interaction with, and task execution on, a graphical interface by means of clicking, sliding and text input. By sensing the visual elements on the screen, the agent can understand the interface layout and semantics, and make decisions and operations accordingly, thereby achieving autonomous control of application programs on platforms such as the desktop and mobile platforms. However, existing schemes rely on cloud-hosted large models to process images directly, and suffer from problems such as high cost, large latency and poor stability. Therefore, the art needs a new automatic dial testing mechanism that can automatically understand interface content, autonomously generate task operation steps, and stably execute a full-flow task chain, realizing realistic usage-scenario simulation and reliable dial testing of mobile phone applications without dependence on scripts or controls.

Disclosure of Invention

In order to solve the above technical problems, the invention provides a mobil