
KR-20260067721-A - SYSTEM AND METHOD FOR AUTOMATED WEB NAVIGATION AND USABILITY EVALUATION USING MULTIMODAL LARGE LANGUAGE MODELS


Abstract

The present invention relates to a system and method for automating web navigation and usability evaluation using a multimodal large language model. The system according to the present invention comprises an input data setting module that collects data from a web page; an automated web navigation module that performs actions within the web page using the multimodal large language model; a usability evaluation module that runs tests on user scenarios and evaluates the usability of the web page; and an action log storage module that stores the log data generated during web navigation and usability evaluation.

Inventors

  • 김도균
  • 손준성
  • 김수윤
  • 이승현

Assignees

  • 주식회사 인핸스

Dates

Publication Date: 2026-05-13
Application Date: 2024-11-06

Claims (11)

  1. A web navigation and usability evaluation automation system using a multimodal large language model, comprising: an input data setting module that collects data from web pages; an automated web navigation module that performs actions within a web page using the multimodal large language model; a usability evaluation module that runs tests on user scenarios and evaluates the usability of the web page; and an action log storage module that stores log data generated during the web navigation and usability evaluation processes.
  2. The system of claim 1, wherein the input data setting module captures image data and the HTML source code of the web page and provides them as input to the multimodal large language model.
  3. The system of claim 2, wherein the input data setting module collects multiple screenshots according to screen size and resolution, and analyzes the HTML source code to extract the elements required for navigation.
  4. The system of claim 1, wherein the automated web navigation module interacts with the interface of the web page using the multimodal large language model and, when an action causes navigation to another web page, receives new data from the input data setting module.
  5. The system of claim 4, wherein the automated web navigation module analyzes the current state of the web page, generates an action plan for achieving the user's goal, and generates feedback information by analyzing the action plan, the action, the reason for selecting the action, and the resulting changes within the page.
  6. The system of claim 5, wherein the automated web navigation module stores a history of the action plans, actions, reasons for action selection, and feedback.
  7. The system of claim 1, wherein the usability evaluation module defines criteria for evaluating the usability of the web page and measures the performance of the web page against those criteria.
  8. A method for automating web navigation and usability evaluation using a multimodal large language model, performed by a web navigation and usability evaluation automation system using a multimodal large language model, the method comprising: (a) collecting and preprocessing data from web pages according to a test scenario; (b) automatically performing actions within the web page; and (c) evaluating the usability of the web page and storing log data.
  9. The method of claim 8, wherein step (a) comprises collecting the data, including image data and the HTML source code of the web page.
  10. The method of claim 8, wherein step (b) comprises using the multimodal large language model to analyze the current state of the web page, generate an action plan, execute the action, and generate feedback information from the resulting page-change information.
  11. The method of claim 8, wherein step (c) comprises measuring the performance of the web page against criteria for evaluating the usability of the web page.
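Claims 2, 3, and 9 state that the input data setting module analyzes the HTML source code to extract the elements required for navigation, but they do not say which elements qualify or how they are handed to the model. The following is a minimal sketch of that extraction step using only Python's standard `html.parser`. The chosen interactive tags and ARIA roles, the `Element` record, and the `[index] <tag> label` rendering are illustrative assumptions, not the applicant's method. Capturing screenshots at several screen sizes (claim 3) requires a browser driver and is not shown.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, List

# Which tags and ARIA roles count as "necessary for navigation" is an
# illustrative assumption; the patent does not enumerate them.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
INTERACTIVE_ROLES = {"button", "link", "checkbox", "tab", "menuitem"}
VOID_TAGS = {"input"}  # interactive tags that have no closing tag or text


@dataclass
class Element:
    index: int  # extraction order, usable as a short reference in prompts
    tag: str
    attrs: Dict[str, str]
    text: str = ""


class NavigableElementExtractor(HTMLParser):
    """Collects elements a navigation agent could act on (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Element] = []
        self._open: List[Element] = []  # interactive elements still collecting text

    def handle_starttag(self, tag, attrs):
        attr_map = {name: value or "" for name, value in attrs}
        if (tag in INTERACTIVE_TAGS
                or attr_map.get("role") in INTERACTIVE_ROLES
                or "onclick" in attr_map):
            element = Element(index=len(self.elements), tag=tag, attrs=attr_map)
            self.elements.append(element)
            if tag not in VOID_TAGS:
                self._open.append(element)

    def handle_endtag(self, tag):
        # Close the innermost open interactive element when an end tag with the
        # same name arrives (a simplification: nested same-name tags close it early).
        if self._open and self._open[-1].tag == tag:
            self._open.pop()

    def handle_data(self, data):
        chunk = data.strip()
        if chunk:
            # Visible text counts toward every interactive element enclosing it.
            for element in self._open:
                element.text = f"{element.text} {chunk}".strip()


def extract_navigable_elements(html: str) -> List[Element]:
    parser = NavigableElementExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements


def describe(element: Element) -> str:
    """Render one element as a compact prompt line, e.g. "[2] <button> Search"."""
    label = (element.text
             or element.attrs.get("aria-label")
             or element.attrs.get("placeholder")
             or element.attrs.get("name", ""))
    return f"[{element.index}] <{element.tag}> {label}".rstrip()


page = """
<nav><a href="/cart">Cart</a></nav>
<form>
  <input name="q" placeholder="Search products">
  <button type="submit">Search</button>
</form>
<div role="button" aria-label="Open menu"></div>
"""
for element in extract_navigable_elements(page):
    print(describe(element))
```

The numbered lines give the model a compact index that an action such as "click element 2" could refer to; whether the patented system uses such an index is not stated in the source.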

Description

System and Method for Automated Web Navigation and Usability Evaluation Using Multimodal Large Language Models

The present invention relates to a system and method for automating web navigation and usability evaluation using a multimodal large language model.

Conventional script-based web crawling operates on static data, which makes it difficult to accurately recognize and handle dynamic, complex user interfaces. Conventional usability evaluation is performed manually, which consumes considerable time and cost; the collected data also lack consistency and require additional analysis time, making it difficult to improve web pages quickly.

FIG. 1 illustrates a web navigation and usability evaluation automation system using a multimodal large language model according to an embodiment of the present invention.
FIG. 2 illustrates the configuration of the input data setting module according to an embodiment of the present invention.
FIG. 3 illustrates the configuration of the automated web navigation module according to an embodiment of the present invention.
FIG. 4 illustrates the configuration of the usability evaluation module according to an embodiment of the present invention.
FIG. 5 illustrates a method for automating web navigation and usability evaluation using a multimodal large language model according to an embodiment of the present invention.
FIG. 6 is a block diagram of a computer system for implementing the method according to an embodiment of the present invention.

The above and other objectives, advantages, and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings.
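The module arrangement shown in FIGS. 1 to 5, together with the loop described in claims 4 to 8 (collect page data, plan and perform an action, record feedback and history, then evaluate and log), can be sketched as follows. This is a minimal sketch under stated assumptions: the `Browser` protocol, the `Model` callable signature, the action strings, the feedback wording, and the example usability metrics in `evaluate` are all hypothetical, because the patent does not specify them. A real deployment would presumably back `Browser` with a browser automation tool and `Model` with a multimodal LLM call that receives the screenshot and HTML.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional, Protocol, Tuple


@dataclass
class PageState:
    """Input data for the model: page URL, HTML source, and a screenshot."""
    url: str
    html: str
    screenshot: bytes  # raw image bytes; capture itself is out of scope here


@dataclass
class Step:
    """One history entry: action plan, action, reason, and feedback (cf. claims 5-6)."""
    plan: str
    action: str  # hypothetical action syntax, e.g. "click [0]"
    reason: str
    feedback: str = ""


class Browser(Protocol):
    """Hypothetical browser interface standing in for a real automation driver."""
    def snapshot(self) -> PageState: ...
    def perform(self, action: str) -> None: ...


# Hypothetical model interface: returns the next Step, or None once the goal is met.
Model = Callable[[str, PageState, List[Step]], Optional[Step]]


def run_scenario(goal: str, browser: Browser, model: Model,
                 max_steps: int = 10) -> Tuple[bool, List[Step]]:
    """Steps (a)-(b): observe, plan, act, and record feedback until done or out of budget."""
    history: List[Step] = []
    state = browser.snapshot()  # (a) collect input data for the current page
    for _ in range(max_steps):
        step = model(goal, state, history)
        if step is None:  # the model judges the goal to be achieved
            return True, history
        browser.perform(step.action)  # (b) execute the chosen action
        new_state = browser.snapshot()  # re-collect input data after every action
        if new_state.url != state.url:
            step.feedback = f"navigated {state.url} -> {new_state.url}"
        else:
            step.feedback = "stayed on page"
        history.append(step)
        state = new_state
    return False, history


def evaluate(success: bool, history: List[Step], step_budget: int) -> dict:
    """Step (c): score a run against example criteria (illustrative, not from the patent)."""
    return {
        "task_success": success,
        "steps": len(history),
        "within_budget": success and len(history) <= step_budget,
        "page_changes": sum(1 for s in history if s.feedback.startswith("navigated")),
    }


def log_line(goal: str, history: List[Step], metrics: dict) -> str:
    """Serialise one run as a JSON Lines record for the action log storage module."""
    record = {"goal": goal, "steps": [asdict(s) for s in history], "metrics": metrics}
    return json.dumps(record, ensure_ascii=False)


# Usage with a scripted fake browser and model (both hypothetical stand-ins).
class FakeBrowser:
    def __init__(self) -> None:
        self.url = "/home"

    def snapshot(self) -> PageState:
        return PageState(self.url, f"<html>{self.url}</html>", b"")

    def perform(self, action: str) -> None:
        if action == "click [0]":
            self.url = "/cart"


def scripted_model(goal: str, state: PageState, history: List[Step]) -> Optional[Step]:
    if state.url == "/cart":
        return None
    return Step(plan="open the cart", action="click [0]", reason="a cart link is visible")


ok, hist = run_scenario("view my cart", FakeBrowser(), scripted_model)
metrics = evaluate(ok, hist, step_budget=3)
print(log_line("view my cart", hist, metrics))
```

Keeping the model behind a plain callable lets the same loop be exercised with a scripted stand-in, as in the usage lines, before a real multimodal model is wired in.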
However, the present invention is not limited to the embodiments disclosed below and may be implemented in various other forms. The following embodiments are provided merely to inform those skilled in the art of the purpose, structure, and effects of the invention, and the scope of the invention is defined by the claims. The terms used in this specification describe the embodiments and are not intended to limit the invention. In this specification, the singular includes the plural unless the context clearly indicates otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, actions, and/or elements other than those mentioned.

The background of the invention is explained first, followed by a description of a preferred embodiment.

With advances in artificial intelligence and computer vision, research and development aimed at automation and efficiency are under way across many industrial sectors. As demand for automated web page navigation and usability evaluation grows, conventional techniques based on simple script-based web crawling and user simulation have been proposed.

Conventional techniques, however, are limited in their ability to interact with the complex visual elements of web pages. Because they operate on static data such as HTML source code, they have difficulty accurately recognizing and handling dynamic, complex user interfaces. They also fail to process the on-screen images of web pages effectively, leaving gaps in the analysis of, and interaction with, visual elements.
In addition, such techniques are suited only to clearly defined user goals and cannot handle ambiguous commands or complex user scenarios.

Conventional usability evaluation is mostly conducted manually and therefore consumes considerable time and cost. Not only does it require a large workforce, but manually collected data also lack consistency and need additional analysis time, which makes it difficult to improve web pages quickly.

The present invention is proposed to solve these problems. It provides a system and method, based on a multimodal large language model, for automating web page navigation and usability evaluation that can automatically perform various actions on a web page using the page's visual elements and HTML source code as input data.

According to an embodiment of the present invention, using a multimodal large language model enables AI-driven automatic analysis and automatic execution of web actions even for ambiguous goals that the user has not clearly defined. The AI accurately recognizes and processes complex user interfaces by combining image