CN-122019305-A - Remote browser monitoring and intervention method and system based on real-time video stream

Abstract

The invention belongs to the technical field of browser control and discloses a remote browser monitoring and intervention method and system based on a real-time video stream. The method comprises: processing a detected natural language instruction to obtain a standardized command; capturing and encoding the current display picture of a browser to obtain a video to be transmitted; classifying the importance of the current display picture frame using a pre-configured importance classification model; dynamically adjusting the transmission parameters of the picture frame according to the importance classification result; and performing binary conversion on the video to be transmitted, transmitting it, and displaying it on a mobile terminal in real time. The invention reduces the user's dependence on specific operation flows and browser operation details, so that the user can complete remote operations without mastering a specific scripting language, which improves the usability of the system. The user can also perceive the execution state of the remote browser directly from the video picture and intervene promptly when an abnormality is found, which improves the reliability of remote browser monitoring and operation.

Inventors

  • DAI GUANHONG
  • ZHAO LEI
  • WANG CHAO

Assignees

  • 北京赖耶信息科技有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-29

Claims (10)

  1. A remote browser monitoring and intervention method based on a real-time video stream, comprising: acquiring, in real time through a client, a natural language instruction input at a mobile terminal, and detecting the natural language instruction; selecting either local processing or processing by a preset artificial intelligence model for the detected natural language instruction to obtain a standardized command; controlling a browser in a server to execute the corresponding operation according to the standardized command, and capturing and encoding the current display picture of the browser to obtain a video to be transmitted; during capture of the current display picture, classifying the importance of the current display picture frame using a pre-configured importance classification model, and dynamically adjusting the transmission parameters of the picture frame according to the importance classification result; and performing binary conversion on the video to be transmitted, transmitting it, and displaying it on the mobile terminal in real time.
  2. The remote browser monitoring and intervention method based on a real-time video stream according to claim 1, wherein acquiring, in real time through the client, the natural language instruction input at the mobile terminal, and detecting the natural language instruction, comprises: acquiring in real time a natural language instruction input by a user at the mobile terminal; and performing keyword detection on the natural language instruction using a preset keyword feature library, classifying the natural language instruction as a simple instruction if it contains no artificial intelligence keyword, and otherwise classifying it as a complex instruction.
  3. The remote browser monitoring and intervention method based on a real-time video stream according to claim 1, wherein performing keyword detection on the natural language instruction using the preset keyword feature library, classifying the natural language instruction as a simple instruction if it contains no artificial intelligence keyword, and otherwise classifying it as a complex instruction, comprises: constructing the keyword feature library from multi-step operation keywords, auxiliary request words, analysis keywords, and intelligent judgment words.
  4. The remote browser monitoring and intervention method based on a real-time video stream according to claim 1, wherein selecting either local processing or processing by a preset artificial intelligence model for the detected natural language instruction to obtain a standardized command comprises: for a simple instruction, traversing a preset regular expression rule base to find a match, extracting the key parameters from the matched simple instruction, and converting the key parameters into a standardized command; and for a complex instruction, forwarding it to the preset artificial intelligence model, identifying the intention of the complex instruction using the artificial intelligence model, decomposing the identified complex instruction into a plurality of atomic operations, and standardizing each atomic operation to obtain standardized commands.
  5. The remote browser monitoring and intervention method based on a real-time video stream according to claim 1, wherein controlling the browser in the server to execute the corresponding operation according to the standardized command, and capturing and encoding the current display picture of the browser to obtain the video to be transmitted, comprises: taking screenshots of the content rendered in the browser in real time through a preset screenshot interface, and continuously outputting the corresponding picture frames; extracting visual features from the current display picture of the browser, the visual features being used to judge whether the standardized command has been executed successfully; and inputting the continuously output picture frames into a preset video encoder for encoding to obtain the video to be transmitted.
  6. The remote browser monitoring and intervention method based on a real-time video stream according to claim 5, wherein taking screenshots of the content rendered in the browser in real time through the preset screenshot interface, and continuously outputting the corresponding picture frames, comprises: continuously acquiring the picture frame currently rendered by the browser according to a preset cyclic acquisition period; at the start of each acquisition period, reading a preset backpressure flag and detecting whether acquisition of the previous picture frame has completed; when it is detected that acquisition of the previous picture frame has not completed, skipping the current acquisition period and proceeding directly to the next acquisition period; and when it is detected that acquisition of the previous picture frame has completed, capturing the page currently rendered by the browser using the screenshot interface and outputting it as a picture frame.
  7. The remote browser monitoring and intervention method based on a real-time video stream according to claim 5, wherein extracting visual features from the current display picture of the browser, for judging whether the standardized command has been executed successfully, comprises: before executing the standardized command, obtaining a visual snapshot of the current display picture of the browser and reducing the resolution of the snapshot; generating corresponding visual fingerprint features for the processed visual snapshot using a perceptual hash algorithm; detecting the bounding box, center coordinates, current scroll position, and viewport range of the interactable elements in the visual snapshot, and storing a webpage structure snapshot of the interactable elements; after the standardized command is executed, obtaining a visual snapshot of the browser after execution, and performing multi-dimensional verification against the visual snapshot taken before execution; and if the verification fails, determining that the standardized command was not executed successfully, and automatically selecting a repair strategy and/or triggering manual intervention according to the abnormality type obtained from the verification.
  8. The remote browser monitoring and intervention method based on a real-time video stream according to claim 1, wherein, during capture of the current display picture, classifying the importance of the current display picture frame using the pre-configured importance classification model, and dynamically adjusting the transmission parameters of the picture frame, comprises: dividing the current display picture frames into key frames, active frames, and static frames using the importance classification model, and optimizing the key frames, active frames, and static frames using an incremental perceptual hash algorithm; constructing a priority queue according to the key frames, active frames, and static frames, wherein the priority queue comprises a high priority, a medium priority, and a low priority; and detecting semantic events of the user in real time through the browser, and dynamically adjusting the transmission parameters of the picture frames according to the priority queue.
  9. The remote browser monitoring and intervention method based on a real-time video stream according to claim 8, wherein detecting semantic events of the user in real time through the browser, and dynamically adjusting the transmission parameters of the picture frames according to the semantic events in combination with the pre-constructed priority queue, comprises: when a key semantic event is detected, stopping encoding of the current frame, capturing the current browser picture frame with preset high-quality parameters, taking it as a key frame, inserting the key frame into the high-priority queue for transmission, and triggering a visual prompt at the mobile terminal; when the browser picture changes from static to dynamic, taking the picture frames during the change as active frames and placing them in the medium-priority queue for transmission; and when the browser picture is still, marking the corresponding picture frames as static frames and placing them in the low-priority queue for transmission at a reduced transmission frequency and/or a reduced encoding quality.
  10. A remote browser monitoring and intervention system based on a real-time video stream, comprising: an instruction detection module, configured to acquire, in real time through a client, a natural language instruction input at a mobile terminal, and to detect the natural language instruction; a standardization module, configured to select either local processing or processing by a preset artificial intelligence model for the detected natural language instruction to obtain a standardized command; a picture acquisition module, configured to control a browser in a server to execute the corresponding operation according to the standardized command, and to capture and encode the current display picture of the browser to obtain a video to be transmitted; a picture adjustment module, configured to classify the importance of the current display picture frame using a pre-configured importance classification model during capture of the current display picture, and to dynamically adjust the transmission parameters of the picture frame according to the importance classification result; and a video transmission module, configured to perform binary conversion on the video to be transmitted, transmit it, and display it on the mobile terminal in real time.
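
The frame prioritization recited in claims 8 and 9 can be illustrated with a minimal Python sketch. It is not the patented implementation: the rule-based `classify_frame` function stands in for the importance classification model, and the names `FrameQueue`, `QueuedFrame`, and `transmission_params`, as well as the quality and frame-rate values, are hypothetical and chosen only for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Lower number = transmitted first (high, medium, low priority).
PRIORITY = {"key": 0, "active": 1, "static": 2}


@dataclass(order=True)
class QueuedFrame:
    priority: int
    seq: int  # arrival order, keeps frames of equal priority in FIFO order
    kind: str = field(compare=False)
    payload: bytes = field(compare=False)


def classify_frame(key_semantic_event: bool, picture_changed: bool) -> str:
    """Rule-based stand-in for the importance classification model (assumption)."""
    if key_semantic_event:
        return "key"
    return "active" if picture_changed else "static"


def transmission_params(kind: str) -> dict:
    """Hypothetical per-class parameters: static frames get lower quality and rate."""
    table = {
        "key": {"quality": 90, "fps": 15},
        "active": {"quality": 70, "fps": 15},
        "static": {"quality": 40, "fps": 1},
    }
    return table[kind]


class FrameQueue:
    """Priority queue that always releases the most important frame first."""

    def __init__(self) -> None:
        self._heap: list[QueuedFrame] = []
        self._counter = itertools.count()

    def push(self, kind: str, payload: bytes) -> None:
        frame = QueuedFrame(PRIORITY[kind], next(self._counter), kind, payload)
        heapq.heappush(self._heap, frame)

    def pop(self) -> QueuedFrame:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)
```

In this sketch a key frame pushed after several static frames is still popped first, and frames of the same class leave the queue in arrival order because of the sequence counter.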

Description

Remote browser monitoring and intervention method and system based on real-time video stream

Technical Field

The invention relates to the technical field of browser control, and in particular to a remote browser monitoring and intervention method and system based on a real-time video stream.

Background

With the rapid development of internet services, posts that require frequent browser operation to obtain data involve time-consuming and labor-intensive work. The data must also be monitored continuously and handled promptly once an issue is found, so staff are often required to keep operating outside working hours or across regions and to respond to abnormal conditions in time. The existing manual approach depends on staff being on duty for long periods, is inefficient, and struggles to meet the needs of mobile office work and remote monitoring. To improve efficiency, browser automation schemes such as Selenium, Puppeteer, and Playwright, as well as natural language browser control techniques based on large language models, are increasingly used in the prior art.

Although these techniques can automate operations to some extent, they mainly target technicians with programming skills, and the threshold for ordinary business personnel to learn and use them is high. Most of these schemes also run in a script-driven or instruction-driven black-box mode and lack a visual display of the browser's actual operation, so the user cannot confirm the execution state in real time; once a deviation occurs, it can only be investigated afterwards and is difficult to correct in time. In addition, existing browser automation schemes usually run in a local environment and lack remote real-time monitoring, which makes it difficult to view and manage long-running tasks remotely. Even when combined with remote desktop or screen sharing technology, problems remain, such as high bandwidth consumption, high latency, inflexible interaction, and difficulty integrating deeply with automated control.

Therefore, how to provide a remote browser monitoring and intervention method and system based on a real-time video stream is a problem to be solved.

Disclosure of Invention

Embodiments of the invention provide a remote browser monitoring and intervention method and system based on a real-time video stream, to solve the problems in the prior art.

According to a first aspect of an embodiment of the present invention, a remote browser monitoring and intervention method and system based on a real-time video stream are provided.

In one embodiment, the remote browser monitoring and intervention method based on a real-time video stream comprises: acquiring, in real time through a client, a natural language instruction input at a mobile terminal, and detecting the natural language instruction; selecting either local processing or processing by a preset artificial intelligence model for the detected natural language instruction to obtain a standardized command; controlling a browser in a server to execute the corresponding operation according to the standardized command, and capturing and encoding the current display picture of the browser to obtain a video to be transmitted; during capture of the current display picture, classifying the importance of the current display picture frame using a pre-configured importance classification model, and dynamically adjusting the transmission parameters of the picture frame according to the importance classification result; and performing binary conversion on the video to be transmitted, transmitting it, and displaying it on the mobile terminal in real time.

In one embodiment, acquiring, in real time through the client, the natural language instruction input at the mobile terminal, and detecting the natural language instruction, comprises: acquiring in real time a natural language instruction input by a user at the mobile terminal; and performing keyword detection on the natural language instruction using a preset keyword feature library, classifying the natural language instruction as a simple instruction if it contains no artificial intelligence keyword, and otherwise classifying it as a complex instruction.
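
The instruction detection and routing described in this embodiment can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the four keyword categories and the regular expression rule base for simple instructions come from the claims, but every concrete keyword, every regular expression, the dictionary format of the standardized command, and the `unmatched` fallback for simple instructions that match no rule are hypothetical.

```python
import re

# Hypothetical keyword feature library. The four categories follow the claims;
# the keywords themselves are illustrative assumptions.
KEYWORD_FEATURE_LIBRARY = {
    "multi_step_operation": ["and then", "after that", "step by step"],
    "auxiliary_request": ["help me", "figure out"],
    "analysis": ["analyze", "summarize", "compare"],
    "intelligent_judgment": ["decide", "whether"],
}

# Hypothetical regular expression rule base for simple instructions.
RULES = [
    (re.compile(r"^(?:open|go to)\s+(?P<url>\S+)$", re.IGNORECASE), "navigate"),
    (re.compile(r"^click\s+(?P<target>.+)$", re.IGNORECASE), "click"),
    (re.compile(r"^scroll\s+(?P<direction>up|down)$", re.IGNORECASE), "scroll"),
]


def classify(instruction: str) -> str:
    """Return 'complex' if any library keyword occurs, otherwise 'simple'."""
    text = instruction.lower()
    for keywords in KEYWORD_FEATURE_LIBRARY.values():
        if any(keyword in text for keyword in keywords):
            return "complex"
    return "simple"


def route(instruction: str) -> dict:
    """Route an instruction to local regex processing or to the AI model."""
    if classify(instruction) == "complex":
        # A real system would forward this to the preset AI model.
        return {"route": "ai_model", "instruction": instruction}
    for pattern, action in RULES:
        match = pattern.match(instruction.strip())
        if match:
            # Key parameters extracted by the named groups of the rule.
            return {"route": "local", "action": action, "params": match.groupdict()}
    # Assumption: the source does not say what happens when no rule matches.
    return {"route": "unmatched", "instruction": instruction}
```

For example, `route("open example.com")` is handled locally as a `navigate` command, while `route("Log in and then summarize the orders")` is marked for the AI model because it contains the multi-step keyword "and then".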

In one embodiment, performing keyword detection on the natural language instruction using the preset keyword feature library, classifying the natural language instruction as a simple instruction if it contains no artificial intelligence keyword, and otherwise classifying it as a complex instruction, comprises: constructing the keyword feature library from multi-step operation keywords, auxiliary request words, analysis keywords, and intelligent judgment words.

In one embodiment, selecting local processing or processing by