
CN-115994358-B - Website detection method and system based on rule base


Abstract

The invention belongs to the technical field of network security and provides a website detection method and system based on a rule base. In the method, a program controls a browser to render a website and injects a JavaScript script into the rendered page. The script detects and identifies harmful features exhibited by the website according to a rule base. Websites that hit a rule are preprocessed according to policies in the rule base, and the detection results are aggregated and reported to a harmful information detection and perception platform for further handling by operation and maintenance personnel. This addresses the problem that, under sensitive-word detection, harmful websites can easily evade content review because the sensitive word library is incomplete or because the text is obfuscated with mixed or variant characters.

Inventors

  • XU XIAOQUAN
  • MA CHENGJIE
  • LIN TING
  • ZHENG ZIFAN
  • ZHANG WEIWEI

Assignees

  • 厦门三五互联科技股份有限公司 (Xiamen 35.com Technology Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2022-12-23

Claims (6)

  1. A method for detecting a website based on a rule base, the method comprising: rendering, by a program, the website by controlling a browser; injecting, by the program, a JavaScript script into the rendered website and detecting and identifying harmful features exhibited by the website based on a rule base, wherein the JavaScript script contains the rule base for detecting harmful content and an execution engine and is capable of detecting the harmful features exhibited by the website, and wherein the rule base for detecting harmful content comprises: detecting and identifying variant text characters in the title tag and meta tags of the website; detecting and identifying the characteristic of a website using picture-only pages to display harmful information; detecting and identifying the characteristic of a website loading content through an iframe to evade content review; and detecting and identifying frequently changing content in the title tag and meta tags of the website; preprocessing, by the program, a website that hits a rule according to a policy in the rule base; and reporting, by the program, the detection result to a harmful information detection and perception platform.
  2. The method of claim 1, wherein rendering the website by controlling the browser comprises: the program automatically launching a headless (interface-free) browser to render the website by using an open-source browser testing framework.
  3. The method of claim 2, wherein the program automatically launching a headless browser to render the website by using an open-source browser testing framework comprises: after the program finishes rendering the website, first obtaining the content of the website and identifying harmful information by using sensitive-word detection.
  4. The method of claim 1, wherein the program preprocessing the website that hits a rule according to a policy in the rule base comprises: for a picture-only website, the program obtains the corresponding text content by calling OCR image recognition and checks that text content using sensitive-word detection; and for a rule with an extremely low false-positive rate, the program directly shuts down the website that hits the rule and sends an email to notify the person responsible for security.
  5. A website detection system based on a rule base, the system comprising: a rendering module, configured to render a website by controlling a browser; a detection module, configured to inject a JavaScript script into the rendered website and to detect and identify harmful features exhibited by the website based on a rule base, wherein the JavaScript script contains the rule base for detecting harmful content and an execution engine for detecting the harmful features exhibited by the website, and the rule base for detecting harmful content is used for: detecting and identifying variant text characters in the title tag and meta tags of the website; detecting and identifying the characteristic of a website using picture-only pages to display harmful information; detecting and identifying the characteristic of a website loading content through an iframe to evade content review; and detecting and identifying frequently changing content in the title tag and meta tags of the website; a preprocessing module, configured to preprocess a website that hits a rule according to a policy in the rule base; and a reporting module, configured to report the detection result to the harmful information detection and perception platform.
  6. The system of claim 5, wherein the policies in the rule base comprise: for a picture-only website, obtaining the corresponding text content by calling OCR image recognition and checking that text content using sensitive-word detection; and for a rule with an extremely low false-positive rate, directly shutting down the website that hits the rule and sending an email to notify the person responsible for security.
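The four detection rules recited in claims 1 and 5 can be illustrated with a minimal Python sketch. The patent runs its rule base as injected JavaScript inside the rendered page; here, purely as an assumption, the page is reduced to a hypothetical `PageSnapshot` that such a script might collect. The character-category heuristic for "variant text" and every threshold (`threshold`, `max_text`, `max_distinct`) are illustrative choices, not values taken from the patent.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class PageSnapshot:
    """Hypothetical summary of what an injected script could collect from a rendered page."""
    url: str
    title: str
    meta: dict[str, str]
    visible_text: str
    image_count: int
    iframe_srcs: list[str] = field(default_factory=list)
    head_history: list[str] = field(default_factory=list)  # title/meta values seen on earlier scans


def unusual_char_ratio(text: str) -> float:
    """Fraction of characters that are symbols (e.g. box drawing), private-use or unassigned."""
    if not text:
        return 0.0
    odd = sum(1 for ch in text if unicodedata.category(ch) in {"So", "Sk", "Co", "Cn"})
    return odd / len(text)


# All thresholds below are illustrative assumptions, not values from the patent.
def variant_text(page: PageSnapshot, threshold: float = 0.2) -> bool:
    # Title or any meta value dominated by odd symbols suggests obfuscated wording.
    return any(unusual_char_ratio(s) >= threshold for s in [page.title, *page.meta.values()])


def picture_only(page: PageSnapshot, max_text: int = 20) -> bool:
    # Images present but almost no readable text: content is likely baked into pictures.
    return page.image_count > 0 and len(page.visible_text.strip()) <= max_text


def iframe_evasion(page: PageSnapshot, max_text: int = 20) -> bool:
    # Little text of its own while loading a page from another host in an iframe.
    host = urlparse(page.url).hostname
    foreign = [s for s in page.iframe_srcs if urlparse(s).hostname not in (None, host)]
    return bool(foreign) and len(page.visible_text.strip()) <= max_text


def frequent_head_change(page: PageSnapshot, max_distinct: int = 3) -> bool:
    # Many distinct title/meta values across scans.
    return len(set(page.head_history)) > max_distinct


RULE_BASE = {
    "variant_text": variant_text,
    "picture_only": picture_only,
    "iframe_evasion": iframe_evasion,
    "frequent_head_change": frequent_head_change,
}


def evaluate(page: PageSnapshot) -> list[str]:
    """Return the names of all rules the page hits, in rule-base order."""
    return [name for name, rule in RULE_BASE.items() if rule(page)]


suspicious = PageSnapshot(
    url="https://site.example/", title="╇v╇v", meta={"keywords": "╇v"},
    visible_text="", image_count=5,
    iframe_srcs=["https://other.example/x"], head_history=["a", "b", "c", "d"],
)
clean = PageSnapshot(
    url="https://site.example/", title="Welcome", meta={"description": "A normal site"},
    visible_text="Plenty of ordinary readable text on this page.", image_count=2,
    iframe_srcs=["/embed"], head_history=["Welcome"],
)
```

Keeping each rule as a named predicate in a table loosely mirrors the separation between rule base and execution engine: adding or retuning a rule changes `RULE_BASE`, not `evaluate`.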

Description

Website detection method and system based on rule base

Technical Field

The invention relates to the technical field of network security, and in particular to a website detection method and system based on a rule base.

Background

In the prior art, a sensitive word library is generally used to detect whether website content contains sensitive words related to pornography, gambling, politics, violence and the like, which requires the sensitive word library to be complete.

On the one hand, detecting and matching harmful content by sensitive words is lagging and passive: harmful terms can only be collected after they have already appeared. Once common sensitive words are effectively blocked, new substitute terms emerge as code words, and the industry typically adds them to its sensitive word libraries only after extensive media exposure. On the other hand, websites often use complex variant text, such as meaningless or garbled characters, to evade content review with ease. For example, variants of "Add WeChat" include "Jia Weixin" and "╇v".

Network operators that carry out management and service activities must fulfill their network security protection obligations, and value-added telecommunications service providers need to monitor and identify harmful information on their customers' websites, discover security risks in time, and take appropriate measures to keep those websites compliant. However, because sensitive word libraries are incomplete and text can be obfuscated, harmful websites can easily evade content review based on sensitive-word matching, so security risks are not discovered in time and appropriate measures are not taken.
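The weakness described in the background can be made concrete with a minimal Python sketch of exact sensitive-word matching. The one-entry word library `SENSITIVE_WORDS` and the function `naive_match` are hypothetical: the plain phrase is caught, while the variants cited in the background pass unnoticed until someone adds them to the library.

```python
# Hypothetical one-entry sensitive word library.
SENSITIVE_WORDS = {"add wechat"}


def naive_match(text: str, words: set[str] = SENSITIVE_WORDS) -> list[str]:
    """Case-insensitive substring matching against a fixed word library."""
    lowered = text.lower()
    return sorted(w for w in words if w in lowered)


print(naive_match("Add WeChat for details"))  # ['add wechat']
print(naive_match("Jia Weixin for details"))  # []
print(naive_match("╇v for details"))          # []
```

Every new variant requires a library update, which is the lag that the rule-based approach aims to avoid by looking at page characteristics rather than specific words.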
Disclosure of Invention

The invention provides a website detection method and system based on a rule base, aiming to solve the problem that, under sensitive-word detection, harmful websites can easily evade content review because of incomplete sensitive word libraries, text obfuscation, and the like.

To achieve the above object, the invention provides a website detection method based on a rule base, the method comprising: a program renders the website by controlling a browser; the program injects a JavaScript script into the rendered website and detects and identifies harmful features exhibited by the website based on a rule base; the program preprocesses a website that hits a rule according to a policy in the rule base; and the program reports the detection result to a harmful information detection and perception platform.

Further, the program rendering the website by controlling the browser includes: the program automatically launches a headless (interface-free) browser to render the website by using an open-source browser testing framework.

Further, the program automatically launching a headless browser to render the website by using an open-source browser testing framework includes: after the program finishes rendering the website, it first obtains the content of the website and identifies harmful information by using sensitive-word detection.

Further, the program injecting a JavaScript script into the rendered website includes: the JavaScript script contains a rule base for detecting harmful content and an execution engine, and can detect the harmful features exhibited by the website.
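The overall flow of the method (render, sensitive-word pass, rule evaluation, policy-based preprocessing, reporting) can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the `Hooks` callables stand in for the headless browser, the injected JavaScript engine, an OCR service, site shutdown, email notification and the perception platform, all of which are hypothetical stubs. The policies follow claims 4 and 6, and `LOW_FALSE_POSITIVE_RULES` is an assumed, operator-chosen set.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical: rules whose false-positive rate the operator considers "extremely low".
LOW_FALSE_POSITIVE_RULES = {"iframe_evasion"}


@dataclass
class Hooks:
    """Hypothetical integration points; all are stubs in this sketch."""
    render: Callable[[str], tuple[str, bytes]]   # url -> (rendered text, screenshot)
    run_rules: Callable[[str], list[str]]        # url -> rules hit by the injected script
    ocr: Callable[[bytes], str]                  # screenshot -> recognized text
    sensitive_match: Callable[[str], list[str]]  # text -> sensitive words found
    close_site: Callable[[str], None]
    notify: Callable[[str], None]
    report: Callable[["Verdict"], None]


@dataclass
class Verdict:
    url: str
    sensitive_hits: list[str]
    rule_hits: list[str]
    actions: list[str] = field(default_factory=list)


def detect_site(url: str, hooks: Hooks) -> Verdict:
    # 1. Render, then run sensitive-word detection on the rendered text first.
    text, screenshot = hooks.render(url)
    verdict = Verdict(url, list(hooks.sensitive_match(text)), list(hooks.run_rules(url)))

    # 2. Policy: picture-only pages are OCR'd and the recognized text re-checked.
    if "picture_only" in verdict.rule_hits:
        verdict.sensitive_hits.extend(hooks.sensitive_match(hooks.ocr(screenshot)))
        verdict.actions.append("ocr")

    # 3. Policy: hits on low-false-positive rules close the site and notify security.
    if LOW_FALSE_POSITIVE_RULES.intersection(verdict.rule_hits):
        hooks.close_site(url)
        hooks.notify(url)
        verdict.actions.append("closed")

    # 4. The result is reported to the detection and perception platform.
    hooks.report(verdict)
    return verdict


closed, mailed, reported = [], [], []
demo = detect_site(
    "https://site.example/",
    Hooks(
        render=lambda url: ("welcome", b"<png bytes>"),
        run_rules=lambda url: ["picture_only", "iframe_evasion"],
        ocr=lambda image: "Add WeChat now",
        sensitive_match=lambda t: ["add wechat"] if "add wechat" in t.lower() else [],
        close_site=closed.append,
        notify=mailed.append,
        report=reported.append,
    ),
)
```

Passing the integrations in as callables keeps the policy logic testable without a browser. In the demo, OCR surfaces a hit that the rendered text alone missed, and because the iframe rule is treated as low false-positive, the site is also closed and the security contact notified.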
Further, the rule base for detecting harmful content includes: detecting and identifying variant text characters in the title tag and meta tags of the website; detecting and identifying the characteristic of a website using picture-only pages to display harmful information; detecting and identifying the characteristic of a website loading content through an iframe to evade content review; and detecting and identifying frequently changing content in the title tag and meta tags of the website.

Further, the program preprocessing the website that hits a rule according to a policy in the rule base includes: for a picture-only website, the program obtains the corresponding text content by calling OCR image recognition and checks that text content using sensitive-word detection; and for a rule with an extremely low false-positive rate, the program directly shuts down the website that hits the rule and sends an email to notify the person responsible for security.

To achieve the above object, the invention further provides a website detection system based on a rule base, the system comprising: a rendering module, configured to render the website by controlling the browser; a detection module, configured to inject a JavaScript script into the rendered website and to detect and identify the h