CN-122019907-A - Webpage updating analysis method, device, medium and product
Abstract
The invention discloses a webpage updating analysis method, device, medium and product. It relates to the field of Internet technology and is applicable to financial technology. The method comprises: comparing webpage contents of a target website within a fixed time window to obtain webpage updated contents; judging the type of the webpage updated contents according to webpage tag characteristics to obtain update types, wherein the update types comprise at least one of content addition, content deletion, data update, style adjustment and irrelevant change; determining the update times and update weights of the update types within the fixed time window; and carrying out webpage activity analysis on the target website according to the update types, the update times and the update weights. With this technical scheme, webpage updates on a website can be monitored effectively and webpage activity can be analyzed effectively.
Inventors
- HE SHIJIA
Assignees
- 中国工商银行股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260126
Claims (10)
- 1. A web page update analysis method, comprising: comparing the web page content of a target website within a fixed time window to obtain web page update content; judging the type of the web page update content according to web page tag characteristics to obtain an update type, wherein the update type comprises at least one of content addition, content deletion, data update, style adjustment and irrelevant change; determining the update times and update weights of the update type within the fixed time window; and carrying out web page activity analysis on the target website according to the update type, the update times and the update weights.
- 2. The method of claim 1, wherein comparing the web page content of the target website within the fixed time window to obtain web page update content comprises: acquiring a current version web page and a historical version web page of the target website within the fixed time window; performing HTML tag segmentation on the source code of the current version web page and the source code of the historical version web page respectively to obtain a current tag code sequence and a historical tag code sequence; determining a longest common code subsequence between the current tag code sequence and the historical tag code sequence; and determining the web page update content according to the longest common code subsequence, the current tag code sequence and the historical tag code sequence.
- 3. The method of claim 2, wherein determining the web page update content according to the longest common code subsequence, the current tag code sequence and the historical tag code sequence comprises: aligning the current tag code sequence and the historical tag code sequence respectively with the longest common code subsequence by position, and marking the positions in the longest common code subsequence that correspond to current elements in the current tag code sequence and to historical elements in the historical tag code sequence; and comparing the element contents at corresponding positions in the current tag code sequence, the historical tag code sequence and the longest common code subsequence to obtain the web page update content.
- 4. The method of claim 3, wherein comparing the element contents at corresponding positions in the current tag code sequence, the historical tag code sequence and the longest common code subsequence to obtain the web page update content comprises: if the current element is not in the longest common code subsequence, determining that the current element is deleted content; if the historical element is not in the longest common code subsequence, determining that the historical element is newly added content; if an element in the longest common code subsequence is present at the corresponding positions in both the current tag code sequence and the historical tag code sequence and the element contents are consistent, determining that the content at the corresponding position of the current tag code sequence is not updated; if the current tag code sequence and the historical tag code sequence both have content at the same position relative to the longest common code subsequence and the contents are inconsistent, determining that the content at the corresponding position in the current tag code sequence is modified content; and taking the deleted content, the newly added content and the modified content as the web page update content.
- 5. The method of claim 1, wherein performing web page activity analysis on the target website according to the update type, the update times and the update weights comprises: determining an update quality score according to the update weights and the update times; determining a basic liveness score according to the number of consecutive active days of the update type; determining web page liveness according to the basic liveness score and the update quality score; and determining a web page activity level according to the web page liveness and an activity threshold, so as to analyze the web page activity of the target website.
- 6. The method of claim 5, wherein determining web page liveness according to the basic liveness score and the update quality score further comprises: determining a key section of the target website, and determining a key-section detection deduction according to the update status of the key section; and determining the web page liveness according to the basic liveness score, the update quality score and the key-section detection deduction.
- 7. The method of claim 1, wherein comparing the web page content of the target website within the fixed time window to obtain web page update content comprises: determining whether the page content of the target website has been updated according to a current web page hash value of the current version web page and a historical web page hash value of the historical version web page of the target website within the fixed time window; and, if the page content has been updated, comparing the current version web page with the historical version web page to determine the web page update content.
- 8. A web page update analysis apparatus, comprising: an update content determining module, used for comparing the web page content of a target website within a fixed time window to obtain web page update content; an update type determining module, used for judging the type of the web page update content according to web page tag characteristics to obtain an update type, wherein the update type comprises at least one of content addition, content deletion, data update, style adjustment and irrelevant change; an update times and weight determining module, used for determining the update times and update weights of the update type within the fixed time window; and a web page activity analysis module, used for carrying out web page activity analysis on the target website according to the update type, the update times and the update weights.
- 9. A computer readable storage medium storing computer instructions for causing a processor to implement the web page update analysis method of any one of claims 1-7 when executed.
- 10. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the web page update analysis method according to any of claims 1-7.
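Claims 2 to 4 and 7 describe a hash pre-check followed by a longest-common-subsequence comparison of tag sequences. The following is a minimal Python sketch of that idea, not the patented implementation. The regex tokenizer, the SHA-256 hash, the function names, and the rule that pairs leftover elements between two matched anchors as "modified" are all assumptions made for illustration. The sketch also assumes the conventional diff reading, in which an element found only in the current sequence is an addition and one found only in the historical sequence is a deletion; the wording of claim 4 assigns these two labels the other way round.

```python
import hashlib
import re

# Assumed tokenizer: a token is either one HTML tag or one run of text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def page_hash(html: str) -> str:
    """Hash of the page source, used as a cheap 'did anything change' check."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def segment(html: str) -> list[str]:
    """Split page source into tag and text tokens, dropping blank tokens."""
    return [t.strip() for t in TOKEN_RE.findall(html) if t.strip()]


def lcs_pairs(old: list[str], new: list[str]) -> list[tuple[int, int]]:
    """Return index pairs (i, j) of one longest common subsequence."""
    n, m = len(old), len(new)
    # dp[i][j] holds the LCS length of old[i:] and new[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def diff_pages(old_html: str, new_html: str) -> dict[str, list]:
    """Classify tokens as added, deleted, or modified between two versions."""
    result = {"added": [], "deleted": [], "modified": []}
    if page_hash(old_html) == page_hash(new_html):
        return result  # identical source: skip the expensive comparison
    old, new = segment(old_html), segment(new_html)
    # A sentinel pair closes the gap after the last matched element.
    anchors = lcs_pairs(old, new) + [(len(old), len(new))]
    prev_i = prev_j = 0
    for i, j in anchors:
        old_gap, new_gap = old[prev_i:i], new[prev_j:j]
        k = min(len(old_gap), len(new_gap))
        # Tokens at the same offset between two anchors count as modified.
        result["modified"] += list(zip(old_gap[:k], new_gap[:k]))
        result["deleted"] += old_gap[k:]
        result["added"] += new_gap[k:]
        prev_i, prev_j = i + 1, j + 1
    return result


old_page = "<div><p>Rate: 3.1%</p><span>note</span></div>"
new_page = "<div><h2>News</h2><p>Rate: 3.3%</p></div>"
print(diff_pages(old_page, new_page))
```

The dynamic-programming table costs time and memory proportional to the product of the two sequence lengths, which is one reason a hash pre-check that skips unchanged pages is worthwhile.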
Description
Webpage updating analysis method, device, medium and product
Technical Field
The invention relates to the field of Internet technology, is applicable to the field of financial technology, and in particular relates to a webpage updating analysis method, device, medium and product.
Background
An enterprise website is not only an information release platform, but also an important window for brand image, customer service and business expansion. Timely and effective content updating is the key to maintaining the vitality and value of a website. As the digital portal of an enterprise, a website reflects its operational health to a certain extent through its updating behavior. Therefore, effectively detecting website updates and analyzing website liveness is important for understanding the operational health of the website.
Disclosure of Invention
The invention provides a webpage updating analysis method, device, medium and product, which can automatically, accurately and efficiently monitor content changes of an enterprise website.
According to an aspect of the present invention, there is provided a web page update analysis method, including: comparing the web page content of a target website within a fixed time window to obtain web page update content; judging the type of the web page update content according to web page tag characteristics to obtain an update type, wherein the update type comprises at least one of content addition, content deletion, data update, style adjustment and irrelevant change; determining the update times and update weights of the update type within the fixed time window; and carrying out web page activity analysis on the target website according to the update type, the update times and the update weights.
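The type-judging and counting steps of the method can be illustrated with a short Python sketch. It is a rough illustration under stated assumptions rather than the method itself: the text above names the five update types but does not give the tag rules or the weight values, so the regular expressions, the `Change` record, the `WEIGHTS` table, and the weighted-sum score are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One detected change (hypothetical record type)."""
    kind: str      # "added", "deleted", or "modified"
    old: str = ""  # token from the historical version, if any
    new: str = ""  # token from the current version, if any


# Hypothetical weights; the source gives no concrete values here.
WEIGHTS = {
    "content_addition": 1.0,
    "content_deletion": 0.6,
    "data_update": 0.8,
    "style_adjustment": 0.2,
    "irrelevant_change": 0.0,
}

# Assumed tag features: scripts, comments and meta tags are treated as noise;
# <style> tags and class/style attributes are treated as styling.
NOISE_RE = re.compile(r"<script\b|<!--|<meta\b", re.I)
STYLE_RE = re.compile(r"<style\b|\b(?:class|style)\s*=", re.I)


def classify(change: Change) -> str:
    """Map a change to one of the five update types named in the source."""
    text = change.new or change.old
    if NOISE_RE.search(text):
        return "irrelevant_change"
    if STYLE_RE.search(text):
        return "style_adjustment"
    if change.kind == "added":
        return "content_addition"
    if change.kind == "deleted":
        return "content_deletion"
    return "data_update"  # assumption: remaining modifications are data updates


def summarize(changes: list[Change]) -> dict[str, tuple[int, float]]:
    """Return update type -> (update times in the window, update weight)."""
    counts = Counter(classify(c) for c in changes)
    return {t: (n, WEIGHTS[t]) for t, n in counts.items()}


def weighted_score(summary: dict[str, tuple[int, float]]) -> float:
    """Assumed aggregate: sum of update times multiplied by update weight."""
    return sum(n * w for n, w in summary.values())


window_changes = [
    Change("added", new="<h2>"),
    Change("added", new="News"),
    Change("modified", "Rate: 3.1%", "Rate: 3.3%"),
    Change("deleted", old="<span>"),
    Change("modified", '<div class="a">', '<div class="b">'),
    Change("added", new="<script src='x.js'>"),
]
summary = summarize(window_changes)
print(summary)
print(round(weighted_score(summary), 2))
```

Giving irrelevant changes a weight of zero keeps script or metadata churn from inflating the activity figure, which is the likely purpose of separating that type from the others.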
According to another aspect of the present invention, there is provided a web page update analysis apparatus including: an update content determining module, used for comparing the web page content of a target website within a fixed time window to obtain web page update content; an update type determining module, used for judging the type of the web page update content according to web page tag characteristics to obtain an update type, wherein the update type comprises at least one of content addition, content deletion, data update, style adjustment and irrelevant change; an update times and weight determining module, used for determining the update times and update weights of the update type within the fixed time window; and a web page activity analysis module, used for carrying out web page activity analysis on the target website according to the update type, the update times and the update weights.
According to another aspect of the present invention, there is provided an electronic apparatus including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the web page update analysis method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions which, when executed, cause a processor to implement the web page update analysis method according to any one of the embodiments of the present invention.
According to another aspect of the present invention, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the web page update analysis method according to any one of the embodiments of the present invention.
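As an illustration of what the web page activity analysis module might compute, the Python sketch below follows the outline of claims 5 and 6: an update quality score from update times and weights, a basic liveness score from consecutive active days, a deduction for key sections, and a threshold lookup for the activity level. Every constant, cap, threshold and name in it (`WindowStats`, `BASE_POINTS_PER_DAY`, `LEVELS` and so on) is a hypothetical choice; the text above does not specify the formulas.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Inputs for one fixed time window (hypothetical container)."""
    update_counts: dict[str, int]     # update type -> update times
    update_weights: dict[str, float]  # update type -> update weight
    consecutive_active_days: int      # days in a row with at least one update
    stale_key_sections: int           # key sections with no update in the window


# Illustrative constants only; none of these values come from the source.
BASE_POINTS_PER_DAY = 5.0
BASE_CAP = 40.0
QUALITY_CAP = 60.0
KEY_SECTION_PENALTY = 10.0
LEVELS = [(70.0, "high"), (40.0, "medium"), (0.0, "low")]


def update_quality_score(stats: WindowStats) -> float:
    """Weighted sum of update times, capped so one type cannot dominate."""
    raw = sum(
        n * stats.update_weights.get(t, 0.0)
        for t, n in stats.update_counts.items()
    )
    return min(raw, QUALITY_CAP)


def base_liveness_score(stats: WindowStats) -> float:
    """Points for consecutive active days, capped at BASE_CAP."""
    return min(stats.consecutive_active_days * BASE_POINTS_PER_DAY, BASE_CAP)


def liveness(stats: WindowStats) -> float:
    """Base score plus quality score minus the key-section deduction."""
    deduction = KEY_SECTION_PENALTY * stats.stale_key_sections
    score = base_liveness_score(stats) + update_quality_score(stats) - deduction
    return max(score, 0.0)


def activity_level(score: float) -> str:
    """Map a liveness value to a level using descending thresholds."""
    for threshold, label in LEVELS:
        if score >= threshold:
            return label
    return "low"


stats = WindowStats(
    update_counts={"content_addition": 12, "data_update": 20, "style_adjustment": 5},
    update_weights={"content_addition": 1.0, "data_update": 0.5, "style_adjustment": 0.2},
    consecutive_active_days=6,
    stale_key_sections=1,
)
score = liveness(stats)
print(score, activity_level(score))
```

Keeping the thresholds in a single descending table makes the level boundaries easy to tune per website without touching the scoring functions.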
In the technical scheme of the embodiment of the invention, webpage contents of a target website within a fixed time window are compared to obtain webpage updated contents; the type of the webpage updated contents is judged according to webpage tag characteristics to obtain an update type, wherein the update type comprises at least one of content addition, content deletion, data update, style adjustment and irrelevant change; the update times and update weights of the update type within the fixed time window are determined; and webpage activity analysis is carried out on the target website according to the update type, the update times and the update weights. This scheme monitors enterprise website content changes automatically, accurately and efficiently, and converts the change information into quantifiable, analyzable operational activity indicators, thereby improving the efficiency of website activity analysis. It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineat