CN-121996856-A - Resource acquisition and reinforcement learning method, device, storage medium and program product
Abstract
Embodiments of the present specification provide a resource acquisition and reinforcement learning method, apparatus, storage medium, and program product. The method comprises: obtaining a resource acquisition requirement; generating a first task queue for the resource acquisition requirement, wherein the first task queue is used to store Uniform Resource Identifiers (URIs) to be explored; obtaining, by a resource acquisition module, a first URI from the first task queue and initiating a resource access request based on the first URI to obtain a first network resource; obtaining a first reference URI embedded in the first network resource; determining, according to a resource acquisition state corresponding to the first URI and using a decision model optimized through reinforcement learning, whether the first reference URI needs to be explored; and, when the first reference URI needs to be explored, adding the first reference URI to the first task queue for subsequent acquisition of the corresponding network resource.
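The loop described in the abstract — pop a URI from the task queue, fetch the resource, extract embedded reference URIs, and let a learned decision model choose which references to enqueue — can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation; the `fetch`, `extract_refs`, and `decision_model` callables are hypothetical stand-ins.

```python
from collections import deque

def crawl(seed_uris, fetch, extract_refs, decision_model, max_steps=100):
    """Sketch of the claimed flow: pop a URI from the task queue, fetch
    the network resource, extract embedded reference URIs, and let the
    decision model decide which references to enqueue for exploration."""
    queue = deque(seed_uris)          # the "first task queue"
    resources = []
    steps = 0
    while queue and steps < max_steps:
        uri = queue.popleft()                # first URI from the queue
        resource, state = fetch(uri)         # resource + acquisition state
        resources.append(resource)
        for ref in extract_refs(resource):   # embedded reference URIs
            if decision_model(state, ref):   # explore this reference?
                queue.append(ref)
        steps += 1
    return resources
```

With stub callables over a tiny link graph, the decision model acts as a filter: references it rejects are never fetched, which is the pruning that the reinforcement-learned policy is meant to optimize.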
Inventors
- Zhan Wanke
- Liu Xiuting
- Cai Jiansheng
Assignees
- 蚂蚁区块链科技(上海)有限公司 (Ant Blockchain Technology (Shanghai) Co., Ltd.)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-31
Claims (17)
- 1. A method for obtaining resources, comprising: acquiring a resource acquisition requirement; generating a first task queue for the resource acquisition requirement, wherein the first task queue is used to store Uniform Resource Identifiers (URIs) to be explored; obtaining, by a resource acquisition module, a first URI from the first task queue and initiating a resource access request based on the first URI to obtain a first network resource; obtaining a first reference URI embedded in the first network resource; determining, according to a resource acquisition state corresponding to the first URI and using a decision model optimized through reinforcement learning, whether the first reference URI needs to be explored; and, when the first reference URI needs to be explored, adding the first reference URI to the first task queue for subsequent acquisition of the corresponding network resource.
- 2. The method of claim 1, wherein a reward function used by the decision model during the reinforcement learning phase is configured to generate a reward based on a resource acquisition success rate of a second reference URI, a computational cost of initiating access based on the second reference URI, and/or a content semantic score of a network resource obtained based on the second reference URI, the second reference URI being a reference URI that the decision model determined to explore during the reinforcement learning phase.
- 3. The method of claim 1, wherein the reinforcement learning process of the decision model comprises: creating a second task queue; acquiring, by the resource acquisition module, a second URI from the second task queue, and initiating a resource access request based on the second URI to acquire a second network resource; obtaining a second reference URI embedded in the second network resource; determining, according to a resource acquisition state corresponding to the second URI and using a decision model to be optimized, whether the second reference URI needs to be explored; when the second reference URI needs to be explored, adding the second reference URI to the second task queue for subsequent acquisition of the corresponding network resource; determining a reward according to a resource acquisition success rate of the second reference URI; and optimizing the decision model to be optimized based on the reward.
- 4. The method according to claim 3, further comprising: acquiring resource acquisition response information corresponding to the second reference URI, wherein the resource acquisition response information comprises at least two of a response status code, a response time, a response content size, and a content parsing quality; and determining the resource acquisition success rate according to the resource acquisition response information corresponding to the second reference URI.
- 5. The method according to claim 3, wherein determining the reward based on the resource acquisition success rate of the second reference URI comprises: determining the reward according to the resource acquisition success rate of the second reference URI and the computational cost of initiating access based on the second reference URI.
- 6. The method of claim 3, wherein determining the reward based on the resource acquisition success rate corresponding to the second reference URI comprises: determining the reward according to the resource acquisition success rate corresponding to the second reference URI and a content semantic score of the second network resource; wherein the content semantic score is determined based on a degree of matching between the second network resource and the resource acquisition requirement and/or a degree of repetition between the second network resource and already-acquired network resources.
- 7. The method of any one of claims 1 to 6, wherein the resource acquisition status corresponding to the first URI includes one or more of resource acquisition response information corresponding to the first URI, a depth of exploration of the first URI, a website to which the first URI belongs, an access frequency for the website, an operating status of the resource acquisition module, historical decision information of the decision model for the website, and a content semantic score of a resource body parsed from the first network resource.
- 8. The method of any one of claims 1 to 6, wherein generating a first task queue for the resource acquisition requirement comprises: acquiring a plurality of keywords related to the resource acquisition requirement; determining a plurality of seed URIs according to the keywords; and creating a first task queue and adding the plurality of seed URIs to the first task queue.
- 9. The method according to any one of claims 1 to 6, further comprising: performing cleaning on the obtained first network resource using a preset cleaning rule to obtain a cleaned resource; and parsing the cleaned resource using a pre-trained language model to obtain a resource body and the first reference URI embedded in the first network resource.
- 10. The method as recited in claim 9, further comprising: determining whether to store the resource body according to a degree of matching between the resource body and the resource acquisition requirement and a degree of repetition between the resource body and other resource bodies acquired for the resource acquisition requirement.
- 11. The method of any one of claims 1 to 6, wherein the resource acquisition module comprises a cluster comprising a plurality of worker nodes that are consumers of the first task queue.
- 12. A reinforcement learning method, comprising: creating a second task queue; acquiring, by a resource acquisition module, a second URI from the second task queue, and acquiring a second network resource based on the second URI; obtaining a second reference URI embedded in the second network resource; determining, according to a resource acquisition state corresponding to the second URI and using a decision model to be optimized, whether the second reference URI needs to be explored; when the second reference URI needs to be explored, adding the second reference URI to the second task queue for subsequent acquisition of the corresponding network resource; determining a reward according to a resource acquisition success rate corresponding to the second reference URI; and optimizing the decision model to be optimized based on the reward.
- 13. A resource acquisition apparatus, comprising: a first acquisition module, configured to acquire a resource acquisition requirement; a generation module, configured to generate a first task queue for the resource acquisition requirement, wherein the first task queue is used to store Uniform Resource Identifiers (URIs) to be explored; a resource acquisition module, configured to acquire a first URI from the first task queue and initiate a resource access request based on the first URI to acquire a first network resource; a second acquisition module, configured to acquire a first reference URI embedded in the first network resource; a determining module, configured to determine, according to a resource acquisition state corresponding to the first URI and using a decision model optimized through reinforcement learning, whether the first reference URI needs to be explored; and an adding module, configured to add the first reference URI to the first task queue when the first reference URI needs to be explored, for subsequent acquisition of the corresponding network resource.
- 14. A reinforcement learning apparatus, comprising: a creation module, configured to create a second task queue; a resource acquisition module, configured to acquire a second URI from the second task queue and acquire a second network resource based on the second URI; an acquisition module, configured to acquire a second reference URI embedded in the second network resource; a first determining module, configured to determine, according to a resource acquisition state corresponding to the second URI and using a decision model to be optimized, whether the second reference URI needs to be explored; an adding module, configured to add the second reference URI to the second task queue when the second reference URI needs to be explored, for subsequent acquisition of the corresponding network resource; a second determining module, configured to determine a reward according to a resource acquisition success rate corresponding to the second reference URI; and an optimizing module, configured to optimize the decision model to be optimized based on the reward.
- 15. An electronic device, comprising a memory and a processor, wherein the memory is configured to store a program; and the processor, coupled to the memory, is configured to execute the program stored in the memory to implement the method of any one of claims 1 to 12.
- 16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a computer, implements the method of any one of claims 1 to 12.
- 17. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 12.
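Claims 2, 5 and 6 describe a reward built from the acquisition success rate, the computational cost of access, and a content semantic score (match with the requirement, penalized by repetition with already-acquired resources). A minimal sketch of such a reward is shown below; the weights and the linear combination are illustrative assumptions, not taken from the patent.

```python
def reward(success_rate, compute_cost, semantic_score,
           w_success=1.0, w_cost=0.5, w_semantic=1.0):
    """Reward in the style of claims 2, 5 and 6: a higher acquisition
    success rate and content semantic score increase the reward, while
    the computational cost of initiating access decreases it.
    Weights are hypothetical, chosen only for illustration."""
    return (w_success * success_rate
            + w_semantic * semantic_score
            - w_cost * compute_cost)

def content_semantic_score(match_degree, repetition_degree):
    """Claim 6: score from the degree of matching with the resource
    acquisition requirement, penalized by the degree of repetition with
    already-acquired resources; clipped at zero for illustration."""
    return max(0.0, match_degree - repetition_degree)
```

A reference URI that fetched successfully, cheaply, and yielded novel on-topic content would score highest, which is exactly the behavior the decision model is rewarded for selecting.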
Description
Resource acquisition and reinforcement learning method, device, storage medium and program product
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a method, apparatus, storage medium, and program product for resource acquisition and reinforcement learning.
Background
In the current era of deep integration of artificial intelligence and big data, data, as a core production element, has become a key resource in enterprise competition. Data is central at every stage of large language model training, including data collection, preprocessing, training, and verification. Much of this data is collected from the Internet, so the efficiency and quality of data acquisition directly influence how well the value of the data can be mined and applied. However, conventional data acquisition systems face challenges such as the difficulty of acquiring dynamic webpage content and complex anti-crawling strategies, and struggle to meet the requirement of effectively acquiring massive heterogeneous data. Accordingly, it is desirable to provide a data acquisition scheme that can improve the efficiency of data acquisition.
Disclosure of Invention
Aspects of the present specification provide a resource acquisition and reinforcement learning method, apparatus, storage medium, and program product for optimizing a resource exploration path by introducing reinforcement learning, thereby improving resource acquisition efficiency.
A first aspect of the present specification provides a resource acquisition method, including: acquiring a resource acquisition requirement; generating a first task queue for the resource acquisition requirement, wherein the first task queue is used to store Uniform Resource Identifiers (URIs) to be explored; obtaining, by a resource acquisition module, a first URI from the first task queue and initiating a resource access request based on the first URI to obtain a first network resource; obtaining a first reference URI embedded in the first network resource; determining, according to a resource acquisition state corresponding to the first URI and using a decision model optimized through reinforcement learning, whether the first reference URI needs to be explored; and, when the first reference URI needs to be explored, adding the first reference URI to the first task queue for subsequent acquisition of the corresponding network resource. A second aspect of the present specification provides a reinforcement learning method, including: creating a second task queue; acquiring, by a resource acquisition module, a second URI from the second task queue, and acquiring a second network resource based on the second URI; obtaining a second reference URI embedded in the second network resource; determining, according to a resource acquisition state corresponding to the second URI and using a decision model to be optimized, whether the second reference URI needs to be explored; when the second reference URI needs to be explored, adding the second reference URI to the second task queue for subsequent acquisition of the corresponding network resource; determining a reward according to a resource acquisition success rate corresponding to the second reference URI; and optimizing the decision model to be optimized based on the reward.
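The second aspect leaves the optimization step of the decision model unspecified. One common way to optimize a stochastic explore/skip policy from scalar rewards is a REINFORCE-style update; the toy logistic model below is an illustrative sketch under that assumption, not the patented training procedure, and the single scalar "feature" stands in for the resource acquisition state.

```python
import math
import random

class DecisionModel:
    """Toy logistic explore/skip policy over one state feature,
    optimized from scalar rewards; a hypothetical stand-in for the
    'decision model to be optimized'."""
    def __init__(self, lr=0.1):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def prob_explore(self, feature):
        # Probability of deciding that the reference URI needs exploring.
        return 1.0 / (1.0 + math.exp(-(self.w * feature + self.b)))

    def decide(self, feature):
        return random.random() < self.prob_explore(feature)

    def update(self, feature, explored, reward):
        # REINFORCE-style step: move the log-probability of the action
        # actually taken in the direction of the observed reward.
        p = self.prob_explore(feature)
        grad_logit = (1.0 - p) if explored else -p
        self.w += self.lr * reward * grad_logit * feature
        self.b += self.lr * reward * grad_logit
```

Repeatedly rewarding exploration of a state raises the model's probability of exploring it, while negative rewards suppress it, which is the feedback loop the second aspect describes between acquisition outcomes and the decision model.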
A third aspect of the present specification provides a resource acquisition apparatus, comprising: a first acquisition module, configured to acquire a resource acquisition requirement; a generation module, configured to generate a first task queue for the resource acquisition requirement, wherein the first task queue is used to store Uniform Resource Identifiers (URIs) to be explored; a resource acquisition module, configured to acquire a first URI from the first task queue and initiate a resource access request based on the first URI to acquire a first network resource; a second acquisition module, configured to acquire a first reference URI embedded in the first network resource; a determining module, configured to determine, according to a resource acquisition state corresponding to the first URI and using a decision model optimized through reinforcement learning, whether the first reference URI needs to be explored; and an adding module, configured to add the first reference URI to the first task queue when the first reference URI needs to be explored, for subsequent acquisition of the corresponding network resource. A fourth aspect of the present specification provides a reinforcement learning apparatus, comprising: a creation module, configured to create a second task queue; a resource acquisition module, configured to acquire a second URI from the second task queue and acquire a second network resource based on the second URI; an acquisition module, configured to acquire a second reference URI embedded in the second network resource; a first determining module