
CN-121980350-A - Data enterprise identification and industrial chain generation method based on big data and AI


Abstract

The invention provides a data enterprise identification and industry chain generation method based on big data and AI, belonging to the technical field of large model application. The method comprises: determining the type of the warehousing enterprise based on the analysis result of policy files; determining the matching industry list for the type of the warehousing enterprise based on the matching data between that type and the industry lists; determining the recognition processing result of the multidimensional information of the enterprises in the matching industry list; determining a data processing strategy according to how the enterprise data of the matching industry list matches the keywords of the multidimensional information; constructing the dataset of the recognition model corresponding to the type from the keyword data obtained by the data processing strategy; and training the recognition model on the dataset to obtain a trained recognition model, thereby improving the accuracy of warehousing identification.

Inventors

  • XIE PANPAN
  • XIE YU
  • YUE BO
  • LIU YAFEI
  • LIAN ZIYI

Assignees

  • 稷量数字科技(北京)有限公司

Dates

Publication Date
20260505
Application Date
20260123

Claims (10)

  1. A data enterprise identification and industry chain generation method based on big data and AI, characterized by comprising the following steps: determining the type of the warehousing enterprise according to the analysis result of the policy file, and determining the matching industry list of the type of the warehousing enterprise based on the matching data of the type of the warehousing enterprise and the industry list; determining the recognition processing result of the multidimensional information of the enterprises in the matching industry list, determining a data processing strategy according to the matching situation between the enterprise data of the matching industry list and the keywords of the multidimensional information, constructing a dataset of a recognition model corresponding to the type according to the keyword data obtained by the data processing strategy, and training the recognition model based on the dataset to obtain a trained recognition model; and determining the warehousing enterprises and the matching types of the warehousing enterprises according to the recognition results of the trained recognition model, and, when the warehousing data of the warehousing enterprises of the target matching type meets the requirements, determining whether the data processing strategy needs to be optimized based on the recognition processing results of the keywords of the warehousing enterprises of different matching types in different platforms.
  2. The data enterprise identification and industry chain generation method based on big data and AI according to claim 1, wherein the type of the warehousing enterprise is determined according to the analysis result of a policy file of the national data office.
  3. The data enterprise identification and industry chain generation method based on big data and AI according to claim 1, wherein the matching industry list of the type of the warehousing enterprise is determined according to the type of the industry list.
  4. The data enterprise identification and industry chain generation method based on big data and AI according to claim 1, wherein the multidimensional information of an enterprise comprises recruitment information, patent information, business information, bidding information, standard formulation information and software copyright information of the enterprise.
  5. The data enterprise identification and industry chain generation method based on big data and AI according to claim 1, wherein the data processing strategy is determined as follows: determining the number of enterprises in the matching industry list based on the enterprise data of the matching industry list; determining the number of enterprises having a keyword according to the recognition results of the keywords of the multidimensional information of the different enterprises in the matching industry list; and determining the data processing strategy of the keyword according to the number of enterprises in the matching industry list and the number of enterprises having the keyword.
  6. The data enterprise identification and industry chain generation method based on big data and AI according to claim 5, wherein, when the number of enterprises in the matching industry list is less than a preset enterprise-number threshold, the data processing strategy is determined to be that all keywords in the matching industry list are used as the keywords for constructing the dataset of the recognition model corresponding to the type.
  7. The data enterprise identification and industry chain generation method based on big data and AI according to claim 1, wherein constructing the dataset of the recognition model corresponding to the type comprises: based on the keywords, determining the enterprises whose multidimensional information, comprising recruitment information, patent information, business information, bidding information, standard formulation information and software copyright information, contains the keywords, and taking these enterprises as training-set construction enterprises; and constructing the dataset of the recognition model according to the keyword extraction results of the training-set construction enterprises.
  8. The data enterprise identification and industry chain generation method based on big data and AI according to claim 7, wherein the matching type of the warehousing enterprise is determined according to the recognition result of the trained recognition model.
  9. The data enterprise identification and industry chain generation method based on big data and AI according to claim 7, wherein the dataset of the recognition model is constructed according to the number of matches between the keywords of the training-set construction enterprises and the keywords used for constructing the dataset of the recognition model.
  10. The data enterprise identification and industry chain generation method based on big data and AI according to claim 1, wherein determining whether the data processing strategy needs to be optimized comprises: determining, based on the data of the warehousing-capable enterprises of different matching types, the number of warehousing-capable enterprises whose matching type is suspected matching enterprise; determining, according to the recognition processing results of the keywords of those suspected matching enterprises in different platforms, the platforms from which the keywords were extracted, and taking these platforms as matching extraction platforms; and determining whether the data processing strategy needs to be optimized based on the matching extraction platform data of the warehousing-capable enterprises whose matching type is suspected matching enterprise.
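
The keyword-strategy rule of claims 5 and 6 can be illustrated with a minimal Python sketch. Only the small-list branch (keep every keyword when the list has fewer enterprises than a preset threshold) comes from the claims. The threshold value, the coverage-ratio rule for larger lists, and the names `select_keywords`, `ENTERPRISE_COUNT_THRESHOLD` and `MIN_KEYWORD_COVERAGE` are assumptions made for illustration, not the patented implementation.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical values: the claims give neither a threshold nor a
# selection rule for lists at or above the threshold.
ENTERPRISE_COUNT_THRESHOLD = 50
MIN_KEYWORD_COVERAGE = 0.2


def select_keywords(
    enterprise_keywords: Dict[str, Set[str]],
    count_threshold: int = ENTERPRISE_COUNT_THRESHOLD,
    min_coverage: float = MIN_KEYWORD_COVERAGE,
) -> List[str]:
    """Pick the keywords used to build the recognition-model dataset.

    enterprise_keywords maps each enterprise in the matching industry
    list to the keywords recognised in its multidimensional information.
    """
    # Claim 5, step 1: number of enterprises in the matching industry list.
    n_enterprises = len(enterprise_keywords)

    # Claim 5, step 2: number of enterprises having each keyword.
    keyword_counts = Counter(
        kw for kws in enterprise_keywords.values() for kw in kws
    )

    if n_enterprises < count_threshold:
        # Claim 6: the list is small, so every keyword is kept.
        return sorted(keyword_counts)

    # Assumed rule for larger lists: keep a keyword only if it occurs
    # in at least min_coverage of the enterprises.
    return sorted(
        kw
        for kw, n in keyword_counts.items()
        if n / n_enterprises >= min_coverage
    )


# Made-up sample data for illustration only.
sample = {
    "Enterprise A": {"data trading", "data labeling"},
    "Enterprise B": {"data trading"},
    "Enterprise C": {"cloud storage"},
}

# Small-list branch: 3 enterprises < threshold 5, all keywords kept.
print(select_keywords(sample, count_threshold=5))
# Large-list branch: only "data trading" occurs in at least half of them.
print(select_keywords(sample, count_threshold=2, min_coverage=0.5))
```

Counting per enterprise rather than per occurrence mirrors the claim wording ("the number of enterprises having the keyword"); any real deployment would need the threshold and the large-list rule taken from the full description.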

Description

Data enterprise identification and industrial chain generation method based on big data and AI

The invention belongs to the technical field of large model application, and particularly relates to a big-data-based method for generating data element and digital economy industry chains.

Background

With the rapid development of the digital economy, government and industry authorities at all levels urgently need to accurately identify and classify digital economy enterprises and data element enterprises in order to formulate targeted industrial policies and support strategies. The main method currently used to identify digital economy and data enterprises is traditional manual investigation: enterprise information is collected through on-site visits, questionnaires, reporting through sub-district offices and similar means, and enterprises are classified and assessed on the basis of expert experience. This method is accurate, but its coverage is severely limited: it reaches only enterprises that have already been identified and cannot cover the large number of potential data enterprises below the designated size or traditional-industry enterprises undergoing transformation. The enterprise sample therefore contains large errors, and the time and labor costs are high.
To address these problems, the prior art offers a big-data-based approach to constructing an industrial chain in invention patent application CN120087812A, "Big data-based industrial chain generation method and system". That scheme comprises: designing the main structure of the industrial chain and a keyword table for each node, and defining the upstream and downstream classification of the industrial chain, including main nodes, sub-nodes, branch nodes, the name of each node and the keywords it contains; collecting multidimensional industrial data such as enterprise business data, bidding, public opinion, patents and software copyright information, and establishing an enterprise base database; screening enterprises from the enterprise base database according to the industrial chain node keywords and placing them in a candidate on-chain enterprise pool; designing an on-chain evaluation index system comprising 14 indexes in four categories (business information, intellectual property information, bidding information and public opinion information) to evaluate how well an enterprise's business scope, technical achievements, public opinion information, bidding and other dimensions match the industrial chain nodes; selecting a benchmark enterprise for each node of the industrial chain on the basis of industry research, and calculating the on-chain indexes of each benchmark enterprise; calculating the vector distance between each preliminarily screened enterprise and the benchmark enterprise using the cosine distance measure; sorting the similarity results in descending order; and, following the Pareto principle, placing the top 20% of enterprises on the chain to construct a complete industrial chain. This scheme has the following technical problems:

Screening on-chain enterprises by keywords alone is unreliable, and raw data collection faces problems such as poor data quality, time-consuming and labor-intensive data cleaning, and labeling accuracy. How to construct a recognition model according to the type of the warehousing enterprise and the recognition results of the enterprise's multidimensional information, so as to improve the reliability of enterprise type identification, is therefore a technical problem that urgently needs to be solved.

To address these technical problems, the invention provides a data enterprise identification and industry chain generation method based on big data and AI.

Disclosure of Invention

The invention aims to provide a data enterprise identification and industry chain generation method based on big data and AI.

To solve the above technical problems, the invention provides a data enterprise identification and industrial chain generation method based on big data and AI, which specifically comprises the following steps:

determining the type of the warehousing enterprise according to the analysis result of the policy file, and determining the matching industry list of the type of the warehousing enterprise based on the matching data of the type of the warehousing enterprise and the industry list; determining the recognition processing result of the multidimensional information of the enterprises in the matching industry list, determining a data processing strategy according to the matching situation between the enterprise data of the matching industry list and the keywords of the multidimensional information, constructing a dataset of a recognition model corresponding to the type according to the keyword data obtained by t