CN-115730598-B - Focal entity attribute prediction technology based on natural language processing


Abstract

The invention belongs to the field of natural language text analysis and prediction and is used to predict the attributes of focus entities. It provides a focus entity attribute prediction technology based on natural language processing. The technology outputs the objective predicted values common to current attribute prediction models. It also inductively outputs the different expectations or evaluations that text data express about different attributes of an entity. Starting from the entity set and attribute set supplied by the user, the method constructs a larger set of related or iteration-related entities. It then crawls a large amount of text data for basic attribute prediction and for attribute development trend prediction. The expected values and evaluations output by the invention can guide the optimization of different entities or predict their development direction. Current text data are poorly utilized and little of their hidden information is mined. To address this, the invention improves the ability to accurately acquire information about focus entities and provides richer, more valuable data for downstream applications of attribute prediction.

Inventors

FENG YANG
SUN JINGYU
TAN JIAJUN
LI YUYING
LIU ZIXI

Assignees

Nanjing University (南京大学)

Dates

Publication Date: 20260512
Application Date: 20220909

Claims (4)

1. A focus entity attribute prediction method based on natural language processing, characterized in that:
- sentiment orientations and opinions in retrieved equipment text data are analyzed on the basis of natural language understanding;
- the text content is related to the key attributes of specific entities through positive and negative sentiment directions in order to predict the attribute values of those entities;
- for each single focus entity, a set of entities with a high degree of association or an iteration relation is constructed;
- based on the attribute values of the associated entities, it is reasoned whether the opinions that existing natural language texts express about the attributes are reasonable, so as to judge the likely development and optimization directions of the entity.

The method comprises the following steps:
1) Entity data acquisition: the user provides an entity list and key attributes for a specific field. These guide the method in collecting related data and in analyzing the basic descriptive attributes during prediction. The complete name of each entity in the entity list is obtained first. Highly correlated related entities are then retrieved through a search engine, and for entities with an iteration relation, the preceding or subsequent type/level entities are selected. The set of entities whose data are to be collected is reconstructed according to the iteration and association relations (for example, Huawei mobile phones of the same series). A crawler then searches for each entity name to collect the data source, time, material title and material text;
2) Text abstract extraction: an abstract set A = {a1, a2, a3, ..., an} is extracted from the news content D. The collected equipment text materials vary in length and some are too long, which would make the final display of supporting material cluttered, so the first step is to summarize them. Abstract extraction is based on the TextRank algorithm, which derives from the principle of the PageRank web-page ranking algorithm. The text title and text content are identified first. The title is segmented into words and word frequencies are computed. The weight of each title word is recorded to obtain the word-segmentation weight table T = {w1: r1, w2: r2, ..., wn: rn}, where wn is a word in the segmented title and rn is its weight. Using T, each sentence of the text is weighted according to the title words it contains, and the highest-weighted sentences are recombined to form the final abstract;
3) Data attribute prediction: key information is extracted from the material text. This covers the strengths, weaknesses, advantages, disadvantages and other defects of the equipment described in the text material, together with the sentiment orientation and opinions of the text writer or actor. In practice, the source material differs from conventional reviews in volume and purpose, so the processing must be adapted. A text material is usually too long for key information to be captured if it is used directly as extraction input, so it is sliced: for a text material d1, a sentence set S = {s1, s2, s3, ..., sn} is constructed. For conventional attribute value prediction, the keywords in the user-supplied keyword list are extracted directly from the different slices. For opinion extraction, each sentence slice is treated as a single review and processed with the PaddleNLP review opinion extraction tool. Its basic principle is sequence labeling: the attributes in a review and the opinions corresponding to those attributes are extracted. Specifically, the review text string is passed to a SKEP model, which semantically encodes it, and a label is predicted from the output at each position. Attribute-level sentiment is then determined by a separately trained model. The extracted attribute and opinion strings are concatenated and paired with the original sentence, and the pair is fed to SKEP for classification. This yields an opinion extraction result for every sentence slice of every news text. The results are screened to remove useless information caused by errors in the source material, and the strengths, weaknesses, advantages, disadvantages and other defects of each device are compiled. Finally, the predicted values and opinion attributes are checked by count. Text materials with the same source and identical text are treated as a single material. A predicted result is considered credible if, among all non-duplicate materials, the frequency F of the predicted attribute value exceeds 75% of the total number of materials;
4) Data attribute analysis: the prediction step yields two types of results: the values of the key attributes given by the user, and the text writers' opinions on those attributes. In this step, the attribute values and opinions are evaluated and verified against related or iterative entities. For entities E0 and E1 that have an iteration relation or are highly related, the values v0 and v1 of the same key attribute A are compared, and key attributes whose values differ substantially are identified. The writers' opinions on these key attributes are then compared with the direction of that difference, that is, whether v1 is greater than or equal to v0, or smaller than it. Opinions consistent with that direction are regarded as credible and are considered possible development or optimization directions of the entity.
2. The method of claim 1, wherein in step 1), more than a single isolated entity is initialized: a complete iterative update path or a high-association relationship table is also constructed. This allows the entity's evolved or degraded attributes to be identified and used to confirm the reliability of the subsequent opinion evaluation.
3. The method of claim 1, wherein in step 2), the keywords appearing in each sentence of the text are weighted according to the frequency weights of the segmented title words, and the weighted sentences are finally recombined to obtain the new abstract.
4. The focus entity attribute prediction method based on natural language processing according to claim 1, wherein in step 4), the opinions extracted in step 3) are evaluated against the objective predicted values of the attributes to verify their credibility and feasibility. This ensures that the opinions output by the invention conform to the actual or expected development direction of the focus entity.
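Step 2) of claim 1 and claim 3 describe an extractive abstract built from title-word frequency weights. Below is a minimal Python sketch of that weighting idea, built on several assumptions:
- English text is tokenized with a simple regular expression; Chinese material would need a real word segmenter.
- The weight r of each title word w is its relative frequency in the title.
- Each sentence is scored by the summed weights of its tokens, and the top k sentences are kept in their original order.

It does not implement the graph-based TextRank ranking that the claim also names. All function names and the parameter k are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lower-cased alphanumeric tokens of English text.
    # Chinese material would need a word segmenter instead.
    return re.findall(r"[a-z0-9]+", text.lower())


def title_weight_table(title: str) -> dict[str, float]:
    """Build T = {w1: r1, ..., wn: rn}, using each title word's relative frequency as r."""
    counts = Counter(tokenize(title))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def split_sentences(text: str) -> list[str]:
    # Split after sentence-final punctuation (ASCII or full-width).
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def extract_abstract(title: str, text: str, k: int = 2) -> str:
    """Score sentences by summed title-word weights and keep the top k in original order."""
    table = title_weight_table(title)
    sentences = split_sentences(text)
    scores = [sum(table.get(tok, 0.0) for tok in tokenize(s)) for s in sentences]
    # Highest score first; ties go to the earlier sentence.
    top = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:k]
    return " ".join(sentences[i] for i in sorted(top))


title = "Battery life of the new phone"
body = ("The new phone has a large battery. Its camera is average. "
        "Battery life is excellent for a phone of this class. Shipping was fast.")
print(extract_abstract(title, body))
```

Here the two sentences sharing the most title words are kept, and the camera and shipping sentences are dropped. Restoring original order, rather than score order, keeps the recombined abstract readable.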
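Step 4) of claim 1 and claim 4 check writers' opinions against how a key attribute actually changed between iteration-related entities E0 and E1. The sketch below is one possible reading of that step, not the patented procedure, and rests on these assumptions:
- Each opinion is encoded with a hypothetical direction: +1 if the writer wants a larger value of attribute A, -1 if a smaller one.
- An attribute counts as changed when its relative change from v0 to v1 reaches a threshold of 10%. The claim only says "larger differences", so this value is invented.
- An opinion is credible when its direction matches the sign of v1 - v0.

The `View` type, the threshold value and all function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """An extracted opinion about one attribute (hypothetical encoding)."""
    attribute: str
    direction: int  # +1: writer wants a larger value, -1: writer wants a smaller value
    text: str


def changed_attributes(v0: dict[str, float], v1: dict[str, float],
                       min_rel_change: float = 0.1) -> dict[str, int]:
    """Map each shared attribute whose relative change from v0 to v1 reaches
    min_rel_change to the sign of that change (+1 or -1)."""
    changes = {}
    for attr in v0.keys() & v1.keys():
        delta = v1[attr] - v0[attr]
        base = abs(v0[attr]) or 1.0  # avoid dividing by zero when v0 is 0
        if delta != 0 and abs(delta) / base >= min_rel_change:
            changes[attr] = 1 if delta > 0 else -1
    return changes


def credible_views(views: list[View], v0: dict[str, float], v1: dict[str, float],
                   min_rel_change: float = 0.1) -> list[View]:
    """Keep opinions whose direction agrees with how the attribute actually changed."""
    changes = changed_attributes(v0, v1, min_rel_change)
    return [v for v in views if changes.get(v.attribute) == v.direction]


v0 = {"battery_mah": 4000, "weight_g": 200, "price": 5000}  # predecessor E0
v1 = {"battery_mah": 4500, "weight_g": 205, "price": 4000}  # successor E1
views = [
    View("battery_mah", +1, "battery is too small"),
    View("weight_g", -1, "phone is too heavy"),
    View("price", -1, "too expensive"),
]
print([v.attribute for v in credible_views(views, v0, v1)])  # ['battery_mah', 'price']
```

Only substantially changed attributes are considered. An opinion about an attribute that stayed essentially the same across iterations is therefore neither confirmed nor rejected; it is simply not returned as credible.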

Description

Technical Field
The invention belongs to the field of natural language text analysis and prediction and is particularly suited to predicting the attributes of focus entities. Current Internet text data are poorly utilized and poorly analyzed. The invention addresses this, improves the ability to accurately acquire information about focus entities in a specific field, and performs attribute analysis and prediction. It combines natural language understanding and analysis to infer three things about an entity: its attribute characteristics, the aspects of it that others pay attention to, and its expected development.
Background
Predictive models are applied in many fields, and attribute prediction is a popular application. By data source, attribute prediction is generally divided into text-based and image-based prediction. The focus entities being predicted are either non-biological entities, such as products and articles, or biological entities, such as humans and animals. Take text-based user attribute prediction as an example. The target entity is an Internet user, and the behavior-trace information the user leaves on social networks, forums or news websites (comments, favorites, likes and the like) is collected. Features are extracted from this behavior information and fed into a prediction model, which predicts user attributes such as age, occupation and interests.
Image-based user attribute prediction takes an image of a given person, analyzes and summarizes its details, and outputs predicted values for target attributes such as age, height and sex. Prediction from video can further analyze attributes such as behavioral habits and physical condition over a period of time. Such attribute prediction models have two characteristics:
(1) the information source is generated or provided by the behavior of the focus entity, and consists of objective text or images describing that behavior;
(2) the prediction result is descriptive, and aims to describe the behavior and characteristics of the focus entity factually and objectively.

Downstream applications that use the predictions therefore still have to analyze the attributes further. The invention provides an entity attribute prediction technology based on natural language processing. It offers a workflow template covering data collection, analysis and prediction for entity attribute prediction in a focus field. It concentrates on texts that subjectively evaluate the focus entity and generates sentiment-bearing opinion evaluations of that entity. Finally, it uses statistical methods to summarize and predict certain attributes of the entity as seen from the Internet. Compared with existing attribute prediction models, the invention broadens the range of Internet data used for attribute prediction, mines valuable attribute information hidden in existing data, and supports more intelligent data analysis.
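Claim 1, step 3) makes the statistical summarization concrete:
- Materials with the same source and identical text count once.
- A predicted attribute value is accepted only if its frequency F is more than 75% of the number of non-duplicate materials.

Below is a minimal Python sketch of that check. The `Material` record, its field names, and the use of the pair (source, text) as the deduplication key are assumptions made for illustration. The comparison is strict because the claim says "more than 75%".

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Material:
    """One crawled text material (hypothetical record layout)."""
    source: str                     # where the material came from
    text: str                       # material text
    predicted_value: Optional[str]  # attribute value predicted from it, if any


def credible_values(materials: list[Material], threshold: float = 0.75) -> dict[str, int]:
    """Return predicted values whose frequency F exceeds threshold * N,
    where N is the number of non-duplicate materials."""
    # Materials with the same source and identical text count as one material.
    unique = list({(m.source, m.text): m for m in materials}.values())
    n = len(unique)
    counts = Counter(m.predicted_value for m in unique if m.predicted_value is not None)
    # Strict inequality: the claim requires "more than 75%".
    return {value: f for value, f in counts.items() if f > threshold * n}


docs = [
    Material("site-a", "text 1", "5000mAh"),
    Material("site-a", "text 1", "5000mAh"),  # duplicate, counted once
    Material("site-b", "text 2", "5000mAh"),
    Material("site-c", "text 3", "5000mAh"),
    Material("site-d", "text 4", "4500mAh"),
]
print(credible_values(docs))  # {} because F = 3 is not more than 0.75 * 4
print(credible_values(docs + [Material("site-e", "text 5", "5000mAh")]))  # {'5000mAh': 4}
```

Deduplicating before counting stops a single article reposted verbatim from inflating F. Without it, the first example would wrongly pass with four of five raw materials agreeing.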
Disclosure of Invention
The invention provides a focus entity attribute prediction technology based on natural language processing. It addresses the low utilization and poor analysis of current text data, and it improves the ability to accurately acquire information about focus entities in a specific field and to analyze and predict their attributes. It also supplies downstream data applications with valuable latent attribute information beyond objective description. To this end, the technology does not target a specific person or object. Instead, given a list of focus entities and focus attributes in a specified field, it collects news and review texts from the Internet and analyzes their sentiment tendencies and opinions. From these it extracts predicted values of the focus attributes, together with the aspects, opinions and expectations that network users hold about each entity. Specifically, the method comprises the following steps:
1) Entity data collection. The user provides an entity list and key attributes for a specific field. These guide the invention in collecting relevant data and focus the prediction process on analyzing the basic descriptive attributes. For the entities in the entity list, the complete entity names are obtained first, and associated entities with a high degree of correlation are then retrieved through a search engine; for the entities with iteration re