CN-121980012-A - Cell database-oriented data quick retrieval method and system
Abstract
The invention relates to the technical field of database retrieval, and in particular to a cell database-oriented rapid data retrieval method and system. An intention fit degree is obtained for each retrieved item from the text difference degree between the item and the search text, the category difference between the cells in the content text of the item and those of the remaining items, and whether the user clicked the item. An interest tendency degree is obtained from the occurrence, in the content text of each retrieved item, of the cell names stored in each node of the cell classification tree, together with the intention fit degree and the click status of each search result. A user tendency degree for each currently retrieved item is obtained from the interest tendency degrees, over the searches preceding the current one, of the cell names stored in all nodes of the cell classification tree. The user tendency degree is combined with the text difference degree to obtain an overall fit degree, and the items are ordered by overall fit degree to obtain the search display sequence. By combining user behavior with content differences, the invention avoids interference from results that do not match the user's intention, takes the user's interest tendency into account, and improves retrieval efficiency.
Inventors
- YAN ZHENGLONG
- CHEN MENGYAO
- ZHAO YUAN
- QIAN MENGJIE
Assignees
- 苏州双洳生物科技有限公司
- 邯郸海豚百川科技有限公司
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-04-07
Claims (10)
- 1. A cell database-oriented rapid data retrieval method, characterized by comprising the following steps: searching the cell database to obtain the input text of the current search and of a plurality of previous searches, the name text and the content text of each item returned by each search, and a cell classification tree; obtaining the intention fit degree of each item retrieved in each search according to the text difference degree between the item and the search text, the category difference between the cells in the content text of the item and those of the remaining items, and whether the user clicked the item; obtaining the interest tendency degree of each node in each search according to the occurrence, in the content text of each retrieved item, of the cell name stored in that node of the cell classification tree, together with the intention fit degree of each retrieved item and whether the user clicked it; and obtaining the overall fit degree of each currently retrieved item according to the text difference degree between the item and the search text and the user tendency degree, and ordering the currently retrieved items by overall fit degree to obtain a search display sequence.
- 2. The cell database-oriented rapid data retrieval method according to claim 1, wherein obtaining the text difference degree between each item retrieved in each search and the search text comprises: obtaining the edit distance between the name text of each item and the search text, and taking the ratio of the edit distance to the character length of the search text as the character difference degree between the item and the search text; converting the search text and the name text into semantic vectors respectively, calculating the cosine similarity between the two semantic vectors, and recording it as the semantic similarity between the item and the search text; and obtaining the text difference degree between each item retrieved in each search and the search text according to the character difference degree and the semantic similarity.
- 3. The cell database-oriented rapid data retrieval method according to claim 1, wherein obtaining the intention fit degree of each item retrieved in each search comprises: counting the number of occurrences, in the content text of each retrieved item, of the cell names stored in all nodes of the cell classification tree, and selecting the node with the largest number of occurrences as the analysis node of that item; obtaining the category difference degree of any two items retrieved in each search according to the positions of their analysis nodes in the cell classification tree; and obtaining the intention fit degree of each item retrieved in each search according to the click label value, the overall category difference, and the text difference degree of that item.
- 4. The cell database-oriented rapid data retrieval method according to claim 3, wherein obtaining the category difference degree of the corresponding two items comprises: determining the nearest common ancestor, in the cell classification tree, of the analysis nodes of any two items retrieved in each search; the category difference degree of any two items retrieved in each search is calculated by a formula (not reproduced in this text) whose terms are: the category difference degree between the a-th item and the b-th item retrieved in each search; the depth, in the cell classification tree, of the nearest common ancestor of the analysis nodes of the a-th and b-th items; the depth of the analysis node of the a-th item in the cell classification tree; and the depth of the analysis node of the b-th item in the cell classification tree.
- 5. The cell database-oriented rapid data retrieval method according to claim 3, wherein obtaining the interest tendency degree of the corresponding node in each search comprises: selecting any node of the cell classification tree as an example node, and, among all items retrieved in each search, recording those whose content text contains the cell name stored in the example node as the analysis items of the example node in that search; recording the proportion of the analysis items among all items retrieved in each search as the occurrence ratio of the example node in that search; taking the average of the intention fit degrees of all analysis items in each search as the overall fit value of the example node in that search; averaging the click label values of all items retrieved in each search to obtain the comprehensive click value of that search; and obtaining the interest tendency degree of the example node in each search according to the occurrence ratio, the overall fit value, and the comprehensive click value.
- 6. The cell database-oriented rapid data retrieval method according to claim 5, wherein obtaining the user tendency degree of each currently retrieved item comprises: normalizing the average of the interest tendency degrees of the example node over the plurality of searches preceding the current search to obtain the overall tendency degree of the example node; and determining, for each currently retrieved item, the depths in the cell classification tree of its associated nodes whose overall tendency degree is non-zero, and taking the overall tendency degree of the associated node with the greatest depth as the user tendency degree of that item.
- 7. The cell database-oriented rapid data retrieval method according to claim 2, wherein obtaining the overall fit degree of each currently retrieved item comprises: applying a negative-correlation mapping to the text difference degree of each currently retrieved item, and normalizing the product of the mapping result and the user tendency degree to obtain the overall fit degree of that item.
- 8. The cell database-oriented rapid data retrieval method according to claim 1, wherein the items in the search display sequence are arranged from front to back in descending order of overall fit degree.
- 9. The cell database-oriented rapid data retrieval method according to claim 3, wherein the click label value is positively correlated with the intention fit degree, and the overall category difference and the text difference degree are negatively correlated with the intention fit degree.
- 10. A cell database-oriented rapid data retrieval system, the system comprising: a data acquisition module, configured to search the cell database to obtain the input text of the current search and of previous searches, the name text and the content text of each item returned by each search, and a cell classification tree; an intention fit analysis module, configured to obtain the intention fit degree of each item retrieved in each search according to the text difference degree between the item and the search text, the category difference between the cells in the content text of the item and those of the remaining items, and whether the user clicked the item; a user tendency analysis module, configured to obtain the interest tendency degree of each node in each search according to the occurrence, in the content text of each retrieved item, of the cell name stored in that node of the cell classification tree, together with the intention fit degree of each retrieved item and whether the user clicked it; and a rapid retrieval module, configured to obtain the overall fit degree of each currently retrieved item according to the text difference degree between the item and the search text and the user tendency degree, and to order the currently retrieved items by overall fit degree to obtain a search display sequence.
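The analysis-node selection of claim 3 and the nearest-common-ancestor step of claim 4 can be illustrated with a short Python sketch. The formula of claim 4 does not survive in this text, so the sketch substitutes a Wu-Palmer-style distance, 1 - 2·depth(ancestor) / (depth(a) + depth(b)), purely as an assumption; it uses the same three depths the claim lists but is not presented as the claimed formula. The toy tree, the node names, and all function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical toy classification tree: child -> parent (the root maps to None).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "epithelial": "root",
    "lymphoid": "root",
    "HeLa": "epithelial",
    "MCF-7": "epithelial",
    "Jurkat": "lymphoid",
}


def path_to_root(node: str) -> List[str]:
    """Return the nodes from `node` up to the root, inclusive."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def depth(node: str) -> int:
    """Depth in the tree, counting the root as depth 1 (an assumption)."""
    return len(path_to_root(node))


def analysis_node(content_text: str) -> Optional[str]:
    """Pick the node whose stored cell name occurs most often in the text."""
    counts = {name: content_text.count(name) for name in PARENT}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def nearest_common_ancestor(a: str, b: str) -> str:
    """Deepest node that lies on both root paths."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_a:
            return node
    raise ValueError("nodes do not share a root")


def category_difference(a: str, b: str) -> float:
    """Assumed Wu-Palmer-style distance; NOT the formula from the claims."""
    ancestor = nearest_common_ancestor(a, b)
    return 1.0 - 2.0 * depth(ancestor) / (depth(a) + depth(b))
```

Under this assumed distance, two items whose analysis nodes are siblings (for example `HeLa` and `MCF-7`) differ less than two items whose analysis nodes meet only at the root (for example `HeLa` and `Jurkat`), and an item compared with itself has a difference of zero.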
Description
Cell database-oriented data quick retrieval method and system
Technical Field
The invention relates to the technical field of database retrieval, and in particular to a cell database-oriented rapid data retrieval method and system.
Background
A cell database is a specialized database dedicated to collecting, storing, and managing cell-related data. Its data types typically include cell names, sources, culture conditions, cell images, and proteomic and genomic expression profiles. At present, a user mainly queries such a database by entering a cell name in a front-end interface to obtain detailed information on the target cell. As the scale of cell data grows rapidly and its dimensions become more complex, efficient and accurate data retrieval is important for improving research efficiency. Existing retrieval techniques based on text matching can search a cell database and quickly return cell entries that are literally related to the query. However, text matching alone struggles to distinguish and exclude interfering items whose text is similar to the query but whose biological meaning, or relevance to the user's intention, is very different. Many irrelevant results are therefore mixed into the output, which hampers the user's judgment and forces additional manual screening, consuming time and reducing retrieval efficiency and user experience.
Disclosure of Invention
In order to solve the technical problem that text matching cannot effectively exclude interfering results whose text is similar but whose semantics differ, which reduces retrieval efficiency, the invention aims to provide a cell database-oriented rapid data retrieval method, and adopts the following technical scheme. In a first aspect, an embodiment of the present invention provides a cell database-oriented rapid data retrieval method, the method comprising: searching the cell database to obtain the input text of the current search and of a plurality of previous searches, the name text and the content text of each item returned by each search, and a cell classification tree; obtaining the intention fit degree of each item retrieved in each search according to the text difference degree between the item and the search text, the category difference between the cells in the content text of the item and those of the remaining items, and whether the user clicked the item; obtaining the interest tendency degree of each node in each search according to the occurrence, in the content text of each retrieved item, of the cell name stored in that node of the cell classification tree, together with the intention fit degree of each retrieved item and whether the user clicked it; and obtaining the overall fit degree of each currently retrieved item according to the text difference degree between the item and the search text and the user tendency degree, and ordering the currently retrieved items by overall fit degree to obtain a search display sequence.
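The final ordering step can be sketched in Python as follows. The text states only that the text difference degree is mapped with negative correlation, multiplied by the user tendency degree, normalized, and sorted in descending order; the specific choices here, `exp(-x)` as the negative-correlation mapping and division by the maximum as the normalization, are assumptions, as are the function names and the sample values.

```python
import math
from typing import Dict, List


def overall_fit(text_diff: Dict[str, float],
                user_tendency: Dict[str, float]) -> Dict[str, float]:
    """Overall fit degree per item, scaled so the best item scores 1.0."""
    # Assumed negative-correlation mapping: exp(-x) shrinks as the difference grows.
    raw = {
        item: math.exp(-text_diff[item]) * user_tendency[item]
        for item in text_diff
    }
    top = max(raw.values(), default=0.0)
    if top <= 0.0:
        # No item has a positive score, so every overall fit degree is zero.
        return {item: 0.0 for item in raw}
    # Assumed normalization: divide by the maximum raw score.
    return {item: value / top for item, value in raw.items()}


def display_sequence(text_diff: Dict[str, float],
                     user_tendency: Dict[str, float]) -> List[str]:
    """Items ordered from front to back by descending overall fit degree."""
    fit = overall_fit(text_diff, user_tendency)
    return sorted(fit, key=fit.get, reverse=True)


# Hypothetical values for three items returned by one search.
text_diff = {"HeLa": 0.1, "HeLa S3": 0.3, "HEL": 0.2}
user_tendency = {"HeLa": 0.9, "HeLa S3": 0.8, "HEL": 0.1}
print(display_sequence(text_diff, user_tendency))  # ['HeLa', 'HeLa S3', 'HEL']
```

In this example `HEL` is textually closer to the query than `HeLa S3`, yet it is placed last because its user tendency degree is low, which is the behavior the scheme is intended to produce for items that look similar but do not match the user's interest.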
Further, obtaining the text difference degree between each item retrieved in each search and the search text comprises: obtaining the edit distance between the name text of each item and the search text, and taking the ratio of the edit distance to the character length of the search text as the character difference degree between the item and the search text; converting the search text and the name text into semantic vectors respectively, calculating the cosine similarity between the two semantic vectors, and recording it as the semantic similarity between the item and the search text; and obtaining the text difference degree between each item retrieved in each search and the search text according to the character difference degree and the semantic similarity.
Further, obtaining the intention fit degree of each item retrieved in each search comprises: counting the number of occurrences, in the content text of each retrieved item, of the cell names stored in all nodes of the cell classification tree, and selecting the node with the largest number of occurrences as the analysis node of that item; obtaining the category difference degree of any two items retrieved in each search according to the positions of their analysis nodes in the cell classification tree; and obtaining the intention fit degree of each item retrieved in each search according to the click label value, the overall category difference, and the text difference degree of that item.
Further, obtaining the category difference degree of the corresponding two items comprises: determining the nearest common ancestor, in the cell classification tree, of the analysis nodes of any two items retrieved in each search; The calculatio