JP-7855780-B1 - Information processing device, information processing method, and program


Abstract

[Problem] To provide an information processing device, an information processing method, and a program that can reduce the time required to search for content. [Solution] The information processing device comprises a receiving unit, a content processing unit, and a display control unit. The receiving unit receives content. The content processing unit retrieves multiple content items from a content pool, in which each item is stored with information identifying its physical address, using that address information. It then generates metadata for each retrieved item, generates from the metadata a vector space that represents the characteristics of the content, and virtually classifies the content by generating multiple clusters from the items based on the vector space and the semantic similarity of the content. The display control unit displays the results of the virtual classification on a display unit. [Selection Diagram] Figure 1
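A minimal Python sketch of the first half of this pipeline follows: retrieving content by physical address, generating metadata, and placing each item in an N-dimensional vector space. The content pool layout, the sample addresses and texts, the attribute items (`doc_type`, `department`, `year`), and the toy metadata extractor are all hypothetical; the publication does not specify how metadata is generated or encoded. Each attribute item becomes one dimension, so N equals the number of attribute items vectorized, matching the definition of N in the claims.

```python
from dataclasses import dataclass

# Hypothetical content pool: physical address -> raw content.
# The publication only says each item is stored with information identifying
# its physical address; this dict layout and these addresses are assumptions.
CONTENT_POOL = {
    "/share/sales/quote_2024.txt": "Quotation for customer A. Department: sales. Year: 2024",
    "/share/sales/quote_2023.txt": "Quotation for customer B. Department: sales. Year: 2023",
    "/share/hr/policy_2024.txt": "Leave policy. Department: hr. Year: 2024",
}

# Hypothetical attribute items to vectorize; N = len(ATTRIBUTE_ITEMS).
ATTRIBUTE_ITEMS = ("doc_type", "department", "year")


@dataclass
class Metadata:
    address: str      # information identifying the physical address
    attributes: dict  # attribute item -> value


def generate_metadata(address: str, text: str) -> Metadata:
    """Toy extractor (assumption): takes the first word as the document type
    and parses 'Key: value' fragments separated by '. '."""
    attrs = {"doc_type": text.split()[0].lower()}
    for fragment in text.split(". "):
        if ": " in fragment:
            key, value = fragment.split(": ", 1)
            attrs[key.strip().lower()] = value.strip().rstrip(".")
    return Metadata(address, attrs)


def build_vector_space(metadata: list) -> list:
    """Map each attribute item to one numeric dimension by giving every
    distinct value an integer code (ordinal encoding, for illustration only)."""
    codebooks = {item: {} for item in ATTRIBUTE_ITEMS}
    vectors = []
    for md in metadata:
        vec = []
        for item in ATTRIBUTE_ITEMS:
            value = md.attributes.get(item, "")
            codes = codebooks[item]
            # First time a value is seen it gets the next free code.
            vec.append(float(codes.setdefault(value, len(codes))))
        vectors.append(vec)
    return vectors


addresses = list(CONTENT_POOL)  # address information, as a crawler might report it
metadata = [generate_metadata(a, CONTENT_POOL[a]) for a in addresses]
vectors = build_vector_space(metadata)  # each vector has N = 3 dimensions
```

The ordinal codes are a crude stand-in chosen for brevity: they keep N equal to the number of attribute items but impose an arbitrary order on the values. A real implementation might instead use one-hot encodings or text embeddings, at the cost of more dimensions per attribute item.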

Inventors

門奈丈博
岡恭典
森脇健太
門岡良昌
齋藤怜雄
木ノ上アーリヤ
大嶋彈

Assignees

株式会社シーイーシー

Dates

Publication Date: 20260508
Application Date: 20251002

Claims (18)

An information processing apparatus comprising: a receiving unit that accepts content; a content processing unit that virtually classifies the content by obtaining multiple content items from a content pool, in which each of the multiple content items received by the receiving unit is stored with information identifying its physical address, based on each piece of the information identifying a physical address, generating metadata based on each of the obtained content items, generating a vector space representing the characteristics of the content based on the generated metadata, and generating multiple clusters from the content items based on the generated vector space and the semantic similarity of the content; and a display control unit that causes a display unit to display the results of the virtual classification of the content, wherein the content processing unit generates k clusters (where k is an integer, k > 1) from the multiple content items and optimizes each of the k clusters based on the average distance between any given content item and the other content items included in the cluster containing that item, and the average distance between that content item and the cluster closest to it.
The content processing unit comprises: a content pool crawler that receives information identifying the physical addresses assigned to the content stored in the content pool; a metadata generation unit that obtains information identifying a physical address from the content pool crawler, obtains content from the content pool based on that information, and generates metadata based on the obtained content; a vector space generation unit that generates, based on multiple pieces of metadata, an N-dimensional vector space (where N is an integer, N > 1) representing the characteristics of the content; an embedding processing unit that embeds the content, based on its metadata, into the N-dimensional vector space generated by the vector space generation unit; and a hierarchical structure generation unit that generates k clusters (where k is an integer, k > 1) from the multiple content items embedded in the N-dimensional vector space, based on the semantic similarity of the content; wherein N is the number of attribute items to be vectorized among the multiple pieces of metadata. The information processing apparatus according to claim 1.
The hierarchical structure generation unit acquires the multiple pieces of metadata generated from the respective content items, vectorizes each piece of metadata, derives, for each piece of metadata, its distances to the other vectorized pieces of metadata, and generates k clusters based on the derived distances. The information processing apparatus according to claim 2.
The hierarchical structure generation unit creates a distance matrix from the distances between the vectorized pieces of metadata and generates k clusters based on the created distance matrix. The information processing apparatus according to claim 3.
The hierarchical structure generation unit weights the distances between the vectorized metadata and generates a distance matrix. The information processing apparatus according to claim 4.
The vector space generation unit creates M-dimensional vector data (where M is an integer, M > 1) representing the attributes of the content creator and generates an M-dimensional vector space, and the embedding processing unit embeds the content into an (N+M)-dimensional vector space. The information processing apparatus according to claim 2.
The vector space generation unit creates R-dimensional vector data (where R is an integer, R > 1) based on management rules and generates an R-dimensional vector space, and the embedding processing unit embeds the content into an (N+M+R)-dimensional vector space. The information processing apparatus according to claim 6.
The vector space generation unit creates T-dimensional vector data (where T is an integer, T > 1) representing time attributes and generates a T-dimensional vector space, and the embedding processing unit embeds the content into an (N+M+R+T)-dimensional vector space. The information processing apparatus according to claim 7.
The content processing unit further comprises a pseudo-metadata search unit that generates pseudo-metadata based on a search prompt and searches for content based on the generated pseudo-metadata, and the display control unit causes the display unit to display the content retrieved by the pseudo-metadata search unit. The information processing apparatus according to claim 2.
When the metadata generated based on content is missing any information, the metadata generation unit creates query information asking for the missing information, acquires the information entered in response to that query, and completes the metadata. The information processing apparatus according to claim 2.
The pseudo-metadata search unit derives the characteristics of the content based on the content search results, and the display control unit causes the display unit to display information indicating the characteristics derived by the pseudo-metadata search unit. The information processing apparatus according to claim 9.
The pseudo-metadata search unit derives questions for narrowing down the content based on the derived characteristics of the content, and the display control unit causes the display unit to display information indicating the questions derived by the pseudo-metadata search unit. The information processing apparatus according to claim 11.
When information needed to search for content is missing, the pseudo-metadata search unit creates query information asking for the missing information. The information processing apparatus according to claim 9.
The receiving unit receives information indicating a hierarchical structure in which the content is organized hierarchically, and the hierarchical structure generation unit generates clusters from the multiple content items based on that information. The information processing apparatus according to claim 2.
The aforementioned hierarchical structure generation unit generates k clusters from multiple content items in the background. The information processing apparatus according to claim 2.
The hierarchical structure generation unit uses a large language model to set the name of each of the k clusters. The information processing apparatus according to claim 2.
An information processing method performed by a computer, comprising: accepting multiple content items; retrieving multiple content items from a content pool, in which each of the accepted content items is stored with information identifying its physical address, based on each piece of the information identifying a physical address; generating metadata based on each of the retrieved content items, and generating a vector space representing the characteristics of the content based on the generated metadata; virtually classifying the content by generating multiple clusters from the content items based on the generated vector space and the semantic similarity of the content; and displaying the results of the virtual classification of the content on a display unit; wherein, in virtually classifying the content, k clusters (where k is an integer, k > 1) are generated from the multiple content items, and each of the k clusters is optimized based on the average distance between any given content item and the other content items included in the cluster containing that item, and the average distance between that content item and the cluster closest to it.
A program that causes a computer to: accept multiple content items; retrieve multiple content items from a content pool, in which each of the accepted content items is stored with information identifying its physical address, based on each piece of the information identifying a physical address; generate metadata based on each of the retrieved content items, and generate a vector space representing the characteristics of the content based on the generated metadata; virtually classify the content by generating multiple clusters from the content items based on the generated vector space and the semantic similarity of the content; and display the results of the virtual classification of the content on a display unit; wherein, when the content is virtually classified, k clusters (where k is an integer, k > 1) are generated from the multiple content items, and each of the k clusters is optimized based on the average distance between any given content item and the other content items included in the cluster containing that item, and the average distance between that content item and the cluster closest to it.
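The clustering steps in claims 1, 3 to 5, 17, and 18 (a possibly weighted distance matrix over vectorized metadata, k clusters, and two per-item averages) can be sketched in Python. The two averages match the a(i) and b(i) terms of the standard silhouette coefficient, s(i) = (b(i) - a(i)) / max(a(i), b(i)). The claims name neither the clustering algorithm nor the optimization procedure, so this sketch assumes average-linkage agglomerative clustering and treats "optimizing" as choosing the k with the highest mean silhouette. The per-dimension weights, the function names, and the sample vectors are hypothetical.

```python
import math


def weighted_distance_matrix(vectors, weights):
    """Weighted Euclidean distance between every pair of metadata vectors.
    Per-dimension weights are one assumed way to 'weight the distances'."""
    n = len(vectors)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum(w * (a - b) ** 2
                              for w, a, b in zip(weights, vectors[i], vectors[j])))
            dist[i][j] = dist[j][i] = d
    return dist


def average_linkage(dist, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise distance until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def mean_silhouette(dist, clusters):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all items, where
    a(i) is the mean distance to the other members of i's own cluster and
    b(i) is the mean distance to the members of the nearest other cluster."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    scores = []
    for i, c in label.items():
        own = [j for j in clusters[c] if j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for c2, members in enumerate(clusters) if c2 != c)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_clusters(vectors, weights, k_max):
    """Try k = 2..min(k_max, n - 1) and keep the clustering with the highest
    mean silhouette (the silhouette is only defined for 2 <= k <= n - 1)."""
    dist = weighted_distance_matrix(vectors, weights)
    best_k, best_clusters, best_score = None, None, -math.inf
    for k in range(2, min(k_max, len(vectors) - 1) + 1):
        clusters = average_linkage(dist, k)
        score = mean_silhouette(dist, clusters)
        if score > best_score:
            best_k, best_clusters, best_score = k, clusters, score
    return best_k, best_clusters, best_score


# Two well-separated groups of hypothetical 2-dimensional metadata vectors.
vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
k, clusters, score = choose_clusters(vectors, weights=[1.0, 1.0], k_max=4)
print(k, sorted(sorted(c) for c in clusters))  # prints: 2 [[0, 1, 2], [3, 4, 5]]
```

The pure-Python loops keep the sketch dependency-free but scale poorly; for larger content pools the same steps map onto library routines for hierarchical linkage and silhouette scoring.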

Description

This invention relates to an information processing device, an information processing method, and a program.
A search server uses keywords entered by a user to retrieve, from the documents it stores, documents related to those keywords as search results. The search server then provides the search results to the user's terminal device, where the user can view them. A technique is known for clustering the content found in search results according to its degree of similarity (see, for example, Patent Document 1).
Patent Document 1: Japanese Patent Publication No. 2005-078245
Brief description of the drawings:
This figure shows an example of an information processing device according to this embodiment.
This figure shows an example of user attribute information.
This figure shows an example of content metadata.
This is a diagram illustrating an example of the processing performed by the information processing device of this embodiment.
This is a diagram illustrating an example of the processing performed by the information processing device of this embodiment.
This is a diagram illustrating an example of the processing performed by the information processing device of this embodiment.
This is a diagram showing an example of a dendrogram.
This figure shows an example of creator attribute information.
This is a flowchart showing an example of the operation flow of the information processing device of this embodiment.
This is a flowchart showing an example of the operation flow of the information processing device of this embodiment.
This is a diagram illustrating an example of the processing performed by the information processing device of this embodiment.
This is a flowchart showing an example of the operation flow of the information processing device of this embodiment.
This is a flowchart showing an example of the operation flow of the information processing device of this embodiment.
This is a flowchart showing an example of the operation flow of the information processing device of this embodiment.
This is a flowchart showing an example of the operation flow of the information processing device of this embodiment.
This flowchart shows another example of the operation flow of the information processing device of this embodiment.
This figure shows an example of a content classification screen.
This figure shows an example of an information processing device according to a modified embodiment.
This is a diagram illustrating an example of a hierarchical structure.
This flowchart shows an example of the operation flow of an information processing device according to a modified embodiment.
This flowchart shows an example of the operation flow of an information processing device according to a modified embodiment.
This flowchart shows an example of the operation flow of an information processing device according to a modified embodiment.
This flowchart shows an example of the operation flow of an information processing device according to a modified embodiment.
The information processing apparatus, information processing method, and program of the embodiments are described below with reference to the drawings. The embodiments described below are merely examples, and the embodiments to which the present invention is applied are not limited to them. In all figures used to illustrate the embodiments, components with the same function are given the same reference numerals, and repeated explanations are omitted.
As used in this application, "based on XX" means "based on at least XX" and includes cases where something is based on another element in addition to XX. "Based on XX" is also not limited to cases where XX is used directly; it includes cases where something is based on the result of calculating or processing XX. "XX" is any element (for example, any information).
In this application, "to acquire" is not limited to actively acquiring information by sending a transmission request; it also includes passively receiving information transmitted from another device. Likewise, "to acquire" is not limited to directly acquiring the target information (the information to be acquired) from an external source; it also includes generating the target information by calculating or processing information obtained from an external source.
(Embodiment)
(Information processing device)
The information processing device 100 in this embodiment creates management rules for content such as documents and manages the content based on the created rules. The information processing device 100 also classifies the managed content based on the management rules; this classification may be performed automatically. Figure 1 shows an example of the information processing device 100 in this embodiment. The information processing device 100 is implemented by a device such as a personal computer, server, smartphone, tablet computer, or industrial computer. The information processing device 100 receives