US-12625887-B2 - System and method for smart categorization of content in a content management system

Abstract

In accordance with an embodiment, systems and methods described herein can be used, for example with a content management system, to provide recommendations to categorize/classify content into user-defined categories, which in turn provides an opportunity for content managers to place new content into accurate categories effortlessly, based on previously evaluated/categorized content. A recommendation system or tool can use artificial intelligence (AI) techniques to continuously learn from past data, and assist in placing content into a relevant category through automatic categorization/classification of newly created/edited content. The recommendation tool can be implemented and applied across diverse domains by generating feature vectors from contents, creating clusters in the feature space based on previously categorized content, and recommending a category for new content through feature space distance calculation from the clusters.
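The three steps named in the abstract (feature vectors, clusters from previously categorized content, and a distance calculation against those clusters) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the term-count features, the mean-vector cluster summary, the use of cosine similarity, and the function names `featurize`, `centroid`, `build_clusters`, and `recommend`.

```python
import math
from collections import Counter, defaultdict


def featurize(text):
    """Toy feature vector: lowercase term counts (assumed, not the patent's features)."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Summarize a cluster as the mean of its member vectors."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    n = len(vectors)
    return {term: count / n for term, count in total.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_clusters(categorized):
    """Group previously categorized (text, category) pairs into one centroid per category."""
    by_category = defaultdict(list)
    for text, category in categorized:
        by_category[category].append(featurize(text))
    return {category: centroid(vs) for category, vs in by_category.items()}


def recommend(text, clusters):
    """Return (similarity, category) pairs, most similar cluster first."""
    vector = featurize(text)
    scored = [(cosine(vector, c), category) for category, c in clusters.items()]
    return sorted(scored, reverse=True)


history = [
    ("lease contract for office property", "contracts"),
    ("tenant lease agreement contract", "contracts"),
    ("quarterly earnings press release", "press"),
    ("press release product launch", "press"),
]
clusters = build_clusters(history)
print(recommend("new lease contract for tenant", clusters))
```

A production system would likely use richer features and a different similarity measure; the sketch only shows how a category recommendation can fall out of comparing a new item's vector against per-category cluster summaries.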

Inventors

  • Sandip Ghoshal
  • SREEHARSHA KAMIREDDY
  • JASWANTH MARYALA
  • Vivek Peter
  • Hareesh S Kadlabalu

Assignees

  • ORACLE INTERNATIONAL CORPORATION

Dates

Publication Date: 2026-05-12
Application Date: 2024-06-20
Priority Date: 2018-10-18

Claims (20)

  1. A system for smart categorization of content in a content management system, comprising: a content categorization engine provided at one or more computers, the content categorization engine having access to a taxonomy tree comprising a hierarchical classification structure having a plurality of nodes, wherein the nodes of the taxonomy tree represent categories with which content items can be associated, and wherein each of a plurality of categories or nodes within the taxonomy tree is assigned a cluster that comprises a summarized feature space representation of the feature vectors associated with contents present in that cluster; and a recommendation system that generates feature vectors associated with content items at a content management system, the recommendation system having access to a database of the content categorization engine; wherein the feature vectors associated with the content items are generated based on an evaluation of clusters of content within the taxonomy, including: determining, for a new content item, a plurality of topics associated therewith, and calculating a score for the new content item that represents a probability of the new content item belonging to a particular category represented by the taxonomy tree, and subsequently assessing a similarity between feature vectors associated with the new content item and those of previously determined clusters, and associating, with the new content item, a category associated with a cluster determined in feature space to be similar to that of the new content item.
  2. The system of claim 1, wherein the recommendation system creates clusters in feature space based on previously categorized content.
  3. The system of claim 2, wherein the recommendation system generates one or more recommendations for the new content into the taxonomy through feature space distance calculation from the clusters.
  4. The system of claim 1, wherein the recommendation system is used to create a new taxonomy or modify the taxonomy.
  5. The system of claim 1, wherein the database of the content categorization engine comprises a historical record of user acceptances of prior categorization recommendations; and wherein the database of the content categorization engine comprises a historical record of user rejections of prior categorization recommendations.
  6. The system of claim 5, wherein the recommendation system generates one or more recommendations for the new content into the taxonomy based upon the historical record of user acceptances of prior categorization records and the historical record of user rejections of prior categorization recommendations.
  7. The system of claim 1, wherein the recommendation system generates a recommendation for a creation of a new category within the taxonomy for a plurality of uncategorized content items.
  8. A method for smart categorization of content in a content management system, comprising: providing a content categorization engine at one or more computers, the content categorization engine having access to a taxonomy tree comprising a hierarchical classification structure having a plurality of nodes, wherein the nodes of the taxonomy tree represent categories with which content items can be associated, and wherein each of a plurality of categories or nodes within the taxonomy tree is assigned a cluster that comprises a summarized feature space representation of the feature vectors associated with contents present in that cluster; and generating, by a recommendation system, feature vectors associated with content items at a content management system, the recommendation system having access to a database of the content categorization engine; wherein the feature vectors associated with the content items are generated based on an evaluation of clusters of content within the taxonomy, including: determining for a new content item a plurality of topics associated therewith, and calculating a score for the new content item that represents a probability of the new content item belonging to a particular category represented by the taxonomy tree, and subsequently assessing a similarity between feature vectors associated with the new content item and those of previously determined clusters, and associating, with the new content item, a category associated with a cluster determined in feature space to be similar to that of the new content item.
  9. The method of claim 8, further comprising: creating, by the recommendation system, clusters in feature space based on the previously categorized content within the taxonomy.
  10. The method of claim 9, further comprising: generating, by the recommendation system, one or more recommendations for the new content into the taxonomy through feature space distance calculation from the clusters.
  11. The method of claim 8, wherein the recommendation system is used to create a new taxonomy or modify the taxonomy.
  12. The method of claim 8, wherein the database of the content categorization engine comprises a historical record of user acceptances of prior categorization recommendations; and wherein the database of the content categorization engine comprises a historical record of user rejections of prior categorization recommendations.
  13. The method of claim 12, further comprising: generating, by the recommendation system, one or more recommendations for the new content into the taxonomy based upon the historical record of user acceptances of prior categorization records and the historical record of user rejections of prior categorization recommendations.
  14. The method of claim 8, further comprising: generating, by the recommendation system, a recommendation for a creation of a new category within the taxonomy for a plurality of uncategorized content items.
  15. A non-transitory computer readable storage medium having instructions thereon, which when read and executed by one or more computer cause the computer to perform a method comprising: providing a content categorization engine, the content categorization engine having access to a taxonomy tree comprising a hierarchical classification structure having a plurality of nodes, wherein the nodes of the taxonomy tree represent categories with which content items can be associated, and wherein each of a plurality of categories or nodes within the taxonomy tree is assigned a cluster that comprises a summarized feature space representation of the feature vectors associated with contents present in that cluster; and generating, by a recommendation system, feature vectors associated with content items at a content management system, the recommendation system having access to a database of the content categorization engine; wherein the feature vectors associated with the content items are generated based on an evaluation of clusters of content within the taxonomy, including: determining for a new content item a plurality of topics associated therewith, and calculating a score for the new content item that represents a probability of the new content item belonging to a particular category represented by the taxonomy tree, and subsequently assessing a similarity between feature vectors associated with the new content item and those of previously determined clusters, and associating, with the new content item, a category associated with a cluster determined in feature space to be similar to that of the new content item.
  16. The non-transitory computer readable storage medium of claim 15, the method further comprising: creating, by the recommendation system, clusters in feature space based on the previously categorized content within the taxonomy.
  17. The non-transitory computer readable storage medium of claim 16, the method further comprising: generating, by the recommendation system, one or more recommendations for the new content into the taxonomy through feature space distance calculation from the clusters.
  18. The non-transitory computer readable storage medium of claim 15, wherein the recommendation system is used to create a new taxonomy or modify the taxonomy.
  19. The non-transitory computer readable storage medium of claim 15, wherein the database of the content categorization engine comprises a historical record of user acceptances of prior categorization recommendations; and wherein the database of the content categorization engine comprises a historical record of user rejections of prior categorization recommendations.
  20. The non-transitory computer readable storage medium of claim 19, the method further comprising: generating, by the recommendation system, one or more recommendations for the new content into the taxonomy based upon the historical record of user acceptances of prior categorization records and the historical record of user rejections of prior categorization recommendations.
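The claims describe taxonomy nodes that each carry a summarized cluster, a feature space distance calculation, and a historical record of accepted and rejected recommendations. The following is a minimal sketch of how those pieces could fit together. The claims do not say how the acceptance and rejection history influences a recommendation, so the Laplace-smoothed weighting below is purely an assumption, as are the running-mean cluster summary, the Euclidean distance, and the names `CategoryNode`, `acceptance_weight`, and `rank`.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CategoryNode:
    """One taxonomy node with a summarized cluster and feedback counters (hypothetical)."""
    name: str
    centroid: dict = field(default_factory=dict)  # mean feature vector of member items
    count: int = 0       # number of items folded into the centroid
    accepted: int = 0    # past recommendations for this node that users accepted
    rejected: int = 0    # past recommendations for this node that users rejected
    children: list = field(default_factory=list)

    def add_item(self, vector):
        """Fold one item into the running mean without storing the item itself."""
        self.count += 1
        for term in set(self.centroid) | set(vector):
            old = self.centroid.get(term, 0.0)
            self.centroid[term] = old + (vector.get(term, 0.0) - old) / self.count

    def walk(self):
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def distance(a, b):
    """Euclidean distance between two sparse vectors stored as dicts."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


def acceptance_weight(node):
    """Assumed weighting: Laplace-smoothed share of accepted recommendations."""
    return (node.accepted + 1) / (node.accepted + node.rejected + 2)


def rank(root, vector):
    """Score every populated node; a closer cluster and a better track record rank higher."""
    scored = []
    for node in root.walk():
        if node.count == 0:
            continue  # no cluster has been formed for this node yet
        score = acceptance_weight(node) / (1.0 + distance(vector, node.centroid))
        scored.append((score, node.name))
    return sorted(scored, reverse=True)
```

With this weighting, a category whose recommendations users have repeatedly rejected drops in the ranking even when its cluster is the nearest one, which is one plausible reading of using both historical records together.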

Description

CLAIM OF PRIORITY AND CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application titled “SYSTEM AND METHOD FOR SMART CATEGORIZATION OF CONTENT IN A CONTENT MANAGEMENT SYSTEM”, application Ser. No. 17/486,524, filed Sep. 27, 2021; which application claims the benefit of priority to U.S. Provisional patent application titled “SYSTEM AND METHOD FOR SMART CATEGORIZATION OF CONTENT IN A CONTENT MANAGEMENT SYSTEM”, Application No. 63/084,174, filed Sep. 28, 2020, and is a continuation-in-part of and claims the benefit of priority to U.S. patent application titled “TECHNIQUES FOR RANKING CONTENT ITEM RECOMMENDATIONS”, application Ser. No. 16/657,395, filed Oct. 18, 2019, issued as U.S. Pat. No. 11,200,240 on Dec. 14, 2021; which is a continuation-in-part of and claims the benefit of priority to U.S. patent application titled “SMART CONTENT RECOMMENDATIONS FOR CONTENT AUTHORS” application Ser. No. 16/581,138, filed Sep. 24, 2019, issued as U.S. Pat. No. 11,163,777 on Nov. 2, 2021; which claims the benefit of priority to India Provisional patent application titled “SMART CONTENT RECOMMENDATIONS FOR AUTHORS”, Application 201841039495, filed Oct. 18, 2018; each of which above applications and their contents are herein incorporated by reference.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

FIELD OF TECHNOLOGY

This application is generally related to online commerce environments, and management and delivery of content data, and is particularly directed to smart categorization/classification of content in a content management system.

BACKGROUND

Generators and authors of original content for online publication and/or transmission may use a variety of different software-based tools and techniques for generating, editing, and storing the newly generated content. In a content management system, content of various kinds (e.g., documents; structured content like blogs, articles, and press releases; and media files like images and videos) often needs to be evaluated/categorized based on its content. Such categorization/classification happens across a hierarchical set of categories or nodes. For example, a contract document for a real estate lease may be evaluated/categorized under legal documents→real estate→contracts. It is also possible for the same document (or content) to be in more than one categorization/classification at the same time. For example, the same contract document may exist under active contracts→signed.

Categories are grouped under an organizing concept called a taxonomy. Organizations tend to have many taxonomies that mirror their business organization for content. When a new document or content item is added, or when new taxonomies come into existence, or there are significant changes in content organization, the task of correctly classifying or re-classifying content falls to end users (or content authors). This can be an expensive and error-prone undertaking as both the amount of content and the number of taxonomies grow. One of the persistent problems in the space of content management is to ease the job of content authors by making recommendations for categorization/classification.

SUMMARY

In accordance with an embodiment, systems and methods described herein can be used, for example with a content management system, to provide recommendations to categorize/classify content into user-defined categories, which in turn provides an opportunity for content managers to place new content into accurate categories effortlessly, based on previously evaluated/categorized content.

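The hierarchical categorization described in the background, in which one contract document sits under legal documents→real estate→contracts and also under active contracts→signed, can be illustrated with a small data-structure sketch. The class `TaxonomyIndex`, its methods, and the identifier `lease-001` are hypothetical and chosen only for illustration; the patent does not prescribe this representation.

```python
from collections import defaultdict


class TaxonomyIndex:
    """Maps content items to category paths, allowing several paths per item."""

    def __init__(self):
        self._items_by_path = defaultdict(set)
        self._paths_by_item = defaultdict(set)

    def assign(self, item_id, path):
        """Place an item under a category path such as ('legal documents', 'real estate')."""
        path = tuple(path)
        self._items_by_path[path].add(item_id)
        self._paths_by_item[item_id].add(path)

    def categories_of(self, item_id):
        """All category paths an item currently belongs to, in sorted order."""
        return sorted(self._paths_by_item.get(item_id, ()))

    def items_under(self, prefix):
        """All items categorized at or below the node identified by the given path prefix."""
        prefix = tuple(prefix)
        found = set()
        for path, items in self._items_by_path.items():
            if path[:len(prefix)] == prefix:
                found |= items
        return sorted(found)


index = TaxonomyIndex()
index.assign("lease-001", ["legal documents", "real estate", "contracts"])
index.assign("lease-001", ["active contracts", "signed"])
print(index.categories_of("lease-001"))
print(index.items_under(["legal documents"]))
```

Storing paths as tuples keeps each taxonomy independent, so the same item can be found by walking down either hierarchy.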
Classifying an enormous amount of content in an online fashion is a complex task that involves challenges such as the single pass constraint over the data, and the requirement for fast response. In accordance with an embodiment, content users categorize similar content through a logical cluster such as a hierarchical taxonomy tree, and place the similar content in the same node/category of the taxonomy tree. Over time, as both the number of content entities and the number of nodes in the taxonomy tree grow, similar content entities will find themselves residing alongside one another in a node. Given this state of the content organization, the content residing inside an already evaluated/categorized taxonomy can be used by a computer algorithm to determine where a newly created/edited content item may belong. In accordance with an embodiment, a recommendation system or tool can use artificial intelligence (AI) techniques to continuously learn from past data, and assist in placing content into a relevant category through automatic categorization/classification of newly created/edited conte