US-12619888-B2 - End-to-end systems and methods for construct scoring

Abstract

A construct scoring system may provide construct scores with user understandable explanations of the factors that influenced the construct score determination. The construct scoring system may include a data platform architecture which may include a data layer, a processing layer, a serving layer and a monitoring layer.

Inventors

  • Robert LOWINGER
  • Kartik THAKORE

Assignees

  • Artizan Technologies, Inc.

Dates

Publication Date
2026-05-05
Application Date
2024-09-16

Claims (13)

  1. A system, comprising: a data layer configured to: use large language models (LLMs) to extract information identifying one or more constructs, identifying one or more constructors that constructed the one or more constructs, and identifying one or more categories of constructs, wherein individual constructs of the one or more constructs are: pre-existing; associated with at least one particular constructor of the one or more constructors that constructed the individual construct; and associated with at least one particular category of the one or more categories of constructs; store a knowledge graph of the constructors, constructs, and categories in a graph database; store structured data and unstructured data identifying the constructors, the constructs, and the categories in a relational database; and generate and store embeddings for images, texts, and metadata in the structured data and unstructured data in an embeddings database; a processing layer configured to: use data mining pipelines to analyze the structured data and unstructured data and the embeddings to update the knowledge graph; use a plurality of scoring algorithms to consume the analysis of the data mining pipelines to generate scoring indicators for the constructs, wherein: the generating the scoring indicators is configured to determine the scoring indicator of a target construct based on a weighted combination of a plurality of current popularity indicators, wherein the current popularity indicators include a percentage change between detected recent crowd-sourced scores of intra-constructor constructs and historical crowd-sourced scores of the intra-constructor constructs, detected recent crowd-sourced scores of the inter-constructor constructs, a detected recent increased interest in intra-constructor constructs and associated sentiments, a detected recent increased interest in the target constructor and associated sentiments, and the detected recent increased interest in the related categories and the associated sentiments, and the intra-constructor constructs are other constructs of the target constructor that are similar to the target construct and the inter-constructor constructs are constructs of other constructors that are similar to the target construct, a memory is further configured to store a specification of influential sources of the structured and unstructured data, and at least a portion of the structured data or unstructured data from the influential sources is weighted more in the generating the scoring indicators; and use expert rules encoded in domain-specific languages to update the scoring indicators; a serving layer configured to further fine-tune the updated scoring indicators, to periodically write the fine-tuned updated scoring indicators to blob storage, to use application programming interfaces (APIs) to make the fine-tuned updated scoring indicators available to end users, and to display the fine-tuned updated scoring indicators across end user devices; and a monitoring layer configured to generate logs and metrics for performance and data quality of other layers, apply feedback loops to improve data ingestion and model weights of the other layers, and provide retraining and benchmarking of models in the other layers, wherein the knowledge graph is further configured to store constructor clusters that group similar constructors, wherein the constructor clusters are based on shared forms of constructs, wherein the constructor clusters are further based on shared categories of constructs, and wherein constructors in a particular constructor cluster are ranked as emerging, developed, and established.
  2. The system of claim 1, wherein the structured data includes past score records and crowd-sourced score data.
  3. The system of claim 1, wherein the unstructured data includes data from one or more information networks.
  4. The system of claim 1, wherein the structured and unstructured data are cleansed of missing values and outliers.
  5. The system of claim 1, wherein the knowledge graph further identifies clusters of nodes in the knowledge graph and central nodes of the clusters of nodes, wherein the clusters of nodes and the central nodes are identified using graph analytics.
  6. The system of claim 1, further configured to use convolutional neural network (CNN) embeddings to learn visual features from images to find similar styles and themes of the constructs.
  7. The system of claim 6, further configured to use the CNNs to compare visual constructs.
  8. The system of claim 1, further configured to use Word2Vec embeddings to learn vector representations of words and phrases.
  9. The system of claim 8, further configured to use the Word2Vec embeddings to find similarity between construct titles and descriptions.
  10. The system of claim 1, further configured to use graph community detection algorithms to identify clusters of related nodes in the knowledge graph.
  11. The system of claim 10, further configured to use the graph community detection algorithms to find related constructors or related categories.
  12. The system of claim 1, further configured to use a plurality of analytic models, including ANOVA, network centrality metrics, sentiment analysis, hedonic regression, probit regression, and OLS regression.
  13. The system of claim 1, wherein the weighted combination is modifiable based on human input.
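Claim 1 determines a target construct's scoring indicator as a weighted combination of current popularity indicators, and claim 13 notes the weights are modifiable by human input. The following is a minimal sketch of such a combination; the indicator names, weight values, and helper functions are illustrative assumptions, not taken from the patent:

```python
def percentage_change(recent, historical):
    """Percentage change between recent and historical crowd-sourced scores."""
    if historical == 0:
        return 0.0
    return (recent - historical) / historical * 100.0

def scoring_indicator(indicators, weights):
    """Weighted combination of popularity indicators.

    The weights dictionary is external to the computation, so it can be
    modified by human input (per claim 13) without changing the code.
    """
    return sum(weights[name] * value for name, value in indicators.items())

# Illustrative indicator values for a hypothetical target construct.
indicators = {
    "intra_constructor_pct_change": percentage_change(recent=82.0, historical=68.0),
    "inter_constructor_recent_score": 74.0,
    "intra_constructor_interest": 1.3,
    "constructor_interest": 0.9,
    "category_interest": 1.1,
}
weights = {
    "intra_constructor_pct_change": 0.30,
    "inter_constructor_recent_score": 0.25,
    "intra_constructor_interest": 0.20,
    "constructor_interest": 0.15,
    "category_interest": 0.10,
}
score = scoring_indicator(indicators, weights)
```

Data from influential sources could be folded in by scaling the affected indicator values before the weighted sum, matching the claim's requirement that such data "is weighted more."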
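Claims 6-9 use CNN embeddings for visual similarity and Word2Vec embeddings for title/description similarity; both reduce to nearest-neighbor search over embedding vectors. A sketch of that shared retrieval step, using cosine similarity over toy vectors that stand in for real CNN or Word2Vec outputs (all names and values here are illustrative):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, catalog):
    """Return the catalog key whose embedding is closest to the query."""
    return max(catalog, key=lambda k: cosine_similarity(query, catalog[k]))

# Toy embeddings standing in for CNN image features or Word2Vec text vectors.
catalog = {
    "construct_a": [0.9, 0.1, 0.0],
    "construct_b": [0.1, 0.8, 0.2],
    "construct_c": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]
```

Here `most_similar(query, catalog)` returns `"construct_a"`. At production scale the linear scan would be replaced by an approximate nearest-neighbor index over the embeddings database.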
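Claims 10-11 apply graph community detection to find clusters of related nodes (related constructors or categories) in the knowledge graph. The patent does not name a specific algorithm; as the simplest possible illustration, the sketch below finds connected components by BFS over a toy adjacency map. A production system would more likely use a modularity-based method such as Louvain or label propagation:

```python
from collections import deque

def communities(graph):
    """Cluster nodes as connected components via breadth-first search.

    `graph` maps each node to a list of its neighbors.
    """
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        cluster, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in cluster:
                continue
            cluster.add(node)
            queue.extend(n for n in graph[node] if n not in cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters

# Toy knowledge-graph fragment: constructors linked by shared categories.
graph = {
    "constructor_1": ["constructor_2"],
    "constructor_2": ["constructor_1", "constructor_3"],
    "constructor_3": ["constructor_2"],
    "constructor_4": ["constructor_5"],
    "constructor_5": ["constructor_4"],
}
clusters = communities(graph)
```

This yields two clusters, {constructor_1, constructor_2, constructor_3} and {constructor_4, constructor_5}, which could then be ranked (per claim 1) as emerging, developed, or established.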

Description

PRIORITY DATA

This patent application claims the benefit of and priority to the following four U.S. Provisional Patent Applications: U.S. Provisional Patent Application No. 63/538,790, titled “ACCURATE ARTWORK PRICE PREDICTION,” filed Sep. 15, 2023; U.S. Provisional Patent Application No. 63/538,791, titled “INTERFACE FOR ARTWORK PRICE PREDICTION,” filed Sep. 15, 2023; U.S. Provisional Patent Application No. 63/538,792, titled “ARTIFICIAL INTELLIGENCE-BASED ARTWORK PRICE PREDICTION,” filed Sep. 15, 2023; and U.S. Provisional Patent Application No. 63/538,793, titled “END-TO-END SYSTEMS AND METHODS FOR ARTWORK PRICE PREDICTION,” filed Sep. 15, 2023. The priority U.S. Provisional Patent Applications are incorporated herein by reference in their entirety and for all purposes as if completely and fully set forth herein.

FIELD OF THE TECHNOLOGY DISCLOSED

The technology disclosed relates to artificial intelligence type computers and digital data processing systems, and corresponding data processing methods and products, for emulation of intelligence (i.e., knowledge-based systems, reasoning systems, and knowledge acquisition systems), including systems for reasoning with uncertainty (e.g., fuzzy logic systems), adaptive systems, machine learning systems, and artificial neural networks.

BACKGROUND

The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the claimed technology.

Deep learning is a frontier of artificial intelligence, moving the field closer to its primary goal.
Deep learning has seen great success in a wide variety of applications, such as natural language processing, speech recognition, medical applications, computer vision, and intelligent transportation systems. This success is largely due to larger models, whose scale has reached hundreds of millions of parameters. These parameters give a model enough degrees of freedom to produce impressive descriptive capability. However, such a large number of parameters requires a massive amount of labeled training data.

Improving model performance through data annotation faces two crucial challenges. On the one hand, the growth rate of data is far behind the growth rate of model parameters, so the shortage of data has become a primary obstacle to further model development. On the other hand, new tasks emerge far faster than data can be updated, and annotating every sample is laborious. To tackle this challenge, new datasets are built by generating synthetic samples, thereby speeding up model iteration and reducing the cost of data annotation. Pre-training methods and transfer learning, such as Transformers, BERT, and GPT, have also been used to address this challenge, with incredible results. However, the generated data is only used as base data to initialize the model; obtaining a high-precision usable model often still requires labeling and updating task-specific data.

Integrating a priori knowledge into the learning framework is an effective means of dealing with sparse data, since the learner does not need to induce the knowledge from the data itself. As special agents, humans have rich prior knowledge. If the machine can learn human wisdom and knowledge, it will help deal with sparse data. Human-in-the-loop (HITL) addresses these issues by incorporating human knowledge into the modeling process.
HITL aims to train an accurate prediction model at minimum cost by integrating human knowledge and experience. Humans can provide training data for machine learning applications and, with the help of machine-based approaches, directly accomplish tasks in the pipeline that are hard for computers. At present, there is still a high degree of coupling between deep learning tasks and data, and the performance of deep learning largely depends on the quality of the data. Obtaining better performance on a new task requires a large amount of high-quality labeled data, but producing that labeled data requires substantial labor. In addition, large-scale data annotation takes a long time, and many task iterations cannot wait that long. Unlike weak annotation and automatic annotation, HITL-based methods emphasize finding the key samples that play a decisive role in new sample data.

A core set is a weighted subset of a larger set that guarantees a model fitting the core set also fits the larger set. Core set construction methods perform importance sampling with respect to sensitivity score, to provide high-probability
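The core set construction described above can be sketched as importance sampling: draw each point with probability proportional to its sensitivity score, and give each sampled point the inverse-probability weight that keeps weighted sums over the core set unbiased estimates of sums over the full set. The data values and sensitivity scores below are illustrative assumptions:

```python
import random

def core_set(points, sensitivities, m, seed=0):
    """Sample a weighted core set of size m by importance sampling.

    Each point i is drawn with probability p_i proportional to its
    sensitivity score and carries weight 1 / (m * p_i), so weighted
    sums over the core set match sums over the full set in expectation.
    """
    total = sum(sensitivities)
    probs = [s / total for s in sensitivities]
    rng = random.Random(seed)
    indices = rng.choices(range(len(points)), weights=probs, k=m)
    return [(points[i], 1.0 / (m * probs[i])) for i in indices]

# Illustrative data: five points with hand-picked sensitivity scores.
points = [1.0, 2.0, 3.0, 4.0, 5.0]
sensitivities = [0.1, 0.1, 0.2, 0.3, 0.3]
sample = core_set(points, sensitivities, m=3)
```

High-sensitivity points are sampled more often but receive smaller weights, which is what makes the guarantee hold with high probability rather than deterministically.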