CN-122003679-A - Generative artificial intelligence output verification engine in an artificial intelligence system

CN122003679A

Abstract

Methods, systems, and computer storage media for providing generative Artificial Intelligence (AI) output verification using a generative AI output verification engine in an artificial intelligence system. The generative AI output verification engine evaluates and determines the quality (e.g., quantified as an output verification score) of generative AI output (e.g., LLM output). In operation, a generative AI output that includes summary data is accessed. The raw data from which the summary data was generated is accessed. A plurality of output verification operations associated with the generative AI output verification engine are performed. The generative AI output verification engine includes a multi-category analysis model that provides corresponding output verification operations for quantifying the quality of the generative AI output. An output verification score is generated for the summary data using the generative AI output verification engine. The output verification score is transmitted. A feedback loop is established to incorporate human feedback for fine-tuning the generative AI output verification engine model.
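As a rough illustration (not part of the patent), the flow described in the abstract — access the summary output, access the raw data, run per-category verification operations, and combine them into a single verification score — could be sketched as follows. All function names and the toy analyzers are hypothetical, and equal weighting of the category scores is an assumption; the abstract does not specify how the per-category scores are combined.

```python
def verify_output(summary: str, raw: str, analyzers: dict) -> float:
    """Run each category analyzer on the summary/raw pair and average the scores."""
    scores = {name: fn(summary, raw) for name, fn in analyzers.items()}
    # Equal weighting is an assumption; the abstract does not specify
    # how the per-category scores combine into the verification score.
    return sum(scores.values()) / len(scores)

# Toy stand-ins for the lexical, semantic, and sharpness analysis models.
analyzers = {
    # Fraction of summary words that also appear in the raw data.
    "lexical": lambda s, r: len(set(s.split()) & set(r.split())) / max(len(set(s.split())), 1),
    # Placeholder: a real semantic model would compare contextual meaning.
    "semantic": lambda s, r: 1.0 if s and r else 0.0,
    # Placeholder: shorter summaries treated as "sharper", capped at 1.0.
    "sharpness": lambda s, r: min(1.0, 20 / max(len(s.split()), 1)),
}

score = verify_output(
    "incident resolved instantly",
    "the incident was resolved quickly by the on-call team",
    analyzers,
)
print(round(score, 3))  # → 0.889
```

A production engine would replace each lambda with the corresponding analysis model described in the claims; the dictionary-of-analyzers shape simply mirrors the multi-category structure.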

Inventors

  • V. Vinay
  • J. C. Dennis
  • M. A. Duncan
  • D. Duran
  • B. E. Strom

Assignees

  • Microsoft Technology Licensing, LLC

Dates

Publication Date
2026-05-08
Application Date
2024-10-17
Priority Date
2024-03-26

Claims (20)

  1. A computerized system, comprising: one or more computer processors; and computer memory storing computer-usable instructions that, when used by the one or more computer processors, cause the one or more computer processors to perform operations comprising: accessing (302) a generative Artificial Intelligence (AI) output comprising summary data; accessing (304) raw data associated with the summary data; performing (306) a plurality of output verification operations associated with a generative AI output verification engine, wherein the generative AI output verification engine includes a multi-category analysis model having corresponding operations for quantifying a quality of a generative AI output; generating (308) an output verification score associated with the summary data based on performing the plurality of output verification operations; and transmitting (310) the output verification score.
  2. The system of claim 1, wherein the summary data comprises a summary of the raw data and a plurality of quality assessment scores, the plurality of quality assessment scores comprising a lexical analysis score, a semantic analysis score, and a sharpness analysis score.
  3. The system of claim 1, wherein the multi-category analysis model includes a lexical analysis model, a semantic analysis model, and a sharpness analysis model.
  4. The system of claim 1, wherein performing the plurality of output verification operations comprises: performing a first plurality of output verification operations associated with a lexical analysis model that supports comparing lexical forms of the raw data with lexical forms of the summary data; performing a second plurality of output verification operations associated with a semantic analysis model that supports comparing a contextual analysis of the raw data with the summary data; and performing a third plurality of output verification operations associated with a sharpness analysis model that supports evaluating user confidence and satisfaction based on a plurality of identified metrics.
  5. The system of claim 1, wherein generating the output verification score is based on: accessing a first final score associated with the lexical analysis model; accessing a second final score associated with the semantic analysis model; accessing a third final score associated with the sharpness analysis model; and generating the output verification score based on the first final score, the second final score, and the third final score.
  6. The system of claim 1, the operations further comprising: receiving a request for a security posture of a computing environment; generating a security posture visualization associated with the output verification score based on the request for the security posture of the computing environment; and transmitting the security posture visualization to cause display of the security posture visualization.
  7. The system of claim 1, further comprising a customer feedback mechanism that supports presenting the summary data to a human verifier for review and providing feedback regarding any discrepancies or errors, wherein the feedback is used to refine the generative AI output verification engine.
  8. One or more computer storage media having computer-executable instructions embodied thereon that, when executed by a computing system having a processor and memory, cause the processor to perform operations comprising: transmitting (10) a request for a security posture of a computing environment; accessing (34), based on the request for the security posture of the computing environment, a security posture visualization associated with an output verification score, wherein the output verification score is generated using a generative AI output verification engine comprising a multi-category analysis model having corresponding operations for quantifying a quality of a generative AI output; and causing (36) display of the security posture visualization.
  9. The media of claim 8, wherein the multi-category analysis model includes a lexical analysis model, a semantic analysis model, and a sharpness analysis model.
  10. The media of claim 9, wherein the lexical analysis model employs a part-of-speech tagging algorithm to support comparing lexical forms of raw data with lexical forms of summary data.
  11. The media of claim 9, wherein the semantic analysis model employs an integrity determination algorithm to support comparing a contextual analysis of raw data with summary data.
  12. The media of claim 9, wherein the sharpness analysis model employs a user confidence and satisfaction evaluation algorithm to support evaluating user confidence and satisfaction based on a plurality of identified metrics.
  13. The media of claim 11, the operations further comprising: receiving a request for a security posture of a computing environment; generating a security posture visualization associated with the output verification score based on the request for the security posture of the computing environment; and transmitting the security posture visualization to cause display of the security posture visualization.
  14. The media of claim 8, the operations further comprising: transmitting a request for a security posture of the computing environment; accessing a security posture visualization associated with the output verification score based on the request for the security posture of the computing environment; and causing display of the security posture visualization.
  15. A computer-implemented method, the method comprising: accessing (402) a generative Artificial Intelligence (AI) output associated with an AI model; performing, using a generative AI output verification engine, a plurality of output verification operations associated with a multi-category analysis model having corresponding operations for quantifying a quality of the generative AI output, wherein performing the plurality of output verification operations comprises: performing (404) a first plurality of output verification operations associated with a lexical analysis model; performing (406) a second plurality of output verification operations associated with a semantic analysis model; and performing (408) a third plurality of output verification operations associated with a sharpness analysis model; generating (410) an output verification score based on performing the plurality of output verification operations; and transmitting (412) the output verification score.
  16. The method of claim 15, wherein the lexical analysis model of the multi-category analysis model employs a part-of-speech tagging algorithm to support comparing lexical forms of raw data with lexical forms of summary data.
  17. The method of claim 15, wherein the semantic analysis model of the multi-category analysis model employs an integrity determination algorithm to support comparing a contextual analysis of raw data with summary data.
  18. The method of claim 15, wherein the sharpness analysis model of the multi-category analysis model employs a user confidence and satisfaction evaluation algorithm to support evaluating user confidence and satisfaction based on a plurality of identified metrics.
  19. The method of claim 15, wherein generating the output verification score is based on: accessing a first final score associated with the lexical analysis model; accessing a second final score associated with the semantic analysis model; accessing a third final score associated with the sharpness analysis model; and generating the output verification score based on the first final score, the second final score, and the third final score.
  20. The method of claim 15, the method further comprising: receiving a request for a security posture of a computing environment; generating a security posture visualization associated with the output verification score based on the request for the security posture of the computing environment; and transmitting the security posture visualization to cause display of the security posture visualization.
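The scoring step recited in claims 5 and 19 — generating the output verification score from the three per-model final scores — might be sketched as a weighted average. The weights and the averaging itself are assumptions; the claims only state that the score is generated from the three final scores, not how they are combined.

```python
def combine_final_scores(lexical: float, semantic: float, sharpness: float,
                         weights: tuple = (1.0, 1.0, 1.0)) -> float:
    """Combine the three per-model final scores into one output verification score.

    The weighted-average combination is an illustrative assumption, not the
    patent's specified method.
    """
    scores = (lexical, semantic, sharpness)
    total = sum(w * s for w, s in zip(weights, scores))
    return total / sum(weights)

# Equal weights reduce to a simple mean of the three final scores.
print(round(combine_final_scores(0.8, 0.9, 0.7), 3))  # → 0.8
```

A deployment could tune the weights (e.g., emphasizing semantic integrity over sharpness) without changing the interface, which is one reason a weighted combination is a plausible reading of the claims.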

Description

Generative artificial intelligence output verification engine in an artificial intelligence system

Cross Reference to Related Applications

The present application claims the benefit of U.S. Provisional Application No. 63/596,290, filed 11/2023, the entire contents of which are incorporated herein by reference.

Background

Users rely on computing environments with applications and services to accomplish computing tasks. Users may interact with different types of applications and services supported by an Artificial Intelligence (AI) system. In particular, a generative AI system may support text generation, image generation, music and audio generation, video generation, and data synthesis. Generative AI refers to a class of AI systems and algorithms designed to generate new data or content that is similar to, or in some cases entirely different from, the data on which they were trained. Generative AI encompasses a wide range of models and algorithms designed to generate new data or content. For example, Large Language Models (LLMs) are a specific class of generative AI models that focus primarily on generating human-like text. LLMs and other generative AI models leverage computing architectures, extensive pre-training on data sets, and fine-tuning for specific tasks to support natural language processing applications ranging from chatbots and virtual assistants to content generation and language translation.

Disclosure of Invention

Aspects of the technology described herein relate generally to systems, methods, and computer storage media for providing generative Artificial Intelligence (AI) output verification using a generative AI output verification engine of an artificial intelligence system. The generative AI output verification engine evaluates and verifies the output from a generative AI model (e.g., a large language model).
The generative AI output verification engine includes multi-category analysis models (e.g., a lexical analysis model, a semantic analysis model, and a human-facing sharpness analysis model) that provide corresponding operations for quantifying the quality of the generative AI output. The generative AI output verification engine evaluates and determines a quality (e.g., a quality quantified as an output verification score) based on verifying that the generated content meets criteria and standards consistent with the intended use or context of the generative AI output. In one example, generative AI output verification may specifically involve verifying that summary data is complete, accurate, and clear, where the summary data is summarized from source data. Generative AI output verification may be based on a generative AI output evaluation framework. The generative AI output evaluation framework supports identifying informational metrics indicative of the quality of the generative AI output. The framework facilitates determining when the generative AI output has low quality and why it has low quality. The framework operates to provide diagnostics that identify situations in which the generative AI model associated with the output may be improved. Further, the framework may operate to reduce the scoring time of generative AI output (e.g., event summaries) by automatically evaluating the quality of each instance of the output (e.g., an event summary) based on the raw data (e.g., event data). The generative AI output evaluation framework supports multiple categories and uses respective techniques to evaluate the generative AI output based on three analysis engines: lexical analysis, semantic analysis, and human-facing sharpness analysis.
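As a hypothetical illustration of the lexical-analysis engine (which, per the claims, may employ a part-of-speech tagging algorithm to compare lexical forms of the raw data with those of the summary), a toy rule-based tagger and overlap measure might look like the following. The tagger rules and all function names are purely illustrative; a production system would use a trained POS tagger.

```python
def toy_pos_tag(tokens):
    """Assign crude, rule-based part-of-speech tags (illustrative only)."""
    tags = []
    for tok in tokens:
        if tok.endswith("ing") or tok.endswith("ed"):
            tags.append((tok, "VERB"))
        elif tok.endswith("ly"):
            tags.append((tok, "ADV"))
        else:
            tags.append((tok, "NOUN"))
    return tags

def lexical_overlap(summary: str, raw: str) -> float:
    """Fraction of the summary's tagged lexical forms that also appear in the raw data."""
    summary_forms = set(toy_pos_tag(summary.lower().split()))
    raw_forms = set(toy_pos_tag(raw.lower().split()))
    if not summary_forms:
        return 0.0
    return len(summary_forms & raw_forms) / len(summary_forms)

print(lexical_overlap("attacker blocked quickly",
                      "the attacker was blocked quickly"))  # → 1.0
```

Comparing (token, tag) pairs rather than bare tokens is one way to read "comparing lexical forms": a word used as a different part of speech in the summary than in the raw data would then count as a mismatch.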
Each analysis engine generates a corresponding score that can be combined into a final score (e.g., an output verification score) for each instance of the generative AI output under evaluation. The final score may be used as an assessment metric representing the overall quality of the generative AI output. It is contemplated that human-in-the-loop assessment of a small sample of generative AI output may be implemented to establish a degree of trust in the generative AI output evaluation framework, and may later be replaced with customer feedback on the generative AI output to automate assessment of the framework. Conventionally, artificial intelligence systems lack the comprehensive computational logic and infrastructure necessary to efficiently provide informational metrics for generative AI outputs. Existing evaluation metrics often yield a general assessment, lacking the specificity needed to identify improvement opportunities for suboptimal AI-generated output. Existing generative AI evaluation metrics also lack a metric for human-facing sharpness (e.g., a quantitative measure corresponding to the degree to which the generative AI output is easily understood and unambiguous).