CN-121999208-A - Readability prediction device and method

Abstract

A readability prediction device and method divide a picture, and the text corresponding to the picture, from data to be judged. The readability prediction device transmits a prompt, the picture, and the text corresponding to the picture to at least one multi-modal large language model to generate a picture semantic corresponding to the picture. The readability prediction device then transmits a readability feature, generated from the text and the picture semantic, to a readability model to predict the readability of the data to be judged. The disclosed technique improves the device's combined understanding of text and pictures and thereby improves the accuracy of readability prediction.
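The flow described in the abstract can be sketched as a three-step function. This is a minimal illustration under stated assumptions, not the patented implementation: `predict_readability`, `describe_picture`, `build_feature`, and `readability_model` are hypothetical names, and the models are passed in as plain callables so that no particular model API is assumed.

```python
from typing import Callable, Sequence


def predict_readability(
    picture: bytes,
    text: str,
    prompt: str,
    describe_picture: Callable[[str, bytes, str], str],
    build_feature: Callable[[str, str], Sequence[float]],
    readability_model: Callable[[Sequence[float]], float],
) -> float:
    """Predict readability for one picture and its corresponding text.

    All three callables are hypothetical stand-ins:
    - describe_picture plays the role of the multi-modal large language model,
    - build_feature turns text plus picture semantic into a readability feature,
    - readability_model maps that feature to a readability value.
    """
    # Step 1: the prompt, picture, and text go to the multi-modal model,
    # which returns the picture semantic.
    picture_semantic = describe_picture(prompt, picture, text)

    # Step 2: the readability feature is built from the text and the
    # picture semantic together.
    feature = build_feature(text, picture_semantic)

    # Step 3: the readability model predicts readability from the feature.
    return readability_model(feature)
```

Passing the models as arguments keeps the sketch independent of any specific model or library; real models would replace the callables.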

Inventors

Zeng Houqiang
Chen Guanyu
Song Yaoting
Chen Bailin
Wu Jiexuan

Assignees

Zeng Houqiang (曾厚强)

Dates

Publication Date: 2026-05-08
Application Date: 2025-04-14
Priority Date: 2025-02-12

Claims (10)

1. A readability prediction apparatus, comprising: a transceiver interface for receiving data to be judged; a memory for storing at least one multi-modal large language model and a readability model; and a processor electrically connected to the transceiver interface and the memory and performing the following operations: dividing a picture and a text corresponding to the picture from the data to be judged; transmitting a prompt, the picture, and the text corresponding to the picture to the at least one multi-modal large language model to generate a picture semantic corresponding to the picture, wherein the prompt indicates a generation type of the picture semantic; and transmitting a readability feature to the readability model to predict a readability of the data to be judged, wherein the readability feature is generated based on the text corresponding to the picture and the picture semantic corresponding to the picture.
2. The readability prediction apparatus of claim 1, wherein the operation of dividing the picture and the text corresponding to the picture from the data to be judged further comprises the following operations: analyzing a plurality of object data in the data to be judged to generate a data tag corresponding to each of the plurality of object data; selecting, based on a plurality of target data tags among the data tags corresponding to the plurality of object data, a plurality of target object data corresponding to the plurality of target data tags from the plurality of object data, wherein the plurality of target data tags include a picture tag and a text tag; and dividing the plurality of target object data from the plurality of object data to serve as the picture and the text corresponding to the picture.
3. The readability prediction apparatus of claim 1, wherein the at least one multi-modal large language model comprises a first large language model and a second large language model, and the processor further performs the following operations: transmitting the prompt, the picture, and the text corresponding to the picture to the first large language model to generate a first candidate picture description corresponding to the picture; transmitting the prompt, the picture, and the text corresponding to the picture to the second large language model to generate a second candidate picture description corresponding to the picture; and combining the first candidate picture description and the second candidate picture description corresponding to the picture to generate the picture semantic corresponding to the picture.
4. The readability prediction apparatus of claim 1, wherein the readability feature is generated by the following operations: combining the text corresponding to the picture and the picture semantic corresponding to the picture to generate a combined text, wherein the combined text comprises a plurality of unit texts; transmitting the combined text to a language model to calculate a plurality of unit text vectors corresponding to the plurality of unit texts; and combining the plurality of unit text vectors corresponding to the plurality of unit texts to generate the readability feature.
5. The readability prediction apparatus of claim 1, wherein the readability comprises a readability score, and the operation of predicting the readability of the data to be judged further comprises the following operation: transmitting the readability feature to the readability model to calculate the readability score corresponding to the data to be judged.
6. The readability prediction apparatus of claim 5, wherein the readability model is generated by training a predictive model based on a plurality of historical readability features and a plurality of historical readability scores corresponding to the plurality of historical readability features.
7. The readability prediction apparatus of claim 1, wherein the readability comprises one of a plurality of readability classification levels, and the operation of predicting the readability of the data to be judged further comprises the following operation: transmitting the readability feature to the readability model to predict a first readability classification level corresponding to the data to be judged, wherein the first readability classification level is one of the plurality of readability classification levels.
8. The readability prediction apparatus of claim 7, wherein the readability model is generated by training a predictive model based on a plurality of historical readability features and a plurality of historical readability classification levels corresponding to the plurality of historical readability features.
9. The readability prediction apparatus of claim 1, wherein the processor further performs the following operations: dividing a plurality of candidate pictures and a second text corresponding to each of the plurality of candidate pictures from the data to be judged, wherein the plurality of candidate pictures comprise the picture; transmitting the prompt, the plurality of candidate pictures, and the second text corresponding to each of the plurality of candidate pictures to the at least one multi-modal large language model to generate a plurality of candidate picture semantics corresponding to the plurality of candidate pictures, wherein the prompt indicates a generation type of the plurality of candidate picture semantics; and transmitting the readability feature to the readability model to predict the readability of the data to be judged, wherein the readability feature is generated based on the second text corresponding to each of the plurality of candidate pictures and the plurality of candidate picture semantics corresponding to the plurality of candidate pictures.
10. A readability prediction method, applied to an electronic device that stores at least one multi-modal large language model and a readability model, the readability prediction method comprising the following steps: dividing a picture and a text corresponding to the picture from data to be judged; transmitting a prompt, the picture, and the text corresponding to the picture to the at least one multi-modal large language model to generate a picture semantic corresponding to the picture, wherein the prompt indicates a generation type of the picture semantic; and transmitting a readability feature to the readability model to predict a readability of the data to be judged, wherein the readability feature is generated based on the text corresponding to the picture and the picture semantic corresponding to the picture.
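As an illustration of claims 3 and 4 only, the sketch below combines two candidate picture descriptions into a picture semantic and then builds a readability feature from unit text vectors. The claims do not say how the descriptions or the vectors are combined, so several choices here are assumptions: concatenation with duplicate removal for the descriptions, sentences as unit texts, mean pooling for the vectors, and a hash-based `toy_embed` function standing in for the language model. All function names are hypothetical.

```python
import hashlib


def combine_descriptions(first: str, second: str) -> str:
    """Merge two candidate picture descriptions into one picture semantic.

    Assumption: plain concatenation, skipping an exact duplicate.
    """
    parts = []
    for description in (first, second):
        description = description.strip()
        if description and description not in parts:
            parts.append(description)
    return " ".join(parts)


def toy_embed(unit_text: str, dim: int = 8) -> list:
    """Stand-in for the language model: a deterministic hashed vector.

    A real system would call an actual language model here.
    """
    digest = hashlib.sha256(unit_text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def readability_feature(text: str, picture_semantic: str, embed=toy_embed) -> list:
    """Build a readability feature from the text and the picture semantic."""
    # Combine the text and the picture semantic into one combined text.
    combined = f"{text} {picture_semantic}".strip()

    # Assumption: each sentence is one unit text.
    units = [unit.strip() for unit in combined.split(".") if unit.strip()]

    # One vector per unit text.
    vectors = [embed(unit) for unit in units]
    if not vectors:
        return []

    # Assumption: the unit text vectors are combined by mean pooling.
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]
```

Mean pooling yields a fixed-length feature regardless of how many unit texts the combined text contains, which is convenient for a downstream score or classification model; other combinations (for example, keeping the full vector sequence) would fit the claim wording equally well.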

Description

Readability prediction device and method

Technical Field

The invention relates to a readability prediction device and a readability prediction method. More particularly, the present invention relates to a readability prediction apparatus and method capable of predicting the readability of data that includes both text and pictures.

Background

In recent years, various techniques and applications of readability prediction have been proposed. In the prior art, the readability of input data is generally predicted by analyzing only the textual semantics of the input data. However, existing text readability prediction models are limited to predicting readability from text and cannot take the content of pictures into account at the same time. Because such models are limited in their ability to "understand the picture", the universality and accuracy of the readability model cannot be further improved. In view of the foregoing, it would be desirable to provide an apparatus and method that automatically understand the semantics of a picture and combine them with the text content to predict readability.

Disclosure of Invention

An object of the present invention is to provide a readability prediction apparatus. The readability prediction apparatus comprises a transceiver interface, a memory, and a processor. The transceiver interface is used for receiving data to be judged, and the memory is used for storing at least one multi-modal large language model and a readability model. The processor is electrically connected to the transceiver interface and the memory. The processor divides a picture and a text corresponding to the picture from the data to be judged. The processor transmits a prompt, the picture, and the text corresponding to the picture to the at least one multi-modal large language model to generate a picture semantic corresponding to the picture, wherein the prompt indicates a generation type of the picture semantic. The processor transmits a readability feature to the readability model to predict a readability of the data to be judged, wherein the readability feature is generated based on the text corresponding to the picture and the picture semantic corresponding to the picture.
In some embodiments of the present invention, the operation of dividing the picture and the text corresponding to the picture from the data to be judged further comprises the following operations: analyzing a plurality of object data in the data to be judged to generate a data tag corresponding to each of the plurality of object data; selecting, based on a plurality of target data tags among the data tags corresponding to the plurality of object data, a plurality of target object data corresponding to the plurality of target data tags from the plurality of object data, wherein the plurality of target data tags comprise a picture tag and a text tag; and dividing the plurality of target object data from the plurality of object data to serve as the picture and the text corresponding to the picture.
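The tag-based division described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `ObjectData` structure, the tag strings `"picture"` and `"text"`, and the rule that pairs each picture with the text objects following it in document order are all hypothetical; the source only states that the target data tags include a picture tag and a text tag.

```python
from dataclasses import dataclass

# Hypothetical tag names; the source only says the target tags include
# a picture tag and a text tag.
PICTURE_TAG = "picture"
TEXT_TAG = "text"
TARGET_TAGS = {PICTURE_TAG, TEXT_TAG}


@dataclass
class ObjectData:
    """One object found in the data to be judged (assumed structure)."""
    tag: str      # data tag produced by the analysis step
    content: str  # text content, or an identifier for a picture


def select_targets(objects):
    """Keep only the objects whose data tag is one of the target tags."""
    return [obj for obj in objects if obj.tag in TARGET_TAGS]


def split_picture_and_text(objects):
    """Pair each picture with the text objects that follow it.

    Pairing by document order is an assumption of this sketch; text that
    appears before the first picture is not attached to any picture.
    """
    pairs = []
    for obj in select_targets(objects):
        if obj.tag == PICTURE_TAG:
            pairs.append((obj.content, []))
        elif pairs:
            pairs[-1][1].append(obj.content)
    return [(picture, " ".join(texts)) for picture, texts in pairs]
```

Objects with other tags (for example headers or tables) are simply dropped by `select_targets`, which mirrors the selection of target object data by target data tag.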
In some embodiments of the present invention, the at least one multi-modal large language model comprises a first large language model and a second large language model, and the processor further performs the following operations: transmitting the prompt, the picture, and the text corresponding to the picture to the first large language model to generate a first candidate picture description corresponding to the picture; transmitting the prompt, the picture, and the text corresponding to the picture to the second large language model to generate a second candidate picture description corresponding to the picture; and combining the first candidate picture description and the second candidate picture description corresponding to the picture to generate the picture semantic corresponding to the picture.
In some embodiments of the present invention, the readability feature is generated by the following operations: combining the text corresponding to the picture and the picture semantic corresponding to the picture to generate a combined text, wherein the combined text comprises a plurality of unit texts; transmitting the combined text to a language model to calculate a plurality of unit text vectors corresponding to the plurality of unit texts; and combining the plurality of unit text vectors corresponding to the plurality of unit texts to generate the readability feature.
In some embodiments of the present invention, the readability comprises a readability score, and the operation of predicting the readability of the data to be judged further comprises transmitting the readability feature to the readability model to calculate the readability score of the data to be judged.
In some embodiments of the present invention, the readability model is generated based on training a predi