US-20260127902-A1 - TEXT READABILITY PREDICTION DEVICE AND TEXT READABILITY PREDICTION METHOD


Abstract

A text readability prediction device and method are provided. The text readability prediction device segments a picture and a text corresponding to the picture from a data to be determined. The text readability prediction device sends a prompt, the picture and the text corresponding to the picture to at least one multimodal large language model to generate picture semantics corresponding to the picture. The text readability prediction device sends a readability feature to a readability model to predict a readability of the data to be determined.
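The flow summarized in the abstract can be sketched as a short pipeline. This is only an illustrative sketch, not the disclosed implementation: the names `PictureTextPair`, `predict_readability`, and every `toy_*` function are hypothetical stand-ins introduced here, and the segmenter, the multimodal large language model, the feature generator, and the readability model are all replaced by simple placeholder callables.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class PictureTextPair:
    """A picture and the text corresponding to it (hypothetical container)."""
    picture: bytes  # image data segmented from the input
    text: str       # text corresponding to that picture


# Hypothetical callable signatures standing in for the stored components.
Segmenter = Callable[[dict], PictureTextPair]
Describer = Callable[[str, bytes, str], str]    # (prompt, picture, text) -> picture semantics
Featurizer = Callable[[str, str], List[float]]  # (text, picture semantics) -> readability feature
ReadabilityModel = Callable[[List[float]], float]


def predict_readability(
    data: dict,
    prompt: str,
    segment: Segmenter,
    describe: Describer,
    featurize: Featurizer,
    model: ReadabilityModel,
) -> float:
    """Run the three stages in order: segment, describe, predict."""
    pair = segment(data)
    # The prompt tells the describer what kind of picture semantics to generate.
    semantics = describe(prompt, pair.picture, pair.text)
    # The feature is built from both the text and the picture semantics.
    feature = featurize(pair.text, semantics)
    return model(feature)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_segment(data: dict) -> PictureTextPair:
    return PictureTextPair(picture=data["picture"], text=data["text"])


def toy_describe(prompt: str, picture: bytes, text: str) -> str:
    return f"{prompt}: an image of {len(picture)} bytes accompanying '{text}'"


def toy_featurize(text: str, semantics: str) -> List[float]:
    words = f"{text} {semantics}".split()
    # Two crude features: word count and mean word length.
    return [float(len(words)), sum(len(w) for w in words) / len(words)]


def toy_model(feature: List[float]) -> float:
    return 0.1 * feature[0] + feature[1]


if __name__ == "__main__":
    page = {"picture": b"\x89PNG", "text": "The cat sat on the mat."}
    print(predict_readability(page, "Describe the picture", toy_segment,
                              toy_describe, toy_featurize, toy_model))
```

Passing the components as callables keeps the sketch independent of any particular model or library; a real system would substitute actual model calls for the toy functions.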

Inventors

Hou-Chiang TSENG
Kuan-Yu Chen
Yao-Ting Sung
Berlin CHEN
Chieh-Hsuan WU

Assignees

NATIONAL TAIWAN UNIVERSITY OF SCIENCE AND TECHNOLOGY

Dates

Publication Date: 20260507
Application Date: 20250625
Priority Date: 20250212

Claims (15)

1 . A text readability prediction device, comprising: a transceiver interface, configured to receive a data to be determined; a storage, configured to store at least one multimodal large language model and a readability model; and a processor, electrically connected to the transceiver interface and the storage, wherein the processor is configured to perform following operations: segmenting a picture and a text corresponding to the picture from the data to be determined; sending a prompt, the picture and the text corresponding to the picture to the at least one multimodal large language model to generate a picture semantics corresponding to the picture, wherein the prompt is configured to indicate a generated type of the picture semantics; and sending a readability feature to the readability model to predict a readability corresponding to the data to be determined, wherein the readability feature is generated according to the text corresponding to the picture and the picture semantics corresponding to the picture.
2 . The text readability prediction device of claim 1 , wherein the operation of segmenting the picture and the text corresponding to the picture from the data to be determined further comprises following operations: analyzing a plurality of pieces of object data on the data to be determined to generate a data tag corresponding to each of the pieces of object data; selecting a plurality of pieces of target object data corresponding to a plurality of target data tags from the pieces of object data according to the target data tags of the data tags, wherein the target data tags comprise a picture tag and a text tag; and segmenting the pieces of target object data from the pieces of object data to serve as the picture and the text corresponding to the picture.
3 . The text readability prediction device of claim 1 , wherein the at least one multimodal large language model at least comprises a first large language model and a second large language model, wherein the processor is further configured to perform following operations: sending the prompt, the picture and the text corresponding to the picture to the first large language model to generate a first candidate picture description corresponding to the picture; sending the prompt, the picture and the text corresponding to the picture to the second large language model to generate a second candidate picture description corresponding to the picture; and combining the first candidate picture description corresponding to the picture and the second candidate picture description to generate the picture semantics corresponding to the picture.
4 . The text readability prediction device of claim 1 , wherein the readability feature is generated according to following operation: combining the text corresponding to the picture and the picture semantics corresponding to the picture to generate a combined text, wherein the combined text comprises a plurality of unit texts; sending the combined text to a language model to calculate a plurality of unit text vectors corresponding to the unit texts; and combining the unit text vectors corresponding to the unit texts to generate the readability feature.
5 . The text readability prediction device of claim 1 , wherein the readability comprises a readability score, and the operation of predicting the readability corresponding to the data to be determined further comprises following operations: sending the readability feature to the readability model to calculate the readability score corresponding to the data to be determined.
6 . The text readability prediction device of claim 5 , wherein the readability model is generated according to following operation: training a prediction model according to a plurality of historical readability features and a plurality of historical readability scores corresponding to the historical readability features to generate the readability model.
7 . The text readability prediction device of claim 1 , wherein the readability comprises one of a plurality of readability classification levels, and the operation of predicting the readability corresponding to the data to be determined further comprises following operation: sending the readability feature to the readability model to predict a first readability classification level corresponding to the data to be determined, wherein the first readability classification level is one of the readability classification levels.
8 . The text readability prediction device of claim 7 , wherein the readability model is generated according to following operation: training a prediction model according to a plurality of historical readability features and a plurality of historical readability classification levels corresponding to the historical readability features to generate the readability model.
9 . The text readability prediction device of claim 1 , wherein the processor is further configured to perform following operations: segmenting a plurality of candidate pictures and a second text corresponding to each of the candidate pictures from the data to be determined, wherein the candidate pictures comprise the picture; sending the prompt, the candidate pictures and the second text corresponding to each of the candidate pictures to the at least one multimodal large language model to generate a plurality of candidate picture semantics corresponding to the candidate pictures, wherein the prompt is configured to indicate a generated type of the candidate picture semantics; and sending the readability feature to the readability model to predict a readability corresponding to the data to be determined, wherein the readability feature is generated according to the second text corresponding to each of the candidate pictures and the candidate picture semantics corresponding to the candidate pictures.
10 . A text readability prediction method, adapted to an electronic device, wherein the electronic device is configured to store at least one multimodal large language model and a readability model, wherein the text readability prediction method comprises following steps of: segmenting a picture and a text corresponding to the picture from a data to be determined; sending a prompt, the picture and the text corresponding to the picture to the at least one multimodal large language model to generate a picture semantics corresponding to the picture, wherein the prompt is configured to indicate a generated type of the picture semantics; and sending a readability feature to a readability model to predict a readability corresponding to the data to be determined, wherein the readability feature is generated according to the text corresponding to the picture and the picture semantics corresponding to the picture.
11 . The text readability prediction method of claim 10 , wherein the step of segmenting the picture and the text corresponding to the picture from the data to be determined further comprises: analyzing a plurality of pieces of object data on the data to be determined to generate a data tag corresponding to each of the pieces of object data; selecting a plurality of pieces of target object data corresponding to a plurality of target data tags from the pieces of object data according to the target data tags of the data tags, wherein the target data tags comprise a picture tag and a text tag; and segmenting the pieces of target object data from the pieces of object data to serve as the picture and the text corresponding to the picture.
12 . The text readability prediction method of claim 10 , wherein the at least one multimodal large language model at least comprises a first large language model and a second large language model, wherein the text readability prediction method further comprises: sending the prompt, the picture and the text corresponding to the picture to the first large language model to generate a first candidate picture description corresponding to the picture; sending the prompt, the picture and the text corresponding to the picture to the second large language model to generate a second candidate picture description corresponding to the picture; and combining the first candidate picture description corresponding to the picture and the second candidate picture description to generate the picture semantics corresponding to the picture.
13 . The text readability prediction method of claim 10 , wherein the readability feature is generated according to following step of: combining the text corresponding to the picture and the picture semantics corresponding to the picture to generate a combined text, wherein the combined text comprises a plurality of unit texts; sending the combined text to a language model to calculate a plurality of unit text vectors corresponding to the unit texts; and combining the unit text vectors corresponding to the unit texts to generate the readability feature.
14 . The text readability prediction method of claim 10 , wherein the readability comprises a readability score, and the step of predicting the readability corresponding to the data to be determined further comprises: sending the readability feature to the readability model to calculate the readability score corresponding to the data to be determined.
15 . The text readability prediction method of claim 14 , wherein the readability model is generated according to following operation: training a prediction model according to a plurality of historical readability features and a plurality of historical readability scores corresponding to the historical readability features to generate the readability model.
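
The feature-generation steps recited in claims 4 and 13 (combine the text with the picture semantics, compute a vector for each unit text, then combine the unit text vectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: a unit text is assumed to be a whitespace-delimited token, the language model is replaced by a hypothetical hash-based `toy_language_model`, and mean pooling is assumed as the way to combine the unit text vectors. The claims do not specify any of these choices.

```python
import hashlib
from typing import Callable, List

Vector = List[float]


def combine_text(text: str, picture_semantics: str) -> str:
    """Combine the text and the picture semantics into one combined text."""
    return f"{text} {picture_semantics}".strip()


def split_into_unit_texts(combined: str) -> List[str]:
    # Assumption: a "unit text" is a whitespace-delimited token.
    return combined.split()


def toy_language_model(unit_text: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for a language model: a deterministic
    pseudo-embedding derived from a SHA-256 digest, values in [0, 1]."""
    digest = hashlib.sha256(unit_text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def readability_feature(
    text: str,
    picture_semantics: str,
    embed: Callable[[str], Vector] = toy_language_model,
) -> Vector:
    """Build one feature vector from the text and the picture semantics."""
    units = split_into_unit_texts(combine_text(text, picture_semantics))
    if not units:
        raise ValueError("combined text contains no unit texts")
    vectors = [embed(unit) for unit in units]
    dim = len(vectors[0])
    # Assumption: the unit text vectors are combined by mean pooling.
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


if __name__ == "__main__":
    feature = readability_feature("A cat sits.", "A photo of a cat on a mat.")
    print(len(feature), feature)
```

The resulting vector is what would be sent to the readability model. Other ways of combining the unit text vectors, such as concatenation or a learned pooling layer, would fit the same interface.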

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to Taiwan Application Serial Number 114105237, filed Feb. 12, 2025, and U.S. Provisional Application Ser. No. 63/714,874, filed Nov. 1, 2024, all of which are herein incorporated by reference in their entireties.

BACKGROUND

FIELD OF INVENTION

The present disclosure relates to a text readability prediction device and method. More particularly, the present disclosure relates to a readability prediction device and method capable of predicting the readability of data containing text and pictures.

Description of Related Art

In recent years, various readability prediction technologies and applications have been proposed one after another. In the prior art, the readability of the input data is generally predicted by simply analyzing the text semantics corresponding to the input data.

However, conventional text readability prediction models are limited to predicting the readability of the words alone and cannot also take the content of the picture itself into account. As a result, such models have a limited ability to “understand pictures,” which constrains the versatility and accuracy of the readability model.

For the foregoing reasons, there is a need for a device and a method capable of automatically understanding the semantics of a picture and combining it with text content to predict text readability, so as to solve the above problems encountered in related art approaches.

SUMMARY

One aspect of the present disclosure provides a text readability prediction device. The text readability prediction device includes a transceiver interface, a storage and a processor. The transceiver interface is configured to receive a data to be determined. The storage is configured to store at least one multimodal large language model and a readability model. The processor is electrically connected to the transceiver interface and the storage.
The processor is configured to segment a picture and a text corresponding to the picture from the data to be determined. The processor is configured to send a prompt, the picture and the text corresponding to the picture to the at least one multimodal large language model to generate a picture semantics corresponding to the picture, where the prompt is configured to indicate a generated type of the picture semantics. The processor is configured to send a readability feature to the readability model to predict a readability corresponding to the data to be determined, where the readability feature is generated according to the text corresponding to the picture and the picture semantics corresponding to the picture.

Another aspect of the present disclosure provides a method. The method is adapted to an electronic device. The method includes following steps of: segmenting a picture and a text corresponding to the picture from a data to be determined; sending a prompt, the picture and the text corresponding to the picture to the at least one multimodal large language model to generate a picture semantics corresponding to the picture, wherein the prompt is configured to indicate a generated type of the picture semantics; and sending a readability feature to a readability model to predict a readability corresponding to the data to be determined, wherein the readability feature is generated according to the text corresponding to the picture and the picture semantics corresponding to the picture.

The technology provided by the present disclosure (at least including a text readability prediction device and method) first segments a picture and a text corresponding to the picture from the data to be determined. Then, the present disclosure generates picture semantics corresponding to the picture by using a multimodal large language model.
Finally, the present disclosure sends the readability feature to the readability model to predict a readability corresponding to the data to be determined. The present disclosure generates the picture semantics corresponding to the picture through the multimodal large language model and combines the text and the picture semantics. Therefore, the technology provided by the present disclosure increases the ability of a readability prediction device to comprehensively understand text and pictures, and also improves the accuracy of readability prediction.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure can be more fully understood by reading the following detailed description of the embodiment, with reference made to the accompanying drawings as follows:

FIG. 1 depicts a schematic diagram of a text readability prediction device according to a first embodiment of the present disclosure;

FIG. 2 depicts a schematic diagram of a storage according to a first embodiment of the present disclosure;

FIG. 3 depicts a schematic diagram of data segmentation according to a first embodiment of