
CN-116630981-B - Note summary generation method, device, equipment and storage medium

CN116630981B

Abstract

The invention provides a note summary generation method, apparatus, device, and storage medium. The method comprises: acquiring a target text picture; segmenting a plurality of target areas from the target text picture and determining the category of each target area, wherein the plurality of target areas comprise a plurality of text areas and each text area is one of an original text area, a user writing area, and a user marking area; performing text recognition on each segmented text area to obtain recognition results respectively corresponding to the text areas; and taking the recognition result corresponding to the user writing area and/or the recognition result corresponding to the user marking area as guiding information, generating a note summary of the target user in combination with the recognition result corresponding to the original text area. With the method and apparatus, a user's note summary can be generated automatically from text pictures; compared with manual note arrangement, time consumption is greatly reduced, note arrangement efficiency is improved, and the influence of human factors is avoided.
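The abstract's pipeline (segment regions, recognize text, then generate a summary guided by the user's writing and markings) can be sketched roughly as follows. This is an illustrative assumption, not the patent's actual model: the `Region` type, category names, and the keyword-overlap "generation" step are all stand-ins for the trained models the claims describe.

```python
from dataclasses import dataclass

@dataclass
class Region:
    category: str  # "original_text" | "user_writing" | "user_marking" | "image"
    text: str      # text recognition result (empty for image regions)

def generate_note_summary(regions):
    """Toy stand-in for the trained generation model: keep original sentences
    that the guiding information (user writing/marking) points at."""
    original = [r.text for r in regions if r.category == "original_text"]
    guidance = [r.text for r in regions if r.category in ("user_writing", "user_marking")]
    return " ".join(s for s in original if any(g in s for g in guidance))

regions = [
    Region("original_text", "Gradient descent minimizes the loss."),
    Region("original_text", "The chapter then reviews some history."),
    Region("user_marking", "Gradient descent"),
]
print(generate_note_summary(regions))  # keeps only the marked sentence
```

In the patent, this last step is a learned generation model rather than keyword overlap; the sketch only shows how the three text-area categories feed into it.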

Inventors

  • LIU CHAOFAN
  • KONG CHANGQING
  • WAN GENSHUN
  • XIONG SHIFU
  • GAO JIANQING
  • PAN JIA
  • LIU CONG

Assignees

  • IFLYTEK CO., LTD. (科大讯飞股份有限公司)

Dates

Publication Date
2026-05-05
Application Date
2022-12-26

Claims (13)

  1. A note summary generation method, comprising: acquiring a text picture containing notes of a target user as a target text picture; segmenting a plurality of target areas from the target text picture and determining the category of each target area, wherein the plurality of target areas comprise a plurality of text areas, and each text area is one of an original text area, a user writing area, and a user marking area; performing text recognition on each segmented text area to obtain recognition results respectively corresponding to the text areas; and generating a note summary of the target user by taking the recognition result corresponding to the user writing area and/or the recognition result corresponding to the user marking area as guiding information, in combination with the recognition result corresponding to the original text area; wherein said generating the note summary of the target user comprises: taking the recognition result corresponding to the user marking area as guiding information and generating a note summary faithful to the original text in combination with the recognition result corresponding to the original text area; or taking at least the recognition result corresponding to the user writing area as guiding information and generating a personalized note summary in combination with the recognition result corresponding to the original text area.
  2. The note summary generation method according to claim 1, wherein the plurality of target areas further comprise a plurality of image areas; and the method further comprises: merging the image areas into the generated note summary.
  3. The note summary generation method according to claim 2, wherein segmenting the plurality of target areas from the target text picture comprises: segmenting the plurality of target areas from the target text picture and determining the category of each target area based on a picture segmentation model obtained through pre-training; wherein each target area is one of an image area, an original text area, a user writing area, and a user marking area, and the picture segmentation model is obtained by training an initial picture segmentation model with training text pictures annotated with the positions and categories of a plurality of target areas.
  4. The note summary generation method according to claim 3, wherein the initial picture segmentation model comprises a feature extraction module; the feature extraction module in the initial picture segmentation model is taken from an image classification model obtained by training with training pictures annotated with picture categories; and the picture category annotated for a training picture is one or two of the following categories: containing only images, containing only text, containing both images and text, containing user notes, and containing no user notes.
  5. The note summary generation method according to any one of claims 1 to 4, wherein generating the note summary of the target user by taking the recognition result corresponding to the user writing area and/or the recognition result corresponding to the user marking area as guiding information, in combination with the recognition result corresponding to the original text area, comprises: acquiring a first text set and a second text set, wherein the first text set and the second text set are, in order, a first sentence set and a second sentence set, or, in order, a third sentence set and a fourth sentence set; the first sentence set comprises each original sentence in the recognition result corresponding to the original text area; the second sentence set comprises each original sentence in the recognition result corresponding to the user marking area; the third sentence set comprises spliced sentences formed from each user-written sentence in the recognition result corresponding to the user writing area and each original sentence in the recognition result corresponding to the original text area; the fourth sentence set comprises the spliced sentences in the third sentence set that contain key sentences and/or key reminding sentences, where the key sentences are original sentences in the recognition result corresponding to the user marking area, and the key reminding sentences are original sentences corresponding to the user-written sentences; obtaining chapter feature vectors respectively corresponding to the first text set and the second text set, wherein a chapter feature vector is the feature vector of the chapter composed of all texts in the corresponding text set; and generating the note summary of the target user based on the chapter feature vectors respectively corresponding to the first text set and the second text set.
  6. The note summary generation method according to claim 5, wherein obtaining the chapter feature vectors respectively corresponding to the first text set and the second text set comprises, for each target text set among the first text set and the second text set whose chapter feature vector is to be determined: performing word-level coding on each text in the target text set to obtain a sentence representation vector for each text in the target text set; fusing the sentence representation vectors of the texts in the target text set, the fused vectors serving as the chapter representation vector of the target text set; and performing sentence-level coding on the chapter representation vector of the target text set to obtain the chapter feature vector corresponding to the target text set.
  7. The note summary generation method according to claim 6, wherein, if the first text set and the second text set are, in order, the third sentence set and the fourth sentence set, performing sentence-level coding on the chapter representation vector of the target text set comprises: performing sentence-level coding on the chapter representation vector of the target text set in combination with the position information respectively corresponding to each text in the target text set, wherein the position information corresponding to a text comprises the relative position information of the two text areas in which the two sentences composing that text are located.
  8. The note summary generation method according to claim 5, wherein generating the note summary of the target user based on the chapter feature vectors respectively corresponding to the first text set and the second text set comprises: generating a note summary corresponding to each text in the second text set based on the chapter feature vector corresponding to the second text set, in combination with the chapter feature vector corresponding to the first text set; and merging the note summaries corresponding to the texts in the second text set to obtain the note summary of the target user.
  9. The note summary generation method according to claim 8, wherein generating the note summary corresponding to each text in the second text set comprises, for each target text in the second text set whose note summary is to be generated: obtaining, from the chapter feature vector corresponding to the second text set, the feature vector related to the target text as the first feature vector corresponding to the target text; determining, according to the first feature vector corresponding to the target text and the chapter feature vector corresponding to the first text set, the feature vector required for generating the note summary corresponding to the target text, as the second feature vector corresponding to the target text; and generating the note summary corresponding to the target text according to the second feature vector corresponding to the target text.
  10. The note summary generation method according to claim 1, wherein generating the note summary faithful to the original text by taking the recognition result corresponding to the user marking area as guiding information, in combination with the recognition result corresponding to the original text area, comprises: processing the recognition result corresponding to the user marking area and the recognition result corresponding to the original text area based on a first note summary generation model obtained through pre-training, to obtain the note summary faithful to the original text, wherein the first note summary generation model is trained with multiple pieces of training text data annotated with note summaries faithful to the original text, and each piece of training text data comprises the recognition result corresponding to the original text area and the recognition result corresponding to the user marking area, both segmented from a training text picture; and wherein generating the note summary of the target user by taking the recognition results corresponding to the user writing area and the user marking area as guiding information, in combination with the recognition result corresponding to the original text area, comprises: processing the recognition result corresponding to the user writing area, the recognition result corresponding to the user marking area, and the recognition result corresponding to the original text area based on a second note summary generation model obtained through pre-training, to obtain the personalized note summary, wherein the second note summary generation model is trained with multiple pieces of training text data annotated with personalized note summaries, and each piece of training text data comprises the recognition result corresponding to the user writing area, the recognition result corresponding to the user marking area, and the recognition result corresponding to the original text area, all segmented from a training text picture.
  11. A note summary generation device, comprising a picture acquisition module, a picture segmentation module, a text recognition module, and a note summary generation module; wherein: the picture acquisition module is configured to acquire a text picture containing notes of a target user as a target text picture; the picture segmentation module is configured to segment a plurality of target areas from the target text picture and determine the category of each target area, wherein the plurality of target areas comprise a plurality of text areas, and each text area is one of an original text area, a user writing area, and a user marking area; the text recognition module is configured to perform text recognition on each segmented text area to obtain recognition results respectively corresponding to the text areas; and the note summary generation module is configured to generate the note summary of the target user by taking the recognition result corresponding to the user writing area and/or the recognition result corresponding to the user marking area as guiding information, in combination with the recognition result corresponding to the original text area; specifically, when generating the note summary of the target user, the note summary generation module either takes the recognition result corresponding to the user marking area as guiding information and generates a note summary faithful to the original text in combination with the recognition result corresponding to the original text area, or takes at least the recognition result corresponding to the user writing area as guiding information and generates a personalized note summary in combination with the recognition result corresponding to the original text area.
  12. A note summary generation apparatus, comprising a memory and a processor; wherein the memory is configured to store a program, and the processor is configured to execute the program to implement the steps of the note summary generation method according to any one of claims 1 to 10.
  13. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the note summary generation method according to any one of claims 1 to 10.
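Claims 5 and 6 describe a hierarchical encoding: word-level coding yields one sentence representation vector per text, those vectors are fused into a chapter representation, and sentence-level coding over that representation yields a single chapter feature vector. A minimal numeric sketch of that shape follows; the deterministic toy embeddings and position-weighted pooling are assumptions standing in for the trained word- and sentence-level encoders, not the patent's actual networks.

```python
import numpy as np

DIM = 8

def embed_word(word):
    # Deterministic toy embedding standing in for a trained word encoder.
    rng = np.random.default_rng(sum(ord(c) for c in word))
    return rng.standard_normal(DIM)

def word_level_encode(text):
    """Word-level coding: one sentence representation vector per text."""
    return np.mean([embed_word(w) for w in text.split()], axis=0)

def chapter_feature_vector(texts):
    """Fuse the sentence vectors into a chapter representation, then apply a
    toy sentence-level encoding (position-weighted pooling) over it."""
    sentence_vecs = np.stack([word_level_encode(t) for t in texts])
    weights = np.arange(1, len(texts) + 1, dtype=float)  # crude position info
    return (weights[:, None] * sentence_vecs).sum(axis=0) / weights.sum()

chapter = ["gradient descent minimizes loss", "momentum speeds up convergence"]
vec = chapter_feature_vector(chapter)
print(vec.shape)  # one fixed-size feature vector for the whole text set
```

The position weighting loosely mirrors claim 7, where sentence-level coding incorporates position information for each text; a real implementation would use learned positional encodings inside a sentence-level encoder.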

Description

Note summary generation method, device, equipment and storage medium

Technical Field

The present invention relates to the field of natural language processing technologies, and in particular to a method, an apparatus, a device, and a storage medium for generating a note summary.

Background

In some scenarios, a user may write content on a text, e.g., the user's thoughts about the text content or the user's refinement of it, and may mark some content in the text, e.g., circle or otherwise mark content of interest. Taking a learning scenario as an example, students may record the teacher's blackboard writing or the teacher's insights in a textbook during class, and mark the content the teacher emphasizes. Because of limited effort, users generally do not pay particular attention to the regularity and neatness of these records when writing and marking on the text. When users later want to review the text content, the cluttered records can greatly hinder review, so users need to sort through the recorded content; however, sorting through the recorded content manually is extremely time-consuming and labor-intensive.
Disclosure of Invention

In view of the above, the present invention provides a method, an apparatus, a device, and a storage medium for generating note summaries, to solve the problem that sorting recorded content is extremely time-consuming and labor-intensive for the user. The technical scheme is as follows.

A note summary generation method comprises: acquiring a text picture containing notes of a target user as a target text picture; segmenting a plurality of target areas from the target text picture and determining the category of each target area, wherein the plurality of target areas comprise a plurality of text areas, and each text area is one of an original text area, a user writing area, and a user marking area; performing text recognition on each segmented text area to obtain recognition results respectively corresponding to the text areas; and generating a note summary of the target user by taking the recognition result corresponding to the user writing area and/or the recognition result corresponding to the user marking area as guiding information, in combination with the recognition result corresponding to the original text area.

Optionally, the plurality of target areas further include a plurality of image areas, and the method further comprises: merging the image areas into the generated note summary.

Optionally, segmenting the plurality of target areas from the target text picture comprises: segmenting the plurality of target areas from the target text picture and determining the category of each target area based on a picture segmentation model obtained through pre-training; each target area is one of an image area, an original text area, a user writing area, and a user marking area, and the picture segmentation model is obtained by training an initial picture segmentation model with training text pictures annotated with the positions and categories of a plurality of target areas.
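Downstream of such a segmentation model, each detected region must be routed by category (text areas to recognition, image areas kept for merging into the summary). A small sketch of that routing step, where the `DetectedRegion` structure and category names are illustrative assumptions rather than the patent's actual data format:

```python
from dataclasses import dataclass

CATEGORIES = ("image", "original_text", "user_writing", "user_marking")

@dataclass
class DetectedRegion:
    box: tuple  # (x1, y1, x2, y2) in pixel coordinates
    category: str

def split_by_category(detections):
    """Group segmented regions so each category can be routed appropriately:
    text categories to OCR, image regions straight into the note summary."""
    groups = {c: [] for c in CATEGORIES}
    for d in detections:
        groups[d.category].append(d)
    return groups

dets = [DetectedRegion((0, 0, 100, 20), "original_text"),
        DetectedRegion((0, 25, 60, 40), "user_writing")]
groups = split_by_category(dets)
print(len(groups["original_text"]), len(groups["image"]))  # prints: 1 0
```

The annotated training data the passage describes would pair each such box and category label with a training text picture; the detector itself (e.g., a standard object-detection or layout-analysis network) is outside this sketch.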
Optionally, the initial picture segmentation model includes a feature extraction module; the feature extraction module in the initial picture segmentation model is taken from an image classification model obtained by training with training pictures annotated with picture categories; the picture category annotated for a training picture is one or two of the following categories: containing only images, containing only text, containing both images and text, containing user notes, and containing no user notes.

Optionally, generating the note summary of the target user by taking the recognition result corresponding to the user writing area and/or the recognition result corresponding to the user marking area as guiding information, in combination with the recognition result corresponding to the original text area, comprises: acquiring a first text set and a second text set, wherein the first text set and the second text set are, in order, a first sentence set and a second sentence set, or, in order, a third sentence set and a fourth sentence set; the first sentence set comprises each original sentence in the recognition result corresponding to the original text area; the second sentence set comprises each original sentence in the recognition result corresponding to the user marking area; the third sentence set comprises spliced sentences formed from each user-written sentence in the recognition result corresponding to the user writing area and each original sentence in the recognition result corresponding to the original text area; the fourth sentence set comprises the spliced sentences in the third sentence set that contain key sentences and/or key reminding sentences, where the key sentences are original sentences in the recognition result corresponding to the user marking area, and the key reminding sentences are original sentences corresponding to the user-written sentences