CN-122021597-A - Method, apparatus, device and storage medium for generating summary

Abstract

According to embodiments of the present disclosure, a method, apparatus, device, and storage medium for generating a summary are provided. The method includes: in response to a summary generation request, obtaining schedule data and document data within a time range indicated by the request; determining at least one topic based on the schedule data and the document data; generating, based on the at least one topic, at least one content set respectively corresponding to the at least one topic from the schedule data and the document data; and generating summary information corresponding to the time range based on the at least one content set. In this way, the completeness and accuracy of the generated summary may be improved.

Inventors

  • SUN YANPING
  • ZHANG JUNBO
  • LI ZHENGYING

Assignees

  • 京东科技信息技术有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-01-29

Claims (15)

  1. A method for generating a summary, comprising: in response to a summary generation request, acquiring schedule data and document data within a time range indicated by the summary generation request; determining at least one topic based on the schedule data and the document data; generating, based on the at least one topic, at least one content set respectively corresponding to the at least one topic from the schedule data and the document data; and generating summary information corresponding to the time range based on the at least one content set.
  2. The method of claim 1, wherein the schedule data includes respective names, categories, and time information of at least one schedule within a target time range, and wherein the document data includes respective content, categories, and time information of at least one document within the target time range.
  3. The method of claim 1, wherein determining the at least one topic comprises: determining a plurality of content segments from the document data; for a content segment of the plurality of content segments, determining at least one topic tag associated with the content segment based on the schedule data and the content segment; and fusing the topic tags associated with the plurality of content segments based on semantic similarity to determine the at least one topic.
  4. The method of claim 3, wherein determining the plurality of content segments comprises at least one of: determining at least one first document in the document data whose content length is below a first threshold length as at least one content segment of the plurality of content segments; or dividing at least one second document in the document data whose content length exceeds the first threshold length into at least two content segments of the plurality of content segments.
  5. The method of claim 4, wherein dividing the at least one second document into the at least two content segments comprises: in response to the at least one second document including at least one semantic boundary identifier, dividing the at least one second document into the at least two content segments according to the at least one semantic boundary identifier; and in response to the at least one second document not including a semantic boundary identifier, dividing the at least one second document into the at least two content segments based on a second threshold length.
  6. The method of claim 3, wherein determining the at least one topic tag associated with the content segment comprises at least one of: in response to the content segment being related to at least one schedule in the schedule data, determining a name of the at least one schedule as the at least one topic tag; or determining, using a language model, the at least one topic tag based on semantic content of the content segment and a category of the document to which the content segment belongs.
  7. The method of claim 1, wherein generating the at least one content set respectively corresponding to the at least one topic comprises: for a topic of the at least one topic, determining a candidate content set based on the schedule data and a plurality of content segments in the document data that correspond to the topic; deleting, from the candidate content set, content segments whose similarity to the topic is below a threshold; and sorting the plurality of content segments in the candidate content set according to the time information of the documents to which the plurality of content segments belong, to obtain a content set corresponding to the topic.
  8. The method of claim 1, wherein the summary information follows a predetermined summary template, and generating the summary information comprises: sorting the at least one content set by time; executing, using an agent and based on the sorted at least one content set, a plurality of subtasks, each of which is configured to generate information for one of the template portions of the predetermined summary template; and determining the summary information based on the information for the plurality of template portions generated by the plurality of subtasks.
  9. The method of claim 8, wherein the summary information comprises a to-be-processed item, and wherein executing the plurality of subtasks comprises, using a first machine learning model: identifying at least one item having to-be-processed semantic content in the at least one content set; for an item of the at least one item, determining whether content corresponding to the item at a later time has a processed semantic identifier; and in response to the content corresponding to the item not having the processed semantic identifier, determining the item as the to-be-processed item.
  10. The method of claim 8, wherein the summary information comprises key items and other items, and wherein executing the plurality of subtasks comprises, using a second machine learning model: identifying at least one item in the at least one content set; merging those of the at least one item that belong to multiple phases of a same item; and classifying the merged at least one item into the key items and the other items based on at least one of a respective duration, number of participants, outcome, or predetermined identifier of the merged at least one item.
  11. The method of claim 8, wherein the summary information comprises a planned item for a next time range, and wherein executing the plurality of subtasks comprises, using a third machine learning model: identifying at least one item in the at least one content set having planning semantic content or semantic content related to the next time range; and determining items of the at least one item other than completed items as planned items for the next time range.
  12. An apparatus for generating a summary, comprising: an acquisition module configured to acquire, in response to a summary generation request, schedule data and document data within a time range indicated by the summary generation request; a topic determination module configured to determine at least one topic based on the schedule data and the document data; a content dividing module configured to generate, based on the at least one topic, at least one content set respectively corresponding to the at least one topic from the schedule data and the document data; and a summary module configured to generate summary information corresponding to the time range based on the at least one content set.
  13. An electronic device, comprising: at least one processing unit; and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit, wherein the instructions, when executed by the at least one processing unit, cause the electronic device to perform the method according to any one of claims 1 to 11.
  14. A computer-readable storage medium having stored thereon a computer program executable by a processor to implement the method according to any one of claims 1 to 11.
  15. A computer program product tangibly stored in a computer storage medium and comprising computer-executable instructions which, when executed by a device, cause the device to perform the method according to any one of claims 1 to 11.
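
The segmentation described in claims 4 and 5 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the two threshold values, the choice of a blank line as the only semantic boundary identifier, and the function name `split_document` are all assumptions made for illustration.

```python
import re

# Hypothetical values: the claims name a first and a second threshold length
# without fixing either. This sketch assumes SECOND_THRESHOLD <= FIRST_THRESHOLD.
FIRST_THRESHOLD = 200
SECOND_THRESHOLD = 100

# Assumption: a blank line is the only semantic boundary identifier.
BOUNDARY = re.compile(r"\n\s*\n")


def split_document(text, first=FIRST_THRESHOLD, second=SECOND_THRESHOLD):
    """Return the content segments for one document."""
    # Short documents are kept whole as a single segment. A length exactly
    # equal to the threshold is treated as short here; the claims leave
    # that case open.
    if len(text) <= first:
        return [text]
    # Long documents are split at semantic boundaries when any exist.
    parts = [p.strip() for p in BOUNDARY.split(text) if p.strip()]
    if len(parts) >= 2:
        return parts
    # Otherwise fall back to fixed-length chunks of the second threshold.
    return [text[i:i + second] for i in range(0, len(text), second)]
```

For example, a 250-character document with no blank lines yields chunks of 100, 100, and 50 characters, while a long document with one blank line yields the two paragraphs on either side of it.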

Description

Method, apparatus, device and storage medium for generating summary

Technical Field

Example embodiments of the present disclosure relate generally to the field of information technology and, more particularly, to a method, apparatus, device, and computer-readable storage medium for generating a summary.

Background

With the rapid development of intelligent office, online collaboration, and information processing technologies, users continuously generate large amounts of semantic content spanning different time periods and sources in their daily activities, such as meeting records, project documents, community communication, business communication, or personal notes. This content may be distributed across different data sources or applications, and it is highly related semantically but loosely organized. To facilitate review, summarization, and information sharing, users typically need to organize and summarize content over a given time range. Therefore, how to achieve efficient information organization and multi-dimensional semantic summarization has become an important technical challenge in the field of intelligent text generation.

Disclosure of Invention

In a first aspect of the present disclosure, a method for generating a summary is provided. The method includes: in response to a summary generation request, obtaining schedule data and document data within a time range indicated by the summary generation request; determining at least one topic based on the schedule data and the document data; generating, based on the at least one topic, at least one content set respectively corresponding to the at least one topic from the schedule data and the document data; and generating summary information corresponding to the time range based on the at least one content set.

In a second aspect of the present disclosure, an apparatus for generating a summary is provided. The apparatus comprises an acquisition module, a topic determination module, a content dividing module, and a summary module. The acquisition module is configured to acquire, in response to a summary generation request, schedule data and document data within a time range indicated by the summary generation request. The topic determination module is configured to determine at least one topic based on the schedule data and the document data. The content dividing module is configured to generate, based on the at least one topic, at least one content set respectively corresponding to the at least one topic from the schedule data and the document data. The summary module is configured to generate summary information corresponding to the time range based on the at least one content set.

In a third aspect of the present disclosure, an electronic device is provided. The electronic device comprises at least one processing unit, and at least one memory coupled to the at least one processing unit and storing instructions for execution by the at least one processing unit. The instructions, when executed by the at least one processing unit, cause the electronic device to perform the method of the first aspect of the present disclosure.

In a fourth aspect of the present disclosure, a computer-readable storage medium is provided. The computer-readable storage medium has stored thereon a computer program executable by a processor to perform the method according to the first aspect of the present disclosure.

In a fifth aspect of the present disclosure, a computer program product is provided. The computer program product is tangibly stored in a computer storage medium and comprises computer-executable instructions that, when executed by a device, cause the device to perform the method according to the first aspect of the present disclosure.
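
The four-step flow of the first aspect, mirrored by the four modules of the second aspect, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: the `Schedule` and `Document` types, every function name, the keyword-based `keyword_score`, and the 0.5 threshold are hypothetical. The sketch works on whole documents rather than content segments, and it replaces the model-based topic and summary steps with plain string handling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Schedule:
    name: str
    category: str
    day: date


@dataclass
class Document:
    content: str
    category: str
    day: date


def acquire(schedules, documents, start, end):
    """Acquisition step: keep items dated within [start, end]."""
    def in_range(item):
        return start <= item.day <= end
    return ([s for s in schedules if in_range(s)],
            [d for d in documents if in_range(d)])


def determine_topics(schedules, documents):
    """Topic step (placeholder): schedule names mentioned in any document."""
    topics = []
    for s in schedules:
        mentioned = any(s.name.lower() in d.content.lower() for d in documents)
        if mentioned and s.name not in topics:
            topics.append(s.name)
    return topics


def keyword_score(topic, text):
    """Stand-in for semantic similarity: 1.0 if the topic appears verbatim."""
    return 1.0 if topic.lower() in text.lower() else 0.0


def build_content_sets(topics, documents, score=keyword_score, threshold=0.5):
    """Content step: drop low-similarity documents, then sort by time."""
    return {
        t: sorted((d for d in documents if score(t, d.content) >= threshold),
                  key=lambda d: d.day)
        for t in topics
    }


def summarize(content_sets, start, end):
    """Summary step (placeholder): a plain outline, not generated prose."""
    lines = [f"Summary {start} to {end}"]
    for topic, docs in content_sets.items():
        lines.append(f"- {topic}: {len(docs)} document(s)")
    return "\n".join(lines)


def generate_summary(schedules, documents, start, end):
    """Run the four steps in order for one summary generation request."""
    s, d = acquire(schedules, documents, start, end)
    topics = determine_topics(s, d)
    content_sets = build_content_sets(topics, d)
    return summarize(content_sets, start, end)
```

Passing the similarity function as a parameter keeps the filtering and time-ordering logic independent of how similarity is computed, so an embedding-based scorer could be substituted for `keyword_score` without changing the rest of the sketch.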
It should be understood that the content described in this section is not intended to identify key or essential features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings, like or similar reference numerals denote like or similar elements, in which:

FIG. 1 illustrates a schematic diagram of an example environment in which embodiments of the present disclosure may be implemented;
FIG. 2A illustrates a schematic diagram of a process of generating a summary in accordance with some embodiments of the present disclosure;
FIG. 2B illustrates a schematic diagram of a process of content reorganization according to some embodiments of the present disclosure;
FIG. 2C illustrates a schematic diagram of a process of generating summary information according to some embodiments of the present disclosure;
FIG. 3 illustrates a flow chart of a process for generating a summary i