CN-121996786-A - Conference summary automatic generation method, device, equipment and medium
Abstract
The invention relates to the technical field of artificial intelligence and discloses an automatic conference summary generation method, device, equipment and medium. The method comprises: acquiring a conference image, a conference text and conference voice of a target conference, and performing text conversion on the conference voice to obtain a converted text; performing cross-modal semantic alignment on the conference image, the conference text and the converted text sequence to obtain a semantic network; extracting structured information of the semantic network, constructing a conference discussion flow graph based on the converted text and the timestamps of the structured information, and generating a discussion content summary according to the conference discussion flow graph; and constructing a conference knowledge graph according to the semantic network, generating a conference layered summary according to the conference knowledge graph, and aggregating the discussion content summary and the conference layered summary to obtain a conference content summary, thereby improving the accuracy of conference summaries.
Inventors
- CHEN XIAOJUN
- ZHOU YIFENG
- YU LIANJIE
- LIU JIANFENG
- HE SHENGLEI
- YUE TONG
Assignees
- 招商局融资租赁有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20251210
Claims (10)
- 1. An automatic conference summary generation method, characterized by comprising the following steps: acquiring a conference image, a conference text and conference voice of a target conference, and performing text conversion on the conference voice to obtain a converted text; performing cross-modal semantic alignment on the conference image, the conference text and the converted text sequence to obtain a semantic network; extracting structured information of the semantic network, constructing a conference discussion flow graph based on the converted text and the timestamps of the structured information, and generating a discussion content summary according to the conference discussion flow graph; and constructing a conference knowledge graph according to the semantic network, generating a conference layered summary according to the conference knowledge graph, and aggregating the discussion content summary and the conference layered summary to obtain a conference content summary.
- 2. The automatic conference summary generation method according to claim 1, wherein performing text conversion on the conference voice to obtain a converted text comprises: identifying non-silent segments in the conference voice based on voice activity detection to obtain a segmented audio block sequence; performing voiceprint recognition on the segmented audio block sequence to obtain voiceprint recognition data; labeling speaker identities on the segmented audio block sequence according to the voiceprint recognition data, based on a preset voiceprint library, to obtain identity-labeled audio blocks; generating timestamped recognized text from the identity-labeled audio blocks using a pre-trained speech recognition model; and performing term correction on the recognized text by combining the context with a preset domain glossary to obtain the converted text.
- 3. The automatic conference summary generation method according to claim 1, wherein performing cross-modal semantic alignment on the conference image, the conference text and the converted text sequence to obtain a semantic network comprises: performing OCR text extraction and object detection on the conference image to obtain image semantic units; performing structural parsing on the conference text to obtain a structured text; mapping the image semantic units, the structured text and the converted text sequence to a unified time axis to obtain a time-synchronized multi-modal stream; performing cross-modal entity linking on the time-synchronized multi-modal stream to obtain an entity link graph; performing joint semantic encoding on the entity link graph and the time-synchronized multi-modal stream to obtain a cross-modal semantic vector sequence; and constructing the semantic network according to the cross-modal semantic vector sequence and the entity link graph.
- 4. The automatic conference summary generation method according to claim 1, wherein extracting the structured information of the semantic network comprises: performing type recognition on the semantic network using a pre-trained graph neural network to obtain a set of classified, labeled nodes; extracting relation paths from the set of classified, labeled nodes based on the relation edges of the semantic network to obtain structured relation paths; binding temporal constraints to the structured relation paths based on the timestamps of the semantic network to obtain structured units; and linking the multi-modal evidence contained in the semantic network to the structured units to obtain the structured information.
- 5. The automatic conference summary generation method according to claim 1, wherein constructing a conference discussion flow graph based on the converted text and the timestamps of the structured information comprises: performing temporal node identification on the converted text and the structured information to obtain a temporal node set; performing temporal topological sorting on the temporal node set to obtain an event sequence; performing semantic recognition on the converted text to obtain the semantics of the converted text; identifying logical relationships among the semantics of the converted text according to the event sequence; and generating the conference discussion flow graph according to the logical relationships and the event sequence.
- 6. The automatic conference summary generation method according to claim 1, wherein constructing a conference knowledge graph according to the semantic network comprises: extracting core entities and original relation edges of the semantic network; normalizing the core entities to obtain a normalized entity set; performing type labeling on the original relation edges to obtain a valid relation edge set; binding attribute information to the normalized entity set and the valid relation edge set to obtain an entity relationship structure; applying hierarchical constraints to the entity relationship structure to obtain a hierarchical entity relationship structure; and generating the conference knowledge graph according to the hierarchical entity relationship structure.
- 7. The automatic conference summary generation method according to claim 1, wherein generating a conference layered summary according to the conference knowledge graph comprises: identifying a core path node sequence of the conference knowledge graph based on a node importance calculation; performing layered summary template injection on the core path node sequence based on a preset user role configuration to obtain a layered summary skeleton; acquiring a node description of each node in the conference knowledge graph, and performing semantic condensation and evidence binding according to the layered summary skeleton and the node descriptions to obtain summary paragraphs with evidence marks; and generating the conference layered summary by performing cross-level association on the summary paragraphs with evidence marks.
- 8. An automatic conference summary generation device, comprising: a data conversion module, configured to acquire a conference image, a conference text and conference voice of a target conference, and to perform text conversion on the conference voice to obtain a converted text; a semantic alignment module, configured to perform cross-modal semantic alignment on the conference image, the conference text and the converted text sequence to obtain a semantic network; a graph construction module, configured to extract structured information of the semantic network, construct a conference discussion flow graph based on the converted text and the timestamps of the structured information, and generate a discussion content summary according to the conference discussion flow graph; and a summary generation module, configured to construct a conference knowledge graph according to the semantic network, generate a conference layered summary according to the conference knowledge graph, and aggregate the discussion content summary and the conference layered summary to obtain a conference content summary.
- 9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the automatic conference summary generation method according to any one of claims 1 to 7.
- 10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the automatic conference summary generation method according to any one of claims 1 to 7.
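As a non-authoritative illustration of the topological sorting step recited in claim 5, the following minimal Python sketch orders a set of discussion nodes so that every ordering constraint is respected, using the earliest timestamp to break ties. The node names, timestamps and `precedes` constraints are hypothetical, and the claims do not state how constraints are derived or how ties are resolved, so both are assumptions of this sketch.

```python
import heapq
from collections import defaultdict


def temporal_topological_sort(timestamps, precedes):
    """Order nodes so every (a, b) in `precedes` puts a before b,
    breaking ties by earliest timestamp (an assumption of this sketch).

    timestamps: dict mapping node id -> start time in seconds
    precedes:   iterable of (earlier_id, later_id) pairs; ids must be keys
                of `timestamps`
    """
    successors = defaultdict(list)
    indegree = {node: 0 for node in timestamps}
    for earlier, later in precedes:
        successors[earlier].append(later)
        indegree[later] += 1

    # Min-heap keyed by timestamp; holds nodes with no unemitted predecessor.
    ready = [(ts, node) for node, ts in timestamps.items() if indegree[node] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, node = heapq.heappop(ready)
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (timestamps[nxt], nxt))

    if len(order) != len(timestamps):
        raise ValueError("constraints contain a cycle")
    return order


# Hypothetical discussion nodes with start times in seconds.
nodes = {"propose_plan_b": 120.0, "raise_cost_issue": 95.0,
         "decide_plan_b": 300.0, "assign_task": 310.0}
constraints = [("raise_cost_issue", "propose_plan_b"),
               ("propose_plan_b", "decide_plan_b"),
               ("decide_plan_b", "assign_task")]
event_sequence = temporal_topological_sort(nodes, constraints)
```

Here `event_sequence` follows the problem, proposal, decision, task chain. A cycle in the constraints raises `ValueError`, which a real system would need to resolve before building the flow graph.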
Description
Conference summary automatic generation method, device, equipment and medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an automatic conference summary generation method, device, equipment and medium.
Background
Meetings are a core setting in which organizations communicate, make decisions and advance work, and accurately recording and efficiently conveying their content (such as discussion viewpoints, decision results and task assignments) is essential to team collaboration. With the spread of remote work, meeting formats have become increasingly diverse, covering multi-modal information such as video, shared screens (PPT/whiteboard), real-time chat and voice conversation, which places higher demands on the comprehensiveness and integration capability of information processing.
Existing conference processing tools have significant limitations. First, each modality is processed independently (for example, a speech-to-text tool outputs only text, and a screen-sharing tool records only images) and cross-modal association is lacking (for example, "scheme B" mentioned in speech cannot be matched to the PPT page displayed at the same time), which fragments the information. Second, speech-to-text conversion often makes errors on technical terms and accents and lacks speaker identity labels, which hampers attribution of content. Third, conference summaries are mostly simple stacks of text that are not logically structured as problem, decision and task, and cannot provide differentiated information for different roles, so core content is extracted inefficiently. Finally, the key entities in a conference (such as schemes and tasks) and their relations are not organized into a knowledge system, which makes subsequent reuse and traceability difficult to support.
Disclosure of Invention
The invention provides an automatic conference summary generation method, device, computer equipment and medium, to address the low accuracy of existing conference summary methods.
In a first aspect, an automatic conference summary generation method is provided, including: acquiring a conference image, a conference text and conference voice of a target conference, and performing text conversion on the conference voice to obtain a converted text; performing cross-modal semantic alignment on the conference image, the conference text and the converted text sequence to obtain a semantic network; extracting structured information of the semantic network, constructing a conference discussion flow graph based on the converted text and the timestamps of the structured information, and generating a discussion content summary according to the conference discussion flow graph; and constructing a conference knowledge graph according to the semantic network, generating a conference layered summary according to the conference knowledge graph, and aggregating the discussion content summary and the conference layered summary to obtain a conference content summary.
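Claim 3 details the cross-modal alignment step as mapping image, text and speech content to a unified time axis. A minimal Python sketch of one simple way to do this is given below: each utterance is paired with the slide that was on screen when the utterance began. The `Slide` and `Utterance` types, their fields and the sample data are hypothetical, and the pairing rule is an assumption for illustration rather than the alignment procedure of the invention.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Slide:
    shown_at: float   # seconds from meeting start when the slide appeared
    ocr_text: str     # text assumed to come from an upstream OCR step


@dataclass
class Utterance:
    start: float      # seconds from meeting start
    speaker: str
    text: str


def align_to_slides(utterances, slides):
    """Pair each utterance with the slide on screen when it started.

    Returns a list of (utterance, slide_or_None) tuples in time order;
    the slide is None if the utterance began before any slide was shown.
    """
    slides = sorted(slides, key=lambda s: s.shown_at)
    starts = [s.shown_at for s in slides]
    aligned = []
    for utt in sorted(utterances, key=lambda u: u.start):
        # Index of the last slide with shown_at <= utt.start, or -1 if none.
        idx = bisect_right(starts, utt.start) - 1
        aligned.append((utt, slides[idx] if idx >= 0 else None))
    return aligned


# Hypothetical sample data.
slides = [Slide(0.0, "Agenda"), Slide(60.0, "Plan B: cost estimate")]
utterances = [Utterance(75.0, "speaker_1", "I think plan B is cheaper."),
              Utterance(10.0, "speaker_2", "Let's start.")]
pairs = align_to_slides(utterances, slides)
```

With this data, the remark about plan B is paired with the "Plan B: cost estimate" slide, which is the kind of speech-to-slide association the Background section describes as missing from existing tools.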
In a second aspect, an automatic conference summary generation device is provided, including: a data conversion module, configured to acquire a conference image, a conference text and conference voice of a target conference, and to perform text conversion on the conference voice to obtain a converted text; a semantic alignment module, configured to perform cross-modal semantic alignment on the conference image, the conference text and the converted text sequence to obtain a semantic network; a graph construction module, configured to extract structured information of the semantic network, construct a conference discussion flow graph based on the converted text and the timestamps of the structured information, and generate a discussion content summary according to the conference discussion flow graph; and a summary generation module, configured to construct a conference knowledge graph according to the semantic network, generate a conference layered summary according to the conference knowledge graph, and aggregate the discussion content summary and the conference layered summary to obtain a conference content summary.
In a third aspect, a computer device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the steps of the above automatic conference summary generation method.
In a fourth aspect, a computer-readable storage medium is provided, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the above automatic conference summary generation method.
According to the scheme realized by the conference summary automatic generation method, the conference summary automatic generation device, the conference text and the conferen