US-12621411-B2 - Methods and apparatus for displaying, compressing and/or indexing information relating to a meeting
Abstract
A method of visualising a meeting between one or more participants on a display includes, in an electronic processing device, the steps of: determining a plurality of signals, each of the plurality of signals being at least partially indicative of the meeting; generating a plurality of features using the plurality of signals, the features being at least partially indicative of the signals; generating at least one of: at least one phase indicator associated with the plurality of features, the at least one phase indicator being indicative of a temporal segmentation of at least part of the meeting; and at least one event indicator associated with the plurality of features, the at least one event indicator being indicative of an event during the meeting. The method also includes the step of causing a representation indicative of the at least one phase indicator and/or the at least one event indicator to be displayed on the display to thereby provide visualisation of the meeting.
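The abstract describes a pipeline of signals → features → phase/event indicators. The following is a minimal illustrative sketch of that flow, not the patented implementation: the windowed-mean features, the change-detection phase segmentation, and the threshold-based event detection are all my own simplifying assumptions.

```python
# Toy sketch of the abstract's pipeline (assumed, illustrative logic only):
# signals -> features -> phase indicators and event indicators.

def extract_features(signals):
    """Each signal is a list of samples; a feature here is the per-window mean."""
    window = 5
    features = []
    for t in range(0, len(signals[0]) - window + 1, window):
        vec = [sum(s[t:t + window]) / window for s in signals]
        features.append(vec)
    return features

def phase_indicators(features, threshold=0.5):
    """Start a new phase whenever the feature vector shifts by more than threshold."""
    phases = [0]
    for prev, cur in zip(features, features[1:]):
        shift = sum(abs(a - b) for a, b in zip(prev, cur))
        phases.append(phases[-1] + 1 if shift > threshold else phases[-1])
    return phases

def event_indicators(features, threshold=1.5):
    """Flag windows whose feature magnitude spikes (a crude 'moment' detector)."""
    return [i for i, vec in enumerate(features) if max(vec) > threshold]
```

For example, with two signals (say, audio energy and motion) of twenty samples each, the phase indicators segment the timeline wherever the fused features shift, and the event indicators flag the windows containing spikes; a display layer would then render both along a meeting timeline.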
Inventors
- Christopher RAETHKE
- Saxon FLETCHER
- Jaco DU PLESSIS
- Andrew CUPPER
- Iain McCowan
Assignees
- PINCH LABS PTY LTD.
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-02-28
- Priority Date: 2019-04-12
Claims (20)
- 1 . A method for summarizing a conversation, the method comprising, in an electronic processing device, the steps of: determining phases of the conversation from data representative of the conversation; generating textual phase indicators that describe the phases of the conversation; detecting moments within the conversation from the data representative of the conversation; generating textual moment indicators that describe the moments within the conversation; generating text summaries from audio data associated with the phases and/or the detected moments; and providing a textual representation of the conversation, which includes: the phases and moments of the conversation described by the textual phase indicators and the textual moment indicators; and the text summaries of the associated phases and/or detected moments.
- 2 . The method according to claim 1 , wherein the step of generating the textual moment indicators includes the step of classifying the moments by training a supervised classifier on a labelled dataset using a neural network.
- 3 . The method as claimed in claim 1 , wherein the method includes, in an electronic processing device, using multi-modal signals that are indicative of the conversation, the data representative of the conversation being included in the multi-modal signals.
- 4 . The method according to claim 3 , wherein the step of determining phases of the conversation includes the steps of: generating a plurality of feature vectors wherein each feature vector is generated using a respective one of the multi-modal signals; generating a plurality of multi-modal feature vectors using the feature vectors; and classifying the multi-modal feature vectors.
- 5 . The method according to claim 3 , wherein the step of detecting moments within the conversation includes the steps of: generating a plurality of feature vectors wherein each feature vector is generated using a respective one of the multi-modal signals; generating a plurality of multi-modal feature vectors using the feature vectors; and classifying the multi-modal feature vectors.
- 6 . The method according to claim 1 , wherein the method includes, in an electronic processing device, the steps of: determining at least one moment type associated with each moment; and generating textual indicators that describe the moment types, wherein the step of providing the representation of the phases and moments of the conversation includes the step of providing a representation of the moment types with the textual indicators.
- 7 . The method according to claim 6 , wherein the step of determining at least one moment type includes the step of determining any one or more of: a. an action; b. a mood; c. a sentiment; d. a highlight; e. a question; f. a recapitulation; g. a milestone; and h. an event type determined in accordance with user input.
- 8 . The method according to claim 1 , wherein the method includes, in an electronic processing device, the steps of: generating graphical phase indicators that represent the phases of the conversation; and generating graphical moment indicators that represent the moments within the conversation.
- 9 . The method according to claim 1 , wherein the step of determining phases of the conversation includes the step of generating temporal indicators, and the step of providing a representation of the phases includes the step of providing a graphical representation of the phases using the temporal indicators.
- 10 . The method according to claim 1 , wherein the step of determining phases of the conversation includes the step of using the detected moments to determine the phases of the conversation.
- 11 . The method according to claim 1 , wherein the step of detecting moments within the conversation includes the step of anomaly detection.
- 12 . The method according to claim 1 , wherein the method includes, in the electronic processing device, determining at least one of: a. at least one phase parameter; and b. at least one event parameter, wherein the phase and moment indicators are in accordance with at least one of the phase and event parameters.
- 13 . The method according to claim 1 , wherein the step of generating textual phase indicators that describe the phases of the conversation includes the step of incorporating summary keywords relating to the phases.
- 14 . The method according to claim 1 , wherein the step of generating textual moment indicators that describe the moments within the conversation includes the step of incorporating summary keywords relating to the moments.
- 15 . The method according to claim 1 , wherein the steps of determining phases of the conversation and detecting moments within the conversation include the step of extracting cues from a variety of sources.
- 16 . The method according to claim 1 , wherein the steps of determining phases of the conversation and detecting moments within the conversation include the step of extracting cues from information relating to a context of the conversation and generating the textual phase and moment indicators from those cues.
- 17 . The method according to claim 16 , wherein the information relating to the context of the conversation includes information that is of significance to a user, including, but not limited to industries, companies, professions, geographic location, and meeting types.
- 18 . The method according to claim 1 , wherein the method includes, in the electronic processing device, the steps of: receiving an input selection associated with at least one of the phase indicator and the moment indicator; selectively updating the representation to include an indication of one or more features, respectively, associated with the selected one of the phase indicator and/or the moment indicator; and causing the updated representation to be displayed.
- 19 . The method according to claim 1 , wherein the representation includes at least one temporal indicator indicative of a time range of the conversation.
- 20 . An apparatus for summarizing a conversation, the apparatus including an electronic processing device that is configured for carrying out the steps of: determining phases of the conversation from data representative of the conversation; generating textual phase indicators that describe the phases of the conversation; detecting moments within the conversation from the data representative of the conversation; generating textual moment indicators that describe the moments within the conversation, respectively; generating text summaries from audio data associated with the phases and/or detected moments, respectively; and providing a textual representation of the conversation, which includes: the phases and moments of the conversation described by the textual phase indicators and the textual moment indicators; and the text summaries of the associated phases and/or detected moments.
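Claims 4 and 5 describe fusing per-modality feature vectors into multi-modal feature vectors and then classifying them. A minimal sketch of that fusion-and-classification step follows; the concatenation fusion and the nearest-centroid classifier are illustrative assumptions on my part, not techniques the claims specify.

```python
# Illustrative sketch of claims 4/5 (assumed fusion and classifier, not the
# patent's): fuse one feature vector per modality, then classify the result.

def fuse(*modality_vectors):
    """Concatenate one feature vector per modality into a multi-modal vector."""
    fused = []
    for vec in modality_vectors:
        fused.extend(vec)
    return fused

def classify(vector, centroids):
    """Assign the label of the nearest class centroid (squared Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(vector, centroids[label]))
```

In use, an audio feature vector and a video feature vector would be fused into a single multi-modal vector, which the classifier then maps to a phase label (for claim 4) or a moment label (for claim 5); any trainable classifier, such as the supervised neural-network classifier of claim 2, could replace the centroid rule.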
Description
CROSS REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. application Ser. No. 17/602,821 filed Oct. 11, 2021, which is a 35 U.S.C. § 371 national stage filing of International Application No. PCT/AU2020/000029 filed Apr. 9, 2020, which claims benefit of Australian Patent Application No. 2019 901276 filed Apr. 12, 2019, the entireties of which are incorporated herein by reference.
FIELD OF THE INVENTION
The present invention relates to an apparatus and/or methods of displaying, compressing and/or indexing information, including information relating to meetings, and, more particularly, to a plurality of signals indicative of a meeting between one or more participants.
BACKGROUND
Meetings among groups of participants are common in a business and industry context. Typically, in such settings one or more of the participants may be tasked with minuting events which occur during the meeting, including action points, discussion outcomes, and the like. However, such a system suffers from a number of drawbacks, as meeting minutes can be insufficient, biased, or can fail to be recorded or saved. Moreover, minutes can fail to be distributed, or can be difficult to access if in the possession of one or two participants. While it is possible to maintain audio recordings and/or transcriptions of meetings, it can be difficult to discern a summary of events or action points from such recordings.
U.S. Pat. No. 7,298,930, incorporated herein by reference, describes meeting recordal that captures multimodal information of a meeting. Subsequent analysis of the information produces scores indicative of visually and aurally significant events that can help identify significant segments of the meeting recording. Textual analysis can enhance searching for significant meeting segments and otherwise enhance the presentation of the meeting segments.
McCowan, I et al. 2005, ‘Automatic Analysis of Multimodal Group Actions in Meetings’, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 3, pp. 305-317, incorporated herein by reference, investigates the recognition of group actions in meetings. A framework is employed in which group actions result from the interactions of the individual participants. The group actions are modelled using different HMM-based approaches, where the observations are provided by a set of audiovisual features monitoring the actions of individuals. Experiments demonstrate the importance of taking interactions into account in modelling the group actions. It is also shown that the visual modality contains useful information, even for predominantly audio-based events, motivating a multimodal approach to meeting analysis.
Wellner et al. 2005, ‘A Meeting Browser Evaluation Test’, CHI Extended Abstracts, incorporated herein by reference, introduces a browser evaluation test (BET) and describes a trial application of the test. The BET is a method for assessing meeting browser performance using, as the evaluation metric, the number of observations of interest found in the minimum amount of time, where observations of interest are statements about a meeting collected by independent observers. The resulting speed and accuracy scores aim to be objective, comparable and repeatable.
Erol, B et al. 2003, ‘Multimodal summarisation of meeting recordings’, 2003 International Conference on Multimedia and Expo (ICME '03), vol. 3, proposes a new method for creating meeting video skims based on audio and visual activity analysis together with text analysis. Audio activity analysis is performed by analysing sound directions, which indicate different speakers, and audio amplitude. Detection of important visual events in a meeting is achieved by analysing the localised luminance variations in consideration of the omni-directional property of the video captured by a meeting recording system. Text analysis is based on a term frequency-inverse document frequency measure. The resulting video skims better capture the important meeting content compared to skims obtained by uniform sampling.
The inclusion of any information provided in this background or in the documents referenced in this background is not to be regarded as an admission that such information should be regarded as prior art information that is relevant to the appended claims.
SUMMARY
In a first broad form, the present invention seeks to provide a method of visualising a meeting between one or more participants on a display, the method including, in an electronic processing device, the steps of: determining a plurality of signals, each of the plurality of signals being at least partially indicative of the meeting; generating a plurality of features using the plurality of signals, the features being at least partially indicative of the signals; generating at least one of: at least one phase indicator associated with, or in accordance with, the plurality of features, the at least one phase indicator being indicative of