EP-4434004-B1 - MEETING-VIDEO MANAGEMENT ENGINE FOR A MEETING-VIDEO MANAGEMENT SYSTEM

Inventors

TIWARI, NIDHI
HAN, CHAO
GURDASANI, KRISH

Dates

Publication Date: 20260506
Application Date: 20220901

Claims (15)

A computerized system, comprising: at least one computer processor; and computer memory storing computer-useable instructions that, when used by the at least one computer processor, cause the at least one computer processor to perform operations comprising: accessing, at a meeting-video management engine (110), meeting-video content (132) that corresponds to a meeting associated with a user; using a clip-generator machine learning model (144), generating a first clip (344) and a second clip (344) that define video data (134) corresponding to the meeting-video content (132), such that a clip (344) includes a portion of a video that focuses on a speaker or a portion of a video corresponding to an audio slot which corresponds to audio from a speaker; accessing the video data (134), meeting data (136) of the meeting, and user data (138) of the user, wherein the video data (134), the meeting data (136), and the user data (138) are associated with a meeting-video tailoring machine learning model (154) that is trained based on meeting-video tailoring features (152) which correspond to the video data (134), the meeting data (136), and the user data (138), and the meeting-video tailoring machine learning model (154) is configured to generate a plurality of tailored meeting-video segments (242) of the meeting-video content (132) which are tailored according to the video data (134), the meeting data (136), and the user data (138); based on the meeting-video content (132), the video data (134), the meeting data (136), and the user data (138), generating, via the meeting-video tailoring machine learning model (154), a first tailored meeting-video segment (242) and a second tailored meeting-video segment (242) which are tailored according to the video data (134), the meeting data (136), and the user data (138); ranking the first tailored meeting-video segment (242) and the second tailored meeting-video segment (242); and communicating the first tailored meeting-video segment (242) and the second tailored meeting-video segment (242).
The computerized system of claim 1, wherein the first tailored meeting-video segment (242) and the second tailored meeting-video segment (242) are ranked based on an analytical hierarchy process in which weighted scores and weighted costs are computed for the meeting-video tailoring features (152).
The computerized system of claim 1, wherein communicating the first tailored meeting-video segment (242) and the second tailored meeting-video segment (242) causes presentation of the first tailored meeting-video segment (242) and the second tailored meeting-video segment (242) via a client device associated with the user.
The computerized system of claim 3, wherein communicating the first tailored meeting-video segment (242) and the second tailored meeting-video segment (242) comprises transmitting the first tailored meeting-video segment (242) and the second tailored meeting-video segment (242) for presentation in order based on the ranking.
The computerized system of claim 1, the meeting-video tailoring features (152) include video data features, meeting data feature, user data feature, wherein the meeting-video tailoring features (152) represent machine learning metrics relating meeting content, video content, and a user.
The computerized system of claim 1, wherein the meeting-video tailoring features (152) comprise: video data features indicative of: audio features comprising an emotion score, a sound score, a pitch variation score, a silence score, or any combination thereof; speech features comprising question type score, talk section score, repeated content score, or any combination thereof; and video features comprising an identity of a speaker, an emotion of the speaker, a pitch variation of audio, or any combination thereof; user data features indicative of an identity of the user, user preferences, user feedback, a time zone of the user, a role associated with the user, or any combination thereof; and meeting data features indicative of a date of the meeting, the speaker, a planned time duration, and actual time duration, a sponsor, or any combination thereof.
The computerized system of claim 1, wherein the clip-generator machine learning model (144) is trained based on meeting data features or video data features corresponding to the video data (134) and is configured to generate the first clip (344) and the second clip (344), wherein the first tailored meeting-video segment (242) includes the first clip (344), the second clip (344), or both, wherein the meeting data features and the video data features correspond to clip-generator machine learning features.
The computerized system of claim 1, further comprising a data structure storing the meeting-video tailoring features (152) used to train the meeting-video tailoring machine learning model (154), wherein the meeting-video tailoring features (152) include a video data feature, a meeting data feature, and a user data feature each organized in a database as respective records, wherein the video data feature, the meeting data feature, and the user data feature include database entries corresponding to the video data (134), the meeting data (136), and the user data (138), respectively.
A computer-implemented method, comprising: accessing, at a meeting-video management engine (110), meeting-video content (132) associated with a meeting associated with a user; accessing video data (134) of the meeting-video content (132), meeting data (136) of the meeting, and user data (138) of the user, wherein the video data (134), the meeting data (136), and the user data (138) are associated with a meeting-video tailoring machine learning model (154) that is trained based on meeting-video tailoring features (152) which correspond to the video data (134), the meeting data (136), and the user data (138), and the meeting-video tailoring machine learning model (154) is configured to generate a plurality of tailored meeting-video segments (242) of the meeting-video content (132) which are tailored according to the video data (134), the meeting data (136), and the user data (138); based on the meeting-video content (132), the video data (134), the meeting data (136), and the user data (138), generating, via the meeting-video tailoring machine learning model (154), a first tailored meeting-video segment (242) of the plurality of tailored meeting-video segments (242) which is tailored according to the video data (134), the meeting data (136), and the user data (138); and communicating the first tailored meeting-video segment (242) to cause presentation of the first tailored meeting-video segment (242).
The computer-implemented method of claim 9, further comprising: based on the meeting-video content (132), the video data (134), the meeting data (136), and the user data (138), causing the meeting-video tailoring machine learning model (154) to generate a second tailored meeting-video segment (242) which is tailored according to the video data (134), the meeting data (136), and the user data (138); ranking the first tailored meeting-video segment (242) relative to the second tailored meeting-video segment (242); and communicating the second tailored meeting-video segment (242) to cause the presentation of the second tailored meeting-video segment (242) via a client device associated with the user.
The computer-implemented method of claim 10, wherein communicating the first tailored meeting-video segment (242) and the second tailored meeting-video segment (242) comprises transmitting the first tailored meeting-video segment (242) and the second tailored meeting-video segment (242) for the presentation in order based on the ranking, wherein the first tailored meeting-video segment (242) and the second tailored meeting-video segment (242) are ranked based on an analytical hierarchy process in which weighted scores and weighted costs are computed for the meeting-video tailoring features (152).
The computer-implemented method of claim 9, wherein the video data (134) is generated (i) using a clip-generator machine learning model (144) to generate a first clip (344) and a second clip (344) that define the video data (134), wherein a clip (344) includes a portion of a video that focuses on a speaker or a portion of a video corresponding to an audio slot which corresponds to audio from a speaker; and (ii) deriving the video data (134) from the first clip (344) and the second clip (344).
The computer-implemented method of claim 9, wherein the video features comprise: video data features indicative of: audio features comprising an emotion score, a sound score, a pitch variation score; speech features comprising question type score, talk section score, repeated content score, or any combination thereof; and video features comprising an identity of a speaker, an emotion of the speaker, a pitch variation of audio, people in a scene; user data features indicative of an identity of the user, user preferences, user feedback, a time zone of the user, a role associated with the user, or any combination thereof; and meeting data features indicative of a date of meeting, a speaker, a planned time duration, and actual time duration, sponsors, or any combination thereof.
The computer-implemented method of claim 9, wherein the meeting-video tailoring features (152) include video data features, meeting data feature, user data feature, wherein the meeting-video tailoring features (152) represent machine learning metrics relating meeting content, video content, and a user.
One or more computer-storage media having computer-executable instructions embodied thereon that, when executed by a computing system having a processor and memory, cause the processor to: communicate, from a client device associated with a user, a request for meeting-video content (132) corresponding to a conference associated with a first meeting and a second meeting; based on communicating the request, receive a plurality of tailored meeting-video segments (242) from a meeting-video management engine (110), wherein the plurality of tailored meeting-video segments (242) are generated based on (i) video data (134), (ii) meeting data (136) associated with the first meeting and the second meeting, and (iii) user data (138) of the user, wherein the video data (134), the meeting data (136), and the user data (138) are associated with meeting-video tailoring features (152) of a meeting-video tailoring machine learning model (154) that is trained to generate the plurality of tailored meeting-video segments (242) of the meeting-video content (132) based on meeting-video tailoring features (152), the meeting-video tailoring features which correspond to the video data (134), the meeting data (136), and the user data (138) such that the tailored meeting-video segments (242) are tailored according to the video data (134), the meeting data (136), and the user data (138); and causing presentation, on the client device, of a meeting-video graphical user interface element of a meeting-video graphical user interface that controls playback of the plurality of tailored meeting-video segments (242) of the meeting-video content (132).
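
Illustrative note on the ranking recited in claims 2 and 11: the claims refer to an analytical hierarchy process in which weighted scores and weighted costs are computed for the meeting-video tailoring features, but they do not specify how the weights are derived or how scores and costs are combined. The following is a minimal sketch under stated assumptions, not the patented implementation. It assumes the weights come from a pairwise comparison matrix using the row geometric mean approximation, and that segments are ordered by weighted score minus weighted cost. The feature choice, the matrix values, and the names `ahp_weights` and `rank_segments` are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def rank_segments(segments, score_weights, cost_weights):
    """Order segments by weighted score minus weighted cost, highest first.

    Combining scores and costs by subtraction is an assumption of this sketch.
    """
    def net_value(segment):
        score = sum(w * s for w, s in zip(score_weights, segment["scores"]))
        cost = sum(w * c for w, c in zip(cost_weights, segment["costs"]))
        return score - cost

    return sorted(segments, key=net_value, reverse=True)


# Hypothetical pairwise comparison of three features:
# emotion score, question type score, speaker relevance.
# Entry [i][j] states how much more important feature i is than feature j.
pairwise = [
    [1.0, 3.0, 0.5],
    [1 / 3, 1.0, 0.25],
    [2.0, 4.0, 1.0],
]
score_weights = ahp_weights(pairwise)

# Single hypothetical cost feature: normalized segment duration.
cost_weights = [1.0]

segments = [
    {"id": "segment-a", "scores": [0.9, 0.2, 0.4], "costs": [0.3]},
    {"id": "segment-b", "scores": [0.3, 0.8, 0.9], "costs": [0.2]},
]
ranked = rank_segments(segments, score_weights, cost_weights)
print([segment["id"] for segment in ranked])
```

In this sketch, speaker relevance receives the largest weight, so the segment that scores higher on that feature is presented first. A full analytical hierarchy process would typically also check the consistency of the pairwise matrix, which is omitted here.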

Description

BACKGROUND

Users rely on applications and services to facilitate access to different types of video content. Distributed computing systems (e.g., cloud computing platforms) host video management systems that support networked access to video content. A meeting-video management system can be part of a video management system in a distributed computing system that provides different types of productivity tools, from word processing to task management. The meeting-video management system can operate as part of the video management system to provide live and on-demand meeting-videos in association with the different types of productivity tools. In particular, the meeting-video management system performs computing tasks to facilitate meetings. For example, meeting-video management systems support meeting-video calls and supporting meeting operations, including secured user access, meeting hosting, recording, and distribution of meeting content. Conventionally, meeting-video management systems are not configured with a computing infrastructure or logic to deliver uniquely tailored meeting-video segments. In particular, conventional meeting-video management systems present meeting-video content as full recordings that include irrelevant, superfluous video content. Full recordings increase computing resource burden in that users perform additional video review and playback operations when trying to identify relevant video content. As such, a more comprehensive meeting-video management system - with an alternative basis for performing meeting-video management operations - can improve computing operations and interfaces in meeting-video management systems.
"NBA Basketball Video Summarization for News Report via Hierarchical-Grained Deep Reinforcement Learning", Naye Ji et al, Image and Graphics: 11th International Conference, ICIG 2021, Haikou, China, August 6-8, 2021, Proceedings, Part III, Aug 2021, pages 712-728, discloses a hierarchical-grained deep reinforcement learning framework to generate a short basketball video. For a long basketball game video, a hierarchical-grained subshot segmentation algorithm is proposed, which takes into account both semantics and objective factors, and preserves spatiotemporal consistency.

SUMMARY

The invention is defined by the set of appended claims. Various aspects of the technology described herein are generally directed to systems, methods, and computer storage media for, among other things, providing a tailored meeting-video segment associated with a meeting-video management engine of a meeting-video management system. The tailored meeting-video segment - also known as any of the following: a meeting highlight, highlight segment, subset of the meeting-video content, or tailored meeting highlight - corresponds to a portion of meeting-video content that is programmatically generated based on features associated with video data, meeting data, and user data. First, a plurality of clips of the meeting-video content - associated with a meeting and a user - are generated using a clip-generator machine learning model of the meeting-video management engine. Then, the tailored meeting-video segment - or a plurality of tailored meeting-video segments - can be generated by employing a meeting-video tailoring machine learning model of the meeting-video management engine. The features - associated with (1) video data comprising the plurality of clips, (2) meeting data of the meeting, and (3) user data of the user - are meeting-video tailoring features used by the meeting-video tailoring machine learning model to generate the tailored meeting-video segment.
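
The two-stage flow described above (clips first, then tailored segments built from video data, meeting data, and user data) can be pictured with the following minimal sketch. It is an illustration under stated assumptions only: both machine learning models are replaced by simple rule-based stand-ins, and every class, field, and function name (`Clip`, `MeetingData`, `UserData`, `generate_clips`, `tailor_segments`) is hypothetical rather than taken from the described system.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Video data unit: a portion of the recording attributed to one speaker."""
    speaker: str
    start_s: float
    end_s: float


@dataclass
class MeetingData:
    """Hypothetical subset of meeting data."""
    date: str
    speakers: list


@dataclass
class UserData:
    """Hypothetical subset of user data."""
    user_id: str
    preferred_speakers: set


def generate_clips(speaker_turns):
    """Stand-in for the clip-generator model: one clip per speaker turn."""
    return [Clip(speaker, start, end) for speaker, start, end in speaker_turns]


def tailor_segments(clips, meeting, user):
    """Stand-in for the tailoring model.

    Keeps clips whose speaker appears in the meeting data and matches the
    user's preferences. A trained model would score many more features.
    """
    return [
        clip
        for clip in clips
        if clip.speaker in meeting.speakers
        and clip.speaker in user.preferred_speakers
    ]


# Hypothetical inputs: (speaker, start seconds, end seconds) per turn.
turns = [("alice", 0.0, 42.5), ("bob", 42.5, 90.0), ("alice", 90.0, 130.0)]
meeting = MeetingData(date="2024-01-15", speakers=["alice", "bob"])
user = UserData(user_id="user-1", preferred_speakers={"alice"})

clips = generate_clips(turns)
segments = tailor_segments(clips, meeting, user)
print([(clip.start_s, clip.end_s) for clip in segments])
```

The sketch keeps the three inputs separate so that each stage depends only on the data it is described as using: the clip stage on the recording, and the tailoring stage on clips, meeting data, and user data together.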
The tailored meeting-video segment is communicated to a user to enable uniquely tailored playback of content computed to be relevant to the user. Conventionally, meeting-video management systems are not configured with a computing infrastructure or logic to deliver uniquely tailored meeting-video segments. A technical solution - to the limitations of conventional meeting-video management system operations - provides tailored meeting-video segments via a meeting-video management engine of a meeting-video management system. In operation, the meeting-video management engine accesses meeting-video content associated with a meeting associated with a user. For example, the meeting-video content may include video data, meeting data, and user data. The video data may be derived from a first clip and a second clip generated via the clip-generator machine learning model. The video data, the meeting data, and the user data are associated with meeting-video tailoring features of a meeting-video tailoring machine learning model that is trained to generate tailored meeting-video segments. Based on the video data, the meeting data, and the user data, the meeting-video management engine generates a first tailored meeting-video segment and a second tailored meeting-video segment that are ranked with respect to one another. The meeting-video management engin