CN-121986344-A - Integrated multi-modal neural network platform for generating content based on scalable sensor data

CN 121986344 A

Abstract

The application relates to an integrated multi-modal neural network. A computer system obtains sensor data from a plurality of sensor devices during a duration, the plurality of sensor devices including at least two different sensor types and being disposed in a physical environment. One or more signature events are detected in the sensor data and, independently of the sensor types of the sensor devices, one or more information items are generated to characterize the one or more signature events detected in the sensor data. A large behavioral model is applied to process the one or more information items and generate a multimodal output associated with the sensor data. The multimodal output describes the one or more signature events associated with the sensor data in one of a plurality of predefined output modalities. The multimodal output is presented according to the one of the plurality of predefined output modalities.
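The abstract's pipeline (information items in, one predefined output modality out) can be illustrated with a minimal sketch. The `InformationItem` fields, the `apply_large_behavioral_model` function, and the sample events are all hypothetical names for illustration; the real large behavioral model is a neural network, which is stubbed out here with a plain text renderer.

```python
from dataclasses import dataclass

@dataclass
class InformationItem:
    """Sensor-type-independent description of a detected signature event."""
    timestamp: float
    location: str
    description: str

def apply_large_behavioral_model(items, modality):
    """Illustrative stand-in for the large behavioral model: render the
    information items as a multimodal output in one predefined output
    modality (here, a plain text statement)."""
    if modality == "text":
        return "; ".join(
            f"{i.description} at {i.location} (t={i.timestamp}s)" for i in items
        )
    raise NotImplementedError(f"modality {modality!r} not sketched here")

# Hypothetical signature events detected in the sensor data.
items = [InformationItem(12.5, "kitchen", "door opened"),
         InformationItem(14.0, "kitchen", "motion detected")]
output = apply_large_behavioral_model(items, modality="text")
```

Other predefined modalities named in the claims (software code, dashboards, heat maps) would be additional branches of the same dispatch.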

Inventors

  • I. Poupyrev
  • B. Barbello
  • L. Giusti
  • J. Lien
  • N. E. Gillian

Assignees

  • Archetype AI (阿克泰普人工智能公司)

Dates

Publication Date
2026-05-05
Application Date
2024-08-23
Priority Date
2023-08-24

Claims (20)

  1. A method for compressing sensor data, the method comprising: at a computer system having one or more processors and memory: obtaining the sensor data from a plurality of sensor devices disposed in a physical environment during a duration, each sensor device corresponding to a time sequence of respective sensor samples; for each of the plurality of sensor devices, processing the time sequence of respective sensor samples, independently of a sensor type of the respective sensor device, to generate an ordered sequence of respective sensor data features, the ordered sequence defining a respective parameterized representation of the time sequence of respective sensor samples; detecting one or more signature events during the duration based on the respective parameterized representations of the plurality of sensor devices; and generating one or more information items characterizing the one or more signature events detected in the sensor data.
  2. The method of claim 1, wherein processing the time sequence of respective sensor samples further comprises: receiving the time sequence of respective sensor samples at an input of a sensor data encoder model; and generating, by the sensor data encoder model, the ordered sequence of respective sensor data features based at least on the time sequence of respective sensor samples.
  3. The method of claim 2, wherein processing the time sequence of respective sensor samples further comprises: jointly receiving time data and the time sequence of respective sensor samples at the input of the sensor data encoder model, wherein the time data includes one or more of a sequence of timestamps, a length of the duration, and a sampling rate of the time sequence of respective sensor samples.
  4. The method of claim 2 or 3, wherein the sensor data encoder model is applied independently of the sensor type of each sensor device.
  5. The method of any of claims 1-4, wherein the respective parameterized representation includes an Nth-order polynomial representation whose highest power of sampling time equals N, where N is a positive integer, and the ordered sequence of respective sensor data features includes N+1 data features, each data feature corresponding to a different coefficient of the Nth-order polynomial representation.
  6. The method of any of claims 1-5, wherein the sensor data includes a time sequence of sensor data, and obtaining the sensor data further comprises: obtaining a context data stream continuously measured by the plurality of sensor devices, the context data stream including the time sequence of respective sensor samples grouped for each sensor device based on a time window, the time window configured to move along a time axis; and associating each sensor data item of the time sequence of sensor data with a respective timestamp and a subset of respective sensor samples grouped based on the time window.
  7. The method of any of claims 1-6, further comprising: storing the one or more information items associated with the one or more signature events in the memory, the one or more information items including a timestamp and a location of each of the one or more signature events.
  8. The method of any of claims 1-7, wherein a generic event projection model is applied to process the respective parameterized representations of the plurality of sensor devices and to generate the one or more information items characterizing the one or more signature events.
  9. The method of claim 8, wherein each of the respective parameterized representations is associated with a sensor tag indicating a type of a respective sensor device, and the respective parameterized representations and the sensor tags of the plurality of sensor devices are jointly input into the generic event projection model in a predefined data format.
  10. The method of claim 8 or 9, wherein the respective parameterized representations of the plurality of sensor devices are input into the generic event projection model in a predefined order determined based on respective types of the respective sensor devices.
  11. The method of any of claims 1-10, wherein, for each of a subset of the plurality of sensor devices, an individual projection model is applied to process the respective parameterized representation and to generate a subset of the one or more information items.
  12. The method of any of claims 1-11, wherein detecting the one or more signature events further comprises, for a respective time window corresponding to a subset of the sensor data: applying machine learning to process the subset of sensor data within the respective time window and detect the one or more signature events.
  13. The method of any of claims 1-12, wherein the plurality of sensor devices includes one or more of a presence sensor, a proximity sensor, a microphone, a motion sensor, a gyroscope, an accelerometer, a radar, a lidar scanner, a camera, a temperature sensor, a heartbeat sensor, and a respiration sensor.
  14. The method of any of claims 1-13, further comprising: storing the ordered sequence of respective sensor data features or the one or more information items in a database in lieu of the sensor data obtained from the plurality of sensor devices.
  15. The method of claim 14, further comprising, after obtaining the sensor data: processing the sensor data to continuously and iteratively generate one or more sets of intermediate items until the one or more information items are generated.
  16. The method of claim 15, further comprising: processing the sensor data to generate a first set of intermediate items at a first time; storing the first set of intermediate items in the database; processing the first set of intermediate items to continuously generate one or more second sets of intermediate items at one or more consecutive second times subsequent to the first time; continuously storing the one or more second sets of intermediate items in the database and deleting the first set of intermediate items from the database; and processing a most recent set of the one or more second sets of intermediate items to generate the one or more information items at a third time subsequent to the one or more consecutive second times.
  17. The method of any of claims 1-16, further comprising: applying a large behavioral model to process the one or more information items and generate a multimodal output associated with the sensor data, the multimodal output describing the one or more signature events associated with the sensor data in one of a plurality of predefined output modalities, wherein the large behavioral model includes a Large Language Model (LLM).
  18. The method of claim 17, wherein the multimodal output includes one or more of a description, a timestamp, digital information, a statistical summary, a warning message, and a recommended action associated with the one or more signature events, and the plurality of predefined output modalities includes one or more of a text statement, software code, an image or video, an information dashboard having a predefined format, a user interface, and a heat map.
  19. A computer system, comprising: one or more processors; and memory having instructions stored thereon that, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-18.
  20. A non-transitory computer-readable storage medium having instructions stored thereon that, when executed by one or more processors, cause the one or more processors to perform the method of any of claims 1-18.
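Claim 5's parameterized representation can be sketched concretely: fitting an Nth-order polynomial to each sensor's time series compresses it to N+1 coefficients, the same fixed-length feature sequence regardless of sensor type. This is a minimal sketch with NumPy's least-squares fit; the `parameterize` name and the sample traces are illustrative only.

```python
import numpy as np

def parameterize(timestamps, samples, order=3):
    """Fit an Nth-order polynomial to one sensor's time sequence and
    return its N+1 coefficients as the ordered sequence of sensor data
    features. The encoding is independent of the sensor type."""
    return np.polyfit(timestamps, samples, deg=order)

# Hypothetical traces from two different sensor types over the same duration.
t = np.linspace(0.0, 1.0, 50)
temp_features = parameterize(t, 20.0 + 0.5 * t)          # near-linear temperature
accel_features = parameterize(t, np.sin(2 * np.pi * t))  # oscillating accelerometer

# Both compress to the same fixed-length representation: N+1 = 4 coefficients.
assert temp_features.shape == accel_features.shape == (4,)
```

Storing these coefficients instead of the raw samples (claim 14) is what makes the method a compression scheme: 50 samples per window reduce to 4 features.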

Description

Integrated multi-modal neural network platform for generating content based on scalable sensor data

RELATED APPLICATIONS

The present application is a continuation of and claims priority to U.S. Provisional Application No. 63/578,460, entitled "Integrated Multimodal Neural Network Platform for Generating Content Based on Scalable Sensor Data," filed on August 24, 2023, which is incorporated by reference in its entirety.

Technical Field

The present application relates generally to data processing, including but not limited to building an integrated multi-modal neural network platform that applies a large language model to process multi-modal data (e.g., sensor data and content data) and produce user-friendly output that is convenient for users and their client devices to perceive.

Disclosure of Invention

The present disclosure provides an integrated multimodal neural network platform to process sensor data and content data (e.g., text, audio, image, and video data) and produce user-defined outputs (e.g., one or more of narrative messages, program code, and user interfaces). The integrated multimodal neural network platform includes a server system configured to collect sensor data from one or more sensors, generate one or more information items characterizing the sensor data, and apply a neural network (e.g., a deep neural network or Large Language Model (LLM)) to process the one or more information items and generate a neural network (NN) output (e.g., an LLM output). The one or more sensors include one or more of a presence sensor, a proximity sensor, a microphone, a motion sensor, a gyroscope, an accelerometer, a radar, a lidar scanner, a camera, a temperature sensor, a heartbeat sensor, and a respiration sensor. In some embodiments, the one or more sensors include a plurality of sensors distributed at a venue or across different venues.
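The context data stream described in the claims groups each sensor's continuously measured samples into a time window that moves along the time axis. A minimal sketch of that windowing, with an illustrative `SlidingWindowStream` class (not a name from the disclosure):

```python
from collections import deque

class SlidingWindowStream:
    """Group one sensor's continuously measured samples into a time
    window configured to move along the time axis (a sketch of the
    windowing described above; names are illustrative only)."""
    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.buffer = deque()  # (timestamp, sample) pairs in arrival order

    def add(self, timestamp, sample):
        self.buffer.append((timestamp, sample))
        # Drop samples that have slid out of the trailing window.
        while timestamp - self.buffer[0][0] > self.window_seconds:
            self.buffer.popleft()

    def window(self):
        """Subset of respective sensor samples grouped for the current window."""
        return list(self.buffer)

stream = SlidingWindowStream(window_seconds=2.0)
for ts in range(6):                       # samples arriving at t = 0..5 s
    stream.add(float(ts), sample=ts * 10)
# Only samples within the trailing 2 s window remain: t = 3, 4, 5.
```

Each windowed subset, paired with its timestamp, forms one sensor data item of the time sequence that the encoder model then parameterizes.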
The collected sensor data includes one or more of image data, video data, audio data, analog electrical signals, digital electrical signals, and digital data. In the present application, an LLM is used as an example of a deep neural network. In some implementations, the deep neural network includes a large transformer model. In some embodiments, the neural network applied by the integrated multimodal neural network platform to process sensor data and content data and produce user-defined output is also referred to as a Large Behavioral Model (LBM). The LBM is a generic physical Artificial Intelligence (AI) foundation model configured to address physical use cases across various application verticals and sensor data types.

In one aspect of the application, a method for compressing sensor data is implemented at a computer system. The method includes obtaining sensor data from a plurality of sensor devices disposed in a physical environment during a duration, each sensor device corresponding to a time sequence of respective sensor samples; processing, for each of the plurality of sensor devices, the time sequence of respective sensor samples, independently of a sensor type of the respective sensor device, to generate an ordered sequence of respective sensor data features defining a respective parameterized representation of the time sequence of respective sensor samples; detecting one or more signature events during the duration based on the respective parameterized representations of the plurality of sensor devices; and generating one or more information items characterizing the one or more signature events detected in the sensor data.

In another aspect of the application, a method for presenting sensor data is implemented at a computer system.
The method includes obtaining sensor data from a plurality of sensor devices over a duration of time, the plurality of sensor devices including at least two different sensor types and disposed in a physical environment, detecting one or more signature events in the sensor data, generating one or more information items representative of the one or more signature events detected in the sensor data independent of the sensor types of the plurality of sensor devices, applying a large behavioral model to process the one or more information items and generate a multimodal output associated with the sensor data, the multimodal output describing the one or more signature events associated with the sensor data in one of a plurality of predefined output modalities, and presenting the multimodal output in accordance with one of the plurality of predefined output modalities. In yet another aspect of the application, a method for presenting sensor data is implemented at a computer system. The method includes obtaining sensor data from a plurality of sensor devices disposed in a physical environment during a duration, generating one or more information items characterizing one or more signature events detected in the sensor data ov