US-20260126953-A1 - SYSTEM AND METHOD OF ENABLING DIGITAL PHOTO FRAME WITH VOICE INTERACTION AND INTELLIGENCE GENERATION FUNCTIONS

US 20260126953 A1

Abstract

A system of enabling a digital photo frame with voice interaction and intelligence generation functions, and a method thereof, are provided. In the system, a digital photo frame provides an interaction voice to an image edition and modification server, which uses a speech-to-text technology to convert the interaction voice into a text message and provides the text message to an artificial intelligence (AI) platform through an application programming interface (API). The AI platform transmits back a text response containing an execution instruction. The image edition and modification server uses an artificial general intelligence (AGI) to execute the execution instruction, obtain a digital image from the digital photo frame, and edit and/or modify the digital image to form an edited digital image. The digital photo frame displays and plays the edited digital image based on the multimodal information contained in the edited digital image.
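
The data flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the function and class names (`handle_voice`, `EditedImage`, the `fake_*` stand-ins) and the `action:argument` instruction format are assumptions made purely for illustration, and each stage is passed in as a callable so that no particular speech-to-text engine, AI platform, or AGI is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EditedImage:
    """An edited image plus the multimodal information the frame acts on."""
    pixels: bytes
    multimodal: Dict[str, str] = field(default_factory=dict)


def handle_voice(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    ai_platform: Callable[[str], List[str]],
    executor: Callable[[str, bytes], EditedImage],
    fetch_image: Callable[[], bytes],
) -> List[EditedImage]:
    """Run one voice request through the server-side stages, in order."""
    text = speech_to_text(audio)      # interaction voice -> text message
    instructions = ai_platform(text)  # text response with execution instructions
    image = fetch_image()             # digital image obtained from the frame
    # One edited image is produced per execution instruction (a simplification).
    return [executor(instruction, image) for instruction in instructions]


# Hypothetical stand-ins so the sketch runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_platform(text: str) -> List[str]:
    return ["add_caption:" + text]


def fake_executor(instruction: str, image: bytes) -> EditedImage:
    action, _, argument = instruction.partition(":")
    return EditedImage(pixels=image, multimodal={"action": action, "caption": argument})


result = handle_voice(
    b"label this beach photo", fake_stt, fake_platform, fake_executor, lambda: b"\x89PNG"
)
```

Passing the stages as parameters keeps the ordering of the steps visible while leaving every concrete technology open, which matches the level of detail given in the abstract.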

Inventors

Chuan-Cheng Chiu
Hong Fu
Zhuo-Jia Bian

Assignees

SQ Technology (Shanghai) Corporation
INVENTEC CORPORATION

Dates

Publication Date: 20260507
Application Date: 20250115
Priority Date: 20241101

Claims (8)

1. A system of enabling a digital photo frame with voice interaction and intelligence generation functions, comprising: a digital photo frame, configured to obtain an interaction voice through a microphone device which is embedded in the digital photo frame or externally connected to the digital photo frame, provide the interaction voice, obtain at least one edited digital image, and display the at least one edited digital image based on multimodal information contained in the at least one edited digital image, or display the at least one edited digital image based on the multimodal information contained in the at least one edited digital image and play the at least one edited digital image based on the multimodal information through a speaker device which is embedded in the digital photo frame or externally connected to the digital photo frame; an artificial intelligence (AI) platform, configured to obtain a text message through an application programming interface (API), provide the text message to a large language model to generate a text response containing at least one execution instruction, and transmit back the text response through the API; and an image edition and modification server, comprising: a non-transitory computer readable storage medium, configured to store computer readable instructions; and a hardware processor, electrically connected to the non-transitory computer readable storage medium, and configured to execute the computer readable instructions to make the image edition and modification server perform: obtaining the interaction voice from the digital photo frame; using a speech-to-text technology to convert the interaction voice of voice modal into the text message of text modal; providing the text message to the AI platform through the API, and obtaining the text response through the API; using an artificial general intelligence (AGI) to execute the at least one execution instruction contained in the text response to obtain at least one digital image from the digital photo frame and edit or modify the obtained digital image to form at least one edited digital image, or obtain the at least one digital image from the digital photo frame and edit and modify the obtained digital image to form the at least one edited digital image; and providing the at least one edited digital image to the digital photo frame.
2. The system of enabling a digital photo frame with voice interaction and intelligence generation functions according to claim 1, wherein when the at least one digital image is a three-dimensional digital image, the AGI selects a 3D modeling tool which was used to create the at least one digital image, to edit or modify the at least one digital image to form the at least one edited digital image, or edit and modify the at least one digital image to form the at least one edited digital image.
3. The system of enabling a digital photo frame with voice interaction and intelligence generation functions according to claim 1, wherein the obtaining the at least one digital image from the digital photo frame and editing or modifying the at least one digital image to form the at least one edited digital image, or editing and modifying the at least one digital image to form the at least one edited digital image, comprises: performing annotation, classification, selection, display setting, background music adding, or AI edition on the at least one digital image to edit or modify the at least one digital image to form the at least one edited digital image, or edit and modify the at least one digital image to form the at least one edited digital image.
4. The system of enabling a digital photo frame with voice interaction and intelligence generation functions according to claim 1, wherein when the digital photo frame is a 3D digital photo frame, the 3D digital photo frame allows an operator to rotate or move the at least one digital image or the at least one edited digital image through touch control or gesture control.
5. A method of enabling a digital photo frame with voice interaction and intelligence generation functions, comprising: obtaining an interaction voice through a microphone device which is embedded in the digital photo frame or externally connected to the digital photo frame, and providing the interaction voice to an image edition and modification server, by the digital photo frame; using a speech-to-text technology to convert the interaction voice of voice modal into a text message of text modal, by the image edition and modification server; providing the text message to an AI platform through an API, by the image edition and modification server; providing the text message to a large language model to generate a text response containing at least one execution instruction, by the AI platform; transmitting back the text response to the image edition and modification server through the API, by the AI platform; using an AGI to execute the at least one execution instruction contained in the text response, obtaining at least one digital image from the digital photo frame, and editing or modifying the at least one digital image to form at least one edited digital image, or obtaining the at least one digital image from the digital photo frame, and editing and modifying the at least one digital image to form the at least one edited digital image, by the image edition and modification server; providing the at least one edited digital image to the digital photo frame, by the image edition and modification server; and based on multimodal information contained in the at least one edited digital image, displaying the at least one edited digital image, or displaying the at least one edited digital image and playing the at least one edited digital image through a speaker device which is embedded in the digital photo frame or externally connected to the digital photo frame, by the digital photo frame.
6. The method of enabling a digital photo frame with voice interaction and intelligence generation functions according to claim 5, wherein when the at least one digital image is a three-dimensional digital image, the AGI selects a 3D modeling tool which was used to create the at least one digital image, to edit or modify the at least one digital image to form the at least one edited digital image, or edit and modify the at least one digital image to form the at least one edited digital image.
7. The method of enabling a digital photo frame with voice interaction and intelligence generation functions according to claim 5, wherein the obtaining the at least one digital image from the digital photo frame and editing or modifying the at least one digital image to form the at least one edited digital image, or editing and modifying the at least one digital image to form the at least one edited digital image, comprises: performing annotation, classification, selection, display setting, background music adding, or AI edition on the at least one digital image to edit or modify the at least one digital image to form the at least one edited digital image, or edit and modify the at least one digital image to form the at least one edited digital image.
8. The method of enabling a digital photo frame with voice interaction and intelligence generation functions according to claim 5, wherein when the digital photo frame is a 3D digital photo frame, the 3D digital photo frame allows an operator to rotate or move the at least one digital image or the at least one edited digital image through touch control or gesture control.
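
The editing operations listed in claims 3 and 7 (annotation, classification, display setting, background music adding, and so on) can be pictured as a dispatch from an execution instruction to a handler. The Python sketch below is illustrative only and assumes details the publication does not give: the JSON instruction shape (`op` and `args`), the handler names, and the dictionary-based image record are all hypothetical, and the selection and AI edition operations are omitted for brevity.

```python
import copy
import json
from typing import Any, Callable, Dict

ImageRecord = Dict[str, Any]  # hypothetical stand-in for an image and its metadata


def annotate(image: ImageRecord, args: Dict[str, Any]) -> ImageRecord:
    image.setdefault("annotations", []).append(args["text"])
    return image


def classify(image: ImageRecord, args: Dict[str, Any]) -> ImageRecord:
    image["category"] = args["category"]
    return image


def set_display(image: ImageRecord, args: Dict[str, Any]) -> ImageRecord:
    image["display"] = {"interval_s": args["interval_s"]}
    return image


def add_background_music(image: ImageRecord, args: Dict[str, Any]) -> ImageRecord:
    # Audio attached here is the kind of multimodal information a frame
    # could later play through its speaker device.
    image["audio"] = args["track"]
    return image


HANDLERS: Dict[str, Callable[[ImageRecord, Dict[str, Any]], ImageRecord]] = {
    "annotation": annotate,
    "classification": classify,
    "display_setting": set_display,
    "background_music": add_background_music,
}


def execute(instruction_json: str, image: ImageRecord) -> ImageRecord:
    """Apply one execution instruction and return a new, edited record."""
    instruction = json.loads(instruction_json)
    op = instruction["op"]
    if op not in HANDLERS:
        raise ValueError(f"unsupported operation: {op}")
    # Work on a deep copy so the image obtained from the frame stays unchanged.
    return HANDLERS[op](copy.deepcopy(image), instruction.get("args", {}))


original = {"id": "img-1"}
edited = execute('{"op": "background_music", "args": {"track": "waves.mp3"}}', original)
```

Returning a copy rather than editing in place mirrors the claims' distinction between the digital image obtained from the frame and the edited digital image provided back to it.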

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an interaction and generation system and a method thereof, and particularly to a system of enabling a digital photo frame with voice interaction and intelligence generation functions, which provides voice interaction and AI-based intelligence to generate an edited or modified digital image, and a method thereof.

2. Description of the Related Art

A digital photo frame is an electronic device designed specifically to display digital images. In addition to displaying a single digital image, the digital photo frame allows some or all digital images to be selected and displays the selected digital images in a loop or in random order at a set time interval. Therefore, the digital photo frame necessarily provides an operational interface.

With technological advancements, the integration of various industries with artificial intelligence (AI) has become a main direction of current industry development. If AI is integrated with the digital photo frame, then in addition to its general configuration features, the digital photo frame can further offer diverse intelligent functions such as annotation, classification, selection, display settings, and AI edition and modification in the backend. The digital photo frame can therefore clearly be improved with AI.

In view of the above, what is needed is an improved solution to the problem that existing digital photo frames only provide overly monotonous and simple operations.

SUMMARY OF THE INVENTION

An objective of the present invention is to disclose a system of enabling a digital photo frame with voice interaction and intelligence generation functions, and a method thereof, to solve the problem that existing digital photo frames only provide overly monotonous and simple operations.
To achieve the objective, the present invention discloses a system of enabling a digital photo frame with voice interaction and intelligence generation functions, and the system includes a digital photo frame, an AI platform, and an image edition and modification server. The image edition and modification server includes a non-transitory computer readable storage medium and a hardware processor. The digital photo frame is configured to obtain an interaction voice through a microphone device which is embedded in the digital photo frame or externally connected to the digital photo frame, provide the interaction voice, obtain at least one edited digital image, and display the at least one edited digital image based on multimodal information contained in the at least one edited digital image, or display the at least one edited digital image based on the multimodal information contained in the at least one edited digital image and play the at least one edited digital image based on the multimodal information through a speaker device which is embedded in the digital photo frame or externally connected to the digital photo frame. The AI platform is configured to obtain a text message through an application programming interface (API), provide the text message to a large language model to generate a text response containing at least one execution instruction, and transmit back the text response through the API. The non-transitory computer readable storage medium is configured to store computer readable instructions. 
The hardware processor is electrically connected to the non-transitory computer readable storage medium, and configured to execute the computer readable instructions to make the image edition and modification server perform: obtaining the interaction voice from the digital photo frame; using a speech-to-text technology to convert the interaction voice of voice modal into the text message of text modal; providing the text message to the AI platform through the API, and obtaining the text response through the API; using an artificial general intelligence (AGI) to execute the at least one execution instruction contained in the text response to obtain at least one digital image from the digital photo frame and edit or modify the obtained digital image to form at least one edited digital image, or obtain the at least one digital image from the digital photo frame and edit and modify the obtained digital image to form the at least one edited digital image; and providing the at least one edited digital image to the digital photo frame.
To achieve the objective, the present invention discloses a method of enabling a digital photo frame with voice interaction and intelligence generation functions, and the method comprises steps of: obtaining an interaction voice through a microphone device which is embedded in the digital photo frame or externally connected to the digital photo frame, and providing the interaction voice to an image edition and modification server, by a digital photo frame; using a speech-to-text technology to convert the interaction voice of voice modal into a text message of text modal, by the image edition a