KR-20260062914-A - Personalized Singing Synthesis System Based on Personal Voice Assetization Through AI Training and Content Generation Method Including Copyright Protection
Abstract
The present invention relates to an AI-based, user-customized singing synthesis system and method that learns a user's ordinary (non-singing) voice data to generate a digital voice asset, synthesizes music content into natural-sounding singing based on that asset, and provides a function for protecting ownership rights in the generated content.
Inventors
- 이광수
Assignees
- 이광수
Dates
- Publication Date
- 2026-05-07
- Application Date
- 2026-04-19
Claims (5)
- A user-customized singing synthesis system comprising: a voice assetization unit that receives non-singing voice data from a user and generates a digital voice asset; a singing conversion unit that converts the digital voice asset into a singing-capable form; an audio separation unit that secures accompaniment data from a target sound source; a synthesis unit that generates singing data using the digital voice asset; a synchronization unit that aligns the generated vocal data with the accompaniment data to produce final content; and a rights protection unit that inserts identifying information into the generated content.
- A system according to claim 1, characterized in that the step of synthesizing the singing data includes analyzing the text or melodic mood of the target music to inject emotional parameters and layering the micro-style of the original song.
- A system according to claim 1, wherein the synchronization unit comprises an adaptive phase alignment algorithm that corrects the time difference between the vocals and the accompaniment using cross-correlation analysis.
- A system according to claim 1, wherein the rights protection unit inserts an inaudible watermark into the generated content.
- A system according to claim 1, characterized in that the generated content is linked with an avatar within a metaverse platform.
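The adaptive phase alignment of claim 3 can be illustrated with a minimal cross-correlation lag search. The sketch below is a pure-Python illustration under assumed conditions (integer-sample lags, hypothetical helper names), not the patented implementation: it finds the lag that maximizes correlation between the vocal and accompaniment signals, then shifts the vocal accordingly.

```python
def best_lag(vocal, accomp, max_lag):
    """Return the lag (in samples) of `vocal` relative to `accomp`
    that maximizes their cross-correlation, searched over
    [-max_lag, +max_lag]."""
    def corr_at(lag):
        # Correlate accomp[i] with vocal[i + lag] over the overlap.
        total = 0.0
        for i in range(len(accomp)):
            j = i + lag
            if 0 <= j < len(vocal):
                total += accomp[i] * vocal[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=corr_at)

def align(vocal, accomp, max_lag=100):
    """Shift `vocal` by the best lag so it lines up with `accomp`,
    zero-padding the vacated samples."""
    lag = best_lag(vocal, accomp, max_lag)
    if lag > 0:  # vocal lags behind: drop its head, pad its tail
        return vocal[lag:] + [0.0] * lag
    return [0.0] * (-lag) + vocal[:len(vocal) + lag]  # vocal is early
```

A production system would work on sub-sample phase differences (the description's 1 ms threshold) using interpolated or frequency-domain cross-correlation rather than this integer-lag search.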
Description
Personalized Singing Synthesis System Based on Personal Voice Assetization Through AI Training and Content Generation Method Including Copyright Protection

The present invention relates to the field of artificial intelligence-based voice processing technology (voice conversion and singing synthesis), and more specifically, to a user-customized singing synthesis system and method that learns a user's general voice data to generate a unique digital voice asset, synthesizes music content into a singing form using this asset, and provides a function for protecting rights in the generated content. In particular, the invention aims to provide an integrated system that generates natural singing content by utilizing not only the user's unique timbre but also archived voice data of the deceased, and that manages this content securely, opening a new field of AI-based music content.

With the recent advancement of artificial intelligence, technologies for text-to-speech, voice conversion, and singing voice synthesis are developing rapidly. However, conventional technologies are trained primarily on data from famous singers, voice actors, and characters, which limits the ability of general users to turn their own voices into digital assets. Furthermore, issues remain regarding ownership of generated content, unauthorized copying, voice theft, and rights disputes arising from the restoration of a deceased person's voice. In particular, general users, who lack separate song recordings, face the challenge of generating natural-sounding singing content from their everyday conversational voices alone.
The present invention aims to provide a system that enables general users to collect their own voices, train them with artificial intelligence to build them into digital assets, and generate high-quality, personalized singing content by precisely synchronizing those assets with music tailored to their tastes. In particular, the invention goes beyond simple voice conversion: it produces commercial-grade results through emotion transfer and physical synchronization technologies, manages these results as "personal digital voice assets," clearly defines ownership of the generated content, and establishes a protection system.

[Representative Diagram] Entire System Process (S100 ~ S500)
● S100 (Input): User/deceased voice data and target song input
● S200 (Separation and Learning): Vocal/accompaniment separation and timbre/emotion feature extraction
● S300 (Synthesis): Application of emotion transfer and dynamic formant correction algorithms
● S400 (Synchronization): Precision mixing based on synchronization and spectral ducking
● S500 (Output): Watermark insertion, final vocal content generation, and output

[Figure 1] Precision Synchronization Algorithm Flowchart
● Step 1: Cross-correlation analysis of the synthesized vocals and the accompaniment
● Step 2: Phase-difference threshold check (verify within 1 ms)
● Step 3: Correction by fine waveform shifting when a time-axis error occurs
● Step 4: Handover to the mixing stage after final synchronization is complete

1. Collect the user's voice data through a mobile device or server.
2. Perform noise removal and feature extraction on the collected voice data.
3. Train an artificial intelligence model on the user's voice characteristics to generate a personal voice asset.
4. Analyze the melody, beat, and lyrics of the target song.
5. Generate a song in the user's voice with the synthesis engine.
6. Insert a watermark into the final result and provide it to the user.

Fields of application:
1. Personalized music production platform.
2. AI singer and virtual idol industry.
3. Metaverse avatar voice service.
4. Memorial content and voice restoration service.
5. Copyright-protected music distribution platform.
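The watermark insertion step of the method above is not detailed in this publication. As one illustrative possibility only, an inaudible watermark can be embedded by replacing the least significant bit of 16-bit PCM samples with the bits of an identifier; the sketch below (hypothetical function names, LSB steganography chosen for brevity) is not the patented rights-protection scheme, which could equally use spread-spectrum or other robust techniques.

```python
def embed_watermark(samples, payload):
    """Embed `payload` bytes into the least significant bits of
    16-bit PCM `samples`, one bit per sample. An LSB change alters
    amplitude by at most 1/32768 of full scale, below audibility."""
    bits = [(byte >> k) & 1 for byte in payload for k in range(8)]
    if len(bits) > len(samples):
        raise ValueError("payload too large for signal")
    out = list(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # overwrite only the LSB
    return out

def extract_watermark(samples, n_bytes):
    """Recover `n_bytes` of payload from the sample LSBs."""
    payload = []
    for b in range(n_bytes):
        byte = 0
        for k in range(8):
            byte |= (samples[b * 8 + k] & 1) << k
        payload.append(byte)
    return bytes(payload)
```

Plain LSB marks do not survive lossy compression; a distribution platform as envisioned here would need a watermark robust to transcoding, but the embed/extract round trip is the same in principle.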