US-20260128022-A1 - MUSIC DATA PROCESSING


Abstract

There are provided methods, devices, and computer program products for processing music data. In a method, the music data is divided into a plurality of segments according to a predetermined length. A plurality of control tokens are determined for the plurality of segments based on control information associated with the plurality of segments, respectively. A plurality of sound tokens are determined for the plurality of segments based on sound information associated with the plurality of segments, respectively. A feature for the music data is obtained based on the plurality of control tokens and the plurality of sound tokens.
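
By way of illustration only, the four steps summarized above might be sketched as follows. The `Note` type, the token spellings (`SEG`, `CHORD_`, `TRACK_`, `POS_`, `DUR_`, `PITCH_`, `CONTROL_END`, `SOUND_END`), and the choice of a chord label as the only non-track control item are all assumptions made for this example and are not taken from the disclosure. Each sound token is formed here by copying the segment's control token and filling each track part with the notes of that track.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    start: int     # onset in ticks from the beginning of the piece (assumed unit)
    duration: int  # length in ticks
    pitch: int     # MIDI pitch number
    track: str     # hypothetical track name


def split_into_segments(notes, segment_length):
    """Divide the notes into segments of a predetermined length by onset."""
    if not notes:
        return []
    count = max(n.start for n in notes) // segment_length + 1
    segments = [[] for _ in range(count)]
    for note in sorted(notes, key=lambda n: (n.start, n.track, n.pitch)):
        segments[note.start // segment_length].append(note)
    return segments


def control_token(segment, chord):
    """Control token for one segment: a chord item plus one part per track."""
    tracks = sorted({n.track for n in segment})
    return ["SEG", f"CHORD_{chord}"] + [f"TRACK_{t}" for t in tracks]


def sound_token(segment, control, segment_start):
    """Sound token: the control token with each track part filled with notes."""
    tokens = []
    for item in control:
        tokens.append(item)
        if item.startswith("TRACK_"):
            track = item[len("TRACK_"):]
            for n in segment:
                if n.track == track:
                    tokens += [
                        f"POS_{n.start - segment_start}",
                        f"DUR_{n.duration}",
                        f"PITCH_{n.pitch}",
                    ]
    return tokens


def encode(notes, chords, segment_length):
    """Feature: control token sequence followed by sound token sequence."""
    control_seq, sound_seq = [], []
    for i, segment in enumerate(split_into_segments(notes, segment_length)):
        control = control_token(segment, chords[i])
        control_seq += control
        sound_seq += sound_token(segment, control, i * segment_length)
    return control_seq + ["CONTROL_END"] + sound_seq + ["SOUND_END"]
```

For example, `encode([Note(0, 4, 60, "piano"), Note(16, 8, 48, "bass")], ["C", "G"], 16)` yields two control tokens, then `CONTROL_END`, then the two corresponding sound tokens, then `SOUND_END`. Placing all control tokens ahead of the sound tokens is one possible ordering under these assumptions.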

Inventors

Haonan CHEN
Jordan BL Smith
Janne Jayne Harm Renée SPIJKERVET
Ju-Chiang Wang
Pei Zou
Bochen Li
Qiuqiang Kong
Xingjian DU

Assignees

BEIJING ZITIAO NETWORK TECHNOLOGY CO., LTD.
LEMON INC.

Dates

Publication Date: 2026-05-07
Application Date: 2024-11-04

Claims (20)

1. A method for processing music data, comprising: dividing the music data into a plurality of segments according to a predetermined length; determining a plurality of control tokens for the plurality of segments based on control information associated with the plurality of segments, respectively; determining a plurality of sound tokens for the plurality of segments based on sound information associated with the plurality of segments, respectively; and obtaining a feature for the music data based on the plurality of control tokens and the plurality of sound tokens.
2. The method of claim 1, wherein obtaining the feature based on the plurality of control tokens and the plurality of sound tokens comprises: determining a control token sequence based on the plurality of control tokens, the control token sequence having a control sequence end; determining a sound token sequence based on the plurality of sound tokens, the sound token sequence having a sound sequence end; and determining the feature based on the control token sequence and the sound token sequence.
3. The method of claim 1, wherein determining the plurality of control tokens based on the control information associated with the plurality of segments comprises: with respect to a segment in the plurality of segments, extracting a control item from the segment, the control item comprising at least any of: a genre, a section, a speed, a chord, and a track of the music data; and determining a control token for the segment based on the control item.
4. The method of claim 3, wherein the music data comprises at least one track, and determining the control token comprises: with respect to a track in the at least one track within the segment, generating a track part for the track; and inserting the track part into the control token for the segment.
5. The method of claim 4, wherein determining the plurality of sound tokens based on the sound information associated with the plurality of segments comprises: determining a sound token for the segment by updating the control token for the segment with the sound information associated with the segment.
6. The method of claim 5, wherein determining the sound token for the segment comprises: extracting a sound item from the sound information associated with the segment, the sound item comprising at least any of: a position, a duration, and a pitch of a musical note in the segment; and determining the sound token for the segment by updating the track part in the control token for the segment with the sound item.
7. The method of claim 1, further comprising: in response to receiving a plurality of reference music data, determining a plurality of reference features; combining the plurality of reference features into a reference feature sequence; obtaining a training sample from the reference feature sequence according to a predetermined window size; and training a music generating model based on the training sample, the music generating model representing an association relationship between at least one reference previous token and a reference subsequent token that follows the at least one reference previous token.
8. The method of claim 7, further comprising: determining a first probability of a subsequent token according to the music generating model based on at least one previous token; determining a sub-space in a token space of the subsequent token according to a finite state machine based on the at least one previous token; and determining the subsequent token based on the first probability of the subsequent token and the sub-space.
9. The method of claim 8, wherein determining the subsequent token comprises: determining a second probability associated with the first probability and the sub-space; and determining the subsequent token based on the second probability.
10. The method of claim 8, further comprising any of: in response to a determination that the subsequent token is an end token, generating target music data based on the at least one previous token; or in response to a determination that the subsequent token is not an end token, appending the subsequent token to an end of the at least one previous token.
11. An electronic device, comprising a computer processor coupled to a computer-readable memory unit, the memory unit comprising instructions that when executed by the computer processor implements a method for processing music data, the method comprises: dividing the music data into a plurality of segments according to a predetermined length; determining a plurality of control tokens for the plurality of segments based on control information associated with the plurality of segments, respectively; determining a plurality of sound tokens for the plurality of segments based on sound information associated with the plurality of segments, respectively; and obtaining a feature for the music data based on the plurality of control tokens and the plurality of sound tokens.
12. The electronic device of claim 11, wherein obtaining the feature based on the plurality of control tokens and the plurality of sound tokens comprises: determining a control token sequence based on the plurality of control tokens, the control token sequence having a control sequence end; determining a sound token sequence based on the plurality of sound tokens, the sound token sequence having a sound sequence end; and determining the feature based on the control token sequence and the sound token sequence.
13. The electronic device of claim 11, wherein determining the plurality of control tokens based on the control information associated with the plurality of segments comprises: with respect to a segment in the plurality of segments, extracting a control item from the segment, the control item comprising at least any of: a genre, a section, a speed, a chord, and a track of the music data; and determining a control token for the segment based on the control item.
14. The electronic device of claim 13, wherein the music data comprises at least one track, and determining the control token comprises: with respect to a track in the at least one track within the segment, generating a track part for the track; and inserting the track part into the control token for the segment.
15. The electronic device of claim 14, wherein determining the plurality of sound tokens based on the sound information associated with the plurality of segments comprises: determining a sound token for the segment by updating the control token for the segment with the sound information associated with the segment.
16. The electronic device of claim 15, wherein determining the sound token for the segment comprises: extracting a sound item from the sound information associated with the segment, the sound item comprising at least any of: a position, a duration, and a pitch of a musical note in the segment; and determining the sound token for the segment by updating the track part in the control token for the segment with the sound item.
17. The electronic device of claim 11, the method further comprising: in response to receiving a plurality of reference music data, determining a plurality of reference features; combining the plurality of reference features into a reference feature sequence; obtaining a training sample from the reference feature sequence according to a predetermined window size; and training a music generating model based on the training sample, the music generating model representing an association relationship between at least one reference previous token and a reference subsequent token that follows the at least one reference previous token.
18. The electronic device of claim 17, the method further comprising: determining a first probability of a subsequent token according to the music generating model based on at least one previous token; determining a sub-space in a token space of the subsequent token according to a finite state machine based on the at least one previous token; and determining the subsequent token based on the first probability of the subsequent token and the sub-space.
19. The electronic device of claim 18, wherein further comprising any of: in response to a determination that the subsequent token is an end token, generating target music data based on the at least one previous token; or in response to a determination that the subsequent token is not an end token, appending the subsequent token to an end of the at least one previous token.
20. A non-transitory computer program product, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by an electronic device to cause the electronic device to perform a method for processing music data, the method comprises: dividing the music data into a plurality of segments according to a predetermined length; determining a plurality of control tokens for the plurality of segments based on control information associated with the plurality of segments, respectively; determining a plurality of sound tokens for the plurality of segments based on sound information associated with the plurality of segments, respectively; and obtaining a feature for the music data based on the plurality of control tokens and the plurality of sound tokens.
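
By way of illustration only, the token selection recited in claims 8 through 10 might be sketched as follows. The transition table, the token names, and the stand-in `uniform_model` are hypothetical and are not taken from the disclosure. The "second probability" is assumed here to be the first probability restricted to the sub-space permitted by the finite state machine and renormalized, which is one possible reading of claim 9.

```python
import random

# Hypothetical token grammar: state -> {token: next state}.
TRANSITIONS = {
    "start": {"BAR": "bar"},
    "bar": {"POS": "pos", "BAR": "bar", "END": "done"},
    "pos": {"DUR": "dur"},
    "dur": {"PITCH": "pitch"},
    "pitch": {"POS": "pos", "BAR": "bar", "END": "done"},
}


def fsm_state(previous_tokens):
    """Run the finite state machine over the previous tokens."""
    state = "start"
    for token in previous_tokens:
        state = TRANSITIONS[state][token]
    return state


def allowed_tokens(previous_tokens):
    """Sub-space of the token space permitted for the subsequent token."""
    return set(TRANSITIONS[fsm_state(previous_tokens)])


def constrained_probabilities(first_probs, allowed):
    """Second probability: first probability masked to the sub-space, renormalized."""
    masked = {t: p for t, p in first_probs.items() if t in allowed}
    total = sum(masked.values())
    if total <= 0:
        raise ValueError("model assigns no probability to any allowed token")
    return {t: p / total for t, p in masked.items()}


def uniform_model(previous_tokens):
    """Stand-in for a trained music generating model (assumption)."""
    vocab = ["BAR", "POS", "DUR", "PITCH", "END"]
    return {t: 1.0 / len(vocab) for t in vocab}


def generate(model, prompt, max_len=64, rng=None):
    """Sample tokens until the end token is drawn or max_len is reached."""
    rng = rng or random.Random(0)
    tokens = list(prompt)
    while len(tokens) < max_len:
        first = model(tokens)
        second = constrained_probabilities(first, allowed_tokens(tokens))
        choices = sorted(second)
        subsequent = rng.choices(choices, weights=[second[t] for t in choices])[0]
        if subsequent == "END":
            break  # end token: the previous tokens form the generated music data
        tokens.append(subsequent)  # otherwise append and continue
    return tokens
```

Calling `generate(uniform_model, ["BAR"])` returns a token list that always follows the assumed grammar, because any token outside the sub-space receives zero probability before sampling.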

Description

FIELD

The present disclosure generally relates to machine learning, and more specifically, to methods, devices and computer program products for processing music data.

BACKGROUND

In current technology for generating multi-track music scores, a music score is usually first converted into a token sequence, and a model (usually based on a transformer) is then used to model the token sequence. Multi-track music has correlations along both the time dimension and the instrument-track dimension, but the token sequence is one-dimensional. How to design the encoding of the token sequence so that a model can learn this two-dimensional correlation is one issue. Furthermore, because a music score may be directly edited by composers, how to enable composers to control the generation of a music score through control signals is another issue.

SUMMARY

In a first aspect of the present disclosure, there is provided a method for processing music data. In the method, the music data is divided into a plurality of segments according to a predetermined length. A plurality of control tokens are determined for the plurality of segments based on control information associated with the plurality of segments, respectively. A plurality of sound tokens are determined for the plurality of segments based on sound information associated with the plurality of segments, respectively. A feature for the music data is obtained based on the plurality of control tokens and the plurality of sound tokens.

In a second aspect of the present disclosure, there is provided an electronic device. The electronic device comprises: a computer processor coupled to a computer-readable memory unit, the memory unit comprising instructions that, when executed by the computer processor, implement a method according to the first aspect of the present disclosure.

In a third aspect of the present disclosure, there is provided a computer program product, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by an electronic device to cause the electronic device to perform a method according to the first aspect of the present disclosure.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

Through the more detailed description of some implementations of the present disclosure in the accompanying drawings, the above and other objects, features and advantages of the present disclosure will become more apparent, wherein the same reference generally refers to the same components in the implementations of the present disclosure.

FIG. 1 illustrates a schematic diagram of music data being encoded based on a related work;
FIG. 2 illustrates an example diagram of processing music data according to implementations of the present disclosure;
FIG. 3 illustrates a schematic diagram of combining control tokens and sound tokens according to implementations of the present disclosure;
FIG. 4 illustrates a schematic diagram of determining sound tokens according to implementations of the present disclosure;
FIG. 5 illustrates a schematic diagram of training a music generating model according to implementations of the present disclosure;
FIG. 6 illustrates a schematic diagram of determining a subsequent token according to implementations of the present disclosure;
FIG. 7 illustrates an example flowchart of a method for processing music data according to implementations of the present disclosure; and
FIG. 8 illustrates a block diagram of a computing device in which various implementations of the present disclosure can be implemented.

DETAILED DESCRIPTION

Principles of the present disclosure will now be described with reference to some implementations. It is to be understood that these implementations are described only for the purpose of illustration and to help those skilled in the art understand and implement the present disclosure, without suggesting any limitation as to the scope of the disclosure. The disclosure described herein can be implemented in various manners other than the ones described below.

In the following description and claims, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

References in the present disclosure to “one implementation,” “an implementation,” “an example implementation,” and the like indicate that the implementation described may include a particular feature, structure, or characteristic, but it is not necessary that every implementation includes the particular feature,