CN-121983084-A - Folk music style recognition method, system and storage medium


Abstract

The invention discloses a folk music style recognition method, system and storage medium, which recognize folk music styles by analyzing audio signals. The method calculates the rate of change of the short-time energy envelope to obtain a transient strength factor and weights a periodic difference function by that factor; performs cumulative mean normalization on the corrected difference function; calculates the harmonic structure stability and screens fundamental frequency candidate points; calculates a global tonality probability distribution based on all candidate points; constructs a state transition matrix that takes into account both the frequency difference and the degree of agreement with the tonality; decodes an optimal fundamental frequency sequence; divides the sequence into playing technique segments and melody stable segments according to the transient strength factor; extracts feature subsets to form a feature vector; and inputs the feature vector into a classification model, which outputs a style recognition result.

Inventors

LUO CHAOPENG
ZHU JUNMING
SHI LUZHEN
ZHAO XIAOJIA
ZHANG MENGJIE

Assignees

Anyang Normal University (安阳师范学院)
Henan Shouzhang Culture Communication Co., Ltd. (河南手掌文化传播有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-31

Claims (10)

1. A folk music style recognition method, comprising the steps of: obtaining a folk music audio signal, calculating a transient strength factor for each analysis window based on the rate of change of the short-time energy envelope of the signal, and weighting a periodic difference function by the transient strength factors to obtain a corrected difference function; performing cumulative mean normalization on the corrected difference function and calculating the harmonic structure stability of each analysis window, and screening out a fundamental frequency candidate point set for each analysis window according to the harmonic structure stability, wherein the harmonic structure stability is inversely related to the size of the fundamental frequency candidate point set; calculating a global tonality probability distribution based on the fundamental frequency candidate point sets of all analysis windows, constructing a state transition matrix, and decoding an optimal fundamental frequency sequence by using the state transition matrix, wherein the state transition cost of the matrix is determined by the frequency difference between adjacent fundamental frequencies and by the degree of agreement between the current fundamental frequency and the global tonality probability distribution; and dividing the optimal fundamental frequency sequence into playing technique segments and melody stable segments according to the transient strength factor corresponding to each fundamental frequency in the optimal fundamental frequency sequence, extracting a feature subset from each of the two types of segments and concatenating the feature subsets to form a feature vector, inputting the feature vector into a preset classification model, and outputting a style recognition result.
2. The method of claim 1, wherein calculating the transient strength factor for each analysis window based on the rate of change of the short-time energy envelope of the signal comprises: framing the audio signal, calculating the short-time energy of each frame, taking the first-order difference of the short-time energy sequence between adjacent frames, performing half-wave rectification on the difference result, and normalizing the half-wave rectified result to the interval [0, 1] by dividing by its maximum value to obtain the transient strength factor of each analysis window.
3. The method of claim 1, wherein weighting the periodic difference function by the transient strength factors to obtain the corrected difference function comprises: letting the transient strength factor of the current analysis window be TSF(t) and the original periodic difference function be d(t), defining the weighting coefficient w(t) = 1 - TSF(t), and multiplying the original periodic difference function d(t) element by element with the corresponding weighting coefficient w(t) to obtain the corrected difference function, that is, w(t) · d(t) = (1 - TSF(t)) · d(t).
4. The method of claim 1, wherein calculating the harmonic structure stability of each analysis window comprises: calculating the coefficient of variation of the 10 harmonic peaks, and taking the reciprocal of the coefficient of variation as the value representing the harmonic structure stability of the analysis window.
5. The method of claim 1, wherein screening out the fundamental frequency candidate point set for each analysis window according to the harmonic structure stability comprises: if the harmonic structure stability of the current analysis window is greater than a harmonic structure stability threshold, selecting the global maximum point of the corrected difference function as the sole fundamental frequency candidate point; and if the harmonic structure stability of the current analysis window is less than or equal to the harmonic structure stability threshold, selecting the N peak points with the highest amplitude in the corrected difference function to jointly form the fundamental frequency candidate point set, wherein N is a positive integer.
6. The method of claim 1, wherein constructing the state transition matrix, the state transition cost of the matrix being determined by the frequency difference between adjacent fundamental frequencies and by the degree of agreement between the current fundamental frequency and the global tonality probability distribution, comprises: given the fundamental frequency of the previous analysis window, the fundamental frequency of the current analysis window and its corresponding pitch class, and the global tonality probability distribution P(k), calculating the state transition cost of transferring from the previous fundamental frequency to the current fundamental frequency by the following formula: [formula missing from the source text], where k1 is a coefficient.
7. The method of claim 1, wherein dividing the sequence into playing technique segments and melody stable segments according to the transient strength factor corresponding to each fundamental frequency in the optimal fundamental frequency sequence comprises: traversing each frame in the optimal fundamental frequency sequence; if the transient strength factor corresponding to the frame is greater than a transient strength factor threshold, marking the frame as a playing technique frame, and otherwise marking it as a melody stable frame; and merging consecutive frames of the same mark whose run length exceeds 5 frames to obtain the playing technique segments and the melody stable segments.
8. The method of claim 1, wherein extracting a feature subset from each of the two types of segments and concatenating the feature subsets to form a feature vector comprises: calculating, from each playing technique segment, the mean of the first-order difference of the fundamental frequency, the fundamental frequency jitter and the amplitude jitter, which together form a 3-dimensional first feature subset; calculating, from each melody stable segment, a one-dimensional statistical feature derived from the fundamental frequency histogram together with the means and standard deviations of the first 13 Mel-frequency cepstral coefficients (26 dimensions), which together form a 27-dimensional second feature subset; and concatenating the first feature subset and the second feature subset in order to form a 30-dimensional feature vector.
9. A folk music style recognition system, comprising the following modules: a weighting module, used for obtaining a folk music audio signal, calculating a transient strength factor for each analysis window based on the rate of change of the short-time energy envelope of the signal, and weighting a periodic difference function by the transient strength factors to obtain a corrected difference function; a screening module, used for performing cumulative mean normalization on the corrected difference function and calculating the harmonic structure stability of each analysis window, and for screening out a fundamental frequency candidate point set for each analysis window according to the harmonic structure stability, wherein the harmonic structure stability is inversely related to the size of the fundamental frequency candidate point set; a decoding module, used for calculating a global tonality probability distribution based on the fundamental frequency candidate point sets of all analysis windows and constructing a state transition matrix, wherein the state transition cost of the matrix is determined by the frequency difference between adjacent fundamental frequencies and by the degree of agreement between the current fundamental frequency and the global tonality probability distribution; and an output module, used for dividing the optimal fundamental frequency sequence into playing technique segments and melody stable segments according to the transient strength factor corresponding to each fundamental frequency in the optimal fundamental frequency sequence, extracting a feature subset from each of the two types of segments and concatenating the feature subsets to form a feature vector, inputting the feature vector into a preset classification model, and outputting a style recognition result.
10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the method according to any of claims 1-8.
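
As an illustration only, and not part of the claims, the computations recited in claims 2, 3 and 7 might be sketched in Python as follows. The function names, the frame length and hop size, the 0.5 threshold, and the layout of the difference function as one row of lag values per analysis window are all assumptions made for this sketch; the claims do not specify them. The sketch also assumes that frames above the threshold are the playing technique frames, and that runs of 5 frames or fewer are simply dropped.

```python
from itertools import groupby


def transient_strength(signal, frame_len=1024, hop=512):
    """Per-frame transient strength factor (TSF) in [0, 1], after claim 2."""
    # Short-time energy of each frame: sum of squared samples.
    starts = range(0, max(len(signal) - frame_len, 0) + 1, hop)
    energy = [sum(x * x for x in signal[s:s + frame_len]) for s in starts]
    # First-order difference between adjacent frames. The first frame has no
    # predecessor, so its difference is set to 0 (an assumption).
    diff = [0.0] + [b - a for a, b in zip(energy, energy[1:])]
    # Half-wave rectification keeps only energy increases.
    rectified = [max(d, 0.0) for d in diff]
    peak = max(rectified, default=0.0)
    # Normalize by the maximum; an all-zero sequence stays all zero.
    return [r / peak if peak > 0 else 0.0 for r in rectified]


def weight_difference(d, tsf):
    """Corrected difference function w(t) * d(t), w(t) = 1 - TSF(t) (claim 3).

    `d` is assumed to hold one row of difference-function values per window.
    """
    return [[(1.0 - t) * v for v in row] for row, t in zip(d, tsf)]


def split_segments(tsf, threshold=0.5, min_len=5):
    """Mark frames by TSF and keep runs longer than `min_len` frames (claim 7).

    Returns (label, start_frame, end_frame) tuples with an exclusive end.
    """
    labels = ["technique" if t > threshold else "stable" for t in tsf]
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        if length > min_len:  # shorter runs are dropped in this sketch
            segments.append((label, start, start + length))
        start += length
    return segments
```

A caller would typically compute `tsf = transient_strength(samples)` once, pass it to `weight_difference` before candidate screening, and reuse the same values in `split_segments` after decoding, so that the same transient measure drives both the weighting and the segmentation.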

Description

Folk music style recognition method, system and storage medium

Technical Field

The application belongs to the field of recognition, and particularly relates to a folk music style recognition method, system and storage medium.

Background

The style of folk music is reflected in its melodic contours, rhythmic patterns and rich playing techniques. Fundamental frequency estimation algorithms, such as YIN and its probabilistic refinement pYIN, perform well in general music processing. However, when the pYIN algorithm processes folk music, the playing techniques that appear frequently in folk music cause transient changes in the signal, which easily lead to octave jump errors in the fundamental frequency estimate; moreover, the pYIN algorithm ignores the global tonal structure of the music when decoding the optimal fundamental frequency path.

In the prior art, where style recognition relies on features extracted from the fundamental frequency, the generation and screening of fundamental frequency candidate points is based on thresholds or energy criteria and cannot distinguish the harmonically stable parts of the signal from its non-harmonic or transient noise parts. As a result, the quality of the initial candidate set is uneven, which increases the decoding difficulty and the error rate. In the feature extraction stage, the prior art tends to perform a unified statistical analysis over the whole fundamental frequency sequence, and so cannot distinguish and exploit the different style information carried by melody stable segments and by segments rich in playing techniques. Melody stable segments reflect the mode, whereas playing technique segments reflect the performer's expressiveness and regional characteristics. Extracting features from the two mixed together dilutes the discriminative power of each and weakens the ability of the feature vector to represent the music style.
Therefore, how to improve the fundamental frequency extraction process and design a feature extraction strategy suited to the characteristics of folk music is a technical problem to be solved in this field.

Disclosure of Invention

The invention provides a folk music style recognition method to solve the problems that the prior art cannot distinguish the harmonically stable parts of a signal from its non-harmonic or transient noise parts, and cannot distinguish and exploit the different style information carried by melody stable segments and by segments rich in playing techniques. The method comprises the following steps: obtaining a folk music audio signal, calculating a transient strength factor for each analysis window based on the rate of change of the short-time energy envelope of the signal, and weighting a periodic difference function by the transient strength factors to obtain a corrected difference function; performing cumulative mean normalization on the corrected difference function and calculating the harmonic structure stability of each analysis window, and screening out a fundamental frequency candidate point set for each analysis window according to the harmonic structure stability, wherein the harmonic structure stability is inversely related to the size of the fundamental frequency candidate point set; calculating a global tonality probability distribution based on the fundamental frequency candidate point sets of all analysis windows, constructing a state transition matrix, and decoding an optimal fundamental frequency sequence by using the state transition matrix, wherein the state transition cost of the matrix is determined by the frequency difference between adjacent fundamental frequencies and by the degree of agreement between the current fundamental frequency and the global tonality probability distribution; and dividing the optimal fundamental frequency sequence into playing technique segments and melody stable segments according to the transient strength factor corresponding to each fundamental frequency in the optimal fundamental frequency sequence, extracting a feature subset from each of the two types of segments and concatenating the feature subsets to form a feature vector, inputting the feature vector into a preset classification model, and outputting a style recognition result.

In addition, the invention also relates to a folk music style recognition system, which comprises the following modules: a weighting module, used for obtaining a folk music audio signal, calculating a transient strength factor for each analysis window based on the rate of change of the short-time energy envelope of the signal, and weighting a periodic difference function by the transient strength factors to obtain a corrected difference function; a screening module, used for performing cumulative mean normalization on the corrected difference function and calculating the harmonic structure stability of each analysis window, and for screening out a fundamental frequency candidate point set for each analysis window according to the harmonic structure stability, wherein the harmonic structure stability is inversely related t