KR-20260067156-A - SKELETON CUTMIX AUGMENTATION METHOD FOR SKELETON-BASED MEDICAL DATA AND THE APPARATUS THEREOF
Abstract
The present invention relates to a CutMix-based data augmentation method and apparatus for skeleton-based medical data, comprising the steps of: selecting a first skeleton sample dataset and a second skeleton sample dataset within a mini-batch; converting the first skeleton sample dataset and the second skeleton sample dataset from the time domain to the frequency domain to obtain the first skeleton data and the second skeleton data, and separating them into low-frequency components and high-frequency components; and combining the low-frequency components of the first skeleton data and the high-frequency components of the second skeleton data to generate new skeleton data.
Inventors
- 이민식
- 오재석
Assignees
- 한양대학교 에리카산학협력단
Dates
- Publication Date
- 20260512
- Application Date
- 20241105
Claims (14)
- A CutMix-based data augmentation method for skeleton-based medical data, performed using at least one processor, the method including: a step of selecting a first skeleton sample dataset and a second skeleton sample dataset within a mini-batch; a step of converting the first skeleton sample dataset and the second skeleton sample dataset from the time domain to the frequency domain to obtain first skeleton data and second skeleton data, and separating each into low-frequency components and high-frequency components; and a step of generating new skeleton data by combining the low-frequency component of the first skeleton data and the high-frequency component of the second skeleton data.
- The data augmentation method of claim 1, wherein the selecting step selects an arbitrary first skeleton sample dataset ( x1 , y1 ) within the mini-batch and an arbitrary second skeleton sample dataset ( x2 , y2 ) within the same mini-batch as the first skeleton sample dataset.
- The data augmentation method of claim 1, wherein the separating step includes: a step of converting the first skeleton sample dataset and the second skeleton sample dataset along the time dimension (T) to the frequency domain to obtain the first skeleton data ( X1 ) and the second skeleton data ( X2 ), and obtaining the low-frequency component and the high-frequency component of each of the first skeleton data and the second skeleton data; and a step of maintaining the low-frequency component of the first skeleton data and substituting in the high-frequency component of the second skeleton data.
- The data augmentation method of claim 3, wherein the obtaining step converts the x1 variable value of the first skeleton sample dataset and the x2 variable value of the second skeleton sample dataset into the first skeleton data and the second skeleton data in the frequency domain by applying a Fourier transform along the time dimension (T).
- The data augmentation method of claim 4, wherein the obtaining step, after applying the Fourier transform along the time dimension (T), separates the low-frequency components and the high-frequency components of the first skeleton data and the second skeleton data, respectively, using a low-pass filter (LPF) and a high-pass filter (HPF) in the frequency domain.
- The data augmentation method of claim 5, wherein the LPF and the HPF have sizes equal to the product of the length of the time dimension (T) of the frequency-domain data and the mixing ratio (lambda), and the product of that length and one minus the mixing ratio (1 - lambda), respectively.
- The data augmentation method of claim 3, wherein the replacing step maintains the low-frequency component of the first skeleton data as is and replaces the high-frequency component of the first skeleton data with the high-frequency component of the second skeleton data.
- The data augmentation method of claim 7, wherein the generating step generates the new skeleton data, with a length equal to the time dimension (T) of the first skeleton data, by combining the low-frequency component maintained from the first skeleton data and the high-frequency component substituted from the second skeleton data.
- The data augmentation method of claim 8, wherein the generating step generates the new skeleton data in the time domain by applying an inverse Fourier transform along the time dimension (T) to convert the combined data back to the time domain.
- The data augmentation method of claim 9, wherein the generating step generates a mini-batch consisting of the new skeleton data in the time domain.
- A CutMix-based data augmentation device for skeleton-based medical data, the device including: a selection unit that selects a first skeleton sample dataset and a second skeleton sample dataset within a mini-batch; a processing unit that converts the first skeleton sample dataset and the second skeleton sample dataset from the time domain to the frequency domain to obtain first skeleton data and second skeleton data, and separates each into low-frequency and high-frequency components; and a generating unit that generates new skeleton data by combining the low-frequency component of the first skeleton data and the high-frequency component of the second skeleton data.
- The data augmentation device of claim 11, wherein the selection unit selects an arbitrary first skeleton sample dataset within the mini-batch and an arbitrary second skeleton sample dataset within the same mini-batch as the first skeleton sample dataset.
- The data augmentation device of claim 11, wherein the processing unit converts the first skeleton sample dataset and the second skeleton sample dataset along the time dimension (T) to the frequency domain to obtain the first skeleton data and the second skeleton data, obtains the low-frequency component and the high-frequency component of each of the first skeleton data and the second skeleton data, maintains the low-frequency component of the first skeleton data, and substitutes in the high-frequency component of the second skeleton data.
- The data augmentation device of claim 13, wherein the generating unit generates the new skeleton data, with a length equal to the time dimension (T) of the first skeleton data, by combining the low-frequency component maintained from the first skeleton data and the high-frequency component substituted from the second skeleton data.
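The steps recited in the method claims (pair selection within a mini-batch, Fourier transform along T, low-/high-pass separation, recombination, inverse transform) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `sf_cutmix_batch`, the `(B, T, J, C)` array layout, the use of a random permutation to pick partners, and the application of the mixing ratio to the one-sided spectrum returned by `numpy.fft.rfft` are all assumptions made for the example.

```python
import numpy as np


def sf_cutmix_batch(x, lam, rng):
    """Mix each sample with a partner from the same mini-batch in frequency space.

    x   : real array of shape (B, T, J, C): batch, frames, joints, coordinates
          (this layout is an assumption of the sketch).
    lam : mixing ratio in [0, 1]; fraction of low-frequency bins kept from x.
    rng : numpy.random.Generator used to choose the partner samples.
    """
    batch, frames = x.shape[0], x.shape[1]

    # Choose a partner (second sample) for every first sample in the mini-batch.
    perm = rng.permutation(batch)

    # Fourier transform along the time dimension T (one-sided spectrum).
    spec1 = np.fft.rfft(x, axis=1)
    spec2 = spec1[perm]

    # Low-pass mask over the first k bins; the high-pass mask is its complement.
    bins = spec1.shape[1]
    k = int(round(lam * bins))
    low_mask = (np.arange(bins) < k).reshape(1, bins, 1, 1)

    # Keep the first sample's low band and take the partner's high band.
    mixed = np.where(low_mask, spec1, spec2)

    # Inverse transform back to a time-domain sequence of length T.
    x_new = np.fft.irfft(mixed, n=frames, axis=1)
    return x_new, perm


# Example call on random placeholder data (4 samples, 64 frames, 25 joints, 3D).
rng = np.random.default_rng(0)
batch = rng.standard_normal((4, 64, 25, 3))
augmented, partners = sf_cutmix_batch(batch, lam=0.7, rng=rng)
```

The sketch returns the partner indices alongside the augmented mini-batch so that a caller can decide how to treat the labels ( y1 , y2 ); label handling is left open here because the claims above do not specify it.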
Description
CutMix-based Data Augmentation Method and Apparatus for Skeleton-based Medical Data
The present invention relates to a CutMix-based data augmentation method and apparatus for skeleton-based medical data. More specifically, it relates to a technology that generates new skeleton data by separating skeleton data into low-frequency and high-frequency components, maintaining the low-frequency components, which correspond to large movements that strongly influence disease diagnosis, and replacing the high-frequency components, which correspond to fine movements that have relatively little influence on disease diagnosis, with the high-frequency components of another person.
The data primarily used in recent Human Action Recognition (HAR) is broadly classified into RGB+D-based data and skeleton-based data. RGB+D-based data represents human movement with videos captured by RGB and depth cameras. Because its dimensionality equals the size of the video frames, it is very large, and downsampling it loses information. Skeleton-based data, in contrast, represents human movement with a smaller dimensionality by using only the joint coordinates and the connectivity relationships between joints, and it is not affected by external factors such as lighting. Due to these differences, skeleton-based data is more efficient than RGB+D-based data in terms of artificial neural network size and computational cost, and it also achieves higher performance.
However, in the medical field, determining the presence of disease in patients using artificial neural networks trained on skeleton data requires a very large amount of skeleton data from both patients and the general public. Collecting that data is difficult because of the time and cost involved.
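The distinction drawn at the start of this description between low-frequency components (large movements) and high-frequency components (fine movements) can be illustrated on a synthetic one-dimensional trajectory. The signal, its amplitudes and frequencies, and the cutoff bin below are invented for illustration only and are not taken from the patent or from any dataset.

```python
import numpy as np

T = 128                                    # number of frames (illustrative)
t = np.arange(T) / T                       # normalized time over one period

# Synthetic joint coordinate: a slow, large movement plus a fast, small one.
large = 1.0 * np.sin(2 * np.pi * 1 * t)
fine = 0.05 * np.sin(2 * np.pi * 20 * t)
signal = large + fine

# Transform to the frequency domain along time.
spec = np.fft.rfft(signal)

# Hypothetical cutoff: bins below it are "low", the rest are "high".
cutoff = 8
low_spec = np.where(np.arange(spec.size) < cutoff, spec, 0)
high_spec = spec - low_spec

# Back to the time domain: each band recovers one of the two movements.
low = np.fft.irfft(low_spec, n=T)
high = np.fft.irfft(high_spec, n=T)

print(np.abs(low - large).max(), np.abs(high - fine).max())
```

Because the two bands sum back to the original signal, swapping only the high band with that of another sequence leaves the large movement of the first sequence untouched.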
To address these problems, the first conventional technique (Pengfei Zhang, Cuiling Lan, Wenjun Zeng, Junliang Xing, Jianru Xue, Nanning Zheng, “Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1112-1121) augments skeleton data by directly manipulating the skeleton coordinate values, for example with rotation or scaling as used for 2D images. However, this can distort properties inherent to skeleton data, such as the connectivity between joints and the range of rotation of the joints.
The second conventional technique, CutMix (Sangdoo Yun, Dongyoon Han, SeongJoon Oh, Sanghyuk Chun, Junsuk Choe, Youngjoon Yoo, “CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features,” In The IEEE International Conference on Computer Vision (ICCV), 2019, pp. 6023-6032), experimentally demonstrated that the performance of artificial neural networks improves, with low computational overhead, when a portion of one 2D image within an input mini-batch is replaced with the corresponding portion of another 2D image within the same mini-batch. However, its use is limited to 2D images.
The present invention proposes a data augmentation technique intended to advance research on diagnosing age-related neurological diseases by applying HAR based on physical movements.
FIG. 1 illustrates a flowchart of the operation of a CutMix-based data augmentation method for skeleton-based medical data according to an embodiment of the present invention.
FIG. 2 illustrates a detailed operation flowchart of step S120 according to an embodiment of the present invention.
FIG. 3 illustrates CutMix in the frequency domain (SF-CutMix) according to an embodiment of the present invention.
FIG. 4 illustrates the KSSP dataset inference performance results according to an embodiment of the present invention.
FIG. 5 is a block diagram illustrating the detailed configuration of a CutMix-based data augmentation device for skeleton-based medical data according to an embodiment of the present invention.
The advantages and features of the present invention, and the methods for achieving them, will become clear by referring to the embodiments described below in detail together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided merely to make the disclosure of the present invention complete and to fully inform those skilled in the art of the scope of the invention; the present invention is defined only by the scope of the claims. The terms used herein are for describing the embodiments and are not intended to limit the invention. In this specification, the singular form includes the plural form unless specifically stated otherwise in the text. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more other components, steps, actions, and/or elements to the mentioned components, steps, actions, and/or elements. Unl