CN-122025103-A - Feature selection method and device for early screening of infant food allergy
Abstract
The invention discloses a feature selection method and device for early screening of infant food allergy, relating to the technical field of medical data processing. A nonlinear parameter control mechanism is introduced to model the nonlinear distribution of complex clinical risk factors, achieving a dynamic balance between global exploration of pathogenic pathways and local mining of feature associations. An adaptive diversity-driven mechanism dynamically adjusts the switching probability between global search and local search, so that both searches follow the nonlinear distribution of complex clinical risk factors and the search avoids becoming trapped in local optima. Meanwhile, an implicit heuristic resuscitation mechanism revives critical risk feature factors that carry weak signals and searches them again, protecting biologically meaningful weak signals on top of the balanced local and global search, thereby identifying the core potential risk feature factors.
Inventors
- ZHANG WEIXI
- WANG LEI
- CHEN HUILING
- WANG WEIWEI
- QIU PAN
- JIA XIAOXIAO
- CHEN YI
- HU HAO
Assignees
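- 温州医科大学附属第二医院(温州医科大学附属育英儿童医院) (The Second Affiliated Hospital of Wenzhou Medical University (Yuying Children's Hospital Affiliated to Wenzhou Medical University))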
Dates
- Publication Date
- 20260512
- Application Date
- 20260131
Claims (8)
- 1. A feature selection method for early screening of infant food allergy, comprising the steps of: obtaining a feature data set associated with food allergy for infant patients, the feature data set comprising clinical features of the infant's parents and clinical features of the infant; generating an initial solution set based on the feature data set; determining, by a nonlinear parameter control mechanism, a local search direction over the parental clinical features and infant clinical features associated with infant food allergy, and guiding individuals in the initial solution set to search the feature space in a nonlinearly balanced manner based on the local search direction and a global search direction; dynamically adjusting the switching probability between global search and local search by an adaptive diversity-driven mechanism; resuscitating, by an implicit heuristic resuscitation mechanism, inferior individuals associated with infant food allergy during the global and local search, and repeating the search to obtain a plurality of candidate solutions; and mapping the position vector of each candidate solution into a binary decision vector by a Sigmoid transfer function, obtaining the fitness value of each solution based on the binary decision vector, and, with minimization of the fitness value as the objective, repeating the search until the maximum number of iterations is reached, thereby obtaining the optimal feature subset corresponding to the optimal solution.
- 2. The feature selection method for early screening of infant food allergy according to claim 1, characterized in that the initial solution set is expressed as: $X_{i,j} = lb_j + rand \cdot (ub_j - lb_j)$; wherein $X_{i,j}$ represents the feature value of the $j$-th dimension of the $i$-th feature subset scheme; $ub_j$ and $lb_j$ respectively represent the maximum and minimum boundaries in the $j$-th dimension; and $rand$ generates random numbers between 0 and 1.
- 3. The feature selection method for early screening of infant food allergy according to claim 2, wherein obtaining the plurality of candidate solutions comprises: adopting a nonlinear control parameter $a$ based on an exponential function to balance the global search for pathogenic factors against local exploitation of associations, so that a wide range of potential risk sources is covered in the early iterations and high-probability pathogenic regions are locked onto for fine screening in the later iterations, $a$ being given by two formulas (not reproduced in this text) in which $a_{\max}$ and $a_{\min}$ respectively represent the maximum and minimum values of the parameter $a$ and a further exponent represents the nonlinear modulation index; monitoring in real time the diversity of feature combinations and the stagnation count of the optimal solution, dynamically triggering two different perturbation strategies, and dynamically adjusting the degree of population homogenization through a control probability, expressed by two further formulas (likewise not reproduced), wherein one random term obeys a standard normal distribution, one term represents a binary random mask, a counter records the number of times the global optimum has not improved, and population diversity is measured as the average Euclidean distance between all individuals; and, when the current feature combination scheme stagnates, tracing back historical high-potential solutions from an archive to replace the current inferior individuals, thereby recovering and re-searching pathogenic factors, so that a plurality of candidate solutions is finally generated.
- 4. The feature selection method for early screening of infant food allergy according to claim 3, characterized in that obtaining the optimal feature subset comprises: mapping the position vector of each candidate solution into a binary decision vector by a Sigmoid transfer function, where 1 indicates that the clinical feature is selected and 0 indicates that it is not, the transfer function being $S(x_{i,j}) = 1/(1 + e^{-x_{i,j}})$ and the binary value of each dimension being determined from $S(x_{i,j})$ (the thresholding formula is not reproduced in this text); constructing a fitness function that balances a low misdiagnosis rate against a small number of detection indicators, to evaluate the clinical application value of a feature subset, expressed as: $Fitness = \alpha \cdot E + (1 - \alpha) \cdot |R| / |N|$; wherein $S$ represents the transfer function value; $x_{i,j}$ represents the $j$-th dimension value of the $i$-th individual in the population; $E$ represents the classification error rate; $Fitness$ represents the individual fitness value; $\alpha$ represents the weight coefficient; $|R|$ represents the number of core risk features selected in the subset; and $|N|$ represents the total number of features; and taking the fitness function as the optimization objective of the improved grey wolf algorithm, the binary vector that minimizes the fitness value is finally found through the algorithm's iterative search, this vector being the selected optimal feature subset.
- 5. The method according to claim 1, wherein the feature data set comprises maternal gestational diabetes or abnormal glucose tolerance, maternal alcohol intake frequency, history of azithromycin use in the infant within 0 to 6 months after birth, maternal pre-pregnancy BMI, and paternal pre-pregnancy BMI; the maternal gestational diabetes indicator is used to evaluate the level of oxidative stress in the environment in which fetal immune cells differentiate, the infant's azithromycin use history is used to evaluate the risk of damage to early gut microbiota colonization and to the establishment of immune tolerance, and the paternal pre-pregnancy BMI is used to evaluate the immune susceptibility risk transmitted through sperm epigenetic reprogramming.
- 6. A feature selection apparatus for early screening of infant food allergy, comprising: a data acquisition module, configured to acquire a feature data set associated with food allergy for infant patients, the feature data set comprising clinical features of the infant's parents and clinical features of the infant; and a feature selection module, configured to generate an initial solution set based on the feature data set; determine, by a nonlinear parameter control mechanism, a local search direction over the parental clinical features and infant clinical features associated with infant food allergy, and guide individuals in the initial solution set to search the feature space in a nonlinearly balanced manner based on the local search direction and a global search direction; dynamically adjust the switching probability between global search and local search by an adaptive diversity-driven mechanism; recover, by an implicit heuristic recovery mechanism, inferior individuals associated with infant food allergy during the global and local search, and repeat the search to obtain a plurality of candidate solutions; and map the position vector of each candidate solution into a binary decision vector by a Sigmoid transfer function, obtain the fitness value of each solution based on the binary decision vector, and, with minimization of the fitness value as the objective, repeat the search until the maximum number of iterations is reached, thereby obtaining the optimal feature subset corresponding to the optimal solution.
- 7. An electronic device, characterized by comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to implement the steps of the feature selection method for early screening of infant food allergy according to any one of claims 1 to 5 when executing the computer program stored in the memory.
- 8. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the feature selection method for early screening of infant food allergy according to any one of claims 1 to 5.
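Claims 2 and 4 define the search-space initialization, the Sigmoid binarization, and the subset fitness. The code below is a minimal Python sketch of those three pieces. It is an illustration under stated assumptions, not the patented implementation. The initialization formula follows claim 2. The weighted form `alpha * E + (1 - alpha) * |R| / |N|` and the value `alpha = 0.99` are assumptions; the weighted form is a common choice in wrapper feature selection. The claim does not state how `S(x)` is turned into 0/1. Comparing it against a uniform random threshold is also an assumption. The function names `init_population`, `binarize` and `fitness` are hypothetical.

```python
import numpy as np


def init_population(n_agents, lb, ub, rng):
    """Claim 2: X[i, j] = lb[j] + rand * (ub[j] - lb[j]), rand ~ U[0, 1)."""
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    return lb + rng.random((n_agents, lb.size)) * (ub - lb)


def sigmoid(x):
    """Sigmoid transfer function S(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def binarize(positions, rng):
    """Map continuous positions to 0/1 masks (1 = clinical feature selected).

    ASSUMPTION: a dimension is selected when S(x) exceeds a uniform random
    threshold; the exact thresholding rule is not given in the claims text.
    """
    return (sigmoid(positions) > rng.random(positions.shape)).astype(int)


def fitness(error_rate, mask, alpha=0.99):
    """ASSUMED weighted form: alpha * E + (1 - alpha) * |R| / |N|.

    error_rate: classification error E measured on the selected subset.
    mask: binary decision vector; alpha = 0.99 is an illustrative value.
    """
    mask = np.asarray(mask)
    return alpha * error_rate + (1.0 - alpha) * mask.sum() / mask.size


# Usage: 5 candidate solutions over 6 features bounded in [-4, 4].
rng = np.random.default_rng(42)
population = init_population(5, lb=[-4] * 6, ub=[4] * 6, rng=rng)
masks = binarize(population, rng)
print(masks[0], fitness(0.1, masks[0]))
```

A weight close to 1 makes classification error dominate the score. The subset-size term then acts mainly as a tie-breaker that favors fewer detection indicators.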
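Claim 3 combines three ideas: a decaying control parameter `a`, a diversity measure, and archive-based revival of inferior individuals. The sketch below illustrates them in Python under loudly stated assumptions.

- The claim's exponential formula for `a` is not reproduced in the text. `nonlinear_a` is therefore a hypothetical exponential schedule that only satisfies the stated behavior: it starts at `a_max`, ends at `a_min`, and stays high longer than a linear decay.
- Diversity is computed as the mean pairwise Euclidean distance, one reading of "average Euclidean distance between all individuals".
- `limit` and `n_replace` in `resuscitate` are hypothetical parameters.

The two perturbation strategies and their control probability are not sketched, because their formulas are not recoverable from the text.

```python
import numpy as np


def nonlinear_a(t, max_iter, a_max=2.0, a_min=0.0, k=2.0):
    """HYPOTHETICAL exponential schedule (not the patent's formula).

    Equals a_max at t = 0 and a_min at t = max_iter; with k > 0 it decays
    slowly at first (broad exploration) and quickly near the end.
    """
    frac = (np.exp(k * t / max_iter) - 1.0) / (np.exp(k) - 1.0)
    return a_max - (a_max - a_min) * frac


def mean_pairwise_distance(X):
    """Population diversity: average Euclidean distance over all pairs."""
    n = X.shape[0]
    if n < 2:
        return 0.0
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return float(dist[np.triu_indices(n, k=1)].mean())


def resuscitate(X, fit, archive, stagnation, limit=10, n_replace=2):
    """Replace the worst individuals with archived high-potential solutions.

    Triggered once the global best has not improved for `limit` iterations.
    archive: list of (position, fitness) pairs; lower fitness is better.
    """
    if stagnation < limit or not archive:
        return X, fit
    X, fit = X.copy(), fit.copy()
    worst = np.argsort(fit)[::-1][:n_replace]  # indices of current worst
    best = sorted(archive, key=lambda item: item[1])[:n_replace]
    for idx, (pos, f) in zip(worst, best):
        X[idx] = pos
        fit[idx] = f
    return X, fit


# Usage: after 10 stagnant iterations, the two worst wolves are revived.
X = np.zeros((4, 2))
fit = np.array([0.1, 0.9, 0.5, 0.8])
archive = [(np.array([1.0, 1.0]), 0.05), (np.array([2.0, 2.0]), 0.07)]
X_new, fit_new = resuscitate(X, fit, archive, stagnation=10)
print(fit_new, mean_pairwise_distance(X_new), nonlinear_a(50, 100))
```

Replacing individuals from an archive, instead of re-randomizing them, is one way to let previously promising weak-signal subsets re-enter the search. That matches the intent stated in the claim.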
Description
Feature selection method and device for early screening of infant food allergy
Technical Field
The invention relates to the technical field of medical data processing, and in particular to a feature selection method, device, equipment and medium for early screening of infant food allergy.
Background
The global incidence of food allergy in infants has risen sharply in recent years, making it an increasingly serious public health problem. Current clinical practice, however, focuses mainly on diagnosis and treatment after symptoms appear and lacks effective primary prevention strategies. Early life is a critical window for the development and maturation of the immune system. During this window, multiple exposures (such as parental health, pregnancy status, environmental influences and infant characteristics) may jointly shape an individual's immune trajectory and determine susceptibility to allergic disease, so identifying these critical early risk factors is essential for formulating prevention strategies. Data describing early-life exposures, however, are inherently high-dimensional, heterogeneous and complex, comprising a large number of variables with complex nonlinear relationships. Traditional statistical methods struggle to process such data and to discern the truly predictive core risk factors. Machine learning methods offer a new solution to this problem. Feature selection, a key step in machine learning, aims to remove irrelevant or redundant variables from high-dimensional data, improving model interpretability and performance and helping to elucidate disease mechanisms. Among feature selection approaches, wrapper methods are favored for their high classification accuracy but are limited in application by their high computational cost. To balance accuracy and efficiency, metaheuristic algorithms, with their strong global search capability, are widely used to optimize the wrapper feature selection process.
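The Background attributes the cost of wrapper feature selection to running a classifier inside the selection loop. The code below is a small Python sketch of that idea. It scores one candidate feature subset by the leave-one-out error of a 1-nearest-neighbour classifier. The choice of 1-NN and the name `wrapper_error` are assumptions for illustration only; the document does not name the classifier. Every candidate subset examined by a search algorithm needs a fresh evaluation like this.

```python
import numpy as np


def wrapper_error(X, y, mask):
    """Leave-one-out error of a 1-NN classifier using only masked features.

    ASSUMPTION: 1-NN stands in for whatever learner the wrapper uses.
    """
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 1.0                       # an empty subset cannot classify
    Xs = np.asarray(X, dtype=float)[:, cols]
    diff = Xs[:, None, :] - Xs[None, :, :]
    dist = (diff ** 2).sum(axis=-1)      # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)       # leave each sample out of its own vote
    nearest = np.argmin(dist, axis=1)
    y = np.asarray(y)
    return float(np.mean(y[nearest] != y))


# Usage: feature 0 separates the classes, feature 1 is noise.
X = np.array([[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [1.1, -2.0]])
y = np.array([0, 0, 1, 1])
print(wrapper_error(X, y, [1, 0]), wrapper_error(X, y, [0, 1]))
```

The pairwise distance matrix is rebuilt for every mask. For realistic cohort sizes and feature counts, this repeated cost is what makes an efficient search strategy over masks necessary.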
The grey wolf optimization algorithm has attracted wide attention because its concept is simple, it has few hyperparameters, and its search mechanism is balanced. However, the grey wolf optimizer and its variants generally adopt linear control parameters, which adapt poorly to the nonlinear distribution of complex clinical risk factors. In the later stage of iteration, the population appears to converge rapidly but becomes trapped in suboptimal combinations of nonspecific or redundant features, that is, in a local optimum. As a result, weak-signal features of genuine biological significance are eliminated in early iterations. Core potential risk factors that carry weak signals yet have important biological significance, such as paternal BMI (body mass index) and early exposure to small amounts of antibiotics, are therefore difficult to identify.
Disclosure of Invention
The embodiments of the invention provide a feature selection method and device for early screening of infant food allergy, which can solve the above problems in the prior art.
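The Background criticizes the linear control parameter of the grey wolf optimizer. The Python sketch below shows one position update of the canonical algorithm to make that baseline concrete. It is a generic illustration, not the improved method of this document.

The parameter `a` falls linearly from 2 to 0. Each wolf moves toward the alpha, beta and delta leaders using the coefficients `A = 2*a*r1 - a` and `C = 2*r2`. The new position is the average of the three leader-guided moves. Names such as `gwo_step` are hypothetical.

```python
import numpy as np


def linear_a(t, max_iter):
    """Canonical GWO control parameter: decreases linearly from 2 to 0."""
    return 2.0 - 2.0 * t / max_iter


def gwo_step(wolves, alpha, beta, delta, t, max_iter, rng):
    """One position update of the standard grey wolf optimizer.

    wolves: (n, d) positions; alpha, beta, delta: (d,) three best solutions.
    """
    a = linear_a(t, max_iter)
    new_pos = np.zeros_like(wolves, dtype=float)
    for leader in (alpha, beta, delta):
        r1 = rng.random(wolves.shape)
        r2 = rng.random(wolves.shape)
        A = 2.0 * a * r1 - a              # |A| > 1 explores, |A| < 1 exploits
        C = 2.0 * r2
        D = np.abs(C * leader - wolves)   # distance to this leader
        new_pos += leader - A * D
    return new_pos / 3.0                  # average of the three moves


# Usage: one update at the midpoint of a 100-iteration run.
rng = np.random.default_rng(0)
wolves = rng.random((6, 4))
alpha, beta, delta = wolves[0], wolves[1], wolves[2]
print(gwo_step(wolves, alpha, beta, delta, t=50, max_iter=100, rng=rng))
```

Because `a` shrinks at a constant rate, the range of `A` narrows uniformly over the run. Once `|A| < 1` everywhere, the wolves only close in on the current leaders. This is the premature-convergence behavior that the nonlinear control parameter and diversity-driven mechanisms are meant to counter.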
The embodiment of the invention provides a feature selection method for early screening of infant food allergy, comprising the following steps: obtaining a feature data set associated with food allergy for infant patients, the feature data set comprising clinical features of the infant's parents and clinical features of the infant; generating an initial solution set based on the feature data set; determining, by a nonlinear parameter control mechanism, a local search direction over the parental clinical features and infant clinical features associated with infant food allergy, and guiding individuals in the initial solution set to search the feature space in a nonlinearly balanced manner based on the local search direction and a global search direction; dynamically adjusting the switching probability between global search and local search by an adaptive diversity-driven mechanism; resuscitating, by an implicit heuristic resuscitation mechanism, inferior individuals associated with infant food allergy during the global and local search, and repeating the search to obtain a plurality of candidate solutions; and mapping the position vector of each candidate solution into a binary decision vector by a Sigmoid transfer function, obtaining the fitness value of each solution based on the binary decision vector, and, with minimization of the fitness value as the objective, repeating the search until the maximum number of iterations is reached, thereby obtaining the optimal feature subset corresponding to the optimal solution. Preferably, the initial solution set is expressed as: $X_{i,j} = lb_j + rand \cdot (ub_j - lb_j)$; wherein $X_{i,j}$ represents the feature value of the $j$-th dimension of the $i$-th feature subset scheme; $ub_j$ and $lb_j$ respectively represent the maximum and minimum boundaries in the $j$-th dimension; and $rand$ is used for generating random numbers between