CN-121982337-A - Method for establishing a non-Lambertian industrial material BRDF neural representation model based on a pre-training and fine-tuning framework
Abstract
The invention provides a method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework. When the light reflection characteristics of non-Lambertian surfaces are described for three-dimensional surface morphology reconstruction and defect detection, current network models have no prior knowledge, require hundreds of sample sequences and a large amount of GPU training time to produce usable results, and lack a standard, feasible processing procedure. In the method of the invention, a core network model is built by taking grayscale images under multiple illumination conditions, illumination direction information and reflection characteristic codes as input feature vectors; a pre-training prediction result is obtained from the core network model; and LoRA fine-tuning modules are then inserted into the core network model to form a fine-tuned model for the corresponding material or reflection condition. A general model is thus built through pre-training, a vertical model is formed by adapting and fine-tuning it to a specific application scene, and the evaluation of morphology reconstruction and defect detection of the surface to be tested is carried out.
Inventors
- FAN KUO
- TAN JUN
- DING YONG
- SHEN YINGJIE
- LV DAGANG
- WANG JUANJUAN
- LI CHEN
- ZHANG CHAO
- WANG JIANFEI
Assignees
- Harbin Institute of Technology (哈尔滨工业大学)
- China Power Investment Engineering Research, Testing and Evaluation Center Co., Ltd. (中电投工程研究检测评定中心有限公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20260126
Claims (10)
- 1. A method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework, characterized in that: a core network model is established by taking grayscale images under multiple illumination conditions, illumination direction information and reflection characteristic codes as input feature vectors; a pre-training prediction result is obtained from the core network model; a fine-tuned model is formed by inserting LoRA fine-tuning modules into the core network model for the corresponding material or reflection condition; a general model is thereby established through pre-training and a vertical model is formed through fine-tuning adapted to a specific application scene; and the evaluation process of morphology reconstruction and defect detection of the surface to be tested is completed.
- 2. The method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework according to claim 1, wherein, before the grayscale images under multiple illumination conditions, the illumination direction information and the reflection characteristic codes are taken as input feature vectors, the input feature vectors are obtained in three parts: the first part takes a real three-dimensional geometric image directly as a first input feature vector; the second part renders the real three-dimensional geometric image with the Mitsuba rendering engine to form multi-angle grayscale images, which are used as a second input feature vector; and the third part acquires the key BRDF parameters and, after BRDF encoding, uses them as a third input feature vector.
- 3. The method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework according to claim 2, wherein a pre-training core framework is formed before the core network model is built; the pre-training core framework is formed by combining the multi-angle illumination images with the BRDF prompt and processing them with a shared CNN encoder to form a ViTransformer model, the CNN encoder and the ViTransformer model together constituting the pre-training core framework; the grayscale images under multiple illumination conditions, the illumination direction information and the reflection characteristic codes, as input feature vectors, are combined with the pre-training core framework to form the core network model; and the core network model is a CNN+ViTransformer model.
- 4. The method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework according to claim 3, wherein the built CNN+ViTransformer model is processed in two parallel branches: in one branch, the CNN+ViTransformer model serves as the pre-training model and a pre-training prediction result is obtained from it; in the other branch, LoRA fine-tuning modules are inserted into the core network model to form the fine-tuned model.
- 5. The method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework according to claim 4, wherein, before the LoRA fine-tuning modules are inserted into the core network model to form the fine-tuned model, a low-rank adaptation (LoRA) mechanism is introduced to perform parameter-efficient fine-tuning of the pre-training model, with the following specific steps: in the LoRA fine-tuning stage, the core network model is kept frozen and does not participate in gradient updates; the LoRA fine-tuning modules are inserted, and a small amount of data, formed by processing the specific application scene data through the BRDF prompt, is input to form a scene-adapted BRDF sub-model. The calculation process is as follows. The linear mapping weight matrix of any self-attention layer in the Transformer is set as: [formula not reproduced in this text]; the following parametric decomposition is introduced: [formula not reproduced in this text]; where W is the original weight matrix, fixed in the pre-training stage, and the added term is the newly added low-rank trainable matrix. In the embedding process, the LoRA modules are preferably placed first on the Query (Q) and Value (V) mapping branches of the self-attention structure, calculated as: [formula not reproduced in this text]; where X is the input feature vector, Q' is the corrected Query vector and V' is the corrected Value vector; W_Q is the linear mapping weight matrix that maps the input feature X to the Query space, W_V is the linear mapping weight matrix that maps the input feature X to the Value space, B_Q is the modulation basis matrix for the Query, B_V is the modulation basis matrix for the Value, A_Q is the modulation coefficient matrix for the Query, and A_V is the modulation coefficient matrix for the Value. The training objective function in the fine-tuning stage is: [formula not reproduced in this text]; where N is the normal vector predicted by the model, the gradient term is the spatial gradient of the normal vector field, and the three constraint terms each have their own weight coefficient. In the initial stage of fine-tuning, a LoRA weight scaling strategy is introduced, that is, the following is used in forward propagation: [formula not reproduced in this text]; where α is an adjustable scaling factor and the remaining symbols denote the newly added low-rank trainable matrix and the rank-constraint hyperparameter. After LoRA fine-tuning is complete, the core network model merges the low-rank correction matrix with the original weights in the inference stage to form an equivalent weight matrix W', which completes the parameter-efficient fine-tuning of the pre-training model into a scene-adapted BRDF sub-model.
- 6. The method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework according to any one of claims 1 to 5, wherein a general model is built through pre-training, a vertical model is formed by adaptation and fine-tuning to a specific application scene, and the evaluation process of morphology reconstruction and defect detection of the surface to be tested is completed, the evaluation process comprising geometric accuracy evaluation, reflection characteristic fitting evaluation, defect detection performance evaluation, model efficiency and adaptability evaluation, and robustness evaluation.
- 7. The method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework according to claim 6, wherein the geometric accuracy evaluation assesses the geometric accuracy of the morphology reconstruction by calculating the angular error between the predicted normal vector and the true normal vector, using the mean angular error (MAE) and the root-mean-square angular error (RMSE) as evaluation indices, calculated as: [formula not reproduced in this text]; where the symbols denote the normal vector predicted by the model and the true normal vector, and N is the total number of pixels; the evaluation method achieves high-precision prediction of surface normal vectors, keeps the mean angular error on various non-Lambertian materials below 3 degrees, and meets the geometric accuracy requirements of industrial inspection.
- 8. The method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework according to claim 6, wherein the reflection characteristic fitting evaluation assesses the model's ability to fit the reflection characteristics of the material by comparing the reflection brightness predicted by the model with the real imaging brightness, using MSE and SSIM as evaluation indices, calculated as: [formula not reproduced in this text]; where the symbols denote the number of illumination conditions and the grayscale brightness under the corresponding illumination.
- 9. The method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework according to claim 6, wherein the defect detection performance evaluation performs automatic detection of surface defects based on the reconstructed normal vector field or reflection residual map, and uses the F1 score, the harmonic mean of precision and recall, as the evaluation index, calculated as: F1 = 2 × Precision × Recall / (Precision + Recall); where Precision is the precision and Recall is the recall.
- 10. The method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework according to claim 6, wherein the model efficiency and adaptability evaluation assesses the parameter efficiency, training speed and cross-material transfer capability of the model in the fine-tuning stage by recording the parameter count, the training time and the zero-shot or few-shot performance on a new material, the LoRA fine-tuning modules introducing only 1% to 5% of the original model's parameter count to complete the adaptation to the new material; and the robustness evaluation assesses the stability of normal vector prediction and the consistency of defect detection under the non-ideal conditions of noisy illumination, local occlusion and imaging noise, with stress tests performed by adding Gaussian noise and simulating occluded regions, thereby completing the whole evaluation process of the model.
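The low-rank adaptation steps of claim 5 can be illustrated with a minimal NumPy sketch. Because the formulas of claim 5 are not reproduced in this text, the sketch assumes the commonly used LoRA form, in which a frozen weight W receives a scaled low-rank correction (alpha / r) · B · A and the correction is merged into an equivalent weight W' for inference. The class name `LoRALinear`, the row-vector convention X · W, the matrix shapes, the initialization and the example sizes are all illustrative assumptions, not the patent's implementation.

```python
import numpy as np


class LoRALinear:
    """Frozen linear map with a low-rank correction (assumed standard LoRA form)."""

    def __init__(self, W, rank, alpha, rng):
        d_in, d_out = W.shape
        self.W = W                                   # frozen pre-trained weight
        self.B = np.zeros((d_in, rank))              # trainable, starts at zero
        self.A = rng.normal(scale=0.01, size=(rank, d_out))  # trainable
        self.scale = alpha / rank                    # assumed alpha / r scaling

    def forward(self, X):
        # Adapter path used during fine-tuning: X W + (alpha / r) (X B) A
        return X @ self.W + self.scale * (X @ self.B) @ self.A

    def merged_weight(self):
        # Equivalent weight for inference: W' = W + (alpha / r) B A
        return self.W + self.scale * self.B @ self.A

    def trainable_fraction(self):
        # Adapter parameters relative to this one frozen matrix
        return (self.B.size + self.A.size) / self.W.size


rng = np.random.default_rng(0)
d_model, rank, alpha = 128, 2, 4.0                   # hypothetical sizes

# One adapter on the Query branch and one on the Value branch
W_Q = rng.normal(size=(d_model, d_model))
W_V = rng.normal(size=(d_model, d_model))
lora_q = LoRALinear(W_Q, rank, alpha, rng)
lora_v = LoRALinear(W_V, rank, alpha, rng)

# Stand-in for fine-tuning: give B non-zero values so the correction is active
lora_q.B = rng.normal(scale=0.01, size=lora_q.B.shape)
lora_v.B = rng.normal(scale=0.01, size=lora_v.B.shape)

X = rng.normal(size=(10, d_model))                   # 10 input feature vectors
Q_adapter = lora_q.forward(X)                        # corrected Query, adapter path
Q_merged = X @ lora_q.merged_weight()                # corrected Query, merged weight
V_adapter = lora_v.forward(X)
V_merged = X @ lora_v.merged_weight()
fraction = lora_q.trainable_fraction()

print("merged and adapter paths agree:", np.allclose(Q_adapter, Q_merged))
print("trainable fraction per adapted matrix:", fraction)
```

The fraction computed here refers to a single adapted matrix with these hypothetical sizes; the share of a full model's parameters depends on how many layers carry adapters.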
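The evaluation indices named in claims 7 and 9 can be sketched in plain Python. The angular-error formulas of claim 7 are not reproduced in this text, so the sketch assumes the usual definition: the per-pixel angle between predicted and true normals, averaged (MAE) or root-mean-squared (RMSE) over N pixels. The function names and the sample values are hypothetical.

```python
import math


def angular_errors_deg(pred_normals, true_normals):
    """Per-pixel angle in degrees between predicted and true normal vectors."""
    errors = []
    for p, t in zip(pred_normals, true_normals):
        dot = sum(a * b for a, b in zip(p, t))
        norms = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        cosine = max(-1.0, min(1.0, dot / norms))    # guard against rounding
        errors.append(math.degrees(math.acos(cosine)))
    return errors


def mae_rmse(errors):
    """Mean and root-mean-square of a list of angular errors."""
    n = len(errors)
    mae = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse


def f1_score(true_pos, false_pos, false_neg):
    """F1 as the harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical two-pixel example: one exact normal, one off by a right angle
pred = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]
true = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
mae, rmse = mae_rmse(angular_errors_deg(pred, true))

# Hypothetical defect counts: 8 true positives, 2 false positives, 2 misses
f1 = f1_score(8, 2, 2)
print(mae, rmse, f1)
```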
Description
Method for establishing a non-Lambertian industrial material BRDF neural representation model based on a pre-training and fine-tuning framework
Technical Field
The invention relates to a method for establishing a BRDF neural representation model of non-Lambertian industrial materials based on a pre-training and fine-tuning framework.
Background
Photometric stereo is a three-dimensional reconstruction technique that obtains surface brightness information from multiple images taken under multiple light sources and inverts the surface normal vectors on the basis of a reflection model. It computes the normal vector of the object surface from the differences in pixel response under illumination from different directions, thereby enabling three-dimensional morphology reconstruction and surface defect detection. Being non-contact and highly precise, the technique has become one of the important methods in automated industrial inspection for tasks such as surface quality inspection and microstructure measurement.
However, conventional photometric stereo methods face significant limitations in practical industrial applications. Classical methods are typically based on the Lambertian reflection model, which assumes that the surface is an isotropic diffuse reflector whose pixel brightness depends only on the angle between the illumination direction and the surface normal vector.
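As an illustration of the classical Lambertian formulation described above, the following NumPy sketch recovers albedo and normals from synthetic, shadow-free data by linear least squares. It assumes the textbook model I = ρ (L · n); the function name, the four light directions and the two test pixels are hypothetical.

```python
import numpy as np


def lambertian_photometric_stereo(intensities, light_dirs):
    """Recover albedo and unit normals under the model I = rho * (L @ n).

    intensities: (m, p) array, m images of p pixels each.
    light_dirs:  (m, 3) array of unit illumination directions.
    """
    # Least-squares solve of L @ G = I for G = rho * n, all pixels at once
    G, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # shape (3, p)
    albedo = np.linalg.norm(G, axis=0)
    normals = (G / np.maximum(albedo, 1e-12)).T                   # shape (p, 3)
    return albedo, normals


# Hypothetical setup: four lights, two pixels, no shadows or highlights
light_dirs = np.array([[0.0, 0.0, 1.0],
                       [0.5, 0.0, 0.866],
                       [0.0, 0.5, 0.866],
                       [-0.5, 0.0, 0.866]])
light_dirs /= np.linalg.norm(light_dirs, axis=1, keepdims=True)

true_normals = np.array([[0.0, 0.0, 1.0],
                         [0.1, 0.2, 0.97]])
true_normals /= np.linalg.norm(true_normals, axis=1, keepdims=True)
true_albedo = np.array([0.8, 0.5])

# Render ideal Lambertian intensities, shape (m, p)
intensities = light_dirs @ (true_normals * true_albedo[:, None]).T

albedo, normals = lambertian_photometric_stereo(intensities, light_dirs)
print(albedo)
print(normals)
```

With ideal diffuse data the recovery is exact up to rounding; the sketch says nothing about specular or shadowed observations, which violate the model it assumes.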
This assumption does not hold on the surfaces of most practical industrial materials such as metals, ceramics and composites. Especially for non-Lambertian surfaces with specular highlight components or complex microstructure distributions, the reflection model mismatch caused by the Lambertian assumption directly degrades the accuracy of normal vector recovery, which restricts the application and adoption of photometric stereo on materials with complex reflectance.
To address the effects of non-Lambertian reflection on photometric stereo, researchers have proposed various improvement strategies. One class of methods treats specular highlight and shadow regions as abnormal observations and eliminates or down-weights them during inversion by means of threshold discrimination, weighted least squares, robust estimation and the like. The underlying assumption is still the Lambertian model, with outliers regarded as originating from highlights or shadows and assumed to be sparse or negligible interference. In the actual solution, a weighted least-squares problem is often constructed by introducing a weight function, which reduces the influence of observations judged abnormal on the normal vector estimate. However, on non-Lambertian industrial surfaces, simply treating specular regions as outliers and culling them is equivalent to discarding part of the valid observations that are strongly related to the normal vector, so that the remaining observations become significantly unbalanced in their illumination direction distribution, which reduces the observability of the solution.
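The outlier-weighting strategy described above can be sketched for a single pixel as follows. This is a minimal illustration assuming hard threshold weights (0 for suspected shadow or highlight, 1 otherwise); the threshold values, function names and the synthetic six-light setup are hypothetical, and published methods typically use more elaborate weight functions.

```python
import numpy as np


def threshold_weights(intensity, low=0.02, high=0.95):
    """Weight 0 for suspected shadow (<= low) or highlight (>= high), else 1."""
    return ((intensity > low) & (intensity < high)).astype(float)


def solve_normal(intensity, light_dirs, weights=None):
    """Weighted least-squares Lambertian normal for one pixel."""
    if weights is None:
        weights = np.ones_like(intensity)
    if np.count_nonzero(weights) < 3:
        # Too few usable lights: the 3-unknown linear system is underdetermined
        raise ValueError("fewer than three usable observations")
    sw = np.sqrt(weights)
    g, *_ = np.linalg.lstsq(light_dirs * sw[:, None], intensity * sw, rcond=None)
    return g / np.linalg.norm(g)


def angle_deg(a, b):
    """Angle in degrees between two unit vectors."""
    return float(np.degrees(np.arccos(np.clip(a @ b, -1.0, 1.0))))


# Hypothetical setup: six lights on a ring, one diffuse pixel with albedo 0.6
theta = np.deg2rad(np.arange(6) * 60.0)
light_dirs = np.stack([0.6 * np.cos(theta), 0.6 * np.sin(theta),
                       np.full(6, 0.8)], axis=1)
true_normal = np.array([0.2, 0.1, 0.97])
true_normal /= np.linalg.norm(true_normal)
intensity = 0.6 * light_dirs @ true_normal

# Simulate a saturated specular highlight in the first image
intensity[0] = 1.0

err_plain = angle_deg(solve_normal(intensity, light_dirs), true_normal)
weights = threshold_weights(intensity)
err_weighted = angle_deg(solve_normal(intensity, light_dirs, weights), true_normal)
print("unweighted error (deg):", err_plain)
print("weighted error (deg):", err_weighted)
```

In this synthetic case a single corrupted observation is removed and five consistent ones remain, so the weighted estimate recovers the normal. The guard in `solve_normal` marks the failure mode the text points to: once too many observations are culled, the system no longer determines the normal.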
At the same time, the brightness in shadow regions is less than or equal to zero, and filtering these observations out further reduces the number of usable illumination constraints, so that the linear system tends to become underdetermined or ill-conditioned. Because the weights generally depend on empirical thresholds or local statistics, consistency is difficult to maintain across different materials and illumination conditions; such methods often yield unstable normal vector estimates in scenes with a large specular proportion or complex reflection behavior, and cannot fundamentally describe the physical information contained in non-Lambertian reflection. Therefore, simply treating highlights and shadows as outliers is theoretically still built on the Lambertian assumption and fails to resolve the structural mismatch between the reflection model and the real imaging mechanism.
The other class of methods introduces a nonlinear reflection model, improves adaptability to non-Lambertian reflection within the photometric stereo inversion framework by means of parametric models, optimization-based solution and the like, and jointly solves for the normal vectors and reflection parameters through a nonlinear minimization problem. Because the reflection parameters and the material properties are strongly coupled in the model, and the objective function is highly sensitive to illumination direction errors, light intensity nonuniformity and occlusion shadows, the problems of multiple