EP-4502932-B1 - MICROSCOPIC IMAGE PROCESSING METHOD AND APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
Inventors
- CAI, De
- HAN, Xiao
Dates
- Publication Date: 2026-05-13
- Application Date: 2023-05-18
Claims (13)
- A microscopic image processing method, executable by a computer device, and comprising: performing (501) instance segmentation on a microscopic image to obtain an instance image, the instance image comprising a target object in the microscopic image; performing (502) skeleton extraction on the target object in the instance image to obtain skeleton form information of the target object, the skeleton form information representing a skeleton form of the target object; performing (503) motion analysis on the target object based on the skeleton form information to obtain a plurality of eigenvalues, the plurality of eigenvalues representing weighting coefficients each for one of a plurality of preset motion states configured to synthesize the skeleton form; and generating (503) an eigenvalue sequence comprising the plurality of eigenvalues as motion component information of the target object,
wherein the performing (502) skeleton extraction on the target object in the instance image to obtain skeleton form information of the target object comprises: inputting the instance image into a skeleton extraction model, to perform skeleton extraction on the target object through the skeleton extraction model to obtain a skeleton form image of the target object, the skeleton extraction model being used for predicting a skeleton form of a target object based on an instance image of the target object; recognizing the skeleton form image to obtain a head endpoint and a tail endpoint in a skeleton form of the target object; and determining the skeleton form image, the head endpoint, and the tail endpoint as the skeleton form information,
wherein the recognizing the skeleton form image to obtain a head endpoint and a tail endpoint in a skeleton form of the target object comprises: truncating (6051) the skeleton form image to obtain a first local endpoint region and a second local endpoint region, the first local endpoint region and the second local endpoint region being respectively located at two ends of a skeleton; extracting (6052) a first histogram of oriented gradients, HOG, feature at one end of the skeleton based on the first local endpoint region; extracting (6053) a second HOG feature at the other end of the skeleton based on the second local endpoint region; and recognizing (6054) the one end of the skeleton and the other end of the skeleton respectively based on the first HOG feature and the second HOG feature to obtain the head endpoint and the tail endpoint.
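The endpoint steps (6051)-(6054) above can be sketched in NumPy. This is an illustrative simplification, not the claimed implementation: the `hog_descriptor` below collapses the HOG to a single orientation histogram (no cell/block normalization), and the window size `half` is a hypothetical parameter.

```python
import numpy as np

def hog_descriptor(patch, n_bins=9):
    """Minimal histogram-of-oriented-gradients descriptor for a grayscale
    local endpoint region (simplified: one global histogram, no blocks)."""
    gy, gx = np.gradient(patch.astype(float))       # row/column gradients
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, 180.0), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def truncate_endpoint_region(skeleton_img, endpoint_rc, half=8):
    """Crop a window around one skeleton endpoint (cf. step 6051)."""
    r, c = endpoint_rc
    r0, c0 = max(r - half, 0), max(c - half, 0)
    return skeleton_img[r0:r + half + 1, c0:c + half + 1]
```

The two endpoint crops would then each be described by `hog_descriptor` and fed to a binary head/tail classifier (claim 10).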
- The method according to claim 1, wherein the instance image comprises a contour image and a mask image of the target object; and the performing instance segmentation on a microscopic image to obtain an instance image comprising a target object in the microscopic image comprises: determining a region of interest, ROI, comprising the target object from the microscopic image; and performing instance segmentation on the ROI to obtain the contour image and the mask image of the target object.
- The method according to claim 2, wherein in a case that there are a plurality of target objects in the microscopic image, and the ROI comprises the plurality of target objects that overlap each other, the performing instance segmentation on the ROI to obtain the contour image and the mask image of the target object comprises: determining a ROI candidate frame based on position information of the ROI, a region selected by the ROI candidate frame comprising the ROI; determining a local image feature of the ROI from a global image feature of the microscopic image, the local image feature representing a feature of the region selected by the ROI candidate frame in the global image feature; and inputting the local image feature into a bilayer instance segmentation model, to process the local image feature through the bilayer instance segmentation model, and output respective contour images and mask images of the plurality of target objects in the ROI, the bilayer instance segmentation model being used for respectively establishing image layers each for one of different image objects to obtain an instance segmentation result for each of the image objects.
- The method according to claim 3, wherein the ROI comprises an occluder and an occludee that overlap each other; the bilayer instance segmentation model comprises an occluder layer network and an occludee layer network, the occluder layer network being used for extracting a contour and a mask of the occluder at a top layer, and the occludee layer network being used for extracting a contour and a mask of the occludee at a bottom layer; and the inputting the local image feature into a bilayer instance segmentation model, to process the local image feature through the bilayer instance segmentation model, and output respective contour images and mask images of the plurality of target objects in the ROI comprises: inputting the local image feature into the occluder layer network, to extract a first perceptual feature of the occluder at the top layer in the ROI through the occluder layer network, the first perceptual feature representing an image feature of the occluder on an instance segmentation task; upsampling the first perceptual feature to obtain a contour image and a mask image of the occluder; inputting, into the occludee layer network, a fused feature obtained by fusing the local image feature and the first perceptual feature, to extract a second perceptual feature of the occludee at the bottom layer in the ROI, the second perceptual feature representing an image feature of the occludee on the instance segmentation task; and upsampling the second perceptual feature to obtain a contour image and a mask image of the occludee.
- The method according to claim 4, wherein the occluder layer network comprises a first convolution layer, a first graph convolution layer, and a second convolution layer, the first graph convolution layer comprising a non-local operator, and the non-local operator being used for associating pixels in an image space according to similarity of corresponding eigenvectors; and the inputting the local image feature into the occluder layer network, to extract a first perceptual feature of the occluder at the top layer in the ROI through the occluder layer network comprises: inputting the local image feature into the first convolution layer of the occluder layer network, to perform a convolution operation on the local image feature through the first convolution layer to obtain an initial perceptual feature; inputting the initial perceptual feature into the first graph convolution layer of the occluder layer network, to perform a convolution operation on the initial perceptual feature through the non-local operator at the first graph convolution layer to obtain a graph convolution feature; and inputting the graph convolution feature into the second convolution layer of the occluder layer network, to perform a convolution operation on the graph convolution feature through the second convolution layer to obtain the first perceptual feature; wherein the inputting, into the occludee layer network, a fused feature obtained by fusing the local image feature and the first perceptual feature, to extract a second perceptual feature of the occludee at the bottom layer in the ROI comprises: inputting the fused feature into a third convolution layer of the occludee layer network, to perform a convolution operation on the fused feature through the third convolution layer to obtain a perceptual interaction feature; inputting the perceptual interaction feature into a second graph convolution layer of the occludee layer network, to perform a convolution operation on the perceptual 
interaction feature through a non-local operator at the second graph convolution layer to obtain a graph-convolution interaction feature; and inputting the graph-convolution interaction feature into a fourth convolution layer of the occludee layer network, to perform a convolution operation on the graph-convolution interaction feature through the fourth convolution layer to obtain the second perceptual feature.
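The non-local operator named in claim 5 (associating pixels by similarity of their feature vectors) can be sketched as a plain dot-product attention over spatial positions. This is the embedded-dot-product form without learned projections, an assumption for illustration, not the patented layer.

```python
import numpy as np

def non_local(x):
    """Minimal non-local operator: every spatial position is augmented by a
    similarity-weighted sum over all positions. x has shape (C, H, W)."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                 # one feature vector per pixel
    sim = flat.T @ flat                        # (N, N) pairwise similarity
    sim -= sim.max(axis=1, keepdims=True)      # numerically stable softmax
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)    # rows sum to one
    out = flat @ attn.T                        # aggregate similar pixels
    return x + out.reshape(c, h, w)            # residual connection
```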
- The method according to claim 3, wherein the bilayer instance segmentation model is obtained through training based on a plurality of synthetic sample images, each synthetic sample image comprising a plurality of target objects and being synthesized based on a plurality of original images each comprising only a single target object.
- The method according to claim 6, wherein in a case that the target object is darker than background in the original image, a pixel value of each pixel in the synthetic sample image is equal to a lowest pixel value among pixels in a same position in the plurality of original images used for synthesizing the synthetic sample image; or in a case that the target object is brighter than background in the original image, a pixel value of each pixel in the synthetic sample image is equal to a highest pixel value among pixels in a same position in the plurality of original images.
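The per-pixel compositing rule of claim 7 reduces to an element-wise minimum (dark objects on a bright background) or maximum (bright objects on a dark background), for example:

```python
import numpy as np

def synthesize_sample(originals, dark_objects=True):
    """Composite several single-object images into one multi-object training
    image: per-pixel minimum when objects are darker than the background,
    per-pixel maximum when they are brighter (cf. claim 7)."""
    stack = np.stack(originals, axis=0)
    return stack.min(axis=0) if dark_objects else stack.max(axis=0)
```

Because each object is the extreme value at its own pixels, both objects survive the composite even where their bounding regions overlap.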
- The method according to claim 1, wherein the skeleton extraction model comprises a plurality of cascaded convolution layers; and the inputting the instance image into a skeleton extraction model, to perform skeleton extraction on the target object through the skeleton extraction model to obtain a skeleton form image of the target object comprises: inputting the instance image into the plurality of convolution layers of the skeleton extraction model, to perform convolution operations on the instance image layer by layer through the plurality of convolution layers to obtain the skeleton form image, the skeleton extraction model being obtained through training based on a sample image comprising a target object and skeleton form label information labeled on the target object.
- The method according to claim 8, wherein the skeleton form label information comprises respective skeleton tangential angles of a plurality of sampling points for sampling a skeleton form of the target object in the sample image, and the skeleton tangential angle represents an angle between a tangent line corresponding to the sampling point as a tangent point and a horizontal line on the directed skeleton form from a head endpoint to a tail endpoint; and a loss function value of the skeleton extraction model at a training stage is determined based on errors between the respective skeleton tangential angles of the sampling points and a predicted tangential angle, the predicted tangential angle being obtained by sampling a skeleton form image obtained by predicting the sample image by the skeleton extraction model.
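The tangential-angle labels and the training loss of claim 9 can be sketched as follows; the sampling count, the finite-difference tangent estimate, and the mean-squared-error form are illustrative assumptions, not the claimed loss.

```python
import numpy as np

def tangential_angles(skeleton_pts, n_samples=20):
    """Sample a directed skeleton polyline (head -> tail, shape (K, 2) as
    (x, y)) and return each sample's tangent angle to the horizontal."""
    pts = np.asarray(skeleton_pts, dtype=float)
    idx = np.linspace(0, len(pts) - 1, n_samples).astype(int)
    d = np.gradient(pts, axis=0)               # finite-difference tangents
    return np.arctan2(d[idx, 1], d[idx, 0])

def angle_loss(pred_angles, label_angles):
    """Hypothetical loss: mean squared angular error, wrapped to (-pi, pi]."""
    diff = np.angle(np.exp(1j * (pred_angles - label_angles)))
    return float(np.mean(diff ** 2))
```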
- The method according to claim 1, wherein the recognizing (6054) the one end of the skeleton and the other end of the skeleton respectively based on the first HOG feature and the second HOG feature to obtain the head endpoint and the tail endpoint comprises: inputting the first HOG feature into a head-tail recognition model, to perform binary classification on the first HOG feature through the head-tail recognition model to obtain a first recognition result, the first recognition result representing whether the one end of the skeleton is the head endpoint or the tail endpoint; inputting the second HOG feature into the head-tail recognition model, to perform binary classification on the second HOG feature through the head-tail recognition model to obtain a second recognition result, the second recognition result representing whether the other end of the skeleton is the head endpoint or the tail endpoint; and determining the head endpoint and the tail endpoint based on the first recognition result and the second recognition result, the head-tail recognition model being used for determining, according to a HOG feature of a local endpoint region, whether an endpoint in a skeleton of a target object is a head endpoint or a tail endpoint.
- The method according to any one of claims 1 to 10, wherein the performing (503) motion analysis on the target object based on the skeleton form information to obtain a plurality of eigenvalues comprises: sampling the skeleton form of the target object based on the skeleton form information to obtain an eigenvector formed by respective skeleton tangential angles of a plurality of sampling points, the skeleton tangential angle representing an angle between a tangent line corresponding to the sampling point as a tangent point and the horizontal line on the directed skeleton form from a head endpoint to a tail endpoint; separately sampling preset skeleton forms indicated by the plurality of preset motion states, to obtain respective preset eigenvectors of the plurality of preset motion states; and decomposing the eigenvector into a sum of products of the plurality of preset eigenvectors and the plurality of eigenvalues to obtain the plurality of eigenvalues.
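The decomposition in claim 11 (an observed angle eigenvector expressed as a weighted sum of preset motion-state eigenvectors) can be computed by least squares. The sinusoidal presets in the usage below are illustrative assumptions, not the claimed motion states.

```python
import numpy as np

def motion_components(angle_vec, preset_vecs):
    """Decompose a skeleton's tangential-angle vector into a weighted sum of
    preset motion-state vectors; the weights are the eigenvalue sequence."""
    B = np.column_stack(preset_vecs)           # (n_samples, n_states) basis
    coeffs, *_ = np.linalg.lstsq(B, angle_vec, rcond=None)
    return coeffs

def reconstruct(coeffs, preset_vecs):
    """Synthesize the skeleton form back from its motion components."""
    return np.column_stack(preset_vecs) @ coeffs
```

When the observed form lies in the span of the presets, the reconstruction is exact and the coefficients recover the mixing weights.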
- A computer device, comprising one or more processors and one or more memories, the one or more memories storing at least one computer program, and the at least one computer program being loaded and executed by the one or more processors to implement the microscopic image processing method according to any one of claims 1 to 11.
- A storage medium, storing at least one computer program, the at least one computer program being loaded and executed by a processor to implement the microscopic image processing method according to any one of claims 1 to 11.
Description
FIELD OF THE TECHNOLOGY

This application relates to the field of image processing technologies, and in particular, to a microscopic image processing method and apparatus, a computer device, and a storage medium.

BACKGROUND OF THE DISCLOSURE

The nematode is a classic multicellular organism with a short life cycle. Because nematodes are small, easy to cultivate, and made up of a small quantity of cells, and can be handled in large batches like microorganisms, there is a need to study their morphology and cell pedigrees. Relevant image processing methods are known from, e.g., STEPHENS, GREG J., ET AL: "Dimensionality and Dynamics in the Behavior of C. elegans", PLOS COMPUTATIONAL BIOLOGY (2008-04-25), which discloses tracking microscopy with high spatial and temporal resolution to extract the two-dimensional shape of individual C. elegans from images of freely moving worms over long periods of time; and from KE, LEI, ET AL: "Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers", 2021 IEEE, 20 June 2021 (2021-06-20), pages 4018-4027, which discloses segmenting images in the presence of occlusions.

SUMMARY

Embodiments of this application provide a microscopic image processing method and apparatus, a computer device, and a storage medium, to reduce the labor costs of microscopic image analysis and to improve its efficiency. The technical solutions are as follows: the invention is set out in the appended set of claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a principle flowchart of a method for segmenting a round target according to an embodiment of this application.
FIG. 2 is a principle flowchart of a conventional skeleton extraction method according to an embodiment of this application.
FIG. 3 is a schematic diagram of analysis of a swimming frequency of a nematode according to an embodiment of this application.
FIG. 4 is a schematic diagram of an implementation environment of a microscopic image processing method according to an embodiment of this application.
FIG. 5 is a flowchart of a microscopic image processing method according to an embodiment of this application.
FIG. 6 is a flowchart of a microscopic image processing method according to an embodiment of this application.
FIG. 7 is a schematic diagram of a segmentation principle of a bilayer instance segmentation model according to an embodiment of this application.
FIG. 8 is a flowchart of an instance segmentation manner for two target objects according to an embodiment of this application.
FIG. 9 is a schematic principle diagram of a bilayer instance segmentation model according to an embodiment of this application.
FIG. 10 is a flowchart of synthesis of a synthetic sample image according to an embodiment of this application.
FIG. 11 is a schematic principle diagram of training and prediction stages of a skeleton extraction model according to an embodiment of this application.
FIG. 12 is a flowchart of a method for recognizing a head endpoint and a tail endpoint according to an embodiment of this application.
FIG. 13 is a schematic diagram of obtaining a local endpoint region through truncation according to an embodiment of this application.
FIG. 14 is a flowchart of motion analysis of a target object according to an embodiment of this application.
FIG. 15 is a principle diagram of motion analysis of a target object according to an embodiment of this application.
FIG. 16 is a principle flowchart of a microscopic image processing method according to an embodiment of this application.
FIG. 17 is a schematic structural diagram of a microscopic image processing apparatus according to an embodiment of this application.
FIG. 18 is a schematic structural diagram of a terminal according to an embodiment of this application.
FIG. 19 is a schematic structural diagram of a computer device according to an embodiment of this application.
DESCRIPTION OF EMBODIMENTS

Terms used in embodiments of this application are described as follows:

Nematode: As an example of a target object in this application, the nematode is a classic model organism. As a multicellular organism with a short life cycle, the nematode is small, easy to cultivate, and made up of a small quantity of cells, and can be handled in large batches like microorganisms. Therefore, the morphology and pedigrees of its constitutive cells can be studied exhaustively. A cuticle, mainly comprising collagen, lipid, and glycoprotein, forms above the epithelial layer of the nematode. The cuticle is a protective exoskeleton of the nematode and a structure necessary for the nematode to maintain its form.

Otsu's algorithm: An automatic thresholding method, self-adaptive to bimodal histograms, proposed by the Japanese scholar Nobuyuki Otsu in 1979; it is also referred to as Otsu's method, the maximum between-class variance method, the maximum variance automatic thresholding method, and the like. Through Otsu's algorithm, an image is divided into two parts, namely, background and a
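Otsu's algorithm as described above can be implemented directly from the histogram; the sketch below maximizes the between-class variance over all 8-bit levels.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's maximum between-class variance threshold for an 8-bit
    grayscale image: pick the level that best separates the two classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                      # normalized histogram
    omega = np.cumsum(p)                       # class-0 probability
    mu = np.cumsum(p * np.arange(256))         # class-0 cumulative mean
    mu_t = mu[-1]                              # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2)         # empty classes score zero
    return int(np.argmax(sigma_b2))            # maximizing threshold
```

For a cleanly bimodal image the returned level falls between the two modes, splitting background from foreground.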