
EP-4434005-B1 - MONITORING AN ENTITY IN A MEDICAL FACILITY

EP 4434005 B1

Inventors

  • BRESCH, Erik
  • BOUTS, Mark Jacobus Rosalie Joseph
  • ZUO, Fei
  • VAN DER HEIDE, Esther Marjan

Dates

Publication Date
2026-05-06
Application Date
2022-11-09

Claims (15)

  1. A computer implemented method for use in monitoring a first entity in a medical facility, the method comprising: obtaining an image of the medical facility; using a machine learning process to fit a first articulated model to the first entity in the image, wherein the first articulated model comprises keypoints corresponding to joints and affinity fields that indicate links between the keypoints; and determining a location or posture of the first entity in the medical facility from relative locations of fitted keypoints of the first articulated model in the image, characterized in that the step of using a machine learning process to fit a first articulated model to a first entity in the image comprises: using a first deep neural network to determine a first set of locations in the image corresponding to the keypoints in the first articulated model; and using a first graph-fitting process that takes as input the locations in the image corresponding to the keypoints and the affinity fields in the first model to fit the first articulated model to the first entity in the image.
  2. A method as in claim 1, wherein the keypoints correspond to position co-ordinates, and wherein the affinity fields correspond to vectors linking the co-ordinates of the relevant keypoints.
  3. A method as in claim 1 or 2, wherein the first articulated model is represented as: a tuple of co-ordinates, each coordinate in the tuple of coordinates corresponding to a keypoint, and a tuple of vectors between different pairs of co-ordinates in the tuple of co-ordinates, each vector corresponding to an affinity field.
  4. A method as in any of claims 1-3, wherein the machine learning process comprises use of a neural network.
  5. A method as in any of claims 1-4 wherein the image is a frame in a video and wherein the method further comprises repeating steps i), ii) and iii) on a sequence of frames in the video; and determining a change in posture or a change in location of the first entity across the sequence of frames.
  6. A method as in any of claims 1-5 wherein the location or posture is used to determine whether an event has occurred with respect to the first entity, wherein: the first entity is a person and wherein the event is: the person exiting a bed; the person having a seizure; or the person remaining in one position for longer than a predefined time threshold; or wherein: the first entity is a piece of medical equipment and wherein the event is: the piece of medical equipment being moved from a first location to a second location; the piece of equipment being attached to a patient; or the piece of equipment being used to perform a medical procedure on a patient.
  7. A method as in any one of the preceding claims further comprising: using the machine learning process to fit a second articulated model to a second entity in the image, wherein the second articulated model comprises keypoints corresponding to joints and affinity fields that indicate links between the keypoints; and determining an interaction between the first entity and the second entity in the image from relative locations of fitted keypoints of the first articulated model and fitted keypoints of the second articulated model, determining depth information associated with fitted keypoints in the first articulated model and fitted keypoints in the second articulated model; and wherein the step of determining an interaction between the first entity and the second entity in the image is further based on the depth information.
  8. A method as in claim 7 wherein the first entity is a clinician, the second entity is a patient and the first interaction is: contact between the clinician and the patient; or a medical procedure being performed on the patient by the clinician.
  9. A method as in claim 1 when dependent on claim 7 or 8 further comprising: using the first deep neural network to determine a second set of locations in the image corresponding to the keypoints in the second articulated model; and using a second graph-fitting process that takes as input the locations in the image corresponding to the keypoints and the affinity fields in the second model to fit the second articulated model to the second entity in the image.
  10. A method as in claim 1 when dependent on claim 7 or 8 further comprising: using a second deep neural network to determine a second set of locations in the image corresponding to the keypoints in the second articulated model; and using a second graph-fitting process that takes as input the locations in the image corresponding to the keypoints and the affinity fields in the second model to fit the second articulated model to the second entity in the image.
  11. A method as in any one of the preceding claims wherein the location or posture of the first entity is used to determine whether an item in a clinical workflow has been performed; and updating the workflow with the result of the determination.
  12. A method as in any one of claims 1 to 11 wherein the method is triggered by an item in a clinical workflow and wherein the location or posture of the first entity is used to determine whether the item has been performed; and updating the workflow with the result of the determination.
  13. A computer program product comprising a computer readable medium, the computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform the method as claimed in any one of the preceding claims.
  14. An apparatus for use in monitoring a first entity in a medical facility, the apparatus comprising: a memory comprising instruction data representing a set of instructions; and a processor configured to communicate with the memory and to execute the set of instructions, wherein the set of instructions, when executed by the processor, cause the processor to: obtain an image of the medical facility; use a machine learning process to fit a first articulated model to the first entity in the image, wherein the first articulated model comprises keypoints corresponding to joints and affinity fields that indicate links between the keypoints; and determine a location or posture of the first entity in the medical facility from relative locations of fitted keypoints of the first articulated model in the image, characterized in that the use of a machine learning process to fit a first articulated model to a first entity in the image comprises: using a first deep neural network to determine a first set of locations in the image corresponding to the keypoints in the first articulated model; and using a first graph-fitting process that takes as input the locations in the image corresponding to the keypoints and the affinity fields in the first model to fit the first articulated model to the first entity in the image.
  15. An apparatus as in claim 14 further comprising: an image acquisition unit for obtaining the image; and/or a time of flight camera to obtain image depth information for the fitted keypoints of the entity in the image.
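The articulated-model representation recited in claims 1-3 (keypoints as position co-ordinates, affinity fields as vectors linking pairs of co-ordinates, and a location or posture determined from the relative locations of fitted keypoints) can be illustrated with a minimal sketch. The three-keypoint skeleton, the function names, and the posture rule below are illustrative assumptions, not part of the claims:

```python
from typing import List, Tuple

# Per claims 2-3 (illustrative): the model is a tuple of keypoint
# co-ordinates plus a tuple of vectors between linked keypoint pairs.
Keypoint = Tuple[float, float]   # (x, y) position in the image
Limb = Tuple[int, int]           # indices of two linked keypoints

# Assumed minimal skeleton: head, neck, hip (indices 0, 1, 2).
LIMBS: List[Limb] = [(0, 1), (1, 2)]

def affinity_fields(keypoints: List[Keypoint]) -> List[Tuple[float, float]]:
    """Vectors linking the co-ordinates of each pair of linked keypoints."""
    return [(keypoints[b][0] - keypoints[a][0],
             keypoints[b][1] - keypoints[a][1]) for a, b in LIMBS]

def posture(keypoints: List[Keypoint]) -> str:
    """Toy posture rule from relative keypoint locations: a roughly
    vertical head-to-hip axis suggests upright, otherwise lying."""
    (hx, hy), _, (px, py) = keypoints
    dx, dy = px - hx, py - hy
    return "upright" if abs(dy) > abs(dx) else "lying"

standing = [(100.0, 20.0), (100.0, 60.0), (102.0, 120.0)]
print(affinity_fields(standing))  # [(0.0, 40.0), (2.0, 60.0)]
print(posture(standing))          # upright
```

In a real system the keypoint locations would come from the first deep neural network of claim 1, with the graph-fitting step resolving which detected keypoints belong to which entity; here they are supplied by hand.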

Description

FIELD OF THE INVENTION

The disclosure herein relates to monitoring an entity (e.g. person, clinician, piece of equipment) in a medical facility.

BACKGROUND OF THE INVENTION

Workflows (otherwise known as clinical workflows) are used in medical facilities (hospitals, clinics, etc.) to ensure that appropriate actions are taken for each patient, in a standardized manner. This helps ensure best practice in medical facilities and compliance with clinical guidelines. Workflows often specify a particular set of tasks or checks (items in the workflow) that should be performed with respect to the patient. Workflows may be used at all stages of the patient's treatment. For example, there may be a workflow associated with admitting the patient to the medical facility; another workflow associated with triage of the patient; and subsequent workflows that are used dependent on the particular issues or treatment pathways identified for the patient.

Workflow management (e.g. recording when actions in a workflow have been performed) is a significant, yet necessary, overhead in medical facilities. As such, automated analysis, optimization, and control of clinical workflows is an ongoing area of active research. Aside from workflow management, there are other tasks in medical facilities that it is desirable to automate, for example, equipment and/or patient tracking. The disclosure herein aims to address these and other problems.

SUMMARY OF THE INVENTION

Various projects have aimed to automate different aspects of workflow management. Previous work in this area has, for example, tracked patients, medical staff, and equipment in hospital settings using infra-red light sensor tags with a view to improving resource allocation and avoiding supply bottlenecks, e.g. in an emergency department. However, such data is often comparatively coarsely resolved in time and space, and the subsequent semantic understanding of the clinical processes is far from easy.
Another project proposes the use of in-hospital video (infra-red and/or depth) data, which offers much richer information: it allows capture of the presence, location, and activities of multiple people, e.g. medical care givers and patients, as well as the use of medical equipment, in great spatial and temporal detail. The room set-up and devices, in combination with the information on the people in an image, can give a complete view of the context. Video monitoring directly captures events such as a nurse changing an infusion pump, a nurse working with a monitor, a patient sitting in a chair for a certain time, etc. However, a significant challenge is the automation of such video analysis by means of computer algorithms. In particular, clinical environments often present cluttered and highly complex scenes in which classic image processing techniques for object detection and tracking tend to struggle or fail entirely.

Artificial Intelligence (AI) technology and, in particular, deep learning (DL) methods for large neural networks provide an opportunity for real-time video analysis. For example, the You Only Look Once (YOLO) algorithm described in the paper by J. Redmon, S. Divvala, R. Girshick and A. Farhadi ("You Only Look Once: Unified, Real-Time Object Detection," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788, doi: 10.1109/CVPR.2016.91) enables real-time identification and tracking of objects in a video stream. YOLO produces bounding boxes identifying the locations of desired objects, but has the disadvantage that it is difficult to infer further semantic meaning from the video feeds using YOLO alone. Another deep neural network solution, namely the "OpenPose" algorithm (Hidalgo et al., "Single-Network Whole-Body Pose Estimation", 2019), is capable of detecting humans in image and video data.
OpenPose provides more information than YOLO, as its outputs include the locations of keypoints, which might include (depending on the precise model used) the locations of the head, shoulders, hips, and elbows of the people in the images. It has been realized by the inventors herein that algorithms such as OpenPose might be advantageously applied in medical facilities to extract semantic information from a video feed, allowing deeper understanding of the events taking place in the hospital. As will be described in more detail below, such semantic information may thus be used to update clinical workflows in a reliable and automated manner.

Summary

The invention is defined by the appended claims. Thus, according to a first aspect herein there is a method for use in monitoring a first entity in a medical facility, the method comprising: i) obtaining an image of the medical facility; ii) using a machine learning process to fit a first articulated model to the first entity in the image, wherein the first articulated model comprises keypoints corresponding to joints and affinity fields that indicate links between the keypoints; and iii) determining a location or posture of the first entity in the medical facility from relative locations of fitted keypoints of the first articulated model in the image.
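The kind of semantic event detection contemplated here (e.g. claim 6's "person exiting a bed") can be sketched by comparing fitted keypoint locations across frames against a known region of the scene. The bed co-ordinates, the majority-vote rule, and the frame data below are illustrative assumptions only, not the patented method:

```python
# Hypothetical bed-exit check: a person's fitted keypoints (e.g. from an
# OpenPose-style network) are tested against an assumed bed region; an
# "exit" event fires when the person was in bed and is no longer.
BED = (50.0, 200.0, 40.0, 100.0)  # x_min, x_max, y_min, y_max (image co-ords)

def in_bed(keypoints, bed=BED):
    """The person counts as 'in bed' if a majority of fitted keypoints
    fall inside the bed's bounding region (an illustrative rule)."""
    x0, x1, y0, y1 = bed
    inside = sum(1 for x, y in keypoints if x0 <= x <= x1 and y0 <= y <= y1)
    return inside >= len(keypoints) / 2

def bed_exit(prev_keypoints, curr_keypoints):
    """A bed-exit event: in bed in the previous frame, out of it now."""
    return in_bed(prev_keypoints) and not in_bed(curr_keypoints)

frame_a = [(100.0, 60.0), (120.0, 70.0), (140.0, 80.0)]   # lying in bed
frame_b = [(250.0, 60.0), (255.0, 90.0), (260.0, 120.0)]  # beside the bed
print(bed_exit(frame_a, frame_b))  # True
```

Such an event result is exactly the kind of semantic output that could then mark an item in a clinical workflow as performed, as described later in the claims.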