CN-122024153-A - Method and device for recognizing falls of elderly people in nursing homes based on a multimodal large model

Abstract

The invention discloses a method and a device for recognizing falls of elderly people in a nursing home based on a multimodal large model. The method first obtains a set of bounding-box coordinates for elderly-person targets and extracts target regions based on that set to obtain a set of elderly-person behavior images. It then pairs a question text with each behavior image in the set, inputs the paired question text and image into a multimodal large model, and performs fall detection on the behavior image set to obtain a fall analysis answer list. Semantic analysis and classification are then applied to the fall analysis answer list to obtain a fall sample index set, and, based on the fall sample index set and the bounding-box coordinate set, fall events are located and visually marked on the nursing home video surveillance image to obtain an annotated fall recognition image. Compared with traditional detection methods, the method achieves higher recognition accuracy and a lower missed-detection rate.
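
The answer-classification and localization steps described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not specify how the semantic analysis is done, so the keyword and negation patterns, the function names `classify_answer`, `fall_sample_indices`, and `locate_falls`, and the example answers are all assumptions made for illustration.

```python
import re

# Hypothetical patterns: the patent does not state its parsing rules.
# A clause that negates "fall" (e.g. "no fall", "not a fall") counts as non-fall.
NEGATED_FALL = re.compile(r"\b(?:no|not|never|without)\b[^.;]*\bfall", re.IGNORECASE)
FALL_WORD = re.compile(r"\b(?:fall(?:en|s|ing)?|fell)\b", re.IGNORECASE)


def classify_answer(answer: str) -> bool:
    """Return True if the answer text describes a fall (simple keyword heuristic)."""
    if NEGATED_FALL.search(answer):
        return False
    return bool(FALL_WORD.search(answer))


def fall_sample_indices(answers: list[str]) -> list[int]:
    """Build the fall sample index set from the fall analysis answer list."""
    return [i for i, answer in enumerate(answers) if classify_answer(answer)]


def locate_falls(indices: list[int], boxes: list[tuple[int, int, int, int]]):
    """Map fall sample indices back to the bounding boxes to be marked."""
    return [boxes[i] for i in indices]


# Illustrative data: answers[i] corresponds to boxes[i] (x1, y1, x2, y2).
answers = [
    "The elderly person has fallen to the floor.",
    "No fall is visible; the person is sitting on a chair.",
    "A staff member is bending down to clean; this is not a fall.",
]
boxes = [(10, 20, 110, 220), (300, 40, 380, 200), (500, 60, 560, 210)]

indices = fall_sample_indices(answers)
fall_boxes = locate_falls(indices, boxes)
```

Because each answer keeps the same position as the image and bounding box it was produced from, the index set is enough to find which regions of the surveillance image to mark. A keyword heuristic like this is fragile (contractions such as "hasn't fallen" are not handled), so a real system would likely constrain the answer format in the question text.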

Inventors

SHAO QIKE
LIU LEI
FANG CHEN
WANG ZHONGQI
Niu Shuaixiang
XIE JUNYU
YAN SHIHANG

Assignees

Zhejiang University of Technology (浙江工业大学)

Dates

Publication Date: 20260512
Application Date: 20251209

Claims (7)

1. A method for recognizing falls of elderly people in a nursing home based on a multimodal large model, characterized by comprising the following steps: capturing a camera video stream of the nursing home in real time to obtain a nursing home video surveillance image, performing person recognition on the nursing home video surveillance image with a pre-trained object detection model to obtain an elderly-person target bounding-box coordinate set, and extracting target regions based on the elderly-person target bounding-box coordinate set to obtain an elderly-person behavior image set; pairing a question text with each elderly-person behavior image in the elderly-person behavior image set, inputting them into a multimodal large model, and performing fall detection on the elderly-person behavior image set to obtain a fall analysis answer list; performing semantic analysis and classification on the fall analysis answer list to obtain a fall sample index set; and, based on the fall sample index set and the elderly-person target bounding-box coordinate set, locating and visually marking fall events on the nursing home video surveillance image to obtain a nursing home elderly fall recognition annotation image.
2. The method for recognizing falls of elderly people in a nursing home based on a multimodal large model according to claim 1, wherein extracting target regions based on the elderly-person target bounding-box coordinate set to obtain the elderly-person behavior image set comprises: performing context expansion on each elderly-person target bounding box in the elderly-person target bounding-box coordinate set to obtain an expanded target bounding-box coordinate set; and extracting target region images from the nursing home video surveillance image based on the expanded target bounding-box coordinate set to obtain the elderly-person behavior image set.
3. The method for recognizing falls of elderly people in a nursing home based on a multimodal large model according to claim 1, wherein, before pairing a question text with each elderly-person behavior image in the elderly-person behavior image set and inputting them into the multimodal large model, the method further comprises: constructing a fine-tuning training sample set; and fine-tuning the multimodal large model with the fine-tuning training sample set.
4. The method for recognizing falls of elderly people in a nursing home based on a multimodal large model according to claim 3, wherein the multimodal large model comprises a large language model and a visual encoder, and, during fine-tuning of the multimodal large model, the visual encoder is frozen and the linear layers in the large language model are fine-tuned.
5. The method for recognizing falls of elderly people in a nursing home based on a multimodal large model according to claim 3, wherein the fine-tuning training sample set comprises fall samples, first non-fall samples, second non-fall samples, and other samples, wherein no elderly person has fallen in the first non-fall samples, nursing home staff are detected in the second non-fall samples, and the other samples are samples with insufficient posture information or image blur.
6. The method for recognizing falls of elderly people in a nursing home based on a multimodal large model according to claim 1, wherein locating and visually marking fall events on the nursing home video surveillance image based on the fall sample index set and the elderly-person target bounding-box coordinate set to obtain the nursing home elderly fall recognition annotation image further comprises: the fall analysis answer list comprises answer texts, and the answer texts of the fall samples and the corresponding elderly-person target bounding-box coordinates are output together with the nursing home elderly fall recognition annotation image.
7. A device for recognizing falls of elderly people in a nursing home based on a multimodal large model, comprising a processor and a memory storing a number of computer instructions, characterized in that the computer instructions, when executed by the processor, implement the steps of the method of any one of claims 1 to 6.
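
The context expansion and target region extraction recited in claim 2 might look like the following sketch. The claims do not state how much each box is expanded or how the image is stored, so the proportional `ratio` parameter, the clamping to the image bounds, the nested-list image representation, and the function names `expand_box` and `crop` are assumptions chosen for illustration.

```python
def expand_box(box, image_width, image_height, ratio=0.2):
    """Grow a (x1, y1, x2, y2) box by `ratio` of its size on every side.

    The expanded box is clamped to the image so the crop stays valid.
    `ratio` is a hypothetical parameter; the patent gives no value.
    """
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * ratio
    dy = (y2 - y1) * ratio
    return (
        max(0, int(round(x1 - dx))),
        max(0, int(round(y1 - dy))),
        min(image_width, int(round(x2 + dx))),
        min(image_height, int(round(y2 + dy))),
    )


def crop(image, box):
    """Extract the region of `image` (a list of pixel rows) covered by `box`."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


# Illustrative use on a 640x480 frame.
expanded = expand_box((100, 100, 200, 300), 640, 480)    # box in the interior
at_corner = expand_box((0, 0, 50, 50), 640, 480)         # clamped at top-left
at_edge = expand_box((600, 400, 640, 480), 640, 480, ratio=0.5)

tiny_image = [[r * 4 + c for c in range(4)] for r in range(4)]
patch = crop(tiny_image, (1, 1, 3, 3))
```

Expanding the box before cropping keeps some of the surroundings (floor, furniture, nearby people) in the behavior image, which plausibly gives the multimodal model the context it needs to judge posture.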

Description

Method and device for recognizing falls of elderly people in nursing homes based on a multimodal large model

Technical Field

The application belongs to the technical field of computer vision, and in particular relates to a method and a device for recognizing falls of elderly people in a nursing home based on a multimodal large model.

Background

With the rapid development of medicine and the continuing improvement of social systems, nursing homes, as core places of concentrated care for the elderly, take in a large number of elderly people with limited mobility and high disease risk. Because of declining physical function, the incidence of falls among these residents is markedly higher than among the general elderly population. If a person is not treated promptly after a fall, serious complications such as fractures and intracranial hemorrhage can directly threaten their life. Real-time, effective automatic fall detection based on the behavior state of the elderly is therefore an important part of safety management in smart nursing homes.

Currently, methods for recognizing falls of the elderly in nursing homes rely mainly on wearable sensor technology and computer vision technology. In wearable-sensor-based detection, the core approach is to collect acceleration and angular velocity data of human movement through devices such as accelerometers and gyroscopes worn by the elderly person, build a classification model with a machine learning algorithm, and recognize a fall event by judging whether a motion parameter exceeds a fall threshold (for example, a sudden change of acceleration in the vertical direction).
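
The threshold test used by wearable-sensor detectors can be illustrated with a short sketch. It covers only the threshold check, not the machine-learning classifier, and the function name `detect_fall`, the 15 m/s² threshold, and the sample readings are hypothetical values chosen for illustration rather than figures from the application.

```python
def detect_fall(vertical_accel, threshold=15.0):
    """Flag a fall when vertical acceleration jumps sharply between samples.

    `vertical_accel` is a sequence of readings in m/s^2 taken at a fixed rate.
    `threshold` is an assumed calibration value, not one given in the source.
    """
    return any(
        abs(current - previous) > threshold
        for previous, current in zip(vertical_accel, vertical_accel[1:])
    )


# An abrupt impact produces a large jump between consecutive readings.
abrupt = detect_fall([9.8, 9.7, 9.9, 30.2, 9.8])

# A slow descent changes gradually and never exceeds the threshold.
slow = detect_fall([9.8, 9.0, 7.5, 6.0, 5.0, 6.5, 9.8])
```

In this example the abrupt sequence is flagged while the slow one is not, which shows how a fixed threshold can miss low-amplitude events.
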
The wearable-sensor approach has three shortcomings. First, the elderly person must wear the device continuously, and monitoring is interrupted when the device is uncomfortable or the person forgets to wear it; the approach is especially unsuitable for elderly people with dementia or partial disability. Second, the detection threshold depends on calibration in a specific experimental scene, so missed detections and low detection accuracy occur easily when the amplitude of the person's movement is small, as in a slow fall, or when the wearing position of the device shifts. Third, a wearable sensor device must be assigned to each elderly resident of the nursing home, so purchasing the devices, maintaining them, and updating the algorithms incur considerable cost.

Fall recognition methods built on computer vision, on the other hand, generally adopt a human pose estimation model based on a deep neural network (such as the YOLO Pose and OpenPose series of models) and train it specifically to meet practical needs. However, this traditional detection approach based on a single-modal vision model exposes a number of drawbacks in practice. First, learning comprehensive visual features often requires massive amounts of annotated data for training, which is time-consuming and labor-intensive and raises the cost threshold for application. Second, its generalization ability is relatively poor: when key body parts of the elderly person are occluded by furniture in a real scene, or when lighting changes make image brightness differ greatly from the training data, the recognition performance of the model drops markedly, resulting in low recognition accuracy.
Most importantly, it is difficult to add extra logical requirements to a small single-modal model. For example, staff must regularly clean the rooms of elderly residents in the nursing home, and their posture while doing so is sometimes not clearly distinguishable from a fall; a model that does not logically distinguish elderly residents from staff therefore tends to have a high false alarm rate.

Disclosure of Invention

The application aims to provide a method and a device for recognizing falls of elderly people in a nursing home based on a multimodal large model, so as to solve the problems of traditional detection methods described in the background, such as high purchasing and model training costs, insufficient generalization ability, and low detection accuracy and high false alarm rates in complex scenes. To achieve the above purpose, the technical scheme of the application is as follows: a method for recognizing falls of elderly people in a nursing home based on a multimodal large model comprises the following steps: capturing a camera video stream of the nursing home in real time to obtain a nursing home video surveillance image, performing person recognition on the nursing home video surveillance image with a pre-trained object detection model to obtain an elderly-person target bounding-box coordinate set, and extracting target regions based on t