US-20260125012-A1 - VEHICLE WINDOW CONTROL METHOD AND VEHICLE WINDOW CONTROL DEVICE
Abstract
The present disclosure relates to a multi-mode vehicle window control method and device. The method combines voice recognition and computer vision technologies: a vehicle-mounted microphone detects a user's spoken wake-up word, a vehicle-mounted camera is then turned on to detect the user's face and head, and the user's gaze, face pose, and head pose are analyzed to estimate the user's attention region, which is aligned with a vehicle body coordinate system. Intelligent control of a vehicle window is realized according to the specific coordinates of the attention region on the window. Even if the user's eyes or face are not detected, the window may be operated accurately by determining the user's intention from the face pose or head pose.
Inventors
- Jun Zhou
- Yicheng Fan
- Sungbo Yang
- Yao Yao
Assignees
- HYUNDAI MOTOR COMPANY
- KIA CORPORATION
Dates
- Publication Date: 2026-05-07
- Application Date: 2025-11-04
- Priority Date: 2024-11-07
Claims (20)
- 1 . A method performed by an apparatus of a vehicle, the method comprising: detecting, via a microphone of the vehicle, a spoken preset wake-up word associated with a user of the vehicle; turning on a sensor of the vehicle based on the detecting of the spoken preset wake-up word; obtaining, via the sensor, image data associated with face and head of the user; determining, based on the image data, a face state of the user; performing at least one of: when the face state indicates that eyes on the face are visible, estimating a gaze of the user and setting, as a first attention region, a projection region of the vehicle toward which the gaze is directed; when the face state indicates that eyes on the face are invisible and the face is visible, estimating a face pose of the user and setting, as a second attention region, a projection region of the vehicle toward which the face pose is directed; or when the face state indicates that the face is invisible, estimating a head pose of the user and setting, as a third attention region, a projection region of the vehicle toward which the head pose is directed; aligning one attention region of the first attention region, the second attention region, or the third attention region, with a body coordinate system of the vehicle to determine coordinates of attention of the user on a window of the vehicle; and based on the spoken preset wake-up word and the coordinates of the attention on the window, performing opening or closing control of the window.
- 2 . The method of claim 1 , wherein the determining of the face state of the user comprises: performing point detection on the face to obtain coordinate information of facial feature points; and estimating glabella depth using the coordinate information of the facial feature points to determine depth information associated with the face in a three-dimensional space.
- 3 . The method of claim 1 , wherein the performing of the opening or closing control of the window is based on the attention remaining longer than a predetermined duration threshold.
- 4 . The method of claim 2 , wherein the performing of the opening or closing control of the window is based on a determination that the attention remains on a preset position in the vehicle longer than a predetermined duration threshold.
- 5 . The method of claim 1 , further comprising: after the detecting of the spoken preset wake-up word and before the obtaining of the image data, performing dynamic calibration on the sensor to obtain external parameter information of the sensor with respect to the body coordinate system, wherein the external parameter information is used to align the image data with the body coordinate system when the sensor is at different positions.
- 6 . The method of claim 2 , further comprising: between the detecting of the spoken preset wake-up word and the obtaining of the image data, obtaining external parameter information of the sensor with respect to the body coordinate system, wherein the external parameter information is used to align the image data with the body coordinate system.
- 7 . An apparatus of a vehicle, the apparatus comprising: a microphone configured to detect a spoken preset wake-up word of a user; a sensor configured to obtain image data associated with face and head of the user; and a processor circuit configured to: turn on the sensor based on the spoken preset wake-up word, determine, based on the image data, a face state of the user, based on the face state of the user indicating that eyes on the face are visible, estimate a gaze of the user and set, as a first attention region, a projection region of the vehicle toward which the gaze is directed, based on the face state of the user indicating that eyes on the face are invisible and the face is visible, estimate a face pose of the user and set, as a second attention region, a projection region of the vehicle toward which the face pose is directed, based on the face state of the user indicating that the face is invisible, estimate a head pose of the user and set, as a third attention region, a projection region of the vehicle toward which the head pose is directed, align one attention region of the first attention region, the second attention region, or the third attention region, with a body coordinate system of the vehicle to determine coordinates of attention of the user on a window of the vehicle, and based on the spoken preset wake-up word and the coordinates of the attention on the window, perform opening or closing control of the window.
- 8 . The apparatus of claim 7 , wherein the processor circuit is configured to: perform point detection on the face to obtain coordinate information of facial feature points, and estimate glabella depth using the coordinate information of the facial feature points to determine depth information associated with the face in a three-dimensional space.
- 9 . The apparatus of claim 7 , wherein the processor circuit is configured to perform the opening or closing control of the window based on the attention remaining longer than a predetermined duration threshold.
- 10 . The apparatus of claim 8 , wherein the processor circuit is configured to perform the opening or closing control of the window based on a determination that the attention remains on a preset position in the vehicle longer than a predetermined duration threshold.
- 11 . The apparatus of claim 7 , wherein the processor circuit is configured to perform dynamic calibration on the sensor to obtain external parameter information of the sensor with respect to the body coordinate system, wherein the external parameter information is used to align the image data with the body coordinate system when the sensor is at different positions.
- 12 . The apparatus of claim 8 , wherein the processor circuit is configured to obtain external parameter information of the sensor with respect to the body coordinate system, wherein the external parameter information is configured to be used to align the image data with the body coordinate system.
- 13 . A vehicle comprising: at least one sensor configured to obtain interior data of the vehicle, wherein the interior data comprises at least one of voice data of an occupant and image data of the occupant captured within a cabin of the vehicle; and a processor circuit configured to: detect, from the voice data, a voice command associated with a window of the vehicle, process, from the image data, at least one of a gaze vector, a face pose, or a head pose of the occupant to estimate an attention region of the occupant, determine, based on the estimated attention region, a position of the window, output, based on the detected voice command and the position of the window, a signal indicating to operate the window, and control, based on the signal, operation of the window.
- 14 . The vehicle of claim 13 , wherein the processor circuit is configured to detect a user-defined wake-up word as part of the voice command associated with the window.
- 15 . The vehicle of claim 13 , wherein the processor circuit is configured to prioritize execution of the operation of the window based on a voice command of a driver of the vehicle over a voice command of a passenger of the vehicle.
- 16 . The vehicle of claim 13 , wherein the processor circuit is configured to, when eyes of the occupant are invisible, estimate the attention region based on the face pose.
- 17 . The vehicle of claim 13 , wherein the processor circuit is configured to, when both eyes and face of the occupant are invisible, estimate the attention region based on the head pose.
- 18 . The vehicle of claim 13 , wherein the processor circuit is configured to generate the signal based on the attention region remaining on the window for at least a predetermined time period.
- 19 . The vehicle of claim 13 , wherein the processor circuit is configured to, based on determining lighting conditions inside the cabin as insufficient, obtain an infrared image of the occupant and perform face depth estimation using the infrared image.
- 20 . The vehicle of claim 13 , wherein the processor circuit is configured to, based on determining at least one of a vehicle speed, an outside temperature, or a stored user preference, control a degree of opening of the window.
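The dwell-time condition recited in claims 3, 9, and 18 (triggering window control only when attention remains on the window longer than a predetermined duration threshold) can be sketched as follows. This is an illustrative sketch and not part of the claims: the function name `dwell_trigger`, the `(timestamp, window_id)` sample format, and the threshold value are assumptions made for illustration only.

```python
# Illustrative sketch of the dwell-time gating of claims 3, 9, and 18:
# a window command fires only after the user's attention has remained on
# the same window for at least a predetermined duration.

def dwell_trigger(samples, threshold_s):
    """Return the window id whose attention dwell first reaches threshold_s.

    samples: time-ordered list of (timestamp_s, window_id-or-None) pairs,
    where None means attention was not on any window at that instant.
    """
    current = None   # window currently being attended
    start = None     # timestamp when attention on `current` began
    for t, win in samples:
        if win != current:
            # attention moved to a different window (or was lost): restart timer
            current = win
            start = t
        elif win is not None and t - start >= threshold_s:
            return win           # dwell threshold reached
    return None                  # no window held attention long enough
```

Note that any interruption (a `None` sample or a switch to another window) restarts the timer, so only continuous attention satisfies the "remains longer than a predetermined duration threshold" condition.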
Description
CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of priority to Chinese Patent Application No. 202411580916.9, filed in the Chinese National Intellectual Property Administration on Nov. 7, 2024, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present disclosure relates to vehicle control technology and, more particularly, to a multi-mode vehicle window control method combining computer vision and voice recognition.

BACKGROUND

The matters described in this Background section are provided only to enhance understanding of the background of the disclosure and should not be taken as an acknowledgment that they correspond to prior art already known to those skilled in the art.

Control of vehicle comfort, convenience, and intelligence may be a core element in the design and manufacturing of vehicles. With the advancement of technology, vehicle window control systems may also be gradually evolving toward automation and intelligence. However, vehicle window control may still rely primarily on manual operation, in which the raising and lowering of the vehicle window is realized through a physical button or switch. While this method is simple and intuitive, it has certain limitations in practical use.

First, manual control requires the driver or a passenger to directly operate a button, which may cause inconvenience in certain situations. For example, when the driver needs to focus on driving, operating the vehicle window button may distract attention, which may affect driving safety. In addition, in an emergency situation, the ability to react quickly to open or close the vehicle window may be limited. Next, vehicle window control technologies may lack the ability to intelligently sense the in-cabin environment and the occupant's intention.
The opening and closing status of the vehicle window often may not be dynamically adjusted according to changes in the vehicle's interior and exterior environment and the needs of the occupants. For example, when the in-cabin air quality is poor, the system may not automatically adjust the vehicle window to improve airflow, and when external noise is too loud, it may not automatically close the vehicle window to reduce noise interference.

To address the aforementioned issues, some technologies may introduce computer vision and artificial intelligence to improve the intelligence level of vehicle window control. For example, through camera and image processing technology, the system may recognize an in-cabin occupant's facial expression, gaze direction, and even gesture movements. By combining voice recognition technology, the vehicle window control system may receive and interpret passengers' voice commands to realize more intuitive and convenient operation. Such technologies may improve the degree of automation of vehicle window operation and reduce the need for manual operation, which improves the convenience of vehicle window control to a certain extent.

However, vision-based control may rely heavily on the gaze direction of the human eye; if the eyes are occluded, blurred, or lost, or the face is lost, the estimation of the gaze direction fails, which may lead to a lack of control information and greatly degrade the user experience. Furthermore, due to limited accuracy, stability, and convenience, such approaches may find it difficult to satisfy the needs of a convenient and fast intelligent cabin.

SUMMARY

An example of the present disclosure provides a vehicle window control method that integrates vision and voice in multiple modes and that may accurately perform vehicle window control even when no eyes or no face are detected.
According to the present disclosure, a method performed by an apparatus of a vehicle may comprise detecting, via a microphone of the vehicle, a spoken preset wake-up word associated with a user of the vehicle, turning on a sensor of the vehicle based on the detecting of the spoken preset wake-up word, obtaining, via the sensor, image data associated with a face and a head of the user, and determining, based on the image data, a face state of the user. The method may further comprise performing at least one of the following: when the face state indicates that eyes on the face are visible, estimating a gaze of the user and setting, as a first attention region, a projection region of the vehicle toward which the gaze is directed; when the face state indicates that eyes on the face are invisible and the face is visible, estimating a face pose of the user and setting, as a second attention region, a projection region of the vehicle toward which the face pose is directed; or when the face state indicates that the face is invisible, estimating a head pose of the user and setting, as a third attention region, a projection region of the vehicle toward which the head pose is directed. The method may further comprise aligning one attention region of the first attention region, the second attention region, or the third attention region, with a body coordinate system of the vehicle to determine coordinates of attention of the user on a window of the vehicle, and, based on the spoken preset wake-up word and the coordinates of the attention on the window, performing opening or closing control of the window.
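The fallback cascade (gaze when the eyes are visible, face pose when only the face is visible, head pose otherwise) and the alignment of the resulting attention direction with the body coordinate system can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the `FaceState` structure, the extrinsic rotation `R_cam_to_body`, and the window-plane parameters are names invented for this example.

```python
# Illustrative sketch (not taken from the disclosure) of the eyes -> face ->
# head fallback cascade and the camera-to-body-frame alignment. All names
# here are assumptions made for illustration.
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class FaceState:
    eyes_visible: bool
    face_visible: bool
    gaze_dir: Optional[np.ndarray] = None   # unit vectors in the camera frame
    face_dir: Optional[np.ndarray] = None
    head_dir: Optional[np.ndarray] = None


def attention_direction(state: FaceState) -> np.ndarray:
    """Eyes visible -> gaze; eyes invisible but face visible -> face pose;
    face invisible -> head pose."""
    if state.eyes_visible:
        return state.gaze_dir
    if state.face_visible:
        return state.face_dir
    return state.head_dir


def window_coordinates(origin, direction, R_cam_to_body, plane_point, plane_normal):
    """Rotate a camera-frame direction into the body frame using the camera's
    extrinsic rotation, then intersect the attention ray with a window plane
    to obtain the coordinates of attention on the window (or None on a miss)."""
    d = R_cam_to_body @ np.asarray(direction, dtype=float)
    n = np.asarray(plane_normal, dtype=float)
    denom = n @ d
    if abs(denom) < 1e-9:
        return None                       # ray parallel to the window plane
    t = n @ (np.asarray(plane_point, float) - np.asarray(origin, float)) / denom
    if t < 0:
        return None                       # window plane is behind the user
    return np.asarray(origin, float) + t * d
```

For example, with the head at the body-frame origin, an identity extrinsic rotation, and a side-window plane passing through (0, 1, 0) with normal (0, 1, 0), an attention direction along +y intersects the window at (0, 1, 0); those coordinates, together with the wake-up word, would then drive the opening or closing control.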