EP-4604117-B1 - METHOD AND SYSTEM FOR A VOICE ASSISTED QUALITY INSPECTION


Inventors

SHARMA, HIMANSHU
VISHWAKARMA, Arpit
MALHOTRA, CHETAN PREMKUMAR
BASAVARSU, Purushottham Gautham

Dates

Publication Date: 2026-05-13
Application Date: 2025-02-05

Claims (12)

1. A processor-implemented method (300) comprising: receiving (302), via an Input/Output (I/O) interface, one or more voice inputs from an inspector while inspecting an artifact, wherein the artifact is a physical entity subjected to inspection; pre-processing (304), via one or more hardware processors, the one or more voice inputs of the inspector to remove noise using a predefined filtering technique; converting (306), via the one or more hardware processors, the one or more pre-processed voice inputs into a predefined language text using a predefined language translation technique to ensure uniformity; characterised in that the predefined language translation technique includes a universal language model, and a custom speech model, and in that the method further comprises: transforming (308), via the one or more hardware processors, the predefined language text into one or more structured defects for a structured defect log on a domain model of artifact under inspection using a large language model (LLM); analyzing (310), via the one or more hardware processors, the structured defect log to map a defect name, a defect type, and a defect location to each sub-section of the physical entity represented in the domain model using the large language model (LLM); and converting (312), via the one or more hardware processors, the analyzed one or more structured defects into a voice format to facilitate an effective communication to a user for repair and for reporting purpose.
2. The processor-implemented method (300) as claimed in claim 1, wherein a voice to text conversion model is integrated with the large language model and the domain model of artifact to convert one or more defects spoken by the inspector into the one or more structured defects represented by the domain model.
3. The processor-implemented method (300) as claimed in claim 1, wherein the defect type, the defect location and the sub-section of the artifact is identified from the one or more voice inputs from the inspector.
4. The processor-implemented method (300) as claimed in claim 1, wherein the one or more structured defects are represented in at least one of an orthogonal view, a two-dimensional (2D) view, a three-dimensional (3D) view of the artifact being inspected for verification.
5. A system (100) comprising: a memory (110) storing instructions; one or more Input/Output (I/O) interfaces (104); and one or more hardware processors (108) coupled to the memory (110) via the one or more I/O interfaces (104), wherein the one or more hardware processors (108) are configured by the instructions to: receive one or more voice inputs from an inspector while inspecting an artifact, wherein the artifact is a physical entity subjected to inspection; pre-process the one or more voice inputs of the inspector to remove noise using a predefined filtering technique; convert the one or more pre-processed voice inputs into a predefined language text using a predefined language translation technique to ensure uniformity; characterised in that the predefined language translation technique includes a universal language model, and a custom speech model, and in that the one or more hardware processors (108) are further configured by the instructions to: transform the predefined language text into one or more structured defects for a structured defect log on a domain model of artifact under inspection using a large language model (LLM); analyze the structured defect log to map a defect name, a defect type, and a defect location to each sub-section of the physical entity represented in the domain model using the large language model (LLM); and convert the analyzed one or more structured defects into a voice format to facilitate an effective communication to a user for repair and for reporting purpose.
6. The system (100) as claimed in claim 5, wherein a voice to text conversion model is integrated with the large language model and the domain model of artifact to convert one or more defects spoken by the inspector into the one or more structured defects represented by the domain model.
7. The system (100) as claimed in claim 5, wherein the defect type, the defect location and the sub-section of the artifact is identified from the one or more voice inputs from the inspector.
8. The system (100) as claimed in claim 5, wherein the one or more structured defects are represented in at least one of an orthogonal view, a two-dimensional (2D) view, a three-dimensional (3D) view of the artifact being inspected for verification.
9. One or more non-transitory machine-readable information storage mediums comprising one or more instructions which when executed by one or more hardware processors cause: receiving, via an Input/Output (I/O) interface, one or more voice inputs from an inspector while inspecting an artifact, wherein the artifact is a physical entity subjected to inspection; pre-processing the one or more voice inputs of the inspector to remove noise using a predefined filtering technique; converting the one or more pre-processed voice inputs into a predefined language text using a predefined language translation technique to ensure uniformity; characterised in that the predefined language translation technique includes a universal language model, and a custom speech model, and in that the one or more instructions which when executed by the one or more hardware processors further cause: transforming the predefined language text into one or more structured defects for a structured defect log on a domain model of artifact under inspection using a large language model (LLM); analyzing the structured defect log to map a defect name, a defect type, and a defect location to each sub-section of the physical entity represented in the domain model using the large language model (LLM); and converting the analyzed one or more structured defects into a voice format to facilitate an effective communication to a user for repair and for reporting purposes.
10. The one or more non-transitory machine-readable information storage mediums of claim 9, wherein a voice to text conversion model is integrated with the large language model and the domain model of artifact to convert one or more defects spoken by the inspector into the one or more structured defects represented by the domain model.
11. The one or more non-transitory machine-readable information storage mediums of claim 9, wherein the defect type, the defect location and the sub-section of the artifact is identified from the one or more voice inputs from the inspector.
12. The one or more non-transitory machine-readable information storage mediums of claim 9, wherein the one or more structured defects are represented in at least one of an orthogonal view, a two-dimensional (2D) view, a three-dimensional (3D) view of the artifact being inspected for verification.
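
The transforming, analyzing, and voice-output steps recited in the claims above can be illustrated with a minimal sketch. This is not the claimed implementation: simple keyword matching stands in for the large language model, and the domain model contents and every name used (`DOMAIN_MODEL`, `StructuredDefect`, `DefectLog`, `to_structured_defect`, `to_spoken_summary`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical domain model of a car body: sub-sections, locations, and a
# defect vocabulary mapping defect name -> defect type. Illustrative only.
DOMAIN_MODEL = {
    "sub_sections": ["front door", "rear door", "hood", "roof"],
    "locations": ["left", "right", "upper", "lower", "center"],
    "defects": {"scratch": "surface", "dent": "deformation", "paint run": "coating"},
}


@dataclass
class StructuredDefect:
    name: str
    defect_type: str
    location: str
    sub_section: str


@dataclass
class DefectLog:
    # Maps each sub-section of the artifact to the defects logged against it.
    entries: dict = field(default_factory=dict)

    def add(self, defect: StructuredDefect) -> None:
        self.entries.setdefault(defect.sub_section, []).append(defect)


def first_match(text: str, candidates) -> str:
    """Return the first candidate phrase that occurs in text, else 'unknown'."""
    return next((c for c in candidates if c in text), "unknown")


def to_structured_defect(text: str, model: dict = DOMAIN_MODEL) -> StructuredDefect:
    """Keyword-matching stand-in (an assumption) for the LLM transformation."""
    lowered = text.lower()
    name = first_match(lowered, model["defects"])
    return StructuredDefect(
        name=name,
        defect_type=model["defects"].get(name, "unknown"),
        location=first_match(lowered, model["locations"]),
        sub_section=first_match(lowered, model["sub_sections"]),
    )


def to_spoken_summary(defect: StructuredDefect) -> str:
    """Build the text that a text-to-speech engine could read back."""
    return (f"{defect.name} of type {defect.defect_type} at the "
            f"{defect.location} of the {defect.sub_section}")


if __name__ == "__main__":
    log = DefectLog()
    defect = to_structured_defect("Dent on the upper right of the hood")
    log.add(defect)
    print(to_spoken_summary(defect))
```

Keying the log by sub-section mirrors the claimed mapping of defect name, type, and location onto each sub-section of the domain model. A real system would replace the keyword matcher with an LLM call constrained to the domain model's vocabulary.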

Description

CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY

The present application claims priority from Indian patent application number 202421010270, filed on February 14, 2024.

TECHNICAL FIELD

The disclosure herein generally relates to the field of voice assisted defect inspection, and more particularly, to a method and system for voice assisted defect logging and quality inspection.

BACKGROUND

At present, industries such as paint shops use handwritten notes or digital diaries to log paint defects during quality inspection. These kinds of inspections are time bound. Registering defects in this way can take a lot of time, as inspectors have to inspect the parts and then note down the defects in a notepad or on a tablet. In certain sensitive inspection cases, inspectors are required to wear gloves while inspecting, which adds the overhead of working with gloves while noting down the defects. Sharing these defect logs is also difficult, as they must first be digitized before they can be shared across the different units in the industry.

Some of the technical components, such as a voice processor with noise reduction and large language models (LLMs), are available as siloed technological advancements. However, these solutions focus on leveraging the domain model of the artifact under inspection to utilise the listed technologies in a way that provides radical benefits, including higher accuracy of speech-to-text conversion by combining the voice processor and the LLM with domain models. This is due to the ability of the domain model structure to map defects on the artifact during visualisation and, owing to domain-based persistence, to enable seamless digitised integration with other ecosystems.

CN 111 127 699 A discloses an automobile defect data automatic input method, an automobile defect data automatic input system, an automobile defect data automatic input device and a medium.
The automobile defect data automatic input method comprises the steps of: receiving audio data of a user; performing voice recognition on the audio data, and determining defect information corresponding to the audio data; acquiring a defect region image of a defect position of an automobile; identifying the defect region image, and determining identification information of a defect region; and storing the defect information and the identification information of the defect region. According to the automobile auditing auxiliary scheme provided by this document, an intelligent auxiliary function is provided for an auditing process by combining voice recognition, image recognition and other technologies, defect information and identification information of the defect region are automatically input, and the auditing efficiency and precision are improved.

SUMMARY

Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a method for voice assisted defect logging and quality inspection is provided. The processor-implemented method includes receiving, via an Input/Output (I/O) interface, one or more voice inputs from an inspector while inspecting an artifact, wherein the artifact is a physical entity subjected to inspection. The one or more voice inputs of the inspector are pre-processed to remove noise using a predefined filtering technique. The one or more pre-processed voice inputs are converted into a predefined language text using a predefined language translation technique to ensure uniformity, wherein the predefined language translation technique includes a universal language model and a custom speech model.
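
The disclosure does not specify the predefined filtering technique used in the pre-processing step. As a minimal sketch only, assuming a simple energy-based noise gate (a swapped-in technique, not the one in the disclosure), the idea of suppressing low-energy background noise before speech-to-text conversion could look like the following. The function names, frame size, and threshold are all hypothetical.

```python
import math


def rms(frame):
    """Root-mean-square energy of a non-empty frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def noise_gate(samples, frame_size=4, threshold=0.1):
    """Zero out frames whose RMS energy falls below the threshold.

    Assumption: background noise is quieter than the inspector's speech,
    so low-energy frames can be silenced without losing spoken content.
    """
    cleaned = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if rms(frame) >= threshold:
            cleaned.extend(frame)                # keep speech-level frames
        else:
            cleaned.extend([0.0] * len(frame))   # silence noise-level frames
    return cleaned


if __name__ == "__main__":
    # First four samples are low-level noise, last four are louder speech.
    audio = [0.01, -0.02, 0.01, 0.0, 0.5, -0.6, 0.4, -0.5]
    print(noise_gate(audio))
```

A production system on a shop floor would more likely use spectral methods or a trained denoiser, since a fixed threshold cannot separate speech from loud machinery.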
Herein, a voice to text conversion model is integrated with the large language model and a domain model of the artifact to convert one or more defects spoken by the inspector into the one or more structured defects represented by the domain model. The defect type, the defect location and the sub-section of the artifact are identified from the one or more voice inputs from the inspector. The predefined language text is transformed into one or more structured defects for a structured defect log on a domain model of the artifact under inspection using a large language model (LLM). Further, the structured defect log is analyzed to map a defect name, a defect type, and a defect location to each sub-section of the physical entity represented in the domain model using the large language model (LLM). Finally, the analyzed one or more structured defects are converted into a voice format to facilitate effective communication to a user for repair and for reporting purposes. Herein, the one or more structured defects are represented in at least one of an orthogonal view, a two-dimensional (2D) view, or a three-dimensional (3D) view of the artifact being inspected for verification.

In another embodiment, a system for voice assisted defect logging and quality inspection is provided. The system comprises