US-20260127393-A1 - Computing System and Method for Answering Questions About Construction Documents Using Generative Artificial Intelligence
Abstract
An example computing platform is configured to: (i) receive from a client device associated with a user, a question regarding a construction project, (ii) receive, from the client device associated with the user, one or more construction documents related to the construction project, (iii) based on the received question and the one or more construction documents, prepare input data for a generative AI model architecture, (iv) provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question, and (v) cause the client device to present the produced response to the user.
Inventors
- Reza Mohebbian
- Jiazi LIU
- Mohammad Mostafa Soltani
- Azadeh Yazdan Panah Gohar Rizi
Assignees
- Procore Technologies, Inc.
Dates
- Publication Date: 2026-05-07
- Application Date: 2024-11-04
Claims (20)
- 1 . A computing platform comprising: at least one communication interface; at least one processor; at least one non-transitory computer-readable medium; and program instructions stored on the at least one non-transitory computer-readable medium that, when executed by the at least one processor, cause the computing platform to: receive from a client device associated with a user, a question regarding a construction project; receive, from the client device associated with the user, one or more construction documents related to the construction project; based on the received question and the one or more construction documents, prepare input data for a generative AI model; provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question; and cause the client device to present the produced response to the user.
- 2 . The computing platform of claim 1 , wherein the generative AI model comprises: one or more image transformers configured to produce image embeddings; one or more textual transformers configured to produce text embeddings; one or more first feed forward neural networks configured to produce transformed image embeddings; and one or more second feed forward neural networks configured to produce transformed text embeddings.
- 3 . The computing platform of claim 1 , wherein the program instructions that, when executed by the at least one processor, cause the computing platform to, based on the received question and the one or more construction documents, prepare input data for a generative AI model comprise program instructions that, when executed by the at least one processor, cause the computing platform to: extract image data associated with the one or more construction documents; extract textual data from the received question and from the one or more construction documents.
- 4 . The computing platform of claim 3 , wherein the program instructions that, when executed by the at least one processor, cause the computing platform to provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprise program instructions that, when executed by the at least one processor, cause the computing platform to: route the extracted image data to the one or more image transformers to cause the one or more image transformers to produce one or more image embeddings; route the one or more image embeddings to the one or more first feed forward neural networks to cause the one or more first feed forward neural networks to produce transformed image embeddings; route the extracted textual data to the one or more text transformers to cause the one or more text transformers to produce one or more text embeddings; and route the one or more text embeddings to the one or more second feed forward neural networks to cause the one or more second feed forward neural networks to produce transformed text embeddings.
- 5 . The computing platform of claim 4 , wherein the generative AI model comprises: a router configured to combine the transformed image embeddings and the transformed text embeddings in accordance with learnable temperature parameters; and an output transformer configured to produce, from the combination of the transformed image embeddings and the transformed text embeddings a response to the question, and wherein the program instructions that, when executed by the at least one processor, cause the computing platform to provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprise program instructions that, when executed by the at least one processor, cause the computing platform to: determine a set of respective temperature parameters, with each respective temperature parameter corresponding to one of the first and second feed forward neural networks; route the transformed image embeddings and transformed text embeddings to the router to cause the router to combine the transformed image embeddings and the transformed text embeddings in accordance with the respective temperature parameters into a combined transformed embedding; and route the combined transformed embedding to the output transformer to cause the output transformer to produce a response to the question based on the combined transformed embedding.
- 6 . The computing platform of claim 4 , wherein the one or more image embeddings comprise a set of vector embeddings, each vector embedding in the set of vector embeddings having a first embedding dimension, wherein the set of vector embeddings represents an encoding of token data for tokens identified in the image; wherein the program instructions that, when executed by the at least one processor, cause the computing platform to provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprise program instructions that, when executed by the at least one processor, cause the computing platform to: reduce the embedding dimension of the set of vector embeddings from the first embedding dimension to a second embedding dimension.
- 7 . The computing platform of claim 4 , wherein the program instructions that, when executed by the at least one processor, cause the computing platform to, extract image data from the one or more construction documents, comprise program instructions that, when executed by the at least one processor, cause the computing platform to: divide the extracted image data associated with the one or more construction documents into a plurality of image patches, wherein the plurality of image patches collectively represent the image data associated with the one or more construction documents, and wherein the program instructions that, when executed by the at least one processor, cause the computing platform to provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprise program instructions that, when executed by the at least one processor, cause the computing platform to: route the plurality of image patches to the one or more image transformers to cause the one or more image transformers to produce a respective image embedding for each of the plurality of image patches; route the respective image embeddings to the one or more first feed forward neural networks to cause the one or more first feed forward neural networks to produce a respective transformed image embedding for each of the respective image embeddings.
- 8 . A non-transitory computer-readable medium, wherein the non-transitory computer-readable medium is provisioned with program instructions that, when executed by at least one processor, cause a computing platform to: receive from a client device associated with a user, a question regarding a construction project; receive, from the client device associated with the user, one or more construction documents related to the construction project; based on the received question and the one or more construction documents, prepare input data for a generative AI model; provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question; and cause the client device to present the produced response to the user.
- 9 . The non-transitory computer-readable medium of claim 8 , wherein the generative AI model comprises: one or more image transformers configured to produce image embeddings; one or more textual transformers configured to produce text embeddings; one or more first feed forward neural networks configured to produce transformed image embeddings; and one or more second feed forward neural networks configured to produce transformed text embeddings.
- 10 . The non-transitory computer-readable medium of claim 8 , wherein the program instructions that, when executed by the at least one processor, cause the computing platform to, based on the received question and the one or more construction documents, prepare input data for a generative AI model comprise program instructions that, when executed by the at least one processor, cause the computing platform to: extract image data associated with the one or more construction documents; extract textual data from the received question and from the one or more construction documents.
- 11 . The non-transitory computer-readable medium of claim 10 , wherein the program instructions that, when executed by the at least one processor, cause the computing platform to provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprise program instructions that, when executed by the at least one processor, cause the computing platform to: route the extracted image data to the one or more image transformers to cause the one or more image transformers to produce one or more image embeddings; route the one or more image embeddings to the one or more first feed forward neural networks to cause the one or more first feed forward neural networks to produce transformed image embeddings; route the extracted textual data to the one or more text transformers to cause the one or more text transformers to produce one or more text embeddings; and route the one or more text embeddings to the one or more second feed forward neural networks to cause the one or more second feed forward neural networks to produce transformed text embeddings.
- 12 . The non-transitory computer-readable medium of claim 11 , wherein the generative AI model comprises: a router configured to combine the transformed image embeddings and the transformed text embeddings in accordance with learnable temperature parameters; and an output transformer configured to produce, from the combination of the transformed image embeddings and the transformed text embeddings a response to the question, and wherein the program instructions that, when executed by the at least one processor, cause the computing platform to provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprise program instructions that, when executed by the at least one processor, cause the computing platform to: determine a set of respective temperature parameters, with each respective temperature parameter corresponding to one of the first and second feed forward neural networks; route the transformed image embeddings and transformed text embeddings to the router to cause the router to combine the transformed image embeddings and the transformed text embeddings in accordance with the respective temperature parameters into a combined transformed embedding; and route the combined transformed embedding to the output transformer to cause the output transformer to produce a response to the question based on the combined transformed embedding.
- 13 . The non-transitory computer-readable medium of claim 11 , wherein the one or more image embeddings comprise a set of vector embeddings, each vector embedding in the set of vector embeddings having a first embedding dimension, wherein the set of vector embeddings represents an encoding of token data for tokens identified in the image; wherein the program instructions that, when executed by the at least one processor, cause the computing platform to provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprise program instructions that, when executed by the at least one processor, cause the computing platform to: reduce the embedding dimension of the set of vector embeddings from the first embedding dimension to a second embedding dimension.
- 14 . The non-transitory computer-readable medium of claim 11 , wherein the program instructions that, when executed by the at least one processor, cause the computing platform to, extract image data from the one or more construction documents, comprise program instructions that, when executed by the at least one processor, cause the computing platform to: divide the extracted image data associated with the one or more construction documents into a plurality of image patches, wherein the plurality of image patches collectively represent the image data associated with the one or more construction documents, and wherein the program instructions that, when executed by the at least one processor, cause the computing platform to provide the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprise program instructions that, when executed by the at least one processor, cause the computing platform to: route the plurality of image patches to the one or more image transformers to cause the one or more image transformers to produce a respective image embedding for each of the plurality of image patches; route the respective image embeddings to the one or more first feed forward neural networks to cause the one or more first feed forward neural networks to produce a respective transformed image embedding for each of the respective image embeddings.
- 15 . A method comprising: receiving from a client device associated with a user, a question regarding a construction project; receiving, from the client device associated with the user, one or more construction documents related to the construction project; based on the received question and the one or more construction documents, preparing input data for a generative AI model; providing the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question; and causing the client device to present the produced response to the user.
- 16 . The method of claim 15 , wherein the generative AI model comprises: one or more image transformers configured to produce image embeddings; one or more textual transformers configured to produce text embeddings; one or more first feed forward neural networks configured to produce transformed image embeddings; and one or more second feed forward neural networks configured to produce transformed text embeddings.
- 17 . The method of claim 15 , wherein, based on the received question and the one or more construction documents, preparing input data for a generative AI model comprises: extracting image data associated with the one or more construction documents; extracting textual data from the received question and from the one or more construction documents.
- 18 . The method of claim 17 , wherein providing the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprises: routing the extracted image data to the one or more image transformers to cause the one or more image transformers to produce one or more image embeddings; routing the one or more image embeddings to the one or more first feed forward neural networks to cause the one or more first feed forward neural networks to produce transformed image embeddings; routing the extracted textual data to the one or more text transformers to cause the one or more text transformers to produce one or more text embeddings; and routing the one or more text embeddings to the one or more second feed forward neural networks to cause the one or more second feed forward neural networks to produce transformed text embeddings.
- 19 . The method of claim 17 , wherein the generative AI model comprises: a router configured to combine the transformed image embeddings and the transformed text embeddings in accordance with learnable temperature parameters; and an output transformer configured to produce, from the combination of the transformed image embeddings and the transformed text embeddings a response to the question, and wherein providing the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprises: determining a set of respective temperature parameters, with each respective temperature parameter corresponding to one of the first and second feed forward neural networks; routing the transformed image embeddings and transformed text embeddings to the router to cause the router to combine the transformed image embeddings and the transformed text embeddings in accordance with the respective temperature parameters into a combined transformed embedding; and routing the combined transformed embedding to the output transformer to cause the output transformer to produce a response to the question based on the combined transformed embedding.
- 20 . The method of claim 17 , wherein the one or more image embeddings comprise a set of vector embeddings, each vector embedding in the set of vector embeddings having a first embedding dimension, wherein the set of vector embeddings represents an encoding of token data for tokens identified in the image, and wherein providing the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question comprises: reducing the embedding dimension of the set of vector embeddings from the first embedding dimension to a second embedding dimension.
Description
BACKGROUND
Increasingly, parties involved in construction projects are beginning to use software applications to manage those construction projects. One example of such a software application is the software-as-a-service (SaaS) application for construction management offered by Procore Technologies, Inc. (“Procore”), who is the current applicant.
Using construction management software applications such as these, parties can create a digital representation of a given construction project that is to be managed and then create, store, view, and/or interact with various types of digital project data associated with the given construction project. Such digital project data may include specifications, drawings, building information model (BIM) files, requests for information (RFIs), punch lists (e.g., which list work that has not yet been completed or has been completed incorrectly), risk management plans, safety plans, work breakdown structures, change orders, inspection documents (e.g., which record information about the results of inspections), construction submittals (e.g., mock-ups or other documents that contractors create to depict proposed plans), construction site observation reports, project management records (e.g., project schedules and project budgets), third-party records (e.g., applicable zoning restrictions, real-estate title records and purchase records, records of public hearings pertinent to the given construction project), directories, invoices, timesheets, meeting minutes, sensor data, and daily logs (e.g., which record information about each day work is done at a work site of the construction project), among many other examples of project data that may be stored for a construction project.
OVERVIEW
Disclosed herein is new software technology for using generative artificial intelligence (AI) in order to answer questions about a construction project. At a high level, the disclosed software technology may involve a new generative AI model architecture.
This architecture may comprise, among other aspects, pre-processing functionality, transformer functionality for producing image embeddings, transformer functionality for producing text embeddings, dimension reduction functionality for reducing the embedding dimension of the image embedding, normalization functionality for producing normalized image embeddings, feed forward neural network expert functionality for producing transformed image embeddings, feed forward neural network expert functionality for producing transformed text embeddings, learnable temperature functionality for determining temperature parameters by which to scale the transformed embeddings, router functionality to combine the transformed embeddings according to the temperature parameters, and output transformer technology for producing a response based on the combined transformed embeddings.
In one aspect, the disclosed technology may take the form of a method to be carried out by a computing system that involves (i) receiving, from a client device associated with a user, a question regarding a construction project, (ii) receiving, from the client device associated with the user, one or more construction documents related to the construction project, (iii) based on the received question and the one or more construction documents, preparing input data for a generative AI model, (iv) providing the prepared input data to the generative AI model architecture to cause the generative AI model to produce a response to the question, and (v) causing the client device to present the produced response to the user.
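As a rough illustration of the normalization, temperature scaling, and router steps named in the architecture overview above, the following Python sketch combines one transformed image embedding and one transformed text embedding into a single vector. It is not the disclosed implementation. The function names (`l2_normalize`, `scale_by_temperature`, `route`), the choice of L2 normalization, the convention of dividing by the temperature, the elementwise average used to combine the scaled embeddings, and all numeric values are assumptions made for illustration only; in a trained model the temperature parameters would be learned rather than fixed.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def scale_by_temperature(vec: Sequence[float], temperature: float) -> Vector:
    """Divide each component by a positive temperature.

    Dividing (rather than multiplying) is an assumed convention.
    """
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    return [x / temperature for x in vec]


def route(embeddings: Sequence[Sequence[float]],
          temperatures: Sequence[float]) -> Vector:
    """Combine per-expert embeddings into one vector.

    Assumed rule: scale each embedding by its own temperature, then take
    the elementwise average of the scaled embeddings.
    """
    if not embeddings:
        raise ValueError("at least one embedding is required")
    if len(embeddings) != len(temperatures):
        raise ValueError("need exactly one temperature per embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must share one dimension")
    scaled = [scale_by_temperature(e, t)
              for e, t in zip(embeddings, temperatures)]
    # zip(*scaled) walks the vectors column by column.
    return [sum(column) / len(scaled) for column in zip(*scaled)]


# Hypothetical expert outputs standing in for the transformed embeddings.
transformed_image = l2_normalize([3.0, 4.0])
transformed_text = l2_normalize([0.0, 2.0])

# Hypothetical temperatures, one per feed forward expert.
combined = route([transformed_image, transformed_text], [2.0, 1.0])
print(combined)
```

Under these assumptions, a larger temperature shrinks the contribution of its expert, so the image embedding above contributes half as strongly as the text embedding. The combined vector is what would then be passed to the output transformer.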
In another aspect, disclosed herein is a computing platform that includes at least one communication interface, at least one processor, at least one non-transitory computer-readable medium, and program instructions stored on the at least one non-transitory computer-readable medium that, when executed by the at least one processor, cause the computing platform to carry out the functions disclosed herein, including (but not limited to) any of the functions of the foregoing method.
In yet another aspect, disclosed herein is a non-transitory computer-readable medium provisioned with program instructions that, when executed by at least one processor, cause a computing platform to carry out the functions disclosed herein, including (but not limited to) any of the functions of the foregoing method.
One of ordinary skill in the art will appreciate these as well as numerous other aspects in reading the following disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 depicts an example network environment in which a construction management software application may be implemented, according to the present disclosure.
FIG. 2 depicts an illustrative example of a generative AI model architecture, according to the present disclosure.
FIG. 3 depicts example functionality of the disclosed software technology in the form of a flow diagram, according to the present disclosure.
FIG. 4 depicts e