CN-122027675-A - Method and gateway for calling client tool
Abstract
A method for calling a client tool, and a gateway. The method is performed by the gateway and comprises: receiving, from the client, a query statement to be input into a large model of a server and a connection channel identifier of a first connection; sending the query statement and the connection channel identifier to the server through a second connection, the second connection transmitting data based on a stateless protocol; receiving a tool call request and the connection channel identifier from the server through the second connection, wherein the tool call request comprises a request identifier and a tool name of a tool provided by the client; acquiring the first connection based on the connection channel identifier; sending the tool call request to the client through the first connection; receiving a tool call result from the client through the first connection, wherein the tool call result comprises the request identifier; and sending the tool call result to the server through the second connection.
Inventors
- ZHOU WEI
- ZHANG TIANSHUO
- XIA WEIYI
- LI HONGJIAN
Assignees
- 支付宝(杭州)数字服务技术有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-09
Claims (11)
- 1. A method of invoking a client tool, performed by a gateway, the gateway establishing a first connection with a client, the first connection comprising a bidirectional long-lived connection, the method comprising: receiving, from the client through the first connection, a query statement to be input into a large model of a server and a connection channel identifier of the first connection; sending the query statement and the connection channel identifier to the server through a second connection, wherein the second connection transmits data based on a stateless protocol; receiving a tool call request and the connection channel identifier from the server through the second connection, wherein the tool call request comprises a request identifier and a tool name of a tool provided by the client; acquiring the first connection based on the connection channel identifier; sending the tool call request to the client through the first connection; receiving a tool call result from the client through the first connection, wherein the tool call result comprises the request identifier; and sending the tool call result to the server through the second connection.
- 2. The method of claim 1, wherein the first connection comprises a WebSocket connection that transmits data based on the WebSocket protocol, and the second connection comprises a Hypertext Transfer Protocol (HTTP) connection that transmits data based on the HTTP protocol.
- 3. The method of claim 2, wherein the gateway comprises an external gateway and an internal gateway, and the client establishes the WebSocket connection with the external gateway; wherein receiving the tool call request and the connection channel identifier from the server through the second connection comprises: the internal gateway receiving the tool call request and the connection channel identifier from the server through the HTTP connection; the method further comprises: the internal gateway sending the tool call request and the connection channel identifier to the external gateway through a remote procedure call (RPC); and wherein acquiring the first connection based on the connection channel identifier and sending the tool call request to the client through the first connection comprises: the external gateway acquiring the WebSocket connection based on the connection channel identifier and sending the tool call request to the client through the WebSocket connection.
- 4. The method of claim 2 or 3, wherein the gateway comprises an external gateway and a plurality of internal gateways, and the client establishes the WebSocket connection with the external gateway; wherein receiving the tool call request and the connection channel identifier from the server through the second connection comprises: a first internal gateway of the plurality of internal gateways receiving the tool call request and the connection channel identifier from the server through the second connection; the method further comprises: the first internal gateway sending the tool call request, the connection channel identifier, and connection information of the first internal gateway to the external gateway; and wherein receiving the tool call result from the client through the first connection comprises: the external gateway receiving the tool call result from the client through the WebSocket connection and sending the tool call result to the first internal gateway based on the connection information.
- 5. The method of claim 2 or 3, wherein the gateway comprises an external gateway and a plurality of internal gateways, and the client establishes the WebSocket connection with the external gateway; wherein receiving the tool call request and the connection channel identifier from the server through the second connection comprises: a first internal gateway of the plurality of internal gateways receiving the tool call request and the connection channel identifier from the server through the second connection, wherein the request identifier in the tool call request is generated based on connection information of the first internal gateway; the method further comprises: the first internal gateway sending the tool call request and the connection channel identifier to the external gateway; and wherein receiving the tool call result from the client through the first connection comprises: the external gateway receiving the tool call result from the client through the WebSocket connection, obtaining the connection information from the request identifier in the tool call result, and sending the tool call result to the first internal gateway based on the connection information.
- 6. The method of claim 2 or 3, wherein the gateway comprises a plurality of external gateways and a plurality of internal gateways, and the client establishes the WebSocket connection with a first external gateway of the plurality of external gateways; wherein receiving the tool call request and the connection channel identifier from the server through the second connection comprises: a first internal gateway of the plurality of internal gateways receiving the tool call request and the connection channel identifier from the server through the second connection, wherein the request identifier in the tool call request is generated based on connection information of the first internal gateway; the method further comprises: the first internal gateway determining the first external gateway based on the connection channel identifier, and sending the tool call request and the connection channel identifier to the first external gateway; and wherein receiving the tool call result from the client through the first connection comprises: the first external gateway receiving the tool call result from the client through the WebSocket connection, obtaining the connection information from the request identifier in the tool call result, and sending the tool call result to the first internal gateway based on the connection information.
- 7. The method of claim 2, further comprising: receiving authentication information from the client through an HTTP connection; sending a connection establishment token corresponding to the client if authentication based on the authentication information passes; establishing the WebSocket connection with the client based on the connection establishment token; and, after the WebSocket connection is established, sending the connection channel identifier to the client through the WebSocket connection.
- 8. The method of claim 2, wherein acquiring the first connection based on the connection channel identifier and sending the tool call request to the client through the first connection comprises: looking up the WebSocket connection handle corresponding to the connection channel identifier, and sending the tool call request to the client based on the WebSocket connection handle.
- 9. The method of claim 1, further comprising: after receiving the tool call request from the server, storing the request identifier in association with a context of the tool call request; wherein sending the tool call result to the server through the second connection comprises: obtaining the context based on the request identifier in the tool call result, and sending the tool call result to the server through the second connection according to the context.
- 10. The method of claim 1, wherein the tool comprises a file reading tool, and the tool call result comprises a file read, by executing the tool, from the device on which the client resides.
- 11. A gateway comprising a memory and a processor, the memory storing executable code, wherein the processor, when executing the executable code, implements the method of any one of claims 1-10.
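The message flow of claim 1, together with the handle lookup of claim 8 and the context storage of claim 9, can be illustrated with a minimal in-memory sketch. It is not the implementation of the application: the class names `Gateway` and `FakeLongConnection`, the message field names (`type`, `channel_id`, `request_id`, `tool_name`, `content`), and the `post_to_server` callback standing in for the stateless second connection are all assumptions made for illustration, and no real WebSocket or HTTP traffic is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeLongConnection:
    """Hypothetical stand-in for the first connection (e.g. a WebSocket handle)."""
    sent: List[dict] = field(default_factory=list)

    def send(self, message: dict) -> None:
        # Record the message instead of writing it to a real socket.
        self.sent.append(message)


class Gateway:
    """Illustrative gateway that relays between a long-lived and a stateless connection."""

    def __init__(self, post_to_server: Callable[[dict], None]) -> None:
        # post_to_server stands in for the stateless second connection (assumption).
        self._post_to_server = post_to_server
        self._handles: Dict[str, FakeLongConnection] = {}  # channel id -> connection handle
        self._contexts: Dict[str, dict] = {}               # request id -> stored context

    def register(self, channel_id: str, connection: FakeLongConnection) -> None:
        self._handles[channel_id] = connection

    def on_client_query(self, channel_id: str, query: str) -> None:
        # Forward the query together with the connection channel identifier.
        self._post_to_server({"type": "query", "channel_id": channel_id, "query": query})

    def on_server_tool_call(self, channel_id: str, request: dict) -> None:
        # Store the request identifier in association with a context (claim 9),
        # then look up the connection handle by channel identifier (claim 8).
        self._contexts[request["request_id"]] = {"channel_id": channel_id}
        connection = self._handles[channel_id]
        connection.send({"type": "tool_call", **request})

    def on_client_tool_result(self, result: dict) -> None:
        # Recover the context from the request identifier echoed by the client.
        context = self._contexts.pop(result["request_id"])
        self._post_to_server(
            {"type": "tool_result", "channel_id": context["channel_id"], **result}
        )


# Usage: one query, one tool call, one tool result.
server_inbox: List[dict] = []
gateway = Gateway(server_inbox.append)
client_connection = FakeLongConnection()
gateway.register("ch-1", client_connection)
gateway.on_client_query("ch-1", "Summarize main.py")
gateway.on_server_tool_call("ch-1", {"request_id": "req-1", "tool_name": "read_file"})
gateway.on_client_tool_result({"request_id": "req-1", "content": "print('hi')"})
```

In this sketch the server never holds a connection to the client. It only echoes the connection channel identifier, and the gateway uses that identifier to find the client's long-lived connection.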
Description
Method and gateway for calling client tool
Technical Field
The embodiments of this specification belong to the technical field of network connections, and in particular relate to a method for calling a client tool and to a gateway.
Background
In a scenario where a user converses with an artificial intelligence agent (AI Agent) through a client, as shown in Fig. 1, the Agent server is located, for data security, in an intranet on the Agent server side (i.e., an intranet private to the enterprise), while the client is located in the extranet (the public internet) and cannot directly access the server. The junction between the internal and external networks is called an isolation zone (DMZ). A gateway is deployed in the DMZ and has a connection address (IP address) on the public network. The client and the gateway can connect to each other based on their IP addresses, and the gateway and the Agent server can connect to each other through the intranet.
With the development of large language model (LLM) technology, an Agent may need to call external tools to accomplish complex tasks. For example, when the Agent is an intelligent development assistant (CodeAgent), it may read and write code files on the client by calling a file system tool local to the client, based on the Model Context Protocol (MCP), in order to modify those files. MCP provides a unified, machine-readable way of describing all external services (e.g., tools, databases, etc.), i.e., it provides an interaction specification between Agents in LLM applications and external services. The MCP architecture comprises an MCP host, an MCP client, and an MCP server. The MCP host, typically an AI application, is the initiator of the interaction. The MCP client is located in the MCP host and is used to discover the services provided by the MCP server and to transfer information between the LLM and the MCP server.
The MCP protocol includes identity authentication and access authorization capabilities based on Open Authorization (OAuth). OAuth is an open security protocol that provides an authorization standard for applications accessing user resources. Its key feature is that access to resources is granted through a token mechanism, so that sensitive information such as account passwords need not be exposed.
To enable an Agent to call a tool local to a client, in one related art, as shown in Fig. 2, an HTTP connection is established between the client and a gateway, and an HTTP connection is established between the gateway and an Agent server. The HTTP connection between the client and the server is, for example, a connection based on Hypertext Transfer Protocol 1.0 (HTTP 1.0) or HTTP 1.1. In HTTP communication, a client typically sends a request to a server, and the server responds to the client's request. The request sent by the client may use the GET method, which requests a specified resource, or the POST method, which submits data to the server. Under this network connection, when the client conducts a session with the Agent, the client sends a query statement (Query) to the gateway, and the gateway forwards the query statement to the Agent server. When the Agent determines that a tool call is needed, the Agent server sends tool call information back to the client through the gateway. The client can call the corresponding tool according to the tool call information and then send a new request to the Agent server through the gateway, the new request comprising the previous query statement and the tool call result obtained by calling the tool. The Agent can then output a final result based on the query statement and the tool call result, and return the final result to the client.
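The request-response flow of this HTTP-based related art can be sketched as follows. This is a simplified illustration under stated assumptions, not code from the application: the gateway is omitted, the function names `agent_server` and `client_session`, the tool name `read_file`, and all message fields are hypothetical, and the server is modeled as a plain function that sees only what each request carries.

```python
from typing import Callable, Dict, List, Tuple


def agent_server(request: dict) -> dict:
    """Hypothetical stateless Agent endpoint: it knows only what this request contains."""
    if "tool_result" not in request:
        # The Agent decides a tool call is needed and returns tool call information.
        return {"type": "tool_call", "tool_name": "read_file",
                "arguments": {"path": "main.py"}}
    # With the tool result present, the Agent can produce the final result.
    return {"type": "final",
            "answer": f"File has {len(request['tool_result'])} characters"}


def client_session(query: str,
                   tools: Dict[str, Callable[..., str]]) -> Tuple[dict, List[dict]]:
    """Run one session; return the final response and every request the client sent."""
    request = {"query": query}
    requests_sent = [request]
    response = agent_server(request)
    while response["type"] == "tool_call":
        # The client executes the local tool named in the tool call information.
        result = tools[response["tool_name"]](**response["arguments"])
        # The new request carries the previous query again, plus the tool call result.
        request = {"query": query, "tool_result": result}
        requests_sent.append(request)
        response = agent_server(request)
    return response, requests_sent


# Usage with a stand-in local tool (assumption: returns fixed file content).
local_tools = {"read_file": lambda path: "print('hi')"}
final_response, sent = client_session("Summarize main.py", local_tools)
```

Here the second request in `sent` repeats the original query alongside the tool result, because the server keeps no state between requests.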
Under this HTTP-based network connection, each time the Agent needs more information from the client, it relies on the client to actively carry the full context in its request (the query and the tool call result from the previous request), which makes data transmission redundant. Because HTTP requests are initiated by the client, the Agent cannot actively initiate a tool call. In addition, to learn of the Agent's tool call requirements in time, the client has to poll the server periodically for pending tool call instructions; such polling usually introduces delay and cannot meet the requirements of real-time interaction.
In another related art, as shown in Fig. 3, a WebSocket connection is established between the client and the gateway, and another WebSocket connection is established between the gateway and the server. WebSocket supports bidirectional data flow between the server and the client, i.e., the server can actively send messages to the client. However, although this network connection method solves the problem that the server cannot actively ini