Text-to-SQL Example

In this tutorial, we'll see how to implement an agent that leverages SQL using smolagents.

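Let's start with the golden question: why not keep it simple and use a standard text-to-SQL pipeline?

A standard text-to-SQL pipeline is brittle, because the generated SQL query can be incorrect. Even worse, an incorrect query may not raise an error at all: it can silently return wrong or useless output without triggering any alarm.

👉 An agent system, by contrast, can critically inspect the output and decide whether the query needs to be changed, which gives it a large performance boost.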
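To make the silent failure mode concrete, here is a minimal sketch. It uses the standard-library sqlite3 module and a small illustrative table (both are assumptions for this example, not part of the tutorial's setup). The wrong query runs without raising anything and simply returns no rows.

```python
import sqlite3

# Illustrative in-memory table (hypothetical data for this sketch only).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE receipts (receipt_id INTEGER, customer_name TEXT, price REAL)"
)
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?)",
    [(1, "Alan Payne", 12.06), (2, "Alex Mason", 23.86)],
)

# Question: "How much did Alan Payne pay?"
# A plausible but wrong query: the name is lower-cased, and SQLite's default
# string comparison with '=' is case-sensitive, so nothing matches.
wrong_rows = con.execute(
    "SELECT price FROM receipts WHERE customer_name = 'alan payne'"
).fetchall()

# The intended query.
right_rows = con.execute(
    "SELECT price FROM receipts WHERE customer_name = 'Alan Payne'"
).fetchall()

print(wrong_rows)  # empty result, yet no exception was raised
print(right_rows)
```

A fixed pipeline would pass the empty result along as the answer. An agent that looks at the output has a chance to notice that it is empty and try a different query.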

Let's build this agent! 💪

First, we set up the SQL environment:

from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    String,
    Integer,
    Float,
    insert,
    inspect,
    text,
)

# In-memory SQLite database: nothing is written to disk
engine = create_engine("sqlite:///:memory:")
metadata_obj = MetaData()

# Create the receipts SQL table
table_name = "receipts"
receipts = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("customer_name", String(16), primary_key=True),
    Column("price", Float),
    Column("tip", Float),
)
metadata_obj.create_all(engine)

# Insert a few sample rows, one transaction per row
rows = [
    {"receipt_id": 1, "customer_name": "Alan Payne", "price": 12.06, "tip": 1.20},
    {"receipt_id": 2, "customer_name": "Alex Mason", "price": 23.86, "tip": 0.24},
    {"receipt_id": 3, "customer_name": "Woodrow Wilson", "price": 53.43, "tip": 5.43},
    {"receipt_id": 4, "customer_name": "Margaret James", "price": 21.11, "tip": 1.00},
]
for row in rows:
    stmt = insert(receipts).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)

Build our agent

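Now let's make our SQL table retrievable by a tool.

The tool's description attribute will be embedded in the LLM's prompt by the agent system: it gives the LLM information about how to use the tool. This is where we want to describe the SQL table.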

inspector = inspect(engine)
columns_info = [(col["name"], col["type"]) for col in inspector.get_columns("receipts")]

table_description = "Columns:\n" + "\n".join([f" - {name}: {col_type}" for name, col_type in columns_info])
print(table_description)

Columns:
 - receipt_id: INTEGER
 - customer_name: VARCHAR(16)
 - price: FLOAT
 - tip: FLOAT

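Now let's build our tool. It needs the following (read the tool documentation for more details):

A docstring with an Args: section listing the arguments.
Type hints on both the inputs and the output.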

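from smolagents import tool

@tool
def sql_engine(query: str) -> str:
    """
    Allows you to perform SQL queries on the table. Returns a string representation of the result.
    The table is named 'receipts'. Its description is as follows:
        Columns:
        - receipt_id: INTEGER
        - customer_name: VARCHAR(16)
        - price: FLOAT
        - tip: FLOAT

    Args:
        query: The query to perform. This should be correct SQL.
    """
    output = ""
    with engine.connect() as con:
        rows = con.execute(text(query))
        # Append one line per result row
        for row in rows:
            output += "\n" + str(row)
    return output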
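To see what kind of string the agent receives from a tool like this, the sketch below reproduces the row-formatting loop on its own. It swaps SQLAlchemy for the standard-library sqlite3 module so that it runs without extra dependencies; format_rows is a hypothetical helper name used only for this illustration.

```python
import sqlite3

# Stand-in database with two of the tutorial's sample rows.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE receipts "
    "(receipt_id INTEGER, customer_name TEXT, price REAL, tip REAL)"
)
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [(1, "Alan Payne", 12.06, 1.20), (2, "Alex Mason", 23.86, 0.24)],
)


def format_rows(query: str) -> str:
    """Run a query and join the rows the same way the tool body does."""
    output = ""
    for row in con.execute(query):
        output += "\n" + str(row)
    return output


result = format_rows("SELECT * FROM receipts")
empty = format_rows("SELECT * FROM receipts WHERE price > 100")

print(result)        # one tuple-like line per row
print(repr(empty))   # a query that matches nothing yields an empty string
```

Note that a query matching no rows comes back as an empty string rather than an error, which is exactly the kind of output an agent has to interpret before trusting an answer.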

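Now let's create an agent that leverages this tool.

We use the CodeAgent, which is smolagents' main agent class: an agent that writes its actions in code and can iterate on previous output according to the ReAct framework.

The model is the LLM that powers the agent system. HfApiModel lets you call LLMs through HF's Inference API, via either a Serverless or a Dedicated endpoint, but you could also use any proprietary API.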

from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(
    tools=[sql_engine],
    model=HfApiModel("meta-llama/Meta-Llama-3.1-8B-Instruct"),
)
agent.run("Can you give me the name of the client who got the most expensive receipt?")
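For reference, one query that answers this question is sketched below. It is only an illustration of what a correct answer looks like: the agent may well write a different query, and the sketch uses the standard-library sqlite3 module with the tutorial's sample rows rather than the SQLAlchemy engine above.

```python
import sqlite3

# Rebuild the tutorial's sample data in a throwaway SQLite database.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE receipts "
    "(receipt_id INTEGER, customer_name TEXT, price REAL, tip REAL)"
)
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "Alan Payne", 12.06, 1.20),
        (2, "Alex Mason", 23.86, 0.24),
        (3, "Woodrow Wilson", 53.43, 5.43),
        (4, "Margaret James", 21.11, 1.00),
    ],
)

# Sort by price, highest first, and keep only the top row.
top = con.execute(
    "SELECT customer_name, price FROM receipts ORDER BY price DESC LIMIT 1"
).fetchone()

print(top)
```

With the sample data, the most expensive receipt belongs to Woodrow Wilson, so that is the name the agent's final answer should contain.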

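Level 2: Table joins

Now let's make it more challenging! We want our agent to handle joins across multiple tables.

So let's create a second table that records the name of the waiter for each receipt_id!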

table_name = "waiters"
waiters = Table(
    table_name,
    metadata_obj,
    Column("receipt_id", Integer, primary_key=True),
    Column("waiter_name", String(16), primary_key=True),
)
metadata_obj.create_all(engine)

rows = [
    {"receipt_id": 1, "waiter_name": "Corey Johnson"},
    {"receipt_id": 2, "waiter_name": "Michael Watts"},
    {"receipt_id": 3, "waiter_name": "Michael Watts"},
    {"receipt_id": 4, "waiter_name": "Margaret James"},
]
for row in rows:
    # Insert into the new waiters table, not into receipts
    stmt = insert(waiters).values(**row)
    with engine.begin() as connection:
        cursor = connection.execute(stmt)

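Since we added a table, we update the description of the sql_engine tool so that it covers both tables and the LLM can properly leverage the new information.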

updated_description = """Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output.
It can use the following tables:"""

inspector = inspect(engine)
for table in ["receipts", "waiters"]:
    columns_info = [(col["name"], col["type"]) for col in inspector.get_columns(table)]

    table_description = f"Table '{table}':\n"

    table_description += "Columns:\n" + "\n".join([f" - {name}: {col_type}" for name, col_type in columns_info])
    updated_description += "\n\n" + table_description

print(updated_description)

Since this request is a bit harder than the previous one, we'll switch the LLM engine to the more powerful Qwen/Qwen2.5-Coder-32B-Instruct!

sql_engine.description = updated_description

agent = CodeAgent(
    tools=[sql_engine],
    model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"),
)

agent.run("Which waiter got more total money from tips?")

It works right away! The setup was surprisingly simple, wasn't it?
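As a point of comparison, here is a sketch of one join that answers the tips question. It is an assumption about what a correct query could look like, not the query the agent actually produced, and it uses the standard-library sqlite3 module with the tutorial's sample rows so that it runs on its own.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE receipts (receipt_id INTEGER, tip REAL)")
con.execute("CREATE TABLE waiters (receipt_id INTEGER, waiter_name TEXT)")
con.executemany(
    "INSERT INTO receipts VALUES (?, ?)",
    [(1, 1.20), (2, 0.24), (3, 5.43), (4, 1.00)],
)
con.executemany(
    "INSERT INTO waiters VALUES (?, ?)",
    [
        (1, "Corey Johnson"),
        (2, "Michael Watts"),
        (3, "Michael Watts"),
        (4, "Margaret James"),
    ],
)

# Join each receipt to its waiter, then sum the tips per waiter.
query = """
SELECT w.waiter_name, SUM(r.tip) AS total_tips
FROM receipts AS r
JOIN waiters AS w ON r.receipt_id = w.receipt_id
GROUP BY w.waiter_name
ORDER BY total_tips DESC
"""
totals = con.execute(query).fetchall()

for name, total in totals:
    print(name, round(total, 2))
```

With the sample data, Michael Watts served receipts 2 and 3 and therefore collects the largest total, which is the answer the agent should reach.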

This example is done! We've touched upon these concepts: