Building good agents

There's a world of difference between building an agent that works and one that doesn't. How can we build agents that fall into the former category? In this guide, we'll look at best practices for building agents.

If you're new to building agents, make sure to first read the intro to agents and the guided tour of smolagents.

The best agentic systems are the simplest: simplify the workflow as much as you can

Giving an LLM some agency in your workflow introduces some risk of errors.

Well-programmed agentic systems have good error logging and retry mechanisms anyway, so the LLM engine has a chance to self-correct its mistakes. But to reduce the risk of LLM errors as much as possible, you should simplify your workflow!

Let's revisit the example from the intro to agents: a bot that answers user queries for a surf trip company. Instead of letting the agent make 2 separate calls to a "travel distance API" and a "weather API" each time it is asked about a new surf spot, you could build one unified tool, "return_spot_information": a function that calls both APIs at once and returns their concatenated outputs to the user.

This reduces costs, latency, and error risk!

The main guideline is: reduce the number of LLM calls as much as you can.

This leads to a few takeaways:

Whenever possible, group 2 tools into one, as in our example of the two APIs.
Whenever possible, logic should be based on deterministic functions rather than agentic decisions.
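The surf-spot example and the two takeaways above can be sketched as follows. This is a minimal illustration, assuming two hypothetical API wrappers (get_travel_distance_km and get_weather) that return placeholder values; in smolagents you would also decorate return_spot_information with @tool (with type hints and an Args docstring), which is left out here so the sketch runs without the library.

```python
# Hypothetical stand-ins for the two external APIs from the example.
def get_travel_distance_km(spot: str) -> float:
    # A real implementation would query a travel-distance API; this is a placeholder value.
    return 12.5


def get_weather(spot: str) -> str:
    # A real implementation would query a weather API; this is a placeholder value.
    return "sunny, 24°C, 1.2m waves"


def return_spot_information(spot: str) -> str:
    """One tool call instead of two: query both APIs and merge their outputs.

    Combining the results is done deterministically in code,
    so the LLM never has to decide how to call or join the two APIs.
    """
    distance_km = get_travel_distance_km(spot)
    weather = get_weather(spot)
    return f"{spot}: {distance_km} km away; weather: {weather}"


print(return_spot_information("Anchor Point"))
```

The agent now needs a single LLM step to answer a question about a spot, instead of one step per API.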

Improve the information flow to the LLM engine

Remember that your LLM engine is like an ~intelligent~ robot, trapped in a room, whose only communication with the outside world is notes passed under a door.

It won't know about anything that happened unless you explicitly put it into its prompt.

So start by making your task very clear! Since an agent is powered by an LLM, minor variations in how you formulate your task can yield completely different results.

Then, improve the information flow towards your agent in tool use.

Particular guidelines to follow:

Each tool should log (by simply using print statements inside the tool's forward method) everything that could be useful for the LLM engine.
- In particular, logging details of tool execution errors would help a lot!

For instance, here's a tool that retrieves weather data based on location and date-time.

First, here's a poor version:

from datetime import datetime
from smolagents import tool

def get_weather_report_at_coordinates(coordinates, date_time):
    # Dummy function, returns a list of [temperature in °C, risk of rain on a scale 0-1, wave height in m]
    return [28.0, 0.35, 0.85]

def get_coordinates_from_location(location):
    # Returns dummy coordinates
    return [3.3, -42.0]

@tool
def get_weather_api(location: str, date_time: str) -> str:
    """
    Returns the weather report.

    Args:
        location: the name of the place that you want the weather for.
        date_time: the date and time for which you want the report.
    """
    lon, lat = get_coordinates_from_location(location)
    # The expected format only appears here, never in the docstring the LLM reads
    date_time = datetime.strptime(date_time, "%m/%d/%y %H:%M:%S")
    return str(get_weather_report_at_coordinates((lon, lat), date_time))

Why is it bad?

There is no precise specification of the format to use for date_time.
There is no detail on how location should be specified.
There is no logging mechanism for explicit failure cases, such as location not being in a proper format or date_time not being properly formatted.
The output format is hard to understand.

If the tool call fails, the error trace logged in memory can help the LLM reverse engineer the tool to fix its errors. But why leave it so much heavy lifting to do?

A better way to build this tool would have been the following:

@tool
def get_weather_api(location: str, date_time: str) -> str:
    """
    Returns the weather report.

    Args:
        location: the name of the place that you want the weather for. It should be a place name, followed by a city name and a country, e.g. "Anchor Point, Taghazout, Morocco".
        date_time: the date and time for which you want the report, formatted as '%m/%d/%y %H:%M:%S'.
    """
    lon, lat = get_coordinates_from_location(location)
    try:
        date_time = datetime.strptime(date_time, "%m/%d/%y %H:%M:%S")
    except Exception as e:
        # Tell the LLM exactly what went wrong and which format to use instead
        raise ValueError("Conversion of `date_time` to datetime format failed, make sure to provide a string in format '%m/%d/%y %H:%M:%S'. Full trace: " + str(e)) from e
    temperature_celsius, risk_of_rain, wave_height = get_weather_report_at_coordinates((lon, lat), date_time)
    return f"Weather report for {location}, {date_time}: temperature will be {temperature_celsius}°C, risk of rain is {risk_of_rain*100:.0f}%, wave height is {wave_height}m."
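To try the improved tool's behavior without installing smolagents, here is a minimal sketch of the same logic as a plain function (the @tool decorator is left out so it runs on its own). It also adds the print() logging recommended above. The function name get_weather_report and the DATE_FORMAT constant are hypothetical choices made for this sketch, and the helpers return the same dummy values as in the example.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%y %H:%M:%S"


def get_coordinates_from_location(location: str) -> list[float]:
    # Dummy geocoder, as in the example above.
    return [3.3, -42.0]


def get_weather_report_at_coordinates(coordinates, date_time) -> list[float]:
    # Dummy report: [temperature in °C, risk of rain on a 0-1 scale, wave height in m].
    return [28.0, 0.35, 0.85]


def get_weather_report(location: str, date_time: str) -> str:
    """Same logic as the improved tool, without the @tool decorator."""
    # print() output ends up in the agent's observations, so log useful intermediate facts.
    print(f"Looking up coordinates for {location!r}")
    lon, lat = get_coordinates_from_location(location)
    print(f"Coordinates found: lon={lon}, lat={lat}")
    try:
        parsed = datetime.strptime(date_time, DATE_FORMAT)
    except ValueError as e:
        # The message states the expected format, so the LLM can fix its next call.
        raise ValueError(
            f"Could not parse date_time={date_time!r}; expected format '{DATE_FORMAT}'. Full trace: {e}"
        ) from e
    temperature_celsius, risk_of_rain, wave_height = get_weather_report_at_coordinates((lon, lat), parsed)
    return (
        f"Weather report for {location}, {parsed}: temperature will be {temperature_celsius}°C, "
        f"risk of rain is {risk_of_rain * 100:.0f}%, wave height is {wave_height}m."
    )


print(get_weather_report("Anchor Point, Taghazout, Morocco", "01/15/25 08:00:00"))
try:
    get_weather_report("Anchor Point, Taghazout, Morocco", "2025-01-15 8am")
except ValueError as err:
    print(err)
```

The second call fails on purpose: the error message it prints is what the LLM would see in its next step.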

To lighten the load on your LLM, the good question to ask yourself is: "How easy would it be for me, if I was dumb and using this tool for the first time ever, to program with this tool and correct my own errors?"

Give more arguments to the agent

To pass some additional objects to your agent beyond the simple string describing the task, you can use the additional_args argument to pass any type of object:

from smolagents import CodeAgent, HfApiModel

model_id = "meta-llama/Llama-3.3-70B-Instruct"

agent = CodeAgent(tools=[], model=HfApiModel(model_id=model_id), add_base_tools=True)

agent.run(
    "Why does Mike not know many people in New York?",
    additional_args={"mp3_sound_file_url":'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/recording.mp3'}
)

For instance, you can use this additional_args argument to pass images or strings that you want your agent to leverage.

How to debug your agent

1. Use a stronger LLM

In an agentic workflow, some of the errors are actual errors, while others are the fault of your LLM engine not reasoning properly. For instance, consider this trace for a CodeAgent that I asked to create a car picture:

==================================================================================================== New task ====================================================================================================
Make me a cool car picture
──────────────────────────────────────────────────────────────────────────────────────────────────── New step ────────────────────────────────────────────────────────────────────────────────────────────────────
Agent is executing the code below: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
image_generator(prompt="A cool, futuristic sports car with LED headlights, aerodynamic design, and vibrant color, high-res, photorealistic")
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Last output from code snippet: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
Step 1:

- Time taken: 16.35 seconds
- Input tokens: 1,383
- Output tokens: 77
──────────────────────────────────────────────────────────────────────────────────────────────────── New step ────────────────────────────────────────────────────────────────────────────────────────────────────
Agent is executing the code below: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
final_answer("/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png")
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Print outputs:

Last output from code snippet: ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png
Final answer:
/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png

Instead of an image, the user gets a path back. This could look like a bug in the system, but the agentic system didn't cause the error: the LLM brain simply made the mistake of not saving the image output into a variable. It therefore cannot access the image again except through the path that was logged when the image was saved, so it returns the path instead of the image.

The first step to debugging your agent is thus "Use a more powerful LLM". Alternatives like Qwen2.5-72B-Instruct wouldn't have made that mistake.
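The failure mode above can be reproduced with a small sketch. Everything here is hypothetical: GeneratedImage, image_generator and final_answer are stand-ins for the agent's real tools, chosen so the code runs on its own; GeneratedImage only mimics an image object whose printed form is its file path.

```python
class GeneratedImage:
    """Hypothetical stand-in for an image object whose printed form is its file path."""

    def __init__(self, path: str):
        self.path = path

    def __repr__(self) -> str:
        return self.path


def image_generator(prompt: str) -> GeneratedImage:
    # Hypothetical tool: a real one would call a text-to-image model and save a file.
    return GeneratedImage("/tmp/cool_car.png")


def final_answer(answer):
    # Hypothetical stand-in for the final_answer tool: returns whatever it receives.
    return answer


# Step 1 as the weaker model wrote it: the image is created but not stored in a variable,
# so the only trace left in the logs is the printed path.
image_generator(prompt="A cool, futuristic sports car")
# Step 2: with no variable to refer to, the model can only return the logged path string.
answer_from_weak_model = final_answer("/tmp/cool_car.png")

# What a stronger model does: store the tool output in a variable...
image = image_generator(prompt="A cool, futuristic sports car")
# ...and pass the object itself to final_answer.
answer_from_strong_model = final_answer(image)

print(type(answer_from_weak_model).__name__)    # str
print(type(answer_from_strong_model).__name__)  # GeneratedImage
```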

2. Provide more guidance / more information

You can also use less powerful models, provided you guide them more effectively.

Put yourself in the shoes of your model: if you were the model solving the task, would you struggle with the information available to you (from the system prompt + task formulation + tool description)?

Would you need some added clarifications?

To provide extra information, we do not recommend changing the system prompt right away: the default system prompt has many adjustments that you do not want to mess up unless you understand the prompt very well. Better ways to guide your LLM engine are:

If it's about the task to solve: add all these details to the task. The task could be hundreds of pages long.
If it's about how to use tools: use the description attribute of your tools.

3. Change the system prompt (generally not advised)

If the clarifications above are not sufficient, you can change the system prompt.

Let's see how it works. For example, let's check the default system prompt for the CodeAgent (the version below is shortened by skipping the examples).

print(agent.system_prompt_template)

Here is what you get:

You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.
To do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.
To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences.

At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.
Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '<end_code>' sequence.
During each intermediate step, you can use 'print()' to save whatever important information you will then need.
These print outputs will then appear in the 'Observation:' field, which will be available as input for the next step.
In the end you have to return a final answer using the `final_answer` tool.

Here are a few examples using notional tools:
---
{examples}

Above example were using notional tools that might not exist for you. On top of performing computations in the Python code snippets that you create, you only have access to these tools:

{{tool_descriptions}}

{{managed_agents_descriptions}}

Here are the rules you should always follow to solve your task:
1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```<end_code>' sequence, else you will fail.
2. Use only variables that you have defined!
3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = wiki({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = wiki(query="What is the place where James Bond lives?")'.
4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.
5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.
6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'.
7. Never create any notional variables in our code, as having these in your logs might derail you from the true variables.
8. You can use imports in your code, but only from the following list of modules: {{authorized_imports}}
9. The state persists between code executions: so if in one step you've created variables or imported modules, these will all persist.
10. Don't give up! You're in charge of solving the task, not providing directions to solve it.

Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000.

As you can see, there are placeholders like "{{tool_descriptions}}": upon agent initialization, these are replaced with automatically generated descriptions of the tools or managed agents.

So while you can overwrite this system prompt template by passing your custom prompt as an argument to the system_prompt parameter, your new system prompt must contain the following placeholders:

"{{tool_descriptions}}" to insert tool descriptions.
"{{managed_agents_descriptions}}" to insert the description for managed agents if there are any.
For CodeAgent only: "{{authorized_imports}}" to insert the list of authorized imports.

Then you can change the system prompt as follows:

from smolagents import CodeAgent, HfApiModel
from smolagents.prompts import CODE_SYSTEM_PROMPT

modified_system_prompt = CODE_SYSTEM_PROMPT + "\nHere you go!"  # Change the system prompt here

agent = CodeAgent(
    tools=[],
    model=HfApiModel(),
    system_prompt=modified_system_prompt
)

This also works with the ToolCallingAgent.
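Since a missing placeholder means the tool descriptions or authorized imports never reach the model, it can help to check a custom prompt before passing it in. The helper missing_placeholders below is a hypothetical sketch, not part of smolagents; it uses the placeholder names as they appear in the default prompt printed above, so verify them against your installed version.

```python
# Placeholder names taken from the default prompt printed above (check your smolagents version).
REQUIRED_PLACEHOLDERS = ["{{tool_descriptions}}", "{{managed_agents_descriptions}}"]
CODE_AGENT_ONLY_PLACEHOLDERS = ["{{authorized_imports}}"]


def missing_placeholders(prompt: str, is_code_agent: bool = True) -> list[str]:
    """Return the required placeholders that do not appear in `prompt`."""
    required = REQUIRED_PLACEHOLDERS + (CODE_AGENT_ONLY_PLACEHOLDERS if is_code_agent else [])
    return [placeholder for placeholder in required if placeholder not in prompt]


custom_prompt = "You are a helpful agent.\n{{tool_descriptions}}\n{{managed_agents_descriptions}}"
print(missing_placeholders(custom_prompt))                       # ['{{authorized_imports}}']
print(missing_placeholders(custom_prompt, is_code_agent=False))  # []
```

Running such a check right before constructing the agent turns a silent prompt problem into an explicit one.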

4. Extra planning

We provide a model for a supplementary planning step that an agent can run regularly between normal action steps. In this step there is no tool call: the LLM is simply asked to update a list of facts it knows and to reflect on which steps it should take next based on those facts.

from smolagents import load_tool, CodeAgent, HfApiModel, DuckDuckGoSearchTool
from dotenv import load_dotenv

# Load environment variables (e.g. your Hugging Face token) from a .env file
load_dotenv()

# Import tool from Hub
image_generation_tool = load_tool("m-ric/text-to-image", trust_remote_code=True)

search_tool = DuckDuckGoSearchTool()

agent = CodeAgent(
    tools=[search_tool, image_generation_tool],
    model=HfApiModel("Qwen/Qwen2.5-72B-Instruct"),
    planning_interval=3  # This is where you activate planning!
)

# Run it!
result = agent.run(
    "How long would a cheetah at full speed take to run the length of Pont Alexandre III?",
)
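To make planning_interval=3 more concrete, here is a small simulation of how planning steps could interleave with action steps. It assumes, without reproducing the library's source, that planning runs on the first step and then every planning_interval steps; run_with_planning is a hypothetical name, and no LLM is called.

```python
from typing import Optional


def run_with_planning(num_steps: int, planning_interval: Optional[int]) -> list[str]:
    """Simulate the order of planning and action steps (hypothetical scheduler)."""
    log = []
    for step in range(1, num_steps + 1):
        # Assumption: plan on step 1, then every `planning_interval` steps after it.
        if planning_interval is not None and (step - 1) % planning_interval == 0:
            log.append(f"plan@{step}")  # update facts and next steps, no tool call
        log.append(f"act@{step}")  # normal action step, may call tools
    return log


print(run_with_planning(7, 3))
# ['plan@1', 'act@1', 'act@2', 'act@3', 'plan@4', 'act@4', 'act@5', 'act@6', 'plan@7', 'act@7']
print(run_with_planning(3, None))
# ['act@1', 'act@2', 'act@3']
```

A smaller interval means more frequent re-planning, at the cost of extra LLM calls.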