
Under the hood, functions are injected into the system message in a syntax the model has been trained onWhen a tool call is selected by the LLM, it is the responsibility of the client code or framework to actually call the function and send the result back to the LLM. An example tool calling workflow might look like
System: You are a helpful assistant. You have access to the function: check_weather_in_city(name: str) -> dict
User: What’s the weather in San Francisco?
Assistant: __tool_call(check_weather_in_city, {"name": "San Francisco"})
At this point, your code would call the function, and add the result to the chain, and send the whole conversation back to the LLM
Tool: {"temperature_farenheit": "72", "weather": "sunny"}
The LLM can then produce a response based on the results of the tool call:
Assistant: “The weather in San Francisco is 72 degrees and sunny!”
User: “Thanks!”Learn more:
- Function Calling - OpenAI Docs
- Louis Dupont’s excellent Transforming Software Interactions with Tool Calling and LLMs
- Leverage OpenAI Tool Calling: Building a Reliable AI Agent from Scratch
- How Does Tool Calling Work in Langchain
- Tool Calling in Crew AI
- The Berkeley Function Calling Leaderboard
The Challenge
The most useful functions we can give to an LLM are also the most risky. We can all imagine the value of an AI Database Administrator that constantly tunes and refactors our SQL database, but most teams wouldn’t give an LLM access to run arbitrary SQL statements against a production database (heck, we mostly don’t even let humans do that).Even with state-of-the-art agentic reasoning and prompt routing, LLMs are not
sufficiently reliable to be given access to high-stakes functions without
human oversight
Function Stakes

Low Stakes
- Read Access to public data (e.g. search wikipedia, access public APIs and DataSets)
- Communicate with agent author (e.g. an engineer might empower an agent to send them a private Slack message with updates on progress)
Medium Stakes
- Read Access to Private Data (e.g. read emails, access calendars, query a CRM)
- Communicate with strict rules (e.g. sending based on a specific sequence of hard-coded email templates)
High Stakes
- Communicate on my Behalf or on behalf of my Company (e.g. send emails, post to slack, publish social/blog content)
- Write Access to Private Data (e.g. update CRM records, modify feature toggles, update billing information)
The Solution
The high stakes functions are the ones that are the most valuable and promise the most impact in automating away human workflows. But they are also the ones where “90% accuracy” is not acceptable. Reliability is further impacted by today’s LLMs’ tendency to hallucinate or craft low-quality text that is clearly AI generated. HumanLayer provides a set of tools to deterministically guarantee human oversight of high stakes function calls. Even if the LLM makes a mistake or hallucinates, HumanLayer is baked into the tool/function itself, guaranteeing a human in the loop.