Interpretability at Runtime: Enterprise AI That Thinks Out Loud
You wouldn't enjoy working with a colleague who just does things and never explains why. Someone who spends an hour making dozens of edits to your data, changing queries, restructuring reports, updating views, and when you ask why, just shrugs. You'd stop trusting them inside a week.
So why do we accept that from AI agents?
Agent interpretability
Anthropic and others have done serious work on interpretability at the model level: understanding why a neural network produces a particular output, which internal circuits activate, what features drive behavior. That work matters for AI safety in ways that are hard to overstate.
We're talking about something adjacent. Interpretability at the level of agent actions.
Not what's happening inside the weights. What's happening inside your workflow. When your AI agent joins two tables, applies a fiscal year filter, and excludes adjustment entries: why those choices, in that order, on that data? The analyst doesn't need to understand attention mechanisms. They need to know why the agent ran that specific query against the data warehouse, right now, in this context.
Same idea. Different layer. And in production enterprise AI, it's the layer that determines whether you can ship.
Action logs aren't enough
Good teammates don't just hand you a stack of finished work. They tell you why they made the choices they made.
Most agent frameworks give you a detailed record of what happened. Tool call history. Input and output for every step. A complete trace you can scroll through after the fact.
This is necessary. It's not sufficient.
An action log tells you what changed. It doesn't tell you whether the agent understood what it was doing or stumbled into a plausible-looking answer. Those two things produce identical logs.
We've seen this play out with our own enterprise users. The teams that became comfortable delegating real work to the agent were the ones who could see its reasoning. Showing the thought didn't just make the output auditable. It shortened the time it took for users to trust the agent enough to let it run.
When an agent explains itself, failures become debuggable. But just as importantly, competence becomes visible during the runs that go right, and that's what actually builds trust over time. Users don't develop confidence in systems they can't read.
The obvious fixes don't work
Two approaches look promising. Neither holds up.
Model reasoning traces (Claude's extended thinking, OpenAI's reasoning tokens) capture some of the model's internal deliberation. When they're available, they're genuinely useful. The problem: they're controlled by the model provider, not you. They exist for some models, in some configurations, when enabled. You can't build enterprise accountability on a feature you don't own and can't guarantee will be there.
Post-hoc follow-ups, like asking the agent "why did you do X?" after the fact, are worse. You're adding a round-trip to every action, and you're asking the model to reconstruct reasoning after its context has already moved on. That's not capturing a decision. That's generating a plausible story about one. The model isn't remembering. It's improvising.
What you actually need is reasoning captured in-stream, at the moment of decision, built into the tool contract.
The "thought" parameter
Good teammates narrate as they work. They say "I'm restructuring this because the current join is going to break when we add the new dimension." The thought parameter is the mechanism that enforces that habit.
Every tool in our agent has a thought parameter. It's the first argument. That ordering is intentional.
Because thought comes first, the model must commit to a rationale before it specifies the action. It can't fill in view_name and sql and then rationalize afterward. The schema forces justification before execution. The contract requires the why before the what.
def update_view(
    thought: str,
    view_name: str,
    sql: str,
) -> ToolResult:
    """
    thought: Your thought process for using this tool. It will be displayed
             in the chat to the user. Talk in first person and present
             reasoning as to why you are using this tool.
    """
    ...
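Serialized into the function-calling schema the model actually receives, the same contract might look like the sketch below (the field layout follows common function-calling conventions; everything beyond the signature above is illustrative). Listing thought first and marking it required is what makes the rationale non-optional:

```python
# Hypothetical tool schema, expressed as a Python dict. Property order
# mirrors the signature: thought comes first, and it is required.
UPDATE_VIEW_SCHEMA = {
    "name": "update_view",
    "description": "Create or replace a SQL view.",
    "parameters": {
        "type": "object",
        "properties": {
            "thought": {
                "type": "string",
                "description": (
                    "Your thought process for using this tool. Displayed "
                    "in the chat to the user. Talk in first person and "
                    "present reasoning for using this tool."
                ),
            },
            "view_name": {"type": "string"},
            "sql": {"type": "string"},
        },
        "required": ["thought", "view_name", "sql"],
    },
}
```

A model that omits thought fails schema validation before the tool ever runs, which is exactly the point: the contract, not a convention, carries the requirement.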
This isn't a logging hook. It's a structural requirement. The agent cannot call update_view without first articulating why it's updating a view. That thought is stored in the audit log and surfaced directly to the user alongside the action, not buried in a debug panel, not available on request. Right there, inline, every time.
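To make the capture-and-surface step concrete, here is a minimal sketch of a tool dispatcher that records the thought alongside the action and pushes it into the user-facing stream. The names (dispatch, AUDIT_LOG, CHAT_STREAM) are illustrative stand-ins, not the actual implementation:

```python
import time

AUDIT_LOG: list[dict] = []   # stand-in for a durable audit store
CHAT_STREAM: list[str] = []  # stand-in for the user-facing chat

def dispatch(tool_name: str, thought: str, **args) -> dict:
    if not thought.strip():
        # No rationale, no action: the call is rejected outright.
        raise ValueError(f"{tool_name}: thought is required before execution")
    entry = {
        "ts": time.time(),
        "tool": tool_name,
        "thought": thought,
        "args": args,
    }
    AUDIT_LOG.append(entry)      # the why is recorded with the what
    CHAT_STREAM.append(thought)  # surfaced inline, not in a debug panel
    # ... invoke the actual tool implementation here ...
    return entry

dispatch(
    "update_view",
    thought="I'm rebuilding this view because the current join breaks "
            "when we add the new dimension.",
    view_name="pipeline_weekly",
    sql="SELECT ...",
)
```

The key property is atomicity: the thought is persisted in the same step as the action it justifies, so there is no later reconstruction and no gap for the model to improvise into.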
What the user actually sees
Here's what that looks like in a real analysis:
I'm normalizing the week 10 close_date from TIMESTAMP to DATE so week 8 and week 10 use the same date type for quarter comparisons.
I'm assigning each opportunity to the standard waterfall movement categories (closed, pushed, new, progressed, etc.) using the agreed open/closed and quarter rules, so we can aggregate to count/ACV waterfalls and also list underlying deals per category.
I'm creating the drill-down table for 'pushed out of qtr' so you can see which opportunities remained open but slipped beyond the quarter end.
Not summaries. Not placeholders. Real reasoning, from a real analysis, verbatim.
That's the difference between a coworker who Slacks you "done" and one who says "I did X because Y. Let me know if you'd approach it differently."
The reasoning is the product
If you're building for enterprise, a right answer with no explanation is still a problem. Every decision needs to be explainable to someone who didn't make it but has to stand behind it.
When an edge case surfaces, and eventually one will, the agent that thinks out loud isn't just more auditable. It's the kind of collaborator you'd actually want on your team.
An agent that can show its work is one worth trusting with yours.
About the Author
Di Wu
Co-founder & CTO
Principal Engineer at Snowflake, Distinguished Engineer at Rubrik, CTO at BetterWorks, and Engineer at Palantir.
LinkedIn