Provider-Agnostic Agents: Why Adapters Alone Aren't Enough

Dr. Florian Drechsler · February 19, 2026 · 9 min read

AI Agents · LLM · Architecture · MCP

An agent that worked reliably with an OpenAI frontier model yesterday fails on Claude today. Not because the code is broken, but because OpenAI expects tool definitions in a tools array, Anthropic emits tool calls as tool_use content blocks, and Google declares tools via function_declarations. Three providers, three incompatible schemas for the same concept.

Coupling your agent to a single provider means accepting their pricing changes, rate limits, and model deprecations as immutable laws of nature. LLM agnosticism isn't a nice-to-have. It's a survival question.

TL;DR: Adapter patterns (LiteLLM, LangChain) solve the syntactic problem, making API calls look the same. MCP solves the tool problem, making tools work with any model. But neither solves the semantic problem: models behave differently even with identical inputs. True portability requires all three layers.

A quick note on terminology: Provider = API platform (OpenAI, Anthropic, Google). Model = specific model ID. Framework = orchestration layer that abstracts both.


The Adapter Pattern: Where Everyone Starts

The solution that all major frameworks converged on independently is the adapter pattern.

How the Big Three Do It

  • LiteLLM implements it most directly: a single completion() call accepts a model string and transparently routes to over 100 providers.
  • LangChain's BaseChatModel inverts the dependency. All provider integrations implement the same interface contract with BaseMessage objects and a unified bind_tools() method.
  • AutoGen v0.4 formalizes the boundary through a protocol-based design where agents are defined by their roles and communication protocols, not the underlying model.

Sounds like a solved problem. The reality is considerably more complex.


Where API Divergence Hits in Practice

The real engineering challenge lives in the details of bidirectional schema translation.

Three Providers, Three Schemas

A simplified example makes the scope visible:

# Pseudocode - simplified schema, real APIs differ in details

# OpenAI: tools array with function wrapper
tools = [{"type": "function", "function": {"name": "get_weather", "parameters": {...}}}]

# Anthropic: tool_use content blocks
tools = [{"name": "get_weather", "input_schema": {...}}]

# Google: function_declarations (format varies by API version/SDK)
tools = [{"function_declarations": [{"name": "get_weather", "parameters": {...}}]}]

# LiteLLM: accepts OpenAI-like inputs and normalizes internally
response = litellm.completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=messages,
    tools=tools  # OpenAI format - LiteLLM handles conversion
)

LiteLLM operates as an adapter at the HTTP/SDK boundary: outgoing requests are translated into the respective provider schema, incoming responses are converted back into a canonical ModelResponse object. Full parity across all providers and features isn't guaranteed, but the common cases are covered.

Response Normalization

The differences go deeper than request formats:

  • Stop reasons: Anthropic returns stop_reason: "end_turn". LiteLLM rewrites it into a normalized finish_reason.
  • Response structure: Gemini's deeply nested candidates[0].content.parts structure gets flattened before agent code sees it.
  • Consistent access: Your code always reads choices[0].message.content and choices[0].message.tool_calls, regardless of the provider behind it.
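As a toy sketch of what such an adapter does internally (not LiteLLM's actual code, and with the same simplified schemas as above), normalizing an Anthropic-style response into the canonical shape looks roughly like this:

```python
# Toy sketch of response normalization - not LiteLLM's real implementation.
# Maps an Anthropic-style response dict onto the canonical
# choices[0].message shape that agent code reads.

STOP_REASON_MAP = {
    "end_turn": "stop",        # Anthropic stop reason -> canonical finish_reason
    "max_tokens": "length",
    "tool_use": "tool_calls",
}

def normalize_anthropic(response: dict) -> dict:
    text_parts = [
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    ]
    tool_calls = [
        {"name": block["name"], "arguments": block["input"]}
        for block in response.get("content", [])
        if block.get("type") == "tool_use"
    ]
    return {
        "choices": [{
            "message": {
                "content": "".join(text_parts) or None,
                "tool_calls": tool_calls or None,
            },
            "finish_reason": STOP_REASON_MAP.get(
                response.get("stop_reason"), "stop"
            ),
        }]
    }
```

Agent code downstream of this function never sees a provider-specific stop reason again.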

Tool Result Passing

Passing tool results back also diverges fundamentally:

| Provider | How Tool Results Are Passed |
| --- | --- |
| OpenAI | Dedicated tool role message |
| Anthropic | user messages with tool_result content blocks |
| Google | functionResponse parts |
| No native support | LiteLLM serializes schemas into the system prompt, parses free text via regex |

That last row is telling. It works, but it shows the limits of syntactic normalization.
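The three native formats can be sketched side by side. This is a simplified illustration, not the exact wire format of any API; IDs, nesting, and field names differ in detail:

```python
# Simplified sketch of per-provider tool-result formatting.
# Real APIs differ in details (IDs, nesting, exact field names).

def format_tool_result(provider: str, call_id: str, name: str, content: str) -> dict:
    if provider == "openai":
        # Dedicated 'tool' role message referencing the original call
        return {"role": "tool", "tool_call_id": call_id, "content": content}
    if provider == "anthropic":
        # A user message carrying a tool_result content block
        return {
            "role": "user",
            "content": [{"type": "tool_result", "tool_use_id": call_id, "content": content}],
        }
    if provider == "google":
        # A functionResponse part inside the next user turn
        return {
            "role": "user",
            "parts": [{"functionResponse": {"name": name, "response": {"result": content}}}],
        }
    raise ValueError(f"unknown provider: {provider}")
```

Three message shapes for one concept: "here is what your tool call returned."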

Framework-Level Output Parsing

At the framework level, LangChain addresses response heterogeneity through a layered BaseOutputParser hierarchy:

  • .with_structured_output() automatically dispatches to OpenAI function calling, Anthropic tool_use blocks, or Google's response_schema, based on each provider's capabilities.
  • An OutputFixingParser implements self-healing retry loops for malformed JSON.

This reflects a hard truth: you can't assume model output will be consistent.
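The retry idea behind OutputFixingParser can be sketched in plain Python. This is an illustration of the pattern, not LangChain's implementation; retry_fix and the callable model are invented names:

```python
import json

# Illustrative sketch of a self-healing parse loop, in the spirit of
# LangChain's OutputFixingParser. `model` is any callable that takes a
# prompt and returns text - a stub here, an adapter call in practice.

def retry_fix(model, raw_output: str, max_retries: int = 2) -> dict:
    attempt = raw_output
    for _ in range(max_retries + 1):
        try:
            return json.loads(attempt)
        except json.JSONDecodeError as err:
            # Feed the broken output and the parse error back to the model
            attempt = model(
                f"Fix this invalid JSON (error: {err}). "
                f"Return only valid JSON:\n{attempt}"
            )
    raise ValueError("could not repair model output into valid JSON")
```

The loop converts "the model sometimes emits broken JSON" from a crash into a bounded retry cost.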

Key insight: Schema translation and semantic equivalence are two different problems. Parallel tool-call handling, tool_choice enforcement, and streaming finish reasons all remain provider-specific, even after syntactic normalization.


MCP: The Second Standardization Axis

While LiteLLM and LangChain abstract the model layer, a second standardization vector emerged in parallel, this time not for the model, but for the tools.

The N×M Problem

Before MCP, connecting N agent frameworks with M tools required N×M custom integrations. Every framework and every provider used incompatible tool schema formats.

Anthropic's Model Context Protocol (MCP), released in late 2024, reduces this to N+M by defining a single JSON-RPC 2.0-based protocol through which tool servers expose their capabilities and agent clients consume them.

How It Works

Agent Host → MCP tools/list → Tool Schemas → Model Call → Tool Call → Tool Result → next step

Think of it like the Language Server Protocol (LSP): just as LSP standardized communication between IDEs and language servers, MCP creates a universal interface between agents and tools. USB-C for AI: standardized discovery, unified invocations, normalized error handling.

Why It Matters Architecturally

MCP operates one layer below the LLM provider abstraction:

  • An MCP server doesn't know and doesn't care which LLM calls its tools. Only the host application needs provider-specific integration code.
  • Swap LLM providers without rewriting tool servers.
  • Add new tools without touching agent code.

MCP's model agnosticism is a protocol property, not a library configuration. Tool schemas live on MCP servers and are dynamically discovered at runtime via standardized tools/list calls. The sampling primitive (sampling/createMessage) takes this further: MCP servers can request LLM completions through the host without embedding a model SDK.
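On the wire, discovery is plain JSON-RPC 2.0. A minimal sketch of a tools/list exchange, with field names following the MCP specification but payloads abbreviated:

```python
# Minimal sketch of MCP's JSON-RPC 2.0 discovery exchange.
# Field names follow the MCP specification; payloads are abbreviated.

tools_list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# A server's response exposes each tool with a JSON Schema for its input:
tools_list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [{
            "name": "get_weather",
            "description": "Get current weather for a city",
            "inputSchema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }]
    },
}

def discovered_tool_names(response: dict) -> list[str]:
    return [t["name"] for t in response["result"]["tools"]]
```

Nothing in this exchange names a model or a provider, which is exactly the point.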

The Two-Layer Model

Together with LiteLLM, a clean separation emerges:

| Layer | What It Normalizes | Example |
| --- | --- | --- |
| Framework layer | LLM API (requests, responses, auth) | LiteLLM, LangChain adapters |
| Tool layer | Tool discovery, invocation, results | MCP servers |

Both sources of provider lock-in are eliminated simultaneously.
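The glue between the two layers is a small translation in the host application: MCP tool schemas in, provider-ready tool definitions out. A hedged sketch, using the same simplified formats as the earlier examples:

```python
# Sketch of host-side glue between the tool layer and the framework layer:
# MCP tool schemas (name / description / inputSchema) translated into the
# OpenAI-style tools array that adapters like LiteLLM accept. Simplified.

def mcp_to_openai_tools(mcp_tools: list[dict]) -> list[dict]:
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t["inputSchema"],
            },
        }
        for t in mcp_tools
    ]
```

The host owns this one conversion; neither the MCP server nor the agent logic ever sees the other side's format.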

Note: MCP standardizes the interface, not the model's behavior. How reliably a model follows tool schemas, how many parallel tool calls it can plan, how it handles tool errors, these are model quality properties that no protocol can structurally solve. MCP is a necessary but not sufficient condition for provider portability.

The rapid integration by OpenAI, Google DeepMind, and Microsoft within a year of release confirms: MCP addressed a real, widely felt coordination problem.


How Frameworks Combine Both Layers

MCP and model abstraction are building blocks. The framework landscape shows three architectural approaches that share the same core.

The Common Principle

All major frameworks implement dependency inversion: high-level agent logic depends on abstractions, not on concrete LLM implementations:

  • LangChain uses BaseChatModel
  • AutoGen uses ChatCompletionClient
  • smolagents uses a unified Model interface

In each case, switching providers only requires changing the model instantiation. Anthropic's Engineering Guide explicitly recommends keeping the orchestration layer decoupled from the underlying model.
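The shared principle can be shown in a few lines. The Model protocol and the fake implementations below are illustrative stand-ins, not any framework's actual interface:

```python
from typing import Protocol

# Sketch of dependency inversion: agent logic depends on an abstract
# Model protocol, never on a concrete SDK. Names are illustrative.

class Model(Protocol):
    def generate(self, prompt: str) -> str: ...

class FakeOpenAIModel:
    def generate(self, prompt: str) -> str:
        return f"[openai] {prompt}"

class FakeClaudeModel:
    def generate(self, prompt: str) -> str:
        return f"[anthropic] {prompt}"

def run_agent_step(model: Model, task: str) -> str:
    # High-level logic only ever sees the abstraction
    return model.generate(f"Plan the next step for: {task}")

# Switching providers is a one-line change at instantiation time:
result = run_agent_step(FakeClaudeModel(), "summarize logs")
```

The agent function never imports a provider SDK; the choice of backend is pushed to the single point where the model is constructed.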

Where the Approaches Differ

| Framework | Architecture | Provider Abstraction | Where Does the Abstraction Break? |
| --- | --- | --- | --- |
| LangChain / LangGraph | Graph-based orchestration | BaseChatModel + broad adapter library | Structured output, streaming behavior |
| AutoGen v0.4 | Event-driven actor model | ChatCompletionClient protocol | tool_choice enforcement, parallel calls |
| smolagents | Minimal core (~1,000 lines) | Unified Model interface | Provider-specific grounding features |

The Rise of Multi-Model Graphs

The most significant architectural evolution is the shift from single-provider agent loops to heterogeneous multi-model graphs.

LangGraph encodes agent workflows as stateful directed graphs where different nodes can use different LLM backends: a fast, cheap model for routing and classification, a more capable one for generation.

AutoGen goes further: a planner agent on one frontier model can delegate tasks to specialized execution and critic agents on other providers, all within a single multi-agent workflow.
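The core of such a heterogeneous graph is just a routing table from node role to model. A sketch with placeholder model IDs (not recommendations, and not any framework's API):

```python
# Illustrative routing table for a heterogeneous multi-model graph:
# cheap model for routing/classification nodes, capable model for
# generation and critique. Model IDs are placeholders.

NODE_MODELS = {
    "route": "cheap-fast-model",
    "classify": "cheap-fast-model",
    "generate": "frontier-model",
    "critique": "frontier-model",
}

def model_for_node(node: str) -> str:
    try:
        return NODE_MODELS[node]
    except KeyError:
        raise ValueError(f"unknown graph node: {node}")
```

With a provider-agnostic adapter underneath, each node's model string can point at a different provider without changing the graph.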

At the other end of the spectrum sits Amazon Bedrock Agents, where the infrastructure itself enforces provider neutrality instead of requiring developers to implement adapter patterns.

For a practical look at how these multi-model graphs are deployed in production pipelines with specialized agents, quality gates, and workspace isolation, see How Agent-Based Development Workflows Work.


Interface Compatibility Is Not Portability

Frameworks solve the interface problem elegantly. But interface compatibility and true portability are two different things.

The same prompt and the same agent graph produce different emergent behaviors depending on the model behind it.

Three Dimensions of Behavioral Variance

1. Instruction-Following Fidelity

Models differ in how precisely they follow agent loop instructions like ReAct prompting or structured output requests.

2. Tool-Call Reliability

Parallel tool-call behavior, tool_choice enforcement, and streaming finish reasons vary per provider, even after syntactic normalization.

3. Prompt Portability

A system prompt optimized for one model behaves differently with another provider. Prompt templates require re-engineering per model family. This isn't a one-time configuration effort. It's an ongoing cost.

The Non-Linear Drop-Off

Studies like AgentBench show that performance gaps between frontier and mid-tier models aren't gradual degradations but sometimes drastic, non-linear drops.

Anthropic's Engineering Guide states it directly: models below a reliability threshold break entire agent graphs instead of degrading gracefully.


What This Means for Your Architecture

A dedicated model evaluation layer alongside the abstraction layer isn't optional overhead. It belongs to the architecture.

Four Practices That Matter

  1. Maintain registries of known model/task pairings for reliable production deployments
  2. Configure fallback chains that activate when a model fails a runtime capability check
  3. Implement observability hooks to detect behavioral drift before it cascades into downstream failures
  4. Write per-provider behavioral tests covering tool schema adherence, parallel planning, and error recovery
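Practice 2 can be sketched in a few lines. The check callable stands in for a real runtime probe, for example a tool-schema adherence smoke test against each provider; everything here is illustrative:

```python
# Sketch of a fallback chain driven by a runtime capability check.
# `check` and the model callables are stand-ins for real probes
# (e.g. a tool-schema adherence smoke test per provider).

def call_with_fallback(models, check, task):
    """Try each (name, fn) pair in order; skip models that fail the check."""
    failures = []
    for name, fn in models:
        if not check(name):
            failures.append(name)
            continue
        return name, fn(task)
    raise RuntimeError(f"all models failed capability check: {failures}")
```

The ordering of the chain is itself an architectural decision: cheapest-first for cost, most-reliable-first for latency-sensitive paths.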

The Fundamental Tradeoff

Every provider-agnostic framework confronts you with the same tension: universal abstraction versus access to model-specific capabilities. Extended thinking, enforced JSON schemas, and grounding capabilities all require provider-specific code paths that break through the abstraction.

Checklist: Provider-Agnostic Agent Architecture

Anyone who takes portability seriously needs more than an adapter pattern:

  • Canonical message model + canonical tool-call model: a unified internal representation that doesn't directly depend on any provider
  • Provider capability matrix: which provider supports streaming, parallel tool calls, schema enforcement, grounding?
  • Contract tests for tool adherence: automated tests that verify per provider whether tool schemas are correctly followed
  • Golden traces: reference runs (prompt + tools + expected calls) per provider that serve as regression tests
  • Runtime fallback policy: defined escalation when a model fails runtime capability checks
  • Observability: tool-call error taxonomy + drift detection to catch behavioral shifts between provider versions
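The golden-traces item can be made concrete with a small regression check: compare the tool calls an agent actually made against a recorded reference run. The trace format here is invented for illustration:

```python
# Sketch of a golden-trace regression check: the tool calls from a live
# run are compared step by step against a recorded reference run.
# The trace format (name/arguments dicts) is invented for illustration.

def assert_matches_golden(actual_calls, golden_calls):
    assert len(actual_calls) == len(golden_calls), (
        f"call count drifted: {len(actual_calls)} vs {len(golden_calls)}"
    )
    for i, (actual, golden) in enumerate(zip(actual_calls, golden_calls)):
        assert actual["name"] == golden["name"], f"step {i}: tool changed"
        assert actual["arguments"] == golden["arguments"], (
            f"step {i}: arguments drifted for {actual['name']}"
        )
```

Run per provider in CI, a check like this turns "the model behaves differently now" from a production incident into a failing test.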

The Bottom Line

Start with the abstraction as default. Drop down to provider-specific code when you need the features, but explicitly document every such escape.

Portability as the norm, provider optimization as a deliberate exception.

The question that should guide your architecture decision isn't "Which provider is best?" but "Which parts of my agent are allowed to depend on the provider, and which aren't?"
