Overview - Orizon QA

Agent testing in Orizon QA lets you evaluate AI agents built with any of the supported frameworks listed below. Upload your agent code or describe your agent using a template, and Orizon QA detects the framework, generates test cases across four categories (functional, safety, performance, and robustness), runs them, and delivers a scored report with actionable recommendations. Use this feature before deploying a new agent, after updating a system prompt or model, or as part of a regular safety audit.

Supported frameworks

Orizon QA detects and generates tests for the following frameworks out of the box:

| Framework | Detection method | Test generation | Export format |
| --- | --- | --- | --- |
| LangChain | Import patterns (`from langchain`, `@langchain/core`), tool decorators, chain constructors | Tools, chains, agents, memory | LangSmith datasets |
| CrewAI | Import patterns (`from crewai`), `@agent`, `@task`, `@crew` decorators, `Crew()` constructor | Crew execution, agent roles, task outputs | Promptfoo red team |
| AutoGen | Import patterns (`from autogen`), `ConversableAgent`, `UserProxyAgent`, `AssistantAgent`, `GroupChat` | Conversation flows, group chat orchestration, code execution safety | AutoGenBench |
| Google ADK | Import patterns (`from google.adk`, `from google.generativeai`), `AgentDefinition`, `genai.Agent` | Tool invocations, multi-turn context, orchestration trajectories | Vertex AI eval |
| Claude SDK | Import patterns (`from anthropic`, `@anthropic-ai/sdk`), `Anthropic()` constructor, Claude model strings | Tool calls, hook behavior, rules compliance | Self-evaluation |
| Solace Mesh | `solace-agent-mesh`, `SolaceAgentMesh`, `AgentMesh`, `solace.messaging` patterns | Agent registration, A2A messaging, event handlers | Event flow tests |

How it works

Upload or describe your agent

Provide your agent to Orizon QA by uploading code files or filling out a template that describes your agent’s purpose and tools. See Upload or Describe for details.

Auto-detect framework

Orizon QA scans your code for framework-specific imports, decorators, and constructors. Detection runs automatically, and you can specify the framework manually if needed.
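
To make the idea of pattern-based detection concrete, here is a minimal sketch in Python. It is not Orizon QA's actual detector: the `FRAMEWORK_PATTERNS` table and the `detect_framework` function are hypothetical names, and the regular expressions are only a subset of the patterns listed in the Supported frameworks table. The sketch counts how many patterns match per framework and picks the highest count, with an optional manual override.

```python
import re

# Hypothetical pattern table based on the "Supported frameworks" table above.
# This is an illustration, not Orizon QA's real detection logic.
FRAMEWORK_PATTERNS = {
    "LangChain": [r"from\s+langchain", r"@langchain/core"],
    "CrewAI": [r"from\s+crewai", r"\bCrew\("],
    "AutoGen": [r"from\s+autogen", r"\bConversableAgent\b", r"\bGroupChat\b"],
    "Google ADK": [r"from\s+google\.adk", r"from\s+google\.generativeai"],
    "Claude SDK": [r"from\s+anthropic", r"@anthropic-ai/sdk", r"\bAnthropic\("],
    "Solace Mesh": [r"solace-agent-mesh", r"\bSolaceAgentMesh\b", r"solace\.messaging"],
}


def detect_framework(source, override=None):
    """Return the framework whose patterns match most often, or None.

    `override` models the manual framework selection mentioned above.
    Ties are broken by the order of FRAMEWORK_PATTERNS.
    """
    if override is not None:
        return override
    scores = {
        name: sum(1 for pattern in patterns if re.search(pattern, source))
        for name, patterns in FRAMEWORK_PATTERNS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

A call such as `detect_framework(open("agent.py").read())` would return a framework name or `None`. A real detector would likely weight decorators and constructors differently from imports; the simple match count here is an assumption made for brevity.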

Configure tests

Choose which test categories to run (functional, safety, performance, robustness), how many times to run each test (1x–10x), and which evaluation model to use (Claude Haiku, Sonnet, or Opus).
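
The options above can be thought of as a small, validated configuration object. The following sketch assumes a hypothetical `RunConfig` class with hypothetical field names and model identifiers; only the four categories, the 1 to 10 repetition range, and the three evaluation models come from the description above.

```python
from dataclasses import dataclass

# Allowed values come from the text above; the identifiers are hypothetical.
CATEGORIES = ("functional", "safety", "performance", "robustness")
EVAL_MODELS = ("haiku", "sonnet", "opus")


@dataclass(frozen=True)
class RunConfig:
    categories: tuple   # which test categories to run
    runs_per_test: int  # how many times each test is executed (1 to 10)
    eval_model: str     # which evaluation model scores the results

    def __post_init__(self):
        unknown = set(self.categories) - set(CATEGORIES)
        if not self.categories or unknown:
            raise ValueError(f"invalid categories: {sorted(unknown) or 'none selected'}")
        if not 1 <= self.runs_per_test <= 10:
            raise ValueError("runs_per_test must be between 1 and 10")
        if self.eval_model not in EVAL_MODELS:
            raise ValueError(f"eval_model must be one of {EVAL_MODELS}")
```

For example, `RunConfig(categories=("functional", "safety"), runs_per_test=3, eval_model="haiku")` describes a pre-deploy gate that runs each functional and safety test three times.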

Run tests

Orizon QA generates test cases tailored to your agent (tool invocations, adversarial prompts, edge cases, and performance benchmarks) and executes them against it.

Review results

Get a scored report with pass/fail breakdowns per category, specific failure details, and recommendations. Export results in your framework’s native format for further analysis.
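
As a rough illustration of a pass/fail breakdown per category, the sketch below aggregates individual test outcomes. The scoring formula Orizon QA uses is not described here, so the plain pass rate is an assumption, and the `summarize` function and its input shape are hypothetical.

```python
from collections import defaultdict


def summarize(results):
    """Aggregate (category, passed) pairs into a per-category breakdown.

    The pass rate used as the score is an assumption for illustration only.
    """
    counts = defaultdict(lambda: {"passed": 0, "failed": 0})
    for category, passed in results:
        counts[category]["passed" if passed else "failed"] += 1
    return {
        category: {**c, "pass_rate": c["passed"] / (c["passed"] + c["failed"])}
        for category, c in counts.items()
    }
```

Given two safety tests with one failure and one passing functional test, `summarize` reports a pass rate of 0.5 for safety and 1.0 for functional.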

When to use agent testing

Before production deploy

Catch safety issues and functional regressions before your agent reaches real users. Run functional and safety tests as a mandatory gate before shipping.

Safety audits

Run a comprehensive safety evaluation covering adversarial prompts, jailbreak attempts, PII leakage, and bias checks to document your agent’s safety posture for stakeholders or compliance requirements.

Regression checks

After changing your system prompt, switching models, or modifying tools, re-run the same test suite to verify behavior hasn’t degraded. Test history lets you compare scores across runs.
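
Comparing scores across runs amounts to a per-category diff. The sketch below assumes each run is summarized as a dictionary mapping category name to a score between 0 and 1; that shape, the `find_regressions` name, and the `tolerance` parameter are hypothetical and not part of Orizon QA's export format.

```python
def find_regressions(baseline, current, tolerance=0.0):
    """Return categories whose score dropped by more than `tolerance`.

    Both arguments map category name -> score. Categories missing from
    either run are ignored rather than treated as regressions.
    """
    return {
        category: (baseline[category], current[category])
        for category in baseline
        if category in current
        and baseline[category] - current[category] > tolerance
    }
```

A small tolerance helps when tests are run several times and scores vary slightly between runs, so that noise alone does not flag a regression.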

Explore this section

Upload or Describe

Learn how to provide your agent to Orizon QA, either by uploading source code or by using a describe template.

Test Categories

Understand what each of the four test categories covers and how to choose the right mix for your needs.

Results & Exports

Read your test report, interpret category scores, and export results for your framework.
