Agent testing in Orizon QA lets you evaluate AI agents built with any major framework. Upload your agent code or describe your agent using a template, and Orizon QA automatically detects the framework, generates test cases across four categories, runs them, and delivers a scored report with actionable recommendations. Use this feature before deploying a new agent, after updating a system prompt or model, or as part of a regular safety audit.

Supported frameworks

Orizon QA detects and generates tests for the following frameworks out of the box:
| Framework | Detection method | Test generation | Export format |
| --- | --- | --- | --- |
| LangChain | Import patterns (from langchain, @langchain/core), tool decorators, chain constructors | Tools, chains, agents, memory | LangSmith datasets |
| CrewAI | Import patterns (from crewai), @agent, @task, @crew decorators, Crew() constructor | Crew execution, agent roles, task outputs | Promptfoo red team |
| AutoGen | Import patterns (from autogen), ConversableAgent, UserProxyAgent, AssistantAgent, GroupChat | Conversation flows, group chat orchestration, code execution safety | AutoGenBench |
| Google ADK | Import patterns (from google.adk, from google.generativeai), AgentDefinition, genai.Agent | Tool invocations, multi-turn context, orchestration trajectories | Vertex AI eval |
| Claude SDK | Import patterns (from anthropic, @anthropic-ai/sdk), Anthropic() constructor, Claude model strings | Tool calls, hook behavior, rules compliance | Self-evaluation |
| Solace Mesh | solace-agent-mesh, SolaceAgentMesh, AgentMesh, solace.messaging patterns | Agent registration, A2A messaging, event handlers | Event flow tests |
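
To make the detection methods listed above more concrete, here is a minimal Python sketch of marker-based framework detection. It is not Orizon QA's actual detector: the marker strings come from the list above, but the `FRAMEWORK_PATTERNS` mapping, the `detect_framework` function, and the "most marker hits wins" scoring rule are assumptions made for illustration only.

```python
import re
from typing import Optional

# Marker strings are taken from the framework list above. The regexes and the
# scoring rule are illustrative assumptions, not Orizon QA's implementation.
FRAMEWORK_PATTERNS = {
    "LangChain": [r"\bfrom\s+langchain", r"@langchain/core"],
    "CrewAI": [r"\bfrom\s+crewai\b", r"\bCrew\("],
    "AutoGen": [r"\bfrom\s+autogen\b", r"\bConversableAgent\b", r"\bGroupChat\b"],
    "Google ADK": [r"\bfrom\s+google\.adk\b", r"\bfrom\s+google\.generativeai\b"],
    "Claude SDK": [r"\bfrom\s+anthropic\b", r"@anthropic-ai/sdk", r"\bAnthropic\("],
    "Solace Mesh": [r"solace-agent-mesh", r"\bSolaceAgentMesh\b", r"\bsolace\.messaging\b"],
}


def detect_framework(source: str) -> Optional[str]:
    """Return the framework with the most marker hits, or None if nothing matches."""
    scores = {
        name: sum(1 for pattern in patterns if re.search(pattern, source))
        for name, patterns in FRAMEWORK_PATTERNS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the first entry in the mapping
    return best if scores[best] > 0 else None
```

Calling `detect_framework` on a file that imports from `crewai` and constructs a `Crew(...)` would return `"CrewAI"` under these assumptions. A file with no recognizable markers returns `None`, which corresponds to the case where you would specify the framework manually.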

How it works

1. Upload or describe your agent

Provide your agent to Orizon QA by uploading code files or filling out a template that describes your agent’s purpose and tools. See Upload or Describe for details.

2. Auto-detect framework

Orizon QA scans your code for framework-specific imports, decorators, and constructors. Detection runs automatically, but you can specify the framework manually if needed.

3. Configure tests

Choose which test categories to run (functional, safety, performance, robustness), how many times to run each test (1x–10x), and which evaluation model to use (Claude Haiku, Sonnet, or Opus).

4. Run tests

Orizon QA generates test cases for your specific agent (including tool invocations, adversarial prompts, edge cases, and performance benchmarks) and executes them against your agent.

5. Review results

Get a scored report with pass/fail breakdowns per category, specific failure details, and recommendations. Export results in your framework’s native format for further analysis.
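
The configuration and review steps above can be sketched in Python as follows. This is a hypothetical model, not Orizon QA's API or report format: `RunConfig`, `runs_per_test`, the lowercase model identifiers, and `pass_rate_by_category` are all invented names. Only the four categories, the 1x–10x run range, and the three evaluation models come from the steps themselves.

```python
from collections import defaultdict
from dataclasses import dataclass

CATEGORIES = {"functional", "safety", "performance", "robustness"}
EVAL_MODELS = {"haiku", "sonnet", "opus"}  # hypothetical identifiers for the three models


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical test-run configuration mirroring the options in step 3."""
    categories: frozenset
    runs_per_test: int = 1
    evaluation_model: str = "sonnet"

    def __post_init__(self):
        unknown = set(self.categories) - CATEGORIES
        if not self.categories or unknown:
            raise ValueError(f"invalid categories: {sorted(unknown) or 'none selected'}")
        if not 1 <= self.runs_per_test <= 10:
            raise ValueError("runs_per_test must be between 1 and 10")
        if self.evaluation_model not in EVAL_MODELS:
            raise ValueError(f"unknown evaluation model: {self.evaluation_model}")


def pass_rate_by_category(results):
    """Summarize (category, passed) pairs, one per test execution, as pass rates."""
    totals = defaultdict(lambda: [0, 0])  # category -> [passed, executed]
    for category, passed in results:
        totals[category][0] += int(passed)
        totals[category][1] += 1
    return {category: p / n for category, (p, n) in totals.items()}
```

Validating the configuration up front means an out-of-range run count or an unknown category fails before any tests execute. Running each test more than once makes the per-category pass rate a fraction rather than a single pass/fail bit, which helps when agent output varies between runs.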

When to use agent testing

Before production deploy

Catch safety issues and functional regressions before your agent reaches real users. Run functional and safety tests as a mandatory gate before shipping.
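
One way to treat functional and safety results as a mandatory gate is a small check in your deployment pipeline. The sketch below is an assumption about how you might consume scores on your side. It presumes category scores are available as fractions between 0 and 1, and the `gate_failures` function and the 0.9 threshold are hypothetical, not part of Orizon QA.

```python
REQUIRED_CATEGORIES = ("functional", "safety")


def gate_failures(scores, minimum=0.9):
    """Return the reasons the gate fails; an empty list means the gate passes.

    `scores` maps category name -> score in [0, 1] (assumed format).
    """
    failures = []
    for category in REQUIRED_CATEGORIES:
        score = scores.get(category)
        if score is None:
            failures.append(f"{category}: not run")
        elif score < minimum:
            failures.append(f"{category}: {score:.2f} is below {minimum:.2f}")
    return failures
```

In a CI job you could exit with a non-zero status whenever the returned list is non-empty, so a missing or low-scoring required category blocks the release.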

Safety audits

Run a comprehensive safety evaluation — including adversarial prompts, jailbreak attempts, PII leakage, and bias checks — to document your agent’s safety posture for stakeholders or compliance requirements.

Regression checks

After changing your system prompt, switching models, or modifying tools, re-run the same test suite to verify behavior hasn’t degraded. Test history lets you compare scores across runs.
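
Comparing scores across runs can be sketched as a per-category diff between a baseline run and the current run. The score format (fractions between 0 and 1 keyed by category), the `find_regressions` function, and the 0.02 tolerance are illustrative assumptions, not Orizon QA's test-history format.

```python
def find_regressions(baseline, current, tolerance=0.02):
    """Return {category: (old_score, new_score)} for scores that dropped by more than `tolerance`."""
    regressions = {}
    for category, old_score in baseline.items():
        new_score = current.get(category)
        if new_score is None:
            continue  # category was not run this time, so there is nothing to compare
        if old_score - new_score > tolerance:
            regressions[category] = (old_score, new_score)
    return regressions
```

A small tolerance keeps run-to-run noise from being flagged as a regression. Setting it to 0 flags any drop at all.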

Explore this section

Upload or Describe

Learn how to provide your agent to Orizon QA — by uploading source code or using a describe template.

Test Categories

Understand what each of the four test categories covers and how to choose the right mix for your needs.

Results & Exports

Read your test report, interpret category scores, and export results for your framework.