A/B experiments

A/B experiments let you trial different agent personas against each other on real visitor traffic and pick the winner based on engagement, lead capture, or conversation length. Pitchbar assigns each visitor stickily (the same person always sees the same variant), so the comparison is honest.

Where it lives

Open any agent → A/B experiments tab in the agent nav (Beaker icon). URL: /app/agents/{id}/experiments. Owners and Admins can create, start, and stop experiments; Editors can't.

Creating an experiment

  1. Click New experiment.
  2. Pick a kind:
    • persona: variants override the agent's name + tone in the system prompt for assigned visitors. Use this to test "Aria" vs "Max", "friendly" vs "punchy", etc.
    • cta: variants record assignment but don't yet alter the runtime CTA payload. Assignments are recorded for measurement only; full runtime application ships in a future release.
    • trigger: same as cta (measurement only).
  3. Add at least 2 variants. Default is control + treatment at 50/50. You can change weights (any positive integers; they're normalised) and names.
  4. For persona kind, each variant's config JSON should hold a persona object:
    { "persona": { "name": "Aria", "tone": "warm and concise" } }
    The widget renders the variant's persona name in the chat header, and the LLM speaks under that name + tone for the assigned conversation.
  5. Save. Status starts at draft; no visitors are assigned yet.
  6. Click Start. Status flips to running. Every subsequent first-turn visitor is bucketed.
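
The variant rules in steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration: the function names and the dict layout (name, weight, config) are assumptions made for the example, not Pitchbar's actual schema or validation code. It shows how positive integer weights normalise into traffic shares and how a persona object is read out of a variant's config JSON.

```python
import json


def normalise_weights(variants):
    """Return {variant name: share of traffic} from positive integer weights."""
    if len(variants) < 2:
        raise ValueError("an experiment needs at least 2 variants")
    for v in variants:
        w = v["weight"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(w, bool) or not isinstance(w, int) or w <= 0:
            raise ValueError(f"weight for {v['name']!r} must be a positive integer")
    total = sum(v["weight"] for v in variants)
    return {v["name"]: v["weight"] / total for v in variants}


def persona_from_config(config_json):
    """Parse a variant's config JSON and return its persona object."""
    config = json.loads(config_json)
    persona = config.get("persona")
    if not isinstance(persona, dict):
        raise ValueError("persona variants need a persona object in their config")
    return persona


# Hypothetical variants: weights 3 and 1 normalise to a 75/25 split.
variants = [
    {"name": "control", "weight": 3,
     "config": '{ "persona": { "name": "Aria", "tone": "warm and concise" } }'},
    {"name": "treatment", "weight": 1,
     "config": '{ "persona": { "name": "Max", "tone": "punchy" } }'},
]
shares = normalise_weights(variants)
personas = {v["name"]: persona_from_config(v["config"]) for v in variants}
print(shares)
print(personas["treatment"]["name"])
```

Because only the ratio matters, weights of 3 and 1 behave the same as 75 and 25.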

How assignment works

On the first message of a conversation, the MessageStreamController calls ExperimentResolver::resolveForConversation:

  1. If conversation.variant_id is already set, use it (sticky).
  2. Otherwise, look up the most recently started running experiment for this agent. Only ONE experiment is active per agent: if you start a second one while the first is running, the resolver picks the most recently started one. To run a different kind, stop the previous one first.
  3. Hash (visitor_id + experiment_id) into a bucket on the weighted variant list (Assigner). The same visitor returning days later lands in the same variant; the assignment row is durable.
  4. Persist conversation.variant_id. Every future turn for this conversation reads the same variant.
  5. For kind = persona, the variant's config[persona] overrides the agent's default persona in PromptBuilder::build for that turn.
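
The steps above can be sketched as follows. This is a minimal Python model, not the real ExperimentResolver, Assigner, or PromptBuilder: the SHA-256 hash, the `visitor_id:experiment_id` key format, the dict shapes, and the key-by-key persona merge are all assumptions chosen for illustration. What it does show is the general technique: a deterministic hash mapped onto cumulative weights, with the result stored on the conversation so later turns skip the lookup.

```python
import hashlib


def pick_variant(visitor_id, experiment_id, variants):
    """Deterministically map a visitor to a variant on the weighted list."""
    total = sum(v["weight"] for v in variants)
    # Assumption: SHA-256 over "visitor_id:experiment_id"; the real hash may differ.
    digest = hashlib.sha256(f"{visitor_id}:{experiment_id}".encode("utf-8")).digest()
    # First 8 bytes as an unsigned integer, reduced to a slot in [0, total).
    slot = int.from_bytes(digest[:8], "big") % total
    upper = 0
    for v in variants:
        upper += v["weight"]
        if slot < upper:
            return v
    raise AssertionError("unreachable: slot is always below the total weight")


def resolve_for_conversation(conversation, running_experiment):
    """Steps 1-4: reuse a stored variant, otherwise assign one and persist it."""
    if conversation.get("variant_id") is not None:
        return conversation["variant_id"]  # sticky: never reassigned
    if running_experiment is None:
        return None  # no running experiment, nothing to assign
    variant = pick_variant(
        conversation["visitor_id"],
        running_experiment["id"],
        running_experiment["variants"],
    )
    conversation["variant_id"] = variant["id"]
    return variant["id"]


def effective_persona(default_persona, variant_config):
    """Step 5: persona keys in the variant config override the agent defaults."""
    # Assumption: a key-by-key merge, so an unset tone falls back to the default.
    return {**default_persona, **variant_config.get("persona", {})}


experiment = {
    "id": "exp_1",
    "variants": [
        {"id": "v_control", "weight": 1, "config": {"persona": {"name": "Aria"}}},
        {"id": "v_treatment", "weight": 1,
         "config": {"persona": {"name": "Max", "tone": "punchy"}}},
    ],
}
conversation = {"visitor_id": "visitor-123", "variant_id": None}
first = resolve_for_conversation(conversation, experiment)
# A later turn reuses the stored variant, even with no running experiment.
later = resolve_for_conversation(conversation, None)
print(first == later)
```

Hashing the visitor and experiment together, rather than the visitor alone, means the same visitor is not guaranteed to land in the same position across different experiments.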

Seeing it in action

The fastest way to confirm the wiring:

  1. Create a persona experiment with two clearly different variants, e.g. { "persona": { "name": "Helpfulbot" } } vs { "persona": { "name": "Snarkbot" } }.
  2. Start the experiment.
  3. Open your widget in two different browsers (or one normal + one incognito window; different cookies = different visitor_id).
  4. Ask the same question in each. The chat panel header should read "Helpfulbot" in one and "Snarkbot" in the other, and the answers should sound noticeably different.
  5. Open /admin/conversations and confirm each conversation row has a variant_id stamped.
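
With more than a handful of conversations, the check in step 5 is easier to do in bulk. The sketch below assumes you have already exported conversation rows as dicts with id, visitor_id, and variant_id keys; that export format and the function name are hypothetical. It flags conversations with no variant stamped and visitors who appear under more than one variant.

```python
from collections import defaultdict


def check_assignments(rows):
    """Return (conversation ids missing a variant, visitors seen in several variants)."""
    missing = [r["id"] for r in rows if r.get("variant_id") is None]
    seen = defaultdict(set)
    for r in rows:
        if r.get("variant_id") is not None:
            seen[r["visitor_id"]].add(r["variant_id"])
    # A visitor with more than one variant would indicate non-sticky assignment.
    split = {visitor: sorted(ids) for visitor, ids in seen.items() if len(ids) > 1}
    return missing, split


# Hypothetical export of conversation rows.
rows = [
    {"id": 1, "visitor_id": "a", "variant_id": "v_control"},
    {"id": 2, "visitor_id": "b", "variant_id": "v_treatment"},
    {"id": 3, "visitor_id": "a", "variant_id": "v_control"},
    {"id": 4, "visitor_id": "c", "variant_id": None},
]
missing, split = check_assignments(rows)
print(missing, split)
```

Run it on rows from a single experiment: conversations that began before the experiment was started will legitimately have no variant_id, and a visitor can legitimately hold different variants in different experiments.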

Measuring results

Everything is persisted: experiment_assignments rows plus the variant_id on every conversations row. Join those two tables against messages and leads for any analysis you want, e.g.:

SELECT v.name,
       COUNT(DISTINCT c.id) AS conversations,
       COUNT(DISTINCT l.id) AS leads_captured,
       AVG(c.message_count) AS avg_messages
FROM variants v
LEFT JOIN conversations c ON c.variant_id = v.id
LEFT JOIN leads l ON l.conversation_id = c.id
WHERE v.experiment_id = '...'
GROUP BY v.name;

A built-in stats panel inside /app/agents/{id}/experiments is on the roadmap; for now you'll need to run that query manually (workspace API token + /api/v1/db read access if you're on the self-host build, or ask support).
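
To see the shape of the result before running the query against real data, it can be tried on a toy dataset. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table definitions contain only the columns the query touches and are an assumption, not the real schema; the experiment id exp_1 and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query above reads.
conn.executescript("""
CREATE TABLE variants (id INTEGER PRIMARY KEY, experiment_id TEXT, name TEXT);
CREATE TABLE conversations (id INTEGER PRIMARY KEY, variant_id INTEGER, message_count INTEGER);
CREATE TABLE leads (id INTEGER PRIMARY KEY, conversation_id INTEGER);
INSERT INTO variants VALUES (1, 'exp_1', 'control'), (2, 'exp_1', 'treatment');
INSERT INTO conversations VALUES (10, 1, 4), (11, 1, 6), (12, 2, 8);
INSERT INTO leads VALUES (100, 11), (101, 12);
""")

# Same query as above, with the experiment id bound as a parameter
# and an ORDER BY added so the output order is stable.
QUERY = """
SELECT v.name,
       COUNT(DISTINCT c.id) AS conversations,
       COUNT(DISTINCT l.id) AS leads_captured,
       AVG(c.message_count) AS avg_messages
FROM variants v
LEFT JOIN conversations c ON c.variant_id = v.id
LEFT JOIN leads l ON l.conversation_id = c.id
WHERE v.experiment_id = ?
GROUP BY v.name
ORDER BY v.name;
"""
rows = conn.execute(QUERY, ("exp_1",)).fetchall()
for row in rows:
    print(row)
# ('control', 2, 1, 5.0)
# ('treatment', 1, 1, 8.0)
```

One caveat about the join itself: if a conversation can have more than one lead, its row is repeated once per lead, so AVG(c.message_count) weights that conversation more heavily. The DISTINCT counts are unaffected. In the toy data each conversation has at most one lead, so the average is exact.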

Stop / delete

Stop sets status to stopped and flushes the running-experiment cache so new conversations immediately stop getting assigned. Existing conversations keep their assigned variant for consistency in mid-flight chats.

Delete hard-deletes the experiment row. Variant rows cascade. Existing conversations.variant_id values become foreign-key orphans. That's intentional: we keep the historical record of which conversation got which variant even after the experiment ends.
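
The delete behaviour can be reproduced on a toy SQLite database. The schema below is an assumption made for the example: variants reference experiments with ON DELETE CASCADE, while conversations.variant_id is modelled as a plain column with no constraint, which is one way to get the orphaning described above. The real tables may be defined differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE experiments (id TEXT PRIMARY KEY);
CREATE TABLE variants (
    id TEXT PRIMARY KEY,
    experiment_id TEXT NOT NULL REFERENCES experiments(id) ON DELETE CASCADE,
    name TEXT
);
-- Assumption: variant_id carries no constraint, so it survives the delete.
CREATE TABLE conversations (id INTEGER PRIMARY KEY, variant_id TEXT);
INSERT INTO experiments VALUES ('exp_1');
INSERT INTO variants VALUES ('v_a', 'exp_1', 'control'), ('v_b', 'exp_1', 'treatment');
INSERT INTO conversations VALUES (10, 'v_a'), (11, 'v_b'), (12, NULL);
""")

conn.execute("DELETE FROM experiments WHERE id = ?", ("exp_1",))
conn.commit()

remaining_variants = conn.execute("SELECT COUNT(*) FROM variants").fetchone()[0]
# Conversations whose variant_id no longer matches any variants row.
orphans = conn.execute("""
SELECT c.id, c.variant_id
FROM conversations c
LEFT JOIN variants v ON v.id = c.variant_id
WHERE c.variant_id IS NOT NULL AND v.id IS NULL
ORDER BY c.id
""").fetchall()
print(remaining_variants, orphans)
```

Since the variant rows are gone after a delete, an orphaned variant_id can no longer be resolved to a variant name. If you need the names for later analysis, export the variants before deleting the experiment.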