Prompts: 4 (last updated 2h ago)
Active Releases: 2 (1 passing, 1 failing)
Sessions Today: 284 (142 evaluated)
Customer Support Agent · v12 → v13
Started 2 hours ago
Active
Progress 34 / 100
Avg Score 3.8 / 5
Traffic Split 10%
Latest Evaluated Sessions
sess_a8k2 4/5 Empathetic response, resolved billing issue correctly
sess_b3m7 5/5 Excellent — proactively offered discount for inconvenience
sess_c9p1 3/5 Correct answer but missed emotional cues from customer
Onboarding Flow · v7 → v8
Started 3 weeks ago
Active
Progress 38 / 50
Avg Score 2.4 / 5
Traffic Split 20%
Latest Evaluated Sessions
sess_m3r7 2/5 Customer wrote in Spanish, agent responded entirely in English
sess_n5s9 1/5 Mixed Spanish and English in same sentence, confusing output
sess_p8t2 3/5 Acknowledged French but couldn't sustain it past greeting
Name                    Stable  Versions  Status      Last Modified
Customer Support Agent  v12     14        In Release  2 hours ago
Code Review Assistant   v8      8         Stable      3 days ago
Onboarding Flow         v5      7         Stable      1 week ago
Data Analysis Agent     v3      3         Stable      2 weeks ago
Use any template syntax: we return the raw text, and you substitute the variables in your own code.
You are a helpful customer support agent for {{company_name}}. Your role is to assist customers with their questions about our products and services. Always be polite, professional, and thorough in your responses.

Key guidelines:
• Greet the customer warmly
• Listen carefully to their issue
• Provide clear, step-by-step solutions
• If you cannot resolve the issue, escalate to a human agent
• Always confirm the customer's issue is resolved before closing

Available tools: {{available_tools}}

Customer context: {{customer_context}}
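Because the tool returns the raw prompt, variable substitution happens in your code. Below is a minimal sketch using only Python's `re` module, assuming the `{{name}}` placeholder style this prompt uses. `render_prompt` is a hypothetical helper, and raising on a missing variable is a design choice of the sketch, not documented tool behavior.

```python
import re

# Matches {{name}} or {{ name }}. This placeholder style is an assumption taken
# from the prompt above; the tool itself accepts any template syntax.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Hypothetical helper: fill every placeholder, failing loudly on gaps."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            # Better to fail than to send a literal "{{company_name}}" to the model.
            raise KeyError(f"missing template variable: {name}")
        return variables[name]

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    raw = "You are a helpful customer support agent for {{company_name}}."
    print(render_prompt(raw, {"company_name": "Acme"}))
```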

Versions

v14
Draft · Dolev
v13 candidate
2 hours ago · Dolev
v12 stable
3 days ago · Dolev
v11
1 week ago · Dolev
v10
1 week ago · Dolev
v9
2 weeks ago · Dolev
Prompt                  Status       Candidate  Split  Sessions   Score  Started      Duration
Customer Support Agent  Active       v13        10%    34 / 100   3.8    2 hours ago  2h
Customer Support Agent  Ready        v12        15%    100 / 100  4.2    3 days ago   18h
Customer Support Agent  Rolled Back  v11        10%    47 / 100   2.1    5 days ago   6h
Code Review Assistant   Promoted     v10        10%    100 / 100  4.0    2 weeks ago  24h
Onboarding Flow         Active       v8         20%    38 / 50    2.4    3 weeks ago  12h
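The Split column is the share of traffic sent to the candidate version. How the tool assigns individual sessions is not shown here. One common approach, assumed for this sketch, is to hash each session ID into a fixed number of buckets so that a session stays on one version for the whole release. `route_session` and the release ID string are hypothetical.

```python
import hashlib


def route_session(session_id: str, release_id: str, split_percent: float) -> str:
    """Hypothetical router: deterministically assign "candidate" or "stable".

    Hashing release_id together with session_id keeps a session on the same
    version for the whole release, while different releases bucket independently.
    """
    if not 0 <= split_percent <= 100:
        raise ValueError("split_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{release_id}:{session_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # near-uniform bucket in 0..9999
    # A 10% split sends buckets 0..999 (1,000 of 10,000) to the candidate.
    return "candidate" if bucket < split_percent * 100 else "stable"


if __name__ == "__main__":
    ids = [f"sess_{i}" for i in range(10_000)]
    hits = sum(route_session(s, "cs-agent-v12-v13", 10) == "candidate" for s in ids)
    print(f"candidate share: {hits / len(ids):.3f}")  # close to 0.100, not exact
```

Hash-based bucketing needs no shared state, which is why it is a popular choice for sticky traffic splits; a random draw per session would also give the right proportion but could move a returning session between versions.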

Release: v12 → v13

Active · Started 2 hours ago
Progress: 34 / 100
Avg Score: 3.8 / 5
Traffic Split: 10%
Threshold: < 3.0

Evaluation Plan

Goal: Improve empathy in customer responses while maintaining accuracy
Score 5: Response shows genuine empathy, addresses root cause, offers proactive help
Score 4: Empathetic tone, resolves the issue correctly
Score 3: Correct resolution but mechanical tone
Score 2: Partially correct, misses emotional context
Score 1: Incorrect resolution or dismissive tone
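An evaluation plan is a goal followed by one rubric line per score. The sketch below turns that layout into structured data that a judge prompt could be built from, assuming the "Goal:" and "Score N:" labels shown here. `parse_evaluation_plan` and `build_judge_prompt` are hypothetical helpers; the wording of the real judge prompt is not shown in this view.

```python
import re

# Each rubric entry runs from "Score N:" up to the next "Score N:" or the end.
RUBRIC_ENTRY = re.compile(r"Score ([1-5]):\s*(.*?)\s*(?=Score [1-5]:|\Z)", re.S)
GOAL = re.compile(r"\s*Goal:\s*(.*?)\s*(?=Score [1-5]:|\Z)", re.S)


def parse_evaluation_plan(plan: str) -> tuple[str, dict[int, str]]:
    """Hypothetical parser for a 'Goal: ... Score 5: ... Score 1: ...' plan."""
    goal = GOAL.match(plan)
    if goal is None:
        raise ValueError("plan must start with 'Goal:'")
    rubric = {int(score): text for score, text in RUBRIC_ENTRY.findall(plan)}
    if sorted(rubric) != [1, 2, 3, 4, 5]:
        raise ValueError(f"expected one rubric entry per score 1-5, got {sorted(rubric)}")
    return goal.group(1), rubric


def build_judge_prompt(goal: str, rubric: dict[int, str], transcript: str) -> str:
    """Hypothetical judge prompt; the tool's real judge prompt is not shown."""
    lines = [f"Goal of the change: {goal}", "Rubric:"]
    lines += [f"  {score}: {rubric[score]}" for score in sorted(rubric, reverse=True)]
    lines += ["Transcript:", transcript, "Reply with a score from 1 to 5 and a one-sentence reason."]
    return "\n".join(lines)


if __name__ == "__main__":
    plan = (
        "Goal: Improve empathy in customer responses while maintaining accuracy "
        "Score 5: Response shows genuine empathy, addresses root cause, offers proactive help "
        "Score 4: Empathetic tone, resolves the issue correctly "
        "Score 3: Correct resolution but mechanical tone "
        "Score 2: Partially correct, misses emotional context "
        "Score 1: Incorrect resolution or dismissive tone"
    )
    goal, rubric = parse_evaluation_plan(plan)
    print(build_judge_prompt(goal, rubric, "User: hi!\nAssistant: Hi there!"))
```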

Evaluated Sessions

Session ID  Started  Completed  Score  Reasoning
            14:32    14:35      4 / 5  Empathetic response, resolved billing issue correctly
            14:28    14:31      5 / 5  Excellent — proactively offered discount for inconvenience
            14:15    14:22      3 / 5  Correct answer but missed emotional cues from customer
            14:02    14:08      4 / 5  Good empathy, clear step-by-step resolution
            13:58    14:01      2 / 5  Missed the actual question, responded to wrong issue

Release: v11 → v12

Ready · Started 3 days ago
Progress: 100 / 100
Avg Score: 4.2 / 5
Traffic Split: 15%
Threshold: < 3.0

Evaluation Plan

Goal: Improve empathy in customer responses while maintaining accuracy
Score 5: Response shows genuine empathy, addresses root cause, offers proactive help
Score 4: Empathetic tone, resolves the issue correctly
Score 3: Correct resolution but mechanical tone
Score 2: Partially correct, misses emotional context
Score 1: Incorrect resolution or dismissive tone

Evaluated Sessions

Session ID  Started  Completed  Score  Reasoning
            09:12    09:18      5 / 5  Excellent empathy, proactive resolution, offered follow-up
            09:05    09:11      4 / 5  Warm tone, resolved correctly, could have been more proactive
            08:52    08:58      4 / 5  Good empathy, clear step-by-step, slight delay in resolution
            08:41    08:47      5 / 5  Acknowledged frustration beautifully, resolved and followed up
            08:30    08:35      3 / 5  Correct resolution but tone was slightly mechanical

Release: v7 → v8

Active · Started 3 weeks ago
Progress: 38 / 50
Avg Score: 2.4 / 5
Traffic Split: 20%
Threshold: < 3.0

Evaluation Plan

Goal: Add multi-language support to greetings
Score 5: Correctly detects language, responds naturally in the customer's language
Score 4: Detects language, responds correctly but with slightly awkward phrasing
Score 3: Responds in English but acknowledges the customer's language
Score 2: Ignores language cues, responds only in English
Score 1: Produces garbled mixed-language output

Evaluated Sessions

Session ID  Started  Completed  Score  Reasoning
            11:42    11:48      2 / 5  Customer wrote in Spanish, agent responded entirely in English
            11:30    11:36      1 / 5  Mixed Spanish and English in same sentence, confusing output
            11:15    11:22      3 / 5  Acknowledged French but couldn't sustain it past greeting
            10:58    11:05      2 / 5  Ignored German entirely, no acknowledgement of language
            10:42    10:50      4 / 5  Good Portuguese response, slightly formal but accurate

Release: v10 → v11

Rolled Back · Started 5 days ago
Sessions: 47 / 100
Final Score: 2.1 / 5
Traffic Split: 10%
Threshold: < 3.0
Auto-rolled back
The average score dropped below the threshold (2.1 < 3.0) after 47 sessions, so traffic was returned to the stable version, v10.
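The rule described above (roll back once the average judge score falls below the threshold) and the promotion outcome shown for the v9 → v10 release can be sketched as a single check. This sketch rests on assumptions: `ReleaseState`, `decide`, and the `min_sessions` guard are hypothetical, and the tool's exact trigger timing is not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseState:
    """Hypothetical snapshot of a running release."""

    threshold: float        # e.g. 3.0: roll back if the average falls below this
    target_sessions: int    # e.g. 100 evaluated candidate sessions
    min_sessions: int = 10  # assumed guard against reacting to a handful of scores
    scores: list[int] = field(default_factory=list)  # judge scores on the 1-5 scale


def decide(state: ReleaseState) -> str:
    """Return "rollback", "ready", or "continue" (hypothetical decision rule)."""
    n = len(state.scores)
    if n < state.min_sessions:
        return "continue"
    average = sum(state.scores) / n
    if average < state.threshold:
        return "rollback"  # return all traffic to the stable version
    if n >= state.target_sessions:
        return "ready"  # passed evaluation; the candidate can be promoted
    return "continue"


if __name__ == "__main__":
    failing = ReleaseState(threshold=3.0, target_sessions=100, scores=[2, 2, 3, 1, 2] * 10)
    print(decide(failing))  # rollback (average 2.0 over 50 sessions)
```

Note that an average exactly equal to the threshold passes, matching the strict "< 3.0" comparison shown in the release details.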

Evaluation Plan

Goal: Shorten response length while maintaining helpfulness
Score 5: Concise, complete, no unnecessary words
Score 4: Mostly concise, resolves correctly
Score 3: Correct but still verbose in places
Score 2: Too terse, misses important context
Score 1: Incomplete or unhelpful due to brevity

Evaluated Sessions

Session ID  Started  Completed  Score  Reasoning
            16:22    16:25      1 / 5  One-sentence response that missed half the customer's question
            16:10    16:14      2 / 5  Too terse, skipped troubleshooting steps entirely
            15:55    16:02      3 / 5  Decent length but still cut corners on explanation
            15:40    15:46      2 / 5  Dropped the follow-up question, conversation felt abrupt
            15:28    15:33      2 / 5  Answered correctly but customer had to ask twice for clarification

Release: v9 → v10

Promoted · Started 2 weeks ago
Sessions: 100 / 100
Final Score: 4.0 / 5
Traffic Split: 10%
Threshold: < 3.0
Promoted to stable
v10 passed evaluation with an average score of 4.0 / 5 across 100 sessions and now serves as the stable version.

Evaluation Plan

Goal: Improve accuracy of billing-related answers
Score 5: Correct billing answer with proactive account-specific context
Score 4: Correct answer, references the right policy
Score 3: Mostly correct but vague on specifics
Score 2: Partially wrong, could mislead the customer
Score 1: Incorrect billing information given

Evaluated Sessions

Session ID  Started  Completed  Score  Reasoning
            08:12    08:18      4 / 5  Correct refund policy cited, directed to right form
            08:00    08:08      5 / 5  Proactively checked account, identified pending charge, resolved
            07:45    07:52      4 / 5  Accurate plan comparison, recommended upgrade correctly
            07:30    07:38      3 / 5  Right answer but missed mentioning the grace period
            07:15    07:22      4 / 5  Clear invoice breakdown, customer confirmed understanding
1. Select Version
2. Review Diff
3. Describe Change
4. Evaluation Plan
5. Configure

1. Select Candidate Version

Choose which prompt version to evaluate as the next stable.

Version 13
2 hours ago · Dolev
Selected
Version 14
Just now · Dolev

2. Review Changes

Diff between stable (v12) and candidate (v13).

v12 (stable) v13 (candidate)
  You are a helpful customer support agent for {{company_name}}.
 
− Your role is to assist customers with their questions about
− our products and services. Always be polite and professional.
+ Your role is to assist customers with empathy and care. Listen
+ to the emotional context behind their question, not just the
+ technical issue. Be warm, understanding, and thorough.
 
  Key guidelines:
− • Greet the customer warmly
+ • Greet the customer warmly and acknowledge their frustration
  • Listen carefully to their issue
  • Provide clear, step-by-step solutions
+ • Express understanding before jumping to solutions
  • If you cannot resolve the issue, escalate to a human agent
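The view above is a line-level diff of the two prompt texts. A comparable unified diff can be produced with Python's standard `difflib` module. The prompt excerpts below are copied from the diff. `prompt_diff` is a hypothetical helper, not the tool's own implementation.

```python
import difflib

STABLE_V12 = """\
You are a helpful customer support agent for {{company_name}}.

Your role is to assist customers with their questions about
our products and services. Always be polite and professional.

Key guidelines:
• Greet the customer warmly
• Listen carefully to their issue
• Provide clear, step-by-step solutions
• If you cannot resolve the issue, escalate to a human agent
"""

CANDIDATE_V13 = """\
You are a helpful customer support agent for {{company_name}}.

Your role is to assist customers with empathy and care. Listen
to the emotional context behind their question, not just the
technical issue. Be warm, understanding, and thorough.

Key guidelines:
• Greet the customer warmly and acknowledge their frustration
• Listen carefully to their issue
• Provide clear, step-by-step solutions
• Express understanding before jumping to solutions
• If you cannot resolve the issue, escalate to a human agent
"""


def prompt_diff(old: str, new: str, old_label: str, new_label: str) -> str:
    """Hypothetical helper: unified diff of two prompt versions, line by line."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),  # keep newlines so output lines join cleanly
            new.splitlines(keepends=True),
            fromfile=old_label,
            tofile=new_label,
        )
    )


if __name__ == "__main__":
    print(prompt_diff(STABLE_V12, CANDIDATE_V13, "v12 (stable)", "v13 (candidate)"), end="")
```

Identical inputs produce an empty string, which makes a convenient "no changes" check before a release is created.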

3. Describe the Change

Help us understand your intent so we can generate a good evaluation plan.

4. Evaluation Plan

Generated from your inputs. Edit below, or change your inputs and regenerate.

This plan will be used by the AI judge to score each completed candidate session.

5. Configure Release

Set the parameters for this rollout.

Threshold (score 1–5): roll back if the average score falls below this value.
Time since the last message before a session is marked as completed.
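The two fields above, together with the traffic split, can be modeled as a small validated configuration. A sketch, assuming the threshold uses the same 1-5 scale as the judge and that "completed" simply means no message for the configured quiet period. `ReleaseConfig` and its field names are hypothetical, and the 5-minute timeout in the example is an arbitrary value, not a documented default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReleaseConfig:
    """Hypothetical release parameters, validated when constructed."""

    threshold: float              # roll back if the average judge score falls below this
    idle_timeout: timedelta       # quiet period after the last message that completes a session
    traffic_split_percent: float  # share of sessions sent to the candidate

    def __post_init__(self) -> None:
        # Reject bad values up front so a misconfigured rollout never starts.
        if not 1 <= self.threshold <= 5:
            raise ValueError("threshold must be on the 1-5 score scale")
        if self.idle_timeout <= timedelta(0):
            raise ValueError("idle_timeout must be positive")
        if not 0 < self.traffic_split_percent <= 100:
            raise ValueError("traffic_split_percent must be in (0, 100]")

    def is_session_completed(self, last_message_at: datetime, now: datetime) -> bool:
        """True once no message has arrived for at least idle_timeout."""
        return now - last_message_at >= self.idle_timeout


if __name__ == "__main__":
    # Example values only; 5 minutes is not a documented default.
    config = ReleaseConfig(threshold=3.0, idle_timeout=timedelta(minutes=5), traffic_split_percent=10)
    last_message = datetime(2024, 1, 1, 20, 42)
    print(config.is_session_completed(last_message, now=datetime(2024, 1, 1, 20, 45)))  # False
    print(config.is_session_completed(last_message, now=datetime(2024, 1, 1, 20, 47)))  # True
```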
Session    Status                  Prompt                  Version        Channel  Messages  Started  Duration
sess_b3m7  idle                    Research Assistant      v13 candidate           6         20:39    3m
sess_a8k2  completed               Customer Support Agent  v12 stable              4         14:15    5m
sess_c9p1  completed ⚠ has errors  Onboarding Flow         v5 stable               8         14:02    12m
sess_d2x5  running                 Code Review Assistant   v10 stable              2         13:58    3m
sess_e7n4  completed               Onboarding Flow         v8 candidate            10        13:30    18m
sess_f2k8  idle                    Customer Support Agent  v12 stable              4         13:15    6m
sess_g4m1  completed               Customer Support Agent  v13 candidate           6         12:50    8m

sess_b3m7

idle
Research Assistant · v13 · candidate · 3 turns · 20:39 – 20:42
Evaluation
Score: 4 / 5
Empathetic response, acknowledged frustration before solving. Proactively offered refund. Could have apologized more explicitly for the delay.
Release: v12 → v13
User
hi!
Assistant
Hi there! How can I help you today? Whether it's coding, research, file operations, or anything else — I'm ready to assist!
User
what files do we have in our dir
Assistant
It looks like the uploads directory is currently empty — no files have been uploaded yet. Would you like to upload a file for me to work with, create a new file from scratch, or work on something else entirely?
User
search the web how to design traces, strategize, and summarise to a markdown file in our dir
Assistant
Done! I've created a comprehensive markdown file at /mnt/session/outputs/how-to-design-traces.md. It covers 11 sections including core concepts, instrumentation approaches, context propagation, sampling strategies, naming conventions, tooling, common pitfalls, and a strategic getting-started checklist.