Prompts
4
Last updated 2 hours ago
Active Releases
2
1 passing, 1 failing
Sessions Today
284
142 evaluated
Customer Support Agent · v12 → v13
Progress
34 / 100
Avg Score
3.8 / 5
Traffic Split
10%
Latest Evaluated Sessions
sess_a8k2
4/5
Empathetic response, resolved billing issue correctly
sess_b3m7
5/5
Excellent — proactively offered discount for inconvenience
sess_c9p1
3/5
Correct answer but missed emotional cues from customer
Onboarding Flow · v7 → v8
Progress
38 / 50
Avg Score
2.4 / 5
Traffic Split
20%
Latest Evaluated Sessions
sess_m3r7
2/5
Customer wrote in Spanish, agent responded entirely in English
sess_n5s9
1/5
Mixed Spanish and English in same sentence, confusing output
sess_p8t2
3/5
Acknowledged French but couldn't sustain it past greeting
Prompts
| Name | Stable Version | Total Versions | Status | Last Modified |
|---|---|---|---|---|
| Customer Support Agent | v12 | 14 | In Release | 2 hours ago |
| Code Review Assistant | v8 | 8 | Stable | 3 days ago |
| Onboarding Flow | v5 | 7 | Stable | 1 week ago |
| Data Analysis Agent | v3 | 3 | Stable | 2 weeks ago |
Customer Support Agent
Use any template syntax: we return the raw prompt text, and you fill in the variables in your own code.
You are a helpful customer support agent for {{company_name}}.
Your role is to assist customers with their questions about our products and services. Always be polite, professional, and thorough in your responses.
Key guidelines:
• Greet the customer warmly
• Listen carefully to their issue
• Provide clear, step-by-step solutions
• If you cannot resolve the issue, escalate to a human agent
• Always confirm the customer's issue is resolved before closing
Available tools: {{available_tools}}
Customer context: {{customer_context}}
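Because the raw text comes back unrendered, filling in placeholders such as `{{company_name}}` is up to the caller. The sketch below does this with Python's standard `re` module. It assumes placeholders are plain identifiers with no template logic. The `render` helper and the example values ("Acme", "search, refunds") are hypothetical, and a full engine such as Mustache or Jinja2 would work just as well.

```python
import re

# Matches {{ name }} placeholders; spaces inside the braces are optional.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict[str, str]) -> str:
    """Replace every {{name}} in template with values[name].

    Raises KeyError if a placeholder has no value, so a missing variable
    is caught before the prompt is sent to a model.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value provided for placeholder {key!r}")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical values, for illustration only.
raw_prompt = (
    "You are a helpful customer support agent for {{company_name}}.\n"
    "Available tools: {{available_tools}}"
)
print(render(raw_prompt, {"company_name": "Acme", "available_tools": "search, refunds"}))
```

Raising on a missing key, rather than leaving `{{...}}` in place, keeps unrendered placeholders from silently reaching the model.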
Versions
v14
v13 candidate
v12 stable
v11
v10
v9
Releases
| Prompt | Status | Candidate | Traffic Split | Sessions | Avg Score (/5) | Started | Duration |
|---|---|---|---|---|---|---|---|
| Customer Support Agent | Active | v13 | 10% | 34 / 100 | 3.8 | 2 hours ago | 2h |
| Customer Support Agent | Ready | v12 | 15% | 100 / 100 | 4.2 | 3 days ago | 18h |
| Customer Support Agent | Rolled Back | v11 | 10% | 47 / 100 | 2.1 | 5 days ago | 6h |
| Code Review Assistant | Promoted | v10 | 10% | 100 / 100 | 4.0 | 2 weeks ago | 24h |
| Onboarding Flow | Active | v8 | 20% | 38 / 50 | 2.4 | 3 weeks ago | 12h |
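Each release's split percentage is the share of traffic sent to the candidate version. How sessions are actually assigned is not described here. The sketch below shows one common approach, offered as an assumption: hash the session ID into 100 buckets so that a given session always lands on the same version. The `route` function is hypothetical.

```python
import hashlib


def route(session_id: str, split_percent: int) -> str:
    """Return 'candidate' for roughly split_percent% of sessions, else 'stable'.

    Hashing makes the choice deterministic: the same session ID always
    maps to the same version, so a conversation never switches mid-way.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "candidate" if bucket < split_percent else "stable"


# Rough check: with a 10% split, about 10% of many sessions go to the candidate.
routed = [route(f"sess_{i}", 10) for i in range(10_000)]
print(routed.count("candidate") / len(routed))
```

Hashing instead of drawing a random number per request keeps every turn of a multi-turn session on one version. This matters when the judge scores whole sessions.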
Release: v12 → v13
Active · Started 2 hours ago
Progress
34 / 100
Avg Score
3.8 / 5
Traffic Split
10%
Threshold
< 3.0
Evaluation Plan
Goal: Improve empathy in customer responses while maintaining accuracy
Score 5: Response shows genuine empathy, addresses root cause, offers proactive help
Score 4: Empathetic tone, resolves the issue correctly
Score 3: Correct resolution but mechanical tone
Score 2: Partially correct, misses emotional context
Score 1: Incorrect resolution or dismissive tone
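An evaluation plan pairs a goal with a 1 to 5 rubric that the AI judge applies to each completed session. The sketch below shows one way a plan could be turned into a judge prompt and how a numeric score could be read back from the judge's reply. The prompt wording, the `Score: N` reply format, and both helper names are assumptions. This is not the platform's actual judge.

```python
import re


def build_judge_prompt(goal: str, rubric: dict[int, str], transcript: str) -> str:
    """Assemble a grading prompt from an evaluation plan and a session transcript."""
    lines = [f"Goal: {goal}", "Rubric:"]
    for score in sorted(rubric, reverse=True):  # list 5 down to 1, as in the plan
        lines.append(f"Score {score}: {rubric[score]}")
    lines += ["", "Transcript:", transcript, "",
              "Reply with 'Score: N' (1-5) followed by one sentence of reasoning."]
    return "\n".join(lines)


def parse_score(reply: str) -> int:
    """Extract the integer score from a reply such as 'Score: 4 / 5 ...'."""
    match = re.search(r"Score:\s*([1-5])\b", reply)
    if match is None:
        raise ValueError("judge reply did not contain a score")
    return int(match.group(1))


plan = {
    5: "Response shows genuine empathy, addresses root cause, offers proactive help",
    4: "Empathetic tone, resolves the issue correctly",
    3: "Correct resolution but mechanical tone",
    2: "Partially correct, misses emotional context",
    1: "Incorrect resolution or dismissive tone",
}
judge_prompt = build_judge_prompt(
    "Improve empathy in customer responses while maintaining accuracy", plan, "User: hi!"
)
print(judge_prompt)
print(parse_score("Score: 4 / 5\nEmpathetic response, resolved billing issue correctly"))  # 4
```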
Evaluated Sessions
| Session ID | Started | Completed | Score | Reasoning |
|---|---|---|---|---|
| sess_a8k2... | 14:32 | 14:35 | 4 / 5 | Empathetic response, resolved billing issue correctly |
| sess_b3m7... | 14:28 | 14:31 | 5 / 5 | Excellent — proactively offered discount for inconvenience |
| sess_c9p1... | 14:15 | 14:22 | 3 / 5 | Correct answer but missed emotional cues from customer |
| sess_d2x5... | 14:02 | 14:08 | 4 / 5 | Good empathy, clear step-by-step resolution |
| sess_e7n4... | 13:58 | 14:01 | 2 / 5 | Missed the actual question, responded to wrong issue |
Release: v11 → v12
Ready · Started 3 days ago
Progress
100 / 100
Avg Score
4.2 / 5
Traffic Split
15%
Threshold
< 3.0
Evaluation Plan
Goal: Improve empathy in customer responses while maintaining accuracy
Score 5: Response shows genuine empathy, addresses root cause, offers proactive help
Score 4: Empathetic tone, resolves the issue correctly
Score 3: Correct resolution but mechanical tone
Score 2: Partially correct, misses emotional context
Score 1: Incorrect resolution or dismissive tone
Evaluated Sessions
| Session ID | Started | Completed | Score | Reasoning |
|---|---|---|---|---|
| sess_f2k8... | 09:12 | 09:18 | 5 / 5 | Excellent empathy, proactive resolution, offered follow-up |
| sess_g4m1... | 09:05 | 09:11 | 4 / 5 | Warm tone, resolved correctly, could have been more proactive |
| sess_h7n3... | 08:52 | 08:58 | 4 / 5 | Good empathy, clear step-by-step, slight delay in resolution |
| sess_j1p6... | 08:41 | 08:47 | 5 / 5 | Acknowledged frustration beautifully, resolved and followed up |
| sess_k9q2... | 08:30 | 08:35 | 3 / 5 | Correct resolution but tone was slightly mechanical |
Release: v7 → v8
Active · Started 3 weeks ago
Progress
38 / 50
Avg Score
2.4 / 5
Traffic Split
20%
Threshold
< 3.0
Evaluation Plan
Goal: Add multi-language support to greetings
Score 5: Correctly detects language, responds naturally in the customer's language
Score 4: Detects language, responds correctly but slightly awkward phrasing
Score 3: Responds in English but acknowledges the customer's language
Score 2: Ignores language cues, responds only in English
Score 1: Produces garbled mixed-language output
Evaluated Sessions
| Session ID | Started | Completed | Score | Reasoning |
|---|---|---|---|---|
| sess_m3r7... | 11:42 | 11:48 | 2 / 5 | Customer wrote in Spanish, agent responded entirely in English |
| sess_n5s9... | 11:30 | 11:36 | 1 / 5 | Mixed Spanish and English in same sentence, confusing output |
| sess_p8t2... | 11:15 | 11:22 | 3 / 5 | Acknowledged French but couldn't sustain it past greeting |
| sess_q1u4... | 10:58 | 11:05 | 2 / 5 | Ignored German entirely, no acknowledgement of language |
| sess_r6v8... | 10:42 | 10:50 | 4 / 5 | Good Portuguese response, slightly formal but accurate |
Release: v10 → v11
Rolled Back · Started 5 days ago
Sessions
47 / 100
Final Score
2.1 / 5
Traffic Split
10%
Threshold
< 3.0
Auto-rolled back
Average score dropped below threshold (2.1 < 3.0) after 47 sessions. Traffic was returned to stable version v10.
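The rollback rule compares the average judge score with the release threshold. Here is a minimal sketch of that check. It assumes a release becomes Ready once it reaches its session target without breaching the threshold. The function name is hypothetical, as is the point at which the check runs. The platform may also apply further conditions before it rolls a release back.

```python
from statistics import mean


def release_status(scores: list[int], target_sessions: int, threshold: float) -> str:
    """Classify a release from the judge scores collected so far.

    - 'Rolled Back' if the average score is below the threshold
    - 'Ready' if the session target is reached without breaching it
    - 'Active' otherwise (still collecting sessions)
    """
    if scores and mean(scores) < threshold:
        return "Rolled Back"
    if len(scores) >= target_sessions:
        return "Ready"
    return "Active"


print(release_status([1, 2, 3, 2, 2], target_sessions=100, threshold=3.0))  # Rolled Back
print(release_status([4] * 100, target_sessions=100, threshold=3.0))         # Ready
```

A score exactly equal to the threshold does not trigger a rollback, which matches the "below this" wording of the threshold setting.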
Evaluation Plan
Goal: Shorten response length while maintaining helpfulness
Score 5: Concise, complete, no unnecessary words
Score 4: Mostly concise, resolves correctly
Score 3: Correct but still verbose in places
Score 2: Too terse, misses important context
Score 1: Incomplete or unhelpful due to brevity
Evaluated Sessions
| Session ID | Started | Completed | Score | Reasoning |
|---|---|---|---|---|
| sess_w2a5... | 16:22 | 16:25 | 1 / 5 | One-sentence response that missed half the customer's question |
| sess_x4b8... | 16:10 | 16:14 | 2 / 5 | Too terse, skipped troubleshooting steps entirely |
| sess_y7c1... | 15:55 | 16:02 | 3 / 5 | Decent length but still cut corners on explanation |
| sess_z9d3... | 15:40 | 15:46 | 2 / 5 | Dropped the follow-up question, conversation felt abrupt |
| sess_a1e6... | 15:28 | 15:33 | 2 / 5 | Answered correctly but customer had to ask twice for clarification |
Release: v9 → v10
Promoted · Started 2 weeks ago
Sessions
100 / 100
Final Score
4.0 / 5
Traffic Split
10%
Threshold
< 3.0
Promoted to stable
Version 10 passed evaluation with an average score of 4.0 / 5 across 100 sessions. It is now serving as stable.
Evaluation Plan
Goal: Improve accuracy of billing-related answers
Score 5: Correct billing answer with proactive account-specific context
Score 4: Correct answer, references the right policy
Score 3: Mostly correct but vague on specifics
Score 2: Partially wrong, could mislead the customer
Score 1: Incorrect billing information given
Evaluated Sessions
| Session ID | Started | Completed | Score | Reasoning |
|---|---|---|---|---|
| sess_b3f7... | 08:12 | 08:18 | 4 / 5 | Correct refund policy cited, directed to right form |
| sess_c5g9... | 08:00 | 08:08 | 5 / 5 | Proactively checked account, identified pending charge, resolved |
| sess_d8h2... | 07:45 | 07:52 | 4 / 5 | Accurate plan comparison, recommended upgrade correctly |
| sess_e1j4... | 07:30 | 07:38 | 3 / 5 | Right answer but missed mentioning the grace period |
| sess_f4k6... | 07:15 | 07:22 | 4 / 5 | Clear invoice breakdown, customer confirmed understanding |
New Release
1 Select Version
2 Review Diff
3 Describe Change
4 Evaluation Plan
5 Configure
1. Select Candidate Version
Choose which prompt version to evaluate as the next stable version.
Version 13
Version 14
2. Review Changes
Diff between stable (v12) and candidate (v13).
v12 (stable)
v13 (candidate)
You are a helpful customer support agent for {{company_name}}.
− Your role is to assist customers with their questions about
− our products and services. Always be polite and professional.
+ Your role is to assist customers with empathy and care. Listen
+ to the emotional context behind their question, not just the
+ technical issue. Be warm, understanding, and thorough.
Key guidelines:
− • Greet the customer warmly
+ • Greet the customer warmly and acknowledge their frustration
• Listen carefully to their issue
• Provide clear, step-by-step solutions
+ • Express understanding before jumping to solutions
• If you cannot resolve the issue, escalate to a human agent
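Python's standard `difflib` can produce a line-level diff like the one above. This is only an illustration of the idea, not necessarily how the platform computes its diff. The excerpts are shortened from the v12 and v13 guidelines shown, and `difflib` marks removed lines with an ASCII `-` rather than `−`.

```python
import difflib


def prompt_diff(old: str, new: str, old_label: str, new_label: str) -> list[str]:
    """Line-level unified diff between two prompt versions.

    Unchanged lines start with ' ', removed lines with '-', added lines with '+'.
    """
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",  # lines from splitlines() carry no newline
        )
    )


# Shortened excerpts of the v12 and v13 guidelines shown above.
STABLE_V12 = """Key guidelines:
• Greet the customer warmly
• Listen carefully to their issue"""

CANDIDATE_V13 = """Key guidelines:
• Greet the customer warmly and acknowledge their frustration
• Listen carefully to their issue
• Express understanding before jumping to solutions"""

for line in prompt_diff(STABLE_V12, CANDIDATE_V13, "v12 (stable)", "v13 (candidate)"):
    print(line)
```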
3. Describe the Change
Help us understand your intent so we can generate a good evaluation plan.
4. Evaluation Plan
Generated from your inputs. Edit below, or change your inputs and regenerate.
The AI judge uses this plan to score each completed candidate session.
5. Configure Release
Set the parameters for this rollout.
Score from 1 to 5. Roll back if the average score falls below this value.
Time since the last message before a session is marked completed.
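This setting acts as an idle timer. Once no message has arrived for the configured duration, the session counts as completed and becomes eligible for judging. The sketch below assumes the timestamp of the last message is available. The `is_completed` name, the dates, and the 10-minute value are hypothetical and do not reflect platform defaults.

```python
from datetime import datetime, timedelta


def is_completed(last_message_at: datetime, now: datetime, idle_timeout: timedelta) -> bool:
    """True once no message has arrived for at least idle_timeout."""
    return now - last_message_at >= idle_timeout


# Illustrative values: last message at 20:42, hypothetical 10-minute timeout.
last = datetime(2025, 1, 1, 20, 42)
timeout = timedelta(minutes=10)
print(is_completed(last, datetime(2025, 1, 1, 20, 50), timeout))  # False: quiet for 8 minutes
print(is_completed(last, datetime(2025, 1, 1, 20, 52), timeout))  # True: quiet for 10 minutes
```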
Sessions
| Session | Status | Prompt | Version | Channel | Messages | Started | Duration |
|---|---|---|---|---|---|---|---|
| sess_b3m7 | idle | Research Assistant | v13 | candidate | 6 | 20:39 | 3m |
| sess_a8k2 | completed | Customer Support Agent | v12 | stable | 4 | 14:15 | 5m |
| sess_c9p1 | completed (has errors) | Onboarding Flow | v5 | stable | 8 | 14:02 | 12m |
| sess_d2x5 | running | Code Review Assistant | v10 | stable | 2 | 13:58 | 3m |
| sess_e7n4 | completed | Onboarding Flow | v8 | candidate | 10 | 13:30 | 18m |
| sess_f2k8 | idle | Customer Support Agent | v12 | stable | 4 | 13:15 | 6m |
| sess_g4m1 | completed | Customer Support Agent | v13 | candidate | 6 | 12:50 | 8m |
sess_b3m7
idle · Research Assistant · v13 · candidate · 3 turns · 20:39 – 20:42
Evaluation
Score: 4 / 5
Empathetic response, acknowledged frustration before solving. Proactively offered refund. Could have apologized more explicitly for the delay.
User
hi!
Assistant
Hi there! How can I help you today? Whether it's coding, research, file operations, or anything else, I'm ready to assist!
User
what files do we have in our dir
Assistant
It looks like the uploads directory is currently empty; no files have been uploaded yet.
Would you like to upload a file for me to work with, create a new file from scratch, or work on something else entirely?
User
search the web how to design traces, strategize, and summarise to a markdown file in our dir
Assistant
Done! I've created a comprehensive markdown file at /mnt/session/outputs/how-to-design-traces.md. It covers 11 sections including core concepts, instrumentation approaches, context propagation, sampling strategies, naming conventions, tooling, common pitfalls, and a strategic getting-started checklist.