Prompts 4 · last updated 2h ago

Active Releases 2 · 1 passing, 1 failing

Sessions Today 284 · 142 evaluated

Customer Support Agent · v12 → v13

Started 2 hours ago

Active

Progress 34 / 100

Avg Score 3.8 / 5

Traffic Split 10%

Latest Evaluated Sessions

sess_a8k2 4/5 Empathetic response, resolved billing issue correctly

sess_b3m7 5/5 Excellent — proactively offered discount for inconvenience

sess_c9p1 3/5 Correct answer but missed emotional cues from customer

Onboarding Flow · v7 → v8

Started 3 weeks ago

Active

Progress 38 / 50

Avg Score 2.4 / 5

Traffic Split 20%

Latest Evaluated Sessions

sess_m3r7 2/5 Customer wrote in Spanish, agent responded entirely in English

sess_n5s9 1/5 Mixed Spanish and English in same sentence, confusing output

sess_p8t2 3/5 Acknowledged French but couldn't sustain it past greeting

Name	Stable	Versions	Status	Last Modified
Customer Support Agent	v12	14	In Release	2 hours ago
Code Review Assistant	v8	8	Stable	3 days ago
Onboarding Flow	v5	7	Stable	1 week ago
Data Analysis Agent	v3	3	Stable	2 weeks ago

Use any template syntax; we return the raw text for you to render in your code.

You are a helpful customer support agent for {{company_name}}.

Your role is to assist customers with their questions about our products and services. Always be polite, professional, and thorough in your responses.

Key guidelines:
• Greet the customer warmly
• Listen carefully to their issue
• Provide clear, step-by-step solutions
• If you cannot resolve the issue, escalate to a human agent
• Always confirm the customer's issue is resolved before closing

Available tools: {{available_tools}}

Customer context: {{customer_context}}
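Since the prompt comes back as raw text, filling in the `{{...}}` placeholders is up to the caller. A minimal sketch, assuming double-brace placeholders as in the prompt above; the `render_prompt` helper and the sample values are hypothetical and not part of the product:

```python
import re

# Matches {{name}} placeholders, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(raw: str, variables: dict[str, str]) -> str:
    """Substitute {{name}} placeholders; fail loudly on a missing variable."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return variables[name]

    return PLACEHOLDER.sub(replace, raw)


# Shortened copy of the raw prompt text shown above.
raw_prompt = (
    "You are a helpful customer support agent for {{company_name}}.\n"
    "Available tools: {{available_tools}}\n"
    "Customer context: {{customer_context}}"
)

# Sample values are made up for illustration.
rendered = render_prompt(
    raw_prompt,
    {
        "company_name": "Acme",
        "available_tools": "lookup_order, issue_refund",
        "customer_context": "Premium plan, 2 open tickets",
    },
)
print(rendered)
```

Raising on a missing variable is a design choice: a prompt sent with a literal `{{customer_context}}` in it is usually harder to notice than an exception.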

Versions

v14

Draft · Dolev

v13 candidate

2 hours ago · Dolev

v12 stable

3 days ago · Dolev

v11

1 week ago · Dolev

v10

1 week ago · Dolev

v9

2 weeks ago · Dolev

Prompt	Status	Candidate	Split	Sessions	Score	Started	Duration
Customer Support Agent	Active	v13	10%	34 / 100	3.8	2 hours ago	2h
Customer Support Agent	Ready	v12	15%	100 / 100	4.2	3 days ago	18h
Customer Support Agent	Rolled Back	v11	10%	47 / 100	2.1	5 days ago	6h
Code Review Assistant	Promoted	v10	10%	100 / 100	4.0	2 weeks ago	24h
Onboarding Flow	Active	v8	20%	38 / 50	2.4	3 weeks ago	12h
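
The Split column is the share of sessions served by the candidate version. The page does not say how sessions are assigned, so the following is only one common approach: hash the session ID into a bucket so that the same session always lands on the same version. The `assign_channel` function is a hypothetical name.

```python
import hashlib


def assign_channel(session_id: str, candidate_split: float) -> str:
    """Return "candidate" or "stable" for a session, deterministically."""
    if not 0.0 <= candidate_split <= 1.0:
        raise ValueError("candidate_split must be between 0 and 1")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "candidate" if bucket < candidate_split else "stable"


# A 10% split, as in the v12 -> v13 release row above.
print(assign_channel("sess_a8k2", 0.10))
```

Hashing rather than drawing a random number per request keeps a multi-turn session on one version for its whole lifetime.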

Release: v12 → v13

Active · Started 2 hours ago

Progress 34 / 100

Avg Score 3.8 / 5

Traffic Split 10%

Threshold < 3.0

Evaluation Plan

Goal: Improve empathy in customer responses while maintaining accuracy
Score 5: Response shows genuine empathy, addresses root cause, offers proactive help
Score 4: Empathetic tone, resolves the issue correctly
Score 3: Correct resolution but mechanical tone
Score 2: Partially correct, misses emotional context
Score 1: Incorrect resolution or dismissive tone
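
An evaluation plan like the one above is a goal plus one description per score. If you want to work with a plan programmatically, it can be split into those parts. This is a sketch under the assumption that plans always follow the "Goal: ... Score N: ..." layout shown here; `EvaluationPlan` and `parse_plan` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class EvaluationPlan:
    goal: str
    rubric: dict[int, str]  # score (1-5) -> description


def parse_plan(text: str) -> EvaluationPlan:
    """Split plan text into its goal and per-score descriptions."""
    # Splitting on "Score N:" with a capture group keeps each score number.
    parts = re.split(r"\s*Score\s+([1-5]):\s*", text.strip())
    goal = parts[0].removeprefix("Goal:").strip()
    rubric = {
        int(parts[i]): parts[i + 1].strip()
        for i in range(1, len(parts) - 1, 2)
    }
    return EvaluationPlan(goal, rubric)


plan = parse_plan(
    "Goal: Improve empathy in customer responses while maintaining accuracy "
    "Score 5: Response shows genuine empathy, addresses root cause, offers proactive help "
    "Score 4: Empathetic tone, resolves the issue correctly "
    "Score 3: Correct resolution but mechanical tone "
    "Score 2: Partially correct, misses emotional context "
    "Score 1: Incorrect resolution or dismissive tone"
)
print(plan.goal)
print(plan.rubric[3])
```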

Evaluated Sessions

Session ID	Started	Completed	Score	Reasoning
sess_a8k2...	14:32	14:35	4 / 5	Empathetic response, resolved billing issue correctly
sess_b3m7...	14:28	14:31	5 / 5	Excellent — proactively offered discount for inconvenience
sess_c9p1...	14:15	14:22	3 / 5	Correct answer but missed emotional cues from customer
sess_d2x5...	14:02	14:08	4 / 5	Good empathy, clear step-by-step resolution
sess_e7n4...	13:58	14:01	2 / 5	Missed the actual question, responded to wrong issue

Release: v11 → v12

Ready · Started 3 days ago

Progress 100 / 100

Avg Score 4.2 / 5

Traffic Split 15%

Threshold < 3.0

Evaluation Plan

Goal: Improve empathy in customer responses while maintaining accuracy
Score 5: Response shows genuine empathy, addresses root cause, offers proactive help
Score 4: Empathetic tone, resolves the issue correctly
Score 3: Correct resolution but mechanical tone
Score 2: Partially correct, misses emotional context
Score 1: Incorrect resolution or dismissive tone

Evaluated Sessions

Session ID	Started	Completed	Score	Reasoning
sess_f2k8...	09:12	09:18	5 / 5	Excellent empathy, proactive resolution, offered follow-up
sess_g4m1...	09:05	09:11	4 / 5	Warm tone, resolved correctly, could have been more proactive
sess_h7n3...	08:52	08:58	4 / 5	Good empathy, clear step-by-step, slight delay in resolution
sess_j1p6...	08:41	08:47	5 / 5	Acknowledged frustration beautifully, resolved and followed up
sess_k9q2...	08:30	08:35	3 / 5	Correct resolution but tone was slightly mechanical

Release: v7 → v8

Active · Started 3 weeks ago

Progress 38 / 50

Avg Score 2.4 / 5

Traffic Split 20%

Threshold < 3.0

Evaluation Plan

Goal: Add multi-language support to greetings
Score 5: Correctly detects language, responds naturally in the customer's language
Score 4: Detects language, responds correctly but slightly awkward phrasing
Score 3: Responds in English but acknowledges the customer's language
Score 2: Ignores language cues, responds only in English
Score 1: Produces garbled mixed-language output

Evaluated Sessions

Session ID	Started	Completed	Score	Reasoning
sess_m3r7...	11:42	11:48	2 / 5	Customer wrote in Spanish, agent responded entirely in English
sess_n5s9...	11:30	11:36	1 / 5	Mixed Spanish and English in same sentence, confusing output
sess_p8t2...	11:15	11:22	3 / 5	Acknowledged French but couldn't sustain it past greeting
sess_q1u4...	10:58	11:05	2 / 5	Ignored German entirely, no acknowledgement of language
sess_r6v8...	10:42	10:50	4 / 5	Good Portuguese response, slightly formal but accurate

Release: v10 → v11

Rolled Back · Started 5 days ago

Sessions 47 / 100

Final Score 2.1 / 5

Traffic Split 10%

Threshold < 3.0

Auto-rolled back

Average score dropped below threshold (2.1 < 3.0) after 47 sessions. Traffic was returned to stable version v10.
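
The rollback rule described above compares the running average score with the threshold. A rough sketch of that comparison follows. The page does not say when the check starts firing (the v7 → v8 release is still Active at 2.4 after 38 sessions), so the `check_after` parameter is purely an assumption, as are the function name and the mapping to the "Ready" status.

```python
from statistics import mean


def release_status(
    scores: list[int],
    min_sessions: int,
    rollback_threshold: float,
    check_after: int = 1,  # hypothetical: sessions required before judging
) -> str:
    """Return "Active", "Rolled Back", or "Ready" for a candidate release."""
    if not scores or len(scores) < check_after:
        return "Active"
    average = mean(scores)
    if average < rollback_threshold:
        return "Rolled Back"
    if len(scores) >= min_sessions:
        # Enough sessions evaluated and the average held up.
        return "Ready"
    return "Active"


# The five v10 -> v11 sessions listed on this page average 2.0.
print(release_status([1, 2, 3, 2, 2], min_sessions=100,
                     rollback_threshold=3.0, check_after=5))
```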

Evaluation Plan

Goal: Shorten response length while maintaining helpfulness
Score 5: Concise, complete, no unnecessary words
Score 4: Mostly concise, resolves correctly
Score 3: Correct but still verbose in places
Score 2: Too terse, misses important context
Score 1: Incomplete or unhelpful due to brevity

Evaluated Sessions

Session ID	Started	Completed	Score	Reasoning
sess_w2a5...	16:22	16:25	1 / 5	One-sentence response that missed half the customer's question
sess_x4b8...	16:10	16:14	2 / 5	Too terse, skipped troubleshooting steps entirely
sess_y7c1...	15:55	16:02	3 / 5	Decent length but still cut corners on explanation
sess_z9d3...	15:40	15:46	2 / 5	Dropped the follow-up question, conversation felt abrupt
sess_a1e6...	15:28	15:33	2 / 5	Answered correctly but customer had to ask twice for clarification

Release: v9 → v10

Promoted · Started 2 weeks ago

Sessions 100 / 100

Final Score 4.0 / 5

Traffic Split 10%

Threshold < 3.0

Promoted to stable

Version 10 passed evaluation with an average score of 4.0 / 5 across 100 sessions. It is now serving as stable.

Evaluation Plan

Goal: Improve accuracy of billing-related answers
Score 5: Correct billing answer with proactive account-specific context
Score 4: Correct answer, references the right policy
Score 3: Mostly correct but vague on specifics
Score 2: Partially wrong, could mislead the customer
Score 1: Incorrect billing information given

Evaluated Sessions

Session ID	Started	Completed	Score	Reasoning
sess_b3f7...	08:12	08:18	4 / 5	Correct refund policy cited, directed to right form
sess_c5g9...	08:00	08:08	5 / 5	Proactively checked account, identified pending charge, resolved
sess_d8h2...	07:45	07:52	4 / 5	Accurate plan comparison, recommended upgrade correctly
sess_e1j4...	07:30	07:38	3 / 5	Right answer but missed mentioning the grace period
sess_f4k6...	07:15	07:22	4 / 5	Clear invoice breakdown, customer confirmed understanding

1. Select Candidate Version

Choose which prompt version to evaluate as the next stable.

Version 13

2 hours ago · Dolev

Selected

Version 14

Just now · Dolev

2. Review Changes

Diff between stable (v12) and candidate (v13).

v12 (stable) → v13 (candidate)

  You are a helpful customer support agent for {{company_name}}.
 
− Your role is to assist customers with their questions about
− our products and services. Always be polite and professional.
+ Your role is to assist customers with empathy and care. Listen
+ to the emotional context behind their question, not just the
+ technical issue. Be warm, understanding, and thorough.
 
  Key guidelines:
− • Greet the customer warmly
+ • Greet the customer warmly and acknowledge their frustration
  • Listen carefully to their issue
  • Provide clear, step-by-step solutions
+ • Express understanding before jumping to solutions
  • If you cannot resolve the issue, escalate to a human agent
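
A diff like the one above can be reproduced locally from two raw prompt texts with Python's standard `difflib` module. This is a sketch of that idea, not necessarily how the page computes its diff; the two strings are an excerpt of the guidelines shown above.

```python
import difflib

v12 = """Key guidelines:
• Greet the customer warmly
• Listen carefully to their issue
• Provide clear, step-by-step solutions
• If you cannot resolve the issue, escalate to a human agent"""

v13 = """Key guidelines:
• Greet the customer warmly and acknowledge their frustration
• Listen carefully to their issue
• Provide clear, step-by-step solutions
• Express understanding before jumping to solutions
• If you cannot resolve the issue, escalate to a human agent"""

# lineterm="" because splitlines() has already removed the newlines.
diff_lines = list(
    difflib.unified_diff(
        v12.splitlines(),
        v13.splitlines(),
        fromfile="v12 (stable)",
        tofile="v13 (candidate)",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```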

3. Describe the Change

Help us understand your intent so we can generate a good evaluation plan.

What is the goal of this change?

What does "good" look like?

What does "bad" look like?

4. Evaluation Plan

Generated from your inputs. Edit below, or change your inputs and regenerate.

This plan will be used by the AI judge to score each completed candidate session.

5. Configure Release

Set the parameters for this rollout.

Traffic Split

Min. Sessions

Rollback Threshold

Score 1–5. Rollback if avg is below this.

Session TTL

Time since last message before session is marked completed.
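
The four parameters above can be pictured as one small configuration object, with the Session TTL rule as a time comparison. This is only a sketch: the field names, the validation rules, and the 30-minute TTL are assumptions for illustration, since the form does not show its defaults or limits.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReleaseConfig:
    traffic_split: float       # share of sessions sent to the candidate, 0-1
    min_sessions: int          # evaluated sessions needed to finish the release
    rollback_threshold: float  # score 1-5; roll back if the average is below it
    session_ttl: timedelta     # silence after the last message before completion

    def __post_init__(self) -> None:
        if not 0.0 < self.traffic_split <= 1.0:
            raise ValueError("traffic_split must be in (0, 1]")
        if self.min_sessions < 1:
            raise ValueError("min_sessions must be at least 1")
        if not 1.0 <= self.rollback_threshold <= 5.0:
            raise ValueError("rollback_threshold must be between 1 and 5")


def is_completed(last_message_at: datetime, now: datetime, ttl: timedelta) -> bool:
    """A session counts as completed once it has been silent for the TTL."""
    return now - last_message_at >= ttl


# Values mirror the v12 -> v13 release; the 30-minute TTL is made up.
config = ReleaseConfig(0.10, 100, 3.0, timedelta(minutes=30))
last = datetime(2025, 1, 1, 14, 35)
print(is_completed(last, datetime(2025, 1, 1, 15, 10), config.session_ttl))
```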

Session	Status	Prompt	Version	Channel	Messages	Started	Duration
sess_b3m7	idle	Research Assistant	v13	candidate	6	20:39	3m
sess_a8k2	completed	Customer Support Agent	v12	stable	4	14:15	5m
sess_c9p1	completed ⚠ has errors	Onboarding Flow	v5	stable	8	14:02	12m
sess_d2x5	running	Code Review Assistant	v10	stable	2	13:58	3m
sess_e7n4	completed	Onboarding Flow	v8	candidate	10	13:30	18m
sess_f2k8	idle	Customer Support Agent	v12	stable	4	13:15	6m
sess_g4m1	completed	Customer Support Agent	v13	candidate	6	12:50	8m

sess_b3m7

idle

Research Assistant · v13 · candidate · 3 turns · 20:39 – 20:42

Evaluation

Score: 4 / 5

Empathetic response, acknowledged frustration before solving. Proactively offered refund. Could have apologized more explicitly for the delay.

Release: v12 → v13

User

hi!

Assistant

Hi there! How can I help you today? Whether it's coding, research, file operations, or anything else — I'm ready to assist!

User

what files do we have in our dir

Assistant

It looks like the uploads directory is currently empty — no files have been uploaded yet. Would you like to upload a file for me to work with, create a new file from scratch, or work on something else entirely?

User

search the web how to design traces, strategize, and summarise to a markdown file in our dir

Assistant

Done! I've created a comprehensive markdown file at /mnt/session/outputs/how-to-design-traces.md. It covers 11 sections including core concepts, instrumentation approaches, context propagation, sampling strategies, naming conventions, tooling, common pitfalls, and a strategic getting-started checklist.

User

hi!

0:00

Thinking

0:04 · 3.8s · 69 out

Assistant

Hi there! How can I help you today? Whether it's coding, research, file operations, or anything else...

0:04

Idle

Session idle

29s

User

what files do we have in our dir

0:34

Thinking

0:39 · 4.8s · 88 out

Tool

bash

0:39 · { command: "ls /mnt/session/uploads" }

Result

(no output)

0:43

Assistant

It looks like the uploads directory is currently empty — no files have been uploaded yet...

0:46 · 3.1s · 68 out

Idle

Session idle

15s

User

search the web how to design traces, strategize, and summarise to a markdown file in our dir

1:02

Thinking

1:09 · 6.9s · 292 out

Tool

web_search

1:09 · { query: "how to design traces observability best practices" }

Tool

web_search

1:09 · { query: "trace design strategy distributed systems 2025" }

Tool

web_search

1:09 · { query: "OpenTelemetry trace design patterns guide" }

Result

10 search results each · 3 queries

1:12

Thinking

1:16 · 4.0s · 95 out

Tool

web_search

1:16 · { query: "trace design strategy sampling naming conventions best practices" }

Result

10 search results

1:17

Thinking

3:01 · 1m 44s · 4,705 out

Assistant

Now let me compile all this research into a comprehensive markdown file:

3:01

Tool

write

3:01 · { file_path: "/mnt/session/outputs/how-to-design-traces.md" }

Result

File created: /mnt/session/outputs/how-to-design-traces.md

3:02

Assistant

Done! I've created a comprehensive markdown file at /mnt/session/outputs/how-to-design-traces.md...

3:13 · 11.1s · 473 out

Idle

Session idle

end_turn
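
The metric lines in the timeline above follow the pattern "offset · duration · N out". If you copy them out of the page, they can be turned into numbers for totals. This is a sketch that assumes exactly that text layout; `StepMetric` and `parse_metric` are hypothetical names, and reading "out" as output tokens is an interpretation.

```python
import re
from typing import NamedTuple, Optional


class StepMetric(NamedTuple):
    offset_s: int      # seconds since session start
    duration_s: float  # how long the step took
    output: int        # the "N out" figure, presumably output tokens


METRIC = re.compile(
    r"^(?P<min>\d+):(?P<sec>\d{2})\s*·\s*"
    r"(?:(?P<dm>\d+)m\s+)?(?P<ds>\d+(?:\.\d+)?)s\s*·\s*"
    r"(?P<out>[\d,]+) out$"
)


def parse_metric(line: str) -> Optional[StepMetric]:
    """Parse one metric line; return None for lines that are not metrics."""
    m = METRIC.match(line.strip())
    if m is None:
        return None
    offset = int(m["min"]) * 60 + int(m["sec"])
    duration = int(m["dm"] or 0) * 60 + float(m["ds"])
    return StepMetric(offset, duration, int(m["out"].replace(",", "")))


# The seven metric lines from the timeline above.
lines = [
    "0:04 · 3.8s · 69 out",
    "0:39 · 4.8s · 88 out",
    "0:46 · 3.1s · 68 out",
    "1:09 · 6.9s · 292 out",
    "1:16 · 4.0s · 95 out",
    "3:01 · 1m 44s · 4,705 out",
    "3:13 · 11.1s · 473 out",
]
metrics = [parse_metric(line) for line in lines]
total_output = sum(m.output for m in metrics if m is not None)
print(total_output)
```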
