172 scored conversations. 13 team members. From invisible gaps to measurable transformation.
Morgan Stanley faced a problem that exists at every major wealth management firm: their leadership team was having high-stakes conversations with financial advisors every day, but there was no visibility into the quality of those conversations. Were they asking the right questions? Were they diagnosing real blockers or accepting surface-level answers? Were they closing with specific commitments or vague next steps?
In a contact center, every call is recorded, scored, and analyzed. In wealth management, leadership conversations happen behind closed doors. The gap between what the organization believed was happening and what was actually happening was completely invisible.
The team needed a way to create that visibility without disrupting the relationship-driven culture that makes wealth management work. And they needed to do it fast enough to prove the concept before scaling it across the broader organization.
BlueEye Advisory designed a multi-phase AI coaching program that gave each team member a private, low-stakes environment to practice high-stakes conversations. The program spanned 6 weeks and evolved through three distinct scoring phases, each building on the behavioral patterns revealed by the last.
Every conversation was scored against behavioral criteria developed from real advisor interactions. The scenarios were calibrated to mirror actual advisor behaviors: resistance patterns, stall tactics, and the conversational nuances unique to wealth management. When team members pushed back that something felt unrealistic, the scenarios were refined immediately. Realism was non-negotiable.
Design principle: Scenarios were built to feel real, not to feel like a test. If the AI behaved in a way that would get someone hung up on in real life, it got rewritten. Realism drives engagement. Without it, the data means nothing.
Every conversation was scored across six behavioral dimensions, weighted by their impact on real-world conversation outcomes:
| Dimension | Weight | What It Measures |
|---|---|---|
| Opening & Framing | 15% | Clean, confident opening that establishes purpose and control |
| Discovery Depth | 25% | Open-ended questions that go beyond surface answers |
| Active Listening | 20% | Paraphrasing and reflection that demonstrates real understanding |
| Diagnostic Quality | 20% | Accurate identification of the real blocker, not the stated one |
| Resistance Handling | 10% | Composure under pressure without premature solutioning |
| Commitment & Next Steps | 10% | Specific, time-bound action items, not "let's circle back" |
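The weighted model above reduces to a simple weighted average. A minimal sketch, where only the weights come from the scorecard; the dictionary keys, function name, and example sub-scores are illustrative:

```python
# Composite score = weighted average of the six behavioral dimensions.
# Weights mirror the scorecard table; per-dimension scores run 0-100.
WEIGHTS = {
    "opening_framing": 0.15,
    "discovery_depth": 0.25,
    "active_listening": 0.20,
    "diagnostic_quality": 0.20,
    "resistance_handling": 0.10,
    "commitment_next_steps": 0.10,
}

def composite_score(dimension_scores: dict) -> float:
    """Weighted average of per-dimension scores (each 0-100)."""
    return round(sum(WEIGHTS[d] * s for d, s in dimension_scores.items()), 1)

# Illustrative sub-scores for one conversation (not real program data):
example = {
    "opening_framing": 60,
    "discovery_depth": 40,
    "active_listening": 35,
    "diagnostic_quality": 45,
    "resistance_handling": 70,
    "commitment_next_steps": 50,
}
print(composite_score(example))  # → 47.0
```

Because Discovery Depth carries a 25% weight, a weak discovery performance drags the composite down more than any other single dimension.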
What happened over 6 weeks tells the real story. The team's average score didn't climb in a straight line. It followed the pattern that every organization goes through when they first get honest measurement of something that was previously invisible.
Week 1 was humbling. The team averaged 3.4 out of 100. Not because they were bad at their jobs. Because they had never been measured against specific behavioral criteria before. The gap between "I think I do this well" and "the data says otherwise" hit hard.
By Week 2, heavy practice kicked in. 31 calls in a single week. Scores jumped to 26.2 as the coaching frameworks started taking hold. By Week 4, the team hit its peak: a 55.7 average across 45 conversations. That's a 16x improvement from where they started.
The Week 6 dip is actually the most telling data point. We introduced a harder module targeting a completely different conversation type. Scores dropped, but the team's response was immediate: they leaned in. That's the behavioral shift. The instinct to practice, not avoid.
The pattern that matters: Week 1 scores are always low. Always. The organizations that transform are the ones where the team treats that score as fuel, not as a reason to disengage. Morgan Stanley's team treated it as fuel.
The team average tells one story. The individual trajectories tell the real one. Across 13 team members, the range of improvement was dramatic. Here are four that stand out:
**Member A, 22 conversations.** Started near the bottom, finished at the very top of the team. This is what happens when someone with natural ability finally gets structured feedback on specific behaviors.

**Member B, 20 conversations.** Scored 0 on initial attempts because the approach didn't match the scorecard criteria at all. By the end, consistently scoring in the top tier. Complete behavioral reset.

**Member C, 19 conversations.** Consistent, methodical improvement week over week. No dramatic swings. Just steady gains from deliberate practice. The kind of trajectory that tells you the system is working.

**Member D, 24 conversations.** The highest volume on the team. Started with the highest baseline among the four, which means the habits were already partially formed. The challenge was refinement, not reinvention.
Not every trajectory was a success story. One team member completed only 6 conversations and regressed from 23.3 to 2.7. That data point is just as valuable. It tells leadership exactly where to focus, who needs a different kind of support, and whether the issue is skill, motivation, or something else entirely.
Across 172 scored conversations, six behavioral patterns emerged that no amount of observation or self-reporting would have surfaced:
- **Opening & Framing:** Vague or filler-heavy openings that immediately ceded conversational control. The first 30 seconds predicted the quality of the entire conversation.
- **Discovery Depth:** Closed-ended questions that shut down explanation and created shallow, transactional conversations instead of diagnostic ones.
- **Diagnostic Quality:** Accepting the first answer without going deeper. The real blocker almost never surfaces in the initial response.
- **Active Listening:** The single biggest gap. Generic acknowledgments ("got it," "makes sense") instead of paraphrasing that signals genuine understanding.
- **Resistance Handling:** Jumping to recommendations before earning the transition. Stronger conversations build the case before offering the answer.
- **Commitment & Next Steps:** "Let's reconnect next week" instead of "I'll send the analysis by Thursday and we'll review it together Friday at 2."
The program didn't stay static. As the team improved, the measurement framework evolved with them. Three distinct scorecard phases pushed the team progressively harder:
| Phase | Calls | Avg Score | Focus |
|---|---|---|---|
| Phase 1: Foundation | 62 | 28.8 | Baseline coaching conversation skills. Opening, discovery, active listening, next steps. |
| Phase 2: Refinement | 97 | 50.7 | Recalibrated scoring based on Phase 1 data. Higher bar for discovery depth and diagnostic quality. |
| Phase 3: Advanced | 13 | 55.6 | Entirely new conversation type: re-engagement after commitment stalls. Diagnosing inertia, handling skepticism. |
The jump from Phase 1 (28.8 avg) to Phase 2 (50.7 avg) represents genuine behavioral improvement. The Phase 3 score of 55.6 is especially telling: when the team faced a completely new, harder scenario type, they still outperformed their Phase 2 average. The skills had become transferable.
Across all 13 team members, 172 conversations, and 6 weeks of practice:
| Team Member | Calls | First 3 Avg | Last 3 Avg | Gain |
|---|---|---|---|---|
| Member A | 22 | 9.2 | 96.7 | +951% |
| Member B | 20 | 0.0 | 88.3 | From 0 |
| Member C | 19 | 17.5 | 81.1 | +363% |
| Member D | 24 | 30.8 | 78.2 | +154% |
| Member E | 17 | 32.5 | 66.8 | +106% |
| Member F | 13 | 46.6 | 57.2 | +23% |
| Member G | 11 | 6.7 | 40.8 | +509% |
| Member H | 16 | 0.0 | 39.2 | From 0 |
| Member I | 16 | 9.3 | 37.9 | +308% |
| Member J | 4 | 11.7 | 28.3 | +142% |
| Member K | 3 | 43.7 | 43.7 | 0% |
| Member L | 6 | 23.3 | 2.7 | -88% |
| Team Average | 172 | 18.8 | 53.6 | +185% |
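The Gain column is the percent change from each member's first-three to last-three conversation average. A minimal sketch (the function name is hypothetical; the sample inputs are rows from the table above):

```python
def gain_pct(first3_avg: float, last3_avg: float) -> str:
    """Percent change from first-3 to last-3 conversation average."""
    if first3_avg == 0:
        # Percent change is undefined from a zero baseline,
        # hence the "From 0" entries in the table.
        return "From 0"
    return f"{round((last3_avg - first3_avg) / first3_avg * 100):+d}%"

print(gain_pct(9.2, 96.7))   # Member A → +951%
print(gain_pct(0.0, 88.3))   # Member B → From 0
print(gain_pct(23.3, 2.7))   # Member L → -88%
```

Note that the metric rewards low starters: Member A's +951% reflects both a strong finish and a near-zero baseline, which is why the table pairs it with the raw first-3 and last-3 averages.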
Why this table matters: Every row is a coaching decision. The top four need advanced scenarios and stretch assignments. The middle group needs targeted reinforcement. The bottom two need a fundamentally different conversation. Without this data, every person on this team gets the same generic training. With it, every person gets exactly what they need.
The transformation wasn't just in the scores. It was in the team's relationship with practice itself. By Week 3, team members were voluntarily completing multiple sessions per week. They were comparing scores, sharing what worked, and asking for harder scenarios. The program created a culture of deliberate practice that didn't exist before.
Leadership distilled its coaching priorities down to three:

1. **Get the opening clean.** Practice the first 30 seconds until it's second nature. No filler, no rambling, clear purpose.
2. **Go one level deeper.** When someone gives you an answer, ask "what's driving that?" before moving on. The real insight is always underneath the first response.
3. **Summarize before moving forward.** Paraphrase what you heard before transitioning. This single behavior separates good conversations from great ones.
Based on the results, leadership initiated conversations to bring this approach to additional groups across the organization. The behavioral scoring model and coaching intelligence framework were designed from the start to scale beyond the initial team. The infrastructure, the scorecards, and the measurement methodology are all built to expand.
This engagement proved something that wealth management has struggled with for decades: you can bring the measurement rigor of a contact center to relationship-driven advisory conversations without sacrificing the human element. The AI doesn't replace the conversation. It makes the invisible visible so that coaching becomes precise, targeted, and measurable.
172 conversations in 6 weeks. A team that went from a 3.4 average to a 55.7 peak. Individual improvements of nearly 1,000%. And most importantly, a leadership team that can now answer the question every organization should be asking: "Are our people actually having the conversations we think they're having?"
The answer was more revealing than anyone expected. And that's the point.
Book a 30-minute diagnostic to explore how AI coaching intelligence can transform your team's performance.
Book a Diagnostic