Case Study

How Morgan Stanley Achieved 185% Coaching Performance Improvement in 6 Weeks

172 scored conversations. 13 team members. From invisible gaps to measurable transformation.

- 172 Scored Conversations
- +185% Average Score Improvement
- 6 Weeks to Transformation
- 96.7 Top Performer Score

The Challenge

Morgan Stanley faced a problem that exists at every major wealth management firm: their leadership team was having high-stakes conversations with financial advisors every day, but there was no visibility into the quality of those conversations. Were they asking the right questions? Were they diagnosing real blockers or accepting surface-level answers? Were they closing with specific commitments or vague next steps?

In a contact center, every call is recorded, scored, and analyzed. In wealth management, leadership conversations happen behind closed doors. The gap between what the organization believed was happening and what was actually happening was completely invisible.

The team needed a way to create that visibility without disrupting the relationship-driven culture that makes wealth management work. And they needed to do it fast enough to prove the concept before scaling it across the broader organization.

The Approach

BlueEye Advisory designed a multi-phase AI coaching program that gave each team member a private, low-stakes environment to practice high-stakes conversations. The program spanned 6 weeks and evolved through three distinct scoring phases, each building on the behavioral patterns revealed by the last.

Every conversation was scored against behavioral criteria developed from real advisor interactions. The scenarios were calibrated to mirror actual advisor behaviors: resistance patterns, stall tactics, and the conversational nuances unique to wealth management. When team members pushed back that something felt unrealistic, the scenarios were refined immediately. Realism was non-negotiable.

Design principle: Scenarios were built to feel real, not to feel like a test. If the AI behaved in a way that would get someone hung up on in real life, it got rewritten. Realism drives engagement. Without it, the data means nothing.

The Scoring Model

Every conversation was scored across six behavioral dimensions, weighted by their impact on real-world conversation outcomes:

| Dimension | Weight | What It Measures |
|---|---|---|
| Opening & Framing | 15% | Clean, confident opening that establishes purpose and control |
| Discovery Depth | 25% | Open-ended questions that go beyond surface answers |
| Active Listening | 20% | Paraphrasing and reflection that demonstrates real understanding |
| Diagnostic Quality | 20% | Accurate identification of the real blocker, not the stated one |
| Resistance Handling | 10% | Composure under pressure without premature solutioning |
| Commitment & Next Steps | 10% | Specific, time-bound action items, not "let's circle back" |
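The weighted model above can be sketched in a few lines. This is an illustrative reconstruction, not BlueEye's actual scoring engine: the weights come from the scorecard, but the dimension keys and the assumption that each dimension is scored 0-100 are hypothetical.

```python
# Illustrative sketch of the six-dimension weighted scorecard.
# Weights are from the published scorecard; dimension keys and the
# 0-100 per-dimension scale are assumptions for this example.
WEIGHTS = {
    "opening_framing": 0.15,
    "discovery_depth": 0.25,
    "active_listening": 0.20,
    "diagnostic_quality": 0.20,
    "resistance_handling": 0.10,
    "commitment_next_steps": 0.10,
}

def composite_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each on a 0-100 scale."""
    return round(sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS), 2)

# Example: strong discovery and listening, weak close.
print(composite_score({
    "opening_framing": 70,
    "discovery_depth": 85,
    "active_listening": 80,
    "diagnostic_quality": 75,
    "resistance_handling": 60,
    "commitment_next_steps": 40,
}))  # → 72.75
```

Note how the 25% weight on Discovery Depth means a shallow-questioning conversation caps its own ceiling, which is exactly why Gap 2 and Gap 3 below hit scores so hard.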

The Weekly Trajectory

What happened over 6 weeks tells the real story. The team's average score didn't climb in a straight line. It followed the pattern every organization goes through when it first gets honest measurement of something that was previously invisible.

| Week | Avg Score | Calls | Note |
|---|---|---|---|
| 1 | 3.4 | 8 | |
| 2 | 26.2 | 31 | |
| 3 | 42.0 | 28 | |
| 4 | 55.7 | 45 | Peak |
| 5 | 50.1 | 47 | |
| 6 | 40.2 | 12 | New module |

Week 1 was humbling. The team averaged 3.4 out of 100. Not because they were bad at their jobs. Because they had never been measured against specific behavioral criteria before. The gap between "I think I do this well" and "the data says otherwise" hit hard.

By Week 2, heavy practice kicked in. 31 calls in a single week. Scores jumped to 26.2 as the coaching frameworks started taking hold. By Week 4, the team hit its peak: a 55.7 average across 45 conversations. That's a 16x improvement from where they started.

The Week 6 dip is actually the most telling data point. A harder module targeting a completely different conversation type was introduced. Scores dropped, but the team's response was immediate: they leaned in. That's the behavioral shift. The instinct to practice, not avoid.

The pattern that matters: Week 1 scores are always low. Always. The organizations that transform are the ones where the team treats that score as fuel, not as a reason to disengage. Morgan Stanley's team treated it as fuel.

Individual Transformation

The team average tells one story. The individual trajectories tell the real one. Across 13 team members, the range of improvement was dramatic. Here are four that stand out:

The Breakout

Team Member A

9.2 → 96.7
+951%

22 completed conversations. Started near the bottom, finished at the very top of the team. This is what happens when someone with natural ability finally gets structured feedback on specific behaviors.

The Late Starter

Team Member B

0 → 88.3
From zero

20 conversations. Scored 0 on initial attempts because the approach didn't match the scorecard criteria at all. By the end, consistently scoring in the top tier. Complete behavioral reset.

The Steady Climber

Team Member C

17.5 → 81.1
+363%

19 conversations. Consistent, methodical improvement week over week. No dramatic swings. Just steady gains from deliberate practice. The kind of trajectory that tells you the system is working.

The High-Floor Grinder

Team Member D

30.8 → 78.2
+154%

24 conversations. The highest volume on the team. Started with the highest baseline, which means the habits were already partially formed. The challenge was refinement, not reinvention.

Not every trajectory was a success story. One team member completed only 6 conversations and regressed from 23.3 to 2.7. That data point is just as valuable. It tells leadership exactly where to focus, who needs a different kind of support, and whether the issue is skill, motivation, or something else entirely.

What the Data Revealed

Across 172 scored conversations, six behavioral patterns emerged that no amount of observation or self-reporting would have surfaced:

Gap 1

Weak Openings

Vague or filler-heavy openings that immediately ceded conversational control. The first 30 seconds predicted the quality of the entire conversation.

Gap 2

Yes/No Questions

Closed-ended questions that shut down explanation and created shallow, transactional conversations instead of diagnostic ones.

Gap 3

Surface-Level Discovery

Accepting the first answer without going deeper. The real blocker almost never surfaces in the initial response.

Gap 4

No Paraphrasing

The single biggest gap. Generic acknowledgments ("got it," "makes sense") instead of paraphrasing that signals genuine understanding.

Gap 5

Premature Solutioning

Jumping to recommendations before earning the transition. Stronger conversations build the case before offering the answer.

Gap 6

Vague Next Steps

"Let's reconnect next week" instead of "I'll send the analysis by Thursday and we'll review it together Friday at 2."

Scorecard Evolution

The program didn't stay static. As the team improved, the measurement framework evolved with them. Three distinct scorecard phases pushed the team progressively harder:

| Phase | Calls | Avg Score | Focus |
|---|---|---|---|
| Phase 1: Foundation | 62 | 28.8 | Baseline coaching conversation skills: opening, discovery, active listening, next steps. |
| Phase 2: Refinement | 97 | 50.7 | Recalibrated scoring based on Phase 1 data; higher bar for discovery depth and diagnostic quality. |
| Phase 3: Advanced | 13 | 55.6 | Entirely new conversation type: re-engagement after commitment stalls. Diagnosing inertia, handling skepticism. |

The jump from Phase 1 (28.8 avg) to Phase 2 (50.7 avg) represents genuine behavioral improvement. The Phase 3 score of 55.6 is especially telling: when the team faced a completely new, harder scenario type, they still outperformed their Phase 2 average. The skills had become transferable.

Full Team Performance

Across all 13 team members, 172 conversations, and 6 weeks of practice:

| Team Member | Calls | First 3 Avg | Last 3 Avg | Gain |
|---|---|---|---|---|
| Member A | 22 | 9.2 | 96.7 | +951% |
| Member B | 20 | 0.0 | 88.3 | From 0 |
| Member C | 19 | 17.5 | 81.1 | +363% |
| Member D | 24 | 30.8 | 78.2 | +154% |
| Member E | 17 | 32.5 | 66.8 | +106% |
| Member F | 13 | 46.6 | 57.2 | +23% |
| Member G | 11 | 6.7 | 40.8 | +509% |
| Member H | 16 | 0.0 | 39.2 | From 0 |
| Member I | 16 | 9.3 | 37.9 | +308% |
| Member J | 4 | 11.7 | 28.3 | +142% |
| Member K | 3 | 43.7 | 43.7 | 0% |
| Member L | 6 | 23.3 | 2.7 | -88% |
| Team Average | 172 | 18.8 | 53.6 | +185% |
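The gain column compares the average of each member's first three scored conversations against the average of their last three. A minimal sketch of that metric, using made-up trajectory data rather than the actual engagement scores:

```python
# Sketch of the "first 3 avg vs. last 3 avg" gain metric from the table.
# The scores list is per-conversation composites in chronological order.
# Sample data below is illustrative, not the real engagement data.
def gain(scores: list[float]) -> str:
    first = sum(scores[:3]) / 3
    last = sum(scores[-3:]) / 3
    if first == 0:
        return "From 0"  # percent gain is undefined from a zero baseline
    return f"{(last - first) / first:+.0%}"

# A steady-climber trajectory starting from a 17.5 baseline.
print(gain([15, 20, 17.5, 40, 55, 70, 78, 82, 83.3]))  # → +363%
```

The "From 0" branch is why Members B and H show no percentage: a zero baseline makes relative improvement infinite, so the table reports it qualitatively instead.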

Why this table matters: Every row is a coaching decision. The top four need advanced scenarios and stretch assignments. The middle group needs targeted reinforcement. The bottom two need a fundamentally different conversation. Without this data, every person on this team gets the same generic training. With it, every person gets exactly what they need.

What Changed

The transformation wasn't just in the scores. It was in the team's relationship with practice itself. By Week 3, team members were voluntarily completing multiple sessions per week. They were comparing scores, sharing what worked, and asking for harder scenarios. The program created a culture of deliberate practice that didn't exist before.

Leadership distilled its coaching focus into three priorities:

Get the opening clean. Practice the first 30 seconds until it's second nature. No filler, no rambling, clear purpose.

Go one level deeper. When someone gives you an answer, ask "what's driving that?" before moving on. The real insight is always underneath the first response.

Summarize before moving forward. Paraphrase what you heard before transitioning. This single behavior separates good conversations from great ones.

The Expansion

Based on the results, leadership initiated conversations to bring this approach to additional groups across the organization. The behavioral scoring model and coaching intelligence framework were designed from the start to scale beyond the initial team. The infrastructure, the scorecards, and the measurement methodology are all built to expand.

Why This Matters

This engagement proved something that wealth management has struggled with for decades: you can bring the measurement rigor of a contact center to relationship-driven advisory conversations without sacrificing the human element. The AI doesn't replace the conversation. It makes the invisible visible so that coaching becomes precise, targeted, and measurable.

172 conversations in 6 weeks. A team that went from a 3.4 average to a 55.7 peak. Individual improvements of nearly 1,000%. And most importantly, a leadership team that can now answer the question every organization should be asking: "Are our people actually having the conversations we think they're having?"

The answer was more revealing than anyone expected. And that's the point.

Ready to See What Your Conversations Are Really Saying?

Book a 30-minute diagnostic to explore how AI coaching intelligence can transform your team's performance.

Book a Diagnostic