What will you do with hypotheses after scoring the backlog?
From Scoring to Hypotheses and Experiments
Short answer
One of the biggest mistakes PMs make: they score the backlog and then... simply build the top-scored items.
My approach: the score is just an input into decision-making. Hypotheses and experiments are where the real science happens.
Scenario
I scored the backlog with RICE. Top 5:
| # | Feature | RICE score | Rationale |
|---|---|---|---|
| 1 | Dashboard redesign | 8500 | Improves UX |
| 2 | API rate limits | 6200 | Fixes stability |
| 3 | Export to CSV | 5800 | Requested by 10 customers |
| 4 | Dark mode | 4100 | Nice to have |
| 5 | Onboarding flow | 3900 | Reduces signup friction |
Now I have my roadmap for the next quarter.
But here's the important part: I do NOT just tell the engineers "build #1." First, I validate my assumptions.
Step 1: Break the score down into hypotheses
Every score rests on assumptions. I articulate them:
Feature #1: Dashboard redesign (Score 8500)
Assumptions behind the RICE score:
- Reach: 50% of users (10,000 people) will use the new dashboard
- Impact: 3/5 (improves the experience, but not critical)
- Confidence: 80% (based on user interviews)
- Effort: 4 weeks
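To make the math behind the score explicit, here is a minimal Python sketch of the standard RICE formula, (Reach × Impact × Confidence) / Effort, applied to these inputs. All values are illustrative; note that these rounded inputs yield 6,000 rather than the table's 8,500, so treat the exact figures loosely.
```python
# A minimal sketch of the standard RICE formula:
# score = (reach * impact * confidence) / effort.
# All inputs are the illustrative assumptions listed above.

def rice(reach: float, impact: float, confidence: float, effort_weeks: float) -> float:
    """RICE score: (Reach x Impact x Confidence) / Effort."""
    return reach * impact * confidence / effort_weeks

# Dashboard redesign: 10,000 users reached, impact 3/5,
# confidence 0.8, effort 4 weeks.
print(rice(reach=10_000, impact=3, confidence=0.8, effort_weeks=4))  # 6000.0
```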
Now I convert these into hypotheses:
Hypothesis 1: "50% of users will use the new dashboard"
← Is this true? Maybe it's only 20%?
Hypothesis 2: "The new dashboard improves the experience (Impact = 3)"
← By how much? Measured by what metric?
← Maybe users find the new design confusing?
Hypothesis 3: "80% confidence, based on interviews"
← Five interviews aren't a statistically meaningful sample
← Maybe the results were biased?
Step 2: Choose which hypotheses to test
I do NOT test everything. I select based on:
Criteria:
- Highest risk (if the assumption is wrong, the project breaks)
- Easiest to test (quick to validate or invalidate)
- Most uncertain (the assumption rests on the least evidence)
For the dashboard:
| Hypothesis | Risk | Ease of testing | Uncertainty | Priority |
|---|---|---|---|---|
| 50% of users will use it | High | Easy | High | TEST THIS |
| New design improves UX | Medium | Medium | Medium | Maybe |
| 4 weeks of effort | Low | Hard | Low | Skip |
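A sketch of this triage in code, assuming hypothetical 1-3 scales for each criterion and an arbitrary cutoff; the point is that the selection rule is explicit, not that these particular weights are right:
```python
# Hypothetical triage of assumptions by risk, ease of testing, and
# uncertainty (each on an assumed 1-3 scale); thresholds are arbitrary.
from dataclasses import dataclass

@dataclass
class Assumption:
    name: str
    risk: int         # 1 = low, 3 = high
    ease: int         # 1 = hard to test, 3 = easy
    uncertainty: int  # 1 = low, 3 = high

def triage(a: Assumption) -> str:
    score = a.risk + a.ease + a.uncertainty
    if score >= 8:
        return "TEST THIS"
    if score >= 6:
        return "Maybe"
    return "Skip"

assumptions = [
    Assumption("50% of users will use it", risk=3, ease=3, uncertainty=3),
    Assumption("New design improves UX",   risk=2, ease=2, uncertainty=2),
    Assumption("4 weeks of effort",        risk=1, ease=1, uncertainty=1),
]
for a in assumptions:
    print(f"{a.name}: {triage(a)}")  # reproduces the table's verdicts
```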
Step 3: Design the experiments
Hypothesis: "50% of users will use the new dashboard"
Experiment design:
Setup:
- Build the new dashboard (a prototype, not the full build)
- Show it to 10% of users (A/B test)
- Control group (90%): current dashboard
- Test group (10%): new dashboard
Metrics:
- Activation rate: what % of users visit the new dashboard at least once
- Adoption rate: what % use it regularly (2+ times per week)
Expected result: 50% activation, 30% regular adoption
Power: 90% (enough to detect a 5 percentage point difference)
Duration: 2 weeks (long enough to collect sufficient data)
Decision rule:
- If activation < 30%: hypothesis is wrong
- If activation is 30-50%: hypothesis is partially right, needs refinement
- If activation > 50%: hypothesis is confirmed
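Here is a sketch of this decision rule plus a sample-size check, assuming a two-proportion z-test via statsmodels; alpha = 0.05 is my assumption, since the design above only states the power and the detectable difference:
```python
# Pre-registered decision rule from above, plus a power-based
# sample-size estimate (assumes a two-proportion z-test, alpha = 0.05).
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

def decide(activation_rate: float) -> str:
    """Apply the decision rule to the observed activation rate."""
    if activation_rate < 0.30:
        return "hypothesis wrong: stop and investigate"
    if activation_rate <= 0.50:
        return "partially right: refine and re-test"
    return "confirmed: build with confidence"

# Users needed per group to detect a drop from 50% to 45% activation
# (a 5-point difference) with 90% power.
effect = proportion_effectsize(0.50, 0.45)
n_per_group = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.90)

print(decide(0.08))        # -> hypothesis wrong: stop and investigate
print(round(n_per_group))  # about 1,000 users in each group
```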
A realistic example
Suppose I've launched the experiment with the new dashboard.
Day 7 (midway check):
- Activation: 8% (vs. the expected 50%)
- I'm shocked. The design looks good.
- But the data doesn't lie.
Action:
- I stop the experiment early (it isn't working)
- But I do NOT kill the project
- I start investigating: "Why only 8%?"
Candidate explanations:
- Maybe users don't know the new dashboard exists?
- Maybe it's not obvious where to find it?
- Maybe the new design is confusing?
What I do:
- Customer interviews: "Did you see the new dashboard? Why didn't you use it?"
- Session replays: how do users navigate?
- Heatmaps: where do they click?
Learnings:
- 60% of users didn't even notice the new dashboard (a visibility problem)
- 30% tried it but were confused by the new UI (an onboarding problem)
- 10% used it and liked it (core users love it)
Refined hypothesis: "If we fix visibility (an in-app banner) and add an onboarding tooltip, adoption will reach 40%"
New experiment:
- Add an in-app announcement
- Add a 3-step onboarding flow
- Test again
Result: Adoption jumps to 42%. Success!
Step 4: Apply the learnings to other features
Insights from the dashboard experiment apply to other backlog items.
Feature #3: Export to CSV (Score 5800)
The original score was based on:
- "10 customers requested this"
- The assumption that this is high impact
But the dashboard experiment taught me:
- A request from 10 customers doesn't mean 50% of users will use it
Refined hypothesis for Export:
- "Only 10 customers requested it, a tiny fraction of the ~20,000-user base"
- "Real adoption is probably 5-10%, not 50%"
- "Effort: 2 weeks"
- "New RICE: (Reach 1,000 × Impact 2 × Confidence 0.6) / Effort 2 weeks = 600"
Conclusion: the score drops from 5800 to 600. It should be a lower priority!
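Re-running the same illustrative rice() helper from Step 1 with the revised assumptions reproduces the drop:
```python
# Re-scoring "Export to CSV" with the revised assumptions above,
# using the same illustrative RICE helper as in Step 1.

def rice(reach: float, impact: float, confidence: float, effort_weeks: float) -> float:
    return reach * impact * confidence / effort_weeks

# Revised inputs: reach 1,000 users, impact 2, confidence 0.6, effort 2 weeks.
print(rice(1000, 2, 0.6, 2))  # 600.0 (down from the original 5800)
```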
But: maybe there's a reason why exactly these 10 customers want it?
- Are they enterprise customers with $100k contracts?
- Are they about to churn without this feature?
Then it's a different story. Not "nice to have" but "keep the customer." That's risk-based priority, not score-based.
Step 5: The quarterly system
How I balance it all:
Quarter = 13 weeks
Weeks 1-2: Experimentation sprint
- Test top 3 scored items
- Kill or refine based on results
Weeks 3-8: Build top winners
- Dashboard (refined based on experiment)
- API rate limits (already validated)
Weeks 9-10: Buffer for issues
- Maybe an experiment failed
- Maybe a new urgent request came in
Weeks 11-13: Polish and launch
- Testing, customer feedback
- Docs, support materials
Mistakes I avoid
❌ Mistake 1: Blindly trusting the score
- I use the score as a starting point, not a final decision
❌ Mistake 2: No hypothesis
- I don't say "let's build this because the score is 8500"
- I say "I believe 50% of users will use this. Let's test it."
❌ Mistake 3: Testing everything
- I test only high-risk assumptions
- Some things are obvious (e.g., API rate limits are clearly needed)
❌ Mistake 4: Ignoring experiment results
- If the experiment says "no," I listen
- Even if I personally believe in the idea
❌ Mistake 5: No feedback loop
- Testing and learning, but not applying it to other decisions
- Every experiment should improve the decision-making process
Framework: Scoring → Hypothesis → Experimentation
Backlog items → RICE scoring → Articulate assumptions → Hypothesis → Experiment → Results → Refined roadmap → Build
This isn't linear:
- Maybe the experiment kills the idea (go back and score the next item)
- Maybe the experiment refines it (build a modified version)
- Maybe the experiment confirms it (build with confidence)
A full-cycle example
Week 1:
- Score the backlog
- Top item: "Dashboard redesign"
- Assumption: "50% users will adopt"
Weeks 2-3:
- Design hypothesis
- Run experiment
- Result: Only 8% adoption (assumption wrong)
Week 4:
- Investigate why
- Find visibility issue
- Create refined hypothesis
- Design new experiment (with fix)
Weeks 5-6:
- Run new experiment
- Result: 42% adoption (hypothesis confirmed!)
Weeks 7-8:
- Build full version
- Confidence: High (backed by data)
Result: instead of building the dashboard and hoping it works, I built the dashboard WITH visibility improvements, based on what I learned. User adoption is likely 2-3x higher than it would have been without the experiment.
The core principle
Scoring is discipline. Experiments are science. Together, they are the art of product management.
Scoring alone = bias and gut feeling. Experiments alone = endless testing, no direction.
Together = data-driven decisions with strategic intent.