---
name: Data Analysis
slug: data-analysis
version: 1.0.2
homepage: https://clawic.com/skills/data-analysis
description: "Data analysis and visualization."
changelog: Added metric contracts, chart guidance, and decision brief templates for more reliable analysis.
metadata: {"clawdbot":{"emoji":"D","requires":{"bins":[]},"os":["linux","darwin","win32"]}}
---

## When to Use

Use this skill when the user needs to analyze, explain, or visualize data from SQL, spreadsheets, notebooks, dashboards, exports, or ad hoc tables.

Use it for KPI debugging, experiment readouts, funnel or cohort analysis, anomaly reviews, executive reporting, and quality checks on metrics or query logic.

Prefer this skill over generic coding or spreadsheet help when the hard part is analytical judgment: metric definition, comparison design, interpretation, or recommendation.

User asks about: analyzing data, finding patterns, understanding metrics, testing hypotheses, cohort analysis, A/B testing, churn analysis, or statistical significance.

## Core Principle

Analysis without a decision is just arithmetic. Always clarify: **What would change if this analysis shows X vs Y?**

## Methodology First

Before touching data:

1. **What decision** is this analysis supporting?
2. **What would change your mind?** (the real question)
3. **What data do you actually have** vs what you wish you had?
4. **What timeframe** is relevant?

## Statistical Rigor Checklist

- [ ] Sample size sufficient? (small N = wide confidence intervals)
- [ ] Comparison groups fair? (same time period, similar conditions)
- [ ] Multiple comparisons? (20 tests = 1 "significant" by chance)
- [ ] Effect size meaningful? (statistically significant != practically important)
- [ ] Uncertainty quantified? ("12-18% lift" not just "15% lift")

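Two of these checks can be made concrete with a quick calculation. A minimal sketch (all counts are illustrative) of how sample size drives confidence-interval width in a two-group conversion comparison, using a normal approximation:

```python
import math

def two_proportion_summary(x_a, n_a, x_b, n_b, z=1.96):
    """Effect size plus a 95% CI for the difference of two conversion rates."""
    p_a, p_b = x_a / n_a, x_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, (diff - z * se, diff + z * se)

# Small N: same rates, but the interval includes zero -> inconclusive.
diff, ci = two_proportion_summary(12, 100, 18, 100)
print(f"lift {diff:+.1%}, 95% CI [{ci[0]:+.1%}, {ci[1]:+.1%}]")

# 10x the sample size: the interval tightens and excludes zero.
diff, ci = two_proportion_summary(120, 1000, 180, 1000)
print(f"lift {diff:+.1%}, 95% CI [{ci[0]:+.1%}, {ci[1]:+.1%}]")
```

Reporting the interval, not just the point estimate, is exactly the "12-18% lift" habit the checklist asks for.
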
## Architecture

This skill does not require local folders, persistent memory, or setup state.

Use the included reference files as lightweight guides:

- `metric-contracts.md` for KPI definitions and caveats
- `chart-selection.md` for visual choice and chart anti-patterns
- `decision-briefs.md` for stakeholder-facing outputs
- `pitfalls.md` and `techniques.md` for analytical rigor and method choice

## Quick Reference

Load only the smallest relevant file to keep context focused.

| Topic | File |
|-------|------|
| Metric definition contracts | `metric-contracts.md` |
| Visual selection and chart anti-patterns | `chart-selection.md` |
| Decision-ready output formats | `decision-briefs.md` |
| Failure modes to catch early | `pitfalls.md` |
| Method selection by question type | `techniques.md` |

## Core Rules

### 1. Start from the decision, not the dataset

- Identify the decision owner, the question that could change a decision, and the deadline before doing analysis.
- If no decision would change, reframe the request before computing anything.

### 2. Lock the metric contract before calculating

- Define entity, grain, numerator, denominator, time window, timezone, filters, exclusions, and source of truth.
- If any of those are ambiguous, state the ambiguity explicitly before presenting results.

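One way to lock the contract is to write it down as a small record that travels with the analysis. A sketch only; every field value below is illustrative, and the field list simply mirrors the bullet above rather than any fixed schema:

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class MetricContract:
    """Pin down a KPI before computing it; ambiguity in any field blocks the analysis."""
    name: str
    entity: str           # what one row represents, e.g. "user"
    grain: str            # e.g. "weekly"
    numerator: str
    denominator: str
    time_window: str
    timezone: str
    filters: tuple        # inclusion rules
    exclusions: tuple     # e.g. internal test accounts
    source_of_truth: str  # the one table or system that wins disputes

weekly_active = MetricContract(
    name="weekly_active_rate",
    entity="user",
    grain="weekly",
    numerator="users with >=1 session in the week",
    denominator="users registered before week start",
    time_window="trailing 12 weeks",
    timezone="UTC",
    filters=("production traffic only",),
    exclusions=("internal test accounts",),
    source_of_truth="events.sessions",
)
print(weekly_active.name, "has", len(fields(weekly_active)), "contract fields")
```

Freezing the dataclass makes silent mid-analysis redefinition impossible, which is the point of the rule.
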
### 3. Separate extraction, transformation, and interpretation

- Keep query logic, cleanup assumptions, and analytical conclusions distinguishable.
- Never hide business assumptions inside SQL, formulas, or notebook code without naming them in the write-up.

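A minimal sketch of keeping the three layers distinguishable; the table, column names, sample rows, and the noise-filter threshold are all made up for illustration:

```python
# Extraction: query logic lives in one place, unmodified.
QUERY = """
SELECT signup_week, COUNT(*) AS signups
FROM users
GROUP BY signup_week
"""

def transform(rows):
    """Transformation: cleanup assumptions are named here, not hidden in SQL.
    Assumption: weeks with fewer than 5 signups are reporting noise and are dropped."""
    return [r for r in rows if r["signups"] >= 5]

# Interpretation: conclusions go in the write-up, tied to the assumption above.
rows = [{"signup_week": "2024-W01", "signups": 3},
        {"signup_week": "2024-W02", "signups": 42}]
kept = transform(rows)
print(f"{len(kept)} of {len(rows)} weeks retained after the noise filter")
```

Anyone reviewing the result can now challenge the threshold without reverse-engineering the SQL.
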
### 4. Choose visuals to answer a question

- Select charts based on the analytical question: trend, comparison, distribution, relationship, composition, funnel, or cohort retention.
- Do not add charts that make the deck look fuller but do not change the decision.

### 5. Brief every result in decision format

- Every output should include the answer, evidence, confidence, caveats, and recommended next action.
- If the output is going to a stakeholder, translate the method into business implications instead of leading with technical detail.

### 6. Stress-test claims before recommending action

- Segment by obvious confounders, compare the right baseline, quantify uncertainty, and check sensitivity to exclusions or time windows.
- Strong-looking numbers without robustness checks are not decision-ready.

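The sensitivity check can be as simple as recomputing the headline number under each defensible analytic choice. A sketch with invented data, varying one exclusion rule:

```python
def conversion(events, exclude_internal=True):
    """Recompute the headline conversion rate under different exclusion choices."""
    rows = [e for e in events if not (exclude_internal and e["internal"])]
    return sum(e["converted"] for e in rows) / len(rows)

events = [
    {"converted": 1, "internal": False},
    {"converted": 0, "internal": False},
    {"converted": 1, "internal": True},   # internal test account
    {"converted": 0, "internal": False},
]
with_exclusion = conversion(events, exclude_internal=True)
without_exclusion = conversion(events, exclude_internal=False)
print(f"with exclusions {with_exclusion:.0%}, without {without_exclusion:.0%}")
# If the two disagree materially, the claim is not decision-ready yet.
```

The same loop works for time windows, baselines, or segment cuts: one function, several defensible inputs, and a check that the conclusion survives all of them.
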
### 7. Escalate when the data cannot support the claim

- Block or downgrade conclusions when sample size is weak, the source is unreliable, definitions drifted, or confounding is unresolved.
- It is better to say "we don't know yet" than to produce false confidence.

## Common Traps

- Reusing a KPI name after changing numerator, denominator, or exclusions -> trend comparisons become invalid.
- Comparing daily, weekly, and monthly grains in one chart -> movement looks real but is mostly aggregation noise.
- Showing percentages without underlying counts -> leadership overreacts to tiny denominators.
- Using a pretty chart instead of the right chart -> the output looks polished but hides the actual decision signal.
- Hunting for interesting cuts after seeing the result -> narrative follows chance instead of evidence.
- Shipping automated reports without metric owners or caveats -> bad numbers spread faster than they can be corrected.
- Treating observational patterns as causal proof -> action plans get built on correlation alone.

## Approach Selection

| Question type | Approach | Key output |
|---------------|----------|------------|
| "Is X different from Y?" | Hypothesis test | p-value + effect size + CI |
| "What predicts Z?" | Regression/correlation | Coefficients + R² + residual check |
| "How do users behave over time?" | Cohort analysis | Retention curves by cohort |
| "Are these groups different?" | Segmentation | Profiles + statistical comparison |
| "What's unusual?" | Anomaly detection | Flagged points + context |

For technique details and when to use each, see `techniques.md`.

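As one example from the table, a retention curve is just the share of each cohort still active at each period offset. A minimal sketch with invented events of the form (user, cohort week, weeks since signup):

```python
from collections import defaultdict

def retention_curves(events, horizon=3):
    """Retention-by-cohort from (user, cohort, week_offset) activity events."""
    cohort_users = defaultdict(set)  # cohort -> all users in it
    active = defaultdict(set)        # (cohort, offset) -> users active that week
    for user, cohort, offset in events:
        cohort_users[cohort].add(user)
        active[(cohort, offset)].add(user)
    return {
        cohort: [len(active[(cohort, k)]) / len(users) for k in range(horizon)]
        for cohort, users in cohort_users.items()
    }

events = [
    ("u1", "W1", 0), ("u2", "W1", 0), ("u1", "W1", 1),
    ("u3", "W2", 0), ("u3", "W2", 1), ("u3", "W2", 2),
]
print(retention_curves(events))
```

Plotting one line per cohort from this output is the "retention curves by cohort" deliverable the table names.
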
## Output Standards

1. **Lead with the insight**, not the methodology
2. **Quantify uncertainty** - ranges, not point estimates
3. **State limitations** - what this analysis can't tell you
4. **Recommend next steps** - what would strengthen the conclusion

## Red Flags to Escalate

- User wants to "prove" a predetermined conclusion
- Sample size too small for reliable inference
- Data quality issues that invalidate analysis
- Confounders that can't be controlled for

## External Endpoints

This skill makes no external network requests.

| Endpoint | Data Sent | Purpose |
|----------|-----------|---------|
| None | None | N/A |

No data is sent externally.

## Security & Privacy

Data that leaves your machine:

- Nothing by default.

Data persisted locally:

- Nothing by default.

This skill does NOT:

- Access undeclared external endpoints.
- Store credentials or raw exports in hidden local memory files.
- Create or depend on local folder systems for persistence.
- Create automations or background jobs without explicit user confirmation.
- Rewrite its own instruction source files.

## Related Skills

Install with `clawhub install <slug>` if user confirms:

- `sql` - query design and review for reliable data extraction.
- `csv` - cleanup and normalization for tabular inputs before analysis.
- `dashboard` - implementation patterns for KPI visualization layers.
- `report` - structured stakeholder-facing deliverables after analysis.
- `business-intelligence` - KPI systems and operating cadence beyond one-off analysis.

## Feedback

- If useful: `clawhub star data-analysis`
- Stay updated: `clawhub sync`