# Stop Guessing Why Your Prompt Isn't Working
PromptLab diagnoses your prompt across 12 dimensions, generates three improved variants using distinct strategies, and auto-tests every variant to prove which one wins, all in one command. No dataset required.
## The Problem
Most prompting tools make you guess.
### Without PromptLab
- ❌ Rewrite tools give you a new prompt with no explanation of what was wrong
- ❌ Testing frameworks require a labelled dataset you don't have
- ❌ Chrome extensions work per-session with no history or comparison
### With PromptLab
- ✅ PromptLab scores every dimension and tells you exactly why each one is weak
- ✅ Auto-generates test cases from your prompt — no dataset needed
- ✅ CLI-first, local-first — sessions saved, history browsable, works offline
## Live Demo

### See It In Action

Real output from `promptlab analyse` on a weak prompt:
```
$ promptlab analyse "You are a helpful assistant. Answer questions."

Analysing prompt across 12 dimensions...

Overall Score: 2.1 / 5.0 ──────────────────── NEEDS WORK
5 critical issues · 1 medium · 4 low · 1 good

→ Run: promptlab improve "You are a helpful assistant..." --test
```
## How It Works

### Three Commands. End to End.
#### Analyse

`promptlab analyse "your prompt"` runs a structured diagnostic across 12 dimensions. Each dimension gets a score from 1–5, a rationale for why it's weak, and an actionable suggestion. Critical issues are flagged immediately.
#### Improve

`promptlab improve "your prompt"` generates three improved variants using distinct strategies: structured enhancement, role & context expansion, and few-shot augmentation. These are not random rewrites; each change is explained.
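The three-strategy flow above can be sketched as a dispatch table. This is a minimal, hypothetical illustration only: the function names and the stand-in transformations are assumptions for clarity, while the real CLI drives an LLM to produce each variant.

```python
# Hypothetical sketch of the improve step: one transformation function
# per named strategy. The string edits below are stand-ins; PromptLab
# itself generates variants with an LLM.

def structured_enhancement(prompt: str) -> str:
    # Stand-in: add explicit task/format structure around the prompt.
    return f"## Task\n{prompt}\n\n## Output format\nRespond in markdown."

def role_context_expansion(prompt: str) -> str:
    # Stand-in: prepend an expert persona and context.
    return f"You are a domain expert.\n{prompt}"

def few_shot_augmentation(prompt: str) -> str:
    # Stand-in: append a worked example to guide output style.
    return f"{prompt}\n\nExample:\nQ: ...\nA: ..."

STRATEGIES = {
    "structured-enhancement": structured_enhancement,
    "role-context-expansion": role_context_expansion,
    "few-shot-augmentation": few_shot_augmentation,
}

def improve(prompt: str) -> dict:
    """Return one improved variant per strategy."""
    return {name: fn(prompt) for name, fn in STRATEGIES.items()}
```

The point of naming the strategies is that each variant is explainable: the tool can say which transformation produced which change.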
#### Test & Win

`promptlab improve "your prompt" --test` auto-generates test cases, runs your original prompt and all three variants against them, scores every output, and recommends the winner with reasoning. No dataset. No manual grading.
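The winner-selection step reduces to averaging per-test-case scores and taking the maximum. A minimal sketch, assuming illustrative scores; in the real tool the per-case numbers would come from an LLM judge, not hard-coded data:

```python
# Hypothetical sketch of the test-and-pick-a-winner step: average each
# variant's per-test-case scores (0-5 scale assumed) and report the best.

def pick_winner(scores: dict) -> tuple:
    """scores maps variant name -> list of per-test-case scores."""
    averages = {name: sum(s) / len(s) for name, s in scores.items()}
    winner = max(averages, key=averages.get)
    return winner, averages[winner]

# Illustrative data, not real PromptLab output.
scores = {
    "original": [2.0, 1.5, 2.5],
    "structured-enhancement": [4.0, 3.5, 4.5],
    "role-context-expansion": [3.0, 3.5, 3.0],
    "few-shot-augmentation": [4.0, 4.5, 4.0],
}
winner, avg = pick_winner(scores)
# few-shot-augmentation averages (4.0 + 4.5 + 4.0) / 3 ≈ 4.17, the highest.
```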
## Diagnostic Framework

### The 12 Dimensions
Every prompt is scored 1–5 across these dimensions. Most prompts score under 2.5 on the first pass.
- **Role Definition:** Does the prompt define a clear expert persona?
- **Task Clarity:** Is the primary task unambiguous and single-focused?
- **Output Format:** Does it specify the desired structure and format?
- **Input Specification:** Does it describe what inputs to expect?
- **Constraints:** Are restrictions and limits stated explicitly?
- **Examples:** Are few-shot examples provided to guide output style?
- **Tone & Style:** Is the desired register and voice specified?
- **Edge Cases:** Does it handle unexpected or ambiguous inputs?
- **Reasoning:** Is chain-of-thought or step-by-step thinking instructed?
- **Context Management:** Is the prompt self-contained, with all necessary context included?
- **Specificity Balance:** Is it specific enough without over-constraining?
- **Token Efficiency:** Is it concise and free of redundant instructions?
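The scoring model behind the demo output above can be sketched in a few lines: each dimension gets a 1–5 score, the overall score is their mean, and scores bucket into the severity labels the CLI prints. The bucket thresholds here are illustrative assumptions, not PromptLab's actual cutoffs:

```python
# Hypothetical sketch of the 12-dimension scoring model. Thresholds are
# assumed: 1 = critical, 2 = low, 3 = medium, 4-5 = good.

DIMENSIONS = [
    "role_definition", "task_clarity", "output_format",
    "input_specification", "constraints", "examples",
    "tone_style", "edge_cases", "reasoning",
    "context_management", "specificity_balance", "token_efficiency",
]

def severity(score: int) -> str:
    return {1: "critical", 2: "low", 3: "medium"}.get(score, "good")

def summarise(scores: dict) -> dict:
    """Overall score is the mean; counts mirror the CLI summary line."""
    overall = sum(scores.values()) / len(scores)
    counts = {}
    for s in scores.values():
        counts[severity(s)] = counts.get(severity(s), 0) + 1
    return {"overall": round(overall, 1), "counts": counts}

scores = dict.fromkeys(DIMENSIONS, 2)  # a uniformly weak prompt
scores["examples"] = 1                 # ...with one critical gap
summary = summarise(scores)
```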
## Competitive Landscape

### Why Not Just Use...
PromptLab fills a specific gap — no other tool explains why a prompt is weak and proves the fix.
| Feature | PromptLab | DSPy | Promptfoo | Braintrust | Chrome Ext. |
|---|---|---|---|---|---|
| No dataset needed | ✅ | ❌ | ❌ | ❌ | ✅ |
| Explains why it's weak | ✅ | ❌ | ❌ | ❌ | ❌ |
| Auto-tests improvements | ✅ | ✅ | ✅ | ✅ | ❌ |
| Multi-provider support | ✅ | ✅ | ✅ | ✅ | ❌ |
| Local-first / offline | ✅ | ✅ | ✅ | ❌ | ❌ |
| Free & open source | ✅ | ✅ | ✅ | ❌ | ❌ |
## Stack

### Built With
It's open source. Use it, break it, improve it.
Built in my spare time as a tool I genuinely use on my own prompts. PRs welcome, especially new providers and diagnostic dimensions.