# Stop Guessing Why Your Prompt Isn't Working
PromptLab diagnoses your prompt across 12 dimensions, generates three improved variants using distinct strategies, and auto-tests every variant to prove which one wins, all in one command. No dataset required.
## The Problem
Most prompting tools make you guess.
### Without PromptLab
- ❌ Rewrite tools give you a new prompt with no explanation of what was wrong
- ❌ Testing frameworks require a labelled dataset you don't have
- ❌ Chrome extensions work per-session with no history or comparison
### With PromptLab
- ✅ PromptLab scores every dimension and tells you exactly why each one is weak
- ✅ Auto-generates test cases from your prompt — no dataset needed
- ✅ CLI-first, local-first — sessions saved, history browsable, works offline
## Live Demo

### See It In Action

Real output from `promptlab analyse` on a weak prompt:
```
$ promptlab analyse "You are a helpful assistant. Answer questions."

Analysing prompt across 12 dimensions...

Overall Score: 2.1 / 5.0 ──────────────────── NEEDS WORK
5 critical issues · 1 medium · 4 low · 1 good

→ Run: promptlab improve "You are a helpful assistant..." --test
```
## How It Works

### Three Commands. End to End.
#### Analyse

`promptlab analyse "your prompt"` runs a structured diagnostic across 12 dimensions. Each dimension gets a score from 1–5, a rationale for why it's weak, and an actionable suggestion. Critical issues are flagged immediately.
#### Improve

`promptlab improve "your prompt"` generates three improved variants using distinct strategies: structured enhancement, role & context expansion, and few-shot augmentation. These are not random rewrites; each change is explained.
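The three-strategy flow above can be sketched as a dispatch table. This is a minimal, hypothetical illustration only: the function names and the stand-in transformations are assumptions for clarity, while the real CLI drives an LLM to produce each variant.

```python
# Hypothetical sketch of the improve step: one transformation function
# per named strategy. The string edits below are stand-ins; PromptLab
# itself generates variants with an LLM.

def structured_enhancement(prompt: str) -> str:
    # Stand-in: add explicit task/format structure around the prompt.
    return f"## Task\n{prompt}\n\n## Output format\nRespond in markdown."

def role_context_expansion(prompt: str) -> str:
    # Stand-in: prepend an expert persona and context.
    return f"You are a domain expert.\n{prompt}"

def few_shot_augmentation(prompt: str) -> str:
    # Stand-in: append a worked example to guide output style.
    return f"{prompt}\n\nExample:\nQ: ...\nA: ..."

STRATEGIES = {
    "structured-enhancement": structured_enhancement,
    "role-context-expansion": role_context_expansion,
    "few-shot-augmentation": few_shot_augmentation,
}

def improve(prompt: str) -> dict:
    """Return one improved variant per strategy."""
    return {name: fn(prompt) for name, fn in STRATEGIES.items()}
```

The point of naming the strategies is that each variant is explainable: the tool can say which transformation produced which change.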
#### Test & Win

`promptlab improve "your prompt" --test` auto-generates test cases, runs your original prompt and all three variants against them, scores every output, and recommends the winner with reasoning. No dataset. No manual grading.
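The winner-selection step reduces to averaging per-test-case scores and taking the maximum. A minimal sketch, assuming illustrative scores; in the real tool the per-case numbers would come from an LLM judge, not hard-coded data:

```python
# Hypothetical sketch of the test-and-pick-a-winner step: average each
# variant's per-test-case scores (0-5 scale assumed) and report the best.

def pick_winner(scores: dict) -> tuple:
    """scores maps variant name -> list of per-test-case scores."""
    averages = {name: sum(s) / len(s) for name, s in scores.items()}
    winner = max(averages, key=averages.get)
    return winner, averages[winner]

# Illustrative data, not real PromptLab output.
scores = {
    "original": [2.0, 1.5, 2.5],
    "structured-enhancement": [4.0, 3.5, 4.5],
    "role-context-expansion": [3.0, 3.5, 3.0],
    "few-shot-augmentation": [4.0, 4.5, 4.0],
}
winner, avg = pick_winner(scores)
# few-shot-augmentation averages (4.0 + 4.5 + 4.0) / 3 ≈ 4.17, the highest.
```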
## Diagnostic Framework

### The 12 Dimensions
Every prompt is scored 1–5 across these dimensions. Most prompts score under 2.5 on the first pass.
- **Role Definition:** Does the prompt define a clear expert persona?
- **Task Clarity:** Is the primary task unambiguous and single-focused?
- **Output Format:** Does it specify the desired structure and format?
- **Input Specification:** Does it describe what inputs to expect?
- **Constraints:** Are restrictions and limits stated explicitly?
- **Examples:** Are few-shot examples provided to guide output style?
- **Tone & Style:** Is the desired register and voice specified?
- **Edge Cases:** Does it handle unexpected or ambiguous inputs?
- **Reasoning:** Is chain-of-thought or step-by-step thinking instructed?
- **Context Management:** Is the prompt self-contained, with all necessary context included?
- **Specificity Balance:** Is it specific enough without over-constraining?
- **Token Efficiency:** Is it concise and free of redundant instructions?
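The scoring model behind the demo output above can be sketched in a few lines: each dimension gets a 1–5 score, the overall score is their mean, and scores bucket into the severity labels the CLI prints. The bucket thresholds here are illustrative assumptions, not PromptLab's actual cutoffs:

```python
# Hypothetical sketch of the 12-dimension scoring model. Thresholds are
# assumed: 1 = critical, 2 = low, 3 = medium, 4-5 = good.

DIMENSIONS = [
    "role_definition", "task_clarity", "output_format",
    "input_specification", "constraints", "examples",
    "tone_style", "edge_cases", "reasoning",
    "context_management", "specificity_balance", "token_efficiency",
]

def severity(score: int) -> str:
    return {1: "critical", 2: "low", 3: "medium"}.get(score, "good")

def summarise(scores: dict) -> dict:
    """Overall score is the mean; counts mirror the CLI summary line."""
    overall = sum(scores.values()) / len(scores)
    counts = {}
    for s in scores.values():
        counts[severity(s)] = counts.get(severity(s), 0) + 1
    return {"overall": round(overall, 1), "counts": counts}

scores = dict.fromkeys(DIMENSIONS, 2)  # a uniformly weak prompt
scores["examples"] = 1                 # ...with one critical gap
summary = summarise(scores)
```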
## Competitive Landscape

### Why Not Just Use...
PromptLab fills a specific gap — no other tool explains why a prompt is weak and proves the fix.
| Feature | PromptLab | DSPy | Promptfoo | Braintrust | Chrome Ext. |
|---|---|---|---|---|---|
| No dataset needed | ✅ | ❌ | ❌ | ❌ | ✅ |
| Explains why it's weak | ✅ | ❌ | ❌ | ❌ | ❌ |
| Auto-tests improvements | ✅ | ✅ | ✅ | ✅ | ❌ |
| Multi-provider support | ✅ | ✅ | ✅ | ✅ | ❌ |
| Local-first / offline | ✅ | ✅ | ✅ | ❌ | ❌ |
| Free & open source | ✅ | ✅ | ✅ | ❌ | ❌ |
## Stack

### Built With
It's open source. Use it, break it, improve it.
Built in my spare time as a tool I genuinely use on my own prompts. PRs welcome, especially new providers and diagnostic dimensions.