14.1 Spec: "Summarize any article into structured bullets"

Goal and non-goals

Goal: build an app that takes article text as input and returns a structured bullet summary that is easy to display and validate.

Non-goals (for this project):

building a full web scraper (URL fetching is out of scope for v1; a URL may be stored as metadata only),
building a full RAG system (covered later, in Part VIII),
perfect factuality beyond the provided text (we’ll focus on “grounded in input”),
long-term storage/search of summaries (out of scope for Project 1).

Scope choice

This project is intentionally small so you can learn the pipeline. You can add URL fetching, storage, and search later without redesigning the core.

Inputs (what the app accepts)

Define inputs explicitly so the model doesn’t guess and your app doesn’t become a “works for me” prototype.

Primary input: article text

Input type: UTF-8 text
Max length: pick one limit (example: somewhere in the 10,000–30,000 character range) and enforce it before any model call
Whitespace: treat as insignificant; trim leading and trailing whitespace and collapse repeated whitespace
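The input rules above can be sketched as a single function that runs before any model call. This is a minimal sketch, not a definitive implementation: the names `normalize_article_text` and `InputValidationError` are hypothetical, and the 20,000-character limit is just one value picked from the example range.

```python
import re
import unicodedata

MAX_CHARS = 20_000  # assumed limit, picked from the example range
MIN_CHARS = 1       # assumed; raise it to reject very short inputs


class InputValidationError(ValueError):
    """Raised before any model call when the input is unusable."""


def normalize_article_text(raw: str) -> str:
    """Trim, normalize whitespace, and enforce the length limits."""
    if not isinstance(raw, str):
        raise InputValidationError("article_text must be a string")
    text = unicodedata.normalize("NFC", raw)
    # Collapse runs of spaces, tabs, and carriage returns into one space.
    text = re.sub(r"[ \t\r\f\v]+", " ", text)
    # Drop spaces around newlines, then cap blank lines at one.
    text = re.sub(r" ?\n ?", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    if len(text) < MIN_CHARS:
        raise InputValidationError("article_text is empty")
    if len(text) > MAX_CHARS:
        raise InputValidationError(
            f"article_text is {len(text)} characters; the limit is {MAX_CHARS}"
        )
    return text
```

Paragraph breaks are kept (as a single blank line) because they may help the model see the article's structure; collapsing everything to one line would also satisfy the rule as written.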

Optional metadata (nice-to-have)

title: user-provided title (optional)
source_url: optional string (store as metadata only; do not fetch in v1)
audience: “general”, “technical”, “executive” (optional)
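One way to make these fields explicit is a small request type. The sketch below is an assumption about how you might model it; `SummarizeRequest` and `ALLOWED_AUDIENCES` are hypothetical names, and only the three audience values come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_AUDIENCES = {"general", "technical", "executive"}


@dataclass(frozen=True)
class SummarizeRequest:
    """Everything the app accepts for one summary (hypothetical shape)."""

    article_text: str
    title: Optional[str] = None
    source_url: Optional[str] = None  # stored as metadata only; never fetched in v1
    audience: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject unknown audiences early instead of letting the model guess.
        if self.audience is not None and self.audience not in ALLOWED_AUDIENCES:
            raise ValueError(
                f"audience must be one of {sorted(ALLOWED_AUDIENCES)} or omitted"
            )
```

For example, `SummarizeRequest(article_text="...", audience="technical")` is accepted, while `audience="kids"` raises a `ValueError`.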

Don’t fetch URLs in v1

URL fetching adds networking, parsing, content extraction, and policy concerns. Start with pasted text. Add URL fetching later behind explicit constraints.

Outputs (schema-first)

The output should be structured so the app can parse and display it reliably. Start with a schema that is:

small enough to be hard to break,
expressive enough to be useful,
easy to validate.

Suggested v1 schema (practical)

This is a good starting point for a “structured bullets” summary:

{
  "title": "string | null",
  "summary_bullets": ["string", "..."],
  "key_entities": ["string", "..."],
  "claims": [
    {"claim": "string", "support": "string | null"}
  ],
  "caveats": ["string", "..."]
}

Output rules (important)

Length cap: cap bullet count (example: 5–10 summary bullets).
No invented facts: claims must be grounded in the input text.
Use null when unknown: don’t guess missing titles.
Strings at the leaves: every value is a string (or null where the schema allows it); the only nesting is the flat claim/support object.

Design schema for validation

If your schema can be validated with straightforward rules, your app becomes reliable. The model will still wobble sometimes; your validator makes wobble survivable.
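As an illustration of "straightforward rules", here is a hand-written validator for the suggested v1 schema. It is a sketch under assumptions: `validate_summary` is a hypothetical name, the 5–10 bounds are the example values from the output rules, and extra keys are tolerated rather than rejected. A JSON Schema library would work equally well.

```python
from typing import Any, List

MIN_BULLETS, MAX_BULLETS = 5, 10  # example bounds from the output rules
REQUIRED_KEYS = ("title", "summary_bullets", "key_entities", "claims", "caveats")


def _is_text(value: Any) -> bool:
    """True for strings that are non-empty after trimming whitespace."""
    return isinstance(value, str) and value.strip() != ""


def validate_summary(data: Any) -> List[str]:
    """Return a list of problems; an empty list means the output is valid."""
    if not isinstance(data, dict):
        return ["output must be a JSON object"]
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        return [f"missing keys: {missing}"]

    errors: List[str] = []
    if data["title"] is not None and not _is_text(data["title"]):
        errors.append("title must be a non-empty string or null")
    for key in ("summary_bullets", "key_entities", "caveats"):
        items = data[key]
        if not isinstance(items, list) or not all(_is_text(i) for i in items):
            errors.append(f"{key} must be an array of non-empty strings")
    bullets = data["summary_bullets"]
    if isinstance(bullets, list) and not (MIN_BULLETS <= len(bullets) <= MAX_BULLETS):
        errors.append(f"summary_bullets must have {MIN_BULLETS}-{MAX_BULLETS} items")

    claims = data["claims"]
    if not isinstance(claims, list):
        errors.append("claims must be an array")
    else:
        for i, item in enumerate(claims):
            ok = (
                isinstance(item, dict)
                and _is_text(item.get("claim"))
                and "support" in item
                and (item["support"] is None or _is_text(item["support"]))
            )
            if not ok:
                errors.append(
                    f"claims[{i}] must be {{claim: string, support: string | null}}"
                )
    return errors
```

Returning a list of problems (instead of raising on the first one) makes it easy to feed the errors into a repair prompt or a log line.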

Quality rules (grounding, uncertainty)

Summarization apps fail in predictable ways: hallucinated facts, overconfident tone, and missing caveats. Add explicit quality rules:

Grounding rule: “Only use information from the provided text.”
Uncertainty rule: “If the text doesn’t support a claim, don’t include it.”
Caveat rule: “Include caveats/uncertainties if the text is ambiguous.”
Refusal rule: if the model refuses or the request is blocked, return a blocked status instead of a partial or invented summary.

This is where quality comes from

Not from “better prompting vibes.” From explicit constraints + validation + failure-aware UX.
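One piece of the grounding rule can be checked mechanically. The sketch below assumes the prompt asks the model to put a verbatim quote from the article in each claim's support field; that is an assumption added here, not something the schema itself requires. `unsupported_claims` is a hypothetical helper name.

```python
import re
from typing import Dict, List, Optional


def _squash(text: str) -> str:
    """Collapse whitespace and ignore case so quotes match loosely."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def unsupported_claims(
    article_text: str, claims: List[Dict[str, Optional[str]]]
) -> List[int]:
    """Return indexes of claims whose support is missing or not found in the text.

    Assumes "support" is meant to be a verbatim quote from the article.
    """
    haystack = _squash(article_text)
    flagged: List[int] = []
    for index, claim in enumerate(claims):
        support = claim.get("support")
        needle = _squash(support) if isinstance(support, str) else ""
        if not needle or needle not in haystack:
            flagged.append(index)
    return flagged
```

What to do with flagged claims is a product decision: you might drop them, move them into caveats, or treat the whole response as invalid_output. A substring match cannot prove a claim is true; it only shows that the quoted support really appears in the input.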

Acceptance criteria (mini test suite)

Write acceptance criteria as if you were writing tests. Here is a strong v1 set:

Functional criteria

Given a non-empty article text input, the app returns JSON matching the schema.
summary_bullets contains 5–10 bullets (configurable, but bounded).
All strings are non-empty after trimming whitespace.
If the input is empty or below your minimum length, the app returns a validation error without calling the model.

Grounding criteria

The output always includes a caveats array, even when it is empty.
The prompt explicitly instructs “do not invent facts” and “use null when unknown.”

Error and reliability criteria

The app has a timeout for the model call.
The app categorizes every outcome as exactly one of: ok, blocked, timeout, rate_limit, invalid_output, auth_error, unknown.
On invalid_output, the app either retries once with a stricter repair prompt or returns a clear error.
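The outcome taxonomy and the single repair retry can be sketched as follows. Everything provider-specific here is an assumption: `call_model` stands for whatever function wraps your model client and enforces its own timeout (raising `TimeoutError`), and the `ModelBlocked`, `ModelRateLimited`, and `ModelAuthError` exceptions are hypothetical placeholders for your client's real error types.

```python
import json
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(str, Enum):
    OK = "ok"
    BLOCKED = "blocked"
    TIMEOUT = "timeout"
    RATE_LIMIT = "rate_limit"
    INVALID_OUTPUT = "invalid_output"
    AUTH_ERROR = "auth_error"
    UNKNOWN = "unknown"


# Hypothetical exceptions; map your client's real errors onto these.
class ModelBlocked(Exception): ...
class ModelRateLimited(Exception): ...
class ModelAuthError(Exception): ...


REPAIR_SUFFIX = "\n\nReturn ONLY a JSON object that matches the schema. No prose."


def summarize_with_repair(
    call_model: Callable[[str], str],
    prompt: str,
    is_valid: Callable[[dict], bool],
) -> Tuple[Outcome, Optional[dict]]:
    """Call the model, retrying once with a stricter prompt on invalid output."""
    for attempt_prompt in (prompt, prompt + REPAIR_SUFFIX):
        try:
            raw = call_model(attempt_prompt)
        except TimeoutError:
            return Outcome.TIMEOUT, None
        except ModelBlocked:
            return Outcome.BLOCKED, None
        except ModelRateLimited:
            return Outcome.RATE_LIMIT, None
        except ModelAuthError:
            return Outcome.AUTH_ERROR, None
        except Exception:
            return Outcome.UNKNOWN, None
        try:
            data = json.loads(raw)
        except (json.JSONDecodeError, TypeError):
            continue  # not JSON: fall through to the repair attempt
        if isinstance(data, dict) and is_valid(data):
            return Outcome.OK, data
    return Outcome.INVALID_OUTPUT, None
```

Only invalid output triggers the retry. Timeouts and rate limits return immediately so the caller can decide whether and when to try again.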

UX criteria (CLI or web)

Success output is clearly displayed (pretty JSON or formatted bullets).
Failure output is user-friendly (no stack traces to the user).
Inputs are not stored by default (privacy-first baseline).
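For a CLI, the first two UX criteria might look like the sketch below. The function name `render` and the message wording are illustrative assumptions; the status strings match the failure categories listed above.

```python
from typing import Optional

# One plain-language message per failure category; no stack traces.
FAILURE_MESSAGES = {
    "blocked": "The model declined to summarize this text.",
    "timeout": "The request took too long. Please try again.",
    "rate_limit": "Too many requests right now. Please wait and retry.",
    "invalid_output": "The summary came back in an unexpected format. Please try again.",
    "auth_error": "The app is not configured correctly (check credentials).",
}
DEFAULT_MESSAGE = "Something went wrong. Please try again."


def render(status: str, summary: Optional[dict] = None) -> str:
    """Turn a status plus optional summary into text that is safe to show a user."""
    if status != "ok" or summary is None:
        return "Error: " + FAILURE_MESSAGES.get(status, DEFAULT_MESSAGE)
    lines = [summary.get("title") or "(untitled)", ""]
    lines += [f"- {bullet}" for bullet in summary.get("summary_bullets", [])]
    caveats = summary.get("caveats", [])
    if caveats:
        lines += ["", "Caveats:"] + [f"- {caveat}" for caveat in caveats]
    return "\n".join(lines)
```

Keeping the messages in one table makes it easy to review that every category has user-facing wording, and the detailed exception can still go to a log that the user never sees.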

Acceptance criteria drive prompting

Paste these criteria into your prompts. Require the model to explain how each criterion is satisfied before you accept code.

Edge cases you must define now

These are the “silent assumption” areas that cause bugs later if you don’t define them:

Empty input: error message and exit code / HTTP status?
Very long input: reject, truncate, or summarize first?
Non-article text: do you still summarize, or ask for clarification?
Language mismatch: summarize in same language or force English?
Profanity/sensitive content in quoted text: how do you avoid accidental safety blocks?

For v1, choose the simplest answer for each case (usually: reject with a clear message) and document the decision.
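One way to document these decisions is to write them down as a small configuration object, so each choice is visible in code review. This is a sketch with assumed defaults; `EdgeCasePolicy` and its field names are hypothetical, and the default values are examples rather than recommendations from this spec.

```python
from dataclasses import dataclass
from enum import Enum


class LongInput(Enum):
    REJECT = "reject"
    TRUNCATE = "truncate"


class OutputLanguage(Enum):
    SAME_AS_INPUT = "same_as_input"
    ENGLISH = "english"


@dataclass(frozen=True)
class EdgeCasePolicy:
    """Each field records one edge-case decision (defaults are assumptions)."""

    max_chars: int = 20_000
    long_input: LongInput = LongInput.REJECT
    output_language: OutputLanguage = OutputLanguage.SAME_AS_INPUT
    summarize_non_articles: bool = True
    empty_input_exit_code: int = 2


def apply_length_policy(text: str, policy: EdgeCasePolicy) -> str:
    """Apply the documented long-input decision to one input."""
    if len(text) <= policy.max_chars:
        return text
    if policy.long_input is LongInput.TRUNCATE:
        return text[: policy.max_chars]
    raise ValueError(f"input exceeds {policy.max_chars} characters")
```

With the defaults, an over-long input is rejected; switching to `LongInput.TRUNCATE` is then a one-line, reviewable change rather than a hidden behavior.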

How to prompt for this project (spec → plan → scaffold)

Use your Part III patterns:

plan first (6.1),
constraints (6.2),
define done (6.3),
examples as mini tests (6.4),
diff-only changes (7.5) once the repo exists.

A practical prompt sequence

Prompt A: ask for spec feedback and a plan (no code).
Prompt B: scaffold repo structure (stubs + README + schema file).
Prompt C: implement the walking skeleton (one end-to-end path).
Prompt D: add tests + validation + error taxonomy.

Do not request “the full app” in one prompt

That produces long, brittle output and makes review impossible. Keep changes incremental and verifiable.

Ship points (what “done” means per stage)

SP1: CLI/web app runs locally; returns a summary for a short input.
SP2: output validated against schema; invalid outputs handled.
SP3: failure categories implemented; timeouts/retries in place.
SP4: prompts versioned as files; logs include prompt version.
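SP4 can be sketched with the standard library alone. The file naming scheme (`summarize_<version>.txt`), the `$article_text` placeholder, and the function names are assumptions made for illustration.

```python
import logging
from pathlib import Path
from string import Template

logger = logging.getLogger("summarizer")


def load_prompt(prompts_dir: Path, version: str) -> Template:
    """Read one versioned prompt file, e.g. prompts/summarize_v1.txt."""
    path = prompts_dir / f"summarize_{version}.txt"
    return Template(path.read_text(encoding="utf-8"))


def build_prompt(prompts_dir: Path, version: str, article_text: str) -> str:
    """Fill the versioned template and log which version was used."""
    prompt = load_prompt(prompts_dir, version).substitute(article_text=article_text)
    # Log the version and size only; the article itself stays out of the logs.
    logger.info("prompt_version=%s prompt_chars=%d", version, len(prompt))
    return prompt
```

Because prompts live in files, a prompt change shows up as a normal diff, and the logged version lets you connect a bad output to the exact prompt that produced it. With `string.Template`, a literal dollar sign in the prompt file must be written as `$$`.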

Copy-paste spec templates

Template: authoritative spec block

SPEC (authoritative)

Goal:
Summarize an article into structured bullets.

Inputs:
- article_text: string (max N chars)
- title: optional string

Output (JSON schema):
- schema: summarize/v1.json

Constraints:
- Use only provided text (no invented facts)
- If unknown, use null (or an empty array) as the schema allows; never guess
- No new dependencies unless approved
- Handle blocked/timeout/rate_limit/invalid_output outcomes

Acceptance criteria:
- Output validates against schema
- summary_bullets length 5–10
- empty input returns validation error (no model call)

END SPEC

Template: mini tests (examples)

Mini tests:
1) Input: short article (2 paragraphs)
   Expected: valid JSON, 5–10 bullets
2) Input: empty string
   Expected: validation error, no model call
3) Input: very long text
   Expected: rejected with clear message (v1)
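The three mini tests above translate almost directly into executable checks. In this sketch, `summarize` is a stand-in for your app's entry point and `FakeModel` replaces the real client so the tests can count model calls; both names, the status strings, and the 20,000-character limit are assumptions.

```python
import json

MAX_CHARS = 20_000  # assumed limit


class FakeModel:
    """Stands in for the real client and records how often it is called."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return self.reply


def summarize(article_text: str, model) -> dict:
    """Hypothetical entry point: validate the input first, then call the model."""
    text = article_text.strip()
    if not text:
        return {"status": "validation_error", "message": "article_text is empty"}
    if len(text) > MAX_CHARS:
        return {"status": "validation_error", "message": "article_text is too long"}
    return {"status": "ok", "summary": json.loads(model(text))}


GOOD_REPLY = json.dumps({
    "title": None,
    "summary_bullets": ["b1", "b2", "b3", "b4", "b5"],
    "key_entities": [],
    "claims": [],
    "caveats": [],
})


def test_short_article_returns_bounded_bullets() -> None:
    model = FakeModel(GOOD_REPLY)
    result = summarize("First paragraph.\n\nSecond paragraph.", model)
    assert result["status"] == "ok"
    assert 5 <= len(result["summary"]["summary_bullets"]) <= 10


def test_empty_input_makes_no_model_call() -> None:
    model = FakeModel(GOOD_REPLY)
    result = summarize("   ", model)
    assert result["status"] == "validation_error"
    assert model.calls == 0


def test_very_long_input_is_rejected() -> None:
    model = FakeModel(GOOD_REPLY)
    result = summarize("x" * (MAX_CHARS + 1), model)
    assert result["status"] == "validation_error"
    assert "too long" in result["message"]
    assert model.calls == 0
```

The functions follow the pytest naming convention, so a test runner can collect them once `summarize` is replaced by an import of the real function.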
