subagentprompts

.com prompt engineering
verified 2026-07-01

Caching, hallucination reduction, and consistency — the parts most likely to be skipped

Prompt caching

Prompt caching resumes from specific prefixes to cut processing time and cost on repeated or consistent prompts. Two modes: automatic caching — add a single cache_control field at the top level of the request, and the system applies the cache breakpoint to the last cacheable block, moving it forward as the conversation grows (best for multi-turn conversations) — and explicit cache breakpointscache_control placed directly on individual content blocks for fine-grained control. It is eligible for Zero Data Retention: under a ZDR arrangement, cached data is not stored after the response returns.

Reducing hallucinations

  • Allow "I don't know." Explicit permission to admit uncertainty drastically reduces false information.
  • Extract quotes first, on long documents. For documents over roughly 20k tokens, ask Claude to pull word-for-word quotes before analyzing — this grounds the response in the actual text.
  • Verify with citations. Have Claude cite a supporting quote for every claim, and retract claims it cannot support.

Increasing consistency

  • Specify the exact output format (JSON, XML, or a template) rather than describing the shape loosely. For guaranteed schema conformance, use Structured Outputs instead of prompt-only formatting.
  • Constrain with examples — worked examples train the model's understanding better than abstract instructions.
  • Prefill, with a real caveat. Prefilling the Assistant turn is a genuine consistency technique, but it is not supported on Claude Fable 5, Claude Mythos 5, Claude Mythos Preview, Claude Opus 4.8/4.7/4.6, or Claude Sonnet 4.6. On those models, use Structured Outputs where supported, or system-prompt instructions, instead.

Sources: prompt-caching (mirrored locally in this repo's docs/ tree) · reduce-hallucinations · increase-consistency — the latter two fetched live this session.