Build Your Own Ralph Loop

May 02, 2026

A ralph-loop is not complicated. That is why I like it. You take a command-line agent, give it one durable prompt, run it again and again, and stop only when the agent returns a clean done signal. The persistence is the trick. Each pass is small, but the loop keeps carrying the work forward.

You should build your own. It is small enough to write directly, and it is also the kind of tool an agent can build from a short description. Point your agent at this post, ask it for a loop script that fits your stack, and then tune the prompt until it matches how your project works.

The pattern comes from the Ralph Wiggum technique, popularized by Geoffrey Huntley and usually summarized as a Bash loop for an agent. The generic version is documented at wiggum.dev, and Anthropic now ships a verified Ralph Loop plugin for Claude Code. The joke name stuck because the idea is stubborn in the useful way: same prompt, changed files, another try.

I have been using ralph-loops for a long time. Ralph is the name for an agent runner that keeps coming back until the work is done. The version in this post is small, generic, and portable. It is just a loop around claude -p.

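claude -p is Claude Code’s print mode. Instead of opening an interactive session, it accepts a prompt, runs the agent, and prints the result back to the caller: plain text by default, or a stream of structured JSON events when you ask for one with --output-format stream-json. That means a normal script can drive it.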

claude -p \
  --model opus \
  --effort medium \
  "Read REQUIREMENTS.md, make the smallest useful change, run tests, and report whether you are done."

That command is already enough to do one autonomous pass. A ralph-loop wraps it in repetition.

Your wrapper can stay small. It reads a prompt file, launches claude -p, watches the stream of JSON events, extracts a structured DONE or CONTINUE status, and starts another iteration when the agent says CONTINUE.

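prompt = File.read("PROMPT.md")  # the same durable prompt on every pass
loop do
  # run_iteration launches one claude -p pass and returns "DONE" or "CONTINUE"
  status = run_iteration(prompt)
  break if status == "DONE"
end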
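A slightly fuller sketch of the same loop, for illustration only. The names ralph_loop and runner are hypothetical, and the max_iterations cap is an added assumption rather than part of the pattern as described: it simply keeps a confused agent from looping forever. The runner is any callable that performs one pass and returns the status string.

```ruby
# Minimal ralph-loop skeleton. `runner` is a hypothetical callable that
# performs one agent pass and returns "DONE" or "CONTINUE".
def ralph_loop(prompt, runner, max_iterations: 50)
  1.upto(max_iterations) do |iteration|
    status = runner.call(prompt, iteration)
    return { status: "DONE", iterations: iteration } if status == "DONE"
    # Anything other than a clean DONE means: go around again.
  end
  # The cap was reached without a DONE signal.
  { status: "GAVE_UP", iterations: max_iterations }
end
```

Passing the runner in keeps the loop itself trivial to test: a lambda that returns "DONE" on the third call is enough to exercise it without touching the agent.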

The prompt does most of the work. It gives the agent an OODA-like rhythm: observe, orient, decide, act. OODA comes from military decision theory, but the useful part here is simple. Look at the current situation, place it in context, choose the smallest next move, do it, then record what changed.

In PROMPT.md, that becomes a development rhythm:

OBSERVE. Read the latest devlog entry, then read `REQUIREMENTS.md`.
ORIENT. Decide where the build currently stands.
DECIDE. Pick the smallest next action that moves the build forward.
ACT. Make that change and run the tests.
RECORD. Append a devlog entry for the next iteration.
EXIT. Return DONE or CONTINUE.

The requirements are the anchor. The agent does not need a special product manager persona, a team of named agents, or a hand-authored plan for every step. It needs a place to find the spec. A generic prompt can make REQUIREMENTS.md the source of truth. The agent can read it freely but must not edit it, and it writes only inside the working directory.

That ownership boundary matters. The loop becomes safer when PROMPT.md says what the agent owns and what it does not own.

- Work inside the project directory.
- REQUIREMENTS.md is the spec. Read it, but do not modify it.
- Do not modify the loop harness or prompt files unless explicitly asked.
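Rules in a prompt are requests, not enforcement. One way a harness might check them after the fact is to fingerprint the protected files before a pass and compare afterwards. This is a sketch under assumptions: the PROTECTED list and the helper names snapshot and violations are hypothetical, not part of any published loop.

```ruby
require "digest"

# Files the agent may read but must not change (illustrative list).
PROTECTED = ["REQUIREMENTS.md", "PROMPT.md"].freeze

# Map each protected path to a SHA-256 digest, or nil if the file is missing.
def snapshot(paths, dir: ".")
  paths.to_h do |path|
    full = File.join(dir, path)
    [path, File.file?(full) ? Digest::SHA256.file(full).hexdigest : nil]
  end
end

# Return the protected paths whose contents differ between two snapshots.
def violations(before, after)
  before.keys.reject { |path| before[path] == after[path] }
end
```

A harness could take a snapshot before each pass and stop, or restore the files from version control, whenever violations comes back non-empty.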

The devlog is the memory. Pick a file for it, for example devlog.txt, and tell the agent in PROMPT.md to append one entry at the end of every iteration. The next iteration reads the most recent entry first. The exact format is up to you, but it helps to include a short handoff field for the next pass. I usually call that line NEXT:.

## 2026-05-02T21:12:04Z - Iteration 17

OBSERVE: The previous NEXT line said the timer starts but does not pause.
ORIENT: The Pomodoro app can start a work session, but pause and resume are missing.
DECIDE: Add the smallest failing test for pausing the active timer.
ACT: Added the pause test and implemented the pause state.
EVIDENCE: The test suite passes, including test_pause_active_timer.
NEXT: Continue with resume behavior. Start by checking REQUIREMENTS.md.

That is enough context to keep moving without reading the whole history every time. PROMPT.md can tell the agent to read only the newest devlog entry by default, and to reach further back only when the newest entry points to older context. That keeps the loop from drowning itself in its own memory.
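If the harness itself wants the newest entry, for example to print the handoff between passes, a small parser is enough. This sketch assumes every entry begins with a line starting with "## ", as in the sample above, and that the handoff sits on a line starting with "NEXT:". The helper names latest_entry and next_handoff are hypothetical.

```ruby
# Return the newest devlog entry, assuming each entry begins with a line
# that starts with "## ". Returns nil when the log has no entries.
def latest_entry(devlog_text)
  lines = devlog_text.lines
  starts = []
  lines.each_with_index { |line, i| starts << i if line.start_with?("## ") }
  return nil if starts.empty?
  lines[starts.last..].join.strip
end

# Extract the handoff from an entry's "NEXT:" line, or nil if there is none.
def next_handoff(entry)
  line = entry.to_s.lines.find { |l| l.start_with?("NEXT:") }
  line && line.sub("NEXT:", "").strip
end
```

Calling next_handoff(latest_entry(File.read("devlog.txt"))) would give the single line the next pass is supposed to start from.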

This is the part that feels bigger than it looks. A ralph-loop gives an agent something like infinite context, not by stretching the model window forever, but by moving memory into the project. The requirements stay on disk. The tests stay on disk. The devlog carries the live handoff forward. Each individual agent pass can be small, but the work can keep accumulating for as long as the loop, files, and done condition remain intact.

The done signal is also part of the design. Your script can ask Claude for structured output matching a tiny JSON schema, so that a valid reply is either:

{
  "status": "CONTINUE"
}

or:

{
  "status": "DONE"
}
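The schema behind those two payloads might look like the following. The post shows only the instances, so the schema here is an assumption, and parse_status is a hypothetical helper. It is deliberately strict: only an exact {"status": "DONE"} ends the loop, and anything malformed or unexpected falls back to CONTINUE.

```ruby
require "json"

# Assumed schema: an object with a single required "status" field.
STATUS_SCHEMA = {
  "type" => "object",
  "properties" => {
    "status" => { "type" => "string", "enum" => %w[DONE CONTINUE] }
  },
  "required" => ["status"],
  "additionalProperties" => false
}.freeze

# Serialized form, suitable as the value handed to --json-schema.
SCHEMA = JSON.generate(STATUS_SCHEMA)

# Parse the agent's structured reply. Malformed or unexpected replies
# never count as DONE.
def parse_status(raw)
  JSON.parse(raw) == { "status" => "DONE" } ? "DONE" : "CONTINUE"
rescue JSON::ParserError, TypeError
  "CONTINUE"
end
```

Defaulting to CONTINUE is a design choice: a garbled reply costs one extra iteration, while a false DONE would end the run with work unfinished.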

The prompt should make DONE expensive. The agent may only return it after walking every requirement in REQUIREMENTS.md, mapping each test requirement to a concrete test, recording that mapping in the devlog, and passing the full suite. Otherwise it returns CONTINUE, even if the current iteration went well.

If you build your own loop, spend some time with the claude flags. The basic loop is simple, but the CLI has options that are especially useful when another program is driving it.

claude -p \
  --model opus \
  --effort medium \
  --verbose \
  --input-format stream-json \
  --output-format stream-json \
  --json-schema "$SCHEMA" \
  --tools "Read,Edit,Bash" \
  --dangerously-skip-permissions

Structured input and output let your script talk to Claude as a process instead of scraping prose. --json-schema gives the loop a machine-checkable stop condition. --tools can restrict what the agent is allowed to use. --dangerously-skip-permissions removes the permission prompts that would otherwise stall an unattended run, which is one more reason to keep the ownership rules tight. The CLAUDE_CONFIG_DIR environment variable can isolate config for a run. The JSON events coming back from Claude also contain useful operational details, so your script can extract things like token usage, cost, tool calls, elapsed time, and rate-limit events instead of treating the agent as an opaque text generator.
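As a rough sketch of what extracting those details could look like, the helper below tallies event types and tool calls and copies a few fields from the final result event. The field names it reads (total_cost_usd, duration_ms, num_turns, usage.output_tokens, and tool_use blocks inside assistant messages) are assumptions about the event shapes; check them against what your CLI version actually emits. summarize_run and parse_event are hypothetical names.

```ruby
require "json"

# Parse one stream line; returns a Hash, or nil for anything else.
def parse_event(line)
  event = JSON.parse(line)
  event.is_a?(Hash) ? event : nil
rescue JSON::ParserError
  nil
end

# Summarize one run from its stream-json lines.
def summarize_run(lines)
  summary = { events: Hash.new(0), tool_calls: 0 }
  lines.each do |line|
    event = parse_event(line)
    next if event.nil?
    summary[:events][event["type"]] += 1
    # Assumption: tool calls appear as tool_use blocks in message content.
    message = event["message"]
    blocks = message.is_a?(Hash) ? message["content"] : nil
    if blocks.is_a?(Array)
      summary[:tool_calls] += blocks.count { |b| b.is_a?(Hash) && b["type"] == "tool_use" }
    end
    next unless event["type"] == "result"
    # Assumption: the result event carries cost, timing, and usage fields.
    usage = event["usage"]
    summary[:cost_usd] = event["total_cost_usd"]
    summary[:duration_ms] = event["duration_ms"]
    summary[:turns] = event["num_turns"]
    summary[:output_tokens] = usage["output_tokens"] if usage.is_a?(Hash)
  end
  summary
end
```

Appending one such summary per iteration to a log file gives a cheap running record of what the loop is costing.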

A loop script does not have to shell out and wait for one blob of text. In Ruby, for example, it can start claude -p with IO.popen, write the prompt into stdin as a stream-json user message, then read stdout line by line as Claude emits assistant, tool, user, result, and rate-limit events. Print mode is still the interface, but the loop treats it like a structured process protocol.
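A minimal sketch of that process protocol follows. It rests on two assumptions worth verifying before use: that a stream-json user message has the shape built by user_message, and that the schema-constrained status arrives on the result event under a structured_output key. The command is passed in as an argument, so claude_command (which uses a subset of the flags shown above) can be swapped for a stand-in process during testing. All helper names are hypothetical.

```ruby
require "json"

# A subset of the flags shown earlier, as an argv array for IO.popen.
def claude_command(schema_json)
  ["claude", "-p", "--verbose",
   "--input-format", "stream-json",
   "--output-format", "stream-json",
   "--json-schema", schema_json]
end

# Assumed shape of a stream-json user message.
def user_message(prompt)
  JSON.generate(
    "type" => "user",
    "message" => { "role" => "user", "content" => [{ "type" => "text", "text" => prompt }] }
  )
end

# Parse one stream line; returns a Hash, or nil for anything else.
def parse_event(line)
  event = JSON.parse(line)
  event.is_a?(Hash) ? event : nil
rescue JSON::ParserError
  nil
end

# Run one pass: write the prompt to stdin, then read events line by line.
# Returns "DONE" only when the result event carries a DONE status.
def run_iteration(prompt, command)
  status = "CONTINUE"
  IO.popen(command, "r+") do |io|
    io.puts(user_message(prompt))
    io.close_write # signal end of input so the process can finish
    io.each_line do |line|
      event = parse_event(line)
      next unless event && event["type"] == "result"
      # Assumption: the structured reply lives under "structured_output".
      output = event["structured_output"]
      status = "DONE" if output.is_a?(Hash) && output["status"] == "DONE"
    end
  end
  status
end
```

With the real CLI, a pass would be run_iteration(File.read("PROMPT.md"), claude_command(schema_json)), where schema_json is the serialized status schema. Writing the whole prompt before reading is fine for a single short message; a longer conversation over stdin would need reads and writes interleaved.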

A ralph-loop is just a harness around persistence. The agent does not need to solve the whole project in one heroic pass. It needs to take one measured step, leave a useful note, and come back with the same instructions. Requirements tell it what true means. Tests tell it whether the step landed. The devlog tells the next iteration where to pick up.

That pattern is portable. Use Bash, Ruby, Python, or whatever scripting language is closest to hand. Point claude -p at a prompt that defines ownership, requirements, tests, memory, and a done ceremony. Then let the loop run.

I published a ralph-loop sample as a gist. Treat it as a reference from an active project, not a generic solution. The useful part is seeing how little machinery the loop needs.

Build your own Ralph. The important part is not the name. The important part is giving the agent a way to persist.

Written by Ikigai with Mike Greenly
Model: GPT-5