Carmine Paolino

Production Experience Cannot Be Hallucinated

Wed, 13 May 2026 00:00:00 +0000

I paid five dollars to read a Medium article about my own free, open source library. It was sold as hard-won production experience.

It was fabricated.

The first code sample used RubyLLM.client, which does not exist. It called client.chat(messages: ...), which does not exist. Then it invented RubyLLM::StreamInterrupted, RubyLLM::APIError, and a stream: proc API that RubyLLM has never had.

The problem was not merely wrong information. Wrong information can be corrected. This was sold as experience with RubyLLM in production, which is a much more valuable claim.

AI slop is not just filling the web with predictable cadence. It is fabricating experience. It is letting people skip the work, skip the scar tissue, and still write in the voice of someone who has been there.

In open source, that turns into a tax. Maintainers build the thing, write the docs, publish the source, keep the examples working, answer the issues, and then have to police hallucinated articles about their own projects before users start debugging ghosts.

The Four Magic Words in Tech

Production. Scale. Security. Reliability.

In the tech world, attach one of these words to a claim and it immediately becomes true. “This does not scale” can kill a project before anyone measures it. “This is not production ready” can sabotage it without a single deploy.

So when an article says “what broke in production”, it is not just offering advice. It is claiming experience, and experience cannot be hallucinated.

The first version of the article opened by saying the author had spent three weeks on the wrong side of the problem before getting something stable in production. That is a powerful claim. It tells the reader to relax and inherit the author’s scars.

There were no scars. The author had not even run the first example.

This is why fake experience is so dangerous. Bad code fails fast. Fake experience lingers. It gets quoted. It gets summarized. It gets used in meetings by people who do not know enough yet to see the hollow center.

The recipe is familiar. Streaming failures. Token budgets. Provider fallback. Turbo Streams. Redis circuit breakers. nginx buffering. Load testing. They sit near “LLM production” in the LLM training data. Arrange them with enough confidence and the result smells real.

Production experience is not a smell. It is a thing that happened, and none of these things happened.

What Actually Happened

Here is the short version.

Most articles about RubyLLM are good. Since it became popular, I have seen a few confident guides from people who clearly had not run the code. Usually they disappear into LinkedIn or search results. This one made the pattern impossible to ignore.

I called it out:

Author of RubyLLM here.

The very first example does not work.

The article is not merely wrong in a few places. It is fabricated.

…

The author replied:

You were right.

The code in the original article was not verified against the actual gem. RubyLLM.client, RubyLLM::StreamInterrupted, RubyLLM::APIError, stream: proc – none of it exists. You caught every fabrication accurately.

I’ve replaced the article entirely. The new version has been verified against your documentation and source. The fake “production experience” framing is gone. It’s now an honest documentation-based guide with a correction notice at the top explaining what happened.

“I’ve replaced the article entirely.”

It was a long article. The completely rewritten replacement appeared in a few minutes. The fake method names were replaced with real ones, but the posture stayed the same: “RubyLLM in production”, “what tutorials skip”, “streaming failures”, “provider fallback”, “token budgets.”

The method names got real. The experience didn’t.

The new version claimed Puma restarts produce neat RubyLLM streaming errors. They do not. If the worker dies, the Ruby process running the call is gone. It suggested deleting old persisted chat messages as context management. That is destroying conversation history. It described fallback by throwing away the chat and asking another provider the last prompt as a fresh question. That is not conversation fallback. It confused HTTP/SSE buffering with Turbo Streams over ActionCable.

Not battle scars. Guesses presented as authority.

I called the second version what it was: phony. The author then hid responses while keeping the article up.

I reported the article to Medium. I also contacted the publication that promoted it and sent them the fabricated APIs, the author’s admission, and the hidden corrections. To their credit, the editor replied quickly, apologized, and removed it from the publication. But only the author can take down the original Medium article, so the piece remained available without the maintainer corrections visible next to it.

Do Not Counterfeit Experience

Please do write about your favourite software. Critique it too. Tell us maintainers where the API is wrong, the docs are bad, the abstraction leaks. Preferably in an issue so we can actually see it. That feedback is gold.

But do not counterfeit experience. If you’re using The Four Magic Words in Tech, the bar is even higher.

And if you run a technical publication, please at least check the first example.

RubyLLM 1.15: Image Editing, Cost Tracking and Less Tool Boilerplate

Thu, 07 May 2026 00:00:00 +0000

I released RubyLLM 1.15 today.

It ships image editing, cost tracking, cleaner token accounting, inferred tool parameters, additive callbacks, and Rails fixes.

The theme is simple: stop making me write glue code. If the computer can infer it, RubyLLM should infer it. If a provider reports usage, RubyLLM should turn it into cost. If Rails already has a blob, RubyLLM should not download it and upload it again.

Image Editing

RubyLLM.paint could already generate images:

image = RubyLLM.paint("A watercolor robot holding a Ruby gem")

Now with: turns it into an image edit:

image = RubyLLM.paint(
  "Turn the logo green and keep the background transparent",
  model: "gpt-image-1",
  with: "logo.png"
)

Same method, same attachment shape.

The source can be a path, a URL, an IO-like object, or an Active Storage attachment. Multiple source images work too:

image = RubyLLM.paint(
  "Combine these references into a postcard illustration",
  model: "gpt-image-1",
  with: ["person.png", "style-reference.png"]
)

And if you need to constrain the edit, pass a mask:

image = RubyLLM.paint(
  "Replace only the background with a sunset sky",
  model: "gpt-image-1",
  with: "portrait.png",
  mask: "portrait-mask.png"
)

That’s it. paint paints. Sometimes from scratch, sometimes from an existing image.

Cost Tracking

RubyLLM has tracked tokens since 1.0. But “this used 18,432 tokens” is only half the answer. The next question is always: how much did that cost?

Calculating that was never hard. Take the input tokens, output tokens, cached tokens, maybe reasoning tokens. The pricing is already in RubyLLM’s model registry. Multiply by the per-million rate.
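A minimal sketch of that handrolled math in plain Ruby. The price table, the usage hash shape, and the handrolled_cost name are assumptions for illustration only. The prices are made up, and this is not RubyLLM's internal implementation.

```ruby
# Hypothetical per-million-token prices in USD. Not real pricing.
PRICES_PER_MILLION = { input: 2.50, output: 10.00, cache_read: 1.25 }.freeze

# usage is a hash like { input: 1200, output: 300 } (an assumed shape).
# Returns nil when any part of the usage has no known price,
# mirroring the "better no answer than a fake one" rule.
def handrolled_cost(usage, prices = PRICES_PER_MILLION)
  usage.sum do |kind, tokens|
    rate = prices[kind]
    return nil if rate.nil?

    tokens * rate / 1_000_000.0
  end
end

known = handrolled_cost({ input: 1_000_000, output: 500_000 })
unknown = handrolled_cost({ input: 10, reasoning: 5 })
puts known.inspect
puts unknown.inspect
```

Every app that wants a dollar figure ends up with some version of this, plus its own copy of the price table.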

But why should every app have to write that code?

RubyLLM already has the usage. RubyLLM already knows the model. RubyLLM already ships the model registry. So now it does the boring math for you.

Now you can ask:

response = chat.ask("Summarize Ruby's object model.")

response.cost.total
chat.cost.total
agent.cost.total

Same for images:

image = RubyLLM.paint("A small watercolor robot", model: "gpt-image-1")

image.tokens.input
image.tokens.output

image.cost.input
image.cost.output
image.cost.total

If RubyLLM does not have pricing for part of the usage, the cost is nil. Better no answer than a fake one.

A chat with ten messages can tell you the total. An agent can tell you the total. A generated image can tell you the total. No more handrolled sums.

Token Counts That Mean What They Say

Prompt caching made token counts messy.

Some providers include cache reads in prompt tokens. Some report cache creation separately. Some don’t. If you multiply the wrong number by the wrong price, your cost tracking is wrong before it starts.

So 1.15 separates the different kinds of tokens before exposing them:

response.tokens.input       # standard input tokens
response.tokens.output      # billable output tokens
response.tokens.cache_read  # prompt cache reads
response.tokens.cache_write # prompt cache writes

tokens.input now means normal input tokens. Cache reads and cache writes are separate. tokens.output always means billable output tokens.
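To see why the separation matters, here is a rough sketch of the kind of normalization involved. The raw hash keys, the cache_included_in_prompt flag, and the normalize_usage name are hypothetical. Real provider payloads differ, and this is not RubyLLM's code.

```ruby
# Turns a provider-style usage hash into separate token kinds.
# raw: { prompt:, completion:, cache_read:, cache_write: } (assumed shape).
# cache_included_in_prompt: true when the provider counts cache reads
# inside its prompt token total.
def normalize_usage(raw, cache_included_in_prompt:)
  cache_read = raw.fetch(:cache_read, 0)
  input = raw.fetch(:prompt)
  # Avoid billing cached tokens twice: once as input, once as cache reads.
  input -= cache_read if cache_included_in_prompt

  {
    input: input,
    output: raw.fetch(:completion),
    cache_read: cache_read,
    cache_write: raw.fetch(:cache_write, 0)
  }
end

raw = { prompt: 1200, completion: 300, cache_read: 1000 }
inclusive = normalize_usage(raw, cache_included_in_prompt: true)
separate = normalize_usage(raw, cache_included_in_prompt: false)
puts inclusive.inspect
puts separate.inspect
```

The same raw numbers produce 200 or 1200 standard input tokens depending on the provider's convention, which is exactly the kind of difference that breaks cost math.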

The old top-level helpers still work. New code should use response.tokens.*.

No new Rails migration is required if you already ran the 1.9 token migration. If you display token counts directly, read the 1.15 upgrade notes.

Less Tool Boilerplate

Tools in RubyLLM are Ruby classes. But for very simple tools, RubyLLM still made you repeat yourself:

class Weather < RubyLLM::Tool
  description "Gets current weather for a location"
  param :latitude  # why?
  param :longitude # DRY!

  def execute(latitude:, longitude:)
    # ...
  end
end

That is silly. The method signature already says there is a latitude and a longitude.

Now this works:

class Weather < RubyLLM::Tool
  desc "Gets current weather for a location"

  def execute(latitude:, longitude:, units: "metric")
    # ...
  end
end

Required keywords become required string parameters. Optional keywords become optional string parameters.
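Ruby exposes enough of the signature to make this kind of inference possible. The sketch below uses the core Method#parameters API on a plain class. WeatherSketch and infer_params are hypothetical names, and whether RubyLLM does it exactly this way is an assumption.

```ruby
# A plain class standing in for a tool. It does not inherit from RubyLLM::Tool.
class WeatherSketch
  def execute(latitude:, longitude:, units: "metric"); end
end

# Maps keyword arguments of #execute to string parameters:
# :keyreq (required keyword) becomes required, :key (optional keyword) becomes optional.
def infer_params(klass)
  klass.instance_method(:execute).parameters.filter_map do |kind, name|
    case kind
    when :keyreq then { name: name, type: :string, required: true }
    when :key    then { name: name, type: :string, required: false }
    end
  end
end

inferred = infer_params(WeatherSketch)
inferred.each { |param| puts param.inspect }
```

The signature carries names and required-ness but nothing about types or descriptions, which is why everything falls back to string.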

Ruby method signatures don’t tell us JSON Schema types or descriptions, so if those matter, keep using param:

param :units, type: :string, desc: "metric or imperial", required: false

And when you need nested objects, arrays, enums, or full schema control, use params. Nothing changed there.

Also:

desc is now an alias for description
param accepts description: as an alias for desc:
the tool generator now emits desc
we retain full backwards compatibility!

Callbacks That Stack

The old on_* callbacks were replace-style callbacks. Register another one and you replaced the previous one.

That caused an obvious problem: Rails persistence wants callbacks, and your app also wants callbacks. Logging wants callbacks. Analytics wants callbacks. Replacing the previous callback is the wrong default.

So 1.15 adds additive callbacks:

chat.before_message { ... }
chat.after_message { |message| ... }
chat.before_tool_call { |tool_call| ... }
chat.after_tool_result { |result| ... }

Rails persistence uses these internally now. Your app can layer its own callbacks on top without breaking persistence.
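The difference between the two styles is easy to see in a toy class. CallbackSketch and its on_message and deliver methods are hypothetical and exist only for this illustration. They are not RubyLLM's implementation.

```ruby
class CallbackSketch
  def initialize
    @on_message = nil      # replace-style slot: holds one block
    @after_message = []    # additive list: holds every block
  end

  # Replace-style: the last registration wins.
  def on_message(&block)
    @on_message = block
    self
  end

  # Additive: every registration is kept and run in order.
  def after_message(&block)
    @after_message << block
    self
  end

  def deliver(message)
    @on_message&.call(message)
    @after_message.each { |callback| callback.call(message) }
  end
end

calls = []
chat = CallbackSketch.new
chat.on_message { |m| calls << "persist(old) #{m}" }
chat.on_message { |m| calls << "log(old) #{m}" }      # silently replaces persistence
chat.after_message { |m| calls << "persist #{m}" }
chat.after_message { |m| calls << "log #{m}" }        # stacks on top
chat.deliver("hi")
puts calls.inspect
```

With the replace-style slot, the persistence block never runs. With the additive list, both do.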

The old on_* callbacks are deprecated. They’ll go away in RubyLLM 2.0.

Rails Fixes

Rails got a lot of boring, important fixes:

Action Text-backed message content is converted to plain text before being sent to the model.
ActiveRecord support no longer sits in the core gem eager-load path, fixing standalone require "ruby_llm" with Zeitwerk eager loading.
The acts_as API follows Rails association inference more closely.
Existing Active Storage blobs and attachments passed through with: are reused instead of downloaded and re-uploaded.

Providers and Models

Empty tool results are now handled consistently across Anthropic, Bedrock, and Gemini. When a tool returns nothing, RubyLLM sends a small placeholder instead of provider-invalid empty content.

Streaming and non-streaming token usage is normalized across OpenAI, OpenRouter, Bedrock, and Gemini before cost calculation.

The model registry has been refreshed too: cache read/write pricing, reasoning output pricing, GPT Image pricing, and new aliases including Claude Opus 4.7, DeepSeek V4, Gemini Embedding 2, Gemma 4, and GPT-5.5.

Use It

gem 'ruby_llm', '~> 1.15'

Then:

bundle update ruby_llm

Full release notes on GitHub.

kamal-backup: Scheduled Rails Backups for Kamal Apps

Tue, 05 May 2026 00:00:00 +0000

I released kamal-backup today.

I run Chat with Work on Kamal, and I needed backups. There are already Kamal accessories for database backups. None of them also back up Active Storage. None use restic, so encryption, deduplication, and repository checks are on you. None ship a CLI with restores and drills. None produce evidence you can hand a security reviewer.

So I built one.

A gem and a Docker image

kamal-backup is two pieces: a Ruby gem you add to your Rails app, and a Docker image you boot as a Kamal accessory. They point at a restic repository you bring yourself.

The gem is your CLI. Local commands run directly on your machine using restic. Production-side commands shell out through Kamal into the accessory. The same kamal-backup binary covers setup (init, validate), on-demand operations (backup, list, check), data movement (restore local, restore production), verification (drill local, drill production), and audit (evidence).

The Docker image (ghcr.io/crmne/kamal-backup) ships with restic, pg_dump, mariadb-dump/mysqldump, and sqlite3 baked in. The default container command is kamal-backup schedule, a loop that fires every backup_schedule_seconds and writes one database snapshot and one Active Storage file snapshot per run.

The restic repository is where the encrypted snapshots end up: S3-compatible object storage, a restic REST server, or a filesystem path. kamal-backup points at it. It doesn’t run it for you.

Why restic

I didn’t want to invent a backup format, and I didn’t want to bolt encryption and deduplication onto shell scripts. Restic does what I needed:

encrypted repositories by default;
a tag system, so the database dump and the Active Storage tree from the same run share a run: tag and pair up at restore time;
deduplication across runs, so a year of daily backups doesn’t grow linearly;
restic forget --prune for retention;
restic check for repository health;
S3-compatible storage, a restic REST server, or a local filesystem path, so you host the repository wherever fits.

It’s a single binary that drops cleanly into a Docker image, alongside the database client tools. Nothing extra to install on the Rails host. kamal-backup is the Rails- and Kamal-shaped layer on top, and restic does the cryptography, the storage, and the integrity checks.

Setting it up

Add the gem in development:

# Gemfile
group :development do
  gem "kamal-backup"
end

Run init. It creates config/kamal-backup.yml and prints an accessory block you paste into your Kamal deploy config:

bundle install
bundle exec kamal-backup init

config/kamal-backup.yml holds the backup settings:

accessory: backup
app_name: chatwithwork
database_adapter: postgres
database_url: postgres://chatwithwork@chatwithwork-db:5432/chatwithwork_production
backup_paths:
  - /data/storage
restic_repository: s3:https://s3.example.com/chatwithwork-backups
restic_init_if_missing: true
backup_schedule_seconds: 86400

Kamal mounts that file read-only into the accessory, so the accessory block in config/deploy.yml stays small. Only secrets live in env:

accessories:
  backup:
    image: ghcr.io/crmne/kamal-backup:latest
    host: chatwithwork.com
    files:
      - config/kamal-backup.yml:/app/config/kamal-backup.yml:ro
    env:
      secret:
        - PGPASSWORD
        - RESTIC_PASSWORD
        - AWS_ACCESS_KEY_ID
        - AWS_SECRET_ACCESS_KEY
    volumes:
      - "chatwithwork_storage:/data/storage:ro"
      - "chatwithwork_backup_state:/var/lib/kamal-backup"

Validate, boot, and watch the logs:

bundle exec kamal-backup validate
bin/kamal accessory boot backup
bin/kamal accessory logs backup

validate catches missing required settings before the accessory has to be running. Once it’s up, the container loops on kamal-backup schedule.

Then run the first backup and print evidence:

bundle exec kamal-backup backup
bundle exec kamal-backup list
bundle exec kamal-backup evidence

No cron glue. No separate backup host. No “remember to install restic on production.” The accessory image already has it.

Rails data, not just a database dump

A Rails app has two things worth backing up: the database, and file-backed Active Storage. kamal-backup handles both.

Postgres uses pg_dump. MySQL and MariaDB use mariadb-dump or mysqldump. SQLite uses sqlite3 .backup. File-backed Active Storage uses restic backup from mounted volumes.

Each run writes one database snapshot and one file snapshot, both tagged with app:, type:database or type:files, and the same run: timestamp. You pair them at restore time using that timestamp.
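Pairing by tag can be sketched in a few lines of plain Ruby. The snapshot hashes, ids, and the exact run: value format below are made up for illustration, and the helper names are hypothetical. This is not kamal-backup's code.

```ruby
# Assumed snapshot shape: an id plus a list of "key:value" tags.
snapshots = [
  { id: "a1", tags: ["app:chatwithwork", "type:database", "run:20260430T020000Z"] },
  { id: "a2", tags: ["app:chatwithwork", "type:files",    "run:20260430T020000Z"] },
  { id: "b1", tags: ["app:chatwithwork", "type:database", "run:20260501T020000Z"] },
  { id: "b2", tags: ["app:chatwithwork", "type:files",    "run:20260501T020000Z"] }
]

# Returns the value of the first tag with the given key, or nil.
def tag_value(snapshot, key)
  prefix = "#{key}:"
  tag = snapshot[:tags].find { |t| t.start_with?(prefix) }
  tag&.delete_prefix(prefix)
end

# Groups snapshots by run and indexes each group by type.
def pair_by_run(snapshots)
  snapshots
    .group_by { |s| tag_value(s, "run") }
    .transform_values { |group| group.to_h { |s| [tag_value(s, "type"), s[:id]] } }
end

pairs = pair_by_run(snapshots)
pairs.each { |run, ids| puts "#{run}: #{ids.inspect}" }
```

A restore then picks one run and gets a database snapshot and a file snapshot that were taken together.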

If your app stores Active Storage blobs directly in S3, there’s no mounted path for backup_paths to capture. kamal-backup still covers the database. The S3 side is on your bucket lifecycle and replication settings.

Restores are part of the product

The backup script is the easy part. The restore path is where most setups fail.

So kamal-backup ships with restore commands:

bundle exec kamal-backup restore local
bundle exec kamal-backup restore production

restore local pulls a production backup down to your laptop. Useful when you want to inspect real data, reproduce a production bug, or prove the backup actually comes back.

restore production prompts before it overwrites anything.

Restore drills

The command I care about most is drill.

bundle exec kamal-backup drill local \
  --check "bin/rails runner 'puts User.count'"

A drill means: restore, check, record the result.

Two modes:

drill local restores onto your machine and runs an optional check.
drill production restores into scratch production-side targets, never the live database.

That second one matters. For Postgres and MySQL, you give it a scratch database. For SQLite, a scratch file path. For Active Storage, a scratch restore directory. The drill uses production infrastructure, without pointing at live production.

That’s the difference between “the backup ran” and “we restored the latest production snapshot into a scratch target on April 30, ran this check, and it passed.”

Evidence for reviews

I went through a security review for Chat with Work this year. The questions were fair:

What’s being backed up?
Where does it go?
Is it encrypted?
When did the last backup run?
When did the last repository check run?
When was the last restore drill?
Can you prove all of that without leaking secrets?

kamal-backup evidence prints redacted JSON: current backup settings, latest snapshots, latest restic check, latest restore drill, retention settings, tool versions.

bundle exec kamal-backup evidence

Secrets are redacted. The output is meant to land in an internal ops record or a CASA packet. Not a screenshot of a green cron job. Actual evidence.
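For a sense of what redacted JSON means in practice, here is a minimal sketch. The key pattern, the redact helper, and the sample settings are all assumptions for illustration. The real command's output format and redaction rules may differ.

```ruby
require "json"

# Hypothetical rule: any key that looks secret gets its value masked.
SECRET_KEY_PATTERN = /password|secret|key|token/i

# Walks hashes and arrays, masking scalar values under secret-looking keys.
def redact(value, key = nil)
  case value
  when Hash  then value.to_h { |k, v| [k, redact(v, k)] }
  when Array then value.map { |v| redact(v, key) }
  else key.to_s.match?(SECRET_KEY_PATTERN) ? "[REDACTED]" : value
  end
end

settings = {
  "restic_repository" => "s3:https://s3.example.com/chatwithwork-backups",
  "backup_schedule_seconds" => 86_400,
  "env" => { "RESTIC_PASSWORD" => "hunter2", "AWS_SECRET_ACCESS_KEY" => "abc123" }
}

redacted = redact(settings)
puts JSON.pretty_generate(redacted)
```

The non-secret settings stay readable, so a reviewer can still see what is backed up and where it goes.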

Try it

# Gemfile
gem "kamal-backup"

Docs at kamal-backup.dev, source on GitHub.

Ruby Concurrency: What Actually Happens

Tue, 28 Apr 2026 00:00:00 +0000

Since I wrote about async Ruby and patched Solid Queue to support fibers, people keep asking the same questions. What happens when a fiber blocks? Don’t you still need threads? What about database transactions? What about Ractors?

This post answers all of it. From the ground up.

The four primitives

Ruby gives you four concurrency primitives: processes, threads, fibers, and Ractors. They nest. Every process has an implicit “main Ractor” where your code runs by default, so you never have to think about Ractors unless you explicitly create one. Without Ractors, the hierarchy is simply process – threads – fibers. With Ractors, it becomes:

graph TD
  P[Process] --> R1["Ractor 1 (GVL 1)"]
  P --> R2["Ractor 2 (GVL 2)"]
  R1 --> T1[Thread 1]
  R1 --> T2[Thread 2]
  R2 --> T3[Thread 3]
  T1 --> F1[Fiber A]
  T1 --> F2[Fiber B]
  T2 --> F3[Fiber C]
  T3 --> F4[Fiber D]
  T3 --> F5[Fiber E]
  style P fill:#4a90a4,color:#fff
  style R1 fill:#c084fc,color:#fff
  style R2 fill:#c084fc,color:#fff
  style T1 fill:#7fb069,color:#fff
  style T2 fill:#7fb069,color:#fff
  style T3 fill:#7fb069,color:#fff
  style F1 fill:#e8a87c,color:#fff
  style F2 fill:#e8a87c,color:#fff
  style F3 fill:#e8a87c,color:#fff
  style F4 fill:#e8a87c,color:#fff
  style F5 fill:#e8a87c,color:#fff

Think of your computer as an office building.

Processes are fully isolated: separate offices, each with its own locked door, furniture, and files. Each process has its own memory, its own Ruby VM, and its own GVL. When you run Puma with 3 workers, you get 3 processes. They can’t corrupt each other’s state because they don’t share memory. The OS schedules them independently. The cost: each one loads your entire application into memory.

Ractors sit between processes and threads: offices that share a mailroom but not their filing cabinets. Each Ractor has its own GVL, so threads in different Ractors can execute Ruby code truly in parallel, but they can only pass notes to each other – no shared mutable objects. You communicate via message passing, copying or moving data between them. Every Ruby process has a “main Ractor” where all your code runs by default. Creating additional Ractors is opt-in.

Threads live inside a process and share its memory: workers sharing the same office, accessing the same filing cabinets, coordinating to avoid collisions. In CRuby, they are native threads, with the GVL deciding which one can execute Ruby code at a time. You don’t control when Ruby switches between them. The GVL releases during I/O, so two threads can wait on two different network calls simultaneously, but they can’t crunch numbers at the same time.

Fibers live inside a thread and are cooperatively scheduled: multiple tasks juggled by one worker at their desk. When they’re waiting for something – a phone call, a fax, a response – they set it aside and pick up the next task. A fiber runs until it explicitly yields. When it hits I/O – a network call, a database query, reading a file – it yields to the reactor, and another fiber picks up. No OS thread context switch for the fiber itself, no preemption. One thread can run thousands of fibers.

Here’s what that means for cost:

	Process	Ractor	Thread	Fiber
Memory	full app copy	~thread + Ractor state	~8MB virtual stack reservation	~4KB initial virtual stack, grows as needed
Creation time	~ms	~80μs	~80μs	~3μs
Context switch	kernel	kernel (threads within)	~1.3μs (kernel)	~0.1μs (userspace)
Isolation	Full (own memory)	Share-nothing (messages)	Shared memory	Shared thread
Parallelism	Yes	Yes (own GVL)	No (shared GVL)	No
I/O concurrency	Yes	Yes	Yes	Yes
Rails compatible	Yes	No	Yes	Yes

Creation and switching benchmarks are from Samuel Williams’ fiber-vs-thread performance comparison. By those numbers, fibers are created more than 20x faster and switch more than 10x faster than threads. The memory row is about virtual address space reserved by the platform/runtime, not resident memory. The benchmark reports actual RSS, where the gap is much smaller than the virtual stack numbers suggest. But the shape is still real: each thread is a kernel object with scheduler state and a stack reservation, while each fiber is scheduled in userspace. Ractors give you parallelism too, but can’t run Rails. Everything is a tradeoff.

How scheduling works

This is where most of the confusion lives. Let me show you what actually happens.

Thread scheduling

CRuby threads are native threads, but the GVL decides which one can run Ruby code. Your code has no say. A thread can be paused mid-calculation, mid-assignment, mid-anything.

sequenceDiagram
  participant VM as CRuby / OS
  participant T1 as Thread 1
  participant T2 as Thread 2
  participant LLM as LLM API
  VM->>T1: Run
  T1->>LLM: Send request
  Note over T1: Blocks in I/O (parked)
  VM->>T2: Run
  T2->>LLM: Send request
  Note over T2: Blocks in I/O (parked)
  Note over VM: Both threads parked
  LLM-->>T1: Response ready
  LLM-->>T2: Response ready
  VM->>T1: Wake and run
  Note over T1: Processing response
  VM->>VM: Time slice expired
  VM->>T2: Preempt T1, run T2
  Note over T2: Processing response
  VM->>VM: Time slice expired
  VM->>T1: Resume T1
  Note over T1: Finish response
  VM->>T2: Resume T2
  Note over T2: Finish response

CRuby can switch runnable threads on a time slice, but a thread blocked in I/O is parked until the socket is ready. That part matters: threads do not spin uselessly while waiting for tokens. The switch happens when a thread is runnable – including in the middle of response processing, object allocation, assignment, or any other Ruby code.

For two threads doing I/O, this works fine. The overhead is noise. For 200 threads mostly waiting for LLM tokens, the problem is the one-operation-per-thread shape: 200 kernel threads, 200 stack reservations, 200 scheduler entries, and usually 200 copies of whatever per-thread application resources the worker holds.

This is also why a worker limit means different things in Solid Queue’s current thread mode and in the fiber mode from my patch. threads: 25 is both “run 25 jobs at once” and “create 25 kernel threads.” If all 25 jobs are streaming tokens, job 26 waits. fibers: 250 is mostly an admission limit for the reactor: run up to 250 jobs as fibers on the same thread, park the ones waiting on I/O, and resume them when ready. You still need limits because APIs, sockets, memory, and databases have limits. But the cap is no longer tied to one kernel thread per job.

Cooperative scheduling (fibers)

Fibers switch only when they choose to. In practice, the async gem makes this automatic: your code yields at I/O boundaries without you writing anything special.

sequenceDiagram
  participant R as Reactor
  participant F1 as Fiber 1
  participant F2 as Fiber 2
  participant LLM as LLM API
  R->>F1: Run
  F1->>LLM: Send request
  Note over F1: Yields (I/O wait)
  R->>F2: Run
  F2->>LLM: Send request
  Note over F2: Yields (I/O wait)
  Note over R: Both waiting, reactor sleeps
  LLM-->>F1: Response ready
  R->>F1: Resume immediately
  Note over F1: Processes response
  F1->>R: Done
  LLM-->>F2: Response ready
  R->>F2: Resume immediately
  Note over F2: Processes response
  F2->>R: Done

No OS thread context switch per fiber. No timer-based preemption between fibers. When a fiber yields, the reactor checks which fibers have I/O ready and resumes them. When nothing is ready, the reactor sleeps in the OS until something is. The kernel still does the I/O readiness work; Ruby just avoids one kernel thread per wait.
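Cooperative switching can be shown with nothing but core Ruby fibers. In this sketch the explicit Fiber.yield stands in for "waiting on I/O" and a simple loop stands in for the reactor. With the async gem you would not write the yield yourself, so treat this as an illustration of the switching order, not of how async is implemented.

```ruby
log = []

# Builds a fiber that "sends a request", parks itself, then "processes the response".
make_task = lambda do |name|
  Fiber.new do
    log << "#{name}: send request"
    Fiber.yield(:waiting)          # hand control back, as if waiting on I/O
    log << "#{name}: process response"
    :done
  end
end

fibers = [make_task.call("F1"), make_task.call("F2")]

# Toy reactor: resume each fiber in turn and drop the ones that finished.
until fibers.empty?
  fibers.reject! { |fiber| fiber.resume == :done }
end

puts log
```

Both requests go out before either response is processed, and every switch happens at a point the fiber chose.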

The GVL: why threads and fibers are more similar than you think

This is the part that makes thread-based Ruby less different from fiber-based Ruby than it first looks.

The GVL means only one thread can execute Ruby code at a time. Threads run in parallel only during I/O, when the GVL is released. So if your workload is I/O-bound – HTTP calls, database queries, LLM streaming – threads give you I/O concurrency, not parallelism.
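A quick way to see that I/O concurrency is to time two threads that both block. In this sketch sleep is a stand-in for a blocking network call, since sleeping also releases the GVL. The durations are arbitrary.

```ruby
WAIT_SECONDS = 0.2

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

started = monotonic_now

# Two threads, each "waiting on the network" for WAIT_SECONDS.
threads = 2.times.map do
  Thread.new { sleep(WAIT_SECONDS) }
end
threads.each(&:join)

elapsed = monotonic_now - started
puts format("two %.1fs waits took %.2fs in total", WAIT_SECONDS, elapsed)
```

The total lands near one wait, not two, because the waits overlap. Replace the sleep with pure Ruby number crunching and the overlap disappears.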

Fibers give you the same I/O concurrency. One fiber yields at I/O, another picks up. The difference: fibers do it without kernel thread overhead, without the memory cost of a thread stack, and without making job concurrency itself imply one worker thread or one database slot per job.

If threads only help with I/O anyway, why pay their overhead?

There is one case where threads win: CPU-bound work that releases the GVL. Some C extensions (image processing, cryptographic operations) release the GVL while doing heavy computation. Multiple threads can then run those C extensions in parallel. Fibers can’t do that. They share a thread.

For actual Ruby-level CPU parallelism, you need processes or Ractors. Processes are production-ready and Rails-compatible. Ractors are lighter than processes, but still experimental.

What happens when a fiber hits I/O

This is the happy path and the most common question.

require "net/http"

# Inside a fiber
response = Net::HTTP.get(URI("https://api.example.com/v1/completions"))

Here’s the full chain:

Net::HTTP opens a socket and sends the request
The socket isn’t readable yet (the server hasn’t responded)
Ruby calls rb_io_wait on the socket
The async gem’s Fiber.scheduler intercepts this call
The scheduler suspends the current fiber and registers the socket with the event loop
The reactor runs other fibers while this one sleeps
When the socket becomes readable, the reactor resumes this fiber
Net::HTTP reads the response as if nothing happened

Your code doesn’t change. No await, no callbacks, no promises. The same Net::HTTP.get call that works in a thread works in a fiber. The yield is invisible.
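To make the suspend, run others, resume part of that chain concrete, here is a hand-rolled sketch using only core Ruby: a pipe instead of a socket, IO.select as a stand-in for the event loop, and an explicit Fiber.yield where a real scheduler would suspend the fiber for you. It is an illustration of the idea, not the async gem's implementation.

```ruby
reader, writer = IO.pipe
log = []
waiting = {} # io => fiber parked on it

io_fiber = Fiber.new do
  log << "io fiber: reading"
  data = begin
    reader.read_nonblock(16)
  rescue IO::WaitReadable
    Fiber.yield(reader) # nothing to read yet: park and name the IO we need
    retry
  end
  log << "io fiber: got #{data}"
end

other_fiber = Fiber.new { log << "other fiber: runs while the first one waits" }

# "Reactor": run the first fiber until it parks, then register its IO.
parked_on = io_fiber.resume
waiting[parked_on] = io_fiber

other_fiber.resume      # another fiber gets the thread in the meantime
writer.write("pong")    # the "server" finally responds

# Sleep in the OS until a registered IO is readable, then resume its fiber.
ready, = IO.select(waiting.keys)
ready.each { |io| waiting.delete(io).resume }

puts log
```

A Fiber.scheduler does the same bookkeeping below your code, which is why the Net::HTTP call above needs no changes.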

Bob Nystrom called this the function color problem in 2015. In languages with async/await, every function is either sync or async. An async function can only be called with await, and await can only live inside another async function. The color spreads upward through your entire call stack.

Python:

# Python: the color spreads, and you need different libraries
async def get_user(id):
    async with aiohttp.ClientSession() as session:  # can't use requests
        response = await session.get(f"/users/{id}")  # must await
        return await response.json()                   # must await

async def handle_request():  # must be async because it calls get_user
    user = await get_user(1)  # must await

You can’t use requests in async Python without blocking the event loop. You need aiohttp, httpx in async mode, or a thread wrapper. You can’t use the blocking psycopg2 API as async I/O; you need asyncpg or Psycopg’s async API. The ecosystem splits: sync libraries and async libraries, doing the same thing differently.

JavaScript:

// JavaScript: same problem, less severe (Node has fewer library splits)
async function getUser(id) {
  const response = await fetch(`/users/${id}`);  // must await
  return await response.json();                   // must await
}

async function handleRequest() {  // must be async
  const user = await getUser(1);  // must await
}

Ruby:

# Ruby: no color
def get_user(id)
  response = Net::HTTP.get(URI("https://api.example.com/users/#{id}"))  # just a normal call
  JSON.parse(response)                            # just a normal call
end

def handle_request
  user = get_user(1)  # just a normal call
end

Same Net::HTTP. Same pg. Same call stack, as long as the library uses scheduler-aware Ruby I/O. The fiber scheduler intercepts I/O at the Ruby runtime level, below your code. Your methods don’t know and don’t care whether they’re running in a thread or a fiber.

What happens when a fiber does CPU-bound work

require "digest"

# Inside a fiber
100_000.times { Digest::SHA256.hexdigest("work") }

This blocks the reactor. No other fiber runs until it finishes. There’s no I/O boundary to yield at, so the fiber holds the thread.

sequenceDiagram
  participant R as Reactor
  participant F1 as Fiber 1 (CPU)
  participant F2 as Fiber 2 (I/O)
  R->>F1: Run
  Note over F1,F2: F1 doing CPU work...
  Note over F2: Waiting to run
  Note over F1,F2: F1 still computing...
  Note over F2: Still waiting
  F1->>R: Done
  R->>F2: Finally runs

This is not a bug. It’s the current tradeoff of cooperative scheduling. Fibers are designed for I/O-bound work; CPU-bound work belongs on a thread, where CRuby can preempt it.

With my fiber-mode patch for Solid Queue, this is a configuration choice:

workers:
  - queues: [ chat, turbo, notifications ]
    fibers: 50       # I/O-bound: use fibers
  - queues: [ cpu ]
    threads: 2        # CPU-bound: use threads

One backend, two modes, matching the concurrency model to the workload.

What happens when a fiber queries the database

The pg gem has supported Fiber.scheduler since v1.3.0. When a fiber executes a query, the pg gem sends it non-blockingly via PQsendQuery, then calls rb_io_wait on the PostgreSQL socket. The scheduler intercepts this, suspends the fiber, and lets others run while PostgreSQL processes the query.

# Inside a fiber
user = User.find(42)  # yields while waiting for PostgreSQL

The fiber yields. Other fibers run. When PostgreSQL responds, the reactor resumes the fiber. Your code doesn’t know the difference.

Pool size follows database work

A database connection is busy until its query finishes. While PostgreSQL works, Ruby can run something else – another thread, or another fiber on the reactor – but that connection stays checked out.

For an LLM job, most of the wall time is not database time. Read a row, call an API, stream tokens, write a status update. The database touches are short. The long waits are external HTTP. So 100 jobs in flight does not mean 100 jobs hitting PostgreSQL at the same instant.

The reactor never preempts a fiber – it only switches when a fiber yields at an I/O boundary:

sequenceDiagram
    participant R as Reactor
    participant F1 as Fiber A
    participant F2 as Fiber B
    participant Pool as DB Pool (1 conn)
    participant PG as PostgreSQL
    participant HTTP as HTTP API
    R->>F1: Run
    F1->>Pool: Check out
    F1->>PG: SELECT * FROM users
    Note over F1: Yields (waiting for PG)
    R->>F2: Run
    F2->>HTTP: GET /api/data
    Note over F2: Yields (waiting for HTTP)
    PG-->>R: F1's result ready
    R->>F1: Resume
    F1->>Pool: Return
    F1->>R: Done
    HTTP-->>R: F2's result ready
    R->>F2: Resume
    F2->>Pool: Check out
    F2->>PG: UPDATE messages SET ...
    Note over F2: Yields (waiting for PG)
    PG-->>R: F2's result ready
    R->>F2: Resume
    F2->>Pool: Return
    F2->>R: Done

Read this as a timeline. Fiber A uses the only connection for its query. While PostgreSQL works, Fiber B waits on HTTP. After Fiber A returns the connection, Fiber B can use it for its update. If both fibers tried to query at the same time, one would wait unless the pool had another connection.

Active Record follows the same checkout rules in both cases. The current Solid Queue difference is a guardrail: thread mode expects threads + 2 connections per process, so you don’t run 50 execution threads against a 5-connection pool. Fiber mode can use a smaller baseline because fibers: 100 means “allow 100 jobs to wait,” not “create 100 execution threads.” In my patch, I/O-heavy workers often start at 3 connections per process (1 execution + 2 worker overhead). If the jobs are DB-heavy, raise it.

What happens when a fiber starts a transaction

A transaction changes the timeline. The connection cannot be returned after each statement, because the transaction state lives on that connection.

When a fiber starts a transaction, it keeps its checked-out connection for the entire duration – from BEGIN to COMMIT or ROLLBACK. The connection is not released mid-transaction. Other fibers that need the database wait for the connection to be returned.

sequenceDiagram
    participant R as Reactor
    participant F1 as Fiber A
    participant F2 as Fiber B
    participant Pool as DB Pool (1 conn)
    participant PG as PostgreSQL
    R->>F1: Run
    F1->>Pool: Check out
    F1->>PG: BEGIN
    F1->>PG: UPDATE accounts SET ...
    Note over F1: Yields (waiting for PG)
    R->>F2: Run
    F2->>Pool: Check out
    Note over F2: Waits (connection held by F1)
    PG-->>F1: Result
    R->>F1: Resume
    F1->>PG: COMMIT
    F1->>Pool: Return
    F1->>R: Done
    Pool->>F2: Connection available
    F2->>PG: SELECT * FROM accounts
    Note over F2: Yields (waiting for PG)
    PG-->>F2: Result
    R->>F2: Resume
    F2->>Pool: Return
    F2->>R: Done

Under fiber isolation (config.active_support.isolation_level = :fiber), Active Support’s execution state is fiber-scoped, so Active Record’s lease is associated with the current fiber instead of the surrounding thread. The connection still gets a real Monitor lock. No other fiber can touch it during a transaction.

Safe. No interleaving. Fiber B just waits.
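The checkout rule can be modeled without Active Record. This is a toy sketch, not how the pool is implemented: a `Queue` holding one token stands in for the one-connection pool, threads stand in for fibers, and `run_transaction_timeline` is a hypothetical name. A holds the token from BEGIN to COMMIT, so B can only proceed afterwards.

```ruby
# Toy model: a one-connection "pool" built on Queue.
# Threads stand in for fibers here; the checkout rule is the point.
def run_transaction_timeline
  pool = Queue.new
  pool << :conn
  log = Queue.new
  transaction_open = Queue.new

  a = Thread.new do
    conn = pool.pop            # check out
    log << "A: BEGIN"
    transaction_open << true   # let B start competing for the connection
    sleep 0.05                 # stands in for waiting on PostgreSQL
    log << "A: COMMIT"
    pool << conn               # returned only after COMMIT
  end

  b = Thread.new do
    transaction_open.pop       # start once A's transaction is open
    conn = pool.pop            # blocks until A returns the connection
    log << "B: SELECT"
    pool << conn
  end

  [a, b].each(&:join)
  Array.new(log.size) { log.pop }
end

p run_transaction_timeline # ["A: BEGIN", "A: COMMIT", "B: SELECT"]
```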

For the target workload – LLM streaming, HTTP calls – database touches are short reads and status updates. Transactions are brief. The wait is negligible. If your jobs run long transactions, those jobs belong on a thread-based worker.

What happens when you have too many fibers

Fibers aren’t free. Each one uses memory (~4KB), and each one might hold open connections to external services. If you spawn 10,000 fibers that all hit the same API, you’re opening 10,000 connections to that API. The API will not be happy.

Async doesn’t eliminate resource limits; it changes where they show up. With threads, the limit is explicit: 25 threads, 25 concurrent jobs. With fibers, the limit is implicit: you keep going until something else breaks.

The fix is a semaphore. The FiberPool in my Solid Queue patch uses one:

semaphore = Async::Semaphore.new(size)

# Only `size` fibers run concurrently
semaphore.async do
  perform_job
end

When you configure fibers: 100 with the patch, that’s not “unlimited fibers.” It’s a semaphore capping concurrency at 100. You control the ceiling.
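To see what the cap does without installing the async gem, here is a stdlib stand-in. It is a sketch under assumptions: a `SizedQueue` acts as the counting semaphore, threads replace fibers, and `run_capped` is a hypothetical helper that reports the highest number of jobs that were ever in flight together.

```ruby
# Stdlib stand-in for a counting semaphore: a SizedQueue with `limit` slots.
# Threads replace fibers so the sketch runs without the async gem.
def run_capped(jobs:, limit:)
  slots = SizedQueue.new(limit)
  lock = Mutex.new
  in_flight = 0
  peak = 0

  workers = Array.new(jobs) do
    Thread.new do
      slots.push(:token) # blocks while `limit` jobs already hold a slot
      begin
        lock.synchronize do
          in_flight += 1
          peak = [peak, in_flight].max
        end
        sleep 0.01 # stands in for the job's I/O wait
      ensure
        lock.synchronize { in_flight -= 1 }
        slots.pop # free the slot for the next job
      end
    end
  end

  workers.each(&:join)
  peak
end

puts run_capped(jobs: 20, limit: 5) # never more than 5
```

All 20 jobs finish, but no more than 5 run at once. The queueing happens at the semaphore instead of at the downstream API.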

“Why not just configure more Solid Queue threads?”

In plain Ruby, more threads can be reasonable. In Solid Queue thread mode, threads: 200 means more than “allow 200 jobs to wait on I/O.”

Kernel threads are the expensive unit. Fibers don’t make I/O complete faster; they let you wait on far more of it at once for a fraction of the cost. Samuel Williams’ benchmarks show fibers allocate 20x faster (~3μs vs ~80μs) and switch 10x faster (~0.1μs vs ~1.3μs) than threads. The OS can manage thousands of threads, but scheduler state, stack reservations, wakeups, and GVL coordination make that a poor default concurrency knob.

Solid Queue currently enforces a database-pool guard. Today it expects threads + 2 database connections per process, so a worker with threads: 200 won’t boot unless each process has a pool of at least 202, which is 404 connections across 2 processes. That guard may be conservative for I/O-heavy jobs; there’s an open issue about making it advisory or bypassable. But it is still a guard you hit today.

A blocked job still occupies its worker thread. The OS can park an LLM streaming thread until the socket is ready, but in Solid Queue thread mode it still consumes one of the configured thread workers. If all 25 are streaming tokens, job 26 waits.

Fibers make the Solid Queue limit mean “how many jobs may wait at once” instead of “how many kernel threads should exist.” They still need limits, but the limit is no longer one kernel thread per waiting job.

“Why not Ractors?”

Ractors solve a different problem. Fibers give you I/O concurrency – many things waiting at once. Ractors give you CPU parallelism – many things computing at once.

Here’s what they look like:

# Two Ractors computing fibonacci in parallel
r1 = Ractor.new { fibonacci(38) }
r2 = Ractor.new { fibonacci(38) }

r1.value  # Ruby 4.0+
r2.value  # Both ran in parallel, each with their own GVL

Each Ractor has its own GVL, so they can execute Ruby code truly in parallel across CPU cores. The tradeoff: strict isolation. You can only share immutable (frozen) objects. Everything else gets copied or moved between Ractors via message passing. Access a mutable variable from an outer scope? Ractor::IsolationError.

When Ractors win, they win big. Fibonacci(38) five times: 0.68s with Ractors vs 2.26s sequential. 3.3x speedup. Real parallelism.

But they are not a practical answer for Rails jobs yet:

Still experimental in Ruby 4.0. Creating a Ractor still emits the experimental API warning.
Many gems don’t work without changes. Gems that rely on mutable constants, global variables, class variables, or shared process state can hit Ractor::IsolationError.
No Rails integration. ActiveRecord, ActionCable, the router, the logger – Rails is built on shared mutable state. None of it runs inside a Ractor.
No Ractor-based job queue exists.
Still active bug surface. The Ruby bug tracker still has Ractor-related issues, including recent crash reports.

For I/O concurrency, Ractors don’t help at all. Each Ractor still has threads constrained by its own GVL. Fibers within those threads still do the actual I/O multiplexing. Ractors add CPU parallelism, which is not what LLM streaming needs.

For Rails jobs that need CPU parallelism today, processes are still the boring answer. Puma already uses that model for web workers. Ractors may become useful for isolated CPU-heavy Ruby work, but they are not the answer to this Solid Queue I/O problem.

“Isn’t this just what JavaScript does?”

No. I showed the code comparison above. JavaScript’s async/await is a colored concurrency model: the async keyword spreads upward through every caller. Ruby’s fibers are colorless: your existing code works unchanged, and the scheduler handles yields below your code.

There’s a deeper difference too. JavaScript async/await runs on an event loop. Ruby fibers run on top of a multi-threaded runtime. You can have multiple Ruby threads, each running its own reactor with its own fibers, and mix fibers and threads in the same application. Node can run JavaScript in parallel with worker_threads, but that’s a worker/isolate model, not the same thing as putting multiple reactors inside ordinary application threads.

“Isn’t this just what Go does?”

Closer. Goroutines are lightweight, runtime-scheduled, and multiplexed across OS threads. Conceptually similar to Ruby fibers, but Go’s scheduler can also preempt goroutines.

Two differences:

Go has true parallelism. Goroutines run across multiple OS threads with no GVL equivalent. CPU-bound goroutines run in parallel. Ruby fibers don’t.
Ruby has existing code. If you have a Rails application with hundreds of thousands of lines of Ruby, you can add fiber-based concurrency without rewriting anything. Your models, your controllers, your views, your gems – they all work. With Go, you’re rewriting.

If you’re starting from scratch and need both I/O concurrency and CPU parallelism, Go is a strong choice. If you have a Ruby application and need I/O concurrency, fibers give you that without a rewrite.

“Fibers need `Async do` blocks. That’s still new syntax.”

Someone on Hacker News called this out: I said “no async/await” but the examples show Async do and .wait.

Here’s the actual change:

# Before
chat = RubyLLM.chat
response = chat.ask("Hello")

# After
Async do
  chat = RubyLLM.chat
  response = chat.ask("Hello")
end

Two lines of wrapping. Your application code inside doesn’t change. Your models don’t change. Your gems don’t change. Nothing gets a new keyword.

In Python, adopting async means rewriting every function signature in the call chain to async def, adding await to every call, and replacing or wrapping blocking libraries. requests becomes aiohttp or async httpx. Blocking database APIs become async database APIs. Your test framework changes. Your middleware changes. It’s a rewrite.

Two lines of wrapping vs. rewriting your stack. That’s not even the same conversation.

When to use what

flowchart TD
    A[What kind of work?] --> B{CPU-bound?}
    B -->|Yes| C{Need parallelism?}
    C -->|Yes| D{Rails?}
    D -->|Yes| E[Processes]
    D -->|No| H[Ractors]
    C -->|No| F[Threads]
    B -->|No| I[Fibers]
    style E fill:#4a90a4,color:#fff
    style H fill:#c084fc,color:#fff
    style F fill:#7fb069,color:#fff
    style I fill:#e8a87c,color:#fff

I/O-bound work (LLM streaming, HTTP calls, webhooks, email delivery): fibers. Low overhead, high concurrency, database connections sized to database work rather than waiting jobs.
CPU-bound work (image processing, data crunching, PDF generation): threads. CRuby can preempt them, and C extensions can release the GVL for parallelism.
CPU parallelism with Rails: processes. Each one gets its own GVL, its own memory, its own everything. Puma already does this.
CPU parallelism without Rails: Ractors (when they graduate from experimental). Lighter than processes, true parallelism, but strict isolation means most gems don’t work.
All of them at once: that’s what a well-configured Rails app does. Puma forks processes. Each process runs threads. Fibers run inside those threads for I/O-heavy jobs. They coexist.

# Solid Queue with the fiber-mode patch: all three working together
workers:
  - queues: [ chat, turbo ]
    fibers: 50        # I/O-bound: fibers
    processes: 2       # parallelism: processes
  - queues: [ pdf, images ]
    threads: 4         # CPU-bound: threads
    processes: 1

No single model is universally better. The right answer is matching the model to the workload.
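The flowchart above can also be read as a small decision function. This is only a restatement of the diagram; `concurrency_model` and its keyword names are made up for illustration, and the three inputs are judgment calls about your workload.

```ruby
# The flowchart as a function: I/O-bound work goes to fibers, CPU-bound work
# to threads, and CPU work that needs parallelism to processes or Ractors.
def concurrency_model(cpu_bound:, needs_parallelism: false, rails: true)
  return :fibers unless cpu_bound
  return :threads unless needs_parallelism

  rails ? :processes : :ractors
end

p concurrency_model(cpu_bound: false)                                        # :fibers
p concurrency_model(cpu_bound: true)                                         # :threads
p concurrency_model(cpu_bound: true, needs_parallelism: true)                # :processes
p concurrency_model(cpu_bound: true, needs_parallelism: true, rails: false)  # :ractors
```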

This covers every “what happens when” question I’ve gotten so far. If I missed yours, find me on Twitter; I’ll either update this post or write a follow-up.

Making the Rails Default Job Queue Fiber-Based

Tue, 21 Apr 2026 00:00:00 +0000

Last year I moved the LLM streaming jobs in Chat with Work to Async::Job. It was fast. Genuinely fast. Fiber-based execution with Redis, thousands of concurrent jobs on a single thread. I was so convinced that I wrote a whole post about why async Ruby is the future for AI apps and recommended it to everyone.

Then I started hitting walls.

Async::Job doesn’t persist jobs. They go into Redis and they’re gone. Mission Control shows nothing. Background jobs in Rails are already quieter than the rest of your application – they fail without anyone noticing unless you go looking. Even with Honeybadger catching exceptions, I still want to see the full picture: which jobs are queued, which are running, which failed, what the system looks like right now. Without job persistence, you don’t get that.

Solid Queue is the default in Rails 8. Every new Rails app ships with it. When someone picks up Rails to build an LLM application and their 25-thread worker pool can only handle 25 concurrent streaming conversations, the answer shouldn’t be “swap your entire job backend.” It should be “change one line of config.”

So I opened a PR.

Threads vs fibers, quickly

If you already know this, skip ahead to the config.

Solid Queue runs each job on its own thread. Those threads can all query the database concurrently, so the worker has to be configured for that worst case, plus stack memory and kernel thread overhead. For a job that crunches data for 30 seconds, that’s fine – the thread is busy. For a job that streams an LLM response for 30 seconds but spends 99% of that time waiting for tokens, the thread is just sitting there holding resources.

Fibers sidestep much of this. Cooperatively scheduled, running in userspace on a single thread. When a fiber hits I/O – a network call, a database query, waiting for the next token – it steps aside and another fiber picks up. One thread, hundreds of concurrent jobs. No kernel thread overhead per job, and database pool sizing follows actual database concurrency rather than the number of jobs waiting on I/O. Rails 7.2+ helps ordinary Active Record code release connections after query operations, but that behavior is not fiber-specific. The async gem handles the yielding for you: your code yields at I/O boundaries without you changing anything.

For the full deep dive – processes, threads, fibers, the GVL, I/O multiplexing – see Async Ruby is the Future.

The switch

While the PR is under review, you can point your Gemfile at the branch:

# Gemfile
gem "solid_queue", git: "https://github.com/crmne/solid_queue.git", branch: "async-worker-execution-mode"

Then one config change:

# config/solid_queue.yml
production:
  workers:
    - queues: ["*"]
      # threads: 10
      fibers: 100  # <- that's it
      processes: 2

Your jobs don’t change. Your queue doesn’t change. The worker runs them as fibers instead of threads.

Set either threads or fibers on each worker, not both. One more thing in your Rails app:

# config/application.rb
config.active_support.isolation_level = :fiber  # required for fibers

Fibers share a thread, so they need fiber-scoped state instead of the default thread-scoped state. The patch checks this at boot and tells you if it’s wrong.
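Plain Ruby shows the distinction. This sketch does not reproduce how Active Support stores its state; it only illustrates the two scopes with core APIs: `thread_variable_set` is shared by every fiber on a thread, while `Thread.current[]` is local to the current fiber. `state_seen_by_new_fiber` is a hypothetical helper.

```ruby
# Set the same key in both scopes, then read it back from a new fiber
# running on the same thread.
def state_seen_by_new_fiber
  Thread.current.thread_variable_set(:current_user, "alice") # thread-scoped
  Thread.current[:current_user] = "alice"                    # fiber-scoped

  Fiber.new do
    {
      thread_scoped: Thread.current.thread_variable_get(:current_user),
      fiber_scoped: Thread.current[:current_user]
    }
  end.resume
end

result = state_seen_by_new_fiber
p result[:thread_scoped] # "alice": leaks into every fiber on the thread
p result[:fiber_scoped]  # nil: each fiber starts with its own state
```

With a hundred job fibers on one thread, thread-scoped state would be one shared slot for all of them, which is why the worker needs the fiber-scoped setting.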

Under the hood

The core of the patch is FiberPool. One thread, one async reactor, a semaphore capping concurrency at whatever number you set:

def start_reactor
  create_thread do
    Async do |task|
      semaphore = Async::Semaphore.new(size, parent: task)
      boot_queue << :ready

      wait_for_executions(semaphore)
      wait_for_inflight_executions
    end
  end
end

When the worker picks up jobs, it hands them to the pool. Each one becomes a fiber:

def schedule_pending_executions(semaphore)
  while execution = next_pending_execution
    semaphore.async(execution) do |_execution_task, scheduled_execution|
      perform_execution(scheduled_execution)
    end
  end
end

Each job runs as a fiber. When it hits I/O, it yields. The reactor picks up another fiber. One thread, hundreds of jobs, switching at I/O boundaries instead of depending on thread preemption.

CPU-bound work gets nothing from fibers. They don’t parallelize computation. But most of what job queues do is wait on I/O, and that’s exactly where fibers win. If a CPU-bound fiber blocks the reactor, Solid Queue’s supervisor still runs fine in its own process.

The database connection math

I wrote about this last year:

For 1000 concurrent conversations using traditional job queues like SolidQueue or Sidekiq, you’d need 1000 worker slots. That means 1000 kernel threads across your worker fleet, plus enough database pool capacity for whatever fraction of those jobs can hit the database at the same time. Even when the jobs are 99% idle waiting for streaming tokens, the thread resources are still reserved.

That framing is about worker resources, not a special Active Record rule. Active Record 7.2 connection handling is not different for threads and fibers; the important part in the patch is Solid Queue’s worker-pool sizing and the amount of simultaneous database work. Here’s the actual math from the patch.

A Solid Queue worker needs database connections for three things: polling for jobs, heartbeats, and running jobs. In thread mode, the configured concurrency is threads, and Solid Queue’s current guard treats each execution thread as potentially needing its own connection, plus two for the worker itself. That’s threads + 2. Actual Active Record usage may be lower for jobs that only touch the database in short bursts, but the configured pool still has to satisfy the guard.

With fibers, all job fibers run on one reactor thread, and the patch sizes the execution side for expected database concurrency instead of job concurrency. For the common LLM job shape – long waits, short database bursts – the minimum is often 1 + 2 = 3: one execution connection, plus two for the worker itself. If your jobs are DB-heavy, use long transactions, or pin connections with APIs like ActiveRecord::Base.connection, increase the pool and fibers will check out separate connections concurrently, just like threads.

Same job concurrency, very different configured pool requirements:

Concurrent jobs	Thread-mode DB pool guard (per process)	Fiber-mode baseline (per process)
10	12	3
25	27	3
50	52	3
100	102	3
200	202	3

The thread-mode guard scales linearly. The fiber-mode baseline stays flat for I/O-heavy jobs. Multiply by the number of worker processes and the gap gets dramatic: 6 processes with 50 concurrent jobs means 312 configured connections for thread mode, 18 for fiber. PostgreSQL’s default max_connections is 100.
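The arithmetic behind the table, as a sketch. The method names are illustrative rather than taken from the patch, and `expected_db_concurrency` is a number you choose for your own jobs; 1 matches the I/O-heavy baseline described above.

```ruby
WORKER_OVERHEAD = 2 # the two connections for the worker itself

# Thread mode: the guard assumes every execution thread may need a connection.
def thread_mode_pool(threads)
  threads + WORKER_OVERHEAD
end

# Fiber mode: sized for expected simultaneous database work, not job count.
def fiber_mode_pool(expected_db_concurrency = 1)
  expected_db_concurrency + WORKER_OVERHEAD
end

processes = 6
puts thread_mode_pool(50) * processes # 312
puts fiber_mode_pool * processes      # 18
```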

The patch detects your Rails version and calculates the right pool size automatically.

The benchmarks below use two pool policies. The primary Solid Queue comparison deliberately gives both modes the same pool, DB_POOL = concurrency + 5 per worker process, so it measures the executor instead of measuring pool starvation. The stress suite uses mode-specific pools to show the operational failure envelope under higher connection demand.

The benchmarks

I reran the benchmark suite on April 28, 2026. The headline Solid Queue comparison covers four workloads across per-process concurrency 5, 10, 25, 50, and 100; process counts 1, 2, and 6; and both execution modes. Three runs per cell, median real run reported, with total concurrency capped at 60 so the main comparison stays about executor behavior.
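A sketch of how that grid works out, under two assumptions that are mine rather than the suite’s documented definitions: total concurrency is per-process concurrency times process count, and a paired delta is the fiber throughput relative to the thread throughput of the same cell. The helper names are hypothetical.

```ruby
CONCURRENCIES = [5, 10, 25, 50, 100].freeze
PROCESS_COUNTS = [1, 2, 6].freeze
TOTAL_CAP = 60

# Cells that survive the cap (assumption: total = concurrency * processes).
def benchmark_cells
  CONCURRENCIES.product(PROCESS_COUNTS).select { |c, p| c * p <= TOTAL_CAP }
end

# Median of the runs for one cell.
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Fiber vs thread throughput for the same cell, in percent.
def paired_delta(thread_jps, fiber_jps)
  ((fiber_jps - thread_jps) / thread_jps.to_f * 100).round(1)
end

p benchmark_cells.size # 9 cells per workload and mode
```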

The workloads:

Sleep: 50ms Kernel.sleep. Pure cooperative wait. The I/O upper bound.
Async HTTP: HTTP request to a local server with 50ms delay via Async::HTTP. Real fiber-friendly I/O.
CPU: 50,000 SHA256 iterations. Pure computation. The control.
RubyLLM Stream: Actual RubyLLM chat completion through a fake OpenAI SSE endpoint, with token-by-token Turbo Stream broadcasts. 40 tokens at 20ms each. The closest thing to a production AI workload you can benchmark repeatably.

Results

Workload	Best throughput	Avg paired delta	Best paired delta
RubyLLM Stream	fiber, 7.01 j/s	+11.9%	+21.8%
Async HTTP	fiber, 492.82 j/s	+9.5%	+25.5%
Sleep	fiber, 500.50 j/s	+7.4%	+15.9%
CPU	fiber, 110.02 j/s	+0.6%	+2.4%

RubyLLM Stream is the workload that matters. It runs an actual RubyLLM chat completion with streaming, database writes, and Turbo broadcasts per token – the same thing Chat with Work does in production. Fiber wins every single paired experiment there: 9 out of 9.

The CPU row is the control. Fibers don’t help computation, and the average confirms it: essentially flat. That’s how you know the I/O gains are real and not measurement noise.

That table shows the best observed point and the paired-cell deltas. Here’s the full spread. Some configurations favor threads for synthetic workloads, but the paired averages are the steadier signal: fiber wins the I/O workloads, and RubyLLM Stream always favors fiber.

The newer suite also adds database-shaped workloads. With matched pools, short DB bursts still favor fiber: db_queries averages +12.6%, and a read/API/write mix averages +6.9%. The transaction case is the useful caveat: when each job pins a connection for the whole transaction, fiber still averages +3.5%, but the win is less consistent. That’s exactly the workload where you should be more careful with pool sizing.

Thread mode hit the wall

Those benchmarks cap total concurrency at 60. I wanted to see what breaks when you push past that, so I ran a stress suite: per-process concurrency 25, 50, 100, 150, and 200; process counts 2 and 6; three runs per cell. Read this as a current Solid Queue failure-envelope test, not a universal law about threads and fibers.

The result is stark. Thread mode only completed the smallest cell for each workload. Fiber mode completed every planned cell.

Workload	Thread cells completed	Fiber cells completed
Sleep	1/10	10/10
Async HTTP	1/10	10/10
RubyLLM Stream	1/10	10/10

PostgreSQL’s default max_connections is 100. In this stress run, thread mode at concurrency 50 with 2 processes asked for 110 worker-pool connections. With 6 processes, even concurrency 25 asked for 180. The one surviving thread cell was the smallest: concurrency 25, 2 processes.

Fiber mode in the stress suite used a smaller mode-specific pool: 6 connections per process for 2-process runs, 10 per process for 6-process runs. That is 60 worker-pool connections at concurrency 200 across 6 processes, while thread mode would ask for 1,230. The exact constants are benchmark policy, but the shape is the point for this worker design: thread mode’s required configured pool scales with thread concurrency; fiber mode’s baseline scales with worker process overhead plus actual database concurrency.
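The thread-mode numbers can be checked with a few lines. The formula is inferred, not quoted from the suite: a pool of concurrency + 5 per process reproduces the 110, 180, and 1,230 figures above. The sketch also ignores any connections used by the rest of the application.

```ruby
MAX_CONNECTIONS = 100 # PostgreSQL default

# Assumption: thread-mode stress pool = concurrency + 5 per process,
# inferred from the figures in the text.
def thread_pool_demand(concurrency, processes)
  (concurrency + 5) * processes
end

# Stress cells whose thread-mode demand fits under max_connections.
def thread_cells_within_limit
  [25, 50, 100, 150, 200].product([2, 6]).select do |concurrency, processes|
    thread_pool_demand(concurrency, processes) <= MAX_CONNECTIONS
  end
end

p thread_pool_demand(50, 2)  # 110
p thread_pool_demand(25, 6)  # 180
p thread_pool_demand(200, 6) # 1230
p thread_cells_within_limit  # [[25, 2]]
```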

One backend, two modes

Fiber mode isn’t universally better. CPU-bound jobs get nothing from it. C extensions that aren’t fiber-safe won’t work. And that’s fine – you don’t have to pick one.

As Trevor Turk pointed out in the PR discussion, that’s the whole point: separately configured worker pools. Here’s what Chat with Work actually runs in production:

workers:
  - queues: [ chat ]
    fibers: 10
    processes: 2
    polling_interval: 0.1
  - queues: [ turbo ]
    fibers: 10
    processes: 1
    polling_interval: 0.05
  - queues: [ notifications, default, maintenance ]
    fibers: 5
    processes: 1
    polling_interval: 0.2
  - queues: [ cpu ]
    threads: 1
    processes: 1

Almost everything uses fibers. LLM streaming, Turbo broadcasts, notifications, maintenance jobs – all fiber-based. Only the cpu queue uses threads, and right now it’s just one thread for the occasional heavy extraction. One backend. One deployment. Mission Control shows all of it.

Instead of running Solid Queue and Async::Job side by side – two processors, two configurations, two sets of things to monitor – you run one. I moved Chat with Work to this setup, and Brad Gessler has been running it in production too.

Async::Job is actually faster if you compare raw throughput against Redis. It is a backend comparison, not a Solid Queue executor comparison, but the ceiling is useful:

Workload	Solid Queue fiber best	Async::Job best	Delta
RubyLLM Stream	7.01 j/s	16.94 j/s	+141.7%
Async HTTP	492.82 j/s	652.96 j/s	+32.5%
Sleep	500.50 j/s	644.98 j/s	+28.9%
CPU	110.02 j/s	125.75 j/s	+14.3%

If you want raw speed and don’t need persistence, Async::Job is the right call. But if you want job visibility, failure tracking, retries, Mission Control, everything Rails gives you out of the box, fiber mode gets you there. Same concurrency. Database connections sized to database work, not waiting jobs. You set fibers: N and keep building.

The PR is up on GitHub. The benchmark suite is open source. Run your own numbers, or challenge mine.

Your Agent's Context Window Is Not a Junk Drawer

Tue, 07 Apr 2026 00:00:00 +0000

Your agent’s context window is the most precious resource it has. The more you stuff into it, the worse your agent performs.

Researchers call it context rot: the more tokens in the window, the harder it becomes for the model to follow instructions, retrieve information, and stay on task. Chroma tested 18 frontier models and found that accuracy drops up to 30% when you go from a focused 300-token input to 113k tokens of conversation history, with the task held constant. The model essentially became dumber.

This holds true regardless of how big the window is, yet most agent setups treat the context window like a junk drawer.

“Just toss it in there, the LLM will figure it out!”

MCP: the biggest offender

Don’t get me wrong. MCP is a fine idea. You need to talk to a service? Grab an MCP server, plug it in, and you’re running in ten minutes. For prototyping, for exploration, for answering “is this even worth building?”, it’s great.

The problem is what happens next. Which is: nothing.

People leave the MCP servers plugged in. They add more. Every MCP server you connect dumps tool descriptions, schemas, and instructions into your context. You didn’t write those. You didn’t optimize them. You probably haven’t even read them. You’re handing over a chunk of your context window to whatever some third party decided to shove in there.

Say you need a tool that checks the weather. You could plug in an MCP server and get dozens of tool descriptions, parameter schemas, and whatever instructions its author decided to write. Or you could write this:

class Weather < RubyLLM::Tool
  description "Gets current weather for a location"

  param :latitude, desc: "Latitude (e.g., 52.5200)"
  param :longitude, desc: "Longitude (e.g., 13.4050)"

  def execute(latitude:, longitude:)
    url = "https://api.open-meteo.com/v1/forecast?latitude=#{latitude}&longitude=#{longitude}&current=temperature_2m,wind_speed_10m"
    Faraday.get(url).body
  rescue => e
    { error: e.message }
  end
end

Twelve lines of RubyLLM. You wrote the description, so you know exactly what tokens are going into your context. You wrote the parameters, so the model gets precisely the interface it needs, no more. You own it, you can tune it, and nobody can inject anything into your agent’s brain through it.

Use MCP to prototype. Then replace it with crafted tools you actually control.

Tool responses are context too

Your RAG retrieves ten full documents when the model needs a paragraph. Your API call returns a massive JSON blob when the model needs two fields. You’re paying for every one of those tokens with your agent’s IQ.

The fix is progressive disclosure. At Chat with Work, when the agent searches your Google Drive, we don’t dump entire files into context. The search tool returns only some metadata and a single line from the file, the line that matched the search keywords. Fifty results, fifty lines. The AI reads those, decides which files actually matter, and only then reads them. If a file is too large, it reads it in chunks. At every step, the model is only looking at what it needs.

The same principle applies to any tool. Don’t return everything. Return enough for the model to decide what to look at next.
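A minimal sketch of the two-step shape in plain Ruby. It is not the Chat with Work implementation: `FILES`, `search`, and `read_chunk` are hypothetical, and an in-memory hash stands in for a drive. The search step returns metadata and one matching line per file; the read step returns a bounded chunk only when asked.

```ruby
# Hypothetical in-memory "drive": file name => contents.
FILES = {
  "q3-report.txt" => "Revenue grew 12%\nChurn fell to 3%\nHiring paused",
  "notes.txt"     => "Call Anna\nDiscuss churn numbers with finance",
  "todo.txt"      => "Buy milk"
}.freeze

# Step 1: metadata plus the first matching line, never the whole file.
def search(keyword, files = FILES)
  needle = keyword.downcase
  files.filter_map do |name, body|
    line = body.each_line(chomp: true).find { |l| l.downcase.include?(needle) }
    { name: name, bytes: body.bytesize, match: line } if line
  end
end

# Step 2: read one bounded chunk of one file, on request.
def read_chunk(name, files = FILES, offset: 0, limit: 2)
  files.fetch(name).lines(chomp: true)[offset, limit]
end

search("churn").each { |hit| puts "#{hit[:name]}: #{hit[:match]}" }
p read_chunk("q3-report.txt", offset: 1, limit: 1) # ["Churn fell to 3%"]
```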

Your instructions are context too

Then there’s the stuff you wrote yourself. Your system prompt is context. Your tool descriptions are context. Your parameter schemas are context. Every edge case, every guardrail, every overly detailed description competes for attention. You think you’re being thorough. You’re actually drowning the instructions that matter in a sea of instructions that don’t. A focused system prompt will outperform an exhaustive one every time.

Tool count is context too

You hand-crafted 40 beautiful tools. Your agent needs 5 for this task. The other 35 sit in context doing nothing except making the model slower at picking the right one.

Don’t register every tool your agent might ever need. Load the tools the current task actually requires. If you’re building a support agent that handles billing and technical issues, don’t give it all of both. Route billing questions to a billing agent and technical questions to a technical agent. Two focused agents will outperform one bloated one.

Every token should earn its place

The context window is not a junk drawer. It’s a workbench. Everything on it should be there for a reason, and you should be able to say what that reason is.

So before you plug in another MCP server, add another RAG source, or write another paragraph in your system prompt, ask yourself one question: is this worth making my agent dumber?

I Built a Monitor Configuration Tool for Hyprland

Tue, 31 Mar 2026 00:00:00 +0000

Configuring monitors in Hyprland means writing monitor= lines by hand. A 4K display at 1.33x scale is effectively 2880x1620 pixels, so the monitor next to it needs to start at x=2880. Vertically centering a 1080p panel against it means doing division in your head to get the y-offset right. You reload, you’re off by 40 pixels, you edit, you reload again. There’s no visual feedback until after you’ve committed to a config.
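That mental arithmetic, written out in Ruby as a sketch. It assumes “1.33x” stands for a 4/3 scale factor, and the output names DP-1 and HDMI-A-1 are placeholders; the helper names are hypothetical.

```ruby
# Logical (scaled) size of a monitor. Rational avoids float rounding.
def logical_size(width, height, scale)
  [(width / scale).round, (height / scale).round]
end

# Position for a monitor placed right of the main one, vertically centered.
def right_of_centered(main_logical, side_logical)
  main_w, main_h = main_logical
  _side_w, side_h = side_logical
  [main_w, (main_h - side_h) / 2]
end

main = logical_size(3840, 2160, Rational(4, 3)) # [2880, 1620]
side = logical_size(1920, 1080, 1)              # [1920, 1080]
x, y = right_of_centered(main, side)            # [2880, 270]

puts "monitor=DP-1,3840x2160@60,0x0,1.333333"
puts "monitor=HDMI-A-1,1920x1080@60,#{x}x#{y},1"
```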

Then it gets worse. You unplug your laptop, go to a conference, plug into a projector, and you’re back to editing config files backstage before your talk. You come home, dock the laptop, and the layout is wrong again.

I looked at what was available. The closest to what I wanted was Monique: spatial editor, profiles, workspace management, a hotplug daemon. It does exactly what I need. But it’s a GTK4 GUI that pulls in Python and a stack of dependencies, and the daemon was broken when I tried it. The other tools each cover parts of this: kanshi does profiles and auto-switching but has no editor, you write config files; nwg-displays and HyprMon have spatial editors but no daemon; HyprDynamicMonitors has a daemon but no real layout tool, and it pulls in UPower and D-Bus.

I wanted Monique’s feature set without the dependency baggage, in something that works over SSH when your monitors are broken. So I built hyprmoncfg.

A real spatial editor, in your terminal

The TUI is the thing I’m most proud of. It’s not a config editor with a preview pane. It’s a full spatial layout tool.

The left side is a canvas where your monitors are drawn as rectangles, proportional to their resolution. You click one to select it, drag it to move it. Monitors snap to each other’s edges as you position them, just like arranging windows in a GUI display manager. Arrow keys give you fine control: 100px per step, Shift for 10px, Ctrl for 1px.

The right side is a per-monitor inspector. Pick a resolution and refresh rate from a scrollable list. Set scale, position, transform, VRR, mirroring. All inline, no dialogs within dialogs. A third tab handles workspace planning.

And because it’s a TUI: it works over SSH. When your monitor configuration is broken and you can’t see anything, you can SSH into the machine and fix it. Try that with a GTK app.

Safe apply with automatic revert

Every apply, whether from the TUI or the daemon, follows the same path: write monitors.conf atomically (temp file + rename, no corruption), reload Hyprland, re-read the actual monitor state, and verify the result matches what was requested.

Then it gives you 10 seconds to confirm. If you don’t, maybe because the layout left you staring at a black screen, it reverts automatically. No stuck monitors. No reaching for a second machine to undo the damage.

This is the same apply engine everywhere. The TUI and the daemon share identical code. If it works when you test it interactively, it works when the daemon fires at 2am because you bumped your dock cable.
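The atomic write step (temp file plus rename) is a general technique. hyprmoncfg is written in Go; this is a Ruby sketch of the same idea, not its actual code, and `atomic_write` is a hypothetical helper. The temp file is created next to the target because a rename is only atomic within one filesystem.

```ruby
require "securerandom"

# Write `contents` to `path` so a reader sees either the old file or the
# complete new one, never a half-written file.
def atomic_write(path, contents)
  tmp_path = File.join(
    File.dirname(path),
    ".#{File.basename(path)}.#{SecureRandom.hex(4)}.tmp"
  )
  File.open(tmp_path, File::WRONLY | File::CREAT | File::EXCL, 0o644) do |file|
    file.write(contents)
    file.fsync # make sure the bytes are on disk before the rename
  end
  File.rename(tmp_path, path) # atomically replaces the old file
rescue StandardError
  File.unlink(tmp_path) if tmp_path && File.exist?(tmp_path)
  raise
end
```

If the process dies before the rename, the old monitors.conf is untouched and only a stray temp file is left behind.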

Workspace planning

Monitor configuration and workspace assignment are the same problem. If you’re rearranging monitors, you probably want workspaces to follow. hyprmoncfg has a workspace planner built into its third tab, with three strategies:

Sequential: Groups in chunks. Workspaces 1-3 on monitor A, 4-6 on monitor B.
Interleave: Round-robins. 1→A, 2→B, 3→A, 4→B.
Manual: Explicit per-workspace rules when you want full control.

Workspace assignments are stored inside each profile and applied together with the layout. Switch profiles, switch workspace distribution. One operation.
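
The two automatic strategies boil down to integer division versus modulo. A minimal Ruby sketch, with hypothetical function names and monitors represented as plain strings:

```ruby
# Sequential: contiguous chunks, e.g. 1-3 on the first monitor, 4-6 on the second.
def sequential(workspaces, monitors)
  per_monitor = (workspaces.size.to_f / monitors.size).ceil
  workspaces.each_with_index.to_h { |ws, i| [ws, monitors[i / per_monitor]] }
end

# Interleave: round-robin across monitors.
def interleave(workspaces, monitors)
  workspaces.each_with_index.to_h { |ws, i| [ws, monitors[i % monitors.size]] }
end

sequential((1..6).to_a, %w[A B])
# => {1=>"A", 2=>"A", 3=>"A", 4=>"B", 5=>"B", 6=>"B"}
interleave((1..4).to_a, %w[A B])
# => {1=>"A", 2=>"B", 3=>"A", 4=>"B"}
```

How the real planner handles workspace counts that do not divide evenly is not covered here.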

Source-chain verification

Here’s something no other tool does. Before writing anything, hyprmoncfg parses your hyprland.conf and verifies it actually sources the target monitors.conf. If it doesn’t, it refuses to write.

Other tools skip this check. They silently update a file that Hyprland never reads. You spend twenty minutes debugging why nothing changed, only to realize the file was never sourced. I lost an evening to this once. Never again.
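
A rough Ruby sketch of the check, assuming Hyprland's source = path syntax. The sources? helper is hypothetical, it resolves relative paths against the sourcing file's directory (an assumption), and it ignores globs:

```ruby
# Hypothetical sketch, not hyprmoncfg's parser: true when `config`, or any
# file it sources, contains a `source = ...` line pointing at `target`.
def sources?(config, target, seen = [])
  config = File.expand_path(config)
  return false if seen.include?(config) || !File.file?(config)

  seen << config # guard against files that source each other

  File.foreach(config).any? do |line|
    # Drop comments, then look for `source = <path>`.
    match = line.sub(/#.*/, "").match(/\A\s*source\s*=\s*(\S.*?)\s*\z/)
    next false unless match

    sourced = File.expand_path(match[1], File.dirname(config))
    sourced == File.expand_path(target) || sources?(sourced, target, seen)
  end
end
```

A tool would call this before writing and refuse to continue when it returns false.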

Dotfiles integration

Profiles are stored as JSON files in ~/.config/hyprmoncfg/profiles/, one per profile. The generated monitors.conf is a build artifact: you don’t commit it. You commit the profiles.

chezmoi add ~/.config/hyprmoncfg

Save a “desk” profile at home with your ultrawide. Save “conference-1080p” at one venue. Save “conference-4k” at another. Sync them across machines via your dotfiles. The daemon matches profiles to connected hardware automatically. Arrive somewhere, plug in, and the right layout applies.

This is portable. The same profile library works across machines because matching is based on the monitors you have, not on the machine you’re at.
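
To make "matching is based on the monitors you have" concrete, here is a toy Ruby sketch. The profile shape and the exact-set matching rule are assumptions for illustration; the real profile format and matching logic may differ:

```ruby
require "json"
require "set"

# Hypothetical profile shape: {"name": ..., "monitors": [{"description": ...}]}
def match_profile(profiles, connected)
  wanted = connected.to_set
  profiles.find do |profile|
    profile["monitors"].map { |m| m["description"] }.to_set == wanted
  end
end

profiles = JSON.parse(<<~JSON)
  [
    {"name": "desk", "monitors": [{"description": "Laptop panel"}, {"description": "Ultrawide"}]},
    {"name": "conference-1080p", "monitors": [{"description": "Laptop panel"}, {"description": "Projector 1080p"}]}
  ]
JSON

match_profile(profiles, ["Ultrawide", "Laptop panel"])&.fetch("name")
```

Nothing in the lookup refers to a hostname, which is why the same profile library can travel between machines.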

One runtime dependency: Hyprland

Two compiled Go binaries. No Python, no GTK, no GObject introspection, no D-Bus, no UPower. Install them and you’re done. The only runtime requirement is Hyprland itself.

How it compares

Feature	hyprmoncfg	Monique	HyprDynamicMonitors	HyprMon	nwg-displays	kanshi
GUI or TUI	TUI	GUI	TUI	TUI	GUI	CLI
Spatial layout editor	Yes	Yes	Partial	Yes	Yes	No
Drag-and-drop	Yes	Yes	No	Yes	Yes	No
Snapping	Yes	Not documented	No	Yes	Yes	No
Profiles	Yes	Yes	Yes	Yes	No	Yes
Auto-switching daemon	Yes	Yes	Yes	No (roadmap)	No	Yes
Workspace planning	Yes	Yes	No	No	Basic	No
Mirror support	Yes	Yes	Yes	Yes	Yes	No
Safe apply with revert	Yes	Yes	No	Partial (manual rollback)	No	No
Source-chain verification	Yes	No	No	No	No	No
Additional runtime dependencies	None	Python + GTK4 + libadwaita	UPower, D-Bus	None	Python + GTK3	None

Try it

On Arch:

yay -S hyprmoncfg

Or build from source:

go install github.com/crmne/hyprmoncfg/cmd/hyprmoncfg@latest
go install github.com/crmne/hyprmoncfg/cmd/hyprmoncfgd@latest

Check out the documentation for the full guide, or browse the source on GitHub.

Comb Shaped Slices

Tue, 24 Mar 2026 00:00:00 +0000

A friend who’s built and shut down companies in this space sat across from me at breakfast during a conference recently. He knows what I’m building: Chat with Work, an AI tool that lets you talk to your actual work data. He wanted to know what my plan was. I think he was a bit concerned.

“Add more integrations, finish the security assessment, market it well,” I said.

That didn’t help. “All those LLM providers are going to eat the whole market. They’ll ship every integration you can think of. If you want a slice of the pie, you need to pick a vertical and own it.”

I told him I was going to grab a T shaped slice of the pie instead.

He looked at me like I’d lost it.

Here’s the thing about the “pick a vertical” advice: it’s not wrong. It’s just not the only way. And for a lot of small software companies, it’s a trap dressed up as strategy.

The conventional wisdom goes like this: the market is huge, the big players are coming, so you’d better find your little corner and defend it. Specialize. Go deep. Become the AI assistant for dentists in Luxembourg or the knowledge tool for corporate lawyers in Berlin-Brandenburg. Calculate your total addressable market. Build a defensible moat. Make investors happy.

But what if you don’t care about making investors happy? Most companies don’t need investors. What if you just want to build something good?

The comb

I said T shaped in the moment. One horizontal, one vertical. But the more I thought about it, the more teeth it grew. Less like a T, more like a comb.

Here’s why. When you’re OpenAI or Google, you sample from the top of the distribution. You build what most people use first, then work your way down. The result is always the same: a broad horizontal platform that serves everyone and surprises no one.

When you’re small, you sample from what’s right in front of you. You build for yourself because no amount of user research, design thinking, or theory of mind will ever match the depth of actually needing the thing you’re making. You understand your own problems in a way that connects to your emotions, your workflow, your instincts. You can’t fake that. You can’t interview your way to it. I chose fast onboarding over full sync, because I don’t want to wait to start working. Nextcloud, Todoist, IMAP, and CalDAV: that’s my stack, so that’s where I’ll go deep next.

Then you listen to your customers. “This is cool, but I use Slack.” So you build that too. A team needs to own their data, so you add on-premises installation. Someone uses Basecamp, and you build that integration because the people behind it think like you. One tooth at a time.

The shape that emerges is yours. Not because you planned it on a whiteboard, but because you started from yourself and grew outward. It works for the small teams, the freelancers, the music collectives, the people who don’t have an IT department and don’t want one. That’s the comb: not a strategy you choose, but what naturally happens when you’re small and you give a damn.

There’s a reason people still choose Linear over Jira, or Proton over Gmail, or Plausible over Google Analytics. It’s not because the small player has more features. It’s because someone built it for themselves first, and that resonated. The entire market doesn’t need to resonate with you. Just enough of it.

So yes, the big players are coming. They’re going to ship a lot of integrations. They’re going to spend a lot of money. And they’re going to build software that feels like it was built by a company that spends a lot of money.

I’ll be over here, grabbing my comb shaped slice of pie. It’s Plenty.

Today also happens to be the day I officially founded Plenty. The papers are signed. The comb is real!

Ruby Deserves Beautiful Documentation

Thu, 19 Mar 2026 00:00:00 +0000

Have you ever looked at a VitePress documentation site and felt a little jealous?

The sidebar navigation. The “On this page” outline on the right. The search that pops up with /. The homepage that actually looks like a product page, not a README with a nav bar. Dark mode that just works. Code blocks with copy buttons and language labels. It all looks like someone sat down and designed the whole experience.

Because someone did. VitePress is genuinely great. And Ruby developers know it, because some of the most visible projects in our community are shipping their docs on VitePress. Not on a Jekyll theme, not on a Ruby tool. On a JavaScript static site generator built for Vue.

I don’t blame them. I looked at what we had in the Jekyll ecosystem and understood immediately. The best option is Just the Docs, and I’ve been using it for RubyLLM. It’s solid. But I had to patch in proper dark mode support that follows the browser setting. I had to add a copy-page button. The homepage layout is narrow and document-y. It works. It doesn’t wow.

So I built Jekyll VitePress Theme.

What It Is

A Jekyll theme gem that recreates the VitePress documentation experience. Everything you’d expect:

Top nav with mobile menu
Left sidebar, right “On this page” outline
Homepage layout with hero section and feature cards
Built-in local search (press / or Cmd+K)
Dark/light/auto appearance toggle
Code blocks with copy buttons, language labels, and file title bars
Doc footer with edit link, previous/next pager, and “last updated”
GitHub star widget
Rouge syntax highlighting with separate light and dark themes

All configured through _config.yml and _data/*.yml files. No JavaScript toolchain. No Node.js. Just Jekyll.

Getting Started

gem "jekyll-vitepress-theme"

theme: jekyll-vitepress-theme
plugins:
  - jekyll-vitepress-theme

jekyll_vitepress:
  branding:
    site_title: My Project

bundle install
bundle exec jekyll serve --livereload

That’s it. Your docs site now looks like VitePress. Customize the nav, sidebar, colors, fonts, and everything else from the configuration reference.

Why This Matters

When I came back to Ruby in 2024, I kept finding things that could be better. There wasn’t a great LLM library, so I built RubyLLM. Async deserved more attention, so I blogged about it. And our documentation sites? They didn’t look the part.

In open source, looks matter. A beautiful docs site tells potential users: this project is serious, maintained, and worth your time. It lowers the barrier to adoption. It makes people want to try your library.

VitePress understood this. Now Jekyll has it too.

gem "jekyll-vitepress-theme", "~> 1.0"

RubyLLM 1.14: From Zero to AI Chat App in Under Two Minutes

Wed, 18 Mar 2026 00:00:00 +0000

RubyLLM 1.14 ships a full chat UI generator. Two commands and you have a working AI chat app with Turbo streaming, model selection, and tool call display, in under two minutes. The demo above shows the whole thing: new Rails app to working chat in 1:46, including trying it out.

Why This Matters

RubyLLM turned one last week. 1.0 shipped on March 11, 2025 with Rails integration from day one: ActiveRecord models, acts_as_chat, Turbo streaming, persistence out of the box. 1.4 added the install generator. 1.7 brought the first scaffold chat UI with Turbo Streams. 1.12 introduced agents with prompt conventions. Each release got closer to the same thing: AI that works the way Rails works.

1.14 fully realizes that goal. A beautiful Tailwind chat UI (with automatic fallback to scaffold if you’re not using Tailwind). Generators for agents and tools. Conventional directories for everything. All of it extracted from Chat with Work, where it’s been running in production for months.

What You Get

Two generators. That’s it.

bin/rails generate ruby_llm:install
bin/rails generate ruby_llm:chat_ui

Your app now has this structure:

app/
├── agents/
├── controllers/
│   ├── chats_controller.rb
│   └── messages_controller.rb
├── helpers/
│   └── messages_helper.rb
├── jobs/
│   └── chat_response_job.rb
├── models/
│   ├── chat.rb
│   ├── message.rb
│   ├── model.rb
│   └── tool_call.rb
├── prompts/
├── schemas/
├── tools/
└── views/
    ├── chats/
    │   ├── index.html.erb
    │   ├── show.html.erb
    │   └── _chat.html.erb
    └── messages/
        ├── _assistant.html.erb
        ├── _user.html.erb
        ├── _tool.html.erb
        ├── _error.html.erb
        ├── create.turbo_stream.erb
        ├── tool_calls/
        │   └── _default.html.erb
        └── tool_results/
            └── _default.html.erb

Separate partials for each message role. Turbo Stream templates for real-time updates via broadcasts_to. A background job that handles the AI response. Tool calls and tool results each get their own rendering pipeline. A complete Tailwind chat interface, not a scaffold you need to fight with.

Full Tutorial: New App from Scratch

If you want to start from zero, this is what the demo shows. The whole thing takes under two minutes.

rails new chat_app --css tailwind
cd chat_app
bundle add ruby_llm
bin/rails generate ruby_llm:install
bin/rails generate ruby_llm:chat_ui
bin/rails db:migrate
bin/rails ruby_llm:load_models
bin/dev

That’s a new Rails app with Tailwind, RubyLLM installed, the chat UI generated, the database set up, models loaded, and the server running. Open localhost:3000/chats and start talking to an AI.

Generators for Agents, Tools, and Schemas

Now the fun part. You scaffold agents, tools, and schemas the same way you’d scaffold anything else in Rails:

bin/rails generate ruby_llm:agent SupportAgent

app/
├── agents/
│   └── support_agent.rb
└── prompts/
    └── support_agent/
        └── instructions.txt.erb

The agent class comes with the 1.12 DSL ready to go. The instructions file is an ERB template for your system prompt, so you can version it, review it in PRs, and template it with runtime context.

bin/rails generate ruby_llm:tool WeatherTool

app/
├── tools/
│   └── weather_tool.rb
└── views/
    └── messages/
        ├── tool_calls/
        │   └── _weather.html.erb
        └── tool_results/
            └── _weather.html.erb

Each tool gets its own partials for rendering calls and results. Show a weather widget for the weather tool, a search results list for a search tool, all through Rails partials.

bin/rails generate ruby_llm:schema Product

app/
└── schemas/
    └── product_schema.rb

This creates a schema for structured output validation.

More on all of this in the Rails integration docs, and the dedicated guides for agents and tools.

Self-Registering Provider Config

For people building provider gems: providers now register their own configuration options instead of patching a monolithic Configuration class.

class DeepSeek < RubyLLM::Provider
  class << self
    def configuration_options
      %i[deepseek_api_key deepseek_api_base]
    end
  end
end

When the provider is registered, its options become attr_accessors on RubyLLM::Configuration automatically. Third-party gems can add their config keys without touching the core.
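
The mechanism is plain Ruby metaprogramming. The following toy version uses stand-in classes (ToyConfiguration, ToyProvider, and register! are made up) and is not RubyLLM's actual source:

```ruby
# Stand-in for the configuration object.
class ToyConfiguration
  def self.register_options(names)
    names.each { |name| attr_accessor name }
  end
end

# Stand-in for the provider base class.
class ToyProvider
  def self.configuration_options
    []
  end

  # Hypothetical registration hook: copy the provider's option names
  # onto the configuration class as ordinary accessors.
  def self.register!
    ToyConfiguration.register_options(configuration_options)
  end
end

class ToyDeepSeek < ToyProvider
  def self.configuration_options
    %i[deepseek_api_key deepseek_api_base]
  end
end

ToyDeepSeek.register!
config = ToyConfiguration.new
config.deepseek_api_key = "sk-test"
```

The core never has to know the option names in advance, which is the whole point.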

Bug Fixes

Faraday logging memory bloat: logging no longer serializes large payloads (like base64-encoded PDFs) when the log level is above DEBUG.
Agent assume_model_exists propagation: setting this on the agent class now actually works.
Renamed model associations: foreign key references with acts_as helpers are fixed.
MySQL/MariaDB compatibility: JSON column defaults work correctly now.
Error.new with string argument: no longer raises a NoMethodError.

Full list in the release notes.

gem 'ruby_llm', '~> 1.14'

Ruby Is the Best Language for Building AI Apps

Fri, 20 Feb 2026 00:00:00 +0000

If your goal is to ship AI applications in 2026, Ruby is the best language to do it.

The AI Training Ecosystem Is Irrelevant

Python owns model training. PyTorch, TensorFlow, the entire notebooks-and-papers gravity well. Nobody disputes that.

But you’re not training LLMs. Almost nobody is. Each training run costs millions of dollars. The dataset is the internet!

This is what AI development today looks like:

curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "Hello"}]}'

That’s it. An HTTP call.

You don’t need any of the Python ML stack to do this. What matters is everything around it: streaming responses to users, persisting conversations, tracking costs, switching providers when pricing changes.

That’s web application engineering. That’s where Ruby and Rails shine like no other.

“You Need a Complex Agent Framework or You’re Not Doing Real AI”

Bullshit.

You need a beautiful, truly provider-independent API. Let me show you.

Python vs JavaScript vs Ruby LLM Libraries

Simple chat

Python (LangChain):

from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage

model = init_chat_model("gpt-5.2", model_provider="openai")
response = model.invoke([HumanMessage("Hello!")])

You need to specify the provider, instantiate message objects, and wrap them in an array.

That’s ceremony.

JavaScript (AI SDK):

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const { text } = await generateText({
  model: openai('gpt-5.2'),
  prompt: 'Hello!',
});

What if you want to use a model from another provider?

Ruby (RubyLLM):

require 'ruby_llm'

RubyLLM.chat.ask "Hello!"

Reads like it should.

Token usage tracking

If you’re running AI in production, you need to track token usage. This is how you price your app.

LangChain (GPT):

response = model.invoke([HumanMessage("Hello!")])
response.response_metadata['token_usage']
# {'completion_tokens': 12, 'prompt_tokens': 8, 'total_tokens': 20}

LangChain (Claude):

response.response_metadata['usage']
# {'input_tokens': 8, 'output_tokens': 12}

Different key and different structure!

LangChain (Gemini):

response.response_metadata
# ...nothing...

It’s not even there!

RubyLLM:

response.tokens.input   # => 8
response.tokens.output  # => 12

Same interface. Every provider. Every model.
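
Once token counts have one shape, turning them into a price is simple arithmetic. A minimal sketch that takes plain integers; the prices below are placeholders, not real rates for any model:

```ruby
# Hypothetical prices in USD per million tokens. Look up the real numbers
# for the model you actually use.
PRICES = { input: 1.25, output: 10.0 }.freeze

def cost_usd(input_tokens:, output_tokens:, prices: PRICES)
  (input_tokens * prices[:input] + output_tokens * prices[:output]) / 1_000_000.0
end

cost_usd(input_tokens: 8, output_tokens: 12)
# 8 * 1.25 + 12 * 10.0 = 130, divided by one million
```

With provider-specific response shapes, this function would need a branch per provider before it could even start.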

Agents

This is where it gets fun.

Python (LangChain):

from langchain_openai import ChatOpenAI
from langchain.agents import create_agent

model = ChatOpenAI(model="gpt-5-nano")

graph = create_agent(
    model=model,
    tools=[search_docs, lookup_account],
    system_prompt="You are a concise support assistant",
)

inputs = {"messages": [{"role": "user", "content": "How do I reset my API key?"}]}

for chunk in graph.stream(inputs, stream_mode="updates"):
    print(chunk)

JavaScript (AI SDK 6):

import { ToolLoopAgent } from 'ai';
import { openai } from '@ai-sdk/openai';

const supportAgent = new ToolLoopAgent({
  model: openai('gpt-5-nano'),
  system: 'You are a concise support assistant.',
  tools: { searchDocs, lookupAccount },
});

const { text } = await supportAgent.generateText({
  messages: [{ role: 'user', content: 'How do I reset my API key?' }],
});

Ruby (RubyLLM):

require 'ruby_llm'

class SupportAgent < RubyLLM::Agent
  model "gpt-5-nano"
  instructions "You are a concise support assistant."
  tools SearchDocs, LookupAccount
end

SupportAgent.new.ask "How do I reset my API key?"

Pure joy.

It’s About Cognitive Overhead

This isn’t just about aesthetics.

It’s about cognitive overhead: how many abstractions, how many provider-specific details, how many different data structures you need to hold in your head instead of focusing on what really matters: prompts and tool design.

Low cognitive overhead compounds: faster onboarding, fewer accidental bugs, easier refactors, and cleaner debugging when production explodes at 2AM.

Ruby’s advantage here is cultural: elegant APIs are treated as first-class engineering work, not icing on the cake.

Rails Gives You the Rest of the Product for Free

Model calls are only a small chunk of your code. The rest makes up the bulk of it: auth, billing, background jobs, streaming UI, persistence, admin screens, observability, even native apps.

Rails gives you a beautiful, coherent answer for all of it.

With RubyLLM + Rails, the core streaming loop is tiny:

class ChatResponseJob < ApplicationJob
  def perform(chat_id, content)
    chat = Chat.find(chat_id)

    chat.ask(content) do |chunk|
      message = chat.messages.last
      message.broadcast_append_chunk(chunk.content) if chunk.content.present?
    end
  end
end

And on the model side:

class Chat < ApplicationRecord
  acts_as_chat
end

class Message < ApplicationRecord
  acts_as_message
  has_many_attached :attachments
end

This gives you streaming chunks to your web app and persistence in your DB in absurdly few lines of code.

It Scales

“Ruby can’t handle AI scale.”

Wrong.

LLM workloads are mostly network-bound and streaming-bound. That’s exactly where Ruby’s Async ecosystem shines. Fibers let you handle high concurrency without thread explosion and resource waste. No need to plaster the code with async/await keywords. RubyLLM became concurrent with 0 code changes.

I wrote a deep dive here: Async Ruby is the Future of AI Apps (And It’s Already Here)

Don’t Take My Word for It

Someone ported RubyLLM’s API design to JavaScript as NodeLLM. Same design. Clean code, good docs.

The JavaScript community’s response: zero upvotes on Reddit. 14 GitHub stars. Top comments: “How’s this different from AI SDK?” and “It’s always fun when you AI bros post stuff. They all look and sound the same. Also, totally unnecessary.”

RubyLLM: #1 on Hacker News. ~3,600 stars. 5 million downloads. Millions of people using RubyLLM-powered apps today.

Same design. Wildly different reception. That tells you everything about which community is ready for this moment.

And teams that switched from Python are not going back:

We had a customer deployment coming up and our Langgraph agent was failing. I rebuilt it using RubyLLM. Not only was it far simpler, it performed better than the Langgraph agent.

Our first pass at the AI Agent used langchain… it was so painful that we built it from scratch in Ruby. Like a cloud had lifted. Langchain was that bad.

At Yuma, serving over 100,000 end users, our unified AI interface was awful. RubyLLM is so much nicer than all of that.

These aren’t people who haven’t tried Python. They tried it, shipped it, and replaced it.

Go Ship AI Apps with Ruby, Rails, and RubyLLM

When we freed ourselves from complexity, this community built Twitter, GitHub, Shopify, Basecamp, Airbnb. Rails changed web development forever.

Now we have the chance to change AI app development. Because AI apps are all about the product. And nobody builds products better than Ruby developers.

RubyLLM 1.12: Agents Are Just LLMs with Tools

Tue, 17 Feb 2026 00:00:00 +0000

“Agent” might be the most overloaded word in tech right now. Every startup claims to have one. Every framework promises to help you build them. The discourse has gotten so thick that the actual concept is buried under layers of marketing.

So let’s start from first principles.

What’s an Agent?

An agent is an LLM that can call functions.

That’s it. When you give a language model a set of tools it can invoke – a database lookup, an API call, a file operation – and the model decides when and how to use them, you have an agent. The model reasons about the problem, picks the right tool, looks at the result, and continues reasoning. Sometimes it calls several tools in sequence. Sometimes none.

There’s no special “agent mode.” No orchestration engine. No graph of nodes. It’s just a conversation where the model can do things besides talk.
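
The whole loop fits in a screenful. Here is a toy version with a scripted stand-in for the model; no real LLM or RubyLLM API is involved, and every name in it is made up for illustration:

```ruby
# Ask the model, run any tool it requests, feed the result back, repeat.
def run_agent(model, tools, prompt, max_steps: 5)
  messages = [{ role: :user, content: prompt }]

  max_steps.times do
    reply = model.call(messages)
    return reply[:content] unless reply[:tool] # plain answer: we are done

    result = tools.fetch(reply[:tool]).call(**reply[:args])
    messages << { role: :tool, name: reply[:tool], content: result }
  end

  raise "model kept calling tools after #{max_steps} steps"
end

tools = { search_docs: ->(query:) { ["Webhooks guide (matched: #{query})"] } }

# Scripted "model": call the tool once, then answer using its result.
model = lambda do |messages|
  last = messages.last
  if last[:role] == :tool
    { content: "See: #{last[:content].first}" }
  else
    { tool: :search_docs, args: { query: "webhooks" } }
  end
end

answer = run_agent(model, tools, "How do I configure webhooks?")
# => "See: Webhooks guide (matched: webhooks)"
```

Swap the lambda for a real model and the hash of lambdas for real tools, and the structure stays the same.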

RubyLLM Always Had This

Tool calling has been a core feature of RubyLLM since 1.0:

class SearchDocs < RubyLLM::Tool
  description "Searches our documentation"
  param :query, desc: "Search query"

  def execute(query:)
    Document.search(query).map(&:title)
  end
end

chat = RubyLLM.chat
chat.with_tool(SearchDocs)
chat.ask "How do I configure webhooks?"
# Model searches docs, reads results, answers the question

That’s an agent. The model decides to search, interprets the results, and responds. You didn’t need a special class or framework to make this happen.

But there was a problem.

The Reuse Problem

In a real application, you don’t configure a chat once. You configure it in controllers, background jobs, service objects, API endpoints. The same instructions, the same tools, the same temperature – scattered across your codebase:

# In the controller
chat = RubyLLM.chat(model: 'gpt-4.1')
chat.with_instructions("You are a support assistant for #{workspace.name}...")
chat.with_tools(SearchDocs, LookupAccount, CreateTicket)
chat.with_temperature(0.2)

# In the background job
chat = RubyLLM.chat(model: 'gpt-4.1')
chat.with_instructions("You are a support assistant for #{workspace.name}...")
chat.with_tools(SearchDocs, LookupAccount, CreateTicket)
chat.with_temperature(0.2)

# In the service object...
# You get the idea

Every Rubyist’s instinct kicks in: this should be a class.

RubyLLM 1.12: A DSL for Agents

That’s exactly what 1.12 adds. Define your agent once, use it everywhere:

class SupportAgent < RubyLLM::Agent
  model 'gpt-4.1'
  instructions "You are a concise support assistant."
  tools SearchDocs, LookupAccount, CreateTicket
  temperature 0.2
end

# Anywhere in your app
response = SupportAgent.new.ask "How do I reset my API key?"

Every macro maps to a with_* call you already know. model maps to RubyLLM.chat(model:). tools maps to with_tools. instructions maps to with_instructions. No new concepts. Just a cleaner way to package what you were already doing.
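
For the curious, this kind of macro DSL takes very little Ruby. The toy below records settings on the class and replays them as with_* calls on a stand-in chat object; RecordingChat, ToyAgent, and build_chat are illustrative names, not RubyLLM's source:

```ruby
# Stand-in chat that records the calls it receives.
class RecordingChat
  attr_reader :calls

  def initialize(model:)
    @calls = [[:model, model]]
  end

  %i[with_instructions with_tools with_temperature].each do |name|
    define_method(name) do |*args|
      @calls << [name, *args]
      self
    end
  end
end

# Toy base class: each macro stores a value on the subclass.
class ToyAgent
  def self.settings
    @settings ||= {}
  end

  %i[model instructions temperature].each do |macro|
    define_singleton_method(macro) { |value| settings[macro] = value }
  end

  def self.tools(*classes)
    settings[:tools] = classes
  end

  # Replay the stored settings as with_* calls.
  def self.build_chat
    chat = RecordingChat.new(model: settings[:model])
    chat.with_instructions(settings[:instructions]) if settings[:instructions]
    chat.with_tools(*settings[:tools]) if settings[:tools]
    chat.with_temperature(settings[:temperature]) if settings[:temperature]
    chat
  end
end

class ToySupportAgent < ToyAgent
  model "gpt-4.1"
  instructions "You are a concise support assistant."
  tools :SearchDocs, :LookupAccount
  temperature 0.2
end

ToySupportAgent.build_chat.calls.map(&:first)
# => [:model, :with_instructions, :with_tools, :with_temperature]
```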

Runtime Context

Static configuration is only half the story. Real agents need runtime data – the current user, the workspace, the time of day. Agents support lazy evaluation for this:

class WorkAssistant < RubyLLM::Agent
  chat_model Chat
  inputs :workspace

  instructions { "You are helping #{workspace.name}" }

  tools do
    [
      TodoTool.new(chat: chat),
      GoogleDriveTool.new(user: chat.user)
    ]
  end
end

chat = WorkAssistant.create!(user: current_user, workspace: @workspace)
chat.ask "What's on my todo list?"

Blocks and lambdas are evaluated at runtime, with access to the chat object and any declared inputs. Values that depend on runtime context must be lazy – a constraint that Ruby makes trivially natural.
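
Why blocks make this natural: a block stored at class-definition time can be run later against an instance with instance_exec. A toy illustration with made-up class names, not RubyLLM's source:

```ruby
class LazyAgent
  # Accept either a literal string or a block to evaluate later.
  def self.instructions(text = nil, &block)
    @instructions = block || text
  end

  def self.instructions_source
    @instructions
  end

  attr_reader :workspace

  def initialize(workspace:)
    @workspace = workspace
  end

  # Strings are used as-is. Blocks run now, against this instance,
  # so they can see runtime data such as `workspace`.
  def resolved_instructions
    source = self.class.instructions_source
    source.respond_to?(:call) ? instance_exec(&source) : source
  end
end

ToyWorkspace = Struct.new(:name)

class ToyWorkAssistant < LazyAgent
  instructions { "You are helping #{workspace.name}" }
end

ToyWorkAssistant.new(workspace: ToyWorkspace.new("Plenty")).resolved_instructions
# => "You are helping Plenty"
```

A plain string would have been interpolated when the class was loaded, before any workspace existed.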

Prompt Conventions

If you’re using Rails, agents follow a convention for prompt management:

class WorkAssistant < RubyLLM::Agent
  chat_model Chat
  instructions display_name: -> { chat.user.display_name_or_email }
end

This renders app/prompts/work_assistant/instructions.txt.erb with display_name available as a local. Namespaced agents map naturally: Admin::SupportAgent looks in app/prompts/admin/support_agent/.

Your prompts are ERB templates. Version them in git. Review them in PRs. Treat them like the application code they are.
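
The path convention and the rendering step can both be sketched with the standard library. The prompt_path helper below is hypothetical and only mirrors the mapping described above; it is not how RubyLLM resolves templates internally:

```ruby
require "erb"

# Admin::SupportAgent -> app/prompts/admin/support_agent/instructions.txt.erb
def prompt_path(agent_class_name)
  underscored = agent_class_name
                .gsub("::", "/")
                .gsub(/([a-z\d])([A-Z])/, '\1_\2')
                .downcase
  File.join("app/prompts", underscored, "instructions.txt.erb")
end

prompt_path("Admin::SupportAgent")
# => "app/prompts/admin/support_agent/instructions.txt.erb"

# ERB renders a template with locals passed as a hash.
template = "You are assisting <%= display_name %>."
rendered = ERB.new(template).result_with_hash(display_name: "Ada")
# => "You are assisting Ada."
```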

Rails Integration

The chat_model macro activates Rails-backed persistence:

class WorkAssistant < RubyLLM::Agent
  chat_model Chat
  model 'gpt-4.1'
  instructions "You are a helpful assistant."
  tools SearchDocs, LookupAccount
end

# Create a persisted chat with agent config applied
chat = WorkAssistant.create!(user: current_user)

# Load an existing chat, apply runtime config
chat = WorkAssistant.find(params[:id])

# User sends a message, everything persisted automatically
chat.ask(params[:message])

create! persists both the chat and its instructions. find applies configuration at runtime without touching the database. This distinction matters when your prompts evolve faster than your data.

Also in 1.12

Agents are the headline, but this release also adds:

AWS Bedrock full coverage via the Converse API – every Bedrock chat model through one interface
Azure Foundry API – broad model access across Azure’s ecosystem
Clearer with_instructions semantics – explicit append options, guaranteed message ordering

Already in Production

This isn’t a spec or a proposal. The agent DSL powers Chat with Work in production right now. The WorkAssistant examples above aren’t hypothetical – they’re simplified versions of real code handling real conversations.

If you want to see what it feels like, try it out.

The Point

The industry is making agents complicated. They’re not. An agent is an LLM with tools. You define the tools in Ruby. You package them in a class. You use the class in your app.

No graphs. No chains. No orchestration frameworks. Just Ruby.

gem 'ruby_llm', '~> 1.12'

Dictation Is the New Prompt (Voxtype on Omarchy)

Wed, 07 Jan 2026 00:00:00 +0000

Typing every prompt feels backwards in 2026. You can speak faster than you can type. Hold a hotkey, speak, your OS types it for you. If you care about flow, dictation is the most underrated upgrade you can make.

In the Omarchy world, Hyprwhspr is getting a lot of attention after a recent DHH tweet:

I had no idea that local model dictation had gotten this good and this fast! I'm blown away by how good hyprwhspr with Omarchy is just using a base model backed by the CPU. Unbelievably accurate. https://t.co/Jtz3eN84Jf
— DHH (@dhh) January 3, 2026

He’s right: local dictation is shockingly good now. The catch is Hyprwhspr uses Python virtual environments, which don’t mix well with mise. Fortunately Pete Jackson saw that and created Voxtype to solve exactly this issue!

EDIT: five minutes after I posted this, DHH confirmed that Voxtype will ship with Omarchy 3.3! 🎉

Voxtype is shipping with Omarchy 3.3 👍 https://t.co/Pt1EkgNLoi
— DHH (@dhh) January 7, 2026

Why Voxtype

Voxtype is built in Rust, so you don’t need Python virtual environments, which means it works well with mise. It’s fast, it just works, and when I opened an issue asking for an Omarchy theme, the author shipped it immediately. Now it looks stunning in my setup.

With Vulkan enabled, transcription is almost instant on my Ryzen AI 9 HX370. The video at the top is not sped up. Longer text also transcribes instantly.

If you want to copy my exact configuration, here it is.

Install

sudo pacman -S wtype ydotool wl-clipboard vulkan-icd-loader # last only if you want to use your GPU
yay -S voxtype # no sudo: yay asks for privileges itself when it needs them

voxtype setup --download
voxtype setup gpu # if you want to use your GPU
voxtype setup systemd

Restart Waybar after the changes:

pkill -SIGUSR2 waybar

Voxtype config

~/.config/voxtype/config.toml

state_file = "auto"

[hotkey]
enabled = false

[audio]
device = "default"
sample_rate = 16000
max_duration_secs = 600

[audio.feedback]
enabled = true
# Sound theme: "default", "subtle", "mechanical", or path to custom theme directory
theme = "default"
volume = 0.7

[whisper]
model = "base.en"
language = "en"
translate = false
on_demand_loading = true # saves your GPU until it's needed

[output]
mode = "type"
fallback_to_clipboard = true

# Delay between typed characters in milliseconds
# 0 = fastest possible, increase if characters are dropped
type_delay_ms = 1

[output.notification]
on_recording_start = false
on_recording_stop = false
on_transcription = true

[text]
replacements = { "hyperwhisper" = "hyprwhspr" }

[status]
icon_theme = "omarchy"

Waybar integration

~/.config/waybar/config.jsonc

"custom/voxtype": {
  "exec": "voxtype status --follow --format json",
  "return-type": "json",
  "format": "{}",
  "tooltip": true
},

And add it to modules-right:

"modules-right": [
  "group/tray-expander",
  "custom/voxtype",
  "bluetooth",
  "network",
  "pulseaudio",
  "cpu",
  "battery"
]

~/.config/waybar/style.css

@import "voxtype.css";
@import "../omarchy/current/theme/waybar.css";

~/.config/waybar/voxtype.css

#custom-voxtype {
  margin: 0 16px 0 0;
  font-size: 12px;
  font-weight: bold;
  border-top: 2px solid transparent;
  border-bottom: 2px solid transparent;
  transition: color 150ms ease-in-out, border-color 150ms ease-in-out;
}

#custom-voxtype.recording {
  color: #ff5555;
  animation: pulse 1s ease-in-out infinite;
}

#custom-voxtype.transcribing {
  color: #ff5555;
}

#custom-voxtype.stopped {
  color: #6272a4;
}

@keyframes pulse {
  0% { opacity: 1; }
  50% { opacity: 0.5; }
  100% { opacity: 1; }
}

Keybinding

In your Hyprland config:

# Voxtype
bindd = SHIFT, XF86AudioMicMute, Transcribe, exec, voxtype record toggle

That’s it. Use your voice whenever possible. It’s faster, more natural, and keeps you in flow.

Nano Banana with RubyLLM

Thu, 23 Oct 2025 00:00:00 +0000

Google wired Nano Banana into the chat interface generateContent, not the image API’s predict. Counterintuitive if you’re using RubyLLM, which makes you think in terms of actions like paint instead of chat.

Once you know that quirk, it’s straightforward. Only caveat: you need the latest trunk or v1.9+, because that’s where we taught RubyLLM to unpack inline file data from chat responses.

Wire It Up

chat = RubyLLM
         .chat(model: "gemini-2.5-flash-image")
         .with_temperature(1.0) # optional, but you like creativity, right?
         .with_params(generationConfig: { responseModalities: ["image"] }) # also optional, if you prefer the model to return only images

response = chat.ask "your prompt", with: ["all.png", "the.jpg", "attachments.png", "you.png", "want.jpg"]

image_io = response.content[:attachments].first.source

That StringIO holds the generated image. Stream it to S3, attach it to Active Storage, or keep it in memory for a downstream processor.

Want a file?

response.content[:attachments].first.save "nano-banana.png"

That’s it. Chat endpoint, one call. Ship the image feature and go enjoy the rest of your day.

RubyLLM 1.4-1.5.1: Three Releases in Three Days

Fri, 01 Aug 2025 00:00:00 +0000

Three releases in three days. Wednesday, Friday, and Friday again. Each one shipped as soon as it was ready.

1.4.0: The Structured Output Release (Wednesday)

Getting LLMs to return data in the format you need has always been painful.

We all had code like this:

# The old struggle
response = chat.ask("Return user data as JSON. ONLY JSON. NO MARKDOWN.")
begin
  data = JSON.parse(response.content.gsub(/```json\n?/, '').gsub(/```\n?/, ''))
rescue JSON::ParserError
  # Hope and pray
end

Now with structured output:

# Define your schema with the RubyLLM::Schema DSL
class PersonSchema < RubyLLM::Schema
  string :name
  integer :age
  array :skills, of: :string
end

# Get perfectly structured JSON every time
chat = RubyLLM.chat.with_schema(PersonSchema)
response = chat.ask("Generate a Ruby developer profile")

response.content
# => {"name" => "Yukihiro", "age" => 59, "skills" => ["Ruby", "C", "Language Design"]}

No more regex. No more parsing. Just data structures that work.

Oh, and Daniel Friis released RubyLLM::Schema just for the occasion, but you can use any gem you want with RubyLLM, or even write your own JSON schema from scratch.
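If you go the from-scratch route, a hand-written equivalent of PersonSchema might look like the Hash below. This is a sketch in standard JSON Schema. That with_schema takes such a Hash directly is an assumption based on the sentence above, so the RubyLLM call stays in a comment.

```ruby
require "json"

# Hand-written JSON Schema mirroring PersonSchema above.
PERSON_JSON_SCHEMA = {
  type: "object",
  properties: {
    name:   { type: "string" },
    age:    { type: "integer" },
    skills: { type: "array", items: { type: "string" } }
  },
  required: %w[name age skills],
  additionalProperties: false
}.freeze

# Assumption: a Hash like this can be passed in place of a schema class.
# chat = RubyLLM.chat.with_schema(PERSON_JSON_SCHEMA)

# Serialize it to see exactly what would travel over the wire.
schema_json = JSON.pretty_generate(PERSON_JSON_SCHEMA)
```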

Rails Generators: From Zero to Chat

We didn’t have Rails generators before. Now we do:

rails generate ruby_llm:install

This creates everything you need:

Migrations
Models with acts_as_chat, acts_as_message, and acts_as_tool_call
A clean initializer

Your Chat model works like any Rails model:

chat = Chat.create!(model: "gpt-4.1-nano")
response = chat.ask("Explain Ruby blocks")
# Messages are automatically persisted with proper associations

From rails new to working chat in under 5 minutes.

Tool Call Transparency

New callback to see what your AI is doing:

chat.on_tool_call do |tool_call|
  puts "🔧 AI is calling: #{tool_call.name}"
  puts "   Arguments: #{tool_call.arguments}"

  Rails.logger.info "[AI Tool] #{tool_call.name}: #{tool_call.arguments}"
end

chat.with_tool(weather_tool).ask("What's the weather in Tokyo?")
# => 🔧 AI is calling: get_weather
#    Arguments: {"location": "Tokyo"}

Essential for debugging and auditing AI behavior.
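For auditing, you probably want more than puts. A minimal sketch of an in-memory audit trail follows. ToolCallAudit and FakeToolCall are hypothetical names. The Struct only mimics the name and arguments readers used in the callback above.

```ruby
# Stand-in for the object yielded to on_tool_call (only the two fields used above).
FakeToolCall = Struct.new(:name, :arguments)

# Hypothetical in-memory audit trail for tool calls.
class ToolCallAudit
  attr_reader :entries

  def initialize
    @entries = []
  end

  # Record one call with a UTC timestamp and return the stored entry.
  def record(tool_call)
    entry = { at: Time.now.utc, tool: tool_call.name, arguments: tool_call.arguments }
    @entries << entry
    entry
  end

  # How many times each tool was called.
  def counts
    @entries.map { |entry| entry[:tool] }.tally
  end
end

audit = ToolCallAudit.new
# In an app this would be wired up as:
#   chat.on_tool_call { |tool_call| audit.record(tool_call) }
audit.record(FakeToolCall.new("get_weather", { "location" => "Tokyo" }))
audit.record(FakeToolCall.new("get_weather", { "location" => "Berlin" }))
audit.record(FakeToolCall.new("search_docs", { "query" => "blocks" }))
tool_counts = audit.counts
```

Swap the array for a database table or a log line once you know which fields you actually query.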

Direct Access to Provider Parameters

Need that one weird parameter? Use with_params:

# OpenAI's JSON mode
chat.with_params(response_format: { type: "json_object" })
     .ask("List Ruby features as JSON")

No waiting for us to wrap every provider option.
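One way to picture with_params: your extra keys get merged into the request body before it goes out. The sketch below is a generic deep merge, not RubyLLM's internal code. The helper name and the payload shape, including the made-up options key, are assumptions for illustration.

```ruby
# Hypothetical helper: recursively merge extra params into a request payload.
def deep_merge_params(payload, extra)
  payload.merge(extra) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge_params(old_value, new_value)
    else
      new_value # in this sketch, caller-supplied params win on conflicts
    end
  end
end

# Illustrative payload only; real provider payloads differ.
base_payload = {
  model: "gpt-4.1-nano",
  messages: [{ role: "user", content: "List Ruby features as JSON" }],
  options: { temperature: 0.7 }
}

extra = { response_format: { type: "json_object" }, options: { top_p: 0.9 } }
merged = deep_merge_params(base_payload, extra)
```

Hash#merge returns a new Hash, so base_payload is left untouched.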

Critical Bug Fixes and Other Improvements in 1.4.0

Anthropic multiple tool calls: Only the first tool call was being processed and the rest were silently ignored. All of them are now handled.
Streaming errors: Now handled properly in both Faraday V1 and V2
Test fixtures: Removed 60MB of unnecessary test data
Message ordering: Fixed race conditions in streaming responses
JRuby support: Now officially tested and supported
Direct access to raw responses: Get the raw responses from Faraday for debugging
GPUStack support: A production-ready alternative to Ollama

Full release notes for 1.4.0 available on GitHub.

1.5.0: Two New Providers (Friday)

Mistral AI

63 models from France, from tiny to massive:

RubyLLM.configure do |config|
  config.mistral_api_key = ENV['MISTRAL_API_KEY']
end

# Efficient small model
chat = RubyLLM.chat(model: 'ministral-3b-latest')

# Their flagship model
chat = RubyLLM.chat(model: 'mistral-large-latest')

# Vision with Pixtral
vision = RubyLLM.chat(model: 'pixtral-12b-latest')
vision.ask("What's in this image?", with: "path/to/image.jpg")

Perplexity

Real-time web search meets LLMs:

RubyLLM.configure do |config|
  config.perplexity_api_key = ENV['PERPLEXITY_API_KEY']
end

# Get current information with web search
chat = RubyLLM.chat(model: 'sonar-pro')
response = chat.ask("What are the latest Ruby 3.4 features?")
# Searches the web and returns current information

Full release notes for 1.5.0 available on GitHub.

Rails Generator Fixes

Fixed migration order (Chats → Messages → Tool Calls)
Fixed PostgreSQL detection that was broken by namespace collision
PostgreSQL users now get jsonb columns instead of json
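The jsonb decision boils down to a check on the database adapter. Here is a minimal sketch of that idea, not the generator's actual code. The helper name json_column_type is hypothetical, and the adapter names are the strings Rails adapters typically report.

```ruby
# Hypothetical helper: pick a JSON column type from the adapter name.
def json_column_type(adapter_name)
  # PostgreSQL (and PostGIS on top of it) supports jsonb; everything else gets json.
  postgres = adapter_name.to_s.downcase.match?(/postgres|postgis/)
  postgres ? :jsonb : :json
end

postgres_type = json_column_type("PostgreSQL")
sqlite_type = json_column_type("SQLite")
```

Comparing strings sidesteps constant lookups, which is where a namespace collision can bite.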

1.5.1: Quick Fixes (Also Friday)

Found issues Friday afternoon. Fixed them. Shipped them. That’s it.

Why make users wait through the weekend with broken code?

Fixed Mistral model capabilities (was a Hash, should be Array)
Fixed Google Imagen output modality
Updated to JRuby 10.0.1.0
Added JSON schema validation for model registry

Full release notes for 1.5.1 available on GitHub.

The Philosophy: Ship When Ready

Three days. Three releases. Each one made someone’s code work better.

We could have bundled everything into one release next week. But every moment we wait is a moment someone’s dealing with a bug we already fixed.

The structured output in 1.4.0? People have needed that since before RubyLLM existed. The PostgreSQL fix in 1.5.0? Someone’s migrations were failing Thursday. The Mistral fix? That bug was breaking someone’s code Friday morning.

When code is ready, you ship.

What You Can Build Now

With structured output and multiple providers, you can build real features:

# Extract structured data from any text
class InvoiceSchema < RubyLLM::Schema
  string :invoice_number
  string :date, description: "Invoice date in ISO 8601 format (YYYY-MM-DD)"
  number :total
  array :line_items do
    object do
      string :description
      number :amount
    end
  end
end

# Use Mistral for cost-effective extraction
extractor = RubyLLM.chat(model: 'ministral-8b-latest')
                    .with_schema(InvoiceSchema)

invoice_data = extractor.ask("Extract invoice details from: #{pdf_text}")
# Reliable data extraction at a fraction of GPT-4's cost

# Use Perplexity for current information
researcher = RubyLLM.chat(model: 'sonar-deep-research')
market_data = researcher.ask("Current Ruby job market trends in 2025")
# Real-time data, not training cutoff guesses

Use It

gem 'ruby_llm', '~> 1.5'

Full backward compatibility. Your 1.0 code still runs. These releases just made everything better.
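If the ~> operator is new to you, RubyGems can tell you what it allows. A quick sketch using only the Gem classes that ship with Ruby; the version numbers are just probes, not a list of actual releases.

```ruby
require "rubygems"

# "~> 1.5" means: at least 1.5, but below 2.0.
requirement = Gem::Requirement.new("~> 1.5")

# Probe a few versions against the constraint.
allowed = %w[1.5.0 1.5.1 1.9.0].map { |v| requirement.satisfied_by?(Gem::Version.new(v)) }
blocked = %w[1.4.0 2.0.0].map { |v| requirement.satisfied_by?(Gem::Version.new(v)) }
```

Every entry in allowed is true and every entry in blocked is false, so bundle update picks up new 1.x releases but never jumps to a 2.0 on its own.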
