<?xml version="1.0" encoding="UTF-8"?>

<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:atom="http://www.w3.org/2005/Atom">

<channel>

<title>Carmine Paolino</title>
<link>https://paolino.me</link>
<description>I build AI tools at Chat with Work and RubyLLM. Co-founded Freshflow. Outside tech, I make music, run Floppy Disco, and take photos.</description>

<atom:link href="https://paolino.me/rss.xml"
           rel="self"
           type="application/rss+xml"/>

<lastBuildDate>Tue, 19 May 2026 09:56:36 +0000</lastBuildDate>



<item>

<title>Production Experience Cannot Be Hallucinated</title>

<link>https://paolino.me/production-experience-cannot-be-hallucinated/</link>

<guid isPermaLink="true">
https://paolino.me/production-experience-cannot-be-hallucinated/
</guid>

<pubDate>Wed, 13 May 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
I paid five dollars to read a Medium article about my own free, open source library. It was sold as hard-won production experience.

]]>
</description>



<content:encoded>
<![CDATA[

<p>I paid five dollars to read a <a href="https://mrrazahussain.medium.com/the-rails-llm-stack-is-finally-ready-for-production-here-is-what-i-learned-shipping-it-ff9d20298c5c">Medium article</a> about <a href="https://rubyllm.com">my own free, open source library</a>. It was sold as hard-won production experience.</p>

<p>It was fabricated.</p>

<p>The first code sample used <code>RubyLLM.client</code>, which does not exist. It called <code>client.chat(messages: ...)</code>, which does not exist. Then it invented <code>RubyLLM::StreamInterrupted</code>, <code>RubyLLM::APIError</code>, and a <code>stream: proc</code> API that RubyLLM has never had.</p>

<p>The problem was not merely wrong information. Wrong information can be corrected. This was sold as experience with RubyLLM in production, which is a much more valuable claim.</p>

<p>AI slop is not just filling the web with <a href="https://x.com/jorgemanru/status/2053183727514091820">predictable cadence</a>. It is fabricating experience. It is letting people skip the work, skip the scar tissue, and still write in the voice of someone who has been there.</p>

<p>In open source, that turns into a tax. Maintainers build the thing, write the docs, publish the source, keep the examples working, answer the issues, and then have to police hallucinated articles about their own projects before users start debugging ghosts.</p>

<h2 id="the-four-magic-words-in-tech">The Four Magic Words in Tech</h2>

<p>Production. Scale. Security. Reliability.</p>

<p>In the tech world, attach one of these words to a claim and it immediately becomes true. “This does not scale” can kill a project before anyone measures it. “This is not production ready” can sabotage it without a single deploy.</p>

<p>So when an article says “what broke in production”, it is not just offering advice. It is claiming experience, and experience cannot be hallucinated.</p>

<p><a href="/assets/receipts/2026-05-13-production-experience-medium-original-article-2026-05-12.md">The first version</a> opened by saying the author had spent three weeks on the wrong side of the problem before getting something stable in production. That is a powerful claim. It tells the reader to relax and inherit the author’s scars.</p>

<p>There were no scars. The author had not even run the first example.</p>

<p>This is why fake experience is so dangerous. Bad code fails fast. Fake experience lingers. It gets quoted. It gets summarized. It gets used in meetings by people who do not know enough yet to see the hollow center.</p>

<p>The recipe is familiar. Streaming failures. Token budgets. Provider fallback. Turbo Streams. Redis circuit breakers. nginx buffering. Load testing. They sit near “LLM production” in the LLM training data. Arrange them with enough confidence and the result smells real.</p>

<p>Production experience is not a smell. It is a thing that happened, and none of these things happened.</p>

<h2 id="what-actually-happened">What Actually Happened</h2>

<p>Here is the short version.</p>

<p>Most articles about RubyLLM are good. Since it became popular, I have seen a few confident guides from people who clearly had not run the code. Usually they disappear into LinkedIn or search results. This one made the pattern impossible to ignore.</p>

<p><a href="/assets/receipts/2026-05-13-production-experience-maintainer-first-correction.png">I called it out</a>:</p>

<blockquote>
  <p>Author of RubyLLM here.</p>

  <p>The very first example does not work.</p>

  <p>The article is not merely wrong in a few places. It is fabricated.</p>

  <p>…</p>
</blockquote>

<p><a href="/assets/receipts/2026-05-13-production-experience-author-admission.png">The author replied</a>:</p>

<blockquote>
  <p>You were right.</p>

  <p>The code in the original article was not verified against the actual gem. <code>RubyLLM.client</code>, <code>RubyLLM::StreamInterrupted</code>, <code>RubyLLM::APIError</code>, <code>stream: proc</code> – none of it exists. You caught every fabrication accurately.</p>

  <p>I’ve replaced the article entirely. The new version has been verified against your documentation and source. The fake “production experience” framing is gone. It’s now an honest documentation-based guide with a correction notice at the top explaining what happened.</p>
</blockquote>

<p>“I’ve replaced the article entirely.”</p>

<p>It was a long article. The completely rewritten replacement appeared in a few minutes. The fake method names were replaced with real ones, but the posture stayed the same: “RubyLLM in production”, “what tutorials skip”, “streaming failures”, “provider fallback”, “token budgets.”</p>

<p>The method names got real. The experience didn’t.</p>

<p>The new version claimed Puma restarts produce neat RubyLLM streaming errors. They do not. If the worker dies, the Ruby process running the call is gone. It suggested deleting old persisted chat messages as context management. That is destroying conversation history. It described fallback by throwing away the chat and asking another provider the last prompt as a fresh question. That is not conversation fallback. It confused HTTP/SSE buffering with Turbo Streams over ActionCable.</p>

<p>Not battle scars. Guesses presented as authority.</p>

<p><a href="/assets/receipts/2026-05-13-production-experience-maintainer-second-correction.png">I called the second version what it was: phony</a>. <a href="/assets/receipts/2026-05-13-production-experience-responses-hidden.png">The author then hid responses</a> while keeping the article up.</p>

<p>I reported the article to Medium and contacted the publication that promoted it, sending them the fabricated APIs, the author’s admission, and the hidden corrections. To their credit, the editor replied quickly, apologized, and removed it from the publication. But only the author can take down the original Medium article, so the piece remained available without the maintainer corrections visible next to it.</p>

<h2 id="do-not-counterfeit-experience">Do Not Counterfeit Experience</h2>

<p>Please do write about your favourite software. Critique it too. Tell us maintainers where the API is wrong, the docs are bad, the abstraction leaks. Preferably in an issue so we can actually see it. That feedback is gold.</p>

<p>But do not counterfeit experience. If you’re using The Four Magic Words in Tech, the bar is even higher.</p>

<p>And if you run a technical publication, please at least check the first example.</p>

]]>
</content:encoded>

</item>



<item>

<title>RubyLLM 1.15: Image Editing, Cost Tracking and Less Tool Boilerplate</title>

<link>https://paolino.me/rubyllm-1-15/</link>

<guid isPermaLink="true">
https://paolino.me/rubyllm-1-15/
</guid>

<pubDate>Thu, 07 May 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
I released RubyLLM 1.15 today.

]]>
</description>



<content:encoded>
<![CDATA[

<p>I released <a href="https://rubyllm.com">RubyLLM</a> 1.15 today.</p>

<p>It ships image editing, cost tracking, cleaner token accounting, inferred tool parameters, additive callbacks, and Rails fixes.</p>

<p>The theme is simple: stop making me write glue code. If the computer can infer it, RubyLLM should infer it. If a provider reports usage, RubyLLM should turn it into cost. If Rails already has a blob, RubyLLM should not download it and upload it again.</p>

<h2 id="image-editing">Image Editing</h2>

<p><code>RubyLLM.paint</code> could already generate images:</p>

<pre><code class="language-ruby">image = RubyLLM.paint("A watercolor robot holding a Ruby gem")
</code></pre>

<p>Now <code>with:</code> turns it into an image edit:</p>

<pre><code class="language-ruby">image = RubyLLM.paint(
  "Turn the logo green and keep the background transparent",
  model: "gpt-image-1",
  with: "logo.png"
)
</code></pre>

<p>Same method, same attachment shape.</p>

<p>The source can be a path, a URL, an IO-like object, or an Active Storage attachment. Multiple source images work too:</p>

<pre><code class="language-ruby">image = RubyLLM.paint(
  "Combine these references into a postcard illustration",
  model: "gpt-image-1",
  with: ["person.png", "style-reference.png"]
)
</code></pre>

<p>And if you need to constrain the edit, pass a mask:</p>

<pre><code class="language-ruby">image = RubyLLM.paint(
  "Replace only the background with a sunset sky",
  model: "gpt-image-1",
  with: "portrait.png",
  mask: "portrait-mask.png"
)
</code></pre>

<p>That’s it. <code>paint</code> paints. Sometimes from scratch, sometimes from an existing image.</p>

<h2 id="cost-tracking">Cost Tracking</h2>

<p>RubyLLM has tracked tokens since 1.0. But “this used 18,432 tokens” is only half the answer. The next question is always: how much did that cost?</p>

<p>Calculating that was never hard. Take the input tokens, output tokens, cached tokens, maybe reasoning tokens. The pricing is already in RubyLLM’s model registry. Multiply by the per-million rate.</p>
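<p>For reference, the hand-written version of that math looks roughly like the sketch below. It is plain Ruby and does not call RubyLLM. The prices, the <code>hand_rolled_cost</code> helper, and the choice to return <code>nil</code> when a rate is missing are all assumptions made for illustration.</p>

<pre><code class="language-ruby"># Hypothetical per-million-token prices in USD; real prices vary by model.
PRICES_PER_MILLION = { input: 3.00, output: 15.00, cache_read: 0.30 }.freeze

# Multiply each token count by its per-million rate and sum the parts.
# Returns nil when a token kind that was actually used has no known price.
def hand_rolled_cost(usage, prices = PRICES_PER_MILLION)
  parts = usage.map do |kind, tokens|
    next 0.0 if tokens.zero?

    rate = prices[kind]
    return nil if rate.nil?

    tokens * rate / 1_000_000.0
  end
  parts.sum
end

# 12,000 + 6,000 + 432 = 18,432 tokens in total.
usage = { input: 12_000, output: 6_000, cache_read: 432 }
cost = hand_rolled_cost(usage)
puts format("%.6f USD", cost)
</code></pre>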

<p>But why should every app have to write that code?</p>

<p>RubyLLM already has the usage. RubyLLM already knows the model. RubyLLM already ships the model registry. So now it does the boring math for you.</p>

<p>Now you can ask:</p>

<pre><code class="language-ruby">response = chat.ask("Summarize Ruby's object model.")

response.cost.total
chat.cost.total
agent.cost.total
</code></pre>

<p>Same for images:</p>

<pre><code class="language-ruby">image = RubyLLM.paint("A small watercolor robot", model: "gpt-image-1")

image.tokens.input
image.tokens.output

image.cost.input
image.cost.output
image.cost.total
</code></pre>

<p>If RubyLLM does not have pricing for part of the usage, the cost is <code>nil</code>. Better no answer than a fake one.</p>

<p>A chat with ten messages can tell you the total. An agent can tell you the total. A generated image can tell you the total. No more hand-rolled sums.</p>

<h2 id="token-counts-that-mean-what-they-say">Token Counts That Mean What They Say</h2>

<p>Prompt caching made token counts messy.</p>

<p>Some providers include cache reads in prompt tokens. Some report cache creation separately. Some don’t. If you multiply the wrong number by the wrong price, your cost tracking is wrong before it starts.</p>

<p>So 1.15 separates the different kinds of tokens before exposing them:</p>

<pre><code class="language-ruby">response.tokens.input       # standard input tokens
response.tokens.output      # billable output tokens
response.tokens.cache_read  # prompt cache reads
response.tokens.cache_write # prompt cache writes
</code></pre>

<p><code>tokens.input</code> now means normal input tokens. Cache reads and cache writes are separate. <code>tokens.output</code> always means billable output tokens.</p>
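<p>To see why the split matters, here is a small sketch in plain Ruby. The raw field names (<code>prompt_tokens</code>, <code>cached_tokens</code>, <code>completion_tokens</code>) and the prices are hypothetical. They stand in for a provider that counts cache reads inside its prompt token total, and this is not RubyLLM’s internal code.</p>

<pre><code class="language-ruby"># Hypothetical raw usage from a provider that includes cache reads in prompt tokens.
raw = { prompt_tokens: 10_000, cached_tokens: 8_000, completion_tokens: 500 }

# Separate the token kinds before pricing them.
normalized = {
  input: raw[:prompt_tokens] - raw[:cached_tokens],
  cache_read: raw[:cached_tokens],
  output: raw[:completion_tokens]
}

# Hypothetical per-million-token prices in USD.
prices = { input: 3.00, cache_read: 0.30, output: 15.00 }

# Naive: every prompt token billed at the full input rate.
naive_input_cost = raw[:prompt_tokens] * prices[:input] / 1_000_000.0

# Split: cached tokens billed at the cheaper cache-read rate.
split_input_cost = (
  normalized[:input] * prices[:input] +
  normalized[:cache_read] * prices[:cache_read]
) / 1_000_000.0

puts format("naive: %.4f, split: %.4f", naive_input_cost, split_input_cost)
</code></pre>

<p>With these made-up numbers the naive figure is more than three times the split one, which is the kind of error the separate counts are meant to prevent.</p>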

<p>The old top-level helpers still work. New code should use <code>response.tokens.*</code>.</p>

<p>No new Rails migration is required if you already ran the 1.9 token migration. If you display token counts directly, read the <a href="https://rubyllm.com/upgrading/#upgrade-to-115">1.15 upgrade notes</a>.</p>

<h2 id="less-tool-boilerplate">Less Tool Boilerplate</h2>

<p>Tools in RubyLLM are Ruby classes. But for very simple tools, RubyLLM still made you repeat yourself:</p>

<pre><code class="language-ruby">class Weather &lt; RubyLLM::Tool
  description "Gets current weather for a location"
  param :latitude  # why?
  param :longitude # DRY!

  def execute(latitude:, longitude:)
    # ...
  end
end
</code></pre>

<p>That is silly. The method signature already says there is a <code>latitude</code> and a <code>longitude</code>.</p>

<p>Now this works:</p>

<pre><code class="language-ruby">class Weather &lt; RubyLLM::Tool
  desc "Gets current weather for a location"

  def execute(latitude:, longitude:, units: "metric")
    # ...
  end
end
</code></pre>

<p>Required keywords become required string parameters. Optional keywords become optional string parameters.</p>
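<p>Ruby exposes enough reflection to make this kind of inference possible. The sketch below uses the core <code>Method#parameters</code> API on a plain class. <code>WeatherSketch</code> and the shape of the resulting hashes are hypothetical, and this is an illustration of the idea rather than RubyLLM’s actual implementation.</p>

<pre><code class="language-ruby"># A plain class standing in for a tool; it does not use RubyLLM at all.
class WeatherSketch
  def execute(latitude:, longitude:, units: "metric")
    "#{latitude},#{longitude} (#{units})"
  end
end

# Method#parameters reports required keywords as [:keyreq, name]
# and optional keywords as [:key, name].
signature = WeatherSketch.instance_method(:execute).parameters

# Keep only keyword parameters and default every type to string.
inferred = signature.filter_map do |kind, name|
  next unless %i[keyreq key].include?(kind)

  { name: name, type: :string, required: kind == :keyreq }
end

inferred.each { |param| puts param.inspect }
</code></pre>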

<p>Ruby method signatures don’t tell us JSON Schema types or descriptions, so if those matter, keep using <code>param</code>:</p>

<pre><code class="language-ruby">param :units, type: :string, desc: "metric or imperial", required: false
</code></pre>

<p>And when you need nested objects, arrays, enums, or full schema control, use <code>params</code>. Nothing changed there.</p>

<p>Also:</p>

<ul>
  <li><code>desc</code> is now an alias for <code>description</code></li>
  <li><code>param</code> accepts <code>description:</code> as an alias for <code>desc:</code></li>
  <li>the tool generator now emits <code>desc</code></li>
  <li>we retain full backwards compatibility!</li>
</ul>

<h2 id="callbacks-that-stack">Callbacks That Stack</h2>

<p>The old <code>on_*</code> callbacks were replace-style callbacks. Register another one and you replaced the previous one.</p>

<p>That caused an obvious problem: Rails persistence wants callbacks, and your app also wants callbacks. Logging wants callbacks. Analytics wants callbacks. Replacing the previous callback is the wrong default.</p>

<p>So 1.15 adds additive callbacks:</p>

<pre><code class="language-ruby">chat.before_message { ... }
chat.after_message { |message| ... }
chat.before_tool_call { |tool_call| ... }
chat.after_tool_result { |result| ... }
</code></pre>

<p>Register five callbacks, all five run.</p>
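<p>The difference between the two styles is easy to show in plain Ruby. <code>CallbackSketch</code> below is a hypothetical registry written for this illustration. It takes explicit procs to keep the sketch small and does not mirror RubyLLM’s internals.</p>

<pre><code class="language-ruby"># Hypothetical registry; class and method names are illustrative only.
class CallbackSketch
  def initialize
    @replace_style = nil
    @additive = []
  end

  # Replace-style: the newest callable overwrites the previous one.
  def on_message(callable)
    @replace_style = callable
  end

  # Additive: every callable is kept and runs in registration order.
  def after_message(callable)
    @additive.push(callable)
  end

  def deliver(message)
    @replace_style.call(message) if @replace_style
    @additive.each { |callback| callback.call(message) }
  end
end

replaced_log = []
additive_log = []
hooks = CallbackSketch.new

hooks.on_message(proc { |m| replaced_log.push("persist #{m}") })
hooks.on_message(proc { |m| replaced_log.push("analytics #{m}") }) # persist is lost

hooks.after_message(proc { |m| additive_log.push("persist #{m}") })
hooks.after_message(proc { |m| additive_log.push("analytics #{m}") })

hooks.deliver("hello")
puts replaced_log.inspect
puts additive_log.inspect
</code></pre>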

<p>Rails persistence uses these internally now. Your app can layer its own callbacks on top without breaking persistence.</p>

<p>The old <code>on_*</code> callbacks are deprecated. They’ll go away in RubyLLM 2.0.</p>

<h2 id="rails-fixes">Rails Fixes</h2>

<p>Rails got a lot of boring, important fixes:</p>

<ul>
  <li>Action Text-backed message content is converted to plain text before being sent to the model.</li>
  <li>ActiveRecord support no longer sits in the core gem eager-load path, fixing standalone <code>require "ruby_llm"</code> with Zeitwerk eager loading.</li>
  <li>The <code>acts_as</code> API follows Rails association inference more closely.</li>
  <li>Existing Active Storage blobs and attachments passed through <code>with:</code> are reused instead of downloaded and re-uploaded.</li>
</ul>

<h2 id="providers-and-models">Providers and Models</h2>

<p>Empty tool results are now handled consistently across Anthropic, Bedrock, and Gemini. When a tool returns nothing, RubyLLM sends a small placeholder instead of provider-invalid empty content.</p>

<p>Streaming and non-streaming token usage is normalized across OpenAI, OpenRouter, Bedrock, and Gemini before cost calculation.</p>

<p>The model registry has been refreshed too: cache read/write pricing, reasoning output pricing, GPT Image pricing, and new aliases including Claude Opus 4.7, DeepSeek V4, Gemini Embedding 2, Gemma 4, and GPT-5.5.</p>

<h2 id="use-it">Use It</h2>

<pre><code class="language-ruby">gem 'ruby_llm', '~&gt; 1.15'
</code></pre>

<p>Then:</p>

<pre><code class="language-bash">bundle update ruby_llm
</code></pre>

<p>Full release notes on <a href="https://github.com/crmne/ruby_llm/releases/tag/1.15.0">GitHub</a>.</p>

]]>
</content:encoded>

</item>



<item>

<title>kamal-backup: Scheduled Rails Backups for Kamal Apps</title>

<link>https://paolino.me/kamal-backup/</link>

<guid isPermaLink="true">
https://paolino.me/kamal-backup/
</guid>

<pubDate>Tue, 05 May 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
I released kamal-backup today.

]]>
</description>



<content:encoded>
<![CDATA[

<p>I released <a href="https://kamal-backup.dev">kamal-backup</a> today.</p>

<p>I run <a href="https://chatwithwork.com">Chat with Work</a> on Kamal, and I needed backups. There are already Kamal accessories for database backups. None of them also back up Active Storage. None use restic, so encryption, deduplication, and repository checks are on you. None ship a CLI with restores and drills. None produce evidence you can hand a security reviewer.</p>

<p>So I built one.</p>

<h2 id="a-gem-and-a-docker-image">A gem and a Docker image</h2>

<p><code>kamal-backup</code> is two pieces: a Ruby gem you add to your Rails app, and a Docker image you boot as a Kamal accessory. They point at a restic repository you bring yourself.</p>

<p>The gem is your CLI. Local commands run directly on your machine using restic. Production-side commands shell out through Kamal into the accessory. The same <code>kamal-backup</code> binary covers setup (<code>init</code>, <code>validate</code>), on-demand operations (<code>backup</code>, <code>list</code>, <code>check</code>), data movement (<code>restore local</code>, <code>restore production</code>), verification (<code>drill local</code>, <code>drill production</code>), and audit (<code>evidence</code>).</p>

<p>The Docker image (<code>ghcr.io/crmne/kamal-backup</code>) ships with <code>restic</code>, <code>pg_dump</code>, <code>mariadb-dump</code>/<code>mysqldump</code>, and <code>sqlite3</code> baked in. The default container command is <code>kamal-backup schedule</code>, a loop that fires every <code>backup_schedule_seconds</code> and writes one database snapshot and one Active Storage file snapshot per run.</p>

<p>The restic repository is where the encrypted snapshots end up: S3-compatible object storage, a restic REST server, or a filesystem path. <code>kamal-backup</code> points at it. It doesn’t run it for you.</p>

<h2 id="why-restic">Why restic</h2>

<p>I didn’t want to invent a backup format, and I didn’t want to bolt encryption and deduplication onto shell scripts. Restic does what I needed:</p>

<ul>
  <li>encrypted repositories by default;</li>
  <li>a tag system, so the database dump and the Active Storage tree from the same run share a <code>run:&lt;timestamp&gt;</code> and pair up at restore time;</li>
  <li>deduplication across runs, so a year of daily backups doesn’t grow linearly;</li>
  <li><code>restic forget --prune</code> for retention;</li>
  <li><code>restic check</code> for repository health;</li>
  <li>S3-compatible storage, a restic REST server, or a local filesystem path, so you host the repository wherever fits.</li>
</ul>

<p>It’s a single binary that drops cleanly into a Docker image, alongside the database client tools. Nothing extra to install on the Rails host. <code>kamal-backup</code> is the Rails- and Kamal-shaped layer on top, and restic does the cryptography, the storage, and the integrity checks.</p>

<h2 id="setting-it-up">Setting it up</h2>

<p>Add the gem in development:</p>

<pre><code class="language-ruby"># Gemfile
group :development do
  gem "kamal-backup"
end
</code></pre>

<p>Run <code>init</code>. It creates <code>config/kamal-backup.yml</code> and prints an accessory block you paste into your Kamal deploy config:</p>

<pre><code class="language-sh">bundle install
bundle exec kamal-backup init
</code></pre>

<p><code>config/kamal-backup.yml</code> holds the backup settings:</p>

<pre><code class="language-yaml">accessory: backup
app_name: chatwithwork
database_adapter: postgres
database_url: postgres://chatwithwork@chatwithwork-db:5432/chatwithwork_production
backup_paths:
  - /data/storage
restic_repository: s3:https://s3.example.com/chatwithwork-backups
restic_init_if_missing: true
backup_schedule_seconds: 86400
</code></pre>

<p>Kamal mounts that file read-only into the accessory, so the accessory block in <code>config/deploy.yml</code> stays small. Only secrets live in <code>env</code>:</p>

<pre><code class="language-yaml">accessories:
  backup:
    image: ghcr.io/crmne/kamal-backup:latest
    host: chatwithwork.com
    files:
      - config/kamal-backup.yml:/app/config/kamal-backup.yml:ro
    env:
      secret:
        - PGPASSWORD
        - RESTIC_PASSWORD
        - AWS_ACCESS_KEY_ID
        - AWS_SECRET_ACCESS_KEY
    volumes:
      - "chatwithwork_storage:/data/storage:ro"
      - "chatwithwork_backup_state:/var/lib/kamal-backup"
</code></pre>

<p>Validate, boot, and watch the logs:</p>

<pre><code class="language-sh">bundle exec kamal-backup validate
bin/kamal accessory boot backup
bin/kamal accessory logs backup
</code></pre>

<p><code>validate</code> catches missing required settings before the accessory has to be running. Once it’s up, the container loops on <code>kamal-backup schedule</code>.</p>

<p>Then run the first backup and print evidence:</p>

<pre><code class="language-sh">bundle exec kamal-backup backup
bundle exec kamal-backup list
bundle exec kamal-backup evidence
</code></pre>

<p>No cron glue. No separate backup host. No “remember to install restic on production.” The accessory image already has it.</p>

<h2 id="rails-data-not-just-a-database-dump">Rails data, not just a database dump</h2>

<p>A Rails app has two things worth backing up: the database, and file-backed Active Storage. <code>kamal-backup</code> handles both.</p>

<p>Postgres uses <code>pg_dump</code>. MySQL and MariaDB use <code>mariadb-dump</code> or <code>mysqldump</code>. SQLite uses <code>sqlite3 .backup</code>. File-backed Active Storage uses <code>restic backup</code> from mounted volumes.</p>

<p>Each run writes one database snapshot and one file snapshot, both tagged with <code>app:&lt;name&gt;</code>, <code>type:database</code> or <code>type:files</code>, and the same <code>run:&lt;timestamp&gt;</code>. You pair them at restore time using that timestamp.</p>
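<p>The pairing step can be sketched in a few lines of Ruby. The snapshot records, ids, app name, and timestamp format below are made up for illustration. The sketch assumes the <code>run:</code> timestamps sort lexicographically, and it is not how <code>kamal-backup</code> itself is implemented.</p>

<pre><code class="language-ruby"># Hypothetical snapshot records; ids and timestamps are invented.
snapshots = [
  { id: "a1", tags: ["app:demo", "type:database", "run:20260501T000000Z"] },
  { id: "b2", tags: ["app:demo", "type:files",    "run:20260501T000000Z"] },
  { id: "c3", tags: ["app:demo", "type:database", "run:20260502T000000Z"] },
  { id: "d4", tags: ["app:demo", "type:files",    "run:20260502T000000Z"] }
]

# Group snapshots by their shared run tag, then pick the newest run.
runs = snapshots.group_by { |s| s[:tags].find { |t| t.start_with?("run:") } }
latest_run, pair = runs.max_by { |run_tag, _| run_tag }

# Index the pair by type so the database dump and the files line up.
by_type = pair.to_h do |s|
  type = s[:tags].find { |t| t.start_with?("type:") }.delete_prefix("type:")
  [type, s[:id]]
end

puts latest_run
puts by_type.inspect
</code></pre>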

<p>If your app stores Active Storage blobs directly in S3, there’s no mounted path for <code>backup_paths</code> to capture. <code>kamal-backup</code> still covers the database. The S3 side is on your bucket lifecycle and replication settings.</p>

<h2 id="restores-are-part-of-the-product">Restores are part of the product</h2>

<p>The backup script is the easy part. The restore path is where most setups fail.</p>

<p>So <code>kamal-backup</code> ships with restore commands:</p>

<pre><code class="language-sh">bundle exec kamal-backup restore local
bundle exec kamal-backup restore production
</code></pre>

<p><code>restore local</code> pulls a production backup down to your laptop. Useful when you want to inspect real data, reproduce a production bug, or prove the backup actually comes back.</p>

<p><code>restore production</code> prompts before it overwrites anything.</p>

<h2 id="restore-drills">Restore drills</h2>

<p>The command I care about most is <code>drill</code>.</p>

<pre><code class="language-sh">bundle exec kamal-backup drill local \
  --check "bin/rails runner 'puts User.count'"
</code></pre>

<p>A drill means: restore, check, record the result.</p>

<p>Two modes:</p>

<ul>
  <li><code>drill local</code> restores onto your machine and runs an optional check.</li>
  <li><code>drill production</code> restores into scratch production-side targets, never the live database.</li>
</ul>

<p>That second one matters. For Postgres and MySQL, you give it a scratch database. For SQLite, a scratch file path. For Active Storage, a scratch restore directory. The drill uses production infrastructure, without pointing at live production.</p>

<p>That’s the difference between “the backup ran” and “we restored the latest production snapshot into a scratch target on April 30, ran this check, and it passed.”</p>

<h2 id="evidence-for-reviews">Evidence for reviews</h2>

<p>I went through a security review for <a href="https://chatwithwork.com">Chat with Work</a> this year. The questions were fair:</p>

<ul>
  <li>What’s being backed up?</li>
  <li>Where does it go?</li>
  <li>Is it encrypted?</li>
  <li>When did the last backup run?</li>
  <li>When did the last repository check run?</li>
  <li>When was the last restore drill?</li>
  <li>Can you prove all of that without leaking secrets?</li>
</ul>

<p><code>kamal-backup evidence</code> prints redacted JSON: current backup settings, latest snapshots, latest restic check, latest restore drill, retention settings, tool versions.</p>

<pre><code class="language-sh">bundle exec kamal-backup evidence
</code></pre>

<p>Secrets are redacted. The output is meant to land in an internal ops record or a CASA packet. Not a screenshot of a green cron job. Actual evidence.</p>

<h2 id="try-it">Try it</h2>

<pre><code class="language-ruby"># Gemfile
gem "kamal-backup"
</code></pre>

<p>Docs at <a href="https://kamal-backup.dev">kamal-backup.dev</a>, source on <a href="https://github.com/crmne/kamal-backup">GitHub</a>.</p>

]]>
</content:encoded>

</item>



<item>

<title>Ruby Concurrency: What Actually Happens</title>

<link>https://paolino.me/ruby-concurrency-what-actually-happens/</link>

<guid isPermaLink="true">
https://paolino.me/ruby-concurrency-what-actually-happens/
</guid>

<pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
Since I wrote about async Ruby and patched Solid Queue to support fibers, people keep asking the same questions. What happens when a fiber blocks? Don’t you still need threads? What about database transactions? What about Ractors?


]]>
</description>



<content:encoded>
<![CDATA[

<p>Since I wrote about <a href="/async-ruby-is-the-future/">async Ruby</a> and <a href="/solid-queue-doesnt-need-a-thread-per-job/">patched Solid Queue to support fibers</a>, people keep asking the same questions. What happens when a fiber blocks? Don’t you still need threads? What about database transactions? What about Ractors?</p>

<p>This post answers all of it. From the ground up.</p>

<h2 id="the-four-primitives">The four primitives</h2>

<p>Ruby gives you four concurrency primitives: processes, threads, fibers, and Ractors. They nest. Every process has an implicit “main Ractor” where your code runs by default, so you never have to think about Ractors unless you explicitly create one. Without Ractors, the hierarchy is simply process – threads – fibers. With Ractors, it becomes:</p>

<div class="mermaid">
graph TD
    P[Process] --&gt; R1["Ractor 1 (GVL 1)"]
    P --&gt; R2["Ractor 2 (GVL 2)"]
    R1 --&gt; T1[Thread 1]
    R1 --&gt; T2[Thread 2]
    R2 --&gt; T3[Thread 3]
    T1 --&gt; F1[Fiber A]
    T1 --&gt; F2[Fiber B]
    T2 --&gt; F3[Fiber C]
    T3 --&gt; F4[Fiber D]
    T3 --&gt; F5[Fiber E]
    style P fill:#4a90a4,color:#fff
    style R1 fill:#c084fc,color:#fff
    style R2 fill:#c084fc,color:#fff
    style T1 fill:#7fb069,color:#fff
    style T2 fill:#7fb069,color:#fff
    style T3 fill:#7fb069,color:#fff
    style F1 fill:#e8a87c,color:#fff
    style F2 fill:#e8a87c,color:#fff
    style F3 fill:#e8a87c,color:#fff
    style F4 fill:#e8a87c,color:#fff
    style F5 fill:#e8a87c,color:#fff
</div>

<p>Think of your computer as an office building.</p>

<p><strong>Processes</strong> are fully isolated: separate offices, each with its own locked door, furniture, and files. Each process has its own memory, its own Ruby VM, and its own GVL. When you run Puma with 3 workers, you get 3 processes. They can’t corrupt each other’s state because they don’t share memory. The OS schedules them independently. The cost: each one loads your entire application into memory.</p>

<p><strong>Ractors</strong> sit between processes and threads: offices that share a mailroom but not their filing cabinets. Each Ractor has its own GVL, so threads in different Ractors can execute Ruby code truly in parallel, but they can only pass notes to each other – no shared mutable objects. You communicate via message passing, copying or moving data between them. Every Ruby process has a “main Ractor” where all your code runs by default. Creating additional Ractors is opt-in.</p>

<p><strong>Threads</strong> live inside a process and share its memory: workers sharing the same office, accessing the same filing cabinets, coordinating to avoid collisions. In CRuby, they are native threads, with the GVL deciding which one can execute Ruby code at a time. You don’t control when Ruby switches between them. The GVL releases during I/O, so two threads can wait on two different network calls simultaneously, but they can’t crunch numbers at the same time.</p>

<p><strong>Fibers</strong> live inside a thread and are cooperatively scheduled: multiple tasks juggled by one worker at their desk. When a task is waiting for something – a phone call, a fax, a response – the worker sets it aside and picks up the next one. A fiber runs until it explicitly yields. When it hits I/O – a network call, a database query, reading a file – it yields to the reactor, and another fiber picks up. No OS thread context switch for the fiber itself, no preemption. One thread can run thousands of fibers.</p>
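<p>The “runs until it explicitly yields” part can be seen with core Ruby alone. In this minimal sketch there is no reactor: the manual <code>resume</code> calls stand in for a scheduler, and the bare <code>Fiber.yield</code> stands in for waiting on I/O. The <code>worker</code> helper and the log messages are made up for illustration.</p>

<pre><code class="language-ruby">log = []

# Build a fiber that does some work, yields, then finishes when resumed.
worker = lambda do |name|
  Fiber.new do
    log.push("#{name}: start request")
    Fiber.yield # hand control back, as if waiting on I/O
    log.push("#{name}: handle response")
  end
end

a = worker.call("A")
b = worker.call("B")

a.resume # A runs until its yield
b.resume # B runs while A is "waiting"
a.resume # A picks up exactly where it left off
b.resume # B finishes too

puts log
</code></pre>

<p>Nothing interrupts a fiber between its <code>resume</code> and its next yield, which is what makes the switch points predictable.</p>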

<p>Here’s what that means for cost:</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>Process</th>
      <th>Ractor</th>
      <th>Thread</th>
      <th>Fiber</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Memory</td>
      <td>full app copy</td>
      <td>~thread + Ractor state</td>
      <td>~8MB virtual stack reservation</td>
      <td>~4KB initial virtual stack, grows as needed</td>
    </tr>
    <tr>
      <td>Creation time</td>
      <td>~ms</td>
      <td>~80μs</td>
      <td>~80μs</td>
      <td>~3μs</td>
    </tr>
    <tr>
      <td>Context switch</td>
      <td>kernel</td>
      <td>kernel (threads within)</td>
      <td>~1.3μs (kernel)</td>
      <td>~0.1μs (userspace)</td>
    </tr>
    <tr>
      <td>Isolation</td>
      <td>Full (own memory)</td>
      <td>Share-nothing (messages)</td>
      <td>Shared memory</td>
      <td>Shared thread</td>
    </tr>
    <tr>
      <td>Parallelism</td>
      <td>Yes</td>
      <td>Yes (own GVL)</td>
      <td>No (shared GVL)</td>
      <td>No</td>
    </tr>
    <tr>
      <td>I/O concurrency</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Rails compatible</td>
      <td>Yes</td>
      <td>No</td>
      <td>Yes</td>
      <td>Yes</td>
    </tr>
  </tbody>
</table>

<p>Creation and switching benchmarks are from <a href="https://github.com/socketry/performance/tree/adfd780c6b4842b9534edfa15e383e5dfd4b4137/fiber-vs-thread">Samuel Williams’ fiber-vs-thread performance comparison</a>. Fibers create roughly 20x faster and switch roughly 10x faster than threads. The memory row is about virtual address space reserved by the platform/runtime, not resident memory. The benchmark reports actual RSS, where the gap is much smaller than the virtual stack numbers suggest. But the shape is still real: each thread is a kernel object with scheduler state and a stack reservation, while each fiber is scheduled in userspace. Ractors give you parallelism too, but can’t run Rails. Everything is a tradeoff.</p>

<h2 id="how-scheduling-works">How scheduling works</h2>

<p>This is where most of the confusion lives. Let me show you what actually happens.</p>

<h3 id="thread-scheduling">Thread scheduling</h3>

<p>CRuby threads are native threads, but the GVL decides which one can run Ruby code. Your code has no say. A thread can be paused mid-calculation, mid-assignment, mid-anything.</p>

<div class="mermaid">
sequenceDiagram
    participant VM as CRuby / OS
    participant T1 as Thread 1
    participant T2 as Thread 2
    participant LLM as LLM API

    VM-&gt;&gt;T1: Run
    T1-&gt;&gt;LLM: Send request
    Note over T1: Blocks in I/O (parked)
    VM-&gt;&gt;T2: Run
    T2-&gt;&gt;LLM: Send request
    Note over T2: Blocks in I/O (parked)
    Note over VM: Both threads parked
    LLM--&gt;&gt;T1: Response ready
    LLM--&gt;&gt;T2: Response ready
    VM-&gt;&gt;T1: Wake and run
    Note over T1: Processing response
    VM-&gt;&gt;VM: Time slice expired
    VM-&gt;&gt;T2: Preempt T1, run T2
    Note over T2: Processing response
    VM-&gt;&gt;VM: Time slice expired
    VM-&gt;&gt;T1: Resume T1
    Note over T1: Finish response
    VM-&gt;&gt;T2: Resume T2
    Note over T2: Finish response
</div>

<p>CRuby can switch runnable threads on a time slice, but a thread blocked in I/O is parked until the socket is ready. That part matters: threads do not spin uselessly while waiting for tokens. The switch happens when a thread is runnable – including in the middle of response processing, object allocation, assignment, or any other Ruby code.</p>
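
<p>A minimal sketch of that parking behavior, using only the standard library. <code>sleep</code> stands in for a socket wait here (an assumption for illustration; a real LLM call would block in I/O instead), and the thread count and 0.2-second wait are arbitrary:</p>

<pre><code class="language-ruby"># sleep stands in for waiting on a socket. Like blocking I/O,
# it releases the GVL so other threads can run in the meantime.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 5.times.map do |i|
  Thread.new do
    sleep 0.2          # "waiting for tokens"
    "response #{i}"    # "processing the response"
  end
end

responses = threads.map { |thread| thread.value }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts responses.size
puts format("%.2fs elapsed", elapsed)
</code></pre>

<p>The five waits overlap, so the wall time should land near 0.2 seconds rather than 1 second. The cost of this shape is not wasted CPU; it is one kernel thread per wait.</p>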

<p>For two threads doing I/O, this works fine. The overhead is noise. For 200 threads mostly waiting for LLM tokens, the problem is the one-operation-per-thread shape: 200 kernel threads, 200 stack reservations, 200 scheduler entries, and usually 200 copies of whatever per-thread application resources the worker holds.</p>

<p>This is also why a worker limit means different things in Solid Queue’s current thread mode and in the fiber mode from my patch. <code>threads: 25</code> is both “run 25 jobs at once” and “create 25 kernel threads.” If all 25 jobs are streaming tokens, job 26 waits. <code>fibers: 250</code> is mostly an admission limit for the reactor: run up to 250 jobs as fibers on the same thread, park the ones waiting on I/O, and resume them when ready. You still need limits because APIs, sockets, memory, and databases have limits. But the cap is no longer tied to one kernel thread per job.</p>

<h3 id="cooperative-scheduling-fibers">Cooperative scheduling (fibers)</h3>

<p>Fibers switch only when they choose to. In practice, the <a href="https://github.com/socketry/async">async</a> gem makes this automatic: your code yields at I/O boundaries without you writing anything special.</p>

<div class="mermaid">
sequenceDiagram
    participant R as Reactor
    participant F1 as Fiber 1
    participant F2 as Fiber 2
    participant LLM as LLM API

    R-&gt;&gt;F1: Run
    F1-&gt;&gt;LLM: Send request
    Note over F1: Yields (I/O wait)
    R-&gt;&gt;F2: Run
    F2-&gt;&gt;LLM: Send request
    Note over F2: Yields (I/O wait)
    Note over R: Both waiting, reactor sleeps
    LLM--&gt;&gt;F1: Response ready
    R-&gt;&gt;F1: Resume immediately
    Note over F1: Processes response
    F1-&gt;&gt;R: Done
    LLM--&gt;&gt;F2: Response ready
    R-&gt;&gt;F2: Resume immediately
    Note over F2: Processes response
    F2-&gt;&gt;R: Done
</div>

<p>No OS thread context switch per fiber. No timer-based preemption between fibers. When a fiber yields, the reactor checks which fibers have I/O ready and resumes them. When nothing is ready, the reactor sleeps in the OS until something is. The kernel still does the I/O readiness work; Ruby just avoids one kernel thread per wait.</p>
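
<p>The same timeline can be sketched with plain <code>Fiber</code> objects and no gems. This is a hand-rolled stand-in for the reactor, not how the async gem is implemented: <code>Fiber.yield</code> marks the spot where a real fiber would wait on I/O, and the two <code>resume</code> passes play the role of the event loop.</p>

<pre><code class="language-ruby"># Plain Fibers, no scheduler: every switch below is explicit.
log = []

fibers = [1, 2].map do |n|
  Fiber.new do
    log.push("fiber #{n}: send request")
    Fiber.yield                       # hand control back voluntarily
    log.push("fiber #{n}: process response")
  end
end

fibers.each { |fiber| fiber.resume }  # both send their request, then yield
fibers.each { |fiber| fiber.resume }  # "responses ready": both finish

puts log
</code></pre>

<p>Both requests are logged before either response, and each fiber runs without interruption between its own switch points.</p>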

<h2 id="the-gvl-why-threads-and-fibers-are-more-similar-than-you-think">The GVL: why threads and fibers are more similar than you think</h2>

<p>This is the part that makes thread-based Ruby less different from fiber-based Ruby than it first looks.</p>

<p>The GVL means only one thread can execute Ruby code at a time. Threads overlap only while the GVL is released, which for most application code means while they are blocked in I/O. So if your workload is I/O-bound – HTTP calls, database queries, LLM streaming – threads give you I/O concurrency, not parallelism.</p>

<p>Fibers give you the same I/O concurrency. One fiber yields at I/O, another picks up. The difference: fibers do it without kernel thread overhead, without the memory cost of a thread stack, and without making job concurrency itself imply one worker thread or one database slot per job.</p>

<p>If threads only help with I/O anyway, why pay their overhead?</p>

<p>There is one case where threads win: CPU-bound work that releases the GVL. Some C extensions (image processing, cryptographic operations) release the GVL while doing heavy computation. Multiple threads can then run those C extensions in parallel. Fibers can’t do that. They share a thread.</p>

<p>For actual Ruby-level CPU parallelism, you need processes or <a href="#why-not-ractors">Ractors</a>. Processes are production-ready and Rails-compatible. Ractors are lighter than processes, but still experimental.</p>

<h2 id="what-happens-when-a-fiber-hits-io">What happens when a fiber hits I/O</h2>

<p>This is the happy path and the most common question.</p>

<pre><code class="language-ruby"># Inside a fiber
response = Net::HTTP.get(URI("https://api.example.com/v1/completions"))
</code></pre>

<p>Here’s the full chain:</p>

<ol>
  <li><code>Net::HTTP</code> opens a socket and sends the request</li>
  <li>The socket isn’t readable yet (the server hasn’t responded)</li>
  <li>Ruby calls <code>rb_io_wait</code> on the socket</li>
  <li>The async gem’s <code>Fiber.scheduler</code> intercepts this call</li>
  <li>The scheduler suspends the current fiber and registers the socket with the event loop</li>
  <li>The reactor runs other fibers while this one sleeps</li>
  <li>When the socket becomes readable, the reactor resumes this fiber</li>
  <li><code>Net::HTTP</code> reads the response as if nothing happened</li>
</ol>

<p>Your code doesn’t change. No <code>await</code>, no callbacks, no promises. The same <code>Net::HTTP.get</code> call that works in a thread works in a fiber. The yield is invisible.</p>

<p>Bob Nystrom called this <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">the function color problem</a> in 2015. In languages with async/await, every function is either sync or async. An async function can only be called with <code>await</code>, and <code>await</code> can only live inside another async function. The color spreads upward through your entire call stack.</p>

<p><strong>Python:</strong></p>

<pre><code class="language-python"># Python: the color spreads, and you need different libraries
async def get_user(id):
    async with aiohttp.ClientSession() as session:  # can't use requests
        response = await session.get(f"/users/{id}")  # must await
        return await response.json()                   # must await

async def handle_request():  # must be async because it calls get_user
    user = await get_user(1)  # must await
</code></pre>

<p>You can’t use <code>requests</code> in async Python without blocking the event loop. You need <code>aiohttp</code>, <code>httpx</code> in async mode, or a thread wrapper. You can’t use the blocking <code>psycopg2</code> API as async I/O; you need <code>asyncpg</code> or Psycopg’s async API. The ecosystem splits: sync libraries and async libraries, doing the same thing differently.</p>

<p><strong>JavaScript:</strong></p>

<pre><code class="language-javascript">// JavaScript: same problem, less severe (Node has fewer library splits)
async function getUser(id) {
  const response = await fetch(`/users/${id}`);  // must await
  return await response.json();                   // must await
}

async function handleRequest() {  // must be async
  const user = await getUser(1);  // must await
}
</code></pre>

<p><strong>Ruby:</strong></p>

<pre><code class="language-ruby"># Ruby: no color
def get_user(id)
  response = Net::HTTP.get(URI("https://api.example.com/users/#{id}"))  # just a normal call
  JSON.parse(response)                            # just a normal call
end

def handle_request
  user = get_user(1)  # just a normal call
end
</code></pre>

<p>Same <code>Net::HTTP</code>. Same <code>pg</code>. Same call stack, as long as the library uses scheduler-aware Ruby I/O. The fiber scheduler intercepts I/O at the Ruby runtime level, below your code. Your methods don’t know and don’t care whether they’re running in a thread or a fiber.</p>

<h2 id="what-happens-when-a-fiber-does-cpu-bound-work">What happens when a fiber does CPU-bound work</h2>

<pre><code class="language-ruby"># Inside a fiber
100_000.times { Digest::SHA256.hexdigest("work") }
</code></pre>

<p>This blocks the reactor. No other fiber runs until it finishes. There’s no I/O boundary to yield at, so the fiber holds the thread.</p>

<div class="mermaid">
sequenceDiagram
    participant R as Reactor
    participant F1 as Fiber 1 (CPU)
    participant F2 as Fiber 2 (I/O)

    R-&gt;&gt;F1: Run
    Note over F1,F2: F1 doing CPU work...
    Note over F2: Waiting to run
    Note over F1,F2: F1 still computing...
    Note over F2: Still waiting
    F1-&gt;&gt;R: Done
    R-&gt;&gt;F2: Finally runs
</div>

<p>This is not a bug. It’s the current tradeoff of cooperative scheduling. Fibers are designed for I/O-bound work; CPU-bound work belongs on a thread, where CRuby can preempt it.</p>
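
<p>Here is a small sketch of that starvation, again with plain fibers. The round-robin loop is a toy stand-in for the reactor (an assumption for illustration, not the async gem’s scheduler), and the iteration count is arbitrary:</p>

<pre><code class="language-ruby">require "digest"

log = []

cpu_fiber = Fiber.new do
  log.push("cpu: start")
  2_000.times { Digest::SHA256.hexdigest("work") }  # never yields
  log.push("cpu: done")
end

io_fiber = Fiber.new do
  2.times do |i|
    log.push("io: waiting #{i}")
    Fiber.yield                                      # cooperative switch point
  end
  log.push("io: done")
end

# Toy round-robin loop: resume each fiber in turn until all finish.
ready = [io_fiber, cpu_fiber]
until ready.empty?
  fiber = ready.shift
  fiber.resume
  ready.push(fiber) if fiber.alive?
end

puts log
</code></pre>

<p>Nothing from the I/O fiber appears between <code>cpu: start</code> and <code>cpu: done</code>. The loop cannot take control back until the CPU fiber returns it.</p>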

<p>With <a href="/solid-queue-doesnt-need-a-thread-per-job/">my fiber-mode patch for Solid Queue</a>, this is a configuration choice:</p>

<pre><code class="language-yaml">workers:
  - queues: [ chat, turbo, notifications ]
    fibers: 50       # I/O-bound: use fibers
  - queues: [ cpu ]
    threads: 2        # CPU-bound: use threads
</code></pre>

<p>One backend, two modes, matching the concurrency model to the workload.</p>

<h2 id="what-happens-when-a-fiber-queries-the-database">What happens when a fiber queries the database</h2>

<p>The <a href="https://github.com/ged/ruby-pg">pg gem</a> has supported <code>Fiber.scheduler</code> since v1.3.0. When a fiber executes a query, the pg gem sends it non-blockingly via <code>PQsendQuery</code>, then calls <code>rb_io_wait</code> on the PostgreSQL socket. The scheduler intercepts this, suspends the fiber, and lets others run while PostgreSQL processes the query.</p>

<pre><code class="language-ruby"># Inside a fiber
user = User.find(42)  # yields while waiting for PostgreSQL
</code></pre>

<p>The fiber yields. Other fibers run. When PostgreSQL responds, the reactor resumes the fiber. Your code doesn’t know the difference.</p>

<h3 id="pool-size-follows-database-work">Pool size follows database work</h3>

<p>A database connection is busy until its query finishes. While PostgreSQL works, Ruby can run something else – another thread, or another fiber on the reactor – but that connection stays checked out.</p>

<p>For an LLM job, most of the wall time is not database time. Read a row, call an API, stream tokens, write a status update. The database touches are short. The long waits are external HTTP. So 100 jobs in flight does not mean 100 jobs hitting PostgreSQL at the same instant.</p>

<p>The reactor never preempts a fiber – it only switches when a fiber yields at an I/O boundary:</p>

<div class="mermaid">
sequenceDiagram
    participant R as Reactor
    participant F1 as Fiber A
    participant F2 as Fiber B
    participant Pool as DB Pool (1 conn)
    participant PG as PostgreSQL
    participant HTTP as HTTP API

    R-&gt;&gt;F1: Run
    F1-&gt;&gt;Pool: Check out
    F1-&gt;&gt;PG: SELECT * FROM users
    Note over F1: Yields (waiting for PG)
    R-&gt;&gt;F2: Run
    F2-&gt;&gt;HTTP: GET /api/data
    Note over F2: Yields (waiting for HTTP)
    PG--&gt;&gt;R: F1's result ready
    R-&gt;&gt;F1: Resume
    F1-&gt;&gt;Pool: Return
    F1-&gt;&gt;R: Done
    HTTP--&gt;&gt;R: F2's result ready
    R-&gt;&gt;F2: Resume
    F2-&gt;&gt;Pool: Check out
    F2-&gt;&gt;PG: UPDATE messages SET ...
    Note over F2: Yields (waiting for PG)
    PG--&gt;&gt;R: F2's result ready
    R-&gt;&gt;F2: Resume
    F2-&gt;&gt;Pool: Return
    F2-&gt;&gt;R: Done
</div>

<p>Read this as a timeline. Fiber A uses the only connection for its query. While PostgreSQL works, Fiber B waits on HTTP. After Fiber A returns the connection, Fiber B can use it for its update. If both fibers tried to query at the same time, one would wait unless the pool had another connection.</p>

<p>Active Record follows the same checkout rules in both cases. The current Solid Queue difference is a guardrail: thread mode expects <code>threads + 2</code> connections per process, so you don’t run 50 execution threads against a 5-connection pool. Fiber mode can use a smaller baseline because <code>fibers: 100</code> means “allow 100 jobs to wait,” not “create 100 execution threads.” In my patch, I/O-heavy workers often start at 3 connections per process (1 execution + 2 worker overhead). If the jobs are DB-heavy, raise it.</p>

<h2 id="what-happens-when-a-fiber-starts-a-transaction">What happens when a fiber starts a transaction</h2>

<p>A transaction changes the timeline. The connection cannot be returned after each statement, because the transaction state lives on that connection.</p>

<p>When a fiber starts a transaction, it keeps its checked-out connection for the entire duration – from <code>BEGIN</code> to <code>COMMIT</code> or <code>ROLLBACK</code>. The connection is not released mid-transaction. Other fibers that need the database wait for the connection to be returned.</p>

<div class="mermaid">
sequenceDiagram
    participant R as Reactor
    participant F1 as Fiber A
    participant F2 as Fiber B
    participant Pool as DB Pool (1 conn)
    participant PG as PostgreSQL

    R-&gt;&gt;F1: Run
    F1-&gt;&gt;Pool: Check out
    F1-&gt;&gt;PG: BEGIN
    F1-&gt;&gt;PG: UPDATE accounts SET ...
    Note over F1: Yields (waiting for PG)
    R-&gt;&gt;F2: Run
    F2-&gt;&gt;Pool: Check out
    Note over F2: Waits (connection held by F1)
    PG--&gt;&gt;F1: Result
    R-&gt;&gt;F1: Resume
    F1-&gt;&gt;PG: COMMIT
    F1-&gt;&gt;Pool: Return
    F1-&gt;&gt;R: Done
    Pool-&gt;&gt;F2: Connection available
    F2-&gt;&gt;PG: SELECT * FROM accounts
    Note over F2: Yields (waiting for PG)
    PG--&gt;&gt;F2: Result
    R-&gt;&gt;F2: Resume
    F2-&gt;&gt;Pool: Return
    F2-&gt;&gt;R: Done
</div>

<p>Under fiber isolation (<code>config.active_support.isolation_level = :fiber</code>), Active Support’s execution state is fiber-scoped, so Active Record’s lease is associated with the current fiber instead of the surrounding thread. The connection still gets a real <code>Monitor</code> lock. No other fiber can touch it during a transaction.</p>

<p>Safe. No interleaving. Fiber B just waits.</p>
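
<p>A toy model of that timeline, with loud caveats: an <code>Array</code> plays the one-connection pool and <code>Fiber.yield</code> plays “waiting for PostgreSQL.” None of this is Active Record’s implementation; it only shows the ordering.</p>

<pre><code class="language-ruby"># Toy model only: not Active Record, not a real connection pool.
pool = [:conn]
log = []

fiber_a = Fiber.new do
  conn = pool.pop
  log.push("A: BEGIN")
  Fiber.yield                 # waiting on the UPDATE, connection still held
  log.push("A: COMMIT")
  pool.push(conn)             # returned only after the transaction ends
end

fiber_b = Fiber.new do
  while pool.empty?
    log.push("B: waits for a connection")
    Fiber.yield
  end
  conn = pool.pop
  log.push("B: SELECT")
  pool.push(conn)
end

fiber_a.resume   # A begins and yields mid-transaction
fiber_b.resume   # B finds the pool empty and waits
fiber_a.resume   # A commits and returns the connection
fiber_b.resume   # B can now check out and query

puts log
</code></pre>

<p>Fiber B’s query lands strictly after Fiber A’s <code>COMMIT</code>, even though A yielded in the middle of its transaction.</p>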

<p>For the target workload – LLM streaming, HTTP calls – database touches are short reads and status updates. Transactions are brief. The wait is negligible. If your jobs run long transactions, those jobs belong on a thread-based worker.</p>

<h2 id="what-happens-when-you-have-too-many-fibers">What happens when you have too many fibers</h2>

<p>Fibers aren’t free. Each one uses memory (~4KB), and each one might hold open connections to external services. If you spawn 10,000 fibers that all hit the same API, you’re opening 10,000 connections to that API. The API will not be happy.</p>

<p>Async doesn’t eliminate resource limits; it changes where they show up. With threads, the limit is explicit: 25 threads, 25 concurrent jobs. With fibers, the limit is implicit: you keep going until something else breaks.</p>

<p>The fix is a semaphore. The <code>FiberPool</code> in my Solid Queue patch uses one:</p>

<pre><code class="language-ruby">semaphore = Async::Semaphore.new(size)

# Only `size` fibers run concurrently
semaphore.async do
  perform_job
end
</code></pre>

<p>When you configure <code>fibers: 100</code> with the patch, that’s not “unlimited fibers.” It’s a semaphore capping concurrency at 100. You control the ceiling.</p>
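
<p>The capping behavior itself can be sketched without the async gem. This stand-in uses threads and a standard-library <code>SizedQueue</code> as a counting semaphore (an assumption for illustration; the patch uses <code>Async::Semaphore</code> with fibers), and the limit, job count, and sleep are arbitrary:</p>

<pre><code class="language-ruby">limit = 3
slots = SizedQueue.new(limit)   # push blocks once `limit` tokens are out
mutex = Mutex.new
in_flight = 0
peak = 0

workers = 10.times.map do
  Thread.new do
    slots.push(:token)          # acquire a slot
    mutex.synchronize do
      in_flight += 1
      peak = [peak, in_flight].max
    end
    sleep 0.02                  # stand-in for perform_job
    mutex.synchronize { in_flight -= 1 }
    slots.pop                   # release the slot
  end
end

workers.each { |worker| worker.join }
puts "peak concurrency: #{peak} (limit #{limit})"
</code></pre>

<p>Ten jobs are submitted, but the peak never exceeds the limit. The rest wait at the door, which is what an admission limit means.</p>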

<h2 id="why-not-just-configure-more-solid-queue-threads">“Why not just configure more Solid Queue threads?”</h2>

<p>In plain Ruby, more threads can be reasonable. In Solid Queue thread mode, <code>threads: 200</code> means more than “allow 200 jobs to wait on I/O.”</p>

<p><strong>Kernel threads are the expensive unit.</strong> Fibers don’t make I/O complete faster; they let you wait on far more of it at once for a fraction of the cost. <a href="https://github.com/socketry/performance/tree/adfd780c6b4842b9534edfa15e383e5dfd4b4137/fiber-vs-thread">Samuel Williams’ benchmarks</a> show fibers allocate 20x faster (~3μs vs ~80μs) and switch 10x faster (~0.1μs vs ~1.3μs) than threads. The OS can manage thousands of threads, but scheduler state, stack reservations, wakeups, and GVL coordination make that a poor default concurrency knob.</p>

<p><strong>Solid Queue currently enforces a database-pool guard.</strong> Today it expects <code>threads + 2</code> database connections per process, so <code>threads: 200</code> with 2 processes won’t boot unless each process has a pool of at least 202, which is 404 connections in total. That guard may be conservative for I/O-heavy jobs; <a href="https://github.com/rails/solid_queue/issues/736">there’s an open issue</a> about making it advisory or bypassable. But it is still a guard you hit today.</p>

<p><strong>A blocked job still occupies its worker thread.</strong> The OS can park an LLM streaming thread until the socket is ready, but in Solid Queue thread mode it still consumes one of the configured thread workers. If all 25 are streaming tokens, job 26 waits.</p>

<p>Fibers make the Solid Queue limit mean “how many jobs may wait at once” instead of “how many kernel threads should exist.” They still need limits, but the limit is no longer one kernel thread per waiting job.</p>

<h2 id="why-not-ractors">“Why not Ractors?”</h2>

<p>Ractors solve a different problem. Fibers give you I/O concurrency – many things waiting at once. Ractors give you CPU parallelism – many things computing at once.</p>

<p>Here’s what they look like:</p>

<pre><code class="language-ruby"># Two Ractors computing fibonacci in parallel
r1 = Ractor.new { fibonacci(38) }
r2 = Ractor.new { fibonacci(38) }

r1.value  # Ruby 4.0+
r2.value  # Both ran in parallel, each with their own GVL
</code></pre>

<p>Each Ractor has its own GVL, so they can execute Ruby code truly in parallel across CPU cores. The tradeoff: strict isolation. You can only share immutable (frozen) objects. Everything else gets copied or moved between Ractors via message passing. Access a mutable variable from an outer scope? <code>Ractor::IsolationError</code>.</p>

<p>When Ractors win, they win big. Fibonacci(38) five times: 0.68s with Ractors vs 2.26s sequential. 3.3x speedup. Real parallelism.</p>

<p>But they are not a practical answer for Rails jobs yet:</p>

<ul>
  <li><strong>Still experimental in Ruby 4.0.</strong> Creating a Ractor still emits the experimental API warning.</li>
  <li><strong>Many gems don’t work without changes.</strong> Gems that rely on mutable constants, global variables, class variables, or shared process state can hit <code>Ractor::IsolationError</code>.</li>
  <li><strong>No Rails integration.</strong> ActiveRecord, ActionCable, the router, the logger – Rails is built on shared mutable state. None of it runs inside a Ractor.</li>
  <li><strong>No Ractor-based job queue exists.</strong></li>
  <li><strong>Still active bug surface.</strong> The Ruby bug tracker still has Ractor-related issues, including recent crash reports.</li>
</ul>

<p>For I/O concurrency, Ractors don’t help at all. Each Ractor still has threads constrained by its own GVL. Fibers within those threads still do the actual I/O multiplexing. Ractors add CPU parallelism, which is not what LLM streaming needs.</p>

<p>For Rails jobs that need CPU parallelism today, processes are still the boring answer. Puma already uses that model for web workers. Ractors may become useful for isolated CPU-heavy Ruby work, but they are not the answer to this Solid Queue I/O problem.</p>

<h2 id="isnt-this-just-what-javascript-does">“Isn’t this just what JavaScript does?”</h2>

<p>No. I showed the <a href="#what-happens-when-a-fiber-hits-io">code comparison above</a>. JavaScript’s async/await is a colored concurrency model: the <code>async</code> keyword spreads upward through every caller. Ruby’s fibers are colorless: your existing code works unchanged, and the scheduler handles yields below your code.</p>

<p>There’s a deeper difference too. JavaScript async/await runs on a single-threaded event loop. Ruby fibers run on top of a multi-threaded runtime. You can have multiple Ruby threads, each running its own reactor with its own fibers, and mix fibers and threads in the same application. Node can run JavaScript in parallel with <code>worker_threads</code>, but that’s a worker/isolate model, not the same thing as putting multiple reactors inside ordinary application threads.</p>

<h2 id="isnt-this-just-what-go-does">“Isn’t this just what Go does?”</h2>

<p>Closer. Goroutines are lightweight, runtime-scheduled, and multiplexed across OS threads. Conceptually similar to Ruby fibers, but Go’s scheduler can also preempt goroutines.</p>

<p>Two differences:</p>

<ol>
  <li>
    <p><strong>Go has true parallelism.</strong> Goroutines run across multiple OS threads with no GVL equivalent. CPU-bound goroutines run in parallel. Ruby fibers don’t.</p>
  </li>
  <li>
    <p><strong>Ruby has existing code.</strong> If you have a Rails application with hundreds of thousands of lines of Ruby, you can add fiber-based concurrency without rewriting anything. Your models, your controllers, your views, your gems – they all work. With Go, you’re rewriting.</p>
  </li>
</ol>

<p>If you’re starting from scratch and need both I/O concurrency and CPU parallelism, Go is a strong choice. If you have a Ruby application and need I/O concurrency, fibers give you that without a rewrite.</p>

<h2 id="fibers-need-async-do-blocks-thats-still-new-syntax">“Fibers need <code>Async do</code> blocks. That’s still new syntax.”</h2>

<p>Someone on <a href="https://news.ycombinator.com/item?id=44516555">Hacker News</a> called this out: I said “no async/await” but the examples show <code>Async do</code> and <code>.wait</code>.</p>

<p>Here’s the actual change:</p>

<pre><code class="language-ruby"># Before
chat = RubyLLM.chat
response = chat.ask("Hello")

# After
Async do
  chat = RubyLLM.chat
  response = chat.ask("Hello")
end
</code></pre>

<p>Two lines of wrapping. Your application code inside doesn’t change. Your models don’t change. Your gems don’t change. Nothing gets a new keyword.</p>

<p>In Python, adopting async means rewriting every function signature in the call chain to <code>async def</code>, adding <code>await</code> to every call, and replacing or wrapping blocking libraries. <code>requests</code> becomes <code>aiohttp</code> or async <code>httpx</code>. Blocking database APIs become async database APIs. Your test framework changes. Your middleware changes. It’s a rewrite.</p>

<p>Two lines of wrapping vs. rewriting your stack. That’s not even the same conversation.</p>

<h2 id="when-to-use-what">When to use what</h2>

<div class="mermaid">
flowchart TD
    A[What kind of work?] --&gt; B{CPU-bound?}
    B --&gt;|Yes| C{Need parallelism?}
    C --&gt;|Yes| D{Rails?}
    D --&gt;|Yes| E[Processes]
    D --&gt;|No| H[Ractors]
    C --&gt;|No| F[Threads]
    B --&gt;|No| I[Fibers]

    style E fill:#4a90a4,color:#fff
    style H fill:#c084fc,color:#fff
    style F fill:#7fb069,color:#fff
    style I fill:#e8a87c,color:#fff
</div>

<ul>
  <li><strong>I/O-bound work</strong> (LLM streaming, HTTP calls, webhooks, email delivery): <strong>fibers.</strong> Low overhead, high concurrency, database connections sized to database work rather than waiting jobs.</li>
  <li><strong>CPU-bound work</strong> (image processing, data crunching, PDF generation): <strong>threads.</strong> CRuby can preempt them, and C extensions can release the GVL for parallelism.</li>
  <li><strong>CPU parallelism with Rails</strong>: <strong>processes.</strong> Each one gets its own GVL, its own memory, its own everything. Puma already does this.</li>
  <li><strong>CPU parallelism without Rails</strong>: <strong>Ractors</strong> (when they graduate from experimental). Lighter than processes, true parallelism, but strict isolation means most gems don’t work.</li>
  <li><strong>All of them at once</strong>: that’s what a well-configured Rails app does. Puma forks processes. Each process runs threads. Fibers run inside those threads for I/O-heavy jobs. They coexist.</li>
</ul>

<pre><code class="language-yaml"># Solid Queue with the fiber-mode patch: all three working together
workers:
  - queues: [ chat, turbo ]
    fibers: 50        # I/O-bound: fibers
    processes: 2       # parallelism: processes
  - queues: [ pdf, images ]
    threads: 4         # CPU-bound: threads
    processes: 1
</code></pre>

<p>No single model is universally better. The right answer is matching the model to the workload.</p>

<hr />

<p>This covers every “what happens when” question I’ve gotten so far. If I missed yours, <a href="https://twitter.com/paolino">find me on Twitter</a>; I’ll either update this post or write a follow-up.</p>

]]>
</content:encoded>

</item>



<item>

<title>Making the Rails Default Job Queue Fiber-Based</title>

<link>https://paolino.me/solid-queue-doesnt-need-a-thread-per-job/</link>

<guid isPermaLink="true">
https://paolino.me/solid-queue-doesnt-need-a-thread-per-job/
</guid>

<pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
Last year I moved the LLM streaming jobs in Chat with Work to Async::Job. It was fast. Genuinely fast. Fiber-based execution with Redis, thousands of concurrent jobs on a single thread. I was so convinced that I wrote a whole post about why async Ruby is the future for AI apps...
]]>
</description>



<content:encoded>
<![CDATA[

<p>Last year I moved the LLM streaming jobs in <a href="https://chatwithwork.com">Chat with Work</a> to <a href="https://github.com/socketry/async-job">Async::Job</a>. It was fast. Genuinely fast. Fiber-based execution with Redis, thousands of concurrent jobs on a single thread. I was so convinced that I <a href="/async-ruby-is-the-future/">wrote a whole post</a> about why async Ruby is the future for AI apps and recommended it to everyone.</p>

<p>Then I started hitting walls.</p>

<p>Async::Job doesn’t persist jobs. They go into Redis and they’re gone. <a href="https://github.com/rails/mission_control-jobs">Mission Control</a> shows nothing. Background jobs in Rails are already quieter than the rest of your application – they fail without anyone noticing unless you go looking. Even with Honeybadger catching exceptions, I still want to see the full picture: which jobs are queued, which are running, which failed, what the system looks like right now. Without job persistence, you don’t get that.</p>

<p>Solid Queue is the default in Rails 8. Every new Rails app ships with it. When someone picks up Rails to build an LLM application and their 25-thread worker pool can only handle 25 concurrent streaming conversations, the answer shouldn’t be “swap your entire job backend.” It should be “change one line of config.”</p>

<p>So I <a href="https://github.com/rails/solid_queue/pull/728">opened a PR</a>.</p>

<h2 id="threads-vs-fibers-quickly">Threads vs fibers, quickly</h2>

<p>If you already know this, <a href="#the-switch">skip ahead to the config</a>.</p>

<p>Solid Queue runs each job on its own thread. Those threads can all query the database concurrently, so the worker has to be configured for that worst case, plus stack memory and kernel thread overhead. For a job that crunches data for 30 seconds, that’s fine – the thread is busy. For a job that streams an LLM response for 30 seconds but spends 99% of that time waiting for tokens, the thread is just sitting there holding resources.</p>

<p>Fibers sidestep much of this. Cooperatively scheduled, running in userspace on a single thread. When a fiber hits I/O – a network call, a database query, waiting for the next token – it steps aside and another fiber picks up. One thread, hundreds of concurrent jobs. No kernel thread overhead per job, and database pool sizing follows actual database concurrency rather than the number of jobs waiting on I/O. Rails 7.2+ helps ordinary Active Record code release connections after query operations, but that behavior is not fiber-specific. The <a href="https://github.com/socketry/async">async</a> gem handles the yielding for you: your code yields at I/O boundaries without you changing anything.</p>

<p>For the full deep dive – processes, threads, fibers, the GVL, I/O multiplexing – see <a href="/async-ruby-is-the-future/">Async Ruby is the Future</a>.</p>

<h2 id="the-switch">The switch</h2>

<p>While the PR is under review, you can point your Gemfile at the branch:</p>

<pre><code class="language-ruby"># Gemfile
gem "solid_queue", git: "https://github.com/crmne/solid_queue.git", branch: "async-worker-execution-mode"
</code></pre>

<p>Then one config change:</p>

<pre><code class="language-yaml"># config/solid_queue.yml
production:
  workers:
    - queues: ["*"]
      # threads: 10
      fibers: 100  # &lt;- that's it
      processes: 2
</code></pre>

<p>Your jobs don’t change. Your queue doesn’t change. The worker runs them as fibers instead of threads.</p>

<p><code>threads</code> or <code>fibers</code>. Pick one per worker. One more thing in your Rails app:</p>

<pre><code class="language-ruby"># config/application.rb
config.active_support.isolation_level = :fiber  # required for fibers
</code></pre>

<p>Fibers share a thread, so they need fiber-scoped state instead of the default thread-scoped state. The patch checks this at boot and tells you if it’s wrong.</p>
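
<p>Plain Ruby shows the difference between the two scopes. This sketch is not Active Support’s implementation, and <code>:current_user</code> is a made-up key; it only illustrates why state keyed to the thread would leak between job fibers that share it:</p>

<pre><code class="language-ruby"># Plain Ruby, not Active Support: two ways to stash "current" state.
Thread.current.thread_variable_set(:current_user, "alice")  # thread-scoped
Thread.current[:current_user] = "alice"                     # fiber-scoped

# A second fiber on the same thread, like a second job on the reactor.
seen_in_other_fiber = Fiber.new do
  {
    thread_scoped: Thread.current.thread_variable_get(:current_user),
    fiber_scoped: Thread.current[:current_user]
  }
end.resume

p seen_in_other_fiber
</code></pre>

<p>The second fiber still sees <code>"alice"</code> through the thread-scoped variable but gets <code>nil</code> from the fiber-scoped one. With hundreds of jobs on one thread, only the second behavior is safe.</p>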

<h2 id="under-the-hood">Under the hood</h2>

<p>The core of the patch is <code>FiberPool</code>. One thread, one <a href="https://github.com/socketry/async">async</a> reactor, a semaphore capping concurrency at whatever number you set:</p>

<pre><code class="language-ruby">def start_reactor
  create_thread do
    Async do |task|
      semaphore = Async::Semaphore.new(size, parent: task)
      boot_queue &lt;&lt; :ready

      wait_for_executions(semaphore)
      wait_for_inflight_executions
    end
  end
end
</code></pre>

<p>When the worker picks up jobs, it hands them to the pool. Each one becomes a fiber:</p>

<pre><code class="language-ruby">def schedule_pending_executions(semaphore)
  while execution = next_pending_execution
    semaphore.async(execution) do |_execution_task, scheduled_execution|
      perform_execution(scheduled_execution)
    end
  end
end
</code></pre>

<p>Each job runs as a fiber. When it hits I/O, it yields. The reactor picks up another fiber. One thread, hundreds of jobs, switching at I/O boundaries instead of depending on thread preemption.</p>

<p>CPU-bound work gets nothing from fibers. They don’t parallelize computation. But most of what job queues do is wait on I/O, and that’s exactly where fibers win. If a CPU-bound fiber blocks the reactor, Solid Queue’s supervisor still runs fine in its own process.</p>

<h2 id="the-database-connection-math">The database connection math</h2>

<p>I <a href="/async-ruby-is-the-future/">wrote about this last year</a>:</p>

<blockquote>
  <p>For 1000 concurrent conversations using traditional job queues like SolidQueue or Sidekiq, you’d need 1000 worker slots. That means 1000 kernel threads across your worker fleet, plus enough database pool capacity for whatever fraction of those jobs can hit the database at the same time. Even when the jobs are 99% idle waiting for streaming tokens, the thread resources are still reserved.</p>
</blockquote>

<p>That framing is about worker resources, not a special Active Record rule. Active Record 7.2 connection handling is not different for threads and fibers; the important part in the patch is Solid Queue’s worker-pool sizing and the amount of simultaneous database work. Here’s the actual math from the patch.</p>

<p>A Solid Queue worker needs database connections for three things: polling for jobs, heartbeats, and running jobs. In thread mode, the configured concurrency is <code>threads</code>, and Solid Queue’s current guard treats each execution thread as potentially needing its own connection, plus two for the worker itself. That’s <code>threads + 2</code>. Actual Active Record usage may be lower for jobs that only touch the database in short bursts, but the configured pool still has to satisfy the guard.</p>

<p>With fibers, all job fibers run on one reactor thread, and the patch sizes the execution side for expected database concurrency instead of job concurrency. For the common LLM job shape – long waits, short database bursts – the minimum is often <code>1 + 2 = 3</code>: one execution connection, plus two for the worker itself. If your jobs are DB-heavy, use long transactions, or pin connections with APIs like <code>ActiveRecord::Base.connection</code>, increase the pool and fibers will check out separate connections concurrently, just like threads.</p>

<p>Same job concurrency, very different configured pool requirements:</p>

<table>
  <thead>
    <tr>
      <th>Concurrent jobs</th>
      <th>Thread-mode DB pool guard (per process)</th>
      <th>Fiber-mode baseline (per process)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>10</td>
      <td>12</td>
      <td>3</td>
    </tr>
    <tr>
      <td>25</td>
      <td>27</td>
      <td>3</td>
    </tr>
    <tr>
      <td>50</td>
      <td>52</td>
      <td>3</td>
    </tr>
    <tr>
      <td>100</td>
      <td>102</td>
      <td>3</td>
    </tr>
    <tr>
      <td>200</td>
      <td>202</td>
      <td>3</td>
    </tr>
  </tbody>
</table>

<p>The thread-mode guard scales linearly. The fiber-mode baseline stays flat for I/O-heavy jobs. Multiply by the number of worker processes and the gap gets dramatic: 6 processes with 50 concurrent jobs means 312 configured connections for thread mode, 18 for fiber. PostgreSQL’s default <code>max_connections</code> is 100.</p>
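
<p>The arithmetic behind the table, as a small sketch. The method names are mine, not Solid Queue’s, and the fiber baseline assumes a single execution connection is enough for your jobs.</p>

<pre><code class="language-ruby">WORKER_OVERHEAD = 2 # polling + heartbeat connections per worker, as described above

# Thread mode: the guard wants one connection per execution thread.
def thread_pool_guard(concurrency)
  concurrency + WORKER_OVERHEAD
end

# Fiber mode baseline: sized for expected simultaneous DB work, not job count.
# db_concurrency = 1 is an assumption that fits long waits and short DB bursts.
def fiber_pool_baseline(db_concurrency = 1)
  db_concurrency + WORKER_OVERHEAD
end

processes = 6
concurrency = 50

puts thread_pool_guard(concurrency) * processes # 312
puts fiber_pool_baseline * processes            # 18
</code></pre>

<p>If your jobs hold connections longer, raise <code>db_concurrency</code> and the fiber number grows accordingly.</p>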

<p>The patch detects your Rails version and calculates the right pool size automatically.</p>

<p>The benchmarks below use two pool policies. The primary Solid Queue comparison deliberately gives both modes the same pool, <code>DB_POOL = concurrency + 5</code> per worker process, so it measures the executor instead of measuring pool starvation. The stress suite uses mode-specific pools to show the operational failure envelope under higher connection demand.</p>

<h2 id="the-benchmarks">The benchmarks</h2>

<p>I reran the benchmark suite on April 28, 2026. The headline Solid Queue comparison covers four workloads across per-process concurrency 5, 10, 25, 50, and 100; process counts 1, 2, and 6; and both execution modes. Three runs per cell, median real run reported, with total concurrency capped at 60 so the main comparison stays about executor behavior.</p>

<p>The workloads:</p>

<ul>
  <li><strong>Sleep</strong>: 50ms <code>Kernel.sleep</code>. Pure cooperative wait. The I/O upper bound.</li>
  <li><strong>Async HTTP</strong>: HTTP request to a local server with 50ms delay via <a href="https://github.com/socketry/async-http">Async::HTTP</a>. Real fiber-friendly I/O.</li>
  <li><strong>CPU</strong>: 50,000 SHA256 iterations. Pure computation. The control.</li>
  <li><strong>RubyLLM Stream</strong>: Actual <a href="https://rubyllm.com">RubyLLM</a> chat completion through a fake OpenAI SSE endpoint, with token-by-token Turbo Stream broadcasts. 40 tokens at 20ms each. The closest thing to a production AI workload you can benchmark repeatably.</li>
</ul>

<h3 id="results">Results</h3>

<table>
  <thead>
    <tr>
      <th>Workload</th>
      <th>Best throughput</th>
      <th>Avg paired delta</th>
      <th>Best paired delta</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>RubyLLM Stream</td>
      <td>fiber, 7.01 j/s</td>
      <td><strong>+11.9%</strong></td>
      <td><strong>+21.8%</strong></td>
    </tr>
    <tr>
      <td>Async HTTP</td>
      <td>fiber, 492.82 j/s</td>
      <td><strong>+9.5%</strong></td>
      <td><strong>+25.5%</strong></td>
    </tr>
    <tr>
      <td>Sleep</td>
      <td>fiber, 500.50 j/s</td>
      <td><strong>+7.4%</strong></td>
      <td><strong>+15.9%</strong></td>
    </tr>
    <tr>
      <td>CPU</td>
      <td>fiber, 110.02 j/s</td>
      <td>+0.6%</td>
      <td>+2.4%</td>
    </tr>
  </tbody>
</table>

<p>RubyLLM Stream is the workload that matters. It runs an actual <a href="https://rubyllm.com">RubyLLM</a> chat completion with streaming, database writes, and Turbo broadcasts per token – the same thing <a href="https://chatwithwork.com">Chat with Work</a> does in production. Fiber wins every single paired experiment there: 9 out of 9.</p>

<p>The CPU row is the control. Fibers don’t help computation, and the average confirms it: essentially flat. That’s how you know the I/O gains are real and not measurement noise.</p>

<p>That table shows the best observed point and the paired-cell deltas. Here’s the full spread. Some configurations favor threads for synthetic workloads, but the paired averages are the steadier signal: fiber wins the I/O workloads, and RubyLLM Stream always favors fiber.</p>

<p><img src="/images/solid-queue-headline-fiber-vs-thread.svg" alt="Solid Queue fiber over thread throughput ranges across all workloads." /></p>
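
<p>For reference, this is how I would expect a paired delta to be computed for one cell: take the median run for each mode at the same workload, concurrency, and process count, then compare. It is a sketch with made-up numbers, and the helper names are hypothetical; the benchmark suite’s own scripts are the source of truth.</p>

<pre><code class="language-ruby"># Hypothetical numbers, not results from the benchmark suite.
def median(values)
  sorted = values.sort
  sorted[sorted.size / 2] # fine for an odd number of runs, such as three
end

# Percent change of fiber over thread for one matched cell.
def paired_delta_percent(thread_runs, fiber_runs)
  thread = median(thread_runs)
  fiber = median(fiber_runs)
  ((fiber - thread) / thread * 100).round(1)
end

puts paired_delta_percent([100.0, 98.0, 104.0], [110.0, 112.0, 109.0]) # 10.0
</code></pre>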

<p>The newer suite also adds database-shaped workloads. With matched pools, short DB bursts still favor fiber: <code>db_queries</code> averages +12.6%, and a read/API/write mix averages +6.9%. The transaction case is the useful caveat: when each job pins a connection for the whole transaction, fiber still averages +3.5%, but the win is less consistent. That’s exactly the workload where you should be more careful with pool sizing.</p>

<h2 id="thread-mode-hit-the-wall">Thread mode hit the wall</h2>

<p>Those benchmarks cap total concurrency at 60. I wanted to see what breaks when you push past that, so I ran a stress suite: per-process concurrency 25, 50, 100, 150, and 200; process counts 2 and 6; three runs per cell. Read this as a current Solid Queue failure-envelope test, not a universal law about threads and fibers.</p>

<p>The result is stark. Thread mode completed only the smallest cell for each workload. Fiber mode completed every planned cell.</p>

<table>
  <thead>
    <tr>
      <th>Workload</th>
      <th>Thread cells completed</th>
      <th>Fiber cells completed</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Sleep</td>
      <td>1/10</td>
      <td>10/10</td>
    </tr>
    <tr>
      <td>Async HTTP</td>
      <td>1/10</td>
      <td>10/10</td>
    </tr>
    <tr>
      <td>RubyLLM Stream</td>
      <td>1/10</td>
      <td>10/10</td>
    </tr>
  </tbody>
</table>

<p><img src="/images/solid-queue-stress-cell-status.svg" alt="Solid Queue stress cell status." /></p>

<p>PostgreSQL’s default <code>max_connections</code> is 100. In this stress run, thread mode at concurrency 50 with 2 processes asked for 110 worker-pool connections. With 6 processes, even concurrency 25 asked for 180. The one surviving thread cell was the smallest: concurrency 25, 2 processes.</p>

<p>Fiber mode in the stress suite used a smaller mode-specific pool: 6 connections per process for 2-process runs, 10 per process for 6-process runs. That is 60 worker-pool connections at concurrency 200 across 6 processes, while thread mode would ask for 1,230. The exact constants are benchmark policy, but the shape is the point for this worker design: thread mode’s required configured pool scales with thread concurrency; fiber mode’s baseline scales with worker process overhead plus actual database concurrency.</p>
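
<p>A quick way to see the envelope is to compare each cell’s total worker-pool demand with the PostgreSQL default. This sketch assumes the thread-mode pool is <code>concurrency + 5</code> per process, which reproduces the 110, 180, and 1,230 figures above. It only counts worker-pool connections and does not claim to explain every individual failure.</p>

<pre><code class="language-ruby">MAX_CONNECTIONS = 100 # PostgreSQL default

# Assumption: thread pool per process = concurrency + 5.
def thread_total(concurrency, processes)
  (concurrency + 5) * processes
end

# Fiber pool per process as stated above: 6 for 2-process runs, 10 for 6-process runs.
def fiber_total(processes)
  (processes == 2 ? 6 : 10) * processes
end

cells = [25, 50, 100, 150, 200].product([2, 6])

thread_fits = cells.count { |conc, procs| thread_total(conc, procs).between?(0, MAX_CONNECTIONS) }
fiber_fits = cells.count { |_conc, procs| fiber_total(procs).between?(0, MAX_CONNECTIONS) }

puts "thread: #{thread_fits}/#{cells.size}" # thread: 1/10
puts "fiber: #{fiber_fits}/#{cells.size}"   # fiber: 10/10
</code></pre>

<p>Under that assumption, the only thread cell that fits under 100 connections is concurrency 25 with 2 processes, which matches the one that survived.</p>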

<h2 id="one-backend-two-modes">One backend, two modes</h2>

<p>Fiber mode isn’t universally better. CPU-bound jobs get nothing from it. C extensions that aren’t fiber-safe won’t work. And that’s fine – you don’t have to pick one.</p>

<p>As Trevor Turk pointed out in the PR discussion, that’s the whole point: separately configured worker pools. Here’s what <a href="https://chatwithwork.com">Chat with Work</a> actually runs in production:</p>

<pre><code class="language-yaml">workers:
  - queues: [ chat ]
    fibers: 10
    processes: 2
    polling_interval: 0.1
  - queues: [ turbo ]
    fibers: 10
    processes: 1
    polling_interval: 0.05
  - queues: [ notifications, default, maintenance ]
    fibers: 5
    processes: 1
    polling_interval: 0.2
  - queues: [ cpu ]
    threads: 1
    processes: 1
</code></pre>

<p>Almost everything uses fibers. LLM streaming, Turbo broadcasts, notifications, maintenance jobs – all fiber-based. Only the <code>cpu</code> queue uses threads, and right now it’s just one thread for the occasional heavy extraction. One backend. One deployment. <a href="https://github.com/rails/mission_control-jobs">Mission Control</a> shows all of it.</p>

<p>Instead of running Solid Queue and Async::Job side by side – two processors, two configurations, two sets of things to monitor – you run one. I moved <a href="https://chatwithwork.com">Chat with Work</a> to this setup, and Brad Gessler has been running it in production too.</p>

<p>Async::Job running against Redis is actually faster if you compare raw throughput. It is a backend comparison, not a Solid Queue executor comparison, but the ceiling is useful:</p>

<table>
  <thead>
    <tr>
      <th>Workload</th>
      <th>Solid Queue fiber best</th>
      <th>Async::Job best</th>
      <th>Delta</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>RubyLLM Stream</td>
      <td>7.01 j/s</td>
      <td>16.94 j/s</td>
      <td>+141.7%</td>
    </tr>
    <tr>
      <td>Async HTTP</td>
      <td>492.82 j/s</td>
      <td>652.96 j/s</td>
      <td>+32.5%</td>
    </tr>
    <tr>
      <td>Sleep</td>
      <td>500.50 j/s</td>
      <td>644.98 j/s</td>
      <td>+28.9%</td>
    </tr>
    <tr>
      <td>CPU</td>
      <td>110.02 j/s</td>
      <td>125.75 j/s</td>
      <td>+14.3%</td>
    </tr>
  </tbody>
</table>

<p><img src="/images/solid-queue-headline-asyncjob-vs-fiber.svg" alt="Async::Job over Solid Queue fiber throughput ranges." /></p>

<p>If you want raw speed and don’t need persistence, Async::Job is the right call. But if you want job visibility, failure tracking, retries, Mission Control, everything Rails gives you out of the box, fiber mode gets you there. Same concurrency. Database connections sized to database work, not waiting jobs. You set <code>fibers: N</code> and keep building.</p>

<hr />

<p>The PR is <a href="https://github.com/rails/solid_queue/pull/728">up on GitHub</a>. The <a href="https://github.com/crmne/solid_queue_bench">benchmark suite</a> is open source. Run your own numbers, or challenge mine.</p>

]]>
</content:encoded>

</item>



<item>

<title>Your Agent&apos;s Context Window Is Not a Junk Drawer</title>

<link>https://paolino.me/your-agents-context-window-is-not-a-junk-drawer/</link>

<guid isPermaLink="true">
https://paolino.me/your-agents-context-window-is-not-a-junk-drawer/
</guid>

<pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
Your agent’s context window is the most precious resource it has. The more you stuff into it, the worse your agent performs.

]]>
</description>



<content:encoded>
<![CDATA[

<p>Your agent’s context window is the most precious resource it has. The more you stuff into it, the worse your agent performs.</p>

<p>Researchers call it <a href="https://research.trychroma.com/context-rot">context rot</a>: the more tokens in the window, the harder it becomes for the model to follow instructions, retrieve information, and stay on task. Chroma tested 18 frontier models and found that accuracy drops by up to 30% when you go from a focused 300-token input to 113k tokens of conversation history, with the task held constant. The model essentially becomes <em>dumber</em>.</p>

<p>This holds true regardless of how big the window is, yet most agent setups treat the context window like a junk drawer.</p>

<p>“Just toss it in there, the LLM will figure it out!”</p>

<h2 id="mcp-the-biggest-offender">MCP: the biggest offender</h2>

<p>Don’t get me wrong. MCP is a fine idea. You need to talk to a service? Grab an MCP server, plug it in, and you’re running in ten minutes. For prototyping, for exploration, for answering “is this even worth building?”, it’s great.</p>

<p>The problem is what happens next. Which is: nothing.</p>

<p>People leave the MCP servers plugged in. They add more. Every MCP server you connect dumps tool descriptions, schemas, and instructions into your context. You didn’t write those. You didn’t optimize them. You probably haven’t even read them. You’re handing over a chunk of your context window to whatever some third party decided to shove in there.</p>

<p>Say you need a tool that checks the weather. You could plug in an MCP server and get dozens of tool descriptions, parameter schemas, and whatever instructions its author decided to write. Or you could write this:</p>

<pre><code class="language-ruby">class Weather &lt; RubyLLM::Tool
  description "Gets current weather for a location"

  param :latitude, desc: "Latitude (e.g., 52.5200)"
  param :longitude, desc: "Longitude (e.g., 13.4050)"

  def execute(latitude:, longitude:)
    url = "https://api.open-meteo.com/v1/forecast?latitude=#{latitude}&amp;longitude=#{longitude}&amp;current=temperature_2m,wind_speed_10m"
    Faraday.get(url).body
  rescue =&gt; e
    { error: e.message }
  end
end
</code></pre>

<p>Twelve lines of <a href="https://rubyllm.com">RubyLLM</a>. You wrote the description, so you know exactly what tokens are going into your context. You wrote the parameters, so the model gets precisely the interface it needs, no more. You own it, you can tune it, and nobody can inject anything into your agent’s brain through it.</p>

<p>Use MCP to prototype. Then replace it with crafted tools you actually control.</p>

<h2 id="tool-responses-are-context-too">Tool responses are context too</h2>

<p>Your RAG retrieves ten full documents when the model needs a paragraph. Your API call returns a massive JSON blob when the model needs two fields. You’re paying for every one of those tokens with your agent’s IQ.</p>

<p>The fix is progressive disclosure. At <a href="https://chatwithwork.com">Chat with Work</a>, when the agent searches your Google Drive, we don’t dump entire files into context. The search tool returns only some metadata and a single line from the file, the line that matched the search keywords. Fifty results, fifty lines. The AI reads those, decides which files actually matter, and only then reads them. If a file is too large, it reads it in chunks. At every step, the model is only looking at what it needs.</p>

<p>The same principle applies to any tool. Don’t return everything. Return enough for the model to decide what to look at next.</p>
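
<p>Here is a minimal sketch of that two-step shape in plain Ruby. It is not the Chat with Work implementation: the in-memory <code>FILES</code> list, <code>search</code>, and <code>read_chunk</code> are hypothetical stand-ins for a real search API and file reader.</p>

<pre><code class="language-ruby"># Hypothetical in-memory "drive"; a real tool would call a search API instead.
FILES = [
  { name: "q3-plan.md", body: "Intro\nBudget is 40k for Q3\nNext steps" },
  { name: "notes.txt", body: "Lunch order\nCall the budget office\nMisc" },
  { name: "readme.md", body: "Nothing relevant here" }
].freeze

# Step 1: return one matching line per file, never the whole body.
def search(keyword)
  FILES.filter_map do |file|
    line = file[:body].lines.find { |l| l.downcase.include?(keyword.downcase) }
    { name: file[:name], match: line.strip } if line
  end
end

# Step 2: read a chosen file in bounded chunks of lines.
def read_chunk(name, offset: 0, limit: 2)
  file = FILES.find { |f| f[:name] == name }
  file[:body].lines.drop(offset).first(limit).join
end

p search("budget")
puts read_chunk("q3-plan.md", offset: 1, limit: 1)
</code></pre>

<p>The search result costs two short lines of context. The model only pays for a file body after it has decided that file is worth reading.</p>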

<h2 id="your-instructions-are-context-too">Your instructions are context too</h2>

<p>Then there’s the stuff you wrote yourself. Your system prompt is context. Your tool descriptions are context. Your parameter schemas are context. Every edge case, every guardrail, every overly detailed description competes for attention. You think you’re being thorough. You’re actually drowning the instructions that matter in a sea of instructions that don’t. A focused system prompt will outperform an exhaustive one every time.</p>

<h2 id="tool-count-is-context-too">Tool count is context too</h2>

<p>You hand-crafted 40 beautiful tools. Your agent needs 5 for this task. The other 35 sit in context doing nothing except making the model slower at picking the right one.</p>

<p>Don’t register every tool your agent might ever need. Load the tools the current task actually requires. If you’re building a support agent that handles billing and technical issues, don’t give it all of both. Route billing questions to a billing agent and technical questions to a technical agent. Two focused agents will outperform one bloated one.</p>
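
<p>A rough sketch of the routing idea, with everything in it hypothetical: the tool names are placeholders, and the keyword match is only there to show the shape. A real router might be a classifier or a small LLM call.</p>

<pre><code class="language-ruby"># Hypothetical tool names; symbols stand in for real tool classes.
TOOLSETS = {
  billing: %i[lookup_invoice refund_payment update_card],
  technical: %i[search_logs restart_service check_status]
}.freeze

BILLING_WORDS = %w[invoice refund charge payment].freeze

# Naive keyword router: billing if any billing word appears, else technical.
def route(question)
  words = question.downcase.scan(/[a-z]+/)
  words.any? { |word| BILLING_WORDS.include?(word) } ? :billing : :technical
end

# Only the tools for the routed topic get registered with the agent.
def tools_for(question)
  TOOLSETS.fetch(route(question))
end

p tools_for("Why was I charged twice? I want a refund")
p tools_for("The API returns 500 after deploy")
</code></pre>

<p>Each agent sees three tool descriptions instead of six. With 40 tools, the saving is what keeps the model focused.</p>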

<h2 id="every-token-should-earn-its-place">Every token should earn its place</h2>

<p>The context window is not a junk drawer. It’s a workbench. Everything on it should be there for a reason, and you should be able to say what that reason is.</p>

<p>So before you plug in another MCP server, add another RAG source, or write another paragraph in your system prompt, ask yourself one question: is this worth making my agent dumber?</p>

]]>
</content:encoded>

</item>



<item>

<title>I Built a Monitor Configuration Tool for Hyprland</title>

<link>https://paolino.me/hyprmoncfg-monitor-configuration-for-hyprland/</link>

<guid isPermaLink="true">
https://paolino.me/hyprmoncfg-monitor-configuration-for-hyprland/
</guid>

<pubDate>Tue, 31 Mar 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
Configuring monitors in Hyprland means writing monitor= lines by hand. A 4K display at 1.33x scale is effectively 2880x1620 pixels, so the monitor next to it needs to start at x=2880. Vertically centering a 1080p panel against it means doing division in your head to get the y-...
]]>
</description>



<content:encoded>
<![CDATA[

<p>Configuring monitors in Hyprland means writing <code>monitor=</code> lines by hand. A 4K display at 1.33x scale is effectively 2880x1620 pixels, so the monitor next to it needs to start at x=2880. Vertically centering a 1080p panel against it means doing division in your head to get the y-offset right. You reload, you’re off by 40 pixels, you edit, you reload again. There’s no visual feedback until after you’ve committed to a config.</p>
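
<p>The arithmetic for that example, spelled out. This sketch assumes “1.33x” means a scale of 4/3, since dividing by exactly 1.33 would not give whole pixels, and <code>HDMI-A-1</code> is a placeholder output name.</p>

<pre><code class="language-ruby"># Assumption: "1.33x" is really 4/3.
scale = Rational(4, 3)

logical_width = (3840 / scale).to_i  # 2880
logical_height = (2160 / scale).to_i # 1620

# Place a 1080p panel (scale 1) to the right, vertically centered.
x = logical_width
y = (logical_height - 1080) / 2      # 270

puts "4K logical size: #{logical_width}x#{logical_height}"
puts "monitor=HDMI-A-1,1920x1080,#{x}x#{y},1"
</code></pre>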

<p>Then it gets worse. You unplug your laptop, go to a conference, plug into a projector, and you’re back to editing config files backstage before your talk. You come home, dock the laptop, and the layout is wrong again.</p>

<p>I looked at what was available. The closest to what I wanted was <a href="https://github.com/ToRvaLDz/monique">Monique</a>: spatial editor, profiles, workspace management, a hotplug daemon. It does exactly what I need. But it’s a GTK4 GUI that pulls in Python and a stack of dependencies, and the daemon was broken when I tried it. The other tools each cover parts of this: <a href="https://sr.ht/~emersion/kanshi/">kanshi</a> does profiles and auto-switching but has no editor, so you write config files by hand; <a href="https://github.com/nwg-piotr/nwg-displays">nwg-displays</a> and <a href="https://github.com/erans/hyprmon">HyprMon</a> have spatial editors but no daemon; <a href="https://github.com/fiffeek/hyprdynamicmonitors">HyprDynamicMonitors</a> has a daemon but no real layout tool, and it pulls in UPower and D-Bus.</p>

<p>I wanted Monique’s feature set without the dependency baggage, in something that works over SSH when your monitors are broken. So I built <a href="https://hyprmoncfg.dev">hyprmoncfg</a>.</p>

<h2 id="a-real-spatial-editor-in-your-terminal">A real spatial editor, in your terminal</h2>

<p>The TUI is the thing I’m most proud of. It’s not a config editor with a preview pane. It’s a full spatial layout tool.</p>

<p>The left side is a canvas where your monitors are drawn as rectangles, proportional to their resolution. You click one to select it, drag it to move it. Monitors snap to each other’s edges as you position them, just like arranging windows in a GUI display manager. Arrow keys give you fine control: 100px per step, Shift for 10px, Ctrl for 1px.</p>

<p>The right side is a per-monitor inspector. Pick a resolution and refresh rate from a scrollable list. Set scale, position, transform, VRR, mirroring. All inline, no dialogs within dialogs. A third tab handles workspace planning.</p>

<p>And because it’s a TUI: it works over SSH. When your monitor configuration is broken and you can’t see anything, you can SSH into the machine and fix it. Try that with a GTK app.</p>

<h2 id="safe-apply-with-automatic-revert">Safe apply with automatic revert</h2>

<p>Every apply, whether from the TUI or the daemon, follows the same path: write <code>monitors.conf</code> atomically (temp file + rename, no corruption), reload Hyprland, re-read the actual monitor state, and verify the result matches what was requested.</p>

<p>Then it gives you 10 seconds to confirm. If you don’t, maybe because the layout left you staring at a black screen, it reverts automatically. No stuck monitors. No reaching for a second machine to undo the damage.</p>

<p>This is the same apply engine everywhere. The TUI and the daemon share identical code. If it works when you test it interactively, it works when the daemon fires at 2am because you bumped your dock cable.</p>
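
<p>The temp file + rename step is a general pattern, so here is a sketch of it in Ruby. hyprmoncfg itself is written in Go; this is not its code, and <code>atomic_write</code> is a name I made up for illustration.</p>

<pre><code class="language-ruby">require "tmpdir"

# Write to a temp file in the same directory, then rename over the target.
# On POSIX systems a rename within one filesystem is atomic, so a reader
# sees either the old file or the new one, never a half-written file.
def atomic_write(path, content)
  tmp = "#{path}.tmp.#{Process.pid}"
  File.open(tmp, "w") do |file|
    file.write(content)
    file.fsync # flush to disk before the rename makes it visible
  end
  File.rename(tmp, path)
end

Dir.mktmpdir do |dir|
  target = File.join(dir, "monitors.conf")
  atomic_write(target, "monitor=,preferred,auto,1\n")
  puts File.read(target)
end
</code></pre>

<p>Keeping the temp file next to the target matters: a rename across filesystems is no longer a single atomic operation.</p>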

<h2 id="workspace-planning">Workspace planning</h2>

<p>Monitor configuration and workspace assignment are the same problem. If you’re rearranging monitors, you probably want workspaces to follow. hyprmoncfg has a workspace planner built into its third tab, with three strategies:</p>

<ul>
  <li><strong>Sequential</strong>: Groups in chunks. Workspaces 1-3 on monitor A, 4-6 on monitor B.</li>
  <li><strong>Interleave</strong>: Round-robins. 1→A, 2→B, 3→A, 4→B.</li>
  <li><strong>Manual</strong>: Explicit per-workspace rules when you want full control.</li>
</ul>

<p>Workspace assignments are stored inside each profile and applied together with the layout. Switch profiles, switch workspace distribution. One operation.</p>
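
<p>The two automatic strategies are easy to sketch. This is Ruby for illustration, not hyprmoncfg’s Go code, and it assumes Sequential splits workspaces into equal chunks, rounding up when they don’t divide evenly.</p>

<pre><code class="language-ruby"># Sequential: contiguous chunks, one chunk per monitor.
def sequential(workspaces, monitors)
  chunk = (workspaces.size.to_f / monitors.size).ceil
  pairs = workspaces.each_slice(chunk).zip(monitors).flat_map do |group, monitor|
    group.map { |workspace| [workspace, monitor] }
  end
  pairs.to_h
end

# Interleave: round-robin across monitors.
def interleave(workspaces, monitors)
  workspaces.each_with_index.map do |workspace, index|
    [workspace, monitors[index % monitors.size]]
  end.to_h
end

p sequential((1..6).to_a, %w[A B]) # 1-3 on A, 4-6 on B
p interleave((1..4).to_a, %w[A B]) # 1 on A, 2 on B, 3 on A, 4 on B
</code></pre>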

<h2 id="source-chain-verification">Source-chain verification</h2>

<p>Here’s something no other tool does. Before writing anything, hyprmoncfg parses your <code>hyprland.conf</code> and verifies it actually sources the target <code>monitors.conf</code>. If it doesn’t, it refuses to write.</p>

<p>Other tools skip this check. They silently update a file that Hyprland never reads. You spend twenty minutes debugging why nothing changed, only to realize the file was never sourced. I lost an evening to this once. Never again.</p>
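
<p>The idea behind the check, as a simplified Ruby sketch. It is not hyprmoncfg’s implementation: it follows <code>source = path</code> lines recursively and expands <code>~</code>, but it ignores globs, variables, and paths with spaces that a real Hyprland config may use. The method name <code>sources?</code> is hypothetical.</p>

<pre><code class="language-ruby">require "tmpdir"

# Returns true if config_path, or any file it sources, sources target.
def sources?(config_path, target, seen = [])
  path = File.expand_path(config_path)
  return false if seen.include?(path) || !File.file?(path)

  seen.push(path) # guard against files that source each other
  File.readlines(path).any? do |line|
    match = line.match(/\A\s*source\s*=\s*(\S+)/)
    next false unless match

    sourced = File.expand_path(match[1])
    sourced == File.expand_path(target) || sources?(sourced, target, seen)
  end
end

Dir.mktmpdir do |dir|
  monitors = File.join(dir, "monitors.conf")
  main = File.join(dir, "hyprland.conf")
  File.write(monitors, "monitor=,preferred,auto,1\n")

  File.write(main, "source = #{monitors}\n")
  puts sources?(main, monitors) # true

  File.write(main, "# source = #{monitors}\n")
  puts sources?(main, monitors) # false: the line is commented out
end
</code></pre>

<p>A tool that refuses to write when this returns false turns twenty minutes of debugging into one error message.</p>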

<h2 id="dotfiles-integration">Dotfiles integration</h2>

<p>Profiles are stored as JSON files in <code>~/.config/hyprmoncfg/profiles/</code>, one per profile. The generated <code>monitors.conf</code> is a build artifact, you don’t commit it. You commit the profiles.</p>

<pre><code class="language-sh">chezmoi add ~/.config/hyprmoncfg
</code></pre>

<p>Save a “desk” profile at home with your ultrawide. Save “conference-1080p” at one venue. Save “conference-4k” at another. Sync them across machines via your <a href="https://github.com/crmne/dotfiles">dotfiles</a>. The daemon matches profiles to connected hardware automatically. Arrive somewhere, plug in, and the right layout applies.</p>

<p>This is portable. The same profile library works across machines because matching is based on the monitors you have, not on the machine you’re at.</p>

<h2 id="one-runtime-dependency-hyprland">One runtime dependency: Hyprland</h2>

<p>Two compiled Go binaries. No Python, no GTK, no GObject introspection, no D-Bus, no UPower. Install them and you’re done. The only runtime requirement is Hyprland itself.</p>

<h2 id="how-it-compares">How it compares</h2>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>hyprmoncfg</th>
      <th>Monique</th>
      <th>HyprDynamicMonitors</th>
      <th>HyprMon</th>
      <th>nwg-displays</th>
      <th>kanshi</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>GUI or TUI</td>
      <td>TUI</td>
      <td>GUI</td>
      <td>TUI</td>
      <td>TUI</td>
      <td>GUI</td>
      <td>CLI</td>
    </tr>
    <tr>
      <td>Spatial layout editor</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Partial</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Drag-and-drop</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Snapping</td>
      <td>Yes</td>
      <td>Not documented</td>
      <td>No</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Profiles</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Auto-switching daemon</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No (roadmap)</td>
      <td>No</td>
      <td>Yes</td>
    </tr>
    <tr>
      <td>Workspace planning</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No</td>
      <td>No</td>
      <td>Basic</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Mirror support</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Safe apply with revert</td>
      <td>Yes</td>
      <td>Yes</td>
      <td>No</td>
      <td>Partial (manual rollback)</td>
      <td>No</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Source-chain verification</td>
      <td>Yes</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
      <td>No</td>
    </tr>
    <tr>
      <td>Additional runtime dependencies</td>
      <td>None</td>
      <td>Python + GTK4 + libadwaita</td>
      <td>UPower, D-Bus</td>
      <td>None</td>
      <td>Python + GTK3</td>
      <td>None</td>
    </tr>
  </tbody>
</table>

<h2 id="try-it">Try it</h2>

<p>On Arch:</p>

<pre><code class="language-sh">yay -S hyprmoncfg
</code></pre>

<p>Or build from source:</p>

<pre><code class="language-sh">go install github.com/crmne/hyprmoncfg/cmd/hyprmoncfg@latest
go install github.com/crmne/hyprmoncfg/cmd/hyprmoncfgd@latest
</code></pre>

<p>Check out the <a href="https://hyprmoncfg.dev/">documentation</a> for the full guide, or browse the <a href="https://github.com/crmne/hyprmoncfg">source on GitHub</a>.</p>

]]>
</content:encoded>

</item>



<item>

<title>Comb Shaped Slices</title>

<link>https://paolino.me/comb-shaped-slices/</link>

<guid isPermaLink="true">
https://paolino.me/comb-shaped-slices/
</guid>

<pubDate>Tue, 24 Mar 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
A friend who’s built and shut down companies in this space sat across from me at breakfast during a conference recently. He knows what I’m building: Chat with Work, an AI tool that lets you talk to your actual work data. He wanted to know what my plan was. I think he was a bit...
]]>
</description>



<content:encoded>
<![CDATA[

<p>A friend who’s built and shut down companies in this space sat across from me at breakfast during a conference recently. He knows what I’m building: <a href="https://chatwithwork.com">Chat with Work</a>, an AI tool that lets you talk to your actual work data. He wanted to know what my plan was. I think he was a bit concerned.</p>

<p>“Add more integrations, finish the security assessment, market it well,” I said.</p>

<p>That didn’t help. “All those LLM providers are going to eat the whole market. They’ll ship every integration you can think of. If you want a slice of the pie, you need to pick a vertical and own it.”</p>

<p>I told him I was going to grab a T shaped slice of the pie instead.</p>

<p>He looked at me like I’d lost it.</p>

<hr />

<p>Here’s the thing about the “pick a vertical” advice: it’s not wrong. It’s just not the only way. And for a lot of small software companies, it’s a trap dressed up as strategy.</p>

<p>The conventional wisdom goes like this: the market is huge, the big players are coming, so you’d better find your little corner and defend it. Specialize. Go deep. Become the AI assistant for dentists in Luxembourg or the knowledge tool for corporate lawyers in Berlin-Brandenburg. Calculate your total addressable market. Build a defensible moat. Make investors happy.</p>

<p>But what if you don’t care about making investors happy? Most companies don’t need investors. What if you just want to build something good?</p>

<h2 id="the-comb">The comb</h2>

<p>I said T shaped in the moment. One horizontal, one vertical. But the more I thought about it, the more teeth it grew. Less like a T, more like a comb.</p>

<p>Here’s why. When you’re OpenAI or Google, you sample from the top of the distribution. You build what most people use first, then work your way down. The result is always the same: a broad horizontal platform that serves everyone and surprises no one.</p>

<p>When you’re small, you sample from what’s right in front of you. You build for yourself because no amount of user research, design thinking, or theory of mind will ever match the depth of actually needing the thing you’re making. You understand your own problems in a way that connects to your emotions, your workflow, your instincts. You can’t fake that. You can’t interview your way to it. I chose fast onboarding over full sync, because I don’t want to wait to start working. Nextcloud, Todoist, IMAP, and CalDAV: that’s my stack, so that’s where I’ll go deep next.</p>

<p>Then you listen to your customers. “This is cool, but I use Slack.” So you build that too. A team needs to own their data, so you add on-premises installation. Someone uses Basecamp, and you build that integration because the people behind it think like you. One tooth at a time.</p>

<p>The shape that emerges is yours. Not because you planned it on a whiteboard, but because you started from yourself and grew outward. It works for the small teams, the freelancers, the music collectives, the people who don’t have an IT department and don’t want one. That’s the comb: not a strategy you choose, but what naturally happens when you’re small and you give a damn.</p>

<p>There’s a reason people still choose Linear over Jira, or Proton over Gmail, or Plausible over Google Analytics. It’s not because the small player has more features. It’s because someone built it for themselves first, and that resonated. The entire market doesn’t need to resonate with you. Just enough of it.</p>

<p>So yes, the big players are coming. They’re going to ship a lot of integrations. They’re going to spend a lot of money. And they’re going to build software that feels like it was built by a company that spends a lot of money.</p>

<p>I’ll be over here, grabbing my comb shaped slice of pie. It’s <a href="https://plenty.is">Plenty</a>.</p>

<p><em>Today also happens to be the day I officially founded <a href="https://plenty.is">Plenty</a>. The papers are signed. The comb is real!</em></p>

]]>
</content:encoded>

</item>



<item>

<title>Ruby Deserves Beautiful Documentation</title>

<link>https://paolino.me/ruby-deserves-beautiful-documentation/</link>

<guid isPermaLink="true">
https://paolino.me/ruby-deserves-beautiful-documentation/
</guid>

<pubDate>Thu, 19 Mar 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
Have you ever looked at a VitePress documentation site and felt a little jealous?

]]>
</description>



<content:encoded>
<![CDATA[

<p>Have you ever looked at a VitePress documentation site and felt a little jealous?</p>

<p>The sidebar navigation. The “On this page” outline on the right. The search that pops up with <code>/</code>. The homepage that actually looks like a product page, not a README with a nav bar. Dark mode that just works. Code blocks with copy buttons and language labels. It all looks like someone sat down and designed the whole experience.</p>

<p>Because someone did. VitePress is genuinely great. And Ruby developers know it, because some of the most visible projects in our community are shipping their docs on VitePress. Not on a Jekyll theme, not on a Ruby tool. On a JavaScript static site generator built for Vue.</p>

<p>I don’t blame them. I looked at what we had in the Jekyll ecosystem and understood immediately. The best option is Just the Docs, and I’ve been using it for <a href="https://rubyllm.com">RubyLLM</a>. It’s solid. But I had to patch in proper dark mode support that follows the browser setting. I had to add a copy-page button. The homepage layout is narrow and document-y. It works. It doesn’t wow.</p>

<p>So I built <a href="https://jekyll-vitepress.dev">Jekyll VitePress Theme</a>.</p>

<h2 id="what-it-is">What It Is</h2>

<p>A Jekyll theme gem that recreates the VitePress documentation experience. Everything you’d expect:</p>

<ul>
  <li>Top nav with mobile menu</li>
  <li>Left sidebar, right “On this page” outline</li>
  <li>Homepage layout with hero section and feature cards</li>
  <li>Built-in local search (press <code>/</code> or <code>Cmd+K</code>)</li>
  <li>Dark/light/auto appearance toggle</li>
  <li>Code blocks with copy buttons, language labels, and file title bars</li>
  <li>Doc footer with edit link, previous/next pager, and “last updated”</li>
  <li>GitHub star widget</li>
  <li>Rouge syntax highlighting with separate light and dark themes</li>
</ul>

<p>All configured through <code>_config.yml</code> and <code>_data/*.yml</code> files. No JavaScript toolchain. No Node.js. Just Jekyll.</p>

<h2 id="getting-started">Getting Started</h2>

<div data-title="Gemfile" class="language-ruby highlighter-rouge"><div class="highlight"><pre><code>gem "jekyll-vitepress-theme"
</code></pre></div></div>

<div data-title="_config.yml" class="language-yaml highlighter-rouge"><div class="highlight"><pre><code>theme: jekyll-vitepress-theme
plugins:
  - jekyll-vitepress-theme

jekyll_vitepress:
  branding:
    site_title: My Project
</code></pre></div></div>

<pre><code class="language-sh">bundle install
bundle exec jekyll serve --livereload
</code></pre>

<p>That’s it. Your docs site now looks like VitePress. Customize the nav, sidebar, colors, fonts, and everything else from the <a href="https://jekyll-vitepress.dev/configuration-reference/">configuration reference</a>.</p>

<h2 id="why-this-matters">Why This Matters</h2>

<p>When I came back to Ruby in 2024, I kept finding things that could be better. There wasn’t a great LLM library, so I built <a href="https://rubyllm.com">RubyLLM</a>. Async deserved more attention, so I <a href="/async-ruby-is-the-future">blogged about it</a>. And our documentation sites? They didn’t look the part.</p>

<p>In open source, looks matter. A beautiful docs site tells potential users: this project is serious, maintained, and worth your time. It lowers the barrier to adoption. It makes people want to try your library.</p>

<p>VitePress understood this. Now Jekyll has it too.</p>

<pre><code class="language-ruby">gem "jekyll-vitepress-theme", "~&gt; 1.0"
</code></pre>

]]>
</content:encoded>

</item>



<item>

<title>RubyLLM 1.14: From Zero to AI Chat App in Under Two Minutes</title>

<link>https://paolino.me/rubyllm-1-14-chat-ui/</link>

<guid isPermaLink="true">
https://paolino.me/rubyllm-1-14-chat-ui/
</guid>

<pubDate>Wed, 18 Mar 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
RubyLLM 1.14 ships a full chat UI generator. Two commands and you have a working AI chat app with Turbo streaming, model selection, and tool call display, in under two minutes. The demo above shows the whole thing: new Rails app to working chat in 1:46, including trying it out.

]]>
</description>



<content:encoded>
<![CDATA[

<p>RubyLLM 1.14 ships a full chat UI generator. Two commands and you have a working AI chat app with Turbo streaming, model selection, and tool call display, in under two minutes. The demo above shows the whole thing: new Rails app to working chat in 1:46, including trying it out.</p>

<h2 id="why-this-matters">Why This Matters</h2>

<p>RubyLLM turned one last week. <a href="/rubyllm-1-0/">1.0 shipped on March 11, 2025</a> with Rails integration from day one: ActiveRecord models, <code>acts_as_chat</code>, Turbo streaming, persistence out of the box. <a href="/rubyllm-1.4-1.5.1/">1.4</a> added the install generator. <a href="https://github.com/crmne/ruby_llm/releases/tag/1.7.0">1.7</a> brought the first scaffold chat UI with Turbo Streams. <a href="/rubyllm-1-12-agents/">1.12</a> introduced agents with prompt conventions. Each release got closer to the same thing: AI that works the way Rails works.</p>

<p>1.14 fully realizes that goal. A beautiful Tailwind chat UI (with automatic fallback to scaffold if you’re not using Tailwind). Generators for agents and tools. Conventional directories for everything. All of it extracted from <a href="https://chatwithwork.com">Chat with Work</a>, where it’s been running in production for months.</p>

<h2 id="what-you-get">What You Get</h2>

<p>Two generators. That’s it.</p>

<pre><code class="language-sh">bin/rails generate ruby_llm:install
bin/rails generate ruby_llm:chat_ui
</code></pre>

<p>Your app now has this structure:</p>

<pre><code class="language-plaintext">app/
├── agents/
├── controllers/
│   ├── chats_controller.rb
│   └── messages_controller.rb
├── helpers/
│   └── messages_helper.rb
├── jobs/
│   └── chat_response_job.rb
├── models/
│   ├── chat.rb
│   ├── message.rb
│   ├── model.rb
│   └── tool_call.rb
├── prompts/
├── schemas/
├── tools/
└── views/
    ├── chats/
    │   ├── index.html.erb
    │   ├── show.html.erb
    │   └── _chat.html.erb
    └── messages/
        ├── _assistant.html.erb
        ├── _user.html.erb
        ├── _tool.html.erb
        ├── _error.html.erb
        ├── create.turbo_stream.erb
        ├── tool_calls/
        │   └── _default.html.erb
        └── tool_results/
            └── _default.html.erb
</code></pre>

<p>Separate partials for each message role. Turbo Stream templates for real-time updates via <code>broadcasts_to</code>. A background job that handles the AI response. Tool calls and tool results each get their own rendering pipeline. A complete Tailwind chat interface, not a scaffold you need to fight with.</p>

<h2 id="full-tutorial-new-app-from-scratch">Full Tutorial: New App from Scratch</h2>

<p>If you want to start from zero, this is what the demo shows. The whole thing takes under two minutes.</p>

<pre><code class="language-sh">rails new chat_app --css tailwind
cd chat_app
bundle add ruby_llm
bin/rails generate ruby_llm:install
bin/rails generate ruby_llm:chat_ui
bin/rails db:migrate
bin/rails ruby_llm:load_models
bin/dev
</code></pre>

<p>That’s a new Rails app with Tailwind, RubyLLM installed, the chat UI generated, the database set up, models loaded, and the server running. Open <code>localhost:3000/chats</code> and start talking to an AI.</p>

<h2 id="generators-for-agents-tools-and-schemas">Generators for Agents, Tools, and Schemas</h2>

<p>Now the fun part. You scaffold agents, tools, and schemas the same way you’d scaffold anything else in Rails:</p>

<pre><code class="language-bash">bin/rails generate ruby_llm:agent SupportAgent
</code></pre>

<pre><code class="language-plaintext">app/
├── agents/
│   └── support_agent.rb
└── prompts/
    └── support_agent/
        └── instructions.txt.erb
</code></pre>

<p>The agent class comes with the <a href="/rubyllm-1-12-agents/">1.12 DSL</a> ready to go. The instructions file is an ERB template for your system prompt, so you can version it, review it in PRs, and template it with runtime context.</p>

<pre><code class="language-bash">bin/rails generate ruby_llm:tool WeatherTool
</code></pre>

<pre><code class="language-plaintext">app/
├── tools/
│   └── weather_tool.rb
└── views/
    └── messages/
        ├── tool_calls/
        │   └── _weather.html.erb
        └── tool_results/
            └── _weather.html.erb
</code></pre>

<p>Each tool gets its own partials for rendering calls and results. Show a weather widget for the weather tool, a search results list for a search tool, all through Rails partials.</p>

<pre><code class="language-bash">bin/rails generate ruby_llm:schema Product
</code></pre>

<pre><code class="language-plaintext">app/
└── schemas/
    └── product_schema.rb
</code></pre>

<p>This creates a schema for structured output validation.</p>

<p>More on all of this in the <a href="https://rubyllm.com/rails/">Rails integration docs</a>, and the dedicated guides for <a href="https://rubyllm.com/agents/">agents</a> and <a href="https://rubyllm.com/tools/">tools</a>.</p>

<h2 id="self-registering-provider-config">Self-Registering Provider Config</h2>

<p>For people building provider gems: providers now register their own configuration options instead of patching a monolithic <code>Configuration</code> class.</p>

<pre><code class="language-ruby">class DeepSeek &lt; RubyLLM::Provider
  class &lt;&lt; self
    def configuration_options
      %i[deepseek_api_key deepseek_api_base]
    end
  end
end
</code></pre>

<p>When the provider is registered, its options become <code>attr_accessor</code>s on <code>RubyLLM::Configuration</code> automatically. Third-party gems can add their config keys without touching the core.</p>
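
<p>Here is a rough sketch of that mechanism in plain Ruby. It is not RubyLLM’s actual implementation: <code>SketchConfiguration</code>, <code>SketchProvider</code>, and <code>register_provider</code> are hypothetical names made up for illustration. It only shows how registering a class can turn its declared option names into accessors on a shared configuration class.</p>

<pre><code class="language-ruby"># Hypothetical configuration class that starts with no provider options.
class SketchConfiguration
end

# Hypothetical provider declaring the options it needs.
class SketchProvider
  def self.configuration_options
    %i[sketch_api_key sketch_api_base]
  end
end

# Registering a provider defines an accessor for each declared option.
def register_provider(provider_class)
  provider_class.configuration_options.each do |option|
    SketchConfiguration.attr_accessor(option)
  end
end

register_provider(SketchProvider)

config = SketchConfiguration.new
config.sketch_api_key = "secret"
puts config.sketch_api_key
puts config.respond_to?(:sketch_api_base)
</code></pre>

<p>The configuration class never mentions the provider by name, which is the property that lets third-party gems plug in.</p>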

<h2 id="bug-fixes">Bug Fixes</h2>

<ul>
  <li><strong>Faraday logging memory bloat</strong>: logging no longer serializes large payloads (like base64-encoded PDFs) when the log level is above DEBUG.</li>
  <li><strong>Agent <code>assume_model_exists</code> propagation</strong>: setting this on the agent class now actually works.</li>
  <li><strong>Renamed model associations</strong>: foreign key references with <code>acts_as</code> helpers are fixed.</li>
  <li><strong>MySQL/MariaDB compatibility</strong>: JSON column defaults work correctly now.</li>
  <li><strong>Error.new with string argument</strong>: no longer raises a <code>NoMethodError</code>.</li>
</ul>

<p>Full list in the <a href="https://github.com/crmne/ruby_llm/releases/tag/1.14.0">release notes</a>.</p>

<pre><code class="language-ruby">gem 'ruby_llm', '~&gt; 1.14'
</code></pre>

]]>
</content:encoded>

</item>



<item>

<title>Ruby Is the Best Language for Building AI Apps</title>

<link>https://paolino.me/ruby-is-the-best-language-for-ai-apps/</link>

<guid isPermaLink="true">
https://paolino.me/ruby-is-the-best-language-for-ai-apps/
</guid>

<pubDate>Fri, 20 Feb 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[

  If your goal is to ship AI applications in 2026, Ruby is the best language to do it.


]]>
</description>



<content:encoded>
<![CDATA[

<blockquote>
  <p>If your goal is to ship AI applications in 2026, Ruby is the best language to do it.</p>
</blockquote>

<h2 id="the-ai-training-ecosystem-is-irrelevant">The AI Training Ecosystem Is Irrelevant</h2>

<p>Python owns model training. PyTorch, TensorFlow, the entire notebooks-and-papers gravity well. Nobody disputes that.</p>

<p>But you’re not training LLMs. Almost nobody is. Each training run costs millions of dollars. The dataset is the internet!</p>

<p>This is what AI development today looks like:</p>

<pre><code class="language-bash">curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"model": "gpt-5.2", "messages": [{"role": "user", "content": "Hello"}]}'
</code></pre>

<p>That’s it. An HTTP call.</p>

<p>The entire Python ML stack is <em>irrelevant</em> to making that call. What matters is everything around it: streaming responses to users, persisting conversations, tracking costs, switching providers when pricing changes.</p>

<p>That’s web application engineering. That’s where Ruby and Rails shine like no other.</p>

<h2 id="you-need-a-complex-agent-framework-or-youre-not-doing-real-ai">“You Need a Complex Agent Framework or You’re Not Doing Real AI”</h2>

<p>Bullshit.</p>

<p>You need a beautiful, truly provider-independent API. Let me show you.</p>

<h2 id="python-vs-javascript-vs-ruby-llm-libraries">Python vs JavaScript vs Ruby LLM Libraries</h2>

<h3 id="simple-chat">Simple chat</h3>

<p><strong>Python (LangChain):</strong></p>

<pre><code class="language-python">from langchain.chat_models import init_chat_model
from langchain.messages import HumanMessage

model = init_chat_model("gpt-5.2", model_provider="openai")
response = model.invoke([HumanMessage("Hello!")])
</code></pre>

<p>You have to name the provider, instantiate a message object, and wrap it in an array before you can say hello.</p>

<p>That’s ceremony.</p>

<p><strong>JavaScript (AI SDK):</strong></p>

<pre><code class="language-javascript">import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const { text } = await generateText({
  model: openai('gpt-5.2'),
  prompt: 'Hello!',
});
</code></pre>

<p>What if you want to use a model from another provider?</p>

<p><strong>Ruby (<a href="https://rubyllm.com">RubyLLM</a>):</strong></p>

<pre><code class="language-ruby">require 'ruby_llm'

RubyLLM.chat.ask "Hello!"
</code></pre>

<p>Reads like it should.</p>

<h3 id="token-usage-tracking">Token usage tracking</h3>

<p>If you’re running AI in production, you need to track token usage. This is how you price your app.</p>

<p><strong>LangChain (GPT):</strong></p>

<pre><code class="language-python">response = model.invoke([HumanMessage("Hello!")])
response.response_metadata['token_usage']
# {'completion_tokens': 12, 'prompt_tokens': 8, 'total_tokens': 20}
</code></pre>

<p><strong>LangChain (Claude):</strong></p>

<pre><code class="language-python">response.response_metadata['usage']
# {'input_tokens': 8, 'output_tokens': 12}
</code></pre>

<p>Different key and different structure!</p>

<p><strong>LangChain (Gemini):</strong></p>

<pre><code class="language-python">response.response_metadata
# ...nothing...
</code></pre>

<p>It’s not even there!</p>

<p><a href="https://rubyllm.com">RubyLLM</a>:</p>

<pre><code class="language-ruby">response.tokens.input   # =&gt; 8
response.tokens.output  # =&gt; 12
</code></pre>

<p>Same interface. Every provider. Every model.</p>
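
<p>One way a library could offer that uniform interface is to normalize each provider’s payload once, at the boundary. The sketch below is plain Ruby and is not how RubyLLM is implemented: <code>TokenUsage</code> and <code>normalize_usage</code> are hypothetical names, and the input hashes simply mirror the LangChain metadata shapes shown above.</p>

<pre><code class="language-ruby"># Hypothetical value object with one shape for every provider.
TokenUsage = Struct.new(:input, :output, keyword_init: true)

# Map each provider-specific payload onto the shared shape.
def normalize_usage(metadata)
  if metadata.key?(:token_usage)
    usage = metadata[:token_usage]
    TokenUsage.new(input: usage[:prompt_tokens], output: usage[:completion_tokens])
  elsif metadata.key?(:usage)
    usage = metadata[:usage]
    TokenUsage.new(input: usage[:input_tokens], output: usage[:output_tokens])
  else
    TokenUsage.new(input: nil, output: nil) # provider reported nothing
  end
end

gpt_style = { token_usage: { prompt_tokens: 8, completion_tokens: 12, total_tokens: 20 } }
claude_style = { usage: { input_tokens: 8, output_tokens: 12 } }

[gpt_style, claude_style].each do |metadata|
  tokens = normalize_usage(metadata)
  puts "input=#{tokens.input} output=#{tokens.output}"
end
</code></pre>

<p>Application code then reads <code>input</code> and <code>output</code> and never branches on the provider.</p>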

<h3 id="agents">Agents</h3>

<p>This is where it gets fun.</p>

<p><strong>Python (LangChain):</strong></p>

<pre><code class="language-python">from langchain_openai import ChatOpenAI
from langchain.agents import create_agent

model = ChatOpenAI(model="gpt-5-nano")

graph = create_agent(
    model=model,
    tools=[search_docs, lookup_account],
    system_prompt="You are a concise support assistant",
)

inputs = {"messages": [{"role": "user", "content": "How do I reset my API key?"}]}

for chunk in graph.stream(inputs, stream_mode="updates"):
    print(chunk)
</code></pre>

<p><strong>JavaScript (AI SDK 6):</strong></p>

<pre><code class="language-javascript">import { ToolLoopAgent } from 'ai';
import { openai } from '@ai-sdk/openai';

const supportAgent = new ToolLoopAgent({
  model: openai('gpt-5-nano'),
  system: 'You are a concise support assistant.',
  tools: { searchDocs, lookupAccount },
});

const { text } = await supportAgent.generate({
  messages: [{ role: 'user', content: 'How do I reset my API key?' }],
});
</code></pre>

<p><strong>Ruby (<a href="https://rubyllm.com">RubyLLM</a>):</strong></p>

<pre><code class="language-ruby">require 'ruby_llm'

class SupportAgent &lt; RubyLLM::Agent
  model "gpt-5-nano"
  instructions "You are a concise support assistant."
  tools SearchDocs, LookupAccount
end

SupportAgent.new.ask "How do I reset my API key?"
</code></pre>

<p>Pure joy.</p>

<h2 id="its-about-cognitive-overhead">It’s About Cognitive Overhead</h2>

<p>This isn’t just about aesthetics.</p>

<p>It’s about <em>cognitive overhead</em>: how many abstractions, how many provider-specific details, how many different data structures you need to hold in your head instead of focusing on what really matters: prompts and tool design.</p>

<p>Low cognitive overhead compounds: faster onboarding, fewer accidental bugs, easier refactors, and cleaner debugging when production explodes at 2AM.</p>

<p>Ruby’s advantage here is cultural: elegant APIs are treated as first-class engineering work, not icing on the cake.</p>

<h2 id="rails-gives-you-the-rest-of-the-product-for-free">Rails Gives You the Rest of the Product for Free</h2>

<p>Model calls are only a small chunk of your code. The rest makes up the bulk of it: auth, billing, background jobs, streaming UI, persistence, admin screens, observability, even <a href="https://native.hotwired.dev/">native apps</a>.</p>

<p>Rails gives you a beautiful, coherent answer for all of it.</p>

<p>With <a href="https://rubyllm.com">RubyLLM</a> + Rails, the core streaming loop is tiny:</p>

<pre><code class="language-ruby">class ChatResponseJob &lt; ApplicationJob
  def perform(chat_id, content)
    chat = Chat.find(chat_id)

    chat.ask(content) do |chunk|
      message = chat.messages.last
      message.broadcast_append_chunk(chunk.content) if chunk.content.present?
    end
  end
end
</code></pre>

<p>And on the model side:</p>

<pre><code class="language-ruby">class Chat &lt; ApplicationRecord
  acts_as_chat
end

class Message &lt; ApplicationRecord
  acts_as_message
  has_many_attached :attachments
end
</code></pre>

<p>This gives you streaming chunks to your web app and persistence in your DB in absurdly few lines of code.</p>
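
<p>If you want to see the shape of that loop without Rails or an API key, here is a minimal plain-Ruby sketch. <code>FakeStreamingChat</code> is a hypothetical stand-in that replays canned chunks, and the <code>broadcasted</code> array plays the role of the Turbo broadcast. The point is the pattern: each chunk goes to the block as it arrives, and the full reply is stored once the stream ends.</p>

<pre><code class="language-ruby"># Hypothetical stand-in for a chat that streams its reply in chunks.
class FakeStreamingChat
  attr_reader :messages

  def initialize(chunks)
    @chunks = chunks
    @messages = []
  end

  # Yields each chunk as it "arrives", then stores the complete reply.
  def ask(content)
    @messages.push({ role: :user, content: content })
    reply = +""
    @chunks.each do |chunk|
      reply.concat(chunk)
      yield chunk if block_given?
    end
    @messages.push({ role: :assistant, content: reply })
    reply
  end
end

broadcasted = []
chat = FakeStreamingChat.new(["Hel", "lo", "", "!"])
chat.ask("Say hello") do |chunk|
  broadcasted.push(chunk) unless chunk.empty? # skip empty chunks, like the job above
end

puts broadcasted.inspect
puts chat.messages.last[:content]
</code></pre>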

<h2 id="it-scales">It Scales</h2>

<p>“Ruby can’t handle AI scale.”</p>

<p>Wrong.</p>

<p>LLM workloads are mostly network-bound and streaming-bound. That’s exactly where Ruby’s <a href="https://socketry.github.io/async/">Async</a> ecosystem shines. Fibers let you handle high concurrency without thread explosion and resource waste. No need to plaster the code with <code>async</code>/<code>await</code> keywords. <a href="https://rubyllm.com">RubyLLM</a> became concurrent with 0 code changes.</p>

<p>I wrote a deep dive here: <a href="/async-ruby-is-the-future">Async Ruby is the Future of AI Apps (And It’s Already Here)</a></p>

<h2 id="dont-take-my-word-for-it">Don’t Take My Word for It</h2>

<p>Someone ported <a href="https://rubyllm.com">RubyLLM</a>’s API design to JavaScript as <a href="https://github.com/nicholasgriffintn/node-llm">NodeLLM</a>. Same design. Clean code, good docs.</p>

<p>The JavaScript community’s response: zero upvotes on Reddit. 14 GitHub stars. Top comments: “How’s this different from AI SDK?” and “It’s always fun when you AI bros post stuff. They all look and sound the same. Also, totally unnecessary.”</p>

<p><a href="https://rubyllm.com">RubyLLM</a>: #1 on Hacker News. ~3,600 stars. 5 million downloads. Millions of people using RubyLLM-powered apps today.</p>

<p>Same design. Wildly different reception. That tells you everything about which community is ready for this moment.</p>

<p>And teams that switched from Python are not going back:</p>

<blockquote>
  <p>We had a customer deployment coming up and our Langgraph agent was failing. I rebuilt it using <a href="https://rubyllm.com">RubyLLM</a>. Not only was it far simpler, it performed better than the Langgraph agent.</p>
</blockquote>

<blockquote>
  <p>Our first pass at the AI Agent used langchain… it was so painful that we built it from scratch in Ruby. Like a cloud had lifted. Langchain was that bad.</p>
</blockquote>

<blockquote>
  <p>At Yuma, serving over 100,000 end users, our unified AI interface was awful. <a href="https://rubyllm.com">RubyLLM</a> is so much nicer than all of that.</p>
</blockquote>

<p>These aren’t people who haven’t tried Python. They tried it, shipped it, and replaced it.</p>

<h2 id="go-ship-ai-apps-with-ruby-rails-and-rubyllm">Go Ship AI Apps with Ruby, Rails, and <a href="https://rubyllm.com">RubyLLM</a></h2>

<p>When we freed ourselves from complexity, this community built Twitter, GitHub, Shopify, Basecamp, Airbnb. Rails changed web development forever.</p>

<p>Now we have the chance to change AI app development. Because AI apps are all about the product. And nobody builds products better than Ruby developers.</p>

]]>
</content:encoded>

</item>



<item>

<title>RubyLLM 1.12: Agents Are Just LLMs with Tools</title>

<link>https://paolino.me/rubyllm-1-12-agents/</link>

<guid isPermaLink="true">
https://paolino.me/rubyllm-1-12-agents/
</guid>

<pubDate>Tue, 17 Feb 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
“Agent” might be the most overloaded word in tech right now. Every startup claims to have one. Every framework promises to help you build them. The discourse has gotten so thick that the actual concept is buried under layers of marketing.

]]>
</description>



<content:encoded>
<![CDATA[

<p>“Agent” might be the most overloaded word in tech right now. Every startup claims to have one. Every framework promises to help you build them. The discourse has gotten so thick that the actual concept is buried under layers of marketing.</p>

<p>So let’s start from first principles.</p>

<h2 id="whats-an-agent">What’s an Agent?</h2>

<p>An agent is an LLM that can call functions.</p>

<p>That’s it. When you give a language model a set of tools it can invoke – a database lookup, an API call, a file operation – and the model decides when and how to use them, you have an agent. The model reasons about the problem, picks the right tool, looks at the result, and continues reasoning. Sometimes it calls several tools in sequence. Sometimes none.</p>

<p>There’s no special “agent mode.” No orchestration engine. No graph of nodes. It’s just a conversation where the model can do things besides talk.</p>
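
<p>To make that concrete, here is a minimal sketch of the loop in plain Ruby, with no RubyLLM involved. <code>ScriptedModel</code>, <code>TOOLS</code>, and <code>run_agent</code> are hypothetical names invented for illustration, and the scripted model stands in for a real LLM: it asks for one tool call, then answers using the result.</p>

<pre><code class="language-ruby"># Hypothetical stand-in for an LLM: requests one tool call, then answers.
class ScriptedModel
  def respond(messages)
    tool_message = messages.find { |m| m[:role] == :tool }
    if tool_message
      { role: :assistant, content: "Try this page: #{tool_message[:content]}" }
    else
      { role: :assistant, tool: :search_docs, arguments: { query: "webhooks" } }
    end
  end
end

# Tools are just callables keyed by name (made up for this sketch).
TOOLS = {
  search_docs: lambda { |query:| "Configuring #{query}" }
}.freeze

# The whole "agent": ask the model, run any tool it requests, repeat.
def run_agent(model, question)
  messages = [{ role: :user, content: question }]
  loop do
    reply = model.respond(messages)
    messages.push(reply)
    return reply[:content] unless reply[:tool]

    result = TOOLS.fetch(reply[:tool]).call(**reply[:arguments])
    messages.push({ role: :tool, content: result })
  end
end

answer = run_agent(ScriptedModel.new, "How do I configure webhooks?")
puts answer
</code></pre>

<p>Swap the scripted model for a real one and the loop stays the same. The model decides; your code only executes what it asks for and hands back the result.</p>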

<h2 id="rubyllm-always-had-this">RubyLLM Always Had This</h2>

<p>Tool calling has been a core feature of <a href="https://rubyllm.com">RubyLLM</a> since 1.0:</p>

<pre><code class="language-ruby">class SearchDocs &lt; RubyLLM::Tool
  description "Searches our documentation"
  param :query, desc: "Search query"

  def execute(query:)
    Document.search(query).map(&amp;:title)
  end
end

chat = RubyLLM.chat
chat.with_tool(SearchDocs)
chat.ask "How do I configure webhooks?"
# Model searches docs, reads results, answers the question
</code></pre>

<p>That’s an agent. The model decides to search, interprets the results, and responds. You didn’t need a special class or framework to make this happen.</p>

<p>But there was a problem.</p>

<h2 id="the-reuse-problem">The Reuse Problem</h2>

<p>In a real application, you don’t configure a chat once. You configure it in controllers, background jobs, service objects, API endpoints. The same instructions, the same tools, the same temperature – scattered across your codebase:</p>

<pre><code class="language-ruby"># In the controller
chat = RubyLLM.chat(model: 'gpt-4.1')
chat.with_instructions("You are a support assistant for #{workspace.name}...")
chat.with_tools(SearchDocs, LookupAccount, CreateTicket)
chat.with_temperature(0.2)

# In the background job
chat = RubyLLM.chat(model: 'gpt-4.1')
chat.with_instructions("You are a support assistant for #{workspace.name}...")
chat.with_tools(SearchDocs, LookupAccount, CreateTicket)
chat.with_temperature(0.2)

# In the service object...
# You get the idea
</code></pre>

<p>Every Rubyist’s instinct kicks in: this should be a class.</p>

<h2 id="rubyllm-112-a-dsl-for-agents">RubyLLM 1.12: A DSL for Agents</h2>

<p>That’s exactly what 1.12 adds. Define your agent once, use it everywhere:</p>

<pre><code class="language-ruby">class SupportAgent &lt; RubyLLM::Agent
  model 'gpt-4.1'
  instructions "You are a concise support assistant."
  tools SearchDocs, LookupAccount, CreateTicket
  temperature 0.2
end

# Anywhere in your app
response = SupportAgent.new.ask "How do I reset my API key?"
</code></pre>

<p>Every macro maps to a <code>with_*</code> call you already know. <code>model</code> maps to <code>RubyLLM.chat(model:)</code>. <code>tools</code> maps to <code>with_tools</code>. <code>instructions</code> maps to <code>with_instructions</code>. No new concepts. Just a cleaner way to package what you were already doing.</p>
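
<p>A minimal plain-Ruby sketch of that idea, under stated assumptions: <code>RecordingChat</code> and <code>MacroAgent</code> are hypothetical classes, not RubyLLM’s internals. The class-level macros only store settings, and <code>build_chat</code> replays them as ordinary chained calls.</p>

<pre><code class="language-ruby"># Hypothetical chat object that records every configuration call it receives.
class RecordingChat
  attr_reader :calls

  def initialize(model:)
    @calls = [[:model, model]]
  end

  def with_instructions(text)
    @calls.push([:with_instructions, text])
    self
  end

  def with_temperature(value)
    @calls.push([:with_temperature, value])
    self
  end
end

# Hypothetical agent: class-level macros store settings for later replay.
class MacroAgent
  def self.settings
    @settings ||= {}
  end

  def self.model(name)
    settings[:model] = name
  end

  def self.instructions(text)
    settings[:instructions] = text
  end

  def self.temperature(value)
    settings[:temperature] = value
  end

  model "gpt-4.1"
  instructions "You are a concise support assistant."
  temperature 0.2

  # Replays the stored settings as the familiar with_* calls.
  def self.build_chat
    RecordingChat.new(model: settings[:model])
                 .with_instructions(settings[:instructions])
                 .with_temperature(settings[:temperature])
  end
end

MacroAgent.build_chat.calls.each { |call| puts call.inspect }
</code></pre>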

<h2 id="runtime-context">Runtime Context</h2>

<p>Static configuration is only half the story. Real agents need runtime data – the current user, the workspace, the time of day. Agents support lazy evaluation for this:</p>

<pre><code class="language-ruby">class WorkAssistant &lt; RubyLLM::Agent
  chat_model Chat
  inputs :workspace

  instructions { "You are helping #{workspace.name}" }

  tools do
    [
      TodoTool.new(chat: chat),
      GoogleDriveTool.new(user: chat.user)
    ]
  end
end

chat = WorkAssistant.create!(user: current_user, workspace: @workspace)
chat.ask "What's on my todo list?"
</code></pre>

<p>Blocks and lambdas are evaluated at runtime, with access to the chat object and any declared inputs. Values that depend on runtime context must be lazy – a constraint that Ruby makes trivially natural.</p>
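
<p>Why laziness matters is easy to show in plain Ruby. In this sketch, <code>SketchAgent</code> is a hypothetical class, and unlike the DSL above it passes the agent to the lambda explicitly instead of evaluating a block in the agent’s context. The idea is the same: the value is stored at class definition time and computed only when an instance with real data asks for it.</p>

<pre><code class="language-ruby"># Hypothetical mini-agent: instructions may be a String or a callable.
class SketchAgent
  # Class-level macro: stores the value if given, returns what is stored.
  def self.instructions(value = nil)
    @instructions = value unless value.nil?
    @instructions
  end

  instructions(lambda { |agent| "You are helping #{agent.workspace}" })

  attr_reader :workspace

  def initialize(workspace:)
    @workspace = workspace
  end

  # Callables run here, at use time, with the runtime data in hand.
  def resolved_instructions
    stored = self.class.instructions
    stored.respond_to?(:call) ? stored.call(self) : stored
  end
end

puts SketchAgent.new(workspace: "Acme").resolved_instructions
puts SketchAgent.new(workspace: "Globex").resolved_instructions
</code></pre>

<p>A plain string in the same position would have been interpolated once, when the class was loaded, before any workspace existed.</p>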

<h2 id="prompt-conventions">Prompt Conventions</h2>

<p>If you’re using Rails, agents follow a convention for prompt management:</p>

<pre><code class="language-ruby">class WorkAssistant &lt; RubyLLM::Agent
  chat_model Chat
  instructions display_name: -&gt; { chat.user.display_name_or_email }
end
</code></pre>

<p>This renders <code>app/prompts/work_assistant/instructions.txt.erb</code> with <code>display_name</code> available as a local. Namespaced agents map naturally: <code>Admin::SupportAgent</code> looks in <code>app/prompts/admin/support_agent/</code>.</p>
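
<p>The naming convention itself fits in a few lines. This is a sketch, not the library’s lookup code: <code>prompt_path_for</code> is a hypothetical helper, and the hand-rolled snake_case conversion is an assumption standing in for whatever inflection the real implementation uses.</p>

<pre><code class="language-ruby"># Hypothetical helper: derive the prompt file path from an agent class name.
def prompt_path_for(class_name, prompt = "instructions")
  segments = class_name.split("::").map do |segment|
    # CamelCase to snake_case, e.g. "SupportAgent" becomes "support_agent".
    segment.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end
  File.join("app", "prompts", *segments, "#{prompt}.txt.erb")
end

puts prompt_path_for("WorkAssistant")
puts prompt_path_for("Admin::SupportAgent")
</code></pre>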

<p>Your prompts are ERB templates. Version them in git. Review them in PRs. Treat them like the application code they are.</p>

<h2 id="rails-integration">Rails Integration</h2>

<p>The <code>chat_model</code> macro activates Rails-backed persistence:</p>

<pre><code class="language-ruby">class WorkAssistant &lt; RubyLLM::Agent
  chat_model Chat
  model 'gpt-4.1'
  instructions "You are a helpful assistant."
  tools SearchDocs, LookupAccount
end

# Create a persisted chat with agent config applied
chat = WorkAssistant.create!(user: current_user)

# Load an existing chat, apply runtime config
chat = WorkAssistant.find(params[:id])

# User sends a message, everything persisted automatically
chat.ask(params[:message])
</code></pre>

<p><code>create!</code> persists both the chat and its instructions. <code>find</code> applies configuration at runtime without touching the database. This distinction matters when your prompts evolve faster than your data.</p>

<h2 id="also-in-112">Also in 1.12</h2>

<p>Agents are the headline, but this release also adds:</p>

<ul>
  <li><strong>AWS Bedrock full coverage</strong> via the Converse API – every Bedrock chat model through one interface</li>
  <li><strong>Azure Foundry API</strong> – broad model access across Azure’s ecosystem</li>
  <li><strong>Clearer <code>with_instructions</code> semantics</strong> – explicit append options, guaranteed message ordering</li>
</ul>

<h2 id="already-in-production">Already in Production</h2>

<p>This isn’t a spec or a proposal. The agent DSL powers <a href="https://chatwithwork.com">Chat with Work</a> in production right now. The <code>WorkAssistant</code> examples above aren’t hypothetical – they’re simplified versions of real code handling real conversations.</p>

<p>If you want to see what it feels like, <a href="https://chatwithwork.com">try it out</a>.</p>

<h2 id="the-point">The Point</h2>

<p>The industry is making agents complicated. They’re not. An agent is an LLM with tools. You define the tools in Ruby. You package them in a class. You use the class in your app.</p>

<p>No graphs. No chains. No orchestration frameworks. Just Ruby.</p>

<pre><code class="language-ruby">gem 'ruby_llm', '~&gt; 1.12'
</code></pre>

]]>
</content:encoded>

</item>



<item>

<title>Dictation Is the New Prompt (Voxtype on Omarchy)</title>

<link>https://paolino.me/dictation-is-the-new-prompt/</link>

<guid isPermaLink="true">
https://paolino.me/dictation-is-the-new-prompt/
</guid>

<pubDate>Wed, 07 Jan 2026 00:00:00 +0000</pubDate>

<description>
<![CDATA[
Typing every prompt feels backwards in 2026. You can speak faster than you can type. Hold a hotkey, speak, your OS types it for you. If you care about flow, dictation is the most underrated upgrade you can make.

]]>
</description>



<content:encoded>
<![CDATA[

<p>Typing every prompt feels backwards in 2026. You can speak faster than you can type. Hold a hotkey, speak, your OS types it for you. If you care about flow, dictation is the most underrated upgrade you can make.</p>

<p>In the <a href="https://omarchy.org/">Omarchy</a> world, <a href="https://github.com/goodroot/hyprwhspr">Hyprwhspr</a> is getting a lot of attention after a recent DHH tweet:</p>

<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I had no idea that local model dictation had gotten this good and this fast! I&#39;m blown away by how good hyprwhspr with Omarchy is just using a base model backed by the CPU. Unbelievably accurate. <a href="https://t.co/Jtz3eN84Jf">https://t.co/Jtz3eN84Jf</a></p>&mdash; DHH (@dhh) <a href="https://twitter.com/dhh/status/2007498242561593535?ref_src=twsrc%5Etfw">January 3, 2026</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>

<p>He’s right: local dictation is <em>shockingly</em> good now. The catch is Hyprwhspr uses Python virtual environments, which don’t mix well with <a href="http://mise.jdx.dev/">mise</a>. Fortunately <a href="https://github.com/peteonrails">Pete Jackson</a> <a href="https://github.com/basecamp/omarchy/discussions/3872">saw that and created</a> <a href="https://github.com/peteonrails/voxtype/">Voxtype</a> to solve exactly this issue!</p>

<p>EDIT: five minutes after I posted this, DHH confirmed that Voxtype will ship with Omarchy 3.3! 🎉</p>

<div class="jekyll-twitter-plugin"><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Voxtype is shipping with Omarchy 3.3 👍 <a href="https://t.co/Pt1EkgNLoi">https://t.co/Pt1EkgNLoi</a></p>&mdash; DHH (@dhh) <a href="https://twitter.com/dhh/status/2008856834258645389?ref_src=twsrc%5Etfw">January 7, 2026</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

</div>

<h2 id="why-voxtype">Why Voxtype</h2>

<p>Voxtype is built in Rust, so you don’t need Python virtual environments, which means it works well with mise. It’s fast, it just works, and when <a href="https://github.com/peteonrails/voxtype/issues/26">I opened an issue asking for an Omarchy theme</a>, <a href="https://github.com/peteonrails/voxtype/releases/tag/v0.4.4">the author shipped it immediately</a>. Now it looks <em>stunning</em> in my setup.</p>

<p>With Vulkan enabled, transcription is almost instant on my Ryzen AI 9 HX370. The video at the top is not sped up. Longer text also transcribes instantly.</p>

<p>If you want to copy my exact configuration, here it is.</p>

<h2 id="install">Install</h2>

<pre><code class="language-bash">sudo pacman -S wtype ydotool wl-clipboard vulkan-icd-loader # vulkan-icd-loader is only needed if you want to use your GPU
yay -S voxtype # no sudo: yay asks for privileges itself when it needs them

voxtype setup --download
voxtype setup gpu # if you want to use your GPU
voxtype setup systemd
</code></pre>

<p>Restart Waybar after the changes:</p>

<pre><code class="language-bash">pkill -SIGUSR2 waybar
</code></pre>

<h2 id="voxtype-config">Voxtype config</h2>

<p><code>~/.config/voxtype/config.toml</code></p>

<pre><code class="language-toml">state_file = "auto"

[hotkey]
enabled = false

[audio]
device = "default"
sample_rate = 16000
max_duration_secs = 600

[audio.feedback]
enabled = true
# Sound theme: "default", "subtle", "mechanical", or path to custom theme directory
theme = "default"
volume = 0.7

[whisper]
model = "base.en"
language = "en"
translate = false
on_demand_loading = true # saves your GPU until it's needed

[output]
mode = "type"
fallback_to_clipboard = true

# Delay between typed characters in milliseconds
# 0 = fastest possible, increase if characters are dropped
type_delay_ms = 1

[output.notification]
on_recording_start = false
on_recording_stop = false
on_transcription = true

[text]
replacements = { "hyperwhisper" = "hyprwhspr" }

[status]
icon_theme = "omarchy"
</code></pre>

<h2 id="waybar-integration">Waybar integration</h2>

<p><code>~/.config/waybar/config.jsonc</code></p>

<pre><code class="language-jsonc">"custom/voxtype": {
  "exec": "voxtype status --follow --format json",
  "return-type": "json",
  "format": "{}",
  "tooltip": true
},
</code></pre>

<p>And add it to <code>modules-right</code>:</p>

<pre><code class="language-jsonc">"modules-right": [
  "group/tray-expander",
  "custom/voxtype",
  "bluetooth",
  "network",
  "pulseaudio",
  "cpu",
  "battery"
]
</code></pre>

<p><code>~/.config/waybar/style.css</code></p>

<pre><code class="language-css">@import "voxtype.css";
@import "../omarchy/current/theme/waybar.css";
</code></pre>

<p><code>~/.config/waybar/voxtype.css</code></p>

<pre><code class="language-css">#custom-voxtype {
  margin: 0 16px 0 0;
  font-size: 12px;
  font-weight: bold;
  border-top: 2px solid transparent;
  border-bottom: 2px solid transparent;
  transition: color 150ms ease-in-out, border-color 150ms ease-in-out;
}

#custom-voxtype.recording {
  color: #ff5555;
  animation: pulse 1s ease-in-out infinite;
}

#custom-voxtype.transcribing {
  color: #ff5555;
}

#custom-voxtype.stopped {
  color: #6272a4;
}

@keyframes pulse {
  0% { opacity: 1; }
  50% { opacity: 0.5; }
  100% { opacity: 1; }
}
</code></pre>
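
<p>The state classes in that stylesheet come from the module’s JSON output: with <code>"return-type": "json"</code>, Waybar reads one JSON object per line and applies its <code>class</code> field as a CSS class on the module. A minimal Ruby sketch of that line shape, as a stand-in for a status command. The <code>waybar_line</code> helper and the <code>text</code> and <code>tooltip</code> values are made up for illustration, and Voxtype’s real output may carry different fields:</p>

<pre><code class="language-ruby">require "json"

# Builds one line in the shape Waybar reads from a "return-type": "json" module.
# The state names mirror the CSS classes above; text and tooltip are placeholders.
def waybar_line(state)
  JSON.generate({ text: "mic", tooltip: "Voxtype: #{state}", class: state })
end

%w[recording transcribing stopped].each { |state| puts waybar_line(state) }
</code></pre>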

<h2 id="keybinding">Keybinding</h2>

<p>In your Hyprland config:</p>

<pre><code class="language-ini"># Voxtype
bindd = SHIFT, XF86AudioMicMute, Transcribe, exec, voxtype record toggle
</code></pre>

<p>That’s it. Use your voice whenever possible. It’s faster, more natural, and keeps you in flow.</p>

]]>
</content:encoded>

</item>



<item>

<title>Nano Banana with RubyLLM</title>

<link>https://paolino.me/nano-banana-with-rubyllm/</link>

<guid isPermaLink="true">
https://paolino.me/nano-banana-with-rubyllm/
</guid>

<pubDate>Thu, 23 Oct 2025 00:00:00 +0000</pubDate>

<description>
<![CDATA[
Google wired Nano Banana into the chat interface generateContent, not the image API’s predict. Counterintuitive if you’re using RubyLLM, which makes you think in terms of actions like paint instead of chat.

]]>
</description>



<content:encoded>
<![CDATA[

<p>Google wired Nano Banana into the chat interface <code>generateContent</code>, not the image API’s <code>predict</code>. Counterintuitive if you’re using RubyLLM, which makes you think in terms of <em>actions</em> like <a href="https://rubyllm.com/image-generation/"><code>paint</code></a> instead of <a href="https://rubyllm.com/chat/"><code>chat</code></a>.</p>

<p>Once you know that quirk, it’s straightforward. Only caveat: you need the latest trunk or v1.9+, because that’s where we taught RubyLLM to unpack inline file data from chat responses.</p>

<h2 id="wire-it-up">Wire It Up</h2>

<pre><code class="language-ruby">chat = RubyLLM
         .chat(model: "gemini-2.5-flash-image")
         .with_temperature(1.0) # optional, but you like creativity, right?
         .with_params(generationConfig: { responseModalities: ["image"] }) # also optional, if you prefer the model to return only images

response = chat.ask "your prompt", with: ["all.png", "the.jpg", "attachments.png", "you.png", "want.jpg"]

image_io = response.content[:attachments].first.source
</code></pre>

<p>That <code>StringIO</code> holds the generated image. Stream it to S3, attach it to Active Storage, or keep it in memory for a downstream processor.</p>

<p>Want a file?</p>

<pre><code class="language-ruby">response.content[:attachments].first.save "nano-banana.png"
</code></pre>
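
<p>If you keep the image in memory instead, remember that a <code>StringIO</code> has a read position. A minimal sketch, using a stand-in <code>StringIO</code> that starts with the PNG signature bytes rather than a real response; <code>png?</code> is a hypothetical helper, not part of RubyLLM:</p>

<pre><code class="language-ruby">require "stringio"

# Stand-in for the IO returned by `source`; a real one would hold the full image.
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b
image_io = StringIO.new(PNG_SIGNATURE + "fake image data".b)

# Hypothetical helper: checks the first 8 bytes, then restores the read position.
def png?(io)
  position = io.pos
  io.rewind
  io.read(8) == PNG_SIGNATURE
ensure
  io.pos = position
end

puts png?(image_io) # true for this stand-in
image_io.rewind     # rewind before handing the IO to an uploader or processor
bytes = image_io.read
puts bytes.bytesize
</code></pre>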

<p>That’s it. Chat endpoint, one call. Ship the image feature and go enjoy the rest of your day.</p>

]]>
</content:encoded>

</item>



<item>

<title>RubyLLM 1.4-1.5.1: Three Releases in Three Days</title>

<link>https://paolino.me/rubyllm-1.4-1.5.1/</link>

<guid isPermaLink="true">
https://paolino.me/rubyllm-1.4-1.5.1/
</guid>

<pubDate>Fri, 01 Aug 2025 00:00:00 +0000</pubDate>

<description>
<![CDATA[
Three releases in three days. Wednesday, Friday, and Friday again. Each one shipped as soon as it was ready.

]]>
</description>



<content:encoded>
<![CDATA[

<p>Three releases in three days. Wednesday, Friday, and Friday again. Each one shipped as soon as it was ready.</p>

<h2 id="140-the-structured-output-release-wednesday">1.4.0: The Structured Output Release (Wednesday)</h2>

<p>Getting LLMs to return data in the format you need has always been painful.</p>

<p>We all had code like this:</p>

<pre><code class="language-ruby"># The old struggle
response = chat.ask("Return user data as JSON. ONLY JSON. NO MARKDOWN.")
begin
  data = JSON.parse(response.content.gsub(/```json\n?/, '').gsub(/```\n?/, ''))
rescue JSON::ParserError
  # Hope and pray
end
</code></pre>

<p>Now with structured output:</p>

<pre><code class="language-ruby"># Define your schema with the RubyLLM::Schema DSL
class PersonSchema &lt; RubyLLM::Schema
  string :name
  integer :age
  array :skills, of: :string
end

# Get perfectly structured JSON every time
chat = RubyLLM.chat.with_schema(PersonSchema)
response = chat.ask("Generate a Ruby developer profile")

response.content
# =&gt; {"name" =&gt; "Yukihiro", "age" =&gt; 59, "skills" =&gt; ["Ruby", "C", "Language Design"]}
</code></pre>

<p>No more regex. No more parsing. Just data structures that work.</p>

<p>Oh, and Daniel Friis released <a href="https://github.com/danielfriis/ruby_llm-schema">RubyLLM::Schema</a> just for the occasion, but you can use any gem you want with RubyLLM, or even write your own JSON schema from scratch.</p>
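
<p>For the write-your-own route, a plain Ruby hash in JSON Schema form is enough. A minimal sketch of a hand-written counterpart to <code>PersonSchema</code>; the exact keys the DSL generates may differ, so treat it as an illustration rather than the gem’s actual output:</p>

<pre><code class="language-ruby">require "json"

# Hand-written JSON Schema mirroring the PersonSchema fields above.
person_schema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "integer" },
    skills: { type: "array", items: { type: "string" } }
  },
  required: %w[name age skills],
  additionalProperties: false
}

puts JSON.pretty_generate(person_schema)

# Assumption: with the gem loaded, the hash is passed in place of the class:
#   chat = RubyLLM.chat.with_schema(person_schema)
</code></pre>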

<h2 id="rails-generators-from-zero-to-chat">Rails Generators: From Zero to Chat</h2>

<p>We didn’t have Rails generators before. Now we do:</p>

<pre><code class="language-bash">rails generate ruby_llm:install
</code></pre>

<p>This creates everything you need:</p>
<ul>
  <li>Migrations</li>
  <li>Models with <code>acts_as_chat</code>, <code>acts_as_message</code>, and <code>acts_as_tool_call</code></li>
  <li>A clean initializer</li>
</ul>

<p>Your Chat model works like any Rails model:</p>

<pre><code class="language-ruby">chat = Chat.create!(model: "gpt-4.1-nano")
response = chat.ask("Explain Ruby blocks")
# Messages are automatically persisted with proper associations
</code></pre>

<p>From <code>rails new</code> to working chat in under 5 minutes.</p>

<h2 id="tool-call-transparency">Tool Call Transparency</h2>

<p>New callback to see what your AI is doing:</p>

<pre><code class="language-ruby">chat.on_tool_call do |tool_call|
  puts "🔧 AI is calling: #{tool_call.name}"
  puts "   Arguments: #{tool_call.arguments}"

  Rails.logger.info "[AI Tool] #{tool_call.name}: #{tool_call.arguments}"
end

chat.with_tool(weather_tool).ask("What's the weather in Tokyo?")
# =&gt; 🔧 AI is calling: get_weather
#    Arguments: {"location": "Tokyo"}
</code></pre>

<p>Essential for debugging and auditing AI behavior.</p>
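
<p>For auditing, one structured line per call is easier to search than free text. A minimal sketch, assuming only the <code>name</code> and <code>arguments</code> readers shown above; <code>FakeToolCall</code> and <code>audit_line</code> are hypothetical names used for illustration, not part of RubyLLM:</p>

<pre><code class="language-ruby">require "json"
require "time"

# Stand-in for the object yielded to on_tool_call; only name and arguments are used.
FakeToolCall = Struct.new(:name, :arguments)

# Hypothetical helper: serializes one tool call as a single JSON log line.
def audit_line(tool_call, at: Time.now.utc)
  entry = {
    event: "tool_call",
    tool: tool_call.name,
    arguments: tool_call.arguments,
    at: at.iso8601
  }
  JSON.generate(entry)
end

call = FakeToolCall.new("get_weather", { location: "Tokyo" })
puts audit_line(call, at: Time.utc(2025, 8, 1))
</code></pre>

<p>Inside the callback, that would read <code>chat.on_tool_call { |tool_call| Rails.logger.info audit_line(tool_call) }</code>.</p>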

<h2 id="direct-parameter-provider-access">Direct Provider Parameter Access</h2>

<p>Need that one weird parameter? Use <code>with_params</code>:</p>

<pre><code class="language-ruby"># OpenAI's JSON mode
chat.with_params(response_format: { type: "json_object" })
     .ask("List Ruby features as JSON")
</code></pre>

<p>No waiting for us to wrap every provider option.</p>

<h2 id="critical-bug-fixes-and-other-improvements-in-140">Critical Bug Fixes and Other Improvements in 1.4.0</h2>

<ul>
  <li><strong>Anthropic multiple tool calls</strong>: Was only processing the first tool call, silently ignoring the rest</li>
  <li><strong>Streaming errors</strong>: Now handled properly in both Faraday V1 and V2</li>
  <li><strong>Test fixtures</strong>: Removed 60MB of unnecessary test data</li>
  <li><strong>Message ordering</strong>: Fixed race conditions in streaming responses</li>
  <li><strong>JRuby support</strong>: Now officially tested and supported</li>
  <li><strong>Direct access to raw responses</strong>: Get the raw responses from Faraday for debugging</li>
  <li><strong>GPUStack support</strong>: A production-ready alternative to Ollama</li>
</ul>

<p><a href="https://github.com/crmne/ruby_llm/releases/tag/1.4.0">Full release notes for 1.4.0 available on GitHub.</a></p>

<h2 id="150-two-new-providers-friday">1.5.0: Two New Providers (Friday)</h2>

<h3 id="mistral-ai">Mistral AI</h3>

<p>63 models from France, from tiny to massive:</p>

<pre><code class="language-ruby">RubyLLM.configure do |config|
  config.mistral_api_key = ENV['MISTRAL_API_KEY']
end

# Efficient small model
chat = RubyLLM.chat(model: 'ministral-3b-latest')

# Their flagship model
chat = RubyLLM.chat(model: 'mistral-large-latest')

# Vision with Pixtral
vision = RubyLLM.chat(model: 'pixtral-12b-latest')
vision.ask("What's in this image?", with: "path/to/image.jpg")
</code></pre>

<h3 id="perplexity">Perplexity</h3>

<p>Real-time web search meets LLMs:</p>

<pre><code class="language-ruby">RubyLLM.configure do |config|
  config.perplexity_api_key = ENV['PERPLEXITY_API_KEY']
end

# Get current information with web search
chat = RubyLLM.chat(model: 'sonar-pro')
response = chat.ask("What are the latest Ruby 3.4 features?")
# Searches the web and returns current information
</code></pre>

<h3 id="rails-generator-fixes">Rails Generator Fixes</h3>

<ul>
  <li>Fixed migration order (Chats → Messages → Tool Calls)</li>
  <li>Fixed PostgreSQL detection that was broken by namespace collision</li>
  <li>PostgreSQL users now get <code>jsonb</code> columns instead of <code>json</code></li>
</ul>

<p><a href="https://github.com/crmne/ruby_llm/releases/tag/1.5.0">Full release notes for 1.5.0 available on GitHub.</a></p>

<h2 id="151-quick-fixes-also-friday">1.5.1: Quick Fixes (Also Friday)</h2>

<p>Found issues Friday afternoon. Fixed them. Shipped them. That’s it.</p>

<p>Why make users wait through the weekend with broken code?</p>

<ul>
  <li>Fixed Mistral model capabilities (was a Hash, should be Array)</li>
  <li>Fixed Google Imagen output modality</li>
  <li>Updated to JRuby 10.0.1.0</li>
  <li>Added JSON schema validation for model registry</li>
</ul>

<p><a href="https://github.com/crmne/ruby_llm/releases/tag/1.5.1">Full release notes for 1.5.1 available on GitHub.</a></p>

<h2 id="the-philosophy-ship-when-ready">The Philosophy: Ship When Ready</h2>

<p>Three days. Three releases. Each one made someone’s code work better.</p>

<p>We could have bundled everything into one release next week. But every moment we wait is a moment someone’s dealing with a bug we already fixed.</p>

<p>The structured output in 1.4.0? People have needed that since before RubyLLM existed. The PostgreSQL fix in 1.5.0? Someone’s migrations were failing Thursday. The Mistral fix? That bug was breaking someone’s code Friday morning.</p>

<p>When code is ready, you ship.</p>

<h2 id="what-you-can-build-now">What You Can Build Now</h2>

<p>With structured output and multiple providers, you can build real features:</p>

<pre><code class="language-ruby"># Extract structured data from any text
class InvoiceSchema &lt; RubyLLM::Schema
  string :invoice_number
  date :date
  float :total
  array :line_items do
    object do
      string :description
      float :amount
    end
  end
end

# Use Mistral for cost-effective extraction
extractor = RubyLLM.chat(model: 'ministral-8b-latest')
                    .with_schema(InvoiceSchema)

invoice_data = extractor.ask("Extract invoice details from: #{pdf_text}")
# Reliable data extraction at a fraction of GPT-4's cost

# Use Perplexity for current information
researcher = RubyLLM.chat(model: 'sonar-deep-research')
market_data = researcher.ask("Current Ruby job market trends in 2025")
# Real-time data, not training cutoff guesses
</code></pre>
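
<p>A schema guarantees the shape of the result, not its arithmetic, so a sanity check on the extracted values is still worth having. A minimal sketch, assuming the result is a hash with string keys like the structured output example earlier; the invoice values are made up and <code>line_items_match_total?</code> is a hypothetical helper:</p>

<pre><code class="language-ruby">require "json"

# Illustrative data in the shape InvoiceSchema describes; the values are made up.
raw = '{
  "invoice_number": "INV-001",
  "date": "2025-08-01",
  "total": 150.0,
  "line_items": [
    { "description": "Consulting", "amount": 100.0 },
    { "description": "Hosting", "amount": 50.0 }
  ]
}'
invoice = JSON.parse(raw)

# Hypothetical check: do the line items add up to the stated total?
def line_items_match_total?(invoice, tolerance: 0.01)
  sum = invoice["line_items"].sum { |item| item["amount"] }
  (sum - invoice["total"]).abs.between?(0, tolerance)
end

puts line_items_match_total?(invoice) # true for this sample
</code></pre>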

<h2 id="use-it">Use It</h2>

<pre><code class="language-ruby">gem 'ruby_llm', '~&gt; 1.5'
</code></pre>

<p>Full backward compatibility. Your 1.0 code still runs. These releases just made everything better.</p>

]]>
</content:encoded>

</item>



</channel>
</rss>
