Monday, April 6, 2026

Image Generation with ComfyUI From Bash

The best tool I've found to run AI image and video-generation models is ComfyUI. It supports a lot of different models and has a dataflow programming architecture that allows fairly sophisticated use cases. The problem for me, however, is that it's awkward to use it just to generate an image or two. It's also hard to use the web interface to iterate over different prompts or parameters. So this post describes a workflow for calling ComfyUI from Bash and previewing images directly from a terminal.

ComfyUI has a REST interface that can be accessed from the command line via curl. This is the same API the web interface itself uses, so you get the full power of the program without having to click on little boxes to change parameters.

I won't cover installing ComfyUI. You'll have to figure that out yourself. But let's assume you have ComfyUI installed and running, and you've downloaded the model files for Z-Image Turbo and can generate images in your browser. Now you want to do the same from a command line.

First, you'll need a JSON file for your workflow. Load up the ComfyUI web interface and open up your workflow. As of this writing, you can right-click on the workflow tab at the top of the window and select "Export (API)" to create this file. So for example, if you do this with the default Z-Image Turbo workflow, you'll get something like this:

{
  "9": {
    "inputs": {
      "filename_prefix": "z-image-turbo",
      "images": [
        "57:8",
        0
      ]
    },
    "class_type": "SaveImage",
    "_meta": {
      "title": "Save Image"
    }
  },
  "57:30": {
    "inputs": {
      "clip_name": "qwen_3_4b.safetensors",
      "type": "lumina2",
      "device": "default"
    },
    "class_type": "CLIPLoader",
    "_meta": {
      "title": "Load CLIP"
    }
  },
  "57:33": {
    "inputs": {
      "conditioning": [
        "57:27",
        0
      ]
    },
    "class_type": "ConditioningZeroOut",
    "_meta": {
      "title": "ConditioningZeroOut"
    }
  },
  "57:8": {
    "inputs": {
      "samples": [
        "57:3",
        0
      ],
      "vae": [
        "57:29",
        0
      ]
    },
    "class_type": "VAEDecode",
    "_meta": {
      "title": "VAE Decode"
    }
  },
  "57:28": {
    "inputs": {
      "unet_name": "z_image_turbo_bf16.safetensors",
      "weight_dtype": "default"
    },
    "class_type": "UNETLoader",
    "_meta": {
      "title": "Load Diffusion Model"
    }
  },
  "57:27": {
    "inputs": {
      "text": "A sea lion on a beach, holding a sign that says, \"Command-Line Interfaces Rock!\"",
      "clip": [
        "57:30",
        0
      ]
    },
    "class_type": "CLIPTextEncode",
    "_meta": {
      "title": "CLIP Text Encode (Prompt)"
    }
  },
  "57:13": {
    "inputs": {
      "width": 1024,
      "height": 1024,
      "batch_size": 1
    },
    "class_type": "EmptySD3LatentImage",
    "_meta": {
      "title": "EmptySD3LatentImage"
    }
  },
  "57:11": {
    "inputs": {
      "shift": 3,
      "model": [
        "57:28",
        0
      ]
    },
    "class_type": "ModelSamplingAuraFlow",
    "_meta": {
      "title": "ModelSamplingAuraFlow"
    }
  },
  "57:3": {
    "inputs": {
      "seed": 277911290314474,
      "steps": 8,
      "cfg": 1,
      "sampler_name": "res_multistep",
      "scheduler": "simple",
      "denoise": 1,
      "model": [
        "57:11",
        0
      ],
      "positive": [
        "57:27",
        0
      ],
      "negative": [
        "57:33",
        0
      ],
      "latent_image": [
        "57:13",
        0
      ]
    },
    "class_type": "KSampler",
    "_meta": {
      "title": "KSampler"
    }
  },
  "57:29": {
    "inputs": {
      "vae_name": "ae.safetensors"
    },
    "class_type": "VAELoader",
    "_meta": {
      "title": "Load VAE"
    }
  }
}

This JSON file describes the graph that's used to generate the image. You'll need to wrap this in a larger JSON object and send that to ComfyUI to get an image back. So create a file named zimageturbo.json and put the following in it:

{
  "client_id": "e70c5721-5e9f-43b4-bcb3-23bf05fec938",
  "prompt_id": "5719972c-88d3-45b7-b441-ef06f0b1a011",
  "prompt":
    /* insert your workflow JSON from above here */
}

The long hex strings are just UUIDs I generated with uuidgen. You'll need to generate a new prompt_id UUID for each prompt you submit and a new client_id for each computer you want to connect from.
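If you'd rather script this step than edit the file by hand, you can generate the UUIDs and build the wrapper programmatically. Here's a minimal Python sketch; in practice you'd load your exported workflow graph from disk, and the tiny placeholder dict below just stands in for it:

```python
import json
import uuid

# In practice, load your "Export (API)" graph from a file, e.g.:
#   with open("workflow.json") as f:
#       graph = json.load(f)
# A tiny placeholder graph stands in for it here.
graph = {"9": {"class_type": "SaveImage", "inputs": {}}}

request = {
    "client_id": str(uuid.uuid4()),  # one per machine is enough
    "prompt_id": str(uuid.uuid4()),  # generate a fresh one per submission
    "prompt": graph,
}

# Write the wrapped request body that gets POSTed to /prompt.
with open("zimageturbo.json", "w") as f:
    json.dump(request, f, indent=2)
```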

Now you can send this to the ComfyUI instance with curl. Assuming your server is at the default http://127.0.0.1:8188, you can do:

$ curl -s --url "http://127.0.0.1:8188/prompt" --json @zimageturbo.json

Now you wait for the image to be generated. You can submit requests to the history/[prompt_id] endpoint (filling in the prompt_id to match your JSON file) to see when the image has completed and where to find it:

$ curl -s --url "http://127.0.0.1:8188/history/5719972c-88d3-45b7-b441-ef06f0b1a011"

Before the image has finished, this returns an empty JSON object. Once it completes, you'll get back something like this:

{
  "5719972c-88d3-45b7-b441-ef06f0b1a011": {
    "prompt": [
      7,
      "5719972c-88d3-45b7-b441-ef06f0b1a011",
      {
        "9": {
          "inputs": {
            "filename_prefix": "z-image-turbo",
            "images": [
              "57:8",
              0
            ]
          },
          "class_type": "SaveImage",
          "_meta": {
            "title": "Save Image"
          }
        },
        "57:30": {
          "inputs": {
            "clip_name": "qwen_3_4b.safetensors",
            "type": "lumina2",
            "device": "default"
          },
          "class_type": "CLIPLoader",
          "_meta": {
            "title": "Load CLIP"
          }
        },
        "57:33": {
          "inputs": {
            "conditioning": [
              "57:27",
              0
            ]
          },
          "class_type": "ConditioningZeroOut",
          "_meta": {
            "title": "ConditioningZeroOut"
          }
        },
        "57:8": {
          "inputs": {
            "samples": [
              "57:3",
              0
            ],
            "vae": [
              "57:29",
              0
            ]
          },
          "class_type": "VAEDecode",
          "_meta": {
            "title": "VAE Decode"
          }
        },
        "57:28": {
          "inputs": {
            "unet_name": "z_image_turbo_bf16.safetensors",
            "weight_dtype": "default"
          },
          "class_type": "UNETLoader",
          "_meta": {
            "title": "Load Diffusion Model"
          }
        },
        "57:27": {
          "inputs": {
            "text": "A sea lion on a beach, holding a sign that says, \"Command-Line Interfaces Rock!\"",
            "clip": [
              "57:30",
              0
            ]
          },
          "class_type": "CLIPTextEncode",
          "_meta": {
            "title": "CLIP Text Encode (Prompt)"
          }
        },
        "57:13": {
          "inputs": {
            "width": 1024,
            "height": 1024,
            "batch_size": 1
          },
          "class_type": "EmptySD3LatentImage",
          "_meta": {
            "title": "EmptySD3LatentImage"
          }
        },
        "57:11": {
          "inputs": {
            "shift": 3.0,
            "model": [
              "57:28",
              0
            ]
          },
          "class_type": "ModelSamplingAuraFlow",
          "_meta": {
            "title": "ModelSamplingAuraFlow"
          }
        },
        "57:3": {
          "inputs": {
            "seed": 277911290314474,
            "steps": 8,
            "cfg": 1.0,
            "sampler_name": "res_multistep",
            "scheduler": "simple",
            "denoise": 1.0,
            "model": [
              "57:11",
              0
            ],
            "positive": [
              "57:27",
              0
            ],
            "negative": [
              "57:33",
              0
            ],
            "latent_image": [
              "57:13",
              0
            ]
          },
          "class_type": "KSampler",
          "_meta": {
            "title": "KSampler"
          }
        },
        "57:29": {
          "inputs": {
            "vae_name": "ae.safetensors"
          },
          "class_type": "VAELoader",
          "_meta": {
            "title": "Load VAE"
          }
        }
      },
      {
        "client_id": "e70c5721-5e9f-43b4-bcb3-23bf05fec938",
        "create_time": 1775495893332
      },
      [
        "9"
      ]
    ],
    "outputs": {
      "9": {
        "images": [
          {
            "filename": "z-image-turbo_00000_.png",
            "subfolder": "",
            "type": "output"
          }
        ]
      }
    },
    "status": {
      "status_str": "success",
      "completed": true,
      "messages": [
        [
          "execution_start",
          {
            "prompt_id": "5719972c-88d3-45b7-b441-ef06f0b1a011",
            "timestamp": 1775495893332
          }
        ],
        [
          "execution_cached",
          {
            "nodes": [],
            "prompt_id": "5719972c-88d3-45b7-b441-ef06f0b1a011",
            "timestamp": 1775495893333
          }
        ],
        [
          "execution_success",
          {
            "prompt_id": "5719972c-88d3-45b7-b441-ef06f0b1a011",
            "timestamp": 1775495906934
          }
        ]
      ]
    },
    "meta": {
      "9": {
        "node_id": "9",
        "display_node": "9",
        "parent_node": null,
        "real_node_id": "9"
      }
    }
  }
}

You need to look at the contents of [prompt_id]["outputs"]["9"]. You can extract this with the jq tool:

$ curl -s --url "http://127.0.0.1:8188/history/5719972c-88d3-45b7-b441-ef06f0b1a011" |
jq '.["5719972c-88d3-45b7-b441-ef06f0b1a011"].outputs["9"]'

Which will display:

{
  "images": [
    {
      "filename": "z-image-turbo_00000_.png",
      "subfolder": "",
      "type": "output"
    }
  ]
}
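If you're scripting the whole loop, the same extraction is easy in Python. Here's a sketch of a helper (my own naming, not part of any ComfyUI client library) that pulls the image records out of a history response; the sample dict is a trimmed-down version of the response shown above:

```python
def output_images(history, prompt_id):
    """Return (filename, subfolder, type) tuples for every output image;
    an empty list means the prompt hasn't finished yet."""
    entry = history.get(prompt_id)
    if not entry:
        return []
    images = []
    for node in entry.get("outputs", {}).values():
        for img in node.get("images", []):
            images.append((img["filename"], img["subfolder"], img["type"]))
    return images

# Trimmed-down version of the history response shown above.
sample = {
    "5719972c-88d3-45b7-b441-ef06f0b1a011": {
        "outputs": {
            "9": {
                "images": [
                    {"filename": "z-image-turbo_00000_.png",
                     "subfolder": "",
                     "type": "output"}
                ]
            }
        }
    }
}

print(output_images(sample, "5719972c-88d3-45b7-b441-ef06f0b1a011"))
# → [('z-image-turbo_00000_.png', '', 'output')]
```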

With that information, we can use the view endpoint to fetch the image:

$ curl -s --get --url "http://127.0.0.1:8188/view" -d 'filename=z-image-turbo_00000_.png' \
-d 'subfolder=' -d 'type=output' -o z-image-turbo_00000_.png

You can view the resulting image with a tool like timg or chafa. If you have a new enough terminal emulator, this can even display the full-resolution image.

Sea lions can't read or write, so this might be the best you can hope for.

And that's all there is to it. You can modify the JSON file to edit the prompt, image width and height, random seed, diffusion steps, or anything else you want to change. You can also export other workflows and use any ComfyUI-supported model from the command line this way.
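When iterating over prompts and parameters, it's handy to patch those fields by class_type instead of hardcoding node IDs, since the IDs change between exported workflows. Here's a sketch; the patch function is my own illustration, and it assumes a single CLIPTextEncode node, as in the Z-Image Turbo workflow:

```python
def patch(graph, prompt=None, seed=None, width=None, height=None):
    # Patch common inputs by class_type so the numeric node IDs
    # ("57:27", etc.) don't need to be hardcoded.
    for node in graph.values():
        ct = node.get("class_type")
        if ct == "CLIPTextEncode" and prompt is not None:
            node["inputs"]["text"] = prompt
        elif ct == "KSampler" and seed is not None:
            node["inputs"]["seed"] = seed
        elif ct == "EmptySD3LatentImage":
            if width is not None:
                node["inputs"]["width"] = width
            if height is not None:
                node["inputs"]["height"] = height
    return graph

# Trimmed-down version of the workflow graph from earlier.
graph = {
    "57:27": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "old prompt"}},
    "57:3": {"class_type": "KSampler",
             "inputs": {"seed": 1, "steps": 8}},
    "57:13": {"class_type": "EmptySD3LatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
}

patch(graph, prompt="A lighthouse at dusk", seed=12345, width=768)
```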

Friday, March 27, 2026

Generating Formatted Long Division Practice Problems

Here's a Python script I wrote to generate a bunch of random long division problems with solutions, showing work.

#!/usr/bin/env python3

import random

# Print 100 random 4-by-1 digit long division problems,
# showing the complete solution process.

def long_division(dividend, divisor):
    quotient = dividend // divisor
    remainder = dividend - quotient * divisor

    indent = " "*6

    print(indent + "      %4d R %d" % (quotient, remainder))
    print(indent + "    ------")
    print(indent + "%3d ) %4d" % (divisor, dividend))

    q = ((quotient // 1) % 10,
         (quotient // 10) % 10,
         (quotient // 100) % 10,
         (quotient // 1000) % 10)

    d = ((dividend // 1) % 10,
         (dividend // 10) % 10,
         (dividend // 100) % 10,
         (dividend // 1000) % 10)

    indent += " "*4

    rem = 0
    digit = 3

    # find the first non-zero quotient digit
    while digit > -1:
        rem = rem * 10 + d[digit]
        if q[digit] > 0:
            break
        digit -= 1
    if digit < 0:
        return

    while digit >= 0:

        # subtract the product from the remainder
        print(indent + "- %*d" % (4 - digit, q[digit] * divisor))
        print(indent + "-------")
        rem = rem - (q[digit] * divisor)
        print(indent + "  %*d" % (4 - digit, rem), end="")

        digit -= 1

        # find the next non-zero quotient digit, bringing more quotient
        # digits down

        while digit >= 0:
            rem = rem * 10 + d[digit]
            print("%d" % d[digit], end="")
            if q[digit] > 0:
                break
            digit -= 1

        print("")

for i in range(100):

    print("")
    print("-"*20)
    print("#%d" % (i+1))

    dividend = random.randint(1000,9999)
    divisor = random.randint(2,9)

    long_division(dividend, divisor)

This just prints the problems to stdout, and you can grep for the important lines if you want to remove the solutions:

$ python3 division.py > solutions
$ grep -E "\#|\)| ------$" solutions > problems

Here's a sample output:

--------------------
#1
             207 R 1
          ------
        6 ) 1243
          - 12
          -------
             043
          -   42
          -------
               1
--------------------
#2
             555 R 4
          ------
        9 ) 4999
          - 45
          -------
             49
          -  45
          -------
              49
          -   45
          -------
               4
--------------------
#3
             609 R 2
          ------
        6 ) 3656
          - 36
          -------
             056
          -   54
          -------
               2
--------------------
#4
            1014 R 3
          ------
        6 ) 6087
          - 6
          -------
            008
          -   6
          -------
              27
          -   24
          -------
               3
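Every problem the script emits satisfies the division identity dividend = quotient × divisor + remainder, which is also a quick way to sanity-check the generator over its whole input range:

```python
import random

# Sanity check: the division identity must hold for every problem
# the generator can produce (4-digit dividend, 1-digit divisor).
for _ in range(1000):
    dividend = random.randint(1000, 9999)
    divisor = random.randint(2, 9)
    quotient, remainder = divmod(dividend, divisor)
    assert dividend == quotient * divisor + remainder
    assert 0 <= remainder < divisor
```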

Monday, March 9, 2026

Image Generation with stable-diffusion.cpp

This post documents how I set up stable-diffusion.cpp to generate images on my PC, running on Linux Mint 22.2, with an Nvidia GeForce RTX 5070 GPU.

Prerequisites

The nvidia-cuda-toolkit package in Linux Mint 22.2 is too old for the RTX 50x0, so I had to go to Nvidia's CUDA web site to download a newer version. This is a little tricky because you have to know which version of Ubuntu your Linux Mint release is based on. Check /etc/upstream-release/lsb-release to see, then download and install the appropriate deb packages from Nvidia's site. Then put the CUDA directory in your PATH.

You'll also need a handful of other packages to build this. I can't remember them all, but at least cmake and git are required.

If you go to the stable-diffusion.cpp Git repository, there's a build guide that should walk you through the process of downloading and building the software. However, if you have multiple versions of the nvidia-cuda-toolkit package installed, you need to tell CMake where to find the correct nvcc. Most of the instructions I found to do this were incorrect for my situation (maybe because I had previously installed the system nvidia-cuda-toolkit package). I had to use the command:

$ cmake .. -DSD_CUDA=ON -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.8/bin/nvcc

in place of the command

$ cmake .. -DSD_CUDA=ON

from the stable-diffusion.cpp/build/ directory in order to detect the Nvidia-provided version of nvcc.

If you don't have a GPU but have at least 16 GB of RAM, you can compile stable-diffusion.cpp to run on your CPU instead and still have decent image generation, but it will be dramatically slower. My Ryzen 5700X CPU is about 130x slower than my RTX 5070 for this. With less RAM, you may still be able to get away with heavily quantized models but might not like the results.

After you've built stable-diffusion.cpp, you need some model files. There are links to these in the Markdown files in stable-diffusion.cpp/docs/. I recommend looking at z_image.md for Z-Image Turbo. This is probably the best all-around local image generation model as of early March 2026. It is notable for running in a relatively small footprint, following prompts very well, and avoiding "body horror" like giving people extra limbs.

You'll need a GGUF file for Z-Image Turbo. The available files come in various sizes, differing in the amount of quantization. You'll want the largest one that fits in your GPU's VRAM, though you have to leave some space for the work buffers. The Q8_0 version should work for GPUs with 8 GB of VRAM or more.
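As rough arithmetic for picking a quantization: Q8_0 stores about 8.5 bits per weight (8-bit values plus a per-block scale) and Q4_0 about 4.5 bits. Taking roughly 6 billion parameters as an illustrative model size (my assumption here, not a published spec):

```python
# Back-of-envelope GGUF size estimate, ignoring per-tensor metadata
# and runtime work buffers (which need extra VRAM on top of this).
BITS_PER_WEIGHT = {
    "f16": 16.0,
    "q8_0": 8.5,   # 32 weights/block: 32*8 value bits + 16 scale bits
    "q4_0": 4.5,   # 32 weights/block: 32*4 value bits + 16 scale bits
}

def model_gb(params_billion, quant):
    # billions of params * bits per weight / 8 bits per byte = GB
    return params_billion * BITS_PER_WEIGHT[quant] / 8.0

for q in ("f16", "q8_0", "q4_0"):
    print(f"6B parameters at {q}: ~{model_gb(6, q):.1f} GB")
```

At Q8_0 a 6B model comes out around 6.4 GB, which is consistent with it fitting on an 8 GB card with room left for work buffers.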

You also need a VAE safetensors file and the Qwen3 4B text encoder GGUF file linked in the Markdown file. Once again, the Q8_0 version of Qwen3 4B is probably fine if you have 8 GB of VRAM.

Generating Images

There are some serious limitations with the way stable-diffusion.cpp uses VRAM, so the best way to do all of this is to run headless. Kill any GUI programs running on your Linux box, log out of the desktop GUI, and SSH in from another computer.

Once you've built stable-diffusion.cpp and downloaded all the model files, you can run stable-diffusion.cpp/build/bin/sd-cli. I put my models in the ~/ai/models directory, generated a GGUF version of the VAE, and renamed the model files to simplify them, so my invocation looks like:

$ ./sd-cli --mmap --diffusion-model ~/ai/models/z_image_turbo.gguf \
--vae ~/ai/models/flux_ae.gguf --llm ~/ai/models/qwen_3_4b.gguf \
--cfg-scale 1.0 --offload-to-cpu --diffusion-fa -H 1024 -W 1024 \
--steps 9 -s 42 -p "Cute illustration of a sea lion on a rocky beach, \
holding a sign that says \"Will generate images for fish\""

You can look at sd-cli's help text to see what the options do. Notable ones are the --offload-to-cpu and --diffusion-fa flags, which reduce VRAM usage.

This command will generate the image file output.png which you can view from your terminal with a tool like timg or chafa if you have a compatible terminal emulator (Ghostty for example). Here's what mine looked like:

Notice that the text doesn't match the prompt. That's pretty common with small image generators (especially quantized versions), and you just have to keep trying with different seed values (the -s parameter above) or slightly tweaked prompts until you get what you want.

What's the Point?

Why bother with all this to run Z-Image Turbo locally when you can just use Google or ChatGPT to generate images? I think this is most useful for integrating image generation into a larger workflow. For example, you can generate hundreds of random images overnight to help generate ideas for a project. Or you could use an edit model to change a whole directory of family photos into cartoon-style drawings. And it all runs locally on your machine, which lets you deploy it to places without fast network access. With a fast enough GPU, you could build a camera that automatically cartoonifies live images.

Tips

  • Z-Image Turbo prefers long, detailed prompts. I think the easiest way to work with it is to use a large language model (LLM) to generate long prompts from your shorter ones. I've had good results with Ministral 3 8B.
  • Z-Image Turbo tends to produce similar images even for different seeds. To get more diversity, use an LLM to rewrite your prompts and turn up the randomness.
  • Unlike PyTorch-based projects like ComfyUI that can load models incrementally in whatever VRAM is available, stable-diffusion.cpp requires enough VRAM to load the entire diffusion model with extra space for working memory. Until that's fixed, this approach is only useful for smaller models (unless you have a giant GPU or don't mind running much slower on the CPU).
  • Once you have generation working, you can try an edit model like Flux.2 Klein 4B. It's a little tricky to use but opens up new fun applications.

What About the Ethics of AI Art?

Yes, it is true that AI image models are trained on huge quantities of other people's art without permission or attribution. Feel free to avoid this technology completely if you want.

On the other hand, these models are very fun to goof around with! After you generate a few hundred images, however, you'll notice a bland sameness to the outputs and a casual disregard of physics that shows the limits of current tech.

The two best uses I've found are generating random images to help brainstorm and generating filler content that nobody is going to scrutinize. For anything else, you'll want a real artist to at least refine the output and tell you all the things you're not seeing.