Skip to content

Conversation

@blob42
Copy link

@blob42 blob42 commented Dec 1, 2025

  • Allows quick reloading with SIGHUP without when filesytem watch is
    not available like on a default docker setup docker kill -s SIGHUP
  • Also refactored the signal handling in a single logical unit
  • Updated the config readme and the script for reloading on changes

Summary by CodeRabbit

  • New Features

    • Configuration hot reloading: apply config changes dynamically by sending a reload signal (no restart required).
  • Documentation

    • Added "Hot Reloading Configuration" guidance and Docker-specific SIGHUP instructions; linked the restart-on-config-change guide.
  • Examples

    • Updated watcher example to send the reload signal (SIGHUP), keep the process running, improve messaging, and handle watcher/process errors gracefully.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 1, 2025

Walkthrough

Adds SIGHUP-driven hot-reload: main now listens in a loop for OS signals, handling SIGHUP to reload configuration and SIGINT/SIGTERM for graceful shutdown; documentation and example watcher updated to send SIGHUP (or docker equivalent) instead of restarting the process.

Changes

Cohort / File(s) Summary
Documentation
docs/configuration.md
Inserted a "Hot Reloading Configuration" section describing triggering a reload via SIGHUP and included a Docker-specific command hint.
Example watcher / README
docs/examples/restart-on-config-change/README.md
Updated watcher narrative and logic to send SIGHUP (reload) instead of restarting; added inotify error handling, target-process absence checks, updated messages, and a Docker hint to use docker kill -s SIGHUP.
Signal handling / runtime
llama-swap.go
Replaced single-shot signal receive with a for-range signal loop; added SIGHUP to notifications and handling to call reloadProxyManager(); retained SIGINT/SIGTERM graceful shutdown (HTTP server shutdown with timeout, ProxyManager close) and default unhandled-signal logging.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant OS as OS / User
participant Main as llama-swap (main)
participant HTTP as HTTP Server
participant PM as ProxyManager
Note over OS,Main: Signal delivery (SIGHUP / SIGINT/SIGTERM)
OS->>Main: SIGHUP
Main->>PM: reloadProxyManager()
PM-->>Main: reload result
Main->>Main: log reload outcome
Note over OS,Main: Graceful shutdown flow
OS->>Main: SIGINT/SIGTERM
Main->>HTTP: Shutdown (graceful, timeout)
HTTP-->>Main: closed
Main->>PM: Close ProxyManager
PM-->>Main: closed
Main->>Main: close exit channel / exit

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus areas:
    • llama-swap.go: correctness of the for-range signal loop, distinguishing SIGHUP vs SIGINT/SIGTERM, race conditions during reload vs shutdown.
    • reloadProxyManager() concurrency, error handling, and resource cleanup.
    • docs/examples/restart-on-config-change/README.md: watcher correctness around inotify errors and process-existence checks.

Possibly related PRs

Suggested reviewers

  • mostlygeek

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: reload config on SIGHUP' directly and accurately summarizes the main change—adding SIGHUP signal handling for configuration reload across the codebase.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/examples/restart-on-config-change/README.md (1)

16-34: Script no longer watches for multiple config changes after switching to SIGUSR1

With kill -SIGUSR1 $PID the llama-swap process keeps running, so the subsequent wait $PID blocks until the process eventually exits. That means:

  • The first config change triggers a reload via SIGUSR1.
  • Further changes are not detected because the loop doesn’t reach the next inotifywait until the process dies.

This is at odds with the description that the script “can be used to watch config.yaml for changes and send the SIGUSR1 signal … when it detects a change.”

Consider restructuring so the process is started once and the loop only waits on inotify + sends SIGUSR1 while the PID is alive, for example:

-while true; do
-    # Start the process again
-    ./llama-swap-linux-amd64 -config $1 -listen :1867 &
-    PID=$!
-    echo "Started llama-swap with PID $PID"
-
-    # Wait for modifications in the specified directory or file
-    inotifywait -e modify "$1"
-
-    # Check if process exists before sending signal
-    if kill -0 $PID 2>/dev/null; then
-        echo "Sending SIGUSR1 to $PID"
-        kill -SIGUSR1 $PID
-        wait $PID
-    else
-        echo "Process $PID no longer exists"
-    fi
-    sleep 1
-done
+# Start the process once
+./llama-swap-linux-amd64 -config "$1" -listen :1867 &
+PID=$!
+echo "Started llama-swap with PID $PID"
+
+while kill -0 "$PID" 2>/dev/null; do
+    # Wait for modifications in the specified file
+    inotifywait -e modify "$1"
+
+    if kill -0 "$PID" 2>/dev/null; then
+        echo "Sending SIGUSR1 to $PID"
+        kill -SIGUSR1 "$PID"
+    else
+        echo "Process $PID no longer exists"
+        break
+    fi
+done
+
+wait "$PID"
🧹 Nitpick comments (2)
docs/configuration.md (1)

387-397: Hot‑reload section is clear and consistent with implementation

The SIGUSR1 instructions and Docker example match the new signal handling behavior and the linked example README; this looks good. If you expect many Windows users, consider a brief note that SIGUSR1 is POSIX‑only and that Windows should rely on -watch-config or other mechanisms.

llama-swap.go (1)

80-81: SIGUSR1 handling and signal loop look good; check Windows build and reload concurrency

The refactored signal loop cleanly separates shutdown (SIGINT/SIGTERM) from reload (SIGUSR1), and the logic matches the docs and example output. A couple of follow‑ups to consider:

  • Cross‑platform build: This file has no build tags in the snippet. If it’s compiled on Windows as well, syscall.SIGUSR1 may not be defined there, which would break Windows builds. If that’s the case, either:
    • Add an appropriate build tag (e.g. //go:build !windows) to this file, or
    • Guard SIGUSR1 registration behind a platform check / separate Unix‑only file.
  • Reload serialization (optional): reloadProxyManager can now be triggered both via fsnotify (-watch-config) and SIGUSR1. If those are ever used together, you could see concurrent reload calls mutating srv.Handler and conf. If that scenario is realistic for your users, consider serializing reloads (e.g. via a mutex or a single “reload” goroutine) to avoid subtle races.

Please verify that:

  • Cross‑platform builds (especially GOOS=windows) still compile successfully with syscall.SIGUSR1 in this file.
  • You’re comfortable with the current concurrency story for reloadProxyManager or want to guard it with a mutex.

Also applies to: 173-201

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dea9873 and e5fa388.

📒 Files selected for processing (3)
  • docs/configuration.md (1 hunks)
  • docs/examples/restart-on-config-change/README.md (3 hunks)
  • llama-swap.go (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-05T22:57:53.793Z
Learnt from: mostlygeek
Repo: mostlygeek/llama-swap PR: 155
File: proxy/process.go:359-372
Timestamp: 2025-06-05T22:57:53.793Z
Learning: In Go, cmdWaitChan in proxy/process.go is a signal channel (chan struct{}) that follows the pattern where waitForCmd() closes the channel when cmd.Wait() returns, and stopCommand() waits on it. Reading from a closed channel returns immediately, so <-cmdWaitChan won't block indefinitely.

Applied to files:

  • llama-swap.go
📚 Learning: 2025-10-15T12:15:22.706Z
Learnt from: tesurijp
Repo: mostlygeek/llama-swap PR: 350
File: llama-swap-windows.go:17-17
Timestamp: 2025-10-15T12:15:22.706Z
Learning: In the llama-swap project, the `listenStr` flag format includes the colon prefix (e.g., `:8080`), so when constructing URLs, it can be directly concatenated with `http://localhost` to produce the correct format like `http://localhost:8080`.

Applied to files:

  • docs/examples/restart-on-config-change/README.md
🧬 Code graph analysis (1)
llama-swap.go (1)
proxy/proxymanager.go (1)
  • ProxyManager (30-53)
🪛 LanguageTool
docs/examples/restart-on-config-change/README.md

[grammar] ~3-~3: Ensure spelling is correct
Context: ...he configuration file can take a bit of trail and error to get a model configuration ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: run-tests
🔇 Additional comments (1)
docs/examples/restart-on-config-change/README.md (1)

39-48: Updated sample output matches new reload behavior

The example log sequence (SIGUSR1 received, “Reloading configuration…”, “Configuration Changed”, “Configuration Reloaded”) aligns with the new signal handling and reloadProxyManager messages in llama-swap.go.

@blob42 blob42 marked this pull request as draft December 1, 2025 11:18
@blob42 blob42 force-pushed the feat-sigusr1-reload branch 3 times, most recently from f4ff658 to 169b3fb Compare December 1, 2025 13:49
@blob42 blob42 marked this pull request as ready for review December 1, 2025 13:49
@ryan-steed-usa
Copy link
Contributor

Nice idea, would it make more sense to re-appropriate SIGHUP like other popular daemons? SIGUSR1 and SIGUSR2 would then be available for future posibilities.

@blob42
Copy link
Author

blob42 commented Dec 1, 2025

@ryan-steed-usa I first started with SIGHUP as a quick patch but the docker kill command strangely terminated the container. Let me try again maybe I overlooked something.

EDIT: I changed it to SIGHUP. On my first try I didn't compile properly :)

@blob42 blob42 force-pushed the feat-sigusr1-reload branch from 169b3fb to b38cfcb Compare December 1, 2025 17:27
@blob42 blob42 changed the title feat: reload config on SIGUSR1 feat: reload config on SIGHUP Dec 1, 2025
- Allows quick reloading with SIGHUP without when filesytem watch is
  not available like on a default docker setup `docker kill -s SIGHUP`

- Also refactored the signal handling in a single logical unit

Signed-off-by: blob42 <[email protected]>
@blob42 blob42 force-pushed the feat-sigusr1-reload branch from b38cfcb to c9e0dfb Compare December 1, 2025 17:29
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
llama-swap.go (1)

88-119: Data race on srv.Handler during concurrent reload.

The srv.Handler field is accessed concurrently without synchronization:

  • Write access: Lines 100 and 117 assign a new ProxyManager to srv.Handler
  • Read access: The HTTP server reads srv.Handler.ServeHTTP() for each incoming request

In Go, unsynchronized concurrent read/write access to the same memory location constitutes a data race, even for pointer assignments. This can lead to undefined behavior, crashes, or serving requests with a partially-initialized handler.

Recommended solutions:

  1. Wrap srv.Handler in a synchronized wrapper (preferred for minimal disruption):
// Add near top of main():
type SafeHandler struct {
    mu      sync.RWMutex
    handler http.Handler
}

func (sh *SafeHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    sh.mu.RLock()
    h := sh.handler
    sh.mu.RUnlock()
    h.ServeHTTP(w, r)
}

func (sh *SafeHandler) Set(h http.Handler) {
    sh.mu.Lock()
    sh.handler = h
    sh.mu.Unlock()
}

Then modify the server setup and reloadProxyManager:

-srv := &http.Server{
-    Addr: *listenStr,
-}
+safeHandler := &SafeHandler{}
+srv := &http.Server{
+    Addr:    *listenStr,
+    Handler: safeHandler,
+}

And in reloadProxyManager:

-    if currentPM, ok := srv.Handler.(*proxy.ProxyManager); ok {
+    if sh, ok := srv.Handler.(*SafeHandler); ok {
+        sh.mu.RLock()
+        currentPM, isPM := sh.handler.(*proxy.ProxyManager)
+        sh.mu.RUnlock()
+        if isPM {
             // ... reload logic ...
-            srv.Handler = newPM
+            sh.Set(newPM)
+        }
     } else {
         // ... initial load ...
-        srv.Handler = newPM
+        if sh, ok := srv.Handler.(*SafeHandler); ok {
+            sh.Set(newPM)
+        }
     }
  1. Use atomic.Value for lock-free reads (alternative):
var handler atomic.Value  // stores http.Handler
srv.Handler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    handler.Load().(http.Handler).ServeHTTP(w, r)
})
// In reloadProxyManager: handler.Store(newPM)
♻️ Duplicate comments (1)
docs/examples/restart-on-config-change/README.md (1)

3-3: Typo fixed: "trial and error" is now correct.

This addresses the previous review feedback.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 169b3fb and c9e0dfb.

📒 Files selected for processing (3)
  • docs/configuration.md (1 hunks)
  • docs/examples/restart-on-config-change/README.md (2 hunks)
  • llama-swap.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/configuration.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-05T22:57:53.793Z
Learnt from: mostlygeek
Repo: mostlygeek/llama-swap PR: 155
File: proxy/process.go:359-372
Timestamp: 2025-06-05T22:57:53.793Z
Learning: In Go, cmdWaitChan in proxy/process.go is a signal channel (chan struct{}) that follows the pattern where waitForCmd() closes the channel when cmd.Wait() returns, and stopCommand() waits on it. Reading from a closed channel returns immediately, so <-cmdWaitChan won't block indefinitely.

Applied to files:

  • llama-swap.go
🧬 Code graph analysis (1)
llama-swap.go (1)
proxy/proxymanager.go (1)
  • ProxyManager (30-53)
🔇 Additional comments (2)
llama-swap.go (2)

80-80: LGTM: SIGHUP added to signal notifications.

The signal is correctly registered for handling configuration reloads.


173-201: Signal handling loop correctly implements reload and shutdown.

The refactored signal handling properly distinguishes between:

  • Graceful shutdown on SIGINT/SIGTERM (with timeout and cleanup)
  • Hot reload on SIGHUP
  • Logging for unexpected signals

The loop structure ensures the goroutine exits after shutdown signals but continues processing after reload signals.

Comment on lines +34 to +35
echo "Sending SIGHUP to $PID"
kill -USR1 $PID
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Script sends SIGUSR1 but llama-swap expects SIGHUP.

There's a critical mismatch between the comment and the actual signal sent:

  • Line 34 comment: "Sending SIGHUP to $PID"
  • Line 35 command: kill -USR1 $PID (sends SIGUSR1, not SIGHUP)
  • llama-swap.go line 80: Only listens for syscall.SIGHUP

SIGUSR1 and SIGHUP are different signals (typically signal 10 vs. signal 1). The current script will send SIGUSR1, which llama-swap won't handle, so configuration reload will not work.

Apply this fix:

     if kill -0 $PID 2>/dev/null; then
         echo "Sending SIGHUP to $PID"
-        kill -USR1 $PID
+        kill -HUP $PID
     else

Note: The Docker example on line 13 correctly uses docker kill -s SIGHUP, which is consistent with the intended behavior.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo "Sending SIGHUP to $PID"
kill -USR1 $PID
if kill -0 $PID 2>/dev/null; then
echo "Sending SIGHUP to $PID"
kill -HUP $PID
else
🤖 Prompt for AI Agents
In docs/examples/restart-on-config-change/README.md around lines 34-35, the
script comment says "Sending SIGHUP to $PID" but the command uses `kill -USR1
$PID`, which sends SIGUSR1 while llama-swap only listens for SIGHUP; change the
command to send SIGHUP (e.g., `kill -HUP $PID` or `kill -s SIGHUP $PID`) so the
signal matches the comment and the process' SIGHUP handler — keep the existing
Docker example unchanged since it already uses SIGHUP.

Comment on lines +52 to 58
...
Sending SIGHUP to 495455
Received SIGHUP. Reloading configuration...
Configuration Changed
Configuration Reloaded
...
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Output example shows correct SIGHUP flow but won't match current script.

The example output correctly demonstrates the SIGHUP reload sequence:

  • "Sending SIGHUP to [PID]"
  • "Received SIGHUP. Reloading configuration..."
  • "Configuration Changed"
  • "Configuration Reloaded"

However, this output won't occur with the current script because it sends SIGUSR1 instead of SIGHUP (see comment on lines 34-35). After fixing the signal in line 35, this example will be accurate.

🤖 Prompt for AI Agents
docs/examples/restart-on-config-change/README.md lines 52-58: the example output
shows a SIGHUP-based reload sequence but the actual example script sends SIGUSR1
(see comment/usage around line 34-35), so update the script to send SIGHUP
instead of SIGUSR1 (change the signal argument at line 35 to "SIGHUP"), then
re-run the example to ensure the README output matches runtime behavior.

@ryan-steed-usa
Copy link
Contributor

@blob42 I've tested this with Podman (via my forked franken-builds) and it works well. Thanks!

@ryan-steed-usa
Copy link
Contributor

ryan-steed-usa commented Dec 1, 2025

I still believe that this PR introduces useful functionality but it has helped me to understand why Docker/Podman with -watch-config didn't work as expected. The problem seems related to commit 6a058e4 and use of the example single-file volume mount -v /path/to/custom/config.yaml:/app/config.yaml. Using a directory mount like -v /path/to/custom/config:/app/config coupled with -config /app/config/config.yaml -watch-config works as desired with rootless Podman and root Docker! This is a good candidate for a reworked version of restart-on-config-change/README.md.

@mostlygeek
Copy link
Owner

mostlygeek commented Dec 2, 2025

The --watch-config works in docker if you volume mount a directory with the config file in it and use a custom entrypoint like:

docker run -v /path/to/config_dir:/config --entrypoint "/app/llama-swap" llama-swap:cpu -config /config/config.yaml

It is a problem with the fsnotify library I used that is able to detect changes for a directory but not a file when it is volume mounted.

Perhaps instead of listening for new signals a run.sh as a the entrypoint to handle this would be better.

@ryan-steed-usa
Copy link
Contributor

Overriding the entrypoint isn't necessary either since we already use the exec form of ENTRYPOINT:

ENTRYPOINT [ "/app/llama-swap", "-config", "/app/config.yaml" ]

Command line arguments to docker run will be appended after all elements in an exec form ENTRYPOINT, and will override all elements specified using CMD.

This allows arguments to be passed to the entry point, i.e., docker run -d will pass the -d argument to the entry point. You can override the ENTRYPOINT instruction using the docker run --entrypoint flag.

So a command like this works by appending the arguments:

docker run -v /path/to/config_dir:/config llama-swap:cpu -config /config/config.yaml -watch-config

It would help if this were to be mentioned in the docs.

@mostlygeek
Copy link
Owner

@ryan-steed-usa thanks. I forgot that with go’s flags package defining -config /config/config.yaml would over ride the one defined in the entrypoint.

I’ll be sure to update the docs as mounting a config directory is the workaround for making —watch-config work reliably.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants