Lumpiasty
6546676dd6
add llama-swap optimizations recommended by claude
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-05-31 05:18:47 +02:00
Lumpiasty
fc58a6507b
disable mlock
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
ci/woodpecker/cron/renovate Pipeline was successful
2026-05-24 19:16:24 +02:00
Lumpiasty
6096b7019d
fix path to llama-server binary
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
ci/woodpecker/cron/renovate Pipeline was successful
2026-05-23 09:32:11 +02:00
Lumpiasty
c161da3657
add mlock and disable mmap in llama-server
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-05-21 23:18:05 +02:00
Lumpiasty
fc2c15d154
move whisper to gpu
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-05-21 22:02:34 +02:00
Lumpiasty
989732e1b5
move kokoro to separate deployment
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-05-21 21:34:33 +02:00
Lumpiasty
ab438be629
fix tts model path
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-05-21 21:11:29 +02:00
Lumpiasty
4556ca3c08
add ffmpeg for whisper
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-05-21 20:58:56 +02:00
Lumpiasty
611f9f3886
add tts and stt to llama-swap and openwebui
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-05-21 20:43:54 +02:00
Lumpiasty
c82f60e90a
switch text encoder to ponpoke/flux2-klein-4b-uncensored-text-encoder Q4_K_M
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-05-20 17:50:06 +02:00
Lumpiasty
b41342be01
switch image model to FLUX.2-klein-4B (Apache 2.0, 4-step, unified gen+edit)
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-05-20 16:49:30 +02:00
Lumpiasty
d3434a4102
remove unused qwen nothink chat template
2026-05-20 01:38:24 +02:00
Lumpiasty
de2822fee1
switch llama-swap to unified-vulkan image with FLUX.1-dev image generation
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
- Update deployment to unified-vulkan-2026-05-19 (includes llama-server, sd-server, whisper-server in one image)
- Fix binary paths: /app/llama-server -> llama-server (now on PATH)
- Migrate groups -> matrix to allow FLUX to evict the always-on 0.8B model when image generation is requested
- Add FLUX.1-dev Q4_K_S model via sd-server
- Configure OpenWebUI image generation to use llama-swap sd-server
- Update renovate versioning regex to treat all unified-vulkan date tags as patch updates for automerge
2026-05-20 01:11:57 +02:00
Lumpiasty
55ac337a63
enable MTP on MTP models
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
ci/woodpecker/cron/renovate Pipeline was successful
2026-05-18 20:31:57 +02:00
Lumpiasty
f3ad488bc8
add MTP version of Qwen3.6-35B
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-05-18 19:42:48 +02:00
Lumpiasty
8cbe2ef794
add abliterated versions of Qwen3.6-35B
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
ci/woodpecker/cron/renovate Pipeline was successful
2026-04-30 18:35:32 +02:00
Lumpiasty
b0f20de80b
qwen3.6 and cleanup of llama-swap config
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
ci/woodpecker/cron/renovate Pipeline was successful
- Deleting unused models
- Cleaned up, unified and fixed qwen3.5 sampling params into thinking and non-thinking sets, with no further differentiation
- kv cache quant q4_0 everywhere
2026-04-23 00:43:37 +02:00
Lumpiasty
328b14ded7
add cpu version of Qwen3.5-35B-A3B
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-04-18 23:21:52 +02:00
Lumpiasty
480fb1c6d6
set bonsai model ctx to 64k
2026-04-18 15:48:30 +02:00
Lumpiasty
ae7f53c395
add bonsai model
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
ci/woodpecker/cron/renovate Pipeline was successful
2026-04-14 18:31:18 +02:00
Lumpiasty
8160a52176
add gemma 4 models
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-04-04 02:48:02 +02:00
Lumpiasty
ad3b2229c2
get rid of openrouter proxying via llama-swap
ci/woodpecker/push/flux-reconcile-source Pipeline was successful
2026-04-04 02:39:26 +02:00
Lumpiasty
054df42d8b
update qwen3.5 4b ctx size to 128k
2026-03-30 21:05:00 +02:00
Lumpiasty
9e74ed6a19
increase --fit-target to 1.5GB
2026-03-29 23:50:45 +02:00
Lumpiasty
ce0b13ebb3
change kv cache quant to q8_0
2026-03-20 00:57:39 +01:00
Lumpiasty
7e7b3e3d71
add max ctx on llama.cpp
2026-03-17 01:33:35 +01:00
Lumpiasty
79315d32db
add GLM-4.7-Flash model
2026-03-16 18:19:28 +01:00
Lumpiasty
2d295d24e0
add 27b q3 variant of qwen3.5
2026-03-13 04:00:10 +01:00
Lumpiasty
e8efa9ddc1
lower kv cache quant to q4_0 and increase ctx to 64k
2026-03-13 04:00:10 +01:00
Lumpiasty
c88dd2899a
remove ttl of all models in llama-swap
2026-03-13 04:00:10 +01:00
Lumpiasty
39fc38d62b
add qwen3.5 4b heretic
2026-03-13 04:00:10 +01:00
Lumpiasty
e72a79be8f
add glm-5 from openrouter to llama-swap
2026-03-13 04:00:10 +01:00
Lumpiasty
4fda343b01
clean up llama-swap config
2026-03-13 04:00:10 +01:00
Lumpiasty
266ced7362
adjust parameters of qwen3-coder-next
2026-03-13 04:00:10 +01:00
Lumpiasty
8a074839b1
automatically fit context on qwen3.5 2b and 4b
2026-03-13 04:00:10 +01:00
Lumpiasty
42038207fc
Add Q3_K_M variant of Qwen3.5-9B
2026-03-13 04:00:10 +01:00
Lumpiasty
28cb53c031
fix thinking versions of Qwen3.5 small
2026-03-13 04:00:10 +01:00
Lumpiasty
46a7e24932
add 2B, 4B, 9B versions of Qwen3.5 in thinking + nonthinking variants
2026-03-13 04:00:10 +01:00
Lumpiasty
cd7ebac6b9
increase target margin to 2048MB of VRAM
2026-03-13 04:00:10 +01:00
Lumpiasty
ba9db6ce41
add Qwen3.5 Small 0.8B model and replace Qwen3-VL-2B as task model
2026-03-13 04:00:10 +01:00
Lumpiasty
6dd9a717e2
shorten context for qwen3-vl-2b and lower kv cache quant
2026-03-13 04:00:10 +01:00
Lumpiasty
c67b6f7ebe
add path to mmproj in qwen3.5 heretic
2026-03-13 04:00:10 +01:00
Lumpiasty
78a81c5b72
Add mmproj-url for Qwen3.5-35B-A3B-heretic model
2026-03-13 04:00:10 +01:00
Lumpiasty
2bb23c4ed0
add gemma-3-270m-it-qat model
2026-03-13 04:00:10 +01:00
Lumpiasty
8c29fc8018
Add Qwen3.5-35B-A3B-heretic models
2026-03-13 04:00:10 +01:00
Lumpiasty
2836542569
Add always loaded Qwen3-VL-2B-Instruct
2026-03-13 04:00:10 +01:00
Lumpiasty
1e68450d8a
Add Qwen3.5-35B-A3B model
2026-03-13 04:00:10 +01:00
Lumpiasty
2c83eb26b3
automatically fit models by llama.cpp
2026-03-13 04:00:10 +01:00
Lumpiasty
b61e3b5c08
add schema reference to config.yaml
2026-03-13 04:00:10 +01:00
Lumpiasty
59bf4a1aa6
configure llama-swap to log llama.cpp output
2026-03-13 04:00:10 +01:00