|
|
79315d32db
|
add GLM-4.7-Flash model
|
2026-03-16 18:19:28 +01:00 |
|
|
|
2d295d24e0
|
add 27b q3 variant of qwen3.5
|
2026-03-13 04:00:10 +01:00 |
|
|
|
e8efa9ddc1
|
lower kv cache quant to q4_0 and increase ctx to 64k
|
2026-03-13 04:00:10 +01:00 |
|
|
|
c88dd2899a
|
remove ttl of all models in llama-swap
|
2026-03-13 04:00:10 +01:00 |
|
|
|
39fc38d62b
|
add qwen3.5 4b heretic
|
2026-03-13 04:00:10 +01:00 |
|
|
|
e72a79be8f
|
add glm-5 from openrouter to llama-swap
|
2026-03-13 04:00:10 +01:00 |
|
|
|
4fda343b01
|
clean up llama-swap config
|
2026-03-13 04:00:10 +01:00 |
|
|
|
266ced7362
|
adjust parameters of qwen3-coder-next
|
2026-03-13 04:00:10 +01:00 |
|
|
|
8a074839b1
|
automatically fit context on qwen3.5 2b and 4b
|
2026-03-13 04:00:10 +01:00 |
|
|
|
42038207fc
|
Add Q3_K_M variant of Qwen3.5-9B
|
2026-03-13 04:00:10 +01:00 |
|
|
|
28cb53c031
|
fix thinking versions of Qwen3.5 small
|
2026-03-13 04:00:10 +01:00 |
|
|
|
46a7e24932
|
add 2B, 4B, 9B versions of Qwen3.5 in thinking + nonthinking variants
|
2026-03-13 04:00:10 +01:00 |
|
|
|
cd7ebac6b9
|
increase target margin to 2048MB of VRAM
|
2026-03-13 04:00:10 +01:00 |
|
|
|
ba9db6ce41
|
add Qwen3.5 Small 0.8B model and replace Qwen3-VL-2B as task model
|
2026-03-13 04:00:10 +01:00 |
|
|
|
6dd9a717e2
|
shorten context for qwen3-vl-2b and lower kv cache quant
|
2026-03-13 04:00:10 +01:00 |
|
|
|
c67b6f7ebe
|
add path to mmproj in qwen3.5 heretic
|
2026-03-13 04:00:10 +01:00 |
|
|
|
78a81c5b72
|
Add mmproj-url for Qwen3.5-35B-A3B-heretic model
|
2026-03-13 04:00:10 +01:00 |
|
|
|
2bb23c4ed0
|
add gemma-3-270m-it-qat model
|
2026-03-13 04:00:10 +01:00 |
|
|
|
8c29fc8018
|
Add Qwen3.5-35B-A3B-heretic models
|
2026-03-13 04:00:10 +01:00 |
|
|
|
2836542569
|
Add always loaded Qwen3-VL-2B-Instruct
|
2026-03-13 04:00:10 +01:00 |
|
|
|
1e68450d8a
|
Add Qwen3.5-35B-A3B model
|
2026-03-13 04:00:10 +01:00 |
|
|
|
2c83eb26b3
|
automatically fit models by llama.cpp
|
2026-03-13 04:00:10 +01:00 |
|
|
|
b61e3b5c08
|
add schema reference to config.yaml
|
2026-03-13 04:00:10 +01:00 |
|
|
|
59bf4a1aa6
|
configure llama-swap to log llama.cpp output
|
2026-03-13 04:00:10 +01:00 |
|
|
|
63a8e2f7ac
|
add Qwen3-Coder-Next model
|
2026-03-13 04:00:10 +01:00 |
|
|
|
9032060930
|
add abliterated versions of qwen3-vl
|
2026-03-13 04:00:08 +01:00 |
|
|
|
f13c3ae3e7
|
Add 8B and 2B variants of qwen3-vl
|
2026-03-13 04:00:08 +01:00 |
|
|
|
669beccc35
|
fix Qwen3-VL-4B-Instruct-GGUF models looping issue
|
2026-03-13 04:00:08 +01:00 |
|
|
|
5eb7b7bb0c
|
add qwen3-vl thinking variant
|
2026-03-13 04:00:08 +01:00 |
|
|
|
0b677d0faf
|
add qwen3-vl, fix librechat taking over settings and clean up llama config
|
2026-03-13 04:00:08 +01:00 |
|
|
|
9544f4719f
|
Add Qwen2.5-VL models
|
2026-03-13 04:00:07 +01:00 |
|
|
|
eb4ac7acf4
|
add qwen3-4b-2507 model
|
2026-03-13 04:00:07 +01:00 |
|
|
|
93855dc712
|
llama automatic unloading and longer start timeout
|
2026-03-13 04:00:07 +01:00 |
|
|
|
241dce4524
|
disable warmups
|
2026-03-13 04:00:07 +01:00 |
|
|
|
17805e6b31
|
add gemma3 model
|
2026-03-13 04:00:07 +01:00 |
|
|
|
32eea7c3af
|
add gemma3n
|
2026-03-13 04:00:07 +01:00 |
|
|
|
de3ef465f0
|
add qwen3 no thinking
|
2026-03-13 04:00:07 +01:00 |
|