diff --git a/README.md b/README.md index fa047e8..6073881 100644 --- a/README.md +++ b/README.md @@ -1,106 +1,203 @@ # Homelab -## Goals +This repo contains configuration and documentation for my homelab setup, which is based on Talos OS for the Kubernetes cluster and a MikroTik router. -Wanting to set up homelab kubernetes cluster. +## Architecture -### Software +The physical setup consists of a MikroTik router, which connects to the internet and serves as the gateway for the cluster and other devices in the home network, as shown in the diagram below. -1. Running applications - 1. NAS, backups, security recorder - 2. Online presence, website, email, communicators (ts3, matrix?) - 3. Git server, container registry - 4. Environment to deploy my own apps - 5. Some LLM server, apps for my own use - 6. Public services like Tor, mirrors of linux distros etc. - 7. [Some frontends](https://libredirect.github.io/) - 8. [Awesome-Selfhosted](https://github.com/awesome-selfhosted/awesome-selfhosted), [Awesome Sysadmin](https://github.com/awesome-foss/awesome-sysadmin) -2. Managing them hopefully using GitOps - 1. FluxCD, Argo etc. - 2. State of cluster in git, all apps version pinned - 3. Some bot to inform about updates? -3. It's a home**lab** - 1. Should be open to experimenting - 2. Avoiding vendor lock-in, changing my mind shouldn't block me for too long - 3. Backups of important data in easy to access format - 4. Expecting downtime, no critical workloads - 5. Trying to keep it reasonably up anyways +```mermaid +%%{init: {"flowchart": {"ranker": "tight-tree"}}}%% +flowchart TD -### Infrastructure + subgraph internet[Internet] + ipv4[IPv4 Internet] + ipv6[IPv6 Internet] + he_tunnel[Hurricane Electric IPv6 Tunnel Broker] + isp[ISP] + end -1. Using commodity hardware -2. Reasonably scalable -3. Preferably mobile workloads, software should be a bit more flexible than me moving disks and data -4. Replication is overkill for most data -5. Preferably dynamically configured network - 1. 
BGP with OpenWRT router - 2. Dynamically allocated host subnets - 3. Load-balancing (MetalLB?), ECMP on router - 4. Static IP configurations on nodes -6. IPv6 native, IPv4 accessible - 1. IPv6 has whole block routed to us which gives us control over address routing and usage - 2. Which allows us to expose services directly to the internet without complex router config - 3. Which allows us to use eg. ExternalDNS to autoconfigure domain names for LB - 4. But majority of the world still runs IPv4, which should be supported for public services - 5. Exposing IPv4 service may require additional reconfiguration of router, port forwarding, manual domain setting or controller doing this some day in future - 6. One public IPv4 address means probably extensive use of rule-based ingress controllers - 7. IPv6 internet from pods should not be NATed - 8. IPv4 internet from pods should be NATed by router - -### Current implementation idea - -1. Cluster server nodes running Talos -2. OpenWRT router - 1. VLAN / virtual interface, for cluster - 2. Configuring using Ansible - 3. Peering with cluster using BGP - 4. Load-balancing using ECMP -3. Cluster networking - 1. Cilium CNI - 2. Native routing, no encapsulation or overlay - 3. Using Cilium's network policies for firewall needs - 4. IPv6 address pool - 1. Nodes: 2001:470:61a3:100::/64 - 2. Pods: 2001:470:61a3:200::/64 - 3. Services: 2001:470:61a3:300::/112 - 4. Load balancer: 2001:470:61a3:400::/112 - 5. IPv4 address pool - 1. Nodes: 192.168.1.32/27 - 2. Pods: 10.42.0.0/16 - 3. Services: 10.43.0.0/16 - 4. Load balancer: 10.44.0.0/16 -4. Storage - 1. OS is installed on dedicated disk - 2. Mayastor managing all data disks - 1. DiskPool for each data disk in cluster, labelled by type SSD or HDD - 2. Creating StorageClass for each topology need (type, whether to replicate, on which node etc.) 
- -## Working with repo - -Repo is preconfigured to use with nix and vscode - -Install nix, vscode should pick up settings and launch terminals in `nix develop` with all needed utils. - -## Bootstrapping cluster - -1. Configure OpenWRT, create dedicated interface for connecting server - 1. Set up node subnet, routing - 2. Create static host entry `kube-api.homelab.lumpiasty.xyz` pointing at ipv6 of first node -2. Connect server -3. Grab Talos ISO, dd it to usb stick -4. Boot it and using keyboard set up static ip ipv6 subnet, should become reachable from pc -5. `talosctl gen config homelab https://kube-api.homelab.lumpiasty.xyz:6443` -6. Generate secrets `talosctl gen secrets`, **backup, keep `secrets.yml` safe** -7. Generate config files `make gen-talos-config` -8. Apply config to first node `talosctl apply-config --insecure -n 2001:470:61a3:100::2 -f controlplane.yml` -9. Wait for reboot then `talosctl bootstrap --talosconfig=talosconfig -n 2001:470:61a3:100::2` -10. Set up router and CNI - -## Updating Talos config - -Update patches and re-generate and apply configs. + subgraph home[Home network] + router[MikroTik Router] + cluster[Talos cluster] + lan[LAN] + mgmt[Management network] + cam[Camera system] + router --> lan + router --> cluster + router --> mgmt + router --> cam + end + ipv4 -- "Public IPv4 address" --> isp + ipv6 -- "Routed /48 IPv6 prefix" --> he_tunnel -- "6in4 Tunnel" --> isp + isp --> router ``` -make gen-talos-config -make apply-talos-config + +Devices are separated into VLANs and subnets for isolation and firewalling between devices and services. The whole internal network is configured to avoid NAT wherever it is unnecessary. Pods on the Kubernetes cluster communicate with the router using native IP routing; there is no encapsulation, overlay network, or NAT on the nodes. The router knows where to direct packets destined for the pods because the cluster announces its IP prefixes to the router using BGP. 
The router also performs NAT for IPv4 traffic between the cluster and the internet, while IPv6 traffic is routed directly to the internet without NAT. A high-level logical routing diagram is shown below. + +```mermaid +flowchart TD + isp[ISP] --- gpon + + subgraph device[MikroTik CRS418-8P-8G-2s+] + direction TB + gpon[SFP GPON ONU] + pppoe[PPPoE client] + + he_tunnel[HE Tunnel] + + router[Router]@{ shape: cyl } + + dockers[""" + Docker containers (bridge) + 2001:470:61a3:500::/64 + 172.17.0.0/16 + """]@{ shape: cloud } + tailscale["Tailscale Container"] + + lan[""" + LAN (vlan2) + 2001:470:61a3::/64 + 192.168.0.0/24 + """]@{ shape: cloud } + + mgmt[""" + Management network (vlan1) + 192.168.255.0/24 + """]@{ shape: cloud } + + cam[""" + Camera system (vlan3) + 192.168.3.0/24 + """]@{ shape: cloud } + + cluster[""" + Kubernetes cluster (vlan4) + 2001:470:61a3:100::/64 + 192.168.1.0/24 + """]@{ shape: cloud } + + gpon --- pppoe -- """ + 139.28.40.212 + Default IPv4 gateway + """ --- router + + pppoe --- he_tunnel -- """ + 2001:470:61a3:: incoming + Default IPv6 gateway + """ --- router + + router -- """ + 2001:470:61a3:500:ffff:ffff:ffff:ffff + 172.17.0.1/16 + """ --- dockers --- tailscale + + router -- """ + 2001:470:61a3:0:ffff:ffff:ffff:ffff + 192.168.0.1 + """--- lan + + router -- """ + 192.168.255.10 + """--- mgmt + + router -- "192.168.3.1" --- cam + router -- """ + 2001:470:61a3:100::1 + 192.168.1.1 + """ --- cluster + + end + + subgraph k8s[K8s cluster] + direction TB + pod_network[""" + Pod networks + 2001:470:61a3:200::/104 + 10.42.0.0/16 + (Dynamically allocated /120 IPv6 and /24 IPv4 prefixes per node) + """]@{ shape: cloud } + + service_network[""" + Service network + 2001:470:61a3:300::/112 + 10.43.0.0/16 + (Advertises vIP addresses via BGP from nodes hosting endpoints) + """]@{ shape: cloud } + + load_balancer[""" + Load balancer network + 2001:470:61a3:400::/112 + 10.44.0.0/16 + (Advertises vIP addresses via BGP from nodes hosting endpoints) + """]@{ 
shape: cloud } + end + + cluster -- "Routes exported via BGP" ----- k8s ``` + +Currently the k8s cluster consists of a single node (hostname anapistula-delrosalae), a PC with a Ryzen 5 3600, 64GB RAM, an RX 580 8GB (for accelerating LLMs), a 1TB NVMe SSD, and 2TB and 3TB HDDs, which serves as both control plane and worker node. + +## Software stack + +The cluster itself is based on [Talos Linux](https://www.talos.dev/) (which is also a Kubernetes distribution) and uses [Cilium](https://cilium.io/) as CNI, IPAM, kube-proxy replacement, load balancer, and BGP control plane. Persistent volumes are managed by [OpenEBS LVM LocalPV](https://openebs.io/docs/user-guides/local-storage-user-guide/local-pv-lvm/lvm-overview). Applications are deployed using GitOps (this repo) and reconciled on the cluster using [Flux](https://fluxcd.io/). The Git repository is hosted on [Gitea](https://gitea.io/) running on the cluster itself. Secrets are kept in [OpenBao](https://openbao.org/) (a HashiCorp Vault fork) running on the cluster and synced to cluster objects using [Vault Secrets Operator](https://github.com/hashicorp/vault-secrets-operator). Deployments are kept up to date by a self-hosted [Renovate](https://www.mend.io/renovate/) bot that updates manifests in the Git repository. Incoming HTTP traffic is routed to the cluster using [Nginx Ingress Controller](https://kubernetes.github.io/ingress-nginx/), and certificates are issued by [cert-manager](https://cert-manager.io/) using the [Let's Encrypt](https://letsencrypt.org/) ACME issuer, with [cert-manager-webhook-ovh](https://github.com/aureq/cert-manager-webhook-ovh) solving DNS-01 challenges. The cluster also runs the [CloudNativePG](https://cloudnative-pg.io/) operator for managing PostgreSQL databases. The high-level core cluster software architecture is shown in the diagram below. 
+ +```mermaid +flowchart TD + router[MikroTik Router] + router -- "Routes HTTP traffic" --> nginx + cilium -- "Announces routes via BGP" --> router + subgraph cluster[K8s cluster] + direction TB + flux[Flux CD] -- "Reconciles manifests" --> kubeapi[Kube API Server] + flux -- "Fetches Git repo" --> gitea[Gitea] + + + kubeapi -- "Configs, Services, Pods" --> cilium[Cilium] + cilium -- "Routing" --> services[Services] -- "Endpoints" --> pods[Pods] + cilium -- "Configures routing, interfaces, IPAM" --> pods[Pods] + + + kubeapi -- "Ingress rules" --> nginx[NGINX Ingress Controller] -- "Routes HTTP traffic" ---> pods + + kubeapi -- "Certificate requests" --> cert_manager[cert-manager] -- "Provides certificates" --> nginx + cert_manager -- "ACME DNS-01 challenges" --> dns_webhook[cert-manager-webhook-ovh] -- "Solves DNS challenges" --> ovh[OVH DNS] + cert_manager -- "Requests DNS-01 challenges" --> acme[Let's Encrypt ACME server] -- "Verifies domain ownership" --> ovh + + kubeapi -- "Assigns pods" --> kubelet[Kubelet] -- "Manages" --> pods + + kubeapi -- "PVs, LvmVols" --> openebs[OpenEBS LVM LocalPV] + openebs -- "Mounts volumes" --> pods + openebs -- "Manages" --> lv[LVM LVs] + + kubeapi -- "Gets Secret refs" --> vault_operator[Vault Secrets Operator] -- "Syncs secrets" --> kubeapi + vault_operator -- "Retrieves secrets" --> vault[OpenBao] -- "Secret storage" --> lv + vault -- "Auth method" --> kubeapi + + gitea -- "Stores repositories" --> lv + + gitea --> renovate[Renovate Bot] -- "Updates manifests" --> gitea + + + end +``` + + + +## Applications / Services + +| Logo | Name | Address | Description | +|------|------|---------|-------------| +| Gitea | Gitea | https://gitea.lumpiasty.xyz/ | Private Git repository hosting and artifact storage (Docker, Helm charts) | +| OpenBao | OpenBao | https://openbao.lumpiasty.xyz:8200/ | Secret storage (HashiCorp Vault compatible) | +| Renovate | Renovate | | Bot for keeping dependencies up to date | +| cert-manager | 
cert-manager | | Automatic TLS certificate management | +| Nginx | Nginx Ingress Controller | | Ingress controller for routing external traffic to services in the cluster | +| CloudNativePG | CloudNativePG | | PostgreSQL operator for managing PostgreSQL instances | +| Immich | Immich | https://immich.lumpiasty.xyz/ | Self-hosted photo and video backup and streaming service | +| iSpeak3 | iSpeak3.pl | [ts3server://ispeak3.pl](ts3server://ispeak3.pl) | Public TeamSpeak 3 voice communication server | +| LLaMA.cpp | LLaMA.cpp | https://llama.lumpiasty.xyz/ | LLM inference server running local models with GPU acceleration | +| Open WebUI | Open WebUI | https://openwebui.lumpiasty.xyz/ | Web UI for chatting with LLMs running on the cluster | +| Frigate | Frigate | https://frigate.lumpiasty.xyz/ | NVR for camera system with AI object detection and classification | + diff --git a/devenv.nix b/devenv.nix index 21d7479..fa779c7 100644 --- a/devenv.nix +++ b/devenv.nix @@ -40,6 +40,7 @@ in restic openbao pv-migrate + mermaid-cli ]; # Scripts diff --git a/docs/assets/cert-manager.svg b/docs/assets/cert-manager.svg new file mode 100644 index 0000000..31646eb --- /dev/null +++ b/docs/assets/cert-manager.svg @@ -0,0 +1,211 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/assets/cloudnativepg.svg b/docs/assets/cloudnativepg.svg new file mode 100644 index 0000000..0d48a57 --- /dev/null +++ b/docs/assets/cloudnativepg.svg @@ -0,0 +1,22 @@ + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/docs/assets/frigate.svg b/docs/assets/frigate.svg new file mode 100644 index 0000000..3d01f2a --- /dev/null +++ b/docs/assets/frigate.svg @@ -0,0 +1,3 @@ + + + diff --git a/docs/assets/gitea.svg b/docs/assets/gitea.svg new file mode 100644 index 0000000..4329134 --- /dev/null +++ b/docs/assets/gitea.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/assets/immich.svg 
b/docs/assets/immich.svg new file mode 100644 index 0000000..376fa6f --- /dev/null +++ b/docs/assets/immich.svg @@ -0,0 +1,29 @@ + + + + + + + + + + + + diff --git a/docs/assets/llama-cpp.svg b/docs/assets/llama-cpp.svg new file mode 100644 index 0000000..dcbe9cc --- /dev/null +++ b/docs/assets/llama-cpp.svg @@ -0,0 +1,87 @@ + + + + + + + + + + + + + + + + + + diff --git a/docs/assets/nginx.svg b/docs/assets/nginx.svg new file mode 100644 index 0000000..27062a8 --- /dev/null +++ b/docs/assets/nginx.svg @@ -0,0 +1,2 @@ + +file_type_nginx \ No newline at end of file diff --git a/docs/assets/open-webui.png b/docs/assets/open-webui.png new file mode 100644 index 0000000..10c84f4 Binary files /dev/null and b/docs/assets/open-webui.png differ diff --git a/docs/assets/openbao.svg b/docs/assets/openbao.svg new file mode 100644 index 0000000..5187590 --- /dev/null +++ b/docs/assets/openbao.svg @@ -0,0 +1,8 @@ + + + + + + + + \ No newline at end of file diff --git a/docs/assets/renovate.svg b/docs/assets/renovate.svg new file mode 100644 index 0000000..c45aa45 --- /dev/null +++ b/docs/assets/renovate.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/docs/assets/teamspeak.svg b/docs/assets/teamspeak.svg new file mode 100644 index 0000000..351cedf --- /dev/null +++ b/docs/assets/teamspeak.svg @@ -0,0 +1,24 @@ + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file
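
Review note on the pod addressing in the new README routing diagram: the pod pools (`2001:470:61a3:200::/104` and `10.42.0.0/16`) are described as being split into /120 IPv6 and /24 IPv4 prefixes per node. The sketch below is only a sanity check of that arithmetic using Python's standard `ipaddress` module. The helper name `per_node_prefix_count` is hypothetical and not part of this repo, and the sketch does not reflect how Cilium actually allocates prefixes.

```python
import ipaddress


def per_node_prefix_count(pool: str, node_prefix_len: int) -> int:
    """Return how many per-node prefixes of the given length fit in the pool.

    Hypothetical helper for illustration only; not part of the repo.
    """
    network = ipaddress.ip_network(pool)
    if node_prefix_len < network.prefixlen:
        raise ValueError("per-node prefix cannot be larger than the pool")
    # Each extra prefix bit doubles the number of available sub-prefixes.
    return 2 ** (node_prefix_len - network.prefixlen)


# Pool values copied from the README diagram.
ipv6_pool = "2001:470:61a3:200::/104"
ipv4_pool = "10.42.0.0/16"

ipv6_node_prefixes = per_node_prefix_count(ipv6_pool, 120)
ipv4_node_prefixes = per_node_prefix_count(ipv4_pool, 24)

# First per-node prefix carved from each pool, as an example.
first_ipv6 = next(ipaddress.ip_network(ipv6_pool).subnets(new_prefix=120))
first_ipv4 = next(ipaddress.ip_network(ipv4_pool).subnets(new_prefix=24))

print(f"IPv6 per-node prefixes available: {ipv6_node_prefixes}")
print(f"IPv4 per-node prefixes available: {ipv4_node_prefixes}")
print(f"First per-node prefixes: {first_ipv6}, {first_ipv4}")
```

Under these assumptions the IPv4 pool is the tighter bound on how many per-node prefixes can be handed out, which is far more than a single-node cluster needs.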