Prepare scripts to move drbd primary #1

Open
opened 2025-12-08 12:51:21 +00:00 by Ghostinvisible-forgejo-org · 6 comments

See also https://invisible.forgejo.org/infrastructure/k8s-cluster/src/branch/main/k8s-maintenance.md

Steps

  1. drain k3s node
    • remove forgejo.org/drbd=primary label
    • drain node
  2. move nfs and move drbd
  3. add forgejo.org/drbd=primary label to new primary
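
A minimal sketch of steps 1 and 3, assuming plain `kubectl` access to the cluster. The function names and the `KUBECTL` override are hypothetical and only there to allow a dry run. Step 2 (moving nfs and drbd) is left as a comment because its exact commands are not described here.

```shell
#!/bin/sh
# Sketch only, not the project's actual tooling.
# KUBECTL may be overridden (e.g. KUBECTL="echo kubectl") for a dry run.
KUBECTL="${KUBECTL:-kubectl}"
LABEL="forgejo.org/drbd"

# Step 1: remove the primary label from the current primary, then drain it.
drain_primary() {
    node="$1"
    # A trailing "-" after the label key removes the label.
    $KUBECTL label node "$node" "${LABEL}-"
    $KUBECTL drain "$node" --ignore-daemonsets --delete-emptydir-data
}

# Step 2: move nfs and move drbd (see k8s-maintenance.md; not sketched here).

# Step 3: add the primary label to the new primary.
label_new_primary() {
    node="$1"
    $KUBECTL label node "$node" "${LABEL}=primary"
}
```

With `KUBECTL="echo kubectl"`, calling `drain_primary <old-node>` and `label_new_primary <new-node>` only prints the commands, which makes it possible to review them before running anything against the cluster.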

This way we don't need to move all instances to nfs before maintenance. We need to drain anyway if nfs stops because of the drbd move.

Maybe it's easier to just scale down the deployments / statefulsets / ... that have a pvc on drbd / nfs.
That way it's simpler. 🤔
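
A rough sketch of what scaling down could look like, assuming the affected workloads are already known. The function names and the `KUBECTL` override are hypothetical, and the original replica count has to be noted by hand so it can be restored afterwards.

```shell
#!/bin/sh
# Sketch only: which workloads have a pvc on drbd / nfs must be determined separately.
# KUBECTL may be overridden (e.g. KUBECTL="echo kubectl") for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Scale one workload (kind is "deployment" or "statefulset") down to zero.
scale_down() {
    kind="$1"; name="$2"; namespace="$3"
    $KUBECTL scale "$kind/$name" --namespace "$namespace" --replicas=0
}

# Restore a workload to a replica count recorded before the maintenance.
scale_up() {
    kind="$1"; name="$2"; namespace="$3"; replicas="$4"
    $KUBECTL scale "$kind/$name" --namespace "$namespace" --replicas="$replicas"
}
```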

@earl-warren can we at least have a more extensive description of how to proceed with this?
Especially when we also need to move the floating ip.

I added a link to https://invisible.forgejo.org/infrastructure/k8s-cluster/src/branch/main/k8s-maintenance.md in the description to remember where those improvements should go.

I thought of planning a disaster recovery exercise and improving the documentation at the same time. End of July would work for me (i.e. after the v12 release is published and the dust settles). What do you think?

sounds good

Reference
infrastructure/k8s-cluster#1