Project name: weaveworks/kured
Project URL: https://github.com/weaveworks/kured
Primary language: Go 88.6%

kured - Kubernetes Reboot Daemon
Introduction

Kured (KUbernetes REboot Daemon) is a Kubernetes daemonset that performs safe automatic node reboots when the need to do so is indicated by the package management system of the underlying OS.
Kubernetes & OS Compatibility

The daemon image contains versions of
See the release notes for specific version compatibility information, including which combinations have been formally tested. Versions >=1.1.0 enter the host mount namespace to invoke
Installation

To obtain a default installation without Prometheus alerting interlock or Slack notifications:

latest=$(curl -s https://api.github.com/repos/weaveworks/kured/releases | jq -r '.[0].tag_name')
kubectl apply -f "https://github.com/weaveworks/kured/releases/download/$latest/kured-$latest-dockerhub.yaml"

If you want to customise the installation, download the manifest and edit it in accordance with the following section before applying it.

Configuration

The following arguments can be passed to kured via the daemonset pod template:

Kubernetes Reboot Daemon
Usage:
kured [flags]
Flags:
--alert-filter-regexp regexp.Regexp alert names to ignore when checking for active alerts
--alert-firing-only only consider firing alerts when checking for active alerts
--annotate-nodes if set, the annotations 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' will be given to nodes undergoing kured reboots
--blocking-pod-selector stringArray label selector identifying pods whose presence should prevent reboots
--drain-grace-period int time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default -1)
--drain-timeout duration timeout after which the drain is aborted (default: 0, infinite time)
--ds-name string name of daemonset on which to place lock (default "kured")
--ds-namespace string namespace containing daemonset on which to place lock (default "kube-system")
--end-time string schedule reboot only before this time of day (default "23:59:59")
--force-reboot force a reboot even if the drain fails or times out
-h, --help help for kured
--lock-annotation string annotation in which to record locking node (default "weave.works/kured-node-lock")
--lock-release-delay duration delay lock release for this duration (default: 0, disabled)
--lock-ttl duration expire lock annotation after this duration (default: 0, disabled)
--log-format string use text or json log format (default "text")
--message-template-drain string message template used to notify about a node being drained (default "Draining node %s")
--message-template-reboot string message template used to notify about a node being rebooted (default "Rebooting node %s")
--message-template-uncordon string message template used to notify about a node being successfully uncordoned (default "Node %s rebooted & uncordoned successfully!")
--node-id string node name kured runs on, should be passed down from spec.nodeName via KURED_NODE_ID environment variable
--notify-url string notify URL for reboot notifications (cannot use with --slack-hook-url flags)
--period duration sentinel check period (default 1h0m0s)
--post-reboot-node-labels strings labels to add to nodes after uncordoning
--pre-reboot-node-labels strings labels to add to nodes before cordoning
--prefer-no-schedule-taint string Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to "weave.works/kured-node-reboot" to enable tainting.
--prometheus-url string Prometheus instance to probe for active alerts
--reboot-command string command to run when a reboot is required (default "/bin/systemctl reboot")
--reboot-days strings schedule reboot on these days (default [su,mo,tu,we,th,fr,sa])
--reboot-delay duration delay reboot for this duration (default: 0, disabled)
--reboot-sentinel string path to file whose existence triggers the reboot command (default "/var/run/reboot-required")
--reboot-sentinel-command string command for which a zero return code will trigger a reboot command
--skip-wait-for-delete-timeout int when seconds is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node
--slack-channel string slack channel for reboot notifications
--slack-hook-url string slack hook URL for reboot notifications [deprecated in favor of --notify-url]
--slack-username string slack username for reboot notifications (default "kured")
--start-time string schedule reboot only after this time of day (default "0:00")
--time-zone string use this timezone for schedule inputs (default "UTC")

Reboot Sentinel File & Period

By default kured checks for the existence of
Reboot Sentinel Command

Alternatively, a reboot sentinel command can be used. If a reboot sentinel command is used, the presence of the reboot sentinel file is ignored. When the command exits with code 0, the reboot command is triggered.

For example, if you're using RHEL or its derivatives, you can set the sentinel command as follows:

configuration:
  rebootSentinelCommand: sh -c "! needs-restarting --reboothint"

Setting a schedule

By default, kured will reboot any time it detects the sentinel, but this
may cause reboots during odd hours. While service disruption does not
normally occur, anything is possible and operators may want to restrict
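reboots to predictable schedules. Use the --reboot-days, --start-time, --end-time and --time-zone flags to set one, for example:

--reboot-days=mon,tue,wed,thu,fri
--start-time=9am
--end-time=5pm
--time-zone=America/Los_Angeles

Times can be formatted in numerous ways, including the 9am form used above and the 23:59:59 form used by the --end-time default.

Note that when using smaller time windows, you should consider shortening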
the sentinel check period (--period).

Blocking Reboots via Alerts

You may find it desirable to block automatic node reboots when there are active alerts. You can do so by providing the URL of your Prometheus server:

--prometheus-url=http://prometheus.monitoring.svc.cluster.local

By default the presence of any active (pending or firing) alerts will block reboots; however, you can ignore specific alerts:

--alert-filter-regexp=^(RebootRequired|AnotherBenignAlert|...)$

You can also block reboots only for firing alerts:

--alert-firing-only=true

See the section on Prometheus metrics for an important application of this filter.

Blocking Reboots via Pods

You can also block reboots of an individual node when specific pods are scheduled on it:

--blocking-pod-selector=runtime=long,cost=expensive

Since label selector strings use commas to express logical 'and', you can specify this parameter multiple times for 'or':

--blocking-pod-selector=runtime=long,cost=expensive
--blocking-pod-selector=name=temperamental

In this case, the presence of either an (appropriately labelled) expensive long running job or a known temperamental pod on a node will stop it rebooting.
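The 'and' within one selector and 'or' across repeated flags can be sketched in Go as follows. This is a simplified illustration that handles only key=value terms; real Kubernetes label selectors support more operators, and the function names matchesSelector and blocksReboot are hypothetical rather than part of kured.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesSelector reports whether podLabels satisfies every key=value term
// of a comma-separated selector (logical 'and').
// Simplified: only equality terms are understood.
func matchesSelector(selector string, podLabels map[string]string) bool {
	for _, term := range strings.Split(selector, ",") {
		key, want, ok := strings.Cut(strings.TrimSpace(term), "=")
		got, present := podLabels[key]
		if !ok || !present || got != want {
			return false
		}
	}
	return true
}

// blocksReboot reports whether podLabels matches at least one selector
// (logical 'or' across repeated flags).
func blocksReboot(selectors []string, podLabels map[string]string) bool {
	for _, selector := range selectors {
		if matchesSelector(selector, podLabels) {
			return true
		}
	}
	return false
}

func main() {
	// The two selectors from the example flags above.
	selectors := []string{"runtime=long,cost=expensive", "name=temperamental"}
	pods := []map[string]string{
		{"runtime": "long", "cost": "expensive"},
		{"runtime": "long", "cost": "cheap"},
		{"name": "temperamental"},
	}
	for _, labels := range pods {
		fmt.Println(labels, "blocks reboot:", blocksReboot(selectors, labels))
	}
}
```

Only the first and third pods block a reboot: the second carries runtime=long but not cost=expensive, so it fails the 'and' inside the first selector and does not match the second selector at all.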
Adding node labels before and after reboots

If you need to add node labels before and after the reboot process, you can use the --pre-reboot-node-labels and --post-reboot-node-labels flags:

--pre-reboot-node-labels=zalando=notready
--post-reboot-node-labels=zalando=ready

Labels can be comma-delimited.

Note that label keys specified by these two flags should match. If they do not match, a warning will be generated.

Prometheus Metrics

Each kured pod exposes a single gauge metric, kured_reboot_required:

# HELP kured_reboot_required OS requires reboot due to software updates.
# TYPE kured_reboot_required gauge
kured_reboot_required{node="ip-xxx-xxx-xxx-xxx.ec2.internal"} 0

The purpose of this metric is to power an alert which will summon an operator if the cluster cannot reboot itself automatically for a prolonged period:

# Alert if a reboot is required for any machines. Acts as a failsafe for the
# reboot daemon, which will not reboot nodes if there are pending alerts save
# this one.
ALERT RebootRequired
IF max(kured_reboot_required) != 0
FOR 24h
LABELS { severity="warning" }
ANNOTATIONS {
summary = "Machine(s) require being rebooted, and the reboot daemon has failed to do so for 24 hours",
impact = "Cluster nodes more vulnerable to security exploits. Eventually, no disk space left.",
description = "Machine(s) require being rebooted, probably due to kernel update.",
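}

If you choose to employ such an alert and have configured kured to probe for active alerts before rebooting, be sure to specify --alert-filter-regexp so that the RebootRequired alert is ignored; otherwise the alert would itself block the reboot it is meant to prompt.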
Notifications

When you specify a formatted URL using --notify-url, kured sends reboot notifications to that URL. Alternatively you can use the
Here is the syntax:
More details here: containrrr.dev/shoutrrr/v0.5/services/overview

Overriding Lock Configuration

The --ds-name and --ds-namespace flags set the name and namespace of the daemonset on which kured places its lock (defaults "kured" and "kube-system"). Similarly, --lock-annotation sets the annotation in which the locking node is recorded (default "weave.works/kured-node-lock").

Operation

The example commands in this section assume that you have not overridden the default lock annotation, daemonset name or namespace; if you have, you will have to adjust the commands accordingly.

Testing

You can test your configuration by provoking a reboot on a node:

sudo touch /var/run/reboot-required

Disabling Reboots

If you need to temporarily stop kured from rebooting any nodes, you can take the lock manually:

kubectl -n kube-system annotate ds kured weave.works/kured-node-lock='{"nodeID":"manual"}'

Don't forget to release it afterwards!

Manual Unlock

In exceptional circumstances, such as a node experiencing a permanent failure whilst rebooting, manual intervention may be required to remove the cluster lock:

kubectl -n kube-system annotate ds kured weave.works/kured-node-lock-
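The lock value used in the commands above is a small JSON document. The Go sketch below encodes and decodes it, modelling only the nodeID field that appears in the manual-lock command; any other fields kured may store are not modelled, and the names lockValue, encodeLock and lockHolder are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Default value of --lock-annotation, as listed under Configuration.
const lockAnnotation = "weave.works/kured-node-lock"

// lockValue models only the field visible in the manual-lock command.
type lockValue struct {
	NodeID string `json:"nodeID"`
}

// encodeLock builds the annotation value for the given holder.
func encodeLock(nodeID string) (string, error) {
	b, err := json.Marshal(lockValue{NodeID: nodeID})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// lockHolder extracts the holder from an annotation value.
// Unknown JSON fields are ignored by encoding/json.
func lockHolder(raw string) (string, error) {
	var v lockValue
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return "", err
	}
	return v.NodeID, nil
}

func main() {
	value, err := encodeLock("manual")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Take the lock, inspect the holder, then release the lock.
	fmt.Printf("kubectl -n kube-system annotate ds kured %s='%s'\n", lockAnnotation, value)
	holder, err := lockHolder(value)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("lock held by:", holder)
	fmt.Printf("kubectl -n kube-system annotate ds kured %s-\n", lockAnnotation)
}
```

Reading the holder this way can help decide whether a lingering lock was taken manually or belongs to a node that failed mid-reboot.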
Automatic Unlock

In exceptional circumstances (especially when used with cluster-autoscaler) a node which holds the lock might be killed, in which case the annotation would stay in place forever. Using --lock-ttl, the lock annotation expires after the given duration (default: 0, disabled).

Delaying Lock Release

Using --lock-release-delay, the lock release is delayed for the given duration (default: 0, disabled).

Building

Kured now uses Go Modules, so build instructions vary depending on where you have checked out the repository:

Building outside $GOPATH:

make

Building inside $GOPATH:

GO111MODULE=on make

You can find the current preferred version of Golang in the go.mod file.

If you are interested in contributing code to kured, please take a look at our development docs.

Frequently Asked/Anticipated Questions
Why is there no