11 KiB
11 KiB
Change Log
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
Unreleased
[0.8.7] - 2020-02-18
Added
- Add memory read error.
- Add support for building NPD on MacOS.
- Add support for containerd health check.
- Add metric for
per-cpu,per-stagetiming.
Fixed
- Fix an issue that kubelet may be restarted by NPD health checker unexpectedly. Make log pattern check configurable in health checker.
- Exit the process when there is a timeout in plugin daemon
0.8.6 - 2020-01-22
Added
- Windows build now supported.
- Added metrics to retrieve stats such as
procs_runningandprocs_blocked. - Added metrics to retrieve network stats.
- Added metric to retrieve guest OS features such as unknown modules, ktd, and kernel integrity.
Changed
- Print result's message when status is unknown.
Fixed
- Fixed custom plugin command timeout when the command spawns a long running child process.
0.8.5 - 2020-11-18
Added
- Added problem detection for buffer I/O error.
- Added CPU load average metrics support.
- Added kubelet apiserver connection check in health checker.
Changed
- Will now catch hung task with pattern like
tasks airflow scheduler: *. - Better handling to avoid duplicating disk bytes metrics used on fstype and mount types.
Fixed
- Fixed the deployment yaml to prevent NPD from scheduling onto windows nodes.
- Fixed memory unit for
/proc/meminfometrics. - Fixed OOMKilling detection for new linux kernel v5.1+.
0.8.4 - 2020-09-01
Added
- Added
FSTypeandMountOptionas labels to the metricdisk_usage_bytes. - Added
DockerContainerStartupFailureevent indocker-monitor.jsonto detect docker issue docker/for-linux#647.
Fixed
- Reduced log spam generated by the custom plugin monitor.
0.8.3 - 2020-06-30
Added
health-checkerbinary now included in the docker image.
Changed
--enable-repair=trueis now the default for docker and kubelet health checker.- Custom plugin will now only generate status update log when the status has changed.
- Limit the size of custom plugin output to 4kb, extra output will be drained and discarded.
Fixed
- Fix a race condition that services may be killed periodically when
--enable-repair=true, and systemd service restart time equals the health check period.
0.8.2 - 2020-05-28
Added
- Added an
--event-namespaceflag to make event namespace configurable. - Added
rhelsupport in OS version. - Added
health-checkeras a custom plugin. Thehealth-checkercan be used to monitor healthiness of kubelet, docker and CRI container runtimes (e.g.containerd,cri-o) and restart them if they are not healthy ifenable-repairis turned on.
Fixed
- #420 Added
missing
lsblkto the container image.
0.8.1 - 2020-02-25
Added
- Added
host_uptimemetrics for CentOS. - Now collecting a lot more useful CPU/disk/memory metrics.
Changed
- Improved
network_problem.shto supportnf_conntrackand report error when conntrack table is 90% full.
Fixed
- #366 Fixed
building with
ENABLE_JOURNALD=0. - Fixed the first 0 value metrics reported for
disk_avg_queue_len. - Fix a few metric units for disk metrics and the calculation for
disk_avg_queue_len.
0.8.0 - 2019-10-30
Added
- Added Stackdriver exporter.
- Added a
k8s-exporter-heartbeat-periodflag to make the heart beat period of K8s exporter configurable.
Changed
- Changed the default heart beat period of K8s exporter from
1mto5m.
Fixed
- Addressed an issue with a panic caused by closing an already closed channel.
- Fixed several potential busy loops.
0.7.1 - 2019-08-27
Added
- Added validation that permanent problems habe a preset default condition.
Changed
- Empty LogPath will now use journald's default path.
- Systemd monitor now looks back 5 minutes.
- Bumped base image to
registry.k8s.io/debian-base-amd64:1.0.0. - Updated the detection method for docker overlay2 issues.
- Moved NPD into the kube-system namespace.
Fixed
- #202 Fixed an issue that condition can't switch back to false for custom plugins.
0.7.0 - 2019-07-25
Added
- Added a system stats monitor is added into NPD as a new problem daemon. It
collects useful node problem related system stats with OpenCensus such as
disk/io_time,disk/weighted_ioanddisk/avg_queue_len. - Besides node condition and events, problems detected by existing problem
daemons are also collected into OpenCensus as metrics:
problem_counter{reason="PROBLEM_REASON"} xxxfor events andproblem_gauge{reason="PROBLEM_REASON",type="PROBLEM_TYPE"} 1 or 0for conditions. - A Prometheus exporter is added to export all OpenCensus metrics collected by NPD through Prometheus.
- A plugin system for problem daemons is added. Problem daemons can be disabled
at compile time with build tags, such as
disable_system_stats_monitor,disable_system_log_monitoranddisable_custom_plugin_monitor. - A problem exporter interface is added. The original kubernetes problem
reporting logic was moved into
k8sexporter. Prometheus support is implemented asprometheusexporter.
0.6.6 - 2019-08-13
Changed
- Updated the detection method for docker overlay2 issues.
Fixed
- #202 Fixed an issue that condition can't switch back to false for custom plugins.
0.6.5 - 2019-07-24
Fixed
- #295 Added configurable timeout to wait for apiserver to be ready before starting problem detection.
0.6.4 - 2019-06-13
Changed
- Switch from godep to go modules resulting in bumping versions of many dependencies.
- Changed custom plugin handling to run immediately on startup.
Fixed
- #269 Fixed
issue so that using
--versionshould not require monitors to be specified.
0.6.3 - 2019-04-05
Added
- Added better handling and reporting when missing required flags.
Fixed
- Disabled glog writing to files for the log-counter plugin.
0.6.2 - 2019-01-07
Added
- Added resource limites to NPD deployment.
- Added log-counter to dockerfile.
- Added
enable_message_change_based_condition_updateoption to enable condition update when messages change for custom plugin.
Fixed
- #232 Explicitly include libsystemd0 in the image.
0.6.1 - 2018-11-28
Changed
- Bumped base image to
registry.k8s.io/debian-base-amd64:0.4.0.
0.6.0 - 2018-11-27
Added
- Added ConfigMap for NPD config.
- Added readonly filesystem detection.
- Added frequent kubelet/docker restart detection.
- Added corrupt docker overlay2 issue detection.
Changed
- Bumped Kubernetes client version to 1.9.
- Updated OOMKilling pattern to support new kernel.
0.5.0 - 2018-06-22
Added
- Added custom problem detector plugin interface.
- Added custom network plugin monitor.
- Added a kernel log counter custom problem detector to detect problems which have the same pattern.
Changed
- Changed default port from 10256 to 20256 to avoid conflict with kube-proxy.
- Bumped golang version from 1.8 to 1.9.
- Bumped base image to
registry.k8s.io/debian-base-amd64:0.3.
Fixed
- Fixed an error in the labels applied to the daemonset label selector.
0.4.1 - 2017-06-21
Added
- Added docker image pull error detection.
0.4.0 - 2017-04-31
Added
- Added "kernel log generator" container for test purposes.
- Added ABRT adaptor config.
0.3.0 - 2017-03-15
Added
- Added look back support in kernel monitor. Kernel monitor will look back for specified amount of time to detect old problems during each start or restart.
- Added support for running node-problem-detector standalone.
- Added
-hostname-overrideoption to provide custom node name. - Added
-portoption to provide custom listening port for service. - Added
-addressoption to define binding address. - Added journald support.
- Added travis presubmit test.
- Added arbitrary system log support.
Changed
- Update kubernetes version to v1.4.0-beta.3
Fixed
- Only change transition timestamp when condition has changed.
- #47 Don't
report KernelDeadlock on
unregister_netdeviceevent. - #48 Use system boot time instead of "StartPattern".
0.2.0 - 2016-08-23
Added
- Add support for some kernel oops detection.
Changed
- Change NPD to get node name from
NODE_NAMEenv first beforeos.Hostname, and update the example to get node name from downward api and setNODE_NAME.
0.1.0 - 2016-06-09
Added
- Initial version of node problem detector.