Compare commits

506 Commits

Author SHA1 Message Date
Kubernetes Prow Robot dea6d70d46
Merge pull request #1058 from kubernetes/dependabot/go_modules/test/k8s-c425b6bb1e
chore(deps): bump the k8s group across 1 directory with 2 updates
2025-06-18 11:34:51 -07:00
Kubernetes Prow Robot 9d69c8e71a
Merge pull request #1069 from kubernetes/dependabot/github_actions/actions-all-615e0d520e
chore(deps): bump the actions-all group across 1 directory with 4 updates
2025-06-18 10:40:51 -07:00
dependabot[bot] 6bbddb55de
chore(deps): bump the actions-all group across 1 directory with 4 updates
Bumps the actions-all group with 4 updates in the / directory: [step-security/harden-runner](https://github.com/step-security/harden-runner), [github/codeql-action](https://github.com/github/codeql-action), [actions/dependency-review-action](https://github.com/actions/dependency-review-action) and [ossf/scorecard-action](https://github.com/ossf/scorecard-action).


Updates `step-security/harden-runner` from 2.12.0 to 2.12.1
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](0634a2670c...002fdce3c6)

Updates `github/codeql-action` from 3.28.17 to 3.29.0
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](60168efe1c...ce28f5bb42)

Updates `actions/dependency-review-action` from 4.7.0 to 4.7.1
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](38ecb5b593...da24556b54)

Updates `ossf/scorecard-action` from 2.4.1 to 2.4.2
- [Release notes](https://github.com/ossf/scorecard-action/releases)
- [Changelog](https://github.com/ossf/scorecard-action/blob/main/RELEASE.md)
- [Commits](f49aabe0b5...05b42c6244)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.12.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: github/codeql-action
  dependency-version: 3.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
- dependency-name: actions/dependency-review-action
  dependency-version: 4.7.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: ossf/scorecard-action
  dependency-version: 2.4.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-16 22:33:01 +00:00
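
The `update-type` values in the metadata above follow from which version component changed first. A minimal sketch of that rule, assuming plain `major.minor.patch` versions; `updateType` and `parse` are hypothetical helpers for illustration, not Dependabot's implementation:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "X.Y.Z" (optionally prefixed with "v") into three integers.
func parse(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected major.minor.patch, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// updateType reports which component differs first between two versions,
// using the same "semver-*" labels that appear in the commit metadata.
// Hypothetical helper: it does not handle pre-release or build suffixes.
func updateType(from, to string) (string, error) {
	a, err := parse(from)
	if err != nil {
		return "", err
	}
	b, err := parse(to)
	if err != nil {
		return "", err
	}
	switch {
	case a[0] != b[0]:
		return "semver-major", nil
	case a[1] != b[1]:
		return "semver-minor", nil
	case a[2] != b[2]:
		return "semver-patch", nil
	}
	return "none", nil
}

func main() {
	// Version pairs taken from the commit message above.
	for _, pair := range [][2]string{
		{"2.12.0", "2.12.1"},
		{"3.28.17", "3.29.0"},
	} {
		t, err := updateType(pair[0], pair[1])
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s: %s\n", pair[0], pair[1], t)
	}
}
```

With the pairs from this commit, the sketch labels the harden-runner bump as a patch update and the codeql-action bump as a minor update, matching the metadata.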
dependabot[bot] cd3b7503bb
chore(deps): bump the k8s group across 1 directory with 2 updates
Bumps the k8s group with 2 updates in the /test directory: [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) and [k8s.io/component-base](https://github.com/kubernetes/component-base).


Updates `k8s.io/apimachinery` from 0.33.0 to 0.33.1
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.33.0...v0.33.1)

Updates `k8s.io/component-base` from 0.32.3 to 0.32.5
- [Commits](https://github.com/kubernetes/component-base/compare/v0.32.3...v0.32.5)

---
updated-dependencies:
- dependency-name: k8s.io/apimachinery
  dependency-version: 0.33.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/component-base
  dependency-version: 0.32.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-16 22:14:26 +00:00
Kubernetes Prow Robot 9b473a0e56
Merge pull request #1061 from marqc/default_to_ipfamily_agnostic_localhost_host_address
feat!: Set default host address value to `localhost`.
2025-06-16 14:22:58 -07:00
Marek Chodor a765aaecf7 feat!: Set default host address value to `localhost`.
Using `localhost` is address-family agnostic and works regardless of
whether the cluster is IPv4 or IPv6. The current value of `127.0.0.1`
only works for IPv4 clusters.

BREAKING CHANGE: This may break in rare cases where `localhost` does
not resolve to `127.0.0.1` (if the OS configuration does not follow
RFC 5735 and RFC 6761).
2025-06-06 10:30:26 +00:00
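
One way to check whether a host is affected by the breaking change above is to resolve `localhost` and confirm that every result is a loopback address. This is a minimal sketch using Go's standard `net` package; `allLoopback` is a hypothetical helper and not part of the project:

```go
package main

import (
	"fmt"
	"net"
)

// allLoopback reports whether addrs is non-empty and every entry parses
// as an IP in a loopback range (127.0.0.0/8 for IPv4, ::1 for IPv6).
func allLoopback(addrs []string) bool {
	if len(addrs) == 0 {
		return false
	}
	for _, a := range addrs {
		ip := net.ParseIP(a)
		if ip == nil || !ip.IsLoopback() {
			return false
		}
	}
	return true
}

func main() {
	// The result depends on the local resolver configuration (for example
	// /etc/hosts), so no particular output is assumed here.
	addrs, err := net.LookupHost("localhost")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("localhost resolves to:", addrs)
	fmt.Println("all loopback:", allLoopback(addrs))
}
```

If the last line prints `false`, the host likely falls into the rare case described in the commit message, and setting the host address explicitly may be the safer choice.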
Kubernetes Prow Robot 9e366f58cd
Merge pull request #1066 from SergeyKanzhelev/updateApprovers
update sig node approvers to the current list
2025-06-04 14:48:37 -07:00
Sergey Kanzhelev fc10031a7e update sig node approvers to the current list 2025-06-04 20:57:43 +00:00
Kubernetes Prow Robot ca907dc101
Merge pull request #1064 from wangzhen127/release-0.8.21
Update version to v0.8.21
2025-06-02 22:02:37 -07:00
Zhen Wang 62223078ef Update version to v0.8.21 2025-06-02 14:48:31 -07:00
Kubernetes Prow Robot 9fe113c522
Merge pull request #1056 from kubernetes/dependabot/github_actions/actions-all-4016cb32aa
chore(deps): bump the actions-all group across 1 directory with 2 updates
2025-06-02 14:02:38 -07:00
dependabot[bot] dc065c42f0
chore(deps): bump the actions-all group across 1 directory with 2 updates
Bumps the actions-all group with 2 updates in the / directory: [github/codeql-action](https://github.com/github/codeql-action) and [actions/dependency-review-action](https://github.com/actions/dependency-review-action).


Updates `github/codeql-action` from 3.28.15 to 3.28.17
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](45775bd823...60168efe1c)

Updates `actions/dependency-review-action` from 4.6.0 to 4.7.0
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](ce3cf9537a...38ecb5b593)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: 3.28.17
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: actions/dependency-review-action
  dependency-version: 4.7.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-05-12 21:38:09 +00:00
Kubernetes Prow Robot 8d4eb38a42
Merge pull request #1053 from hakman/depup
Update dependencies
2025-04-28 08:47:55 -07:00
Ciprian Hacman 0147098968 Update dependencies
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
Signed-off-by: Ciprian Hacman <chacman@microsoft.com>
2025-04-28 17:41:41 +03:00
Kubernetes Prow Robot 308b7cfa4a
Merge pull request #1052 from kubernetes/dependabot/docker/golang-sha256d9db32125db0c3a680cfb7a1afcaefb89c898a075ec148fdc2f0f646cc2ed509
chore(deps): bump golang from 1.24-bookworm@sha256:fa1a01d362a7b9df68b021d59a124d28cae6d99ebd1a876e3557c4dd092f1b1d to sha256:d9db32125db0c3a680cfb7a1afcaefb89c898a075ec148fdc2f0f646cc2ed509
2025-04-21 22:41:42 -07:00
Kubernetes Prow Robot 1721f9dbf7
Merge pull request #1047 from kubernetes/dependabot/github_actions/actions-all-ffffe315b5
chore(deps): bump the actions-all group across 1 directory with 4 updates
2025-04-21 15:55:41 -07:00
dependabot[bot] 3f96666db7
chore(deps): bump golang
Bumps golang from 1.24-bookworm@sha256:fa1a01d362a7b9df68b021d59a124d28cae6d99ebd1a876e3557c4dd092f1b1d to sha256:d9db32125db0c3a680cfb7a1afcaefb89c898a075ec148fdc2f0f646cc2ed509.

---
updated-dependencies:
- dependency-name: golang
  dependency-version: sha256:d9db32125db0c3a680cfb7a1afcaefb89c898a075ec148fdc2f0f646cc2ed509
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-21 21:55:30 +00:00
dependabot[bot] 387571b357
chore(deps): bump the actions-all group across 1 directory with 4 updates
Bumps the actions-all group with 4 updates in the / directory: [step-security/harden-runner](https://github.com/step-security/harden-runner), [github/codeql-action](https://github.com/github/codeql-action), [actions/dependency-review-action](https://github.com/actions/dependency-review-action) and [actions/upload-artifact](https://github.com/actions/upload-artifact).


Updates `step-security/harden-runner` from 2.11.0 to 2.11.1
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](4d991eb9b9...c6295a65d1)

Updates `github/codeql-action` from 3.28.11 to 3.28.15
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](6bb031afdd...45775bd823)

Updates `actions/dependency-review-action` from 4.5.0 to 4.6.0
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](3b139cfc5f...ce3cf9537a)

Updates `actions/upload-artifact` from 4.6.1 to 4.6.2
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](4cec3d8aa0...ea165f8d65)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-version: 2.11.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: github/codeql-action
  dependency-version: 3.28.15
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: actions/dependency-review-action
  dependency-version: 4.6.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
- dependency-name: actions/upload-artifact
  dependency-version: 4.6.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-21 21:17:21 +00:00
Kubernetes Prow Robot aede9d7e7f
Merge pull request #1048 from hakman/depup
Update dependencies
2025-04-15 20:27:08 -07:00
Ciprian Hacman 0f1ee66855 Update dependencies
Signed-off-by: Ciprian Hacman <chacman@microsoft.com>
2025-04-14 16:52:20 +03:00
Kubernetes Prow Robot be0d387ec1
Merge pull request #1043 from kubernetes/dependabot/docker/golang-fa1a01d
chore(deps): bump golang from `d7d795d` to `fa1a01d`
2025-04-04 11:30:37 -07:00
Kubernetes Prow Robot 78f51bf173
Merge pull request #1039 from kubernetes/dependabot/go_modules/k8s-de55f699dd
chore(deps): bump the k8s group across 2 directories with 4 updates
2025-04-04 10:36:37 -07:00
dependabot[bot] 87129900cf
chore(deps): bump golang from `d7d795d` to `fa1a01d`
Bumps golang from `d7d795d` to `fa1a01d`.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-25 00:15:39 +00:00
dependabot[bot] 59c46ad62c
chore(deps): bump the k8s group across 2 directories with 4 updates
Bumps the k8s group with 2 updates in the / directory: [k8s.io/api](https://github.com/kubernetes/api) and [k8s.io/client-go](https://github.com/kubernetes/client-go).
Bumps the k8s group with 2 updates in the /test directory: [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) and [k8s.io/component-base](https://github.com/kubernetes/component-base).


Updates `k8s.io/api` from 0.31.6 to 0.31.7
- [Commits](https://github.com/kubernetes/api/compare/v0.31.6...v0.31.7)

Updates `k8s.io/apimachinery` from 0.31.6 to 0.31.7
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.6...v0.31.7)

Updates `k8s.io/client-go` from 0.31.6 to 0.31.7
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kubernetes/client-go/compare/v0.31.6...v0.31.7)

Updates `k8s.io/apimachinery` from 0.31.6 to 0.31.7
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.6...v0.31.7)

Updates `k8s.io/component-base` from 0.29.14 to 0.29.15
- [Commits](https://github.com/kubernetes/component-base/compare/v0.29.14...v0.29.15)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/component-base
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-24 22:00:26 +00:00
Kubernetes Prow Robot 4022575bf9
Merge pull request #1037 from guettli/patch-2
Removed draino from README
2025-03-24 11:56:33 -07:00
Kubernetes Prow Robot f6bb4f7b55
Merge pull request #846 from wenjianhn/uefi-cper
Add UEFI Common Platform Error Record (CPER) support
2025-03-11 22:19:46 -07:00
Jian Wen 5562632053 Add UEFI Common Platform Error Record (CPER) support
CPER is the format used to describe platform hardware errors in various
ACPI tables, such as ERST, BERT, and HEST.

The event severity message is printed here:
https://github.com/torvalds/linux/blob/v6.7/drivers/firmware/efi/cper.c#L639

Examples of each severity are shown below.

Corrected error:
kernel: {37}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 162
kernel: {37}[Hardware Error]: It has been corrected by h/w and requires no further action
kernel: {37}[Hardware Error]: event severity: corrected
kernel: {37}[Hardware Error]:  Error 0, type: corrected
kernel: {37}[Hardware Error]:   section_type: memory error
kernel: {37}[Hardware Error]:   error_status: 0x0000000000000400
kernel: {37}[Hardware Error]:   physical_address: 0x000000b50c68ce80
kernel: {37}[Hardware Error]:   node: 1 card: 4 module: 0 rank: 0 bank: 1 device: 14 row: 58165 column: 816
kernel: {37}[Hardware Error]:   error_type: 2, single-bit ECC
kernel: {37}[Hardware Error]:   DIMM location: CPU 2 DIMM 30

Recoverable error:
kernel: {3}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 4
kernel: {3}[Hardware Error]: event severity: recoverable
kernel: {3}[Hardware Error]:  Error 0, type: recoverable
kernel: {3}[Hardware Error]:  fru_text: B1
kernel: {3}[Hardware Error]:   section_type: memory error
kernel: {3}[Hardware Error]:   error_status: 0x0000000000000400
kernel: {3}[Hardware Error]:   physical_address: 0x000000393cfe5040
kernel: {3}[Hardware Error]:   node: 2 card: 0 module: 0 rank: 0 bank: 3 device: 0 row: 34719 column: 320
kernel: {3}[Hardware Error]:   DIMM location: not present. DMI handle: 0x0000

Fatal error:
kernel: BERT: Error records from previous boot:
kernel: [Hardware Error]: event severity: fatal
kernel: [Hardware Error]:  Error 0, type: fatal
kernel: [Hardware Error]:  fru_text: DIMM B5
kernel: [Hardware Error]:   section_type: memory error
kernel: [Hardware Error]:   error_status: 0x0000000000000400
kernel: [Hardware Error]:   physical_address: 0x000000393d7e4040
kernel: [Hardware Error]:   node: 2 card: 4 module: 0 rank: 0 bank: 3 device: 0 row: 34743 column: 256

Steps to test the new metrics:

$ echo "kernel: {37}[Hardware Error]: event severity: corrected" | sudo tee /dev/kmsg
$ echo "kernel: {3}[Hardware Error]: event severity: recoverable" | sudo tee /dev/kmsg
$ echo "kernel: [Hardware Error]: event severity: fatal" | sudo tee /dev/kmsg

Expected metrics:
$ curl localhost:20257/metrics
problem_counter{reason="CperHardwareErrorCorrected"} 1
problem_counter{reason="CperHardwareErrorFatal"} 1
problem_counter{reason="CperHardwareErrorRecoverable"} 1
...
problem_gauge{reason="CperHardwareErrorFatal",type="CperHardwareErrorFatal"} 1

Signed-off-by: Jian Wen <wenjianhn@gmail.com>
2025-03-12 11:00:50 +08:00
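
The mapping from the `event severity` kernel lines to the three reasons in the expected metrics can be sketched as below. The regular expression and the `classify` helper are assumptions made for illustration; the rule actually shipped with the project may use a different pattern:

```go
package main

import (
	"fmt"
	"regexp"
)

// severityRe matches the "event severity" line shown in the examples above.
// Assumed pattern for illustration, not the project's configured rule.
var severityRe = regexp.MustCompile(`\[Hardware Error\]: event severity: (corrected|recoverable|fatal)\b`)

// reasons maps a CPER severity to the reason label from the commit message.
var reasons = map[string]string{
	"corrected":   "CperHardwareErrorCorrected",
	"recoverable": "CperHardwareErrorRecoverable",
	"fatal":       "CperHardwareErrorFatal",
}

// classify returns the reason for a kernel log line, or false when the line
// is not an "event severity" line.
func classify(line string) (string, bool) {
	m := severityRe.FindStringSubmatch(line)
	if m == nil {
		return "", false
	}
	return reasons[m[1]], true
}

func main() {
	lines := []string{
		"kernel: {37}[Hardware Error]: event severity: corrected",
		"kernel: {37}[Hardware Error]:  Error 0, type: corrected", // detail line, ignored
		"kernel: {3}[Hardware Error]: event severity: recoverable",
		"kernel: [Hardware Error]: event severity: fatal",
	}
	counts := map[string]int{}
	for _, l := range lines {
		if reason, ok := classify(l); ok {
			counts[reason]++
		}
	}
	for _, r := range []string{
		"CperHardwareErrorCorrected",
		"CperHardwareErrorFatal",
		"CperHardwareErrorRecoverable",
	} {
		fmt.Printf("problem_counter{reason=%q} %d\n", r, counts[r])
	}
}
```

Matching only the `event severity` line keeps each error record from being counted more than once, since the per-section `type:` lines repeat the same severity.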
Thomas Güttler d6cfed982a
Removed draino from README
The draino project has not been updated for 5 years.
2025-03-11 15:36:28 +01:00
Kubernetes Prow Robot 01e1cf033e
Merge pull request #1021 from nickbp/master
feat(k8sExporter): Options to allow disabling Events or Node Conditions
2025-03-10 23:57:48 -07:00
Kubernetes Prow Robot a099a5ed5c
Merge pull request #1024 from chrishenzie/metric-group
Move disk and memory metrics in custom group to compute
2025-03-10 22:43:48 -07:00
Kubernetes Prow Robot 5520e3df51
Merge pull request #1036 from kubernetes/dependabot/github_actions/actions-all-d1a25f988a
chore(deps): bump github/codeql-action from 3.28.10 to 3.28.11 in the actions-all group
2025-03-10 21:33:52 -07:00
Kubernetes Prow Robot c53e4f4308
Merge pull request #845 from wenjianhn/monitor-xfs
Monitor XFS shutdown
2025-03-10 21:33:46 -07:00
Kubernetes Prow Robot 2a651d1f98
Merge pull request #1034 from kubernetes/dependabot/docker/golang-d7d795d
chore(deps): bump golang from `6260304` to `d7d795d`
2025-03-10 16:39:55 -07:00
Kubernetes Prow Robot e8b584ab52
Merge pull request #1019 from kubernetes/dependabot/go_modules/github.com/spf13/pflag-1.0.6
chore(deps): bump github.com/spf13/pflag from 1.0.5 to 1.0.6
2025-03-10 16:39:48 -07:00
Kubernetes Prow Robot e858c3d1df
Merge pull request #1035 from daveoy/dy/994-2
chore(go.mod): update deps to fix high cve #994
2025-03-10 15:35:47 -07:00
dependabot[bot] 3bb752c25a
chore(deps): bump github/codeql-action in the actions-all group
Bumps the actions-all group with 1 update: [github/codeql-action](https://github.com/github/codeql-action).


Updates `github/codeql-action` from 3.28.10 to 3.28.11
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](b56ba49b26...6bb031afdd)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-10 22:03:05 +00:00
Dave Young 1ff64afbc9
Merge branch 'master' into dy/994-2 2025-03-10 16:54:48 -05:00
Dave Young f69e7033e9 chore(test): update go.mod deps for test 2025-03-10 16:49:08 -05:00
Dave Young f24ca57199 chore(go.mod): update deps for high cves #994 2025-03-10 16:48:56 -05:00
dependabot[bot] 32d7c72755
chore(deps): bump golang from `6260304` to `d7d795d`
Bumps golang from `6260304` to `d7d795d`.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-10 21:46:16 +00:00
dependabot[bot] 2707945338
chore(deps): bump github.com/spf13/pflag from 1.0.5 to 1.0.6
Bumps [github.com/spf13/pflag](https://github.com/spf13/pflag) from 1.0.5 to 1.0.6.
- [Release notes](https://github.com/spf13/pflag/releases)
- [Commits](https://github.com/spf13/pflag/compare/v1.0.5...v1.0.6)

---
updated-dependencies:
- dependency-name: github.com/spf13/pflag
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-10 21:41:12 +00:00
Kubernetes Prow Robot c846b0ebaa
Merge pull request #1027 from kubernetes/dependabot/docker/golang-1.24-bookworm
chore(deps): bump golang from 1.23-bookworm to 1.24-bookworm
2025-03-10 14:11:53 -07:00
Kubernetes Prow Robot 7039f066c7
Merge pull request #1020 from kubernetes/dependabot/go_modules/test/github.com/spf13/pflag-1.0.6
chore(deps): bump github.com/spf13/pflag from 1.0.5 to 1.0.6 in /test
2025-03-10 14:11:47 -07:00
Kubernetes Prow Robot 7ea55106c2
Merge pull request #1030 from kubernetes/dependabot/go_modules/github.com/avast/retry-go/v4-4.6.1
chore(deps): bump github.com/avast/retry-go/v4 from 4.6.0 to 4.6.1
2025-03-07 14:53:52 -08:00
Kubernetes Prow Robot 3b92e70bc1
Merge pull request #1026 from kubernetes/dependabot/go_modules/k8s-f2d2ccb244
chore(deps): bump the k8s group across 2 directories with 4 updates
2025-03-07 14:53:46 -08:00
Kubernetes Prow Robot 4b9d196acd
Merge pull request #1029 from kubernetes/dependabot/github_actions/actions-all-a9188bd723
chore(deps): bump the actions-all group across 1 directory with 4 updates
2025-03-07 14:17:45 -08:00
dependabot[bot] cf267168c2
chore(deps): bump the actions-all group across 1 directory with 4 updates
Bumps the actions-all group with 4 updates in the / directory: [step-security/harden-runner](https://github.com/step-security/harden-runner), [github/codeql-action](https://github.com/github/codeql-action), [ossf/scorecard-action](https://github.com/ossf/scorecard-action) and [actions/upload-artifact](https://github.com/actions/upload-artifact).


Updates `step-security/harden-runner` from 2.10.3 to 2.11.0
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](c95a14d0e5...4d991eb9b9)

Updates `github/codeql-action` from 3.28.1 to 3.28.10
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](b6a472f63d...b56ba49b26)

Updates `ossf/scorecard-action` from 2.4.0 to 2.4.1
- [Release notes](https://github.com/ossf/scorecard-action/releases)
- [Changelog](https://github.com/ossf/scorecard-action/blob/main/RELEASE.md)
- [Commits](62b2cac7ed...f49aabe0b5)

Updates `actions/upload-artifact` from 4.6.0 to 4.6.1
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](65c4c4a1dd...4cec3d8aa0)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: ossf/scorecard-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-03 22:01:21 +00:00
dependabot[bot] 416ec8b3c2
chore(deps): bump the k8s group across 2 directories with 4 updates
Bumps the k8s group with 2 updates in the / directory: [k8s.io/api](https://github.com/kubernetes/api) and [k8s.io/client-go](https://github.com/kubernetes/client-go).
Bumps the k8s group with 2 updates in the /test directory: [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) and [k8s.io/component-base](https://github.com/kubernetes/component-base).


Updates `k8s.io/api` from 0.31.4 to 0.31.6
- [Commits](https://github.com/kubernetes/api/compare/v0.31.4...v0.31.6)

Updates `k8s.io/apimachinery` from 0.31.4 to 0.31.6
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.4...v0.31.6)

Updates `k8s.io/client-go` from 0.31.4 to 0.31.6
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kubernetes/client-go/compare/v0.31.4...v0.31.6)

Updates `k8s.io/apimachinery` from 0.31.4 to 0.31.6
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.4...v0.31.6)

Updates `k8s.io/component-base` from 0.29.12 to 0.29.14
- [Commits](https://github.com/kubernetes/component-base/compare/v0.29.12...v0.29.14)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/component-base
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-03 21:57:13 +00:00
dependabot[bot] 7cb27449aa
chore(deps): bump github.com/avast/retry-go/v4 from 4.6.0 to 4.6.1
Bumps [github.com/avast/retry-go/v4](https://github.com/avast/retry-go) from 4.6.0 to 4.6.1.
- [Release notes](https://github.com/avast/retry-go/releases)
- [Commits](https://github.com/avast/retry-go/compare/4.6.0...4.6.1)

---
updated-dependencies:
- dependency-name: github.com/avast/retry-go/v4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-27 05:08:00 +00:00
Kubernetes Prow Robot 186f0182b5
Merge pull request #1031 from kubernetes/dependabot/go_modules/test/github.com/avast/retry-go/v4-4.6.1
chore(deps): bump github.com/avast/retry-go/v4 from 4.6.0 to 4.6.1 in /test
2025-02-26 21:06:30 -08:00
dependabot[bot] 628f021ffb
chore(deps): bump github.com/avast/retry-go/v4 in /test
Bumps [github.com/avast/retry-go/v4](https://github.com/avast/retry-go) from 4.6.0 to 4.6.1.
- [Release notes](https://github.com/avast/retry-go/releases)
- [Commits](https://github.com/avast/retry-go/compare/4.6.0...4.6.1)

---
updated-dependencies:
- dependency-name: github.com/avast/retry-go/v4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-24 22:26:54 +00:00
dependabot[bot] 92597e574d
chore(deps): bump golang from 1.23-bookworm to 1.24-bookworm
Bumps golang from 1.23-bookworm to 1.24-bookworm.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-17 21:59:19 +00:00
Chris Henzie 72f3041d2b Move disk and memory metrics in custom group to compute 2025-02-13 15:29:09 -08:00
Nick Parker 8d237a6c7c feat(k8sExporter): Options to allow disabling Events or Node Conditions
Both outputs are currently hardcoded as enabled; this change allows disabling one or the other. Both default to enabled to retain current behavior.

Larger clusters can save some etcd I/O by skipping one of these outputs if it isn't being consumed. In our case, we aren't consuming the Events, so writing them just creates more churn.
2025-02-04 14:53:19 +13:00
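
The behavior described above, two independently switchable outputs that both default to enabled, can be sketched as follows. `ExporterOptions`, its field names, and `enabledOutputs` are hypothetical and chosen for illustration; they are not the names used in the pull request:

```go
package main

import "fmt"

// ExporterOptions is a hypothetical options struct with one switch per
// output. The real option and flag names in the project may differ.
type ExporterOptions struct {
	EnableEvents         bool
	EnableNodeConditions bool
}

// DefaultExporterOptions enables both outputs, which retains the behavior
// from before the options existed.
func DefaultExporterOptions() ExporterOptions {
	return ExporterOptions{EnableEvents: true, EnableNodeConditions: true}
}

// enabledOutputs lists the outputs an exporter would write under o.
func enabledOutputs(o ExporterOptions) []string {
	var out []string
	if o.EnableEvents {
		out = append(out, "events")
	}
	if o.EnableNodeConditions {
		out = append(out, "node-conditions")
	}
	return out
}

func main() {
	opts := DefaultExporterOptions()
	fmt.Println("default:", enabledOutputs(opts))

	// A cluster that does not consume Events can skip writing them.
	opts.EnableEvents = false
	fmt.Println("events disabled:", enabledOutputs(opts))
}
```

Defaulting both fields to `true` in a constructor, rather than relying on Go's zero values, is what keeps existing deployments unchanged unless they opt out.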
dependabot[bot] c5f6fbc3d1
chore(deps): bump github.com/spf13/pflag from 1.0.5 to 1.0.6 in /test
Bumps [github.com/spf13/pflag](https://github.com/spf13/pflag) from 1.0.5 to 1.0.6.
- [Release notes](https://github.com/spf13/pflag/releases)
- [Commits](https://github.com/spf13/pflag/compare/v1.0.5...v1.0.6)

---
updated-dependencies:
- dependency-name: github.com/spf13/pflag
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-03 21:58:42 +00:00
Kubernetes Prow Robot 12a8f5578c
Merge pull request #1010 from kubernetes/dependabot/github_actions/actions-all-09526a9899
chore(deps): bump the actions-all group with 3 updates
2025-01-18 22:54:34 -08:00
dependabot[bot] d2cbde95e5
chore(deps): bump the actions-all group with 3 updates
Bumps the actions-all group with 3 updates: [step-security/harden-runner](https://github.com/step-security/harden-runner), [github/codeql-action](https://github.com/github/codeql-action) and [actions/upload-artifact](https://github.com/actions/upload-artifact).


Updates `step-security/harden-runner` from 2.10.2 to 2.10.3
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](0080882f6c...c95a14d0e5)

Updates `github/codeql-action` from 3.28.0 to 3.28.1
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](48ab28a6f5...b6a472f63d)

Updates `actions/upload-artifact` from 4.5.0 to 4.6.0
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](6f51ac03b9...65c4c4a1dd)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-13 21:17:44 +00:00
Kubernetes Prow Robot 66336e630a
Merge pull request #1000 from kubernetes/dependabot/docker/golang-2e83858
chore(deps): bump golang from `ef30001` to `2e83858`
2025-01-07 22:50:29 -08:00
Kubernetes Prow Robot 93bc55b659
Merge pull request #998 from kubernetes/dependabot/github_actions/actions-all-1509149478
chore(deps): bump the actions-all group with 2 updates
2025-01-07 22:16:29 -08:00
dependabot[bot] 72f1e1de7b
chore(deps): bump golang from `ef30001` to `2e83858`
Bumps golang from `ef30001` to `2e83858`.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-30 21:33:27 +00:00
dependabot[bot] 0a997f8116
chore(deps): bump the actions-all group with 2 updates
Bumps the actions-all group with 2 updates: [github/codeql-action](https://github.com/github/codeql-action) and [actions/upload-artifact](https://github.com/actions/upload-artifact).


Updates `github/codeql-action` from 3.27.9 to 3.28.0
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](df409f7d92...48ab28a6f5)

Updates `actions/upload-artifact` from 4.4.3 to 4.5.0
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](b4b15b8c7c...6f51ac03b9)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-30 21:24:25 +00:00
Kubernetes Prow Robot 053539efd8
Merge pull request #999 from googs1025/refactor/custom_plugin
chore: refactor custom plugin monitor method
2024-12-28 14:24:13 +01:00
googs1025 cf0870fa12 chore: refactor custom plugin monitor method 2024-12-27 09:24:47 +08:00
Kubernetes Prow Robot 334a857fbe
Merge pull request #997 from googs1025/monitor_log
feature: add custom message for systemlogmonitor rule
2024-12-23 16:20:12 +01:00
googs1025 f5433f460d feature: add custom message for systemlogmonitor rule 2024-12-23 19:45:29 +08:00
Kubernetes Prow Robot 93e64ac709
Merge pull request #996 from kubernetes/dependabot/go_modules/k8s-87f99dd5df
Bump the k8s group across 2 directories with 4 updates
2024-12-18 18:48:09 +01:00
Kubernetes Prow Robot 146ce4aa86
Merge pull request #995 from kubernetes/dependabot/github_actions/actions-all-d4e08d60db
Bump github/codeql-action from 3.27.6 to 3.27.9 in the actions-all group
2024-12-18 18:14:09 +01:00
dependabot[bot] d99fca5f0a
Bump the k8s group across 2 directories with 4 updates
Bumps the k8s group with 2 updates in the / directory: [k8s.io/api](https://github.com/kubernetes/api) and [k8s.io/client-go](https://github.com/kubernetes/client-go).
Bumps the k8s group with 2 updates in the /test directory: [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) and [k8s.io/component-base](https://github.com/kubernetes/component-base).


Updates `k8s.io/api` from 0.31.3 to 0.31.4
- [Commits](https://github.com/kubernetes/api/compare/v0.31.3...v0.31.4)

Updates `k8s.io/apimachinery` from 0.31.3 to 0.31.4
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.3...v0.31.4)

Updates `k8s.io/client-go` from 0.31.3 to 0.31.4
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kubernetes/client-go/compare/v0.31.3...v0.31.4)

Updates `k8s.io/apimachinery` from 0.31.3 to 0.31.4
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.3...v0.31.4)

Updates `k8s.io/component-base` from 0.29.11 to 0.29.12
- [Commits](https://github.com/kubernetes/component-base/compare/v0.29.11...v0.29.12)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/component-base
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-16 21:34:00 +00:00
dependabot[bot] 17d7588bff
Bump github/codeql-action from 3.27.6 to 3.27.9 in the actions-all group
Bumps the actions-all group with 1 update: [github/codeql-action](https://github.com/github/codeql-action).


Updates `github/codeql-action` from 3.27.6 to 3.27.9
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](aa57810251...df409f7d92)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-16 21:22:01 +00:00
Kubernetes Prow Robot 26c77134bf
Merge pull request #993 from kubernetes/dependabot/github_actions/actions-all-bfc1b4bcc0
Bump the actions-all group across 1 directory with 2 updates
2024-12-10 20:16:09 +00:00
Kubernetes Prow Robot 7d29a1c293
Merge pull request #992 from kubernetes/dependabot/docker/golang-ef30001
Bump golang from `3f3b9da` to `ef30001`
2024-12-10 20:16:02 +00:00
Kubernetes Prow Robot 3a8a07ad81
Merge pull request #989 from kubernetes/dependabot/go_modules/k8s-a3c835c0f0
Bump the k8s group across 2 directories with 4 updates
2024-12-10 19:40:02 +00:00
dependabot[bot] cab30567cb
Bump the actions-all group across 1 directory with 2 updates
Bumps the actions-all group with 2 updates in the / directory: [github/codeql-action](https://github.com/github/codeql-action) and [actions/dependency-review-action](https://github.com/actions/dependency-review-action).


Updates `github/codeql-action` from 3.27.4 to 3.27.6
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](ea9e4e3799...aa57810251)

Updates `actions/dependency-review-action` from 4.4.0 to 4.5.0
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](4081bf99e2...3b139cfc5f)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: actions/dependency-review-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-09 22:08:43 +00:00
dependabot[bot] 53f404dfed
Bump golang from `3f3b9da` to `ef30001`
Bumps golang from `3f3b9da` to `ef30001`.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-09 21:41:05 +00:00
dependabot[bot] b92aae803d
Bump the k8s group across 2 directories with 4 updates
Bumps the k8s group with 2 updates in the / directory: [k8s.io/api](https://github.com/kubernetes/api) and [k8s.io/client-go](https://github.com/kubernetes/client-go).
Bumps the k8s group with 2 updates in the /test directory: [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) and [k8s.io/component-base](https://github.com/kubernetes/component-base).


Updates `k8s.io/api` from 0.31.2 to 0.31.3
- [Commits](https://github.com/kubernetes/api/compare/v0.31.2...v0.31.3)

Updates `k8s.io/apimachinery` from 0.31.2 to 0.31.3
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.2...v0.31.3)

Updates `k8s.io/client-go` from 0.31.2 to 0.31.3
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kubernetes/client-go/compare/v0.31.2...v0.31.3)

Updates `k8s.io/apimachinery` from 0.31.2 to 0.31.3
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.2...v0.31.3)

Updates `k8s.io/component-base` from 0.29.10 to 0.29.11
- [Commits](https://github.com/kubernetes/component-base/compare/v0.29.10...v0.29.11)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/component-base
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-25 21:51:35 +00:00
Kubernetes Prow Robot e8840b1a7d
Merge pull request #988 from kubernetes/dependabot/docker/golang-3f3b9da
Bump golang from `0e3377d` to `3f3b9da`
2024-11-20 02:57:00 +00:00
Kubernetes Prow Robot 29a98372ff
Merge pull request #987 from kubernetes/dependabot/github_actions/actions-all-bdb9acd0fa
Bump the actions-all group with 2 updates
2024-11-20 02:56:54 +00:00
Kubernetes Prow Robot daaa07d690
Merge pull request #986 from daveoy/master
feat(makefile): add CC switch on GOARCH
2024-11-20 01:18:53 +00:00
dependabot[bot] 411cd7bd82
Bump golang from `0e3377d` to `3f3b9da`
Bumps golang from `0e3377d` to `3f3b9da`.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-18 23:09:22 +00:00
dependabot[bot] d79c681e63
Bump the actions-all group with 2 updates
Bumps the actions-all group with 2 updates: [step-security/harden-runner](https://github.com/step-security/harden-runner) and [github/codeql-action](https://github.com/github/codeql-action).


Updates `step-security/harden-runner` from 2.10.1 to 2.10.2
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](91182cccc0...0080882f6c)

Updates `github/codeql-action` from 3.27.1 to 3.27.4
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](4f3212b617...ea9e4e3799)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-18 22:05:11 +00:00
Dave Young 45dde88c98 chore: update cc def for amd64 to match 2024-11-14 17:54:50 -06:00
Dave Young b5ce184179 feat(makefile): add CC switch on GOARCH 2024-11-14 17:43:19 -06:00
Jian Wen 2e15606dda Monitor XFS shutdown
Related kernel error messages are as below.

kernel: XFS (dm-4): Internal error xfs_iunlink_remove at line 2038 of file fs/xfs/xfs_inode.c.  Caller xfs_ifree+0x33/0x130 [xfs]
kernel: XFS (dm-4): Corruption detected. Unmount and run xfs_repair
kernel: XFS (dm-4): xfs_inactive_ifree: xfs_ifree returned error -117
kernel: XFS (dm-4): xfs_do_force_shutdown(0x1) called from line 1788 of file fs/xfs/xfs_inode.c.  Return address = 000000009d022bf1
kernel: XFS (dm-4): I/O Error Detected. Shutting down filesystem
kernel: XFS (dm-4): Please umount the filesystem and rectify the problem(s)

Signed-off-by: Jian Wen <wenjianhn@gmail.com>
2024-11-14 15:46:00 +08:00
Kubernetes Prow Robot 711760063a
Merge pull request #985 from kubernetes/dependabot/github_actions/actions-all-60785c3230
Bump github/codeql-action from 3.27.0 to 3.27.1 in the actions-all group
2024-11-14 02:14:47 +00:00
Kubernetes Prow Robot 2a3acd2669
Merge pull request #984 from kubernetes/dependabot/docker/golang-0e3377d
Bump golang from `2341ddf` to `0e3377d`
2024-11-14 01:40:46 +00:00
dependabot[bot] 04173ee934
Bump github/codeql-action from 3.27.0 to 3.27.1 in the actions-all group
Bumps the actions-all group with 1 update: [github/codeql-action](https://github.com/github/codeql-action).


Updates `github/codeql-action` from 3.27.0 to 3.27.1
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](662472033e...4f3212b617)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-11 22:11:22 +00:00
dependabot[bot] f675d34e49
Bump golang from `2341ddf` to `0e3377d`
Bumps golang from `2341ddf` to `0e3377d`.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-11-11 22:09:45 +00:00
Kubernetes Prow Robot de55c54059
Merge pull request #983 from googs1025/feature/add_qps_burst
chore: qps flag: use float32 instead of float64
2024-11-06 05:47:29 +00:00
googs1025 0d756b78fc chore: qps flag: use float32 instead of float64 2024-11-06 13:09:20 +08:00
Kubernetes Prow Robot 8b2ff03f5e
Merge pull request #982 from kubernetes/dependabot/go_modules/k8s-731bdb9787
Bump the k8s group across 2 directories with 4 updates
2024-10-31 17:17:27 +00:00
Kubernetes Prow Robot f4f5c479d9
Merge pull request #980 from googs1025/feature/add_qps_burst
feature: add QPS Burst flags for client cfg
2024-10-29 06:18:54 +00:00
Kubernetes Prow Robot c1dd00d65c
Merge pull request #981 from kubernetes/dependabot/github_actions/actions-all-351e7943d4
Bump the actions-all group with 3 updates
2024-10-29 00:12:54 +00:00
dependabot[bot] 68a97cf4cb
Bump the k8s group across 2 directories with 4 updates
Bumps the k8s group with 2 updates in the / directory: [k8s.io/api](https://github.com/kubernetes/api) and [k8s.io/client-go](https://github.com/kubernetes/client-go).
Bumps the k8s group with 2 updates in the /test directory: [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) and [k8s.io/component-base](https://github.com/kubernetes/component-base).


Updates `k8s.io/api` from 0.31.1 to 0.31.2
- [Commits](https://github.com/kubernetes/api/compare/v0.31.1...v0.31.2)

Updates `k8s.io/apimachinery` from 0.31.1 to 0.31.2
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.1...v0.31.2)

Updates `k8s.io/client-go` from 0.31.1 to 0.31.2
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kubernetes/client-go/compare/v0.31.1...v0.31.2)

Updates `k8s.io/apimachinery` from 0.31.1 to 0.31.2
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.1...v0.31.2)

Updates `k8s.io/component-base` from 0.29.9 to 0.29.10
- [Commits](https://github.com/kubernetes/component-base/compare/v0.29.9...v0.29.10)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/component-base
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-28 21:49:45 +00:00
dependabot[bot] 6c32180ce6
Bump the actions-all group with 3 updates
Bumps the actions-all group with 3 updates: [actions/checkout](https://github.com/actions/checkout), [github/codeql-action](https://github.com/github/codeql-action) and [actions/dependency-review-action](https://github.com/actions/dependency-review-action).


Updates `actions/checkout` from 4.2.1 to 4.2.2
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](eef61447b9...11bd71901b)

Updates `github/codeql-action` from 3.26.13 to 3.27.0
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](f779452ac5...662472033e)

Updates `actions/dependency-review-action` from 4.3.4 to 4.4.0
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](5a2ce3f5b9...4081bf99e2)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
- dependency-name: actions/dependency-review-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-28 21:31:44 +00:00
Kubernetes Prow Robot 7416db2236
Merge pull request #978 from kubernetes/dependabot/docker/golang-2341ddf
Bump golang from `18d2f94` to `2341ddf`
2024-10-28 18:10:54 +00:00
googs1025 17dcc94418 feature: add QPS Burst flags 2024-10-27 21:58:45 +08:00
dependabot[bot] 70e99e1e1f
Bump golang from `18d2f94` to `2341ddf`
Bumps golang from `18d2f94` to `2341ddf`.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-21 22:09:03 +00:00
Kubernetes Prow Robot 53e0152f64
Merge pull request #976 from wangzhen127/release-new-version
Update version to v0.8.20
2024-10-16 17:33:03 +01:00
Zhen Wang 16656c89f6 Release v0.8.20 2024-10-16 08:10:44 -07:00
Kubernetes Prow Robot 0f4d8b96c5
Merge pull request #961 from smileusd/upstream_add_black_list_in_log_watcher
add black list to avoid taking too much effort to translate in file log watcher
2024-10-16 16:09:03 +01:00
tashen 3a386a659e add skip list to avoid taking too much effort to translate in file log watcher 2024-10-16 10:56:15 +08:00
Kubernetes Prow Robot b64f13f702
Merge pull request #975 from kubernetes/dependabot/github_actions/actions-all-52dfd9c053
Bump the actions-all group across 1 directory with 3 updates
2024-10-16 00:33:09 +01:00
Kubernetes Prow Robot 2182ad0ddb
Merge pull request #955 from DigitalVeer/master
Move ReadonlyFilesystem Node Condition to a new configuration file
2024-10-16 00:33:03 +01:00
Kubernetes Prow Robot 335e7e82ca
Merge pull request #972 from hakman/make-depup
Update cloud.google.com/go/compute/metadata to v0.5.2
2024-10-15 23:51:03 +01:00
Kubernetes Prow Robot f392516a37
Merge pull request #974 from hakman/skip-ext4-e2e
Skip ext4 e2e tests
2024-10-15 22:47:04 +01:00
dependabot[bot] c39f74def4
Bump the actions-all group across 1 directory with 3 updates
Bumps the actions-all group with 3 updates in the / directory: [actions/checkout](https://github.com/actions/checkout), [github/codeql-action](https://github.com/github/codeql-action) and [actions/upload-artifact](https://github.com/actions/upload-artifact).


Updates `actions/checkout` from 4.2.0 to 4.2.1
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](d632683dd7...eef61447b9)

Updates `github/codeql-action` from 3.26.10 to 3.26.13
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](e2b3eafc8d...f779452ac5)

Updates `actions/upload-artifact` from 4.4.0 to 4.4.3
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](50769540e7...b4b15b8c7c)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-14 21:12:03 +00:00
Ciprian Hacman be754653e6 Skip ext4 e2e tests 2024-10-14 09:29:24 +03:00
Ciprian Hacman f0c5cd5d20 Update cloud.google.com/go/compute/metadata to v0.5.2 2024-10-12 06:31:04 +03:00
Veer Singh ee955f9170 Move ReadonlyFilesystem to separate config file
Moved the ReadonlyFilesystem Node Condition to a separate plugin
configuration file and updated NPD to contain the appropriate new flags.
2024-10-09 00:20:49 -07:00
Kubernetes Prow Robot 3b91ca0c09
Merge pull request #967 from kubernetes/dependabot/docker/golang-18d2f94
Bump golang from `dba79eb` to `18d2f94`
2024-10-04 22:12:27 +01:00
dependabot[bot] 13c44d92fd
Bump golang from `dba79eb` to `18d2f94`
Bumps golang from `dba79eb` to `18d2f94`.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-10-02 16:10:33 +00:00
Kubernetes Prow Robot 56eb3dcb61
Merge pull request #966 from hakman/dependabot-cleanup
chore: Dependabot config cleanup
2024-10-02 17:09:49 +01:00
Ciprian Hacman e1d071ba63 chore: Add "ok-to-test" labels for dependabot PRs 2024-10-02 15:12:44 +03:00
Ciprian Hacman a22c0649f8 chore: Remove unused dependabot docker configs 2024-10-02 15:02:53 +03:00
Ciprian Hacman 57c97d2d47 chore: Merge dependabot gomod configs 2024-10-02 15:01:52 +03:00
Kubernetes Prow Robot d83e1bcb53
Merge pull request #965 from wangzhen127/golang
Update golang to 1.23.1 in go.mod
2024-10-02 11:25:49 +01:00
Zhen Wang 8f5c2e14fe Update golang to 1.23.1 in go.mod 2024-10-01 21:11:50 -07:00
Kubernetes Prow Robot f1c1759ca0
Merge pull request #963 from kubernetes/dependabot/github_actions/actions-all-b6c4674bda
Bump the actions-all group with 2 updates
2024-10-01 23:39:49 +01:00
Kubernetes Prow Robot 798610a11a
Merge pull request #962 from kubernetes/dependabot/docker/golang-dba79eb
Bump golang from `1a5326b` to `dba79eb`
2024-10-01 17:57:50 +01:00
dependabot[bot] 66fbb738fd
Bump the actions-all group with 2 updates
Bumps the actions-all group with 2 updates: [actions/checkout](https://github.com/actions/checkout) and [github/codeql-action](https://github.com/github/codeql-action).


Updates `actions/checkout` from 4.1.7 to 4.2.0
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](692973e3d9...d632683dd7)

Updates `github/codeql-action` from 3.26.8 to 3.26.10
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](294a9d9291...e2b3eafc8d)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-30 21:31:42 +00:00
dependabot[bot] ae6fa3560e
Bump golang from `1a5326b` to `dba79eb`
Bumps golang from `1a5326b` to `dba79eb`.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-30 21:30:55 +00:00
Kubernetes Prow Robot 35ffe05910
Merge pull request #958 from kubernetes/dependabot/github_actions/actions-all-366513d706
Bump github/codeql-action from 3.26.7 to 3.26.8 in the actions-all group
2024-09-25 22:19:29 +01:00
dependabot[bot] ceee726210
Bump github/codeql-action from 3.26.7 to 3.26.8 in the actions-all group
Bumps the actions-all group with 1 update: [github/codeql-action](https://github.com/github/codeql-action).


Updates `github/codeql-action` from 3.26.7 to 3.26.8
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](8214744c54...294a9d9291)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-23 21:48:27 +00:00
Kubernetes Prow Robot 13a06ccad9
Merge pull request #956 from hakman/remove-update-deps
chore: Remove broken workflow update-deps.yml
2024-09-23 16:45:59 +01:00
Ciprian Hacman 68d08ac953 chore: Remove broken workflow update-deps.yml 2024-09-21 07:28:03 +03:00
Kubernetes Prow Robot dc4200d805
Merge pull request #952 from kubernetes/dependabot/go_modules/k8s-2090623a6d
Bump the k8s group with 3 updates
2024-09-18 23:08:50 +01:00
Kubernetes Prow Robot a88792f4bd
Merge pull request #949 from kubernetes/dependabot/docker/build-image/debian-base-0a17678
Bump build-image/debian-base from `b30608f` to `0a17678`
2024-09-18 23:08:43 +01:00
Kubernetes Prow Robot e4fd02e9f1
Merge pull request #951 from kubernetes/dependabot/go_modules/test/github.com/onsi/gomega-1.31.1
Bump github.com/onsi/gomega from 1.31.0 to 1.31.1 in /test
2024-09-18 22:08:57 +01:00
Kubernetes Prow Robot 3e1bf74cda
Merge pull request #950 from kubernetes/dependabot/go_modules/test/k8s-5baa8bdda1
Bump the k8s group in /test with 2 updates
2024-09-18 22:08:50 +01:00
Kubernetes Prow Robot a8973a8664
Merge pull request #948 from kubernetes/dependabot/docker/golang-1.23-bookworm
Bump golang from 1.22-bookworm to 1.23-bookworm
2024-09-18 22:08:44 +01:00
Kubernetes Prow Robot 3c43a0bd10
Merge pull request #953 from kubernetes/dependabot/github_actions/actions-all-a674c93f01
Bump the actions-all group with 9 updates
2024-09-18 21:04:44 +01:00
dependabot[bot] 9ed6527f0a
Bump the k8s group with 3 updates
Bumps the k8s group with 3 updates: [k8s.io/api](https://github.com/kubernetes/api), [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) and [k8s.io/client-go](https://github.com/kubernetes/client-go).


Updates `k8s.io/api` from 0.31.0 to 0.31.1
- [Commits](https://github.com/kubernetes/api/compare/v0.31.0...v0.31.1)

Updates `k8s.io/apimachinery` from 0.31.0 to 0.31.1
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.0...v0.31.1)

Updates `k8s.io/client-go` from 0.31.0 to 0.31.1
- [Changelog](https://github.com/kubernetes/client-go/blob/master/CHANGELOG.md)
- [Commits](https://github.com/kubernetes/client-go/compare/v0.31.0...v0.31.1)

---
updated-dependencies:
- dependency-name: k8s.io/api
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/client-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 17:02:05 +00:00
dependabot[bot] 9694ee4354
Bump the actions-all group with 9 updates
Bumps the actions-all group with 9 updates:

| Package | From | To |
| --- | --- | --- |
| [step-security/harden-runner](https://github.com/step-security/harden-runner) | `2.7.1` | `2.10.1` |
| [actions/checkout](https://github.com/actions/checkout) | `3.6.0` | `4.1.7` |
| [github/codeql-action](https://github.com/github/codeql-action) | `2.25.5` | `3.26.7` |
| [actions/dependency-review-action](https://github.com/actions/dependency-review-action) | `2.5.1` | `4.3.4` |
| [ossf/scorecard-action](https://github.com/ossf/scorecard-action) | `2.0.6` | `2.4.0` |
| [actions/upload-artifact](https://github.com/actions/upload-artifact) | `3.1.3` | `4.4.0` |
| [actions/setup-go](https://github.com/actions/setup-go) | `5.0.1` | `5.0.2` |
| [peter-evans/create-pull-request](https://github.com/peter-evans/create-pull-request) | `6` | `7` |
| [jacobtomlinson/gha-find-replace](https://github.com/jacobtomlinson/gha-find-replace) | `2` | `3` |


Updates `step-security/harden-runner` from 2.7.1 to 2.10.1
- [Release notes](https://github.com/step-security/harden-runner/releases)
- [Commits](a4aa98b93c...91182cccc0)

Updates `actions/checkout` from 3.6.0 to 4.1.7
- [Release notes](https://github.com/actions/checkout/releases)
- [Commits](https://github.com/actions/checkout/compare/v3.6.0...v4.1.7)

Updates `github/codeql-action` from 2.25.5 to 3.26.7
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](d05fceb045...8214744c54)

Updates `actions/dependency-review-action` from 2.5.1 to 4.3.4
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](0efb1d1d84...5a2ce3f5b9)

Updates `ossf/scorecard-action` from 2.0.6 to 2.4.0
- [Release notes](https://github.com/ossf/scorecard-action/releases)
- [Changelog](https://github.com/ossf/scorecard-action/blob/main/RELEASE.md)
- [Commits](99c53751e0...62b2cac7ed)

Updates `actions/upload-artifact` from 3.1.3 to 4.4.0
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](a8a3f3ad30...50769540e7)

Updates `actions/setup-go` from 5.0.1 to 5.0.2
- [Release notes](https://github.com/actions/setup-go/releases)
- [Commits](cdcb360436...0a12ed9d6a)

Updates `peter-evans/create-pull-request` from 6 to 7
- [Release notes](https://github.com/peter-evans/create-pull-request/releases)
- [Commits](https://github.com/peter-evans/create-pull-request/compare/v6...v7)

Updates `jacobtomlinson/gha-find-replace` from 2 to 3
- [Release notes](https://github.com/jacobtomlinson/gha-find-replace/releases)
- [Commits](https://github.com/jacobtomlinson/gha-find-replace/compare/v2...v3)

---
updated-dependencies:
- dependency-name: step-security/harden-runner
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-all
- dependency-name: github/codeql-action
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-all
- dependency-name: actions/dependency-review-action
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-all
- dependency-name: ossf/scorecard-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: actions-all
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-all
- dependency-name: actions/setup-go
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: actions-all
- dependency-name: peter-evans/create-pull-request
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-all
- dependency-name: jacobtomlinson/gha-find-replace
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: actions-all
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 17:02:05 +00:00
dependabot[bot] d6d4d93e4e
Bump github.com/onsi/gomega from 1.31.0 to 1.31.1 in /test
Bumps [github.com/onsi/gomega](https://github.com/onsi/gomega) from 1.31.0 to 1.31.1.
- [Release notes](https://github.com/onsi/gomega/releases)
- [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/gomega/compare/v1.31.0...v1.31.1)

---
updated-dependencies:
- dependency-name: github.com/onsi/gomega
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 17:01:56 +00:00
dependabot[bot] 490faeace5
Bump the k8s group in /test with 2 updates
Bumps the k8s group in /test with 2 updates: [k8s.io/apimachinery](https://github.com/kubernetes/apimachinery) and [k8s.io/component-base](https://github.com/kubernetes/component-base).


Updates `k8s.io/apimachinery` from 0.31.0 to 0.31.1
- [Commits](https://github.com/kubernetes/apimachinery/compare/v0.31.0...v0.31.1)

Updates `k8s.io/component-base` from 0.29.2 to 0.29.9
- [Commits](https://github.com/kubernetes/component-base/compare/v0.29.2...v0.29.9)

---
updated-dependencies:
- dependency-name: k8s.io/apimachinery
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
- dependency-name: k8s.io/component-base
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: k8s
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 17:01:52 +00:00
dependabot[bot] f692ac3136
Bump build-image/debian-base from `b30608f` to `0a17678`
Bumps build-image/debian-base from `b30608f` to `0a17678`.

---
updated-dependencies:
- dependency-name: build-image/debian-base
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 17:01:34 +00:00
dependabot[bot] c8659fb914
Bump golang from 1.22-bookworm to 1.23-bookworm
Bumps golang from 1.22-bookworm to 1.23-bookworm.

---
updated-dependencies:
- dependency-name: golang
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-09-18 17:01:32 +00:00
Kubernetes Prow Robot 200d46726c
Merge pull request #912 from sozercan/secure-repo
[StepSecurity] Apply security best practices
2024-09-18 18:00:44 +01:00
Kubernetes Prow Robot 3173ed132e
Merge pull request #944 from jingxu97/patch-5
Update update-deps.yml
2024-09-09 18:06:35 +01:00
Jing Xu a22fe2a52f
Update update-deps.yml
- update schedule to Thursday 8pm
- update versions to match both jobs
2024-08-22 17:27:39 -07:00
Kubernetes Prow Robot c7befef47e
Merge pull request #942 from jingxu97/patch-2
Update update-deps.yml update-go-version job
2024-08-17 13:23:35 -07:00
Jing Xu 5f99c4d9b8
Update update-deps.yml update-go-version job
Fix issue when creating a PR after updating the Go version.
2024-08-16 22:29:31 -07:00
Kubernetes Prow Robot ba355ee23f
Merge pull request #940 from baihongru/master
Update abrt-adaptor.json
2024-08-16 17:07:23 -07:00
Kubernetes Prow Robot ac9382a5c1
Merge pull request #938 from jingxu97/patch-1
Update update-deps.yml with dockerfile update
2024-08-16 17:07:14 -07:00
Kubernetes Prow Robot c123dddac8
Merge pull request #939 from kubernetes/dependencies/update-1723788389
Update dependencies
2024-08-16 16:21:02 -07:00
baihongru daf4f4da3e
Update abrt-adaptor.json
Indicates the unified name of KernelOops
2024-08-16 16:40:05 +08:00
github-actions be9ba585dd Update dependencies 2024-08-16 06:06:29 +00:00
Jing Xu 09c3cfe7ad
Update update-deps.yml with dockerfile update
This change sets up a job to update the Go version in the Dockerfile. It only updates the 1.22 patch version.
2024-08-15 10:58:55 -07:00
Kubernetes Prow Robot 16921fe90f
Merge pull request #935 from kubernetes/dependencies/update-1723183595
Update dependencies
2024-08-15 08:25:18 -07:00
github-actions 289f11b28f Update dependencies 2024-08-09 06:06:34 +00:00
Kubernetes Prow Robot 612199f0c6
Merge pull request #915 from hakman/remove-nethealth
chore: Remove unused binary `nethealth`
2024-08-05 11:23:13 -07:00
Kubernetes Prow Robot 71a4f7a631
Merge pull request #933 from kubernetes/dependencies/update-1722578853
Update dependencies
2024-08-05 10:45:12 -07:00
github-actions 1fbfdfd4f7 Update dependencies 2024-08-02 06:07:33 +00:00
Kubernetes Prow Robot 5efc8884d1
Merge pull request #931 from kubernetes/dependencies/update-1721369186
Update dependencies
2024-07-21 21:22:21 -07:00
github-actions c0bccb7c76 Update dependencies 2024-07-19 06:06:26 +00:00
Kubernetes Prow Robot 369020d878
Merge pull request #928 from BenTheElder/community-bucket
move to community staging bucket
2024-07-18 08:37:54 -07:00
Benjamin Elder 34fd4f8a8d move to community staging bucket 2024-07-17 16:11:03 -07:00
Kubernetes Prow Robot f0308d29b4
Merge pull request #925 from BenTheElder/push-tar
CI: only push tar instead of also container image
2024-07-08 16:12:52 -07:00
Benjamin Elder 4e0b9150b9 CI: build container and push tar, PR: push tar
We only need the tar to run CI tests, but we should also test building the container.

We release the container and binaries independently of this; this script is for e2e tests.
2024-07-08 15:36:40 -07:00
Kubernetes Prow Robot 34e60f82ec
Merge pull request #917 from kubernetes/dependencies/update-1717740395
Update dependencies
2024-06-10 10:23:10 -07:00
github-actions 7a48ce2e38 Update dependencies 2024-06-07 06:06:34 +00:00
Ciprian Hacman 69da591e38 chore: Remove unused binary nethealth 2024-05-18 09:52:39 +03:00
Kubernetes Prow Robot 6c34d837ef
Merge pull request #914 from kubernetes/bump-v0.8.19
Bump NPD versions to v0.8.19
2024-05-17 23:11:21 -07:00
Kubernetes Prow Robot ecdccfb86c
Merge pull request #913 from kubernetes/print-tar-sha-md5
Add helper script to print sha/md5 of tar files
2024-05-17 19:05:22 -07:00
Zhen Wang 132ccc8e81 Bump NPD versions to v0.8.19 2024-05-18 01:11:21 +00:00
Zhen Wang 86750df7c2 Add helper script to print sha/md5 of tar files
During release, there is manual work of generating the sha and md5
values from the built tar. This PR adds the helper script to generate
those in markdown format, so that it is easier and less error-prone.
2024-05-18 01:05:37 +00:00
Sertac Ozercan 19c6f4db70
updates
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2024-05-17 21:39:40 +00:00
Kubernetes Prow Robot e4ecee1976
Merge pull request #910 from kubernetes/dependencies/update-1715925961
Update dependencies
2024-05-17 13:13:52 -07:00
StepSecurity Bot 0dde605376
[StepSecurity] Apply security best practices
Signed-off-by: StepSecurity Bot <bot@stepsecurity.io>
2024-05-17 18:41:33 +00:00
github-actions 09bbaa9c32 Update dependencies 2024-05-17 06:06:00 +00:00
Kubernetes Prow Robot f004190ea1
Merge pull request #909 from sudheernv/rocky-linux-support
Add rocky linux support to GetOSVersion
2024-05-15 21:10:14 -07:00
Sudheer Nedlumane 7ee2a4dcda Add rocky linux support to GetOSVersion 2024-05-15 16:48:25 -07:00
Kubernetes Prow Robot f39c93e0f4
Merge pull request #908 from hakman/bump-debian-base
Update Go to v1.22.3
2024-05-13 10:43:09 -07:00
Ciprian Hacman 8c22b69431 Update Go to v1.22.3 2024-05-13 19:42:51 +03:00
Ciprian Hacman 030599e642 Update BASEIMAGE to debian-base:bookworm-v1.0.3 2024-05-13 19:40:09 +03:00
Kubernetes Prow Robot 66f9e5187f
Merge pull request #907 from kubernetes/dependencies/update-1715321145
Update dependencies
2024-05-13 09:30:55 -07:00
github-actions 5f59f438ac Update dependencies 2024-05-10 06:05:45 +00:00
Kubernetes Prow Robot 0b89667d18
Merge pull request #906 from kubernetes/dependencies/update-1714716344
Update dependencies
2024-05-03 08:42:39 -07:00
github-actions 338430f835 Update dependencies 2024-05-03 06:05:43 +00:00
Kubernetes Prow Robot a45f174cfc
Merge pull request #905 from kubernetes/dependencies/update-1714111557
Update dependencies
2024-04-26 06:52:17 -07:00
github-actions 273c3f5266 Update dependencies 2024-04-26 06:05:56 +00:00
Kubernetes Prow Robot b4623de861
Merge pull request #903 from kubernetes/dependencies/update-1713506782
Update dependencies
2024-04-20 23:44:03 -07:00
github-actions 7d81d8e12a Update dependencies 2024-04-19 06:06:22 +00:00
Kubernetes Prow Robot da09edb63c
Merge pull request #901 from kubernetes/dependencies/update-1712901927
Update dependencies
2024-04-15 09:59:53 -07:00
github-actions e4f8f268e8 Update dependencies 2024-04-12 06:05:27 +00:00
Kubernetes Prow Robot ecf4224d46
Merge pull request #900 from wangzhen127/explain-compatibility
Explain compatibility in README
2024-04-05 19:59:30 -07:00
Kubernetes Prow Robot 0dd173c51f
Merge pull request #899 from wangzhen127/add-comment
Add comment to health checker repair function to explain the need of kill instead of restart
2024-04-05 12:40:49 -07:00
Zhen Wang 2813b15c58 Explain compatibility in README 2024-04-05 17:52:04 +00:00
Kubernetes Prow Robot 0f60f182e8
Merge pull request #898 from kubernetes/dependencies/update-1712297141
Update dependencies
2024-04-05 10:45:33 -07:00
Zhen Wang aed88103f1 Add comment to health checker repair function to explain the need of kill instead of restart 2024-04-05 16:56:03 +00:00
Kubernetes Prow Robot 13b65d06e9
Merge pull request #897 from wangzhen127/bump-v0.8.18
Bump NPD versions to v0.8.18
2024-04-05 00:12:21 -07:00
github-actions 098d5ba360 Update dependencies 2024-04-05 06:05:40 +00:00
Zhen Wang ea591f5ac3 Bump NPD versions to v0.8.18 2024-04-05 05:49:30 +00:00
Kubernetes Prow Robot d5346f245c
Merge pull request #896 from guettli/patch-1
Add MachineHealthChecks of Cluster API
2024-04-04 21:47:54 -07:00
Thomas Güttler 8dac51c9e7
Add MachineHealthChecks of Cluster API 2024-04-04 10:05:10 +02:00
Kubernetes Prow Robot 775a138ad6
Merge pull request #895 from kubernetes/dependencies/update-1712182654
Update dependencies
2024-04-03 16:39:16 -07:00
github-actions 6c34d567d4 Update dependencies 2024-04-03 22:17:34 +00:00
Kubernetes Prow Robot 4c92bd54a2
Merge pull request #894 from hakman/patch-4
Update Go to v1.22.2
2024-04-03 12:58:07 -07:00
Ciprian Hacman a1bc4f865d
Update Go to v1.22.2
I removed the SHA256 because the tag is reused for updates.
2024-04-03 21:36:06 +03:00
Kubernetes Prow Robot a78ccb3612
Merge pull request #890 from hbeberman/hbeberman/add_mariner_azurelinux
Add support for CBL-Mariner and Azure Linux
2024-04-02 17:30:11 -07:00
Kubernetes Prow Robot 1626b85f13
Merge pull request #892 from kubernetes/dependencies/update-1711692327
Update dependencies
2024-03-29 11:16:04 -07:00
Kubernetes Prow Robot 9aa45e0cee
Merge pull request #891 from hakman/bump-debian-base
Update BASEIMAGE to debian-base:bookworm-v1.0.2
2024-03-29 00:01:22 -07:00
github-actions 7ed9c90baf Update dependencies 2024-03-29 06:05:27 +00:00
Ciprian Hacman e37dcfc3ff Update BUILDER to latest digest 2024-03-29 07:39:25 +02:00
Ciprian Hacman c0e4778fc0 Update BASEIMAGE to debian-base:bookworm-v1.0.2 2024-03-29 07:30:37 +02:00
Henry Beberman fda3234b64 Add support for CBL-Mariner and Azure Linux 2024-03-28 18:34:00 +00:00
Kubernetes Prow Robot d4aa574df2
Merge pull request #864 from linxiulei/node_cache
Get Node object from APIServer cache
2024-03-26 03:43:19 -07:00
Kubernetes Prow Robot 8cd92dbaba
Merge pull request #856 from levaspb/master
fix bug in skip_initial_status
2024-03-26 00:49:19 -07:00
Kubernetes Prow Robot 325938f2d2
Merge pull request #889 from kubernetes/dependencies/update-1711401526
Update dependencies
2024-03-25 22:07:18 -07:00
github-actions 10378c8b11 Update dependencies 2024-03-25 21:18:45 +00:00
Kubernetes Prow Robot 629774d3ed
Merge pull request #888 from hakman/auto-depup
Update dependencies every week
2024-03-25 10:35:50 -07:00
Ciprian Hacman 014cd7d6ac Update dependencies every week 2024-03-25 18:05:28 +02:00
Kubernetes Prow Robot bc72eff716
Merge pull request #886 from sgeannina/master
Declare 'builder' image in two steps to allow overriding the value
2024-03-22 14:20:58 -07:00
Nina Segares ce1d2c5c53 Declare 'builder' image in two steps to allow overriding the value 2024-03-18 17:38:27 +13:00
Kubernetes Prow Robot b48e438737
Merge pull request #883 from wangzhen127/make-push-release
Add make release
2024-03-10 22:55:32 -07:00
Zhen Wang e14c3e4ae5 Add make release
Adds a `make release` command for releasing new NPD version. It stops
pushing the tar files to gs://kubernetes-release, because no one has
write permission to the GCS bucket any more. We haven't pushed NPD tar
files to that GCS bucket after v0.8.10. k/k has been using NPD v0.8.13+
since 1.29. NPD release should just include the tar files in the release
note.
2024-03-11 04:27:16 +00:00
Kubernetes Prow Robot 58211f19f7
Merge pull request #882 from kubernetes/revert-880-update-make-push
Revert "Remove push-tar"
2024-03-10 21:19:05 -07:00
Zhen Wang b193e6e392
Revert "Remove push-tar" 2024-03-10 11:50:58 -07:00
Kubernetes Prow Robot 1667bae479
Merge pull request #873 from wangzhen127/update-release-process
Update release process
2024-03-09 21:52:59 -08:00
Kubernetes Prow Robot 953ca74ac9
Merge pull request #879 from wangzhen127/bump-v0.8.17
Bump NPD versions to v0.8.17
2024-03-09 09:44:35 -08:00
Kubernetes Prow Robot c2e0519a1f
Merge pull request #880 from wangzhen127/update-make-push
Remove push-tar
2024-03-09 07:55:50 -08:00
Zhen Wang c74bf4e01c Remove push-tar
Historically, the release process and `make push` pushed the tar files to
`gs://kubernetes-release`. No one has write permission to the
GCS bucket anymore. We haven't pushed NPD tar files to that GCS bucket
after v0.8.10. k/k has been using NPD v0.8.13+ since 1.29.

This PR cleans up the Makefile. NPD release should just include the tar
files in the release note.

Related issues:
- https://github.com/kubernetes/node-problem-detector/issues/874
- https://github.com/kubernetes/node-problem-detector/issues/878
2024-03-09 14:52:36 +00:00
Zhen Wang e8623bdba7 Update release process 2024-03-09 14:35:54 +00:00
Zhen Wang e4d293eb51 Bump NPD versions to v0.8.17
Also ran `make gomod` and `make fmt` in the repo for cleanup.
2024-03-09 14:32:45 +00:00
Kubernetes Prow Robot e14b3921e8
Merge pull request #875 from hakman/depup
Update dependencies
2024-03-06 11:17:36 -08:00
Kubernetes Prow Robot b0ede7b09c
Merge pull request #876 from hakman/go-1.22.1
Update Go to v1.22.1
2024-03-06 10:00:47 -08:00
Ciprian Hacman af3f5c5882 Update Go to v1.22.1 2024-03-06 07:14:19 +02:00
Ciprian Hacman 9769baefb9 Update dependencies 2024-03-06 07:12:46 +02:00
Kubernetes Prow Robot 855780c9c1
Merge pull request #869 from hakman/patch-3
Release v0.8.16
2024-02-28 09:22:37 -08:00
Ciprian Hacman 74c95a2486
Release v0.8.16 2024-02-28 07:44:32 +02:00
Kubernetes Prow Robot 31fe5c1534
Merge pull request #868 from hakman/depup
Update dependencies
2024-02-27 21:12:57 -08:00
Ciprian Hacman 08b2255c33 Update dependencies 2024-02-28 06:27:06 +02:00
Kubernetes Prow Robot faa2923c51
Merge pull request #867 from hakman/patch-2
Remove ENABLE_JOURNALD=0 from cloudbuild.yaml
2024-02-27 09:34:51 -08:00
Ciprian Hacman 9444907a56
Remove ENABLE_JOURNALD=0 from cloudbuild.yaml 2024-02-26 19:01:02 +02:00
Eric Lin 7dd7c14868 Get Node object from APIServer cache 2024-02-20 14:23:13 +00:00
Kubernetes Prow Robot d1166d3495
Merge pull request #860 from chotiwat/patch-1
Fix healthchecker execCommand output
2024-02-15 23:09:28 -08:00
Chotiwat Chawannakul 008a62bb90
fix execCommand output 2024-02-14 13:57:11 -08:00
Kubernetes Prow Robot b6235fb72d
Merge pull request #857 from hakman/go-1.21.6
Update Go to v1.21.6 and dependencies
2024-02-03 21:31:37 -08:00
Ciprian Hacman e1385935b8 Add script to verify dependencies 2024-02-04 06:45:07 +02:00
Ciprian Hacman ef98b9612e Update dependencies 2024-02-04 06:24:27 +02:00
Ciprian Hacman 58017fd35e Add depup Makefile target 2024-02-04 06:24:02 +02:00
Ciprian Hacman d0e447d8e1 Update Go to v1.21.6 2024-02-04 06:13:59 +02:00
Kubernetes Prow Robot b32c1c5bd4
Merge pull request #855 from acumino/patch-1
Update debian-base image
2024-02-03 20:00:39 -08:00
Stanislav f24dbb13f7 fix bug in skip_initial_status 2024-02-02 14:33:11 +01:00
Sonu Kumar Singh 45c3445b2a
update debian-base image 2024-02-01 14:15:27 +05:30
Kubernetes Prow Robot 84eb1e338f
Merge pull request #853 from mmiranda96/grpc-version-bump
Bump google.golang.org/grpc to v1.57.1
2024-01-18 23:58:59 +01:00
Mike Miranda 689a066c90 Bump google.golang.org/grpc to v1.57.1 2024-01-18 22:10:45 +00:00
Kubernetes Prow Robot 5b031d63cc
Merge pull request #852 from linxiulei/status_message
Update config/systemd-monitor.json to match all systemd StatusUnitFormat
2024-01-18 19:56:44 +01:00
Eric Lin ce82f2a81b Update config/systemd-monitor.json to match all systemd StatusUnitFormat 2024-01-18 17:16:05 +00:00
Kubernetes Prow Robot f262b500fd
Merge pull request #848 from linxiulei/revert
Support revert-pattern in logcounter
2024-01-17 20:16:32 +01:00
Eric Lin c225435bea Use --revert-pattern to discount proactive restarts 2024-01-17 18:24:24 +00:00
Eric Lin 1002df5e13 Add --revert-pattern for logcounter 2024-01-17 18:21:57 +00:00
Kubernetes Prow Robot 18630b6c78
Merge pull request #849 from linxiulei/status_message
Make pattern match all systemd StatusUnitFormat
2024-01-17 17:53:39 +01:00
Eric Lin 0fba03ef7a Make pattern match all systemd StatusUnitFormat 2024-01-14 20:02:13 +00:00
Kubernetes Prow Robot e9eddcc6d3
Merge pull request #844 from aojea/iptables
custom iptables version monitor plugin
2024-01-03 21:18:06 +01:00
Kubernetes Prow Robot 3704fa72a9
Merge pull request #819 from hakman/tag-releases
Tag releases via PR
2024-01-03 19:10:00 +01:00
Antonio Ojea 552b530e0b custom plugin to monitor iptables versions rules
iptables has two kernel backends, legacy and nft.

Quoting https://developers.redhat.com/blog/2020/08/18/iptables-the-two-variants-and-their-relationship-with-nftables

> It is also important to note that while iptables-nft
> can supplant iptables-legacy, you should never use them simultaneously.

However, we don't want to block node operations for this
reason, as there is not enough evidence that this is causing big issues in the
wild, so we just signal and warn about this situation.

Once we have more information we can revisit this decision and
keep it as is or move it to permanent.
2023-12-21 09:34:04 +00:00
Kubernetes Prow Robot 30e04d41fa
Merge pull request #843 from acumino/bump-dep
bump k8s.io to 0.29.0
2023-12-15 18:10:25 +01:00
Kubernetes Prow Robot bdaa44eb23
Merge pull request #825 from kasbert/master
Add disk and memory percent_used
2023-12-15 15:54:49 +01:00
Kubernetes Prow Robot 9f639dd892
Merge pull request #830 from hakman/reviewer-chacman
Add hakman as a reviewer
2023-12-15 15:01:43 +01:00
Kubernetes Prow Robot e3c396e324
Merge pull request #828 from j4ckstraw/fix-pprof
fix: fix pprof by using DefaultServeMux
2023-12-15 15:01:34 +01:00
acumino 73a120de57 bump k8s.io to 0.29.0 2023-12-15 13:10:06 +05:30
Kubernetes Prow Robot 34b265af34
Merge pull request #834 from hakman/go-1.21.3
Update Go to v1.21.3
2023-11-01 20:18:14 +01:00
Ciprian Hacman d88694fbd1 Update Go to v1.21.3 2023-10-31 08:10:06 +02:00
Jarkko Sonninen 07900633cb Add disk and memory percent_used 2023-10-28 16:03:48 +03:00
Ciprian Hacman bf157f81f8 Add hakman as a reviewer 2023-10-08 09:33:37 +03:00
j4ckstraw e31cf7b137 fix: fix pprof by registering handlers explicitly
see https://pkg.go.dev/net/http/pprof
> By default, all the profiles listed in runtime/pprof.Profile are
available (via Handler), in addition to the Cmdline, Profile, Symbol,
and Trace profiles defined in this package. If you are not using
DefaultServeMux, you will have to register handlers with the mux you are
using.

Signed-off-by: j4ckstraw <j4ckstraw@foxmail.com>
2023-10-08 10:49:24 +08:00
Kubernetes Prow Robot 07b7a42624
Merge pull request #823 from hakman/avast-retry
Update github.com/avast/retry-go to v4.5.0
2023-09-25 14:40:06 -07:00
Ciprian Hacman 27dcab4ba5 Prefix version with "v" 2023-09-25 06:16:43 +02:00
Ciprian Hacman aec1c74025 Tag releases via PR 2023-09-25 06:07:04 +02:00
Ciprian Hacman a5aadf719a Update github.com/avast/retry-go to v4.5.0 2023-09-24 11:25:19 +02:00
Kubernetes Prow Robot 698c8b067c
Merge pull request #820 from MartinForReal/master
Add retry for patch node requests and replace deprecated poll function
2023-09-21 10:12:29 -07:00
Fan Shang Xiang d04bb3a5b0 add retry for patch node requests and replace deprecated poll function 2023-09-21 07:36:42 +00:00
Kubernetes Prow Robot b3653a0aff
Merge pull request #817 from hakman/remove_test_vendor
Remove vendoring for tests
2023-09-20 22:58:18 -07:00
Kubernetes Prow Robot 95829b8991
Merge pull request #818 from hakman/updated-coreos-systemd
Update github.com/coreos/go-systemd to v22.5.0
2023-09-20 16:12:01 -07:00
Ciprian Hacman fdd522a951 Update github.com/coreos/go-systemd to v22.5.0 2023-09-20 07:13:42 +03:00
Ciprian Hacman 5326e106f0 Remove vendoring for tests 2023-09-20 00:17:14 +03:00
Kubernetes Prow Robot ed94dff2cd
Merge pull request #816 from hakman/remove_fakeclock
Remove dependency on code.cloudfoundry.org/clock
2023-09-19 09:39:07 -07:00
Ciprian Hacman 65e4aa3c5e Remove dependency on code.cloudfoundry.org/clock 2023-09-19 12:50:29 +03:00
Kubernetes Prow Robot fb498567b4
Merge pull request #815 from hakman/klogv2
Move glog/klog logging to klog/v2
2023-09-17 16:04:02 -07:00
Kubernetes Prow Robot 76bf7b7e77
Merge pull request #814 from hakman/go-1.21.1
Update Go to v1.21.1
2023-09-17 15:09:58 -07:00
Ciprian Hacman 5210373640 Init useful flags for klog/v2 2023-09-17 11:00:42 +03:00
Manuel Rüger e43459d86d Move glog/klog logging to klog/v2 2023-09-17 08:57:33 +03:00
Kubernetes Prow Robot eeab0ab06f
Merge pull request #813 from hakman/makefile-remove-defaults
Remove `GO111MODULE=on` and `-mod vendor` from Makefile
2023-09-14 23:36:19 -07:00
Kubernetes Prow Robot be3b1ad382
Merge pull request #807 from hakman/tests-mac
Update tests to run also on macOS
2023-09-14 23:36:12 -07:00
Ciprian Hacman 0d276ac19f Update Go to v1.21.1 2023-09-15 09:06:52 +03:00
Kubernetes Prow Robot e2ef1de56a
Merge pull request #812 from hakman/cloudbuild-cpu
Increase vCPU for Cloud Build
2023-09-14 22:18:12 -07:00
Ciprian Hacman d4a00d4f20 Remove GO111MODULE=on and -mod vendor from Makefile 2023-09-15 07:55:12 +03:00
Ciprian Hacman 188340e3e9 Remove Travis CI config 2023-09-15 07:52:38 +03:00
Ciprian Hacman e56fb7de12 Increase vCPU for Cloud Build 2023-09-15 07:32:18 +03:00
Kubernetes Prow Robot 2bb82faa7b
Merge pull request #806 from MartinForReal/master
Bump k8s.io/client-go to 1.28.2
2023-09-14 11:06:14 -07:00
Kubernetes Prow Robot 79ffff83cb
Merge pull request #801 from hakman/fix-docker-build
Update docker build and fix ARM64 image
2023-09-14 10:14:14 -07:00
Ciprian Hacman e9922b0da7 Fix docker build for multi-arch 2023-09-14 13:13:01 +03:00
Kubernetes Prow Robot d8e9d550dc
Merge pull request #810 from acumino/upg/img
Switch debian base image from bullseye to bookworm
2023-09-14 01:38:17 -07:00
Fan Shang Xiang 8283e091cd bump k8s.io/client-go to 1.28.2 2023-09-14 06:22:49 +00:00
Kubernetes Prow Robot 1bcf025f67
Merge pull request #809 from hakman/move-test-deps
Move test dependencies to test dir
2023-09-13 20:14:11 -07:00
acumino 574b25418f Switch debian base image from bullseye to bookworm 2023-09-13 20:50:50 +05:30
Ciprian Hacman 9ad24ea2c7 Move test dependencies to test dir 2023-09-11 21:14:19 +03:00
Ciprian Hacman f58f6cd208 Update tests to run also on macOS 2023-09-11 19:25:59 +03:00
Kubernetes Prow Robot af7c925522
Merge pull request #804 from mrueg/gopsutil-v3
Move gopsutil to v3
2023-09-11 08:36:11 -07:00
Manuel Rüger c4311bd207 Move github.com/shirou/gopsutil to v3 2023-09-11 12:18:22 +02:00
Kubernetes Prow Robot ba1e0b3146
Merge pull request #808 from hakman/fix-make-fmt
Update `net_collector_test.go` formatting
2023-09-08 12:01:11 -07:00
Kubernetes Prow Robot 8b33e32e3d
Merge pull request #805 from hakman/demove-dep-cadvisor
Remove direct dependency on google/cadvisor@v0.36.0
2023-09-08 07:36:18 -07:00
Ciprian Hacman 568fbe8437 Update net_collector_test.go formatting 2023-09-08 16:08:40 +03:00
Ciprian Hacman 2077606ba3 Remove direct dependency on google/cadvisor 2023-09-08 15:28:25 +03:00
Kubernetes Prow Robot 9ff6b0bde4
Merge pull request #793 from MartinForReal/master
Replace k8s.io/apimachinery/pkg/util/clock with k8s.io/utils/clock and bump k8s.io to 1.28
2023-09-07 08:10:58 -07:00
Fan Shang Xiang 469ba765fd bump k8s.io deps to v0.28.1 2023-08-31 06:31:28 +00:00
Fan Shang Xiang adbe770d74 replace k8s.io/apimachinery/pkg/util/clock with k8s.io/utils/clock 2023-08-31 06:27:32 +00:00
Kubernetes Prow Robot 4ce2aca621
Merge pull request #796 from dims/add-sig-node-approvers-reviewers-to-OWNERS-and-OWNERS_ALIASES-files
Add sig-node approvers and reviewers to OWNERS/OWNERS_ALIASES
2023-08-29 11:17:21 -07:00
Davanum Srinivas e0fa1d2898
Add sig-node approvers and reviewers to OWNERS/OWNERS_ALIASES
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-08-29 13:19:57 -04:00
Kubernetes Prow Robot 608e129d8f
Merge pull request #788 from k4leung4/bump-k8s
update k8s.io to latest 0.17 patch version for fix to CVE-2020-8564
2023-08-24 17:19:17 -07:00
Kubernetes Prow Robot cae2cad3a5
Merge pull request #794 from anshikavashistha/fix-broken-links-on-testgrid-dashboard
fixed broken links on node-problem-detector
2023-08-17 15:38:28 -07:00
Girish Sharma 594c1b6583 fixed broken links on node-problem-detector 2023-08-17 22:56:05 +05:30
Kubernetes Prow Robot c3c53894c3
Merge pull request #792 from dobesv/patch-1
Improve docs for health checker and system stats configs
2023-08-16 15:31:52 -07:00
Kubernetes Prow Robot c9da164ae6
Merge pull request #774 from MartinForReal/shafan/context
Add context to long running operations
2023-08-16 14:05:52 -07:00
Dobes Vandermeer c3a3774cf1
Tweak README.md 2023-08-16 12:42:48 -07:00
Dobes Vandermeer c9edf4072e
README.md updates 2023-08-16 12:40:49 -07:00
Kenny Leung a8f7a9f270
update k8s.io to latest 0.17 patch version
Signed-off-by: Kenny Leung <kleung@chainguard.dev>
2023-08-11 12:34:10 -07:00
Kubernetes Prow Robot 09b7fb8814
Merge pull request #773 from MartinForReal/shafan/boskos
Replace boskos client with the one in sigs.k8s.io/boskos
2023-08-03 20:30:21 -07:00
Kubernetes Prow Robot 5953ba1261
Merge pull request #780 from btiernay/issue-778-fix-make-build-in-docker
Fix missing build arg in `build-in-docker` make target
2023-08-02 10:48:45 -07:00
Kubernetes Prow Robot ed99195ed6
Merge pull request #776 from tuladhar/patch-1
Update README.md: Fix Broken Link to Kubernetes DaemonSet
2023-08-02 10:48:39 -07:00
Kubernetes Prow Robot db83d7fe0b
Merge pull request #781 from mmiranda96/update-cloudbuild
Update gcb-docker-gcloud image version
2023-08-01 12:20:24 -07:00
Fan Shang Xiang 471ab88240 add context to long running operations 2023-07-13 10:01:18 +08:00
Mike Miranda 1bf525de79 Update gcb-docker-gcloud image version 2023-07-10 19:52:15 +00:00
Bobby Tiernay c2b2b0b3df fix: missing build arg in `build-in-docker` make target 2023-07-07 12:06:54 -04:00
Fan Shang Xiang e14abd4ea5 bump boskos client
Signed-off-by: Fan Shang Xiang <shafan@microsoft.com>
2023-07-06 01:00:41 +08:00
Kubernetes Prow Robot 55586431bd
Merge pull request #772 from mmiranda96/fix/update-deps
Upgrade dependencies to latest
2023-07-05 09:44:59 -07:00
Puru 967fe3fbc7
Update README.md
Fix broken link to DaemonSet
2023-07-01 22:45:06 +05:45
Mike Miranda 5fd18a117f Update dependencies to latest 2023-06-29 18:30:57 +00:00
Kubernetes Prow Robot fd51f17ec1
Merge pull request #770 from testwill/ioutil
chore: remove refs to deprecated io/ioutil
2023-06-26 11:48:29 -07:00
Kubernetes Prow Robot d605f87d6d
Merge pull request #769 from MartinForReal/patch-1
Enable experimental features in docker in postsubmit jobs.
2023-06-26 05:45:45 -07:00
guoguangwu 1ccff37f96 chore: remove refs to deprecated io/ioutil 2023-06-26 13:57:11 +08:00
Fan Shang Xiang d573b5d00f
Enable experimental features in docker in postsubmit jobs. 2023-06-26 13:50:30 +08:00
Kubernetes Prow Robot 6b538a5d4e
Merge pull request #768 from testwill/pkg-import
chore: pkg imported more than once
2023-06-25 22:07:08 -07:00
Kubernetes Prow Robot e6fbdd434a
Merge pull request #760 from MartinForReal/master
bump k8s.io dependencies to 1.17.2
2023-06-25 21:41:16 -07:00
Kubernetes Prow Robot 6e30b17476
Merge pull request #733 from raghu-nandan-bs/variablize-kube-endpoints
optionally read node and port information from env variables for kube* services
2023-06-25 21:41:07 -07:00
guoguangwu da422bb452 chore: pkg imported more than once 2023-06-21 14:14:53 +08:00
Kubernetes Prow Robot e992542b57
Merge pull request #767 from testwill/ioutil
chore: remove refs to deprecated io/ioutil
2023-06-20 21:40:20 -07:00
guoguangwu 6dc23ca804 chore: remove refs to deprecated io/ioutil 2023-06-21 12:12:27 +08:00
Kubernetes Prow Robot 339e243472
Merge pull request #763 from btiernay/issue-752-fix-macos-makefile-error
fix: Makefile OS conditional
2023-06-15 17:06:18 -07:00
Fan Shang Xiang b5e4ef628b bump k8s.io to 1.17.2 2023-06-12 22:27:39 +08:00
Bobby Tiernay c27b4beb6d fix: Makefile OS conditional 2023-06-09 11:59:46 -04:00
Kubernetes Prow Robot f116c9264c
Merge pull request #756 from MartinForReal/master
Remove heapster from project dependencies
2023-05-30 09:31:51 -07:00
MartinForReal 75095b2573 remove heapster from project dependencies 2023-05-18 01:38:29 +00:00
Kubernetes Prow Robot af2226183f
Merge pull request #745 from xmcqueen/master
updated the custom plugin configuration doc
2023-05-17 01:06:33 -07:00
Kubernetes Prow Robot 8ec3f36293
Merge pull request #744 from miguelbernadi/patch-1
Remove godep as it's not actually used
2023-05-17 00:30:33 -07:00
Kubernetes Prow Robot d4aeca09f5
Merge pull request #755 from double12gzh/master
[fix] cannot patch resource "nodes/status" in API group
2023-05-15 15:15:35 -07:00
Kubernetes Prow Robot b610240ce3
Merge pull request #746 from mmiranda96/fix/update-go-1.20.3
Update Docker image to Go 1.20.3
2023-05-15 14:31:35 -07:00
Kubernetes Prow Robot aec734d822
Merge pull request #750 from aritraghosh/aritraghosh-aksnpd
Update README.md
2023-05-15 14:03:34 -07:00
JeffreyGuan 7fc7947bc3 [fix] cannot patch resource "nodes/status" in API group 2023-05-15 21:01:22 +08:00
Aritra Ghosh 343e0f226c
Update README.md
Co-authored-by: Mike Miranda <mikemp96@gmail.com>
2023-05-04 20:42:35 -07:00
Kubernetes Prow Robot 9fd58e318f
Merge pull request #749 from mmiranda96/reviewer-status
Add mmiranda96 as a reviewer
2023-05-03 10:40:14 -07:00
Kubernetes Prow Robot 7cc8ec6315
Merge pull request #748 from mmiranda96/fix/747
Split proc default and validation between Linux and Windows
2023-05-02 16:04:16 -07:00
Aritra Ghosh a50e83a5c3 Update README.md
Added AKS enablement of NPD
2023-04-18 21:29:27 -07:00
Mike Miranda c658f9717b Add mmiranda96 as a reviewer 2023-04-13 20:59:36 +00:00
Mike Miranda 22157af0e5 Split proc default and validation between Linux and Windows 2023-04-13 18:52:59 +00:00
Mike Miranda d229082e26 Update Docker image to Go 1.20.3 2023-04-11 18:55:37 +00:00
Brian McQueen 4906ebb182 updated the custom plugin doc to clarify the usage for reason and message 2023-04-10 18:57:28 -07:00
Miguel Bernabeu Diaz a7adf55137
Remove godep as it's not actually used 2023-03-22 10:07:57 +01:00
Kubernetes Prow Robot 6e57ca6e6c
Merge pull request #740 from vaibhav2107/registry-updates
Updated references from k8s.gcr.io to registry.k8s.io
2023-02-22 18:01:07 -08:00
Kubernetes Prow Robot 948f634d8f
Merge pull request #743 from acumino/upd/img
Update Debian image
2023-02-22 17:21:07 -08:00
acumino 00fc95a16a Update debian image 2023-02-21 14:30:52 +05:30
vaibhav2107 a5fd95c982 Updated references from k8s.gcr.io to registry.k8s.io 2023-02-14 17:10:08 +05:30
Kubernetes Prow Robot 6dbe19abbd
Merge pull request #739 from vteratipally/adjf
update golang to 1.20
2023-02-09 14:52:55 -08:00
Varsha Teratipally e8b55acc2b update golang to 1.20 2023-02-09 20:36:36 +00:00
corneredrat 706bf35086 update defaultHost var name 2023-02-09 23:22:55 +05:30
corneredrat 2e0ff3d14c fix unit tests 2023-02-09 23:10:38 +05:30
corneredrat 429777eb5d fix unit tests 2023-02-09 22:14:40 +05:30
corneredrat 07317328f1 update expected results 2023-02-09 21:59:47 +05:30
corneredrat e6ab24db7f update expected results 2023-02-09 21:38:31 +05:30
corneredrat d88e0dda02 fix test for kube endpoints 2023-02-09 15:58:06 +05:30
corneredrat a117c0c056 1. make vars private
2. expose endpoints via functions
3. add test cases
4. rename host addr var
2023-02-09 15:02:52 +05:30
corneredrat 83e520784b use consts 2023-02-04 22:48:49 +05:30
corneredrat 92e63b5991 move node endpoints initialization to separate section 2023-02-04 21:14:20 +05:30
Kubernetes Prow Robot ff4af1b398
Merge pull request #735 from zendesk/grosser/go-get
remove redundant go-get instructions from readme
2023-02-03 20:18:28 -08:00
Michael Grosser e98f0c09ba
remove redundant go-get instructions from readme 2023-02-03 18:29:52 -08:00
corneredrat f601956af9 name var for env keys appropriately 2023-02-01 21:43:53 +05:30
corneredrat 2415e30efe remove redundant initialization 2023-02-01 21:42:52 +05:30
corneredrat 6163859ae8 read node and port information from env variables for kube* services 2023-02-01 14:09:07 +05:30
Kubernetes Prow Robot b586bd9231
Merge pull request #717 from zendesk/grosser/space
health-checker cri: fix invalid command
2023-01-31 23:51:06 -08:00
Kubernetes Prow Robot 7b6805491c
Merge pull request #716 from zendesk/grosser/silence
remove "Start watching journald" to avoid plugin log spam
2023-01-31 23:13:06 -08:00
Michael Grosser 8578b779e2
health-checker cri: fix invalid command 2023-01-31 21:53:59 -08:00
Michael Grosser a83ef25930
remove "Start watching journald" to avoid plugin log spam 2023-01-31 21:31:31 -08:00
Kubernetes Prow Robot 2bf62c0180
Merge pull request #627 from rgolangh/patch-1
Add medik8s.io to Remedy Systems section
2023-01-31 15:36:52 -08:00
Kubernetes Prow Robot 005e4e0259
Merge pull request #732 from acumino/clean-dep
Enforce that `github.com/onsi/ginkgo/ginkgo` is considered dependency
2023-01-31 14:44:49 -08:00
acumino 0b34230dd5 Run `go mod tidy and vendor` 2023-02-01 00:58:35 +05:30
acumino 95056202c6 Add ginkgo as dependency 2023-02-01 00:57:12 +05:30
Kubernetes Prow Robot d77d8f2992
Merge pull request #721 from UiPath/new-os-distributions
Add support for SLES, Oracle and Amazon Linux
2023-01-31 10:48:56 -08:00
Kubernetes Prow Robot 0dc032e76f
Merge pull request #715 from zendesk/grosser/check-result
log failed results at a higher verbosity level
2023-01-31 10:48:49 -08:00
Kubernetes Prow Robot ed3111fec1
Merge pull request #729 from c202c/patch-1
Update README.md to fix a typo
2023-01-31 10:07:02 -08:00
Kubernetes Prow Robot 49fbd5cf4b
Merge pull request #727 from yordis/fix-spelling
chore: fix misspelling
2023-01-31 10:06:56 -08:00
Kubernetes Prow Robot d1c8a8bfe2
Merge pull request #723 from jason1028kr/jasonjung/add-comments
add few comments for custom plugin monitor
2023-01-31 10:06:48 -08:00
Kubernetes Prow Robot 80fc2c206e
Merge pull request #724 from justdan96/patch-1
update BASEIMAGE to debian-base:bullseye-v1.4.2
2023-01-31 08:18:49 -08:00
Dan Bryant 7f0a62683e fixup extra packages for installation 2023-01-05 09:43:24 +00:00
Wanlong CHEN df6320d147
Update README.md
Fixed a typo
2022-12-29 14:59:11 +00:00
Yordis Prieto Lazo 0842910049 chore: fix misspelling 2022-12-18 22:58:07 -05:00
Dan de33c801a5
update BASEIMAGE to debian-base:bullseye-v1.4.2
This is the latest Debian Bullseye image, it is the same as used in other Kubernetes projects, i.e. https://github.com/kubernetes-sigs/blob-csi-driver/pull/765/files
2022-11-14 18:03:55 +00:00
jasonjung cc6c049522 add comments for cpm 2022-11-12 13:43:02 -08:00
Alexandru Matei 0afa7cc6ff Add support for SLES, Oracle and Amazon Linux 2022-10-27 14:54:42 +03:00
Michael Grosser 169ff4f9fe
log failed results at a higher verbosity level 2022-10-19 14:31:18 -07:00
Kubernetes Prow Robot 2f959a773c
Merge pull request #702 from karlhungus/output_stderr_allow_timeout_setting
output stdout and stderr from custom commands, allow setting timeout for critctl
2022-10-03 13:02:14 -07:00
Kubernetes Prow Robot 7bd6e85b29
Merge pull request #703 from pratikmallya/patch-1
Update comment to be consistent with reality
2022-09-18 21:20:29 -07:00
Pratik 0127a75e05
Update comment to be consistent with reality 2022-09-18 20:21:05 -07:00
Izaak Alpert (karlhungus) b6d8069610
allow setting crictl timeout 2022-09-15 14:31:41 -04:00
Izaak Alpert (karlhungus) 6de3fabc9f
output stdout and stderr from custom commands 2022-09-15 14:31:24 -04:00
Kubernetes Prow Robot 9b2d0be950
Merge pull request #695 from vteratipally/master
[release-blocker-fix] fix building multi-arch image
2022-09-02 11:34:06 -07:00
Kubernetes Prow Robot 5c85ab20f5
Merge pull request #697 from YuikoTakada/doc_fix
fix helm command in README
2022-09-01 11:47:20 -07:00
Yuiko Mouri 2fceddf00e fix helm command in README 2022-08-30 09:29:59 +09:00
vteratipally 92745daa62
fix building multi-arch image 2022-08-27 16:28:39 -07:00
Kubernetes Prow Robot d8b2940b3c
Merge pull request #679 from 2rs2ts/condition-change-event-severity
Use Warn severity on K8s Event when Node condition is True
2022-08-01 16:22:28 -07:00
Kubernetes Prow Robot 5560df8cba
Merge pull request #690 from vteratipally/fix_docker
Fix dockerfile
2022-08-01 15:40:28 -07:00
Varsha Teratipally 8f9c5bbabb fix dockerfile 2022-08-01 21:55:53 +00:00
Kubernetes Prow Robot d00659c642
Merge pull request #689 from refluxwhw/dev
fix README under systemlogmonitor
2022-07-29 03:03:12 -07:00
whwreflux 3fba7a9e86 fix README under systemlogmonitor 2022-07-29 17:14:46 +08:00
Kubernetes Prow Robot 6e3260c43c
Merge pull request #687 from vaibhav2107/npd-link
Updated the docs under NPD
2022-07-26 20:00:35 -07:00
Kubernetes Prow Robot 9a9b06d24d
Merge pull request #660 from grosser/grosser/latest
simplify cri health check
2022-07-26 20:00:28 -07:00
Kubernetes Prow Robot 7bc362cfdc
Merge pull request #668 from grosser/grosser/systemd
show failed statuses as warning
2022-07-26 19:16:38 -07:00
Kubernetes Prow Robot 341af62275
Merge pull request #646 from notchairmk/notchairmk/custom-skip-initial
Allow skipping condition during customplugin initialization
2022-07-26 19:16:31 -07:00
Kubernetes Prow Robot 2d5de8d0fa
Merge pull request #684 from acumino/multi-arch-image
Create multi-arch image
2022-07-26 12:03:10 -07:00
Vaibhav 1c9447854f Fix the incorrect links in docs under NPD 2022-07-20 01:08:48 +05:30
Kubernetes Prow Robot c9ffa67ec4
Merge pull request #685 from diamondburned/resultChan-rm
Remove unused resultChan field in CPM
2022-07-14 22:34:25 -07:00
diamondburned 6809f445eb
Remove unused resultChan field in CPM
This commit removes the resultChan field in ./pkg/custompluginmonitor's
customPluginMonitor struct. This was detected by staticcheck:

    ―❤―▶ staticcheck ./pkg/custompluginmonitor/
    pkg/custompluginmonitor/custom_plugin_monitor.go:50:2: field resultChan is unused (U1000)
2022-07-12 21:43:05 -07:00
Sonu Kumar Singh 04e8d009d4 Use buildx for docker builder 2022-07-05 09:37:52 +05:30
Kubernetes Prow Robot 72f1672634
Merge pull request #675 from mmiranda96/feat/net-monitor-groupings
Add ExcludeInterfaceRegexp to Net Dev monitor
2022-06-29 14:50:06 -07:00
Andrew Garrett 72ad051dd6 Use Warn severity on K8s Event when Node condition is True
If temporary errors generate an Event with a Warn severity, then surely
permanent errors should generate an Event with at least that high of a
severity level.
2022-06-17 22:13:21 +00:00
Mike Miranda 1471f74d98 Add ExcludeInterfaceRegexp to Net Dev monitor 2022-06-15 23:22:38 +00:00
Kubernetes Prow Robot 56122ce0dd
Merge pull request #678 from 2rs2ts/master
Add condition message to event message
2022-06-15 15:43:11 -07:00
Andrew Garrett b1bd8e7424 Use %q instead of %s 2022-06-09 17:18:30 +00:00
Andrew Garrett a39a7c6e0f Add condition message to event message
If you're using some monitoring solution that aggregates events from
your Kubernetes cluster, having the underlying reason why a condition
triggered could be very useful, especially if you are using custom
plugin monitors.

Co-authored-by: Micah Norman <micnorman@paypal.com>
Signed-off-by: Ryan Eschinger <reschinger@paypal.com>
2022-06-08 21:42:40 +00:00
Michael Grosser 011b9e6a46
show failed statuses as warning 2022-04-26 11:50:10 -07:00
Taylor Chaparro 9344c938bb
Allow skipping condition during customplugin initialization 2022-04-26 10:12:01 -07:00
Kubernetes Prow Robot 51508603fe
Merge pull request #616 from com6056/patch-1
Install systemd in docker image
2022-04-24 13:49:37 -07:00
Kubernetes Prow Robot c083db10f0
Merge pull request #628 from mx-psi/master
Change to using new dependency name for osreleaser
2022-04-22 11:35:37 -07:00
Kubernetes Prow Robot a0abe5c667
Merge pull request #653 from matt-cale-do/patch-1
Grammar help
2022-04-21 12:28:18 -07:00
Kubernetes Prow Robot 9c23553e0b
Merge pull request #650 from yankay/fix-deprecated-maintainer-in-dockerfile
FIx deprecated "MAINTAINER" in Dockerfile
2022-04-21 12:28:12 -07:00
Kubernetes Prow Robot 8603b5b98b
Merge pull request #638 from vteratipally/adfad
Add vteratipally as an approver
2022-04-21 10:58:57 -07:00
Kubernetes Prow Robot 285516dc10
Merge pull request #594 from oif/chore/optimize-netcollector-implementation
optimize netcollector implementation and custom `/proc` mount path
2022-04-11 03:36:06 -07:00
Neo Zhuo 11ddb5e6bf support custom `/proc` path 2022-04-11 18:15:08 +08:00
Neo Zhuo 78c11c4ceb reimplement net collector metrics register, config check and recording 2022-04-11 18:15:07 +08:00
Pablo Baeyens 5e300846b2
Merge remote-tracking branch 'origin/master' into mx-psi/master 2022-03-29 14:45:27 -04:00
Michael Grosser d764b1ab87
simplify cri health check 2022-03-28 17:05:53 -07:00
Kubernetes Prow Robot 4412a2b9a4
Merge pull request #657 from vteratipally/remove_ginkgo
remove installing ginkgo using "go get"
2022-03-25 08:33:59 -07:00
varsha teratipally f508ccea7b remove installing ginkgo using "go get" as it ginkgo version supported
is 2.0
2022-03-25 00:48:12 +00:00
Kubernetes Prow Robot 68314853b8
Merge pull request #656 from vteratipally/ads
update golang, ginkgo and gomega dependencies
2022-03-24 09:58:41 -07:00
varsha teratipally 20c3b6f13c update ginkgo and gomega dependencies 2022-03-24 16:25:03 +00:00
Kubernetes Prow Robot bdbf6b3df9
Merge pull request #639 from vteratipally/adfa
install dependencies before building the NPD containers.
2022-03-23 13:41:20 -07:00
Matthew Cale 84259052d1
Grammar help
could have been: `"It monitors a specific kind of node problem..."` as well
2022-03-16 15:24:44 -05:00
Varsha Teratipally c370cfb68a install dependencies before building the NPD containers. 2022-03-10 20:39:56 +00:00
Kay Yan bc89bbce56 MAINTAINER in Dockerfile is deprecated, change to label
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2022-03-07 15:27:08 +08:00
Kelvie Wong 363d01392a Create multi-arch image
For linux arm64 and amd64, as per #586.

I moved the builder image into the same dockerfile, and bumped the Go
version on it. It didn't seem like the builder dockerfile worked with
the latest code anyway (the go modules require go 1.15 and higher).

This requires a recent enough docker install with buildx, as well as
an arm64 builder.

BASEIMAGE is changed to not specify an arch, so that the image will
build on its native arch in buildx.

Example image is on docker hub as:

    kelvie/node-problem-detector:v0.8.10-5-gb0fa610
2022-02-13 01:23:31 -08:00
Pablo Baeyens 0b64594d0a
Merge remote-tracking branch 'origin/master' into mx-psi/master 2022-01-13 10:42:04 +01:00
vteratipally 3e9834e26d
Update OWNERS 2022-01-12 12:10:39 -08:00
Kubernetes Prow Robot e7fe0b20dc
Merge pull request #629 from spiffxp/use-k8s-infra-for-gcb-image
images: use k8s-staging-test-infra/gcb-docker-gcloud
2021-12-15 16:23:18 -08:00
Aaron Crickenberger 0761e11cc4 images: use k8s-staging-test-infra/gcb-docker-gcloud 2021-11-30 13:02:50 -08:00
Pablo Baeyens a859b5f027
Change to using new dependency name for osreleaser
To do this I
1. changed the name in go.mod and the Go code that used it,
2. ran `go mod tidy -go=1.15` and
3. ran `go mod vendor`.

Step 3 added another vendored dependency unrelated AFAIK to this change.
2021-11-29 16:45:48 +01:00
Roy Golan 36dc9081ef
Add medik8s.io to Remdy Systems section 2021-11-22 08:41:56 +02:00
Jordan Rodgers 760d252808
Only need systemd 2021-09-03 17:11:48 -07:00
Jordan Rodgers 0de6fae1f8
Install curl and systemd in docker image
A few issues have popped up where the provided image doesn't have the required packages for certain health checking operations (like https://github.com/kubernetes/node-problem-detector/issues/584#issuecomment-885832078).

This installs curl and systemd in the container to help alleviate these issues.
2021-09-03 16:46:08 -07:00
Kubernetes Prow Robot e7d28a3bf1
Merge pull request #615 from mcshooter/updateTimeFormatForUptimeFunc
Ensure time is in Universal Time Zone to properly calculate uptime
2021-09-02 11:41:41 -07:00
michelletandya 3344efd552 ensure time is in Universal Time Zone to properly calculate uptime 2021-09-02 17:41:54 +00:00
Kubernetes Prow Robot 56c592a5d7
Merge pull request #587 from vteratipally/bug_fix
Add a check if the metric is nil so that collector doesn't collect metrics.
2021-08-31 09:21:37 -07:00
Kubernetes Prow Robot 1123fd22cb
Merge pull request #589 from xinydev/fix-build-guide
Update the instructions for build image in the readme
2021-08-30 15:11:13 -07:00
Kubernetes Prow Robot 393a9401b1
Merge pull request #607 from mmiranda96/fix/23202
Create cloudbuild file
2021-08-30 14:41:13 -07:00
Mike Miranda fd6c80b840 Create cloudbuild file 2021-08-23 23:05:20 +00:00
Kubernetes Prow Robot 3c3609b5fa
Merge pull request #612 from mcshooter/updateUptimeCMd
Update powershell command for uptime to help efficiency
2021-08-20 18:42:05 -07:00
Kubernetes Prow Robot 7a33650863
Merge pull request #609 from mcshooter/fixWindowsCPUIssue
Prevent uptimeFunc from being called everytime CheckHealth is called
2021-08-20 18:41:59 -07:00
Kubernetes Prow Robot a276a05765
Merge pull request #611 from ryuichi1208/master
Fixed dead link
2021-08-20 17:59:59 -07:00
michelletandya dd0d0d71ab Update powershell command for uptime to help efficiency 2021-08-20 01:16:45 +00:00
michelletandya 26f070bfd4 Prevent uptimeFunc from being called everytime CheckHealth is being called 2021-08-17 19:30:28 +00:00
Ryuichi Watanabe ca95d61bf8
fix README 2021-08-15 13:58:46 +09:00
Kubernetes Prow Robot f1aa82a9ae
Merge pull request #596 from lizhuqi/update-config
remove aufs hung check
2021-08-12 09:19:48 -07:00
Kubernetes Prow Robot aa5c7ec00d
Merge pull request #602 from mcshooter/update-makefile
update make clean to remove coverage.out
2021-08-05 10:33:22 -07:00
michelletandya 203116b614 update make clean and .gitignore 2021-08-04 19:45:08 +00:00
Kubernetes Prow Robot 383be3edec
Merge pull request #601 from vteratipally/maintainer
Add vteratipally as a reviewer
2021-08-02 15:12:19 -07:00
vteratipally 50ba775915
Update OWNERS 2021-08-02 13:50:27 -07:00
vteratipally 68bf26b08f
Add vteratipally as a maintainer 2021-08-02 11:58:51 -07:00
Julie Qi fe09e416bd remove aufs hung check 2021-07-30 13:53:25 -07:00
Kubernetes Prow Robot f9199e56c5
Merge pull request #595 from mcshooter/update-makefile
Add coverage.out to Makefile
2021-07-29 11:28:57 -07:00
Kubernetes Prow Robot 870ce7ce75
Merge pull request #575 from uthark/oatamanenko/kube-proxy-check
Check kube-proxy health on linux
2021-07-28 23:38:18 -07:00
Kubernetes Prow Robot 7c5e1385cf
Merge pull request #599 from Random-Liu/fix-npd-hash
Fix NPD hash for test.
2021-07-28 18:54:57 -07:00
Lantao Liu d8ce535dc3 Fix NPD hash for test. 2021-07-28 18:06:37 -07:00
michelletandya 49526abf27 Add coverage.out to Makefile 2021-07-23 18:45:44 +00:00
XinYang 62a5f8888e
Update build image guide docs
Signed-off-by: XinYang <xinydev@gmail.com>
2021-07-02 23:07:01 +08:00
Varsha Teratipally ebdd9038b7 Add a check if the metric is nil so that collector doesn't collect the
metrics.
2021-06-30 19:50:16 +00:00
Oleg Atamanenko c8629cea5d Check kube-proxy health on linux 2021-06-29 21:36:27 -07:00
Kubernetes Prow Robot 70f79831de
Merge pull request #570 from pwschuurman/fix-e2e-test-ext4-flake
Fix e2e-test flakes for Ext4 counter
2021-06-25 12:32:47 -07:00
Kubernetes Prow Robot cbb029d905
Merge pull request #583 from pezzak/log-kubeapi-error
Log error from kube-api
2021-06-25 10:18:51 -07:00
Kubernetes Prow Robot a0b0f9460f
Merge pull request #578 from kubernetes/partitions
Reduce the number of reads to /proc/partitions file and gofmt.
2021-06-25 10:18:45 -07:00
Kubernetes Prow Robot 220f0b00f1
Merge pull request #577 from vteratipally/adfad
Bump image version from v1.0.0 to v.2.0.0 to fix some of the CVEs.
2021-06-25 09:48:52 -07:00
Kubernetes Prow Robot e349323507
Merge pull request #539 from smileusd/health_check
improvement health-checker
2021-06-25 09:48:45 -07:00
Kubernetes Prow Robot 93badb28ac
Merge pull request #585 from jeremyje/corruptmanifest
Add HCS empty layer error reporting.
2021-06-25 09:14:45 -07:00
Jeremy Edwards d52844ae67 Add HCS empty layer error reporting. 2021-06-22 17:06:42 +00:00
pezzak ed97725ea1 Log error from kube-api 2021-06-17 12:51:44 +03:00
Kubernetes Prow Robot fae6181a54
Merge pull request #580 from mcshooter/updateWindowsCriCtlPath
update CriCtl path for windows
2021-06-15 10:30:02 -07:00
michelletandya a14577dfa4 update CriCtl path for windows 2021-06-15 01:03:04 +00:00
varsha teratipally 7b51a90328 Reduce the number of reads to /proc/partitions file
to retrive the partitions on disk
2021-06-13 21:11:34 +00:00
vteratipally 94d8373a9e
Update Makefile
Bump image version from v1.0.0 to v.2.0.0 to fix some of the CVEs.
2021-06-11 10:53:11 -07:00
Kubernetes Prow Robot 1150ce519f
Merge pull request #574 from mcshooter/addLocalWindowsMakeCommand
Update windows command in README to allow windows to be build alone locally
2021-06-07 16:58:01 -07:00
michelletandya c266c431f5 Update README instuctions for building windows locally 2021-06-07 17:45:14 +00:00
Kubernetes Prow Robot 9ce0dbfbd0
Merge pull request #561 from pwschuurman/arm64-support
Added arm64 targets for linux binaries
2021-05-26 15:05:39 -07:00
Peter Schuurman 84a54c5447 Update e2e-test to compare lower bound, rather than range for ext4 errors 2021-05-26 13:13:52 -07:00
Peter Schuurman bd2a900a37 Added arm64 targets for linux binaries 2021-05-26 11:23:36 -07:00
Kubernetes Prow Robot f27c3a8da9
Merge pull request #567 from mcshooter/updateWindowsDefenderConfigPath
config/windows-defender-monitor.json
2021-05-24 15:22:21 -07:00
michelletandya caf2bad7b6 config/windows-defender-monitor.json 2021-05-24 20:08:47 +00:00
tashen b409875246 add npd health-ckecker deploy file 2021-05-19 13:57:48 +08:00
tashen a3b928467e add loopbacktime to reduce time of journalctl call 2021-05-19 13:55:55 +08:00
5479 changed files with 1005445 additions and 606465 deletions

38
.github/dependabot.yml vendored Normal file

@ -0,0 +1,38 @@
version: 2
updates:
  - package-ecosystem: github-actions
    directory: /
    schedule:
      interval: weekly
    groups:
      actions-all:
        patterns:
          - "*"
    labels:
      - "ok-to-test"
  - package-ecosystem: docker
    directory: /
    schedule:
      interval: weekly
    labels:
      - "ok-to-test"
  - package-ecosystem: gomod
    directories:
      - /
      - /test
    schedule:
      interval: weekly
    ignore:
      - dependency-name: "*"
        update-types:
          - "version-update:semver-major"
          - "version-update:semver-minor"
    groups:
      k8s:
        patterns:
          - "k8s.io/*"
          - "sigs.k8s.io/*"
    labels:
      - "ok-to-test"

78
.github/workflows/codeql.yml vendored Normal file

@ -0,0 +1,78 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
#
# ******** NOTE ********
# We have attempted to detect the languages in your repository. Please check
# the `language` matrix defined below to confirm you have the correct set of
# supported CodeQL languages.
#
name: "CodeQL"
on:
  push:
    branches: ["master"]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: ["master"]
  schedule:
    - cron: "0 0 * * 1"
permissions:
  contents: read
jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write
    strategy:
      fail-fast: false
      matrix:
        language: ["go"]
        # CodeQL supports [ $supported-codeql-languages ]
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support
    steps:
      - name: Harden Runner
        uses: step-security/harden-runner@002fdce3c6a235733a90a27c80493a3241e56863 # v2.12.1
        with:
          egress-policy: audit
      - name: Checkout repository
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@ce28f5bb42b7a9f2c824e633a3f6ee835bab6858 # v3.29.0
        with:
          languages: ${{ matrix.language }}
          # If you wish to specify custom queries, you can do so here or in a config file.
          # By default, queries listed here will override any specified in a config file.
          # Prefix the list here with "+" to use these queries and those in the config file.
      # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
      # If this step fails, then you should remove it and run the build manually (see below)
      - name: Autobuild
        uses: github/codeql-action/autobuild@ce28f5bb42b7a9f2c824e633a3f6ee835bab6858 # v3.29.0
      # Command-line programs to run using the OS shell.
      # 📚 See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun
      # If the Autobuild fails above, remove it and uncomment the following three lines.
      # modify them (or add more) to build your code if your project, please refer to the EXAMPLE below for guidance.
      # - run: |
      # echo "Run, Build Application using script"
      # ./location_of_script_within_repo/buildscript.sh
      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@ce28f5bb42b7a9f2c824e633a3f6ee835bab6858 # v3.29.0
        with:
          category: "/language:${{matrix.language}}"

27
.github/workflows/dependency-review.yml vendored Normal file

@ -0,0 +1,27 @@
# Dependency Review Action
#
# This Action will scan dependency manifest files that change as part of a Pull Request,
# surfacing known-vulnerable versions of the packages declared or updated in the PR.
# Once installed, if the workflow run is marked as required,
# PRs introducing known-vulnerable packages will be blocked from merging.
#
# Source repository: https://github.com/actions/dependency-review-action
name: 'Dependency Review'
on: [pull_request]
permissions:
  contents: read
jobs:
  dependency-review:
    runs-on: ubuntu-latest
    steps:
      - name: Harden Runner
        uses: step-security/harden-runner@002fdce3c6a235733a90a27c80493a3241e56863 # v2.12.1
        with:
          egress-policy: audit
      - name: 'Checkout Repository'
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - name: 'Dependency Review'
        uses: actions/dependency-review-action@da24556b548a50705dd671f47852072ea4c105d9 # v4.7.1

76
.github/workflows/scorecards.yml vendored Normal file

@ -0,0 +1,76 @@
# This workflow uses actions that are not certified by GitHub. They are provided
# by a third-party and are governed by separate terms of service, privacy
# policy, and support documentation.
name: Scorecard supply-chain security
on:
  # For Branch-Protection check. Only the default branch is supported. See
  # https://github.com/ossf/scorecard/blob/main/docs/checks.md#branch-protection
  branch_protection_rule:
  # To guarantee Maintained check is occasionally updated. See
  # https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained
  schedule:
    - cron: '20 7 * * 2'
  push:
    branches: ["master"]
# Declare default permissions as read only.
permissions: read-all
jobs:
  analysis:
    name: Scorecard analysis
    runs-on: ubuntu-latest
    permissions:
      # Needed to upload the results to code-scanning dashboard.
      security-events: write
      # Needed to publish results and get a badge (see publish_results below).
      id-token: write
      contents: read
      actions: read
    steps:
      - name: Harden Runner
        uses: step-security/harden-runner@002fdce3c6a235733a90a27c80493a3241e56863 # v2.12.1
        with:
          egress-policy: audit
      - name: "Checkout code"
        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          persist-credentials: false
      - name: "Run analysis"
        uses: ossf/scorecard-action@05b42c624433fc40578a4040d5cf5e36ddca8cde # v2.4.2
        with:
          results_file: results.sarif
          results_format: sarif
          # (Optional) "write" PAT token. Uncomment the `repo_token` line below if:
          # - you want to enable the Branch-Protection check on a *public* repository, or
          # - you are installing Scorecards on a *private* repository
          # To create the PAT, follow the steps in https://github.com/ossf/scorecard-action#authentication-with-pat.
          # repo_token: ${{ secrets.SCORECARD_TOKEN }}
          # Public repositories:
          # - Publish results to OpenSSF REST API for easy access by consumers
          # - Allows the repository to include the Scorecard badge.
          # - See https://github.com/ossf/scorecard-action#publishing-results.
          # For private repositories:
          # - `publish_results` will always be set to `false`, regardless
          # of the value entered here.
          publish_results: true
      # Upload the results as artifacts (optional). Commenting out will disable uploads of run results in SARIF
      # format to the repository Actions tab.
      - name: "Upload artifact"
        uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4.6.2
        with:
          name: SARIF file
          path: results.sarif
          retention-days: 5
      # Upload the results to GitHub's code scanning dashboard.
      - name: "Upload to code-scanning"
        uses: github/codeql-action/upload-sarif@ce28f5bb42b7a9f2c824e633a3f6ee835bab6858 # v3.29.0
        with:
          sarif_file: results.sarif

33
.github/workflows/tag-release.yml vendored Normal file

@ -0,0 +1,33 @@
name: tag-release
on:
  push:
    branches:
      - master
    paths:
      - version.txt
permissions:
  contents: read
jobs:
  tag:
    if: ${{ github.repository == 'kubernetes/node-problem-detector' }}
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Harden Runner
        uses: step-security/harden-runner@002fdce3c6a235733a90a27c80493a3241e56863 # v2.12.1
        with:
          egress-policy: audit
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          fetch-depth: 0
      - run: /usr/bin/git config --global user.email actions@github.com
      - run: /usr/bin/git config --global user.name 'GitHub Actions Release Tagger'
      - run: hack/tag-release.sh
        id: tag_release
    outputs:
      release_tag: ${{ steps.tag_release.outputs.release_tag }}

2
.gitignore vendored

@ -6,3 +6,5 @@ pr.env
junit*.xml
debug.test
/output/
coverage.out
.idea/

18
.pre-commit-config.yaml Normal file

@ -0,0 +1,18 @@
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.16.3
    hooks:
      - id: gitleaks
  - repo: https://github.com/golangci/golangci-lint
    rev: v1.52.2
    hooks:
      - id: golangci-lint
  - repo: https://github.com/jumanjihouse/pre-commit-hooks
    rev: 3.0.0
    hooks:
      - id: shellcheck
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace

View File

@ -1,33 +0,0 @@
os:
  - linux
sudo: required
dist: xenial
language: go
go:
  - "1.16"
  - master
env:
  - GO111MODULE=on
services:
  - docker
before_install:
  - sudo apt-get -qq update
  - sudo apt-get install -y libsystemd-dev
install:
  - mkdir -p $HOME/gopath/src/k8s.io
  - mv $TRAVIS_BUILD_DIR $HOME/gopath/src/k8s.io/node-problem-detector
  - cd $HOME/gopath/src/k8s.io/node-problem-detector
script:
  - make
  - make test
  - make clean && BUILD_TAGS="disable_custom_plugin_monitor" make
  - BUILD_TAGS="disable_custom_plugin_monitor" make test
  - make clean && BUILD_TAGS="disable_system_log_monitor" make
  - BUILD_TAGS="disable_system_log_monitor" make test
  - make clean && BUILD_TAGS="disable_system_stats_monitor" make
  - BUILD_TAGS="disable_system_stats_monitor" make test
  - make clean && BUILD_TAGS="disable_stackdriver_exporter" make
  - BUILD_TAGS="disable_stackdriver_exporter" make test
  - make clean && ENABLE_JOURNALD=0 make
  - ENABLE_JOURNALD=0 make test
  - ENABLE_JOURNALD=0 make build-binaries

View File

@ -29,7 +29,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
- Windows build now supported.
- Added metrics to retrieve stats such as `procs_running` and `procs_blocked`.
- Added metrics to retrieve network stats.
- Added metric to retrieve guest OS features such as unknwon modules, ktd,
- Added metric to retrieve guest OS features such as unknown modules, ktd,
and kernel integrity.
### Changed
@ -158,7 +158,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
- Empty LogPath will now use journald's default path.
- Systemd monitor now looks back 5 minutes.
- Bumped base image to `k8s.gcr.io/debian-base-amd64:1.0.0`.
- Bumped base image to `registry.k8s.io/debian-base-amd64:1.0.0`.
- Updated the detection method for docker overlay2 issues.
- Moved NPD into the kube-system namespace.
@ -237,7 +237,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
- Added resource limites to NPD deployment.
- Added log-counter to dockerfile.
- Added `enable_message_change_based_condition_update` option to enable
condition update when messages cahnge for custom plugin.
condition update when messages change for custom plugin.
### Fixed
@ -248,7 +248,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
### Changed
- Bumped base image to `k8s.gcr.io/debian-base-amd64:0.4.0`.
- Bumped base image to `registry.k8s.io/debian-base-amd64:0.4.0`.
## [0.6.0] - 2018-11-27
@ -277,7 +277,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
- Changed default port from 10256 to 20256 to avoid conflict with kube-proxy.
- Bumped golang version from 1.8 to 1.9.
- Bumped base image to `k8s.gcr.io/debian-base-amd64:0.3`.
- Bumped base image to `registry.k8s.io/debian-base-amd64:0.3`.
### Fixed

View File

@ -14,7 +14,7 @@ If your repo has certain guidelines for contribution, put them here ahead of the
- [Contributor License Agreement](https://git.k8s.io/community/CLA.md) Kubernetes projects require that you sign a Contributor License Agreement (CLA) before we can accept your pull requests
- [Kubernetes Contributor Guide](http://git.k8s.io/community/contributors/guide) - Main contributor documentation, or you can just jump directly to the [contributing section](http://git.k8s.io/community/contributors/guide#contributing)
- [Contributor Cheat Sheet](https://git.k8s.io/community/contributors/guide/contributor-cheatsheet.md) - Common resources for existing developers
- [Contributor Cheat Sheet](https://git.k8s.io/community/contributors/guide/contributor-cheatsheet/README.md) - Common resources for existing developers
## Mentorship
@ -28,4 +28,4 @@ Custom Information - if you're copying this template for the first time you can
- [Slack channel](https://kubernetes.slack.com/messages/kubernetes-users) - Replace `kubernetes-users` with your slack channel string, this will send users directly to your channel.
- [Mailing list](URL)
-->
-->

View File

@ -12,20 +12,42 @@
# See the License for the specific language governing permissions and
# limitations under the License.
ARG BASEIMAGE
FROM ${BASEIMAGE}
# "builder-base" can be overriden using dockerb buildx's --build-context flag,
# by users who want to use a different images for the builder. E.g. if you need to use an older OS
# to avoid dependencies on very recent glibc versions.
# E.g. of the param: --build-context builder-base=docker-image://golang:<something>@sha256:<something>
# Must override builder-base, not builder, since the latter is referred to later in the file and so must not be
# directly replaced. See here, and note that "stage" parameter mentioned there has been renamed to
# "build-context": https://github.com/docker/buildx/pull/904#issuecomment-1005871838
FROM golang:1.24-bookworm@sha256:00eccd446e023d3cd9566c25a6e6a02b90db3e1e0bbe26a48fc29cd96e800901 as builder-base
FROM builder-base as builder
LABEL maintainer="Andy Xie <andy.xning@gmail.com>"
MAINTAINER Random Liu <lantaol@google.com>
ARG TARGETARCH
RUN clean-install util-linux libsystemd0 bash
ENV GOPATH /gopath/
ENV PATH $GOPATH/bin:$PATH
RUN apt-get update --fix-missing && apt-get --yes install libsystemd-dev gcc-aarch64-linux-gnu
RUN go version
COPY . /gopath/src/k8s.io/node-problem-detector/
WORKDIR /gopath/src/k8s.io/node-problem-detector
RUN GOARCH=${TARGETARCH} make bin/node-problem-detector bin/health-checker bin/log-counter
FROM --platform=${TARGETPLATFORM} registry.k8s.io/build-image/debian-base:bookworm-v1.0.4@sha256:0a17678966f63e82e9c5e246d9e654836a33e13650a698adefede61bb5ca099e as base
LABEL maintainer="Random Liu <lantaol@google.com>"
RUN clean-install util-linux bash libsystemd-dev
# Avoid symlink of /etc/localtime.
RUN test -h /etc/localtime && rm -f /etc/localtime && cp /usr/share/zoneinfo/UTC /etc/localtime || true
COPY ./bin/node-problem-detector /node-problem-detector
COPY --from=builder /gopath/src/k8s.io/node-problem-detector/bin/node-problem-detector /node-problem-detector
ARG LOGCOUNTER
COPY ./bin/health-checker ${LOGCOUNTER} /home/kubernetes/bin/
COPY --from=builder /gopath/src/k8s.io/node-problem-detector/bin/health-checker /gopath/src/k8s.io/node-problem-detector/${LOGCOUNTER} /home/kubernetes/bin/
COPY config /config
ENTRYPOINT ["/node-problem-detector", "--config.system-log-monitor=/config/kernel-monitor.json"]
COPY --from=builder /gopath/src/k8s.io/node-problem-detector/config/ /config
ENTRYPOINT ["/node-problem-detector", "--config.system-log-monitor=/config/kernel-monitor.json,/config/readonly-monitor.json"]

142
Makefile

@ -17,12 +17,16 @@
.PHONY: all \
vet fmt version test e2e-test \
build-binaries build-container build-tar build \
docker-builder build-in-docker push-container push-tar push clean
docker-builder build-in-docker \
push-container push-tar push release clean depup \
print-tar-sha-md5
all: build
# PLATFORMS is the set of OS_ARCH that NPD can build against.
PLATFORMS=linux_amd64 windows_amd64
LINUX_PLATFORMS=linux_amd64 linux_arm64
DOCKER_PLATFORMS=linux/amd64,linux/arm64
PLATFORMS=$(LINUX_PLATFORMS) windows_amd64
# VERSION is the version of the binary.
VERSION?=$(shell if [ -d .git ]; then echo `git describe --tags --dirty`; else echo "UNKNOWN"; fi)
@ -63,21 +67,24 @@ IMAGE:=$(REGISTRY)/node-problem-detector:$(TAG)
# support needs libsystemd-dev or libsystemd-journal-dev.
ENABLE_JOURNALD?=1
ifeq ($(go env GOHOSTOS), darwin)
ifeq ($(shell go env GOHOSTOS), darwin)
ENABLE_JOURNALD=0
else ifeq ($(go env GOHOSTOS), windows)
else ifeq ($(shell go env GOHOSTOS), windows)
ENABLE_JOURNALD=0
endif
# TODO(random-liu): Support different architectures.
# The debian-base:v1.0.0 image built from kubernetes repository is based on
# Debian Stretch. It includes systemd 232 with support for both +XZ and +LZ4
# compression. +LZ4 is needed on some os distros such as COS.
BASEIMAGE:=k8s.gcr.io/debian-base-amd64:v1.0.0
# Disable cgo by default to make the binary statically linked.
CGO_ENABLED:=0
ifeq ($(GOARCH), arm64)
CC:=aarch64-linux-gnu-gcc
else
CC:=x86_64-linux-gnu-gcc
endif
# Set default Go architecture to AMD64.
GOARCH ?= amd64
# Construct the "-tags" parameter used by "go build".
BUILD_TAGS?=
@ -101,15 +108,15 @@ ifeq ($(ENABLE_JOURNALD), 1)
CGO_ENABLED:=1
LOGCOUNTER=./bin/log-counter
else
# Hack: Don't copy over log-counter, use a wildcard path that shouldnt match
# Hack: Don't copy over log-counter, use a wildcard path that shouldn't match
# anything in COPY command.
LOGCOUNTER=*dont-include-log-counter
endif
vet:
GO111MODULE=on go list -mod vendor -tags "$(HOST_PLATFORM_BUILD_TAGS)" ./... | \
go list -tags "$(HOST_PLATFORM_BUILD_TAGS)" ./... | \
grep -v "./vendor/*" | \
GO111MODULE=on xargs go vet -mod vendor -tags "$(HOST_PLATFORM_BUILD_TAGS)"
xargs go vet -tags "$(HOST_PLATFORM_BUILD_TAGS)"
fmt:
find . -type f -name "*.go" | grep -v "./vendor/*" | xargs gofmt -s -w -l
@ -123,12 +130,13 @@ ifeq ($(ENABLE_JOURNALD), 1)
BINARIES_LINUX_ONLY += bin/log-counter
endif
ALL_BINARIES = $(foreach binary, $(BINARIES) $(BINARIES_LINUX_ONLY), ./$(binary)) $(foreach binary, $(BINARIES) $(BINARIES_LINUX_ONLY), output/linux_amd64/$(binary)) $(foreach binary, $(BINARIES), output/windows_amd64/$(binary).exe)
ALL_BINARIES = $(foreach binary, $(BINARIES) $(BINARIES_LINUX_ONLY), ./$(binary)) \
$(foreach platform, $(LINUX_PLATFORMS), $(foreach binary, $(BINARIES) $(BINARIES_LINUX_ONLY), output/$(platform)/$(binary))) \
$(foreach binary, $(BINARIES), output/windows_amd64/$(binary).exe)
ALL_TARBALLS = $(foreach platform, $(PLATFORMS), $(NPD_NAME_VERSION)-$(platform).tar.gz)
output/windows_amd64/bin/%.exe: $(PKG_SOURCES)
GOOS=windows GOARCH=amd64 CGO_ENABLED=$(CGO_ENABLED) GO111MODULE=on go build \
-mod vendor \
GOOS=windows GOARCH=amd64 CGO_ENABLED=$(CGO_ENABLED) go build \
-o $@ \
-ldflags '-X $(PKG)/pkg/version.version=$(VERSION)' \
-tags "$(WINDOWS_BUILD_TAGS)" \
@ -136,15 +144,15 @@ output/windows_amd64/bin/%.exe: $(PKG_SOURCES)
touch $@
output/windows_amd64/test/bin/%.exe: $(PKG_SOURCES)
GOOS=windows GOARCH=amd64 CGO_ENABLED=$(CGO_ENABLED) GO111MODULE=on go build \
-mod vendor \
-o $@ \
cd test && \
GOOS=windows GOARCH=amd64 CGO_ENABLED=$(CGO_ENABLED) go build \
-o ../$@ \
-tags "$(WINDOWS_BUILD_TAGS)" \
./test/e2e/$(subst -,,$*)
./e2e/$(subst -,,$*)
output/linux_amd64/bin/%: $(PKG_SOURCES)
GOOS=linux GOARCH=amd64 CGO_ENABLED=$(CGO_ENABLED) GO111MODULE=on go build \
-mod vendor \
GOOS=linux GOARCH=amd64 CGO_ENABLED=$(CGO_ENABLED) \
CC=x86_64-linux-gnu-gcc go build \
-o $@ \
-ldflags '-X $(PKG)/pkg/version.version=$(VERSION)' \
-tags "$(LINUX_BUILD_TAGS)" \
@ -152,17 +160,34 @@ output/linux_amd64/bin/%: $(PKG_SOURCES)
touch $@
output/linux_amd64/test/bin/%: $(PKG_SOURCES)
GOOS=linux GOARCH=amd64 CGO_ENABLED=$(CGO_ENABLED) GO111MODULE=on go build \
-mod vendor \
-o $@ \
cd test && \
GOOS=linux GOARCH=amd64 CGO_ENABLED=$(CGO_ENABLED) \
CC=x86_64-linux-gnu-gcc go build \
-o ../$@ \
-tags "$(LINUX_BUILD_TAGS)" \
./test/e2e/$(subst -,,$*)
./e2e/$(subst -,,$*)
output/linux_arm64/bin/%: $(PKG_SOURCES)
GOOS=linux GOARCH=arm64 CGO_ENABLED=$(CGO_ENABLED) \
CC=aarch64-linux-gnu-gcc go build \
-o $@ \
-ldflags '-X $(PKG)/pkg/version.version=$(VERSION)' \
-tags "$(LINUX_BUILD_TAGS)" \
./cmd/$(subst -,,$*)
touch $@
output/linux_arm64/test/bin/%: $(PKG_SOURCES)
cd test && \
GOOS=linux GOARCH=arm64 CGO_ENABLED=$(CGO_ENABLED) \
CC=aarch64-linux-gnu-gcc go build \
-o ../$@ \
-tags "$(LINUX_BUILD_TAGS)" \
./e2e/$(subst -,,$*)
# In the future these targets should be deprecated.
./bin/log-counter: $(PKG_SOURCES)
ifeq ($(ENABLE_JOURNALD), 1)
CGO_ENABLED=$(CGO_ENABLED) GOOS=linux GO111MODULE=on go build \
-mod vendor \
CGO_ENABLED=$(CGO_ENABLED) GOOS=linux GOARCH=$(GOARCH) CC=$(CC) go build \
-o bin/log-counter \
-ldflags '-X $(PKG)/pkg/version.version=$(VERSION)' \
-tags "$(LINUX_BUILD_TAGS)" \
@ -172,38 +197,37 @@ else
endif
./bin/node-problem-detector: $(PKG_SOURCES)
CGO_ENABLED=$(CGO_ENABLED) GOOS=linux GO111MODULE=on go build \
-mod vendor \
CGO_ENABLED=$(CGO_ENABLED) GOOS=linux GOARCH=$(GOARCH) CC=$(CC) go build \
-o bin/node-problem-detector \
-ldflags '-X $(PKG)/pkg/version.version=$(VERSION)' \
-tags "$(LINUX_BUILD_TAGS)" \
./cmd/nodeproblemdetector
./test/bin/problem-maker: $(PKG_SOURCES)
CGO_ENABLED=$(CGO_ENABLED) GOOS=linux GO111MODULE=on go build \
-mod vendor \
-o test/bin/problem-maker \
cd test && \
CGO_ENABLED=$(CGO_ENABLED) GOOS=linux GOARCH=$(GOARCH) CC=$(CC) go build \
-o bin/problem-maker \
-tags "$(LINUX_BUILD_TAGS)" \
./test/e2e/problemmaker/problem_maker.go
./e2e/problemmaker/problem_maker.go
./bin/health-checker: $(PKG_SOURCES)
CGO_ENABLED=$(CGO_ENABLED) GOOS=linux GO111MODULE=on go build \
-mod vendor \
CGO_ENABLED=$(CGO_ENABLED) GOOS=linux GOARCH=$(GOARCH) CC=$(CC) go build \
-o bin/health-checker \
-ldflags '-X $(PKG)/pkg/version.version=$(VERSION)' \
-tags "$(LINUX_BUILD_TAGS)" \
cmd/healthchecker/health_checker.go
test: vet fmt
GO111MODULE=on go test -mod vendor -timeout=1m -v -race -short -tags "$(HOST_PLATFORM_BUILD_TAGS)" ./...
go test -timeout=1m -v -race -short -tags "$(HOST_PLATFORM_BUILD_TAGS)" ./...
e2e-test: vet fmt build-tar
GO111MODULE=on ginkgo -nodes=$(PARALLEL) -mod vendor -timeout=10m -v -tags "$(HOST_PLATFORM_BUILD_TAGS)" -stream \
./test/e2e/metriconly/... -- \
cd test && \
go run github.com/onsi/ginkgo/ginkgo -nodes=$(PARALLEL) -timeout=10m -v -tags "$(HOST_PLATFORM_BUILD_TAGS)" -stream \
./e2e/metriconly/... -- \
-project=$(PROJECT) -zone=$(ZONE) \
-image=$(VM_IMAGE) -image-family=$(IMAGE_FAMILY) -image-project=$(IMAGE_PROJECT) \
-ssh-user=$(SSH_USER) -ssh-key=$(SSH_KEY) \
-npd-build-tar=`pwd`/$(TARBALL) \
-npd-build-tar=`pwd`/../$(TARBALL) \
-boskos-project-type=$(BOSKOS_PROJECT_TYPE) -job-name=$(JOB_NAME) \
-artifacts-dir=$(ARTIFACTS)
@ -216,8 +240,9 @@ $(NPD_NAME_VERSION)-%.tar.gz: $(ALL_BINARIES) test/e2e-install.sh
build-binaries: $(ALL_BINARIES)
build-container: build-binaries Dockerfile
docker build -t $(IMAGE) --build-arg BASEIMAGE=$(BASEIMAGE) --build-arg LOGCOUNTER=$(LOGCOUNTER) .
build-container: clean Dockerfile
docker buildx create --platform $(DOCKER_PLATFORMS) --use
docker buildx build --platform $(DOCKER_PLATFORMS) -t $(IMAGE) --build-arg LOGCOUNTER=$(LOGCOUNTER) .
$(TARBALL): ./bin/node-problem-detector ./bin/log-counter ./bin/health-checker ./test/bin/problem-maker
tar -zcvf $(TARBALL) bin/ config/ test/e2e-install.sh test/bin/problem-maker
@ -229,7 +254,7 @@ build-tar: $(TARBALL) $(ALL_TARBALLS)
build: build-container build-tar
docker-builder:
docker build -t npd-builder ./builder
docker build -t npd-builder . --target=builder
build-in-docker: clean docker-builder
docker run \
@ -237,17 +262,46 @@ build-in-docker: clean docker-builder
-c 'cd /gopath/src/k8s.io/node-problem-detector/ && make build-binaries'
push-container: build-container
# So we can push to docker hub by setting REGISTRY
ifneq (,$(findstring gcr.io,$(REGISTRY)))
gcloud auth configure-docker
docker push $(IMAGE)
endif
# Build should be cached from build-container
docker buildx build --push --platform $(DOCKER_PLATFORMS) -t $(IMAGE) --build-arg LOGCOUNTER=$(LOGCOUNTER) .
push-tar: build-tar
gsutil cp $(TARBALL) $(UPLOAD_PATH)/node-problem-detector/
gsutil cp node-problem-detector-$(VERSION)-*.tar.gz* $(UPLOAD_PATH)/node-problem-detector/
# `make push` is used by presubmit and CI jobs.
push: push-container push-tar
# `make release` is used when releasing a new NPD version.
release: push-container build-tar print-tar-sha-md5
print-tar-sha-md5: build-tar
./hack/print-tar-sha-md5.sh $(VERSION)
coverage.out:
rm -f coverage.out
go test -coverprofile=coverage.out -timeout=1m -v -short ./...
clean:
rm -rf bin/
rm -rf test/bin/
rm -f node-problem-detector-*.tar.gz*
rm -rf output/
rm -f coverage.out
.PHONY: gomod
gomod:
go mod tidy
go mod vendor
cd test; go mod tidy
.PHONY: goget
goget:
go get $(shell go list -f '{{if not (or .Main .Indirect)}}{{.Path}}{{end}}' -mod=mod -m all)
.PHONY: depup
depup: goget gomod

OWNERS

@ -1,12 +1,14 @@
reviewers:
- Random-Liu
- dchen1107
- sig-node-reviewers
- andyxning
- wangzhen127
- xueweiz
- vteratipally
- mmiranda96
- hakman
approvers:
- Random-Liu
- dchen1107
- sig-node-approvers
- andyxning
- wangzhen127
- xueweiz
- vteratipally

OWNERS_ALIASES Normal file

@ -0,0 +1,19 @@
aliases:
sig-node-approvers:
- Random-Liu
- dchen1107
- derekwaynecarr
- yujuhong
- sjenning
- mrunalp
- klueska
- SergeyKanzhelev
- tallclair
sig-node-reviewers:
- Random-Liu
- dchen1107
- derekwaynecarr
- yujuhong
- sjenning
- mrunalp
- klueska


@ -7,11 +7,11 @@ layers in the cluster management stack.
It is a daemon that runs on each node, detects node
problems and reports them to apiserver.
node-problem-detector can either run as a
[DaemonSet](http://kubernetes.io/docs/admin/daemons/) or run standalone.
[DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/) or run standalone.
Now it is running as a
[Kubernetes Addon](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
enabled by default in the GCE cluster.
enabled by default in the GKE cluster. It is also enabled by default in AKS as part of the
[AKS Linux Extension](https://learn.microsoft.com/en-us/azure/aks/faq#what-is-the-purpose-of-the-aks-linux-extension-i-see-installed-on-my-linux-vmss-instances).
# Background
There are tons of node problems that could possibly affect the pods running on the
@ -41,8 +41,8 @@ should be reported as `Event`.
# Problem Daemon
A problem daemon is a sub-daemon of node-problem-detector. It monitors a specific
kind of node problems and reports them to node-problem-detector.
A problem daemon is a sub-daemon of node-problem-detector. It monitors specific
kinds of node problems and reports them to node-problem-detector.
A problem daemon could be:
* A tiny daemon designed for dedicated Kubernetes use-cases.
@ -62,9 +62,9 @@ List of supported problem daemons types:
| Problem Daemon Types | NodeCondition | Description | Configs | Disabling Build Tag |
|----------------|:---------------:|:------------|:--------|:--------------------|
| [SystemLogMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemlogmonitor) | KernelDeadlock ReadonlyFilesystem FrequentKubeletRestart FrequentDockerRestart FrequentContainerdRestart | A system log monitor monitors system log and reports problems and metrics according to predefined rules. | [filelog](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-filelog.json), [kmsg](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json), [kernel](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-counter.json) [abrt](https://github.com/kubernetes/node-problem-detector/blob/master/config/abrt-adaptor.json) [systemd](https://github.com/kubernetes/node-problem-detector/blob/master/config/systemd-monitor-counter.json) | disable_system_log_monitor
| [SystemStatsMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemstatsmonitor) | None(Could be added in the future) | A system stats monitor for node-problem-detector to collect various health-related system stats as metrics. See the proposal [here](https://docs.google.com/document/d/1SeaUz6kBavI283Dq8GBpoEUDrHA2a795xtw0OvjM568/edit). | | disable_system_stats_monitor
| [SystemStatsMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemstatsmonitor) | None(Could be added in the future) | A system stats monitor for node-problem-detector to collect various health-related system stats as metrics. See the proposal [here](https://docs.google.com/document/d/1SeaUz6kBavI283Dq8GBpoEUDrHA2a795xtw0OvjM568/edit). | [system-stats-monitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/system-stats-monitor.json) | disable_system_stats_monitor
| [CustomPluginMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/custompluginmonitor) | On-demand(According to users configuration), existing example: NTPProblem | A custom plugin monitor for node-problem-detector to invoke and check various node problems with user-defined check scripts. See the proposal [here](https://docs.google.com/document/d/1jK_5YloSYtboj-DtfjmYKxfNnUxCAvohLnsH5aGCAYQ/edit#). | [example](https://github.com/kubernetes/node-problem-detector/blob/4ad49bbd84b8ced45ac825eac01ec93d9235935e/config/custom-plugin-monitor.json) | disable_custom_plugin_monitor
| [HealthChecker](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/healthchecker) | KubeletUnhealthy ContainerRuntimeUnhealthy| A health checker for node-problem-detector to check kubelet and container runtime health. | [kubelet](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-kubelet.json) [docker](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-docker.json) |
| [HealthChecker](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/healthchecker) | KubeletUnhealthy ContainerRuntimeUnhealthy| A health checker for node-problem-detector to check kubelet and container runtime health. | [kubelet](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-kubelet.json) [docker](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-docker.json) [containerd](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-containerd.json) |
# Exporter
@ -102,9 +102,14 @@ certain backends. Some of them can be disabled at compile-time using a build tag
* `--config.custom-plugin-monitor`: List of paths to custom plugin monitor config files, comma-separated, e.g.
[config/custom-plugin-monitor.json](https://github.com/kubernetes/node-problem-detector/blob/master/config/custom-plugin-monitor.json).
Node problem detector will start a separate custom plugin monitor for each configuration. You can
Node problem detector will start a separate custom plugin monitor for each configuration. You can
use different custom plugin monitors to monitor different node problems.
#### For Health Checkers
Health checkers are configured as custom plugins, using the config/health-checker-*.json config files.
#### For Kubernetes exporter
* `--enable-k8s-exporter`: Enables reporting to Kubernetes API server, default to `true`.
@ -137,12 +142,12 @@ For example, to run without auth, use the following config:
## Build Image
* `go get` or `git clone` node-problem-detector repo into `$GOPATH/src/k8s.io` or `$GOROOT/src/k8s.io`
with one of the below directions:
* `cd $GOPATH/src/k8s.io && git clone git@github.com:kubernetes/node-problem-detector.git`
* `cd $GOPATH/src/k8s.io && go get k8s.io/node-problem-detector`
* Install development dependencies for `libsystemd` and the ARM GCC toolchain
* Debian/Ubuntu: `apt install libsystemd-dev gcc-aarch64-linux-gnu`
* run `make` in the top directory. It will:
* `git clone git@github.com:kubernetes/node-problem-detector.git`
* Run `make` in the top directory. It will:
* Build the binary.
* Build the docker image. The binary and `config/` are copied into the docker image.
@ -158,11 +163,6 @@ and [System Stats Monitor](https://github.com/kubernetes/node-problem-detector/t
Check out the [Problem Daemon](https://github.com/kubernetes/node-problem-detector#problem-daemon) section
to see how to disable each problem daemon during compilation time.
**Note**:
By default, node-problem-detector will be built with systemd support with the `make` command. This requires systemd develop files.
You should download the systemd develop files first. For Ubuntu, the `libsystemd-journal-dev` package should
be installed. For Debian, the `libsystemd-dev` package should be installed.
## Push Image
`make push` uploads the docker image to a registry. By default, the image will be uploaded to
@ -175,7 +175,7 @@ The easiest way to install node-problem-detector into your cluster is to use the
```
helm repo add deliveryhero https://charts.deliveryhero.io/
helm install deliveryhero/node-problem-detector
helm install --generate-name deliveryhero/node-problem-detector
```
Alternatively, to install node-problem-detector manually:
@ -184,9 +184,13 @@ Alternatively, to install node-problem-detector manually:
2. Edit [node-problem-detector-config.yaml](deployment/node-problem-detector-config.yaml) to configure node-problem-detector.
3. Create the ConfigMap with `kubectl create -f node-problem-detector-config.yaml`.
3. Edit [rbac.yaml](deployment/rbac.yaml) to fit your environment.
3. Create the DaemonSet with `kubectl create -f node-problem-detector.yaml`.
4. Create the ServiceAccount and ClusterRoleBinding with `kubectl create -f rbac.yaml`.
4. Create the ConfigMap with `kubectl create -f node-problem-detector-config.yaml`.
5. Create the DaemonSet with `kubectl create -f node-problem-detector.yaml`.
## Start Standalone
@ -214,7 +218,7 @@ To develop NPD on Windows you'll need to setup your Windows machine for Go devel
* [Go](https://golang.org/)
* [Visual Studio Code](https://code.visualstudio.com/)
* [Make](http://gnuwin32.sourceforge.net/packages/make.htm)
* [mingw-64 WinBuilds](http://mingw-w64.org/doku.php/download/win-builds)
* [mingw-64 WinBuilds](http://mingw-w64.org/downloads)
* Tested with x86-64 Windows Native mode.
* Add the `$InstallDir\bin` to [Windows `PATH` variable](https://answers.microsoft.com/en-us/windows/forum/windows_10-other_settings-winpc/adding-path-variable/97300613-20cb-4d85-8d0e-cc9d3549ba23).
@ -222,16 +226,16 @@ To develop NPD on Windows you'll need to setup your Windows machine for Go devel
# Run these commands in the node-problem-detector directory.
# Build in MINGW64 Window
make clean windows-binaries
make clean ENABLE_JOURNALD=0 build-binaries
# Test in MINGW64 Window
make test
# Run with containerd log monitoring enabled in Command Prompt. (Assumes containerd is installed.)
%CD%\output\windows_amd64\node-problem-detector.exe --logtostderr --enable-k8s-exporter=false --config.system-log-monitor=%CD%\config\windows-containerd-monitor-filelog.json --config.system-stats-monitor=config\windows-system-stats-monitor.json
%CD%\output\windows_amd64\bin\node-problem-detector.exe --logtostderr --enable-k8s-exporter=false --config.system-log-monitor=%CD%\config\windows-containerd-monitor-filelog.json --config.system-stats-monitor=config\windows-system-stats-monitor.json
# Configure NPD to run as a Windows Service
sc.exe create NodeProblemDetector binpath= "%CD%\node-problem-detector.exe [FLAGS]" start= demand
sc.exe create NodeProblemDetector binpath= "%CD%\node-problem-detector.exe [FLAGS]" start= demand
sc.exe failure NodeProblemDetector reset= 0 actions= restart/10000
sc.exe start NodeProblemDetector
```
@ -264,9 +268,9 @@ For example, to test [KernelMonitor](https://github.com/kubernetes/node-problem-
node-problem-detector uses [go modules](https://github.com/golang/go/wiki/Modules)
to manage dependencies. Therefore, building node-problem-detector requires
golang 1.11+. It still uses vendoring. See the
[Kubernetes go modules KEP](https://github.com/kubernetes/enhancements/blob/master/keps/sig-architecture/2019-03-19-go-modules.md#alternatives-to-vendoring-using-go-modules)
[Kubernetes go modules KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/917-go-modules#alternatives-to-vendoring-using-go-modules)
for the design decisions. To add a new dependency, update [go.mod](go.mod) and
run `GO111MODULE=on go mod vendor`.
run `go mod vendor`.
# Remedy Systems
@ -275,30 +279,26 @@ detected by the node-problem-detector. Remedy systems observe events and/or node
conditions emitted by the node-problem-detector and take action to return the
Kubernetes cluster to a healthy state. The following remedy systems exist:
* [**Draino**](https://github.com/planetlabs/draino) automatically drains Kubernetes
nodes based on labels and node conditions. Nodes that match _all_ of the supplied
labels and _any_ of the supplied node conditions will be prevented from accepting
new pods (aka 'cordoned') immediately, and
[drained](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/)
after a configurable time. Draino can be used in conjunction with the
[Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
to automatically terminate drained nodes. Refer to
[this issue](https://github.com/kubernetes/node-problem-detector/issues/199)
for an example production use case for Draino.
* [**Descheduler**](https://github.com/kubernetes-sigs/descheduler) strategy RemovePodsViolatingNodeTaints
evicts pods violating NoSchedule taints on nodes. The k8s scheduler's TaintNodesByCondition feature must
be enabled. The [Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
can be used to automatically terminate drained nodes.
* [**mediK8S**](https://github.com/medik8s) is an umbrella project for automatic remediation
systems built on the [Node Health Check Operator (NHC)](https://github.com/medik8s/node-healthcheck-operator), which monitors
node conditions and delegates remediation to external remediators using the Remediation API. [Poison-Pill](https://github.com/medik8s/poison-pill)
is a remediator that reboots the node and makes sure all stateful workloads are rescheduled. NHC supports conditionally remediating if the cluster
has enough healthy capacity, or manually pausing any action to minimize cluster disruption.
* [**MachineHealthCheck**](https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine-health-check) of [Cluster API](https://cluster-api.sigs.k8s.io/) is responsible for remediating unhealthy Machines.
# Testing
NPD is tested via unit tests, [NPD e2e tests](https://github.com/kubernetes/node-problem-detector/blob/master/test/e2e/README.md), Kubernetes e2e tests and Kubernetes nodes e2e tests. Prow handles the [pre-submit tests](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/node-problem-detector/node-problem-detector-presubmits.yaml) and [CI tests](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/node-problem-detector/node-problem-detector-ci.yaml).
CI test results can be found below:
1. [Unit tests](https://k8s-testgrid.appspot.com/sig-node-node-problem-detector#ci-npd-test)
2. [NPD e2e tests](https://k8s-testgrid.appspot.com/sig-node-node-problem-detector#ci-npd-e2e-test)
3. [Kubernetes e2e tests](https://k8s-testgrid.appspot.com/sig-node-node-problem-detector#ci-npd-e2e-kubernetes-gce-gci)
4. [Kubernetes nodes e2e tests](https://k8s-testgrid.appspot.com/sig-node-node-problem-detector#ci-npd-e2e-node)
1. [Unit tests](https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-test)
2. [NPD e2e tests](https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-e2e-test)
3. [Kubernetes e2e tests](https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-e2e-kubernetes-gce-gci)
4. [Kubernetes nodes e2e tests](https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-e2e-node)
## Running tests
@ -310,6 +310,10 @@ See [NPD e2e test documentation](https://github.com/kubernetes/node-problem-dete
[Problem maker](https://github.com/kubernetes/node-problem-detector/blob/master/test/e2e/problemmaker/README.md) is a program used in NPD e2e tests to generate/simulate node problems. It is ONLY intended to be used by NPD e2e tests. Please do NOT run it on your workstation, as it could cause real node problems.
# Compatibility
Node problem detector's architecture has been fairly stable. Recent versions (v0.8.13+) should work with any supported Kubernetes version.
# Docs
* [Custom plugin monitor](docs/custom_plugin_monitor.md)
@ -320,4 +324,4 @@ See [NPD e2e test documentation](https://github.com/kubernetes/node-problem-dete
* [Slides](https://docs.google.com/presentation/d/1bkJibjwWXy8YnB5fna6p-Ltiy-N5p01zUsA22wCNkXA/edit?usp=sharing)
* [Plugin Interface Proposal](https://docs.google.com/document/d/1jK_5YloSYtboj-DtfjmYKxfNnUxCAvohLnsH5aGCAYQ/edit#)
* [Addon Manifest](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/node-problem-detector)
* [Metrics Mode Proposal](https://docs.google.com/document/d/1SeaUz6kBavI283Dq8GBpoEUDrHA2a795xtw0OvjM568/edit)
* [Metrics Mode Proposal](https://docs.google.com/document/d/1SeaUz6kBavI283Dq8GBpoEUDrHA2a795xtw0OvjM568/edit)


@ -1,25 +0,0 @@
# Copyright 2018 The Kubernetes Authors. All rights reserved
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
FROM golang:1.11.0
LABEL maintainer="Andy Xie <andy.xning@gmail.com>"
ENV GOPATH /gopath/
ENV PATH $GOPATH/bin:$PATH
RUN apt-get update && apt-get --yes install libsystemd-dev
RUN go version
RUN go get github.com/tools/godep
RUN godep version
CMD ["/bin/bash"]

cloudbuild.yaml Normal file

@ -0,0 +1,26 @@
# See https://cloud.google.com/cloud-build/docs/build-config
# this must be specified in seconds. If omitted, defaults to 600s (10 mins)
timeout: 3600s
options:
# job builds a multi-arch docker image for amd64 and arm64
machineType: E2_HIGHCPU_8
steps:
- name: 'gcr.io/k8s-staging-test-infra/gcb-docker-gcloud:v20230623-56e06d7c18'
entrypoint: bash
env:
- PROW_GIT_TAG=$_GIT_TAG
- PULL_BASE_REF=$_PULL_BASE_REF
- VERSION=$_PULL_BASE_REF
- DOCKER_CLI_EXPERIMENTAL=enabled
args:
- -c
- |
echo "Building/Pushing NPD containers"
apk add musl-dev gcc
make push-container
substitutions:
# _GIT_TAG will be filled with a git-based tag for the image, of the form vYYYYMMDD-hash, and
# can be used as a substitution
_GIT_TAG: 'PLACE_HOLDER'
_PULL_BASE_REF: 'master'


@ -23,17 +23,24 @@ import (
"github.com/spf13/pflag"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/cmd/healthchecker/options"
"k8s.io/node-problem-detector/pkg/custompluginmonitor/types"
"k8s.io/node-problem-detector/pkg/healthchecker"
)
func main() {
// Set glog flag so that it does not log to files.
if err := flag.Set("logtostderr", "true"); err != nil {
fmt.Printf("Failed to set logtostderr=true: %v", err)
os.Exit(int(types.Unknown))
}
klogFlags := flag.NewFlagSet("klog", flag.ExitOnError)
klog.InitFlags(klogFlags)
klogFlags.VisitAll(func(f *flag.Flag) {
switch f.Name {
case "v", "vmodule", "logtostderr":
flag.CommandLine.Var(f.Value, f.Name, f.Usage)
}
})
pflag.CommandLine.AddGoFlagSet(flag.CommandLine)
pflag.CommandLine.MarkHidden("vmodule")
pflag.CommandLine.MarkHidden("logtostderr")
hco := options.NewHealthCheckerOptions()
hco.AddFlags(pflag.CommandLine)


@ -39,7 +39,9 @@ type HealthCheckerOptions struct {
EnableRepair bool
CriCtlPath string
CriSocketPath string
CriTimeout time.Duration
CoolDownTime time.Duration
LoopBackTime time.Duration
HealthCheckTimeout time.Duration
LogPatterns types.LogPatternFlag
}
@ -61,8 +63,12 @@ func (hco *HealthCheckerOptions) AddFlags(fs *pflag.FlagSet) {
"The path to the crictl binary. This is used to check health of cri component.")
fs.StringVar(&hco.CriSocketPath, "cri-socket-path", types.DefaultCriSocketPath,
"The path to the cri socket. Used with crictl to specify the socket path.")
fs.DurationVar(&hco.CriTimeout, "cri-timeout", types.DefaultCriTimeout,
"The duration to wait for crictl to run.")
fs.DurationVar(&hco.CoolDownTime, "cooldown-time", types.DefaultCoolDownTime,
"The duration to wait for the service to be up before attempting repair.")
fs.DurationVar(&hco.LoopBackTime, "loopback-time", types.DefaultLoopBackTime,
"The duration to loop back, if it is 0, health-check will check from start time.")
fs.DurationVar(&hco.HealthCheckTimeout, "health-check-timeout", types.DefaultHealthCheckTimeout,
"The time to wait before marking the component as unhealthy.")
fs.Var(&hco.LogPatterns, "log-pattern",


@ -1,3 +1,4 @@
//go:build journald
// +build journald
/*
@ -25,17 +26,24 @@ import (
"github.com/spf13/pflag"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/cmd/logcounter/options"
"k8s.io/node-problem-detector/pkg/custompluginmonitor/types"
"k8s.io/node-problem-detector/pkg/logcounter"
)
func main() {
// Set glog flag so that it does not log to files.
if err := flag.Set("logtostderr", "true"); err != nil {
fmt.Printf("Failed to set logtostderr=true: %v", err)
os.Exit(int(types.Unknown))
}
klogFlags := flag.NewFlagSet("klog", flag.ExitOnError)
klog.InitFlags(klogFlags)
klogFlags.VisitAll(func(f *flag.Flag) {
switch f.Name {
case "v", "vmodule", "logtostderr":
flag.CommandLine.Var(f.Value, f.Name, f.Usage)
}
})
pflag.CommandLine.AddGoFlagSet(flag.CommandLine)
pflag.CommandLine.MarkHidden("vmodule")
pflag.CommandLine.MarkHidden("logtostderr")
fedo := options.NewLogCounterOptions()
fedo.AddFlags(pflag.CommandLine)


@ -34,6 +34,7 @@ type LogCounterOptions struct {
Lookback string
Delay string
Pattern string
RevertPattern string
Count int
}
@ -46,6 +47,8 @@ func (fedo *LogCounterOptions) AddFlags(fs *pflag.FlagSet) {
"The time duration log watcher delays after node boot time. This is useful when log watcher needs to wait for some time until the node is stable.")
fs.StringVar(&fedo.Pattern, "pattern", "",
"The regular expression to match the problem in log. The pattern must match to the end of the line.")
fs.StringVar(&fedo.RevertPattern, "revert-pattern", "",
"Similar to --pattern but conversely it decreases count value for every match. This is useful to discount a log when another log occurs.")
fs.IntVar(&fedo.Count, "count", 1,
"The number of times the pattern must be found to trigger the condition")
}
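
The `--revert-pattern` help text above says each match decreases the count. One way to read that is sketched below, as an assumption about the semantics rather than the log counter's actual implementation. `countWithRevert` and the sample log lines are hypothetical, and the sketch uses unanchored matching for brevity, whereas `--pattern` is documented to match to the end of the line.

```go
package main

import (
	"fmt"
	"regexp"
)

// countWithRevert adds one for each line matching pattern and subtracts one
// for each line matching revert. An empty revert pattern disables
// discounting. A line that matches both is counted as a problem line.
func countWithRevert(lines []string, pattern, revert string) (int, error) {
	p, err := regexp.Compile(pattern)
	if err != nil {
		return 0, fmt.Errorf("invalid pattern: %w", err)
	}
	var r *regexp.Regexp
	if revert != "" {
		if r, err = regexp.Compile(revert); err != nil {
			return 0, fmt.Errorf("invalid revert pattern: %w", err)
		}
	}
	count := 0
	for _, line := range lines {
		switch {
		case p.MatchString(line):
			count++
		case r != nil && r.MatchString(line):
			count--
		}
	}
	return count, nil
}

func main() {
	// Hypothetical log lines: two problems, one of which is later resolved.
	lines := []string{
		"device eth0: waiting for release",
		"device eth1: waiting for release",
		"device eth0: released",
	}
	n, err := countWithRevert(lines, `waiting for release`, `released$`)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("net count:", n) // prints "net count: 1"
}
```

Under this reading, the condition would trigger only when the net count reaches `--count`, so a problem log that is followed by its "resolved" counterpart no longer contributes.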


@ -1,3 +1,4 @@
//go:build !disable_stackdriver_exporter
// +build !disable_stackdriver_exporter
/*


@ -17,7 +17,9 @@ limitations under the License.
package main
import (
"github.com/golang/glog"
"context"
"k8s.io/klog/v2"
_ "k8s.io/node-problem-detector/cmd/nodeproblemdetector/exporterplugins"
_ "k8s.io/node-problem-detector/cmd/nodeproblemdetector/problemdaemonplugins"
@ -31,16 +33,7 @@ import (
"k8s.io/node-problem-detector/pkg/version"
)
func npdInteractive(npdo *options.NodeProblemDetectorOptions) {
termCh := make(chan error, 1)
defer close(termCh)
if err := npdMain(npdo, termCh); err != nil {
glog.Fatalf("Problem detector failed with error: %v", err)
}
}
func npdMain(npdo *options.NodeProblemDetectorOptions, termCh <-chan error) error {
func npdMain(ctx context.Context, npdo *options.NodeProblemDetectorOptions) error {
if npdo.PrintVersion {
version.PrintVersion()
return nil
@ -53,18 +46,18 @@ func npdMain(npdo *options.NodeProblemDetectorOptions, termCh <-chan error) erro
// Initialize problem daemons.
problemDaemons := problemdaemon.NewProblemDaemons(npdo.MonitorConfigPaths)
if len(problemDaemons) == 0 {
glog.Fatalf("No problem daemon is configured")
klog.Fatalf("No problem daemon is configured")
}
// Initialize exporters.
defaultExporters := []types.Exporter{}
if ke := k8sexporter.NewExporterOrDie(npdo); ke != nil {
if ke := k8sexporter.NewExporterOrDie(ctx, npdo); ke != nil {
defaultExporters = append(defaultExporters, ke)
glog.Info("K8s exporter started.")
klog.Info("K8s exporter started.")
}
if pe := prometheusexporter.NewExporterOrDie(npdo); pe != nil {
defaultExporters = append(defaultExporters, pe)
glog.Info("Prometheus exporter started.")
klog.Info("Prometheus exporter started.")
}
plugableExporters := exporters.NewExporters()
@ -74,10 +67,10 @@ func npdMain(npdo *options.NodeProblemDetectorOptions, termCh <-chan error) erro
npdExporters = append(npdExporters, plugableExporters...)
if len(npdExporters) == 0 {
glog.Fatalf("No exporter is successfully setup")
klog.Fatalf("No exporter is successfully setup")
}
// Initialize NPD core.
p := problemdetector.NewProblemDetector(problemDaemons, npdExporters)
return p.Run(termCh)
return p.Run(ctx)
}

@ -1,30 +0,0 @@
/*
Copyright 2021 The Kubernetes Authors All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package main
import (
"github.com/spf13/pflag"
"k8s.io/node-problem-detector/cmd/options"
)
func main() {
npdo := options.NewNodeProblemDetectorOptions()
npdo.AddFlags(pflag.CommandLine)
pflag.Parse()
npdInteractive(npdo)
}

@ -1,3 +1,4 @@
//go:build !disable_system_log_monitor
// +build !disable_system_log_monitor
/*
@ -19,9 +20,8 @@ limitations under the License.
package main
import (
"errors"
"context"
"fmt"
"io/ioutil"
"os"
"strings"
"testing"
@ -81,24 +81,22 @@ func TestNPDMain(t *testing.T) {
npdo, cleanup := setupNPD(t)
defer cleanup()
termCh := make(chan error, 2)
termCh <- errors.New("close")
defer close(termCh)
if err := npdMain(npdo, termCh); err != nil {
ctx, cancelFunc := context.WithCancel(context.Background())
cancelFunc()
if err := npdMain(ctx, npdo); err != nil {
t.Errorf("termination signal should not return an error, got %v", err)
}
}
func writeTempFile(t *testing.T, ext string, contents string) (string, error) {
f, err := ioutil.TempFile("", "*."+ext)
f, err := os.CreateTemp("", "*."+ext)
if err != nil {
return "", fmt.Errorf("cannot create temp file, %v", err)
}
fileName := f.Name()
if err := ioutil.WriteFile(fileName, []byte(contents), 0644); err != nil {
if err := os.WriteFile(fileName, []byte(contents), 0644); err != nil {
os.Remove(fileName)
return "", fmt.Errorf("cannot write config to temp file %s, %v", fileName, err)
}

@ -0,0 +1,50 @@
//go:build unix
/*
Copyright 2021 The Kubernetes Authors All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package main
import (
"context"
"flag"
"github.com/spf13/pflag"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/cmd/options"
)
func main() {
klogFlags := flag.NewFlagSet("klog", flag.ExitOnError)
klog.InitFlags(klogFlags)
klogFlags.VisitAll(func(f *flag.Flag) {
switch f.Name {
case "v", "vmodule", "logtostderr":
flag.CommandLine.Var(f.Value, f.Name, f.Usage)
}
})
pflag.CommandLine.AddGoFlagSet(flag.CommandLine)
pflag.CommandLine.MarkHidden("vmodule")
pflag.CommandLine.MarkHidden("logtostderr")
npdo := options.NewNodeProblemDetectorOptions()
npdo.AddFlags(pflag.CommandLine)
pflag.Parse()
if err := npdMain(context.Background(), npdo); err != nil {
klog.Fatalf("Problem detector failed with error: %v", err)
}
}

@ -17,16 +17,17 @@ limitations under the License.
package main
import (
"errors"
"context"
"flag"
"fmt"
"sync"
"time"
"github.com/golang/glog"
"github.com/spf13/pflag"
"golang.org/x/sys/windows/svc"
"golang.org/x/sys/windows/svc/debug"
"golang.org/x/sys/windows/svc/eventlog"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/cmd/options"
)
@ -43,6 +44,18 @@ var (
)
func main() {
klogFlags := flag.NewFlagSet("klog", flag.ExitOnError)
klog.InitFlags(klogFlags)
klogFlags.VisitAll(func(f *flag.Flag) {
switch f.Name {
case "v", "vmodule", "logtostderr":
flag.CommandLine.Var(f.Value, f.Name, f.Usage)
}
})
pflag.CommandLine.AddGoFlagSet(flag.CommandLine)
pflag.CommandLine.MarkHidden("vmodule")
pflag.CommandLine.MarkHidden("logtostderr")
npdo := options.NewNodeProblemDetectorOptions()
npdo.AddFlags(pflag.CommandLine)
@ -62,7 +75,7 @@ func main() {
func isRunningAsWindowsService() bool {
runningAsService, err := svc.IsWindowsService()
if err != nil {
glog.Errorf("cannot determine if running as Windows Service assuming standalone, %v", err)
klog.Errorf("cannot determine if running as Windows Service assuming standalone, %v", err)
return false
}
return runningAsService
@ -102,26 +115,20 @@ type npdService struct {
}
func (s *npdService) Execute(args []string, r <-chan svc.ChangeRequest, changes chan<- svc.Status) (bool, uint32) {
appTermCh := make(chan error, 1)
svcLoopTermCh := make(chan error, 1)
defer func() {
close(appTermCh)
close(svcLoopTermCh)
}()
changes <- svc.Status{State: svc.StartPending}
changes <- svc.Status{State: svc.Running, Accepts: svcCommandsAccepted}
var appWG sync.WaitGroup
var svcWG sync.WaitGroup
options := s.options
ctx, cancelFunc := context.WithCancel(context.Background())
// NPD application goroutine.
appWG.Add(1)
go func() {
defer appWG.Done()
if err := npdMain(options, appTermCh); err != nil {
if err := npdMain(ctx, options); err != nil {
elog.Warning(windowsEventLogID, err.Error())
}
@ -132,16 +139,36 @@ func (s *npdService) Execute(args []string, r <-chan svc.ChangeRequest, changes
svcWG.Add(1)
go func() {
defer svcWG.Done()
serviceLoop(r, changes, appTermCh, svcLoopTermCh)
for {
select {
case <-ctx.Done():
return
case c := <-r:
switch c.Cmd {
case svc.Interrogate:
changes <- c.CurrentStatus
// Testing deadlock from https://code.google.com/p/winsvc/issues/detail?id=4
time.Sleep(100 * time.Millisecond)
changes <- c.CurrentStatus
case svc.Stop, svc.Shutdown:
elog.Info(windowsEventLogID, fmt.Sprintf("Stopping %s service, %v", svcName, c.Context))
cancelFunc()
case svc.Pause:
elog.Info(windowsEventLogID, "ignoring pause command from Windows service control, not supported")
changes <- svc.Status{State: svc.Paused, Accepts: svcCommandsAccepted}
case svc.Continue:
elog.Info(windowsEventLogID, "ignoring continue command from Windows service control, not supported")
changes <- svc.Status{State: svc.Running, Accepts: svcCommandsAccepted}
default:
elog.Error(windowsEventLogID, fmt.Sprintf("unexpected control request #%d", c))
}
}
}
}()
// Wait for the application go routine to die.
appWG.Wait()
// Ensure that the service control loop is killed.
svcLoopTermCh <- nil
// Wait for the service control loop to terminate.
// Otherwise it's possible that the channel closures cause the application to panic.
svcWG.Wait()
@ -151,31 +178,3 @@ func (s *npdService) Execute(args []string, r <-chan svc.ChangeRequest, changes
return false, uint32(0)
}
func serviceLoop(r <-chan svc.ChangeRequest, changes chan<- svc.Status, appTermCh chan error, svcLoopTermCh chan error) {
for {
select {
case <-svcLoopTermCh:
return
case c := <-r:
switch c.Cmd {
case svc.Interrogate:
changes <- c.CurrentStatus
// Testing deadlock from https://code.google.com/p/winsvc/issues/detail?id=4
time.Sleep(100 * time.Millisecond)
changes <- c.CurrentStatus
case svc.Stop, svc.Shutdown:
elog.Info(windowsEventLogID, fmt.Sprintf("Stopping %s service, %v", svcName, c.Context))
appTermCh <- errors.New("stopping service")
case svc.Pause:
elog.Info(windowsEventLogID, "ignoring pause command from Windows service control, not supported")
changes <- svc.Status{State: svc.Paused, Accepts: svcCommandsAccepted}
case svc.Continue:
elog.Info(windowsEventLogID, "ignoring continue command from Windows service control, not supported")
changes <- svc.Status{State: svc.Running, Accepts: svcCommandsAccepted}
default:
elog.Error(windowsEventLogID, fmt.Sprintf("unexpected control request #%d", c))
}
}
}
}

@ -1,3 +1,4 @@
//go:build !disable_system_log_monitor
// +build !disable_system_log_monitor
/*

@ -1,3 +1,4 @@
//go:build !disable_custom_plugin_monitor
// +build !disable_custom_plugin_monitor
/*

@ -1,3 +1,4 @@
//go:build !disable_system_log_monitor
// +build !disable_system_log_monitor
/*

@ -1,3 +1,4 @@
//go:build !disable_system_stats_monitor
// +build !disable_system_stats_monitor
/*

@ -43,6 +43,10 @@ type NodeProblemDetectorOptions struct {
ServerPort int
// ServerAddress is the address to bind the node problem detector server.
ServerAddress string
// QPS is the maximum QPS to the master from client.
QPS float32
// Burst is the maximum burst for throttle.
Burst int
// exporter options
@ -61,6 +65,10 @@ type NodeProblemDetectorOptions struct {
APIServerWaitInterval time.Duration
// K8sExporterHeartbeatPeriod is the period at which the k8s exporter does forcibly sync with apiserver.
K8sExporterHeartbeatPeriod time.Duration
// K8sExporterWriteEvents determines whether to write Kubernetes Events for problems.
K8sExporterWriteEvents bool
// K8sExporterUpdateNodeConditions determines whether to update Kubernetes Node Conditions for problems.
K8sExporterUpdateNodeConditions bool
// prometheusExporter options
// PrometheusServerPort is the port to bind the Prometheus scrape endpoint. Use 0 to disable.
@ -113,6 +121,8 @@ func (npdo *NodeProblemDetectorOptions) AddFlags(fs *pflag.FlagSet) {
fs.DurationVar(&npdo.APIServerWaitTimeout, "apiserver-wait-timeout", time.Duration(5)*time.Minute, "The timeout on waiting for kube-apiserver to be ready. This is ignored if --enable-k8s-exporter is false.")
fs.DurationVar(&npdo.APIServerWaitInterval, "apiserver-wait-interval", time.Duration(5)*time.Second, "The interval between the checks on the readiness of kube-apiserver. This is ignored if --enable-k8s-exporter is false.")
fs.DurationVar(&npdo.K8sExporterHeartbeatPeriod, "k8s-exporter-heartbeat-period", 5*time.Minute, "The period at which k8s-exporter does forcibly sync with apiserver.")
fs.BoolVar(&npdo.K8sExporterWriteEvents, "k8s-exporter-write-events", true, "Whether to write Kubernetes Event objects with event details.")
fs.BoolVar(&npdo.K8sExporterUpdateNodeConditions, "k8s-exporter-update-node-conditions", true, "Whether to update Kubernetes Node conditions with event details.")
fs.BoolVar(&npdo.PrintVersion, "version", false, "Print version information and quit")
fs.StringVar(&npdo.HostnameOverride, "hostname-override",
"", "Custom node name used to override hostname")
@ -125,6 +135,8 @@ func (npdo *NodeProblemDetectorOptions) AddFlags(fs *pflag.FlagSet) {
20257, "The port to bind the Prometheus scrape endpoint. Prometheus exporter is enabled by default at port 20257. Use 0 to disable.")
fs.StringVar(&npdo.PrometheusServerAddress, "prometheus-address",
"127.0.0.1", "The address to bind the Prometheus scrape endpoint.")
fs.Float32Var(&npdo.QPS, "kube-api-qps", 500, "Maximum QPS to use while talking with Kubernetes API")
fs.IntVar(&npdo.Burst, "kube-api-burst", 500, "Maximum burst for throttle while talking with Kubernetes API")
for _, exporterName := range exporters.GetExporterNames() {
exporterHandler := exporters.GetExporterHandlerOrDie(exporterName)
exporterHandler.Options.SetFlags(fs)

@ -31,7 +31,7 @@
},
{
"type": "temporary",
"reason": "Kerneloops",
"reason": "KernelOops",
"pattern": "System encountered a non-fatal error in \\S+"
}
]

@ -0,0 +1,28 @@
{
"plugin": "filelog",
"pluginConfig": {
"timestamp": "^.{15}",
"message": "(?i)Currently unreadable.*sectors|(?i)Offline uncorrectable sectors",
"timestampFormat": "Jan _2 15:04:05"
},
"logPath": "/var/log/messages",
"lookback": "10h",
"bufferSize": 1,
"source": "disk-monitor",
"skipList": [ " audit:", " audit[" ],
"conditions": [
{
"type": "DiskBadBlock",
"reason": "DiskBadBlock",
"message": "Disk no bad block"
}
],
"rules": [
{
"type": "permanent",
"condition": "DiskBadBlock",
"reason": "DiskBadBlock",
"pattern": ".*([1-9]\\d{2,}) (Currently unreadable.*sectors|Offline uncorrectable sectors).*"
}
]
}

@ -25,6 +25,7 @@
"--component=kubelet",
"--enable-repair=true",
"--cooldown-time=1m",
"--loopback-time=0",
"--health-check-timeout=10s"
],
"timeout": "3m"

@ -0,0 +1,20 @@
{
"plugin": "custom",
"pluginConfig": {
"invoke_interval": "86400s",
"timeout": "5s",
"max_output_length": 80,
"concurrency": 1
},
"source": "iptables-mode-monitor",
"metricsReporting": true,
"conditions": [],
"rules": [
{
"type": "temporary",
"reason": "IPTablesVersionsMismatch",
"path": "./config/plugin/iptables_mode.sh",
"timeout": "5s"
}
]
}

@ -42,12 +42,6 @@
"reason": "KernelOops",
"pattern": "divide error: 0000 \\[#\\d+\\] SMP"
},
{
"type": "permanent",
"condition": "KernelDeadlock",
"reason": "AUFSUmountHung",
"pattern": "task umount\\.aufs:\\w+ blocked for more than \\w+ seconds\\."
},
{
"type": "permanent",
"condition": "KernelDeadlock",

@ -12,9 +12,14 @@
"message": "kernel has no deadlock"
},
{
"type": "ReadonlyFilesystem",
"reason": "FilesystemIsNotReadOnly",
"message": "Filesystem is not read-only"
"type": "XfsShutdown",
"reason": "XfsHasNotShutDown",
"message": "XFS has not shutdown"
},
{
"type": "CperHardwareErrorFatal",
"reason": "CperHardwareHasNoFatalError",
"message": "UEFI CPER has no fatal error"
}
],
"rules": [
@ -58,28 +63,38 @@
"reason": "IOError",
"pattern": "Buffer I/O error .*"
},
{
"type": "permanent",
"condition": "XfsShutdown",
"reason": "XfsHasShutdown",
"pattern": "XFS .* Shutting down filesystem.?"
},
{
"type": "temporary",
"reason": "MemoryReadError",
"pattern": "CE memory read error .*"
},
{
"type": "temporary",
"reason": "CperHardwareErrorCorrected",
"pattern": ".*\\[Hardware Error\\]: event severity: corrected$"
},
{
"type": "temporary",
"reason": "CperHardwareErrorRecoverable",
"pattern": ".*\\[Hardware Error\\]: event severity: recoverable$"
},
{
"type": "permanent",
"condition": "KernelDeadlock",
"reason": "AUFSUmountHung",
"pattern": "task umount\\.aufs:\\w+ blocked for more than \\w+ seconds\\."
"condition": "CperHardwareErrorFatal",
"reason": "CperHardwareErrorFatal",
"pattern": ".*\\[Hardware Error\\]: event severity: fatal$"
},
{
"type": "permanent",
"condition": "KernelDeadlock",
"reason": "DockerHung",
"pattern": "task docker:\\w+ blocked for more than \\w+ seconds\\."
},
{
"type": "permanent",
"condition": "ReadonlyFilesystem",
"reason": "FilesystemIsReadOnly",
"pattern": "Remounting filesystem read-only"
}
]
}

@ -1,5 +1,6 @@
{
"net": {
"excludeInterfaceRegexp": "^(cali|tunl|veth)",
"metricsConfigs": {
"net/rx_bytes": {
"displayName": "net/rx_bytes"

@ -20,8 +20,7 @@ if systemctl -q is-active "$SERVICE"; then
echo "$SERVICE is running"
exit $OK
else
# Does not differenciate stopped/failed service from non-existent
# Does not differentiate stopped/failed service from non-existent
echo "$SERVICE is not running"
exit $NONOK
fi

config/plugin/iptables_mode.sh Executable file

@ -0,0 +1,30 @@
#!/bin/bash
# As of iptables 1.8, the iptables command line clients come in two different versions/modes: "legacy",
# which uses the kernel iptables API just like iptables 1.6 and earlier did, and "nft", which translates
# the iptables command-line API into the kernel nftables API.
# Because they connect to two different subsystems in the kernel, you cannot mix rules from different versions.
# Ref: https://github.com/kubernetes-sigs/iptables-wrappers
readonly OK=0
readonly NONOK=1
readonly UNKNOWN=2
# based on: https://github.com/kubernetes-sigs/iptables-wrappers/blob/97b01f43a8e8db07840fc4b95e833a37c0d36b12/iptables-wrapper-installer.sh
readonly num_legacy_lines=$( (iptables-legacy-save || true; ip6tables-legacy-save || true) 2>/dev/null | grep -c '^-' || true)
readonly num_nft_lines=$( (timeout 5 sh -c "iptables-nft-save; ip6tables-nft-save" || true) 2>/dev/null | grep -c '^-' || true)
if [ "$num_legacy_lines" -gt 0 ] && [ "$num_nft_lines" -gt 0 ]; then
echo "Found rules from both versions, iptables-legacy: ${num_legacy_lines} iptables-nft: ${num_nft_lines}"
exit $NONOK
elif [ "$num_legacy_lines" -gt 0 ] && [ "$num_nft_lines" -eq 0 ]; then
echo "Using iptables-legacy: ${num_legacy_lines} rules"
exit $OK
elif [ "$num_legacy_lines" -eq 0 ] && [ "$num_nft_lines" -gt 0 ]; then
echo "Using iptables-nft: ${num_nft_lines} rules"
exit $OK
else
echo "No iptables rules found"
exit $UNKNOWN
fi

@ -0,0 +1,23 @@
{
"plugin": "kmsg",
"logPath": "/dev/kmsg",
"lookback": "5m",
"bufferSize": 10,
"source": "readonly-monitor",
"metricsReporting": true,
"conditions": [
{
"type": "ReadonlyFilesystem",
"reason": "FilesystemIsNotReadOnly",
"message": "Filesystem is not read-only"
}
],
"rules": [
{
"type": "permanent",
"condition": "ReadonlyFilesystem",
"reason": "FilesystemIsReadOnly",
"pattern": "Remounting filesystem read-only"
}
]
}

@ -44,6 +44,9 @@
"disk/bytes_used": {
"displayName": "disk/bytes_used"
},
"disk/percent_used": {
"displayName": "disk/percent_used"
},
"disk/io_time": {
"displayName": "disk/io_time"
},
@ -88,6 +91,9 @@
},
"memory/unevictable_used": {
"displayName": "memory/unevictable_used"
},
"memory/percent_used": {
"displayName": "memory/percent_used"
}
}
},

@ -37,7 +37,8 @@
"--lookback=20m",
"--delay=5m",
"--count=5",
"--pattern=Started Kubernetes kubelet."
"--pattern=Started (Kubernetes kubelet|kubelet.service|kubelet.service - Kubernetes kubelet).",
"--revert-pattern=Stopping (Kubernetes kubelet|kubelet.service|kubelet.service - Kubernetes kubelet)..."
],
"timeout": "1m"
},
@ -51,7 +52,8 @@
"--log-path=/var/log/journal",
"--lookback=20m",
"--count=5",
"--pattern=Starting Docker Application Container Engine..."
"--pattern=Starting (Docker Application Container Engine|docker.service|docker.service - Docker Application Container Engine)...",
"--revert-pattern=Stopping (Docker Application Container Engine|docker.service|docker.service - Docker Application Container Engine)..."
],
"timeout": "1m"
},
@ -65,7 +67,8 @@
"--log-path=/var/log/journal",
"--lookback=20m",
"--count=5",
"--pattern=Starting containerd container runtime..."
"--pattern=Starting (containerd container runtime|containerd.service|containerd.service - containerd container runtime)...",
"--revert-pattern=Stopping (containerd container runtime|containerd.service|containerd.service - containerd container runtime)..."
],
"timeout": "1m"
}

@ -13,17 +13,17 @@
{
"type": "temporary",
"reason": "KubeletStart",
"pattern": "Started Kubernetes kubelet."
"pattern": "Started (Kubernetes kubelet|kubelet.service|kubelet.service - Kubernetes kubelet)."
},
{
"type": "temporary",
"reason": "DockerStart",
"pattern": "Starting Docker Application Container Engine..."
"pattern": "Starting (Docker Application Container Engine|docker.service|docker.service - Docker Application Container Engine)..."
},
{
"type": "temporary",
"reason": "ContainerdStart",
"pattern": "Starting containerd container runtime..."
"pattern": "Starting (containerd container runtime|containerd.service|containerd.service - containerd container runtime)..."
}
]
}
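The widened patterns above accept several systemd unit descriptions. A small sketch of how the `KubeletStart` pattern behaves, assuming it is evaluated as an unanchored search with Go's `regexp` package; the log lines are illustrative, not captured from a real journal.

```go
package main

import (
	"fmt"
	"regexp"
)

// kubeletStart is the KubeletStart pattern from the rule above, copied verbatim.
var kubeletStart = regexp.MustCompile(`Started (Kubernetes kubelet|kubelet.service|kubelet.service - Kubernetes kubelet).`)

func main() {
	// Illustrative log lines (assumed, not taken from a real system).
	lines := []string{
		"Started Kubernetes kubelet.",
		"Started kubelet.service - Kubernetes kubelet.",
		"Started kubelet.service.",
		"Stopping Kubernetes kubelet...",
	}
	for _, l := range lines {
		fmt.Printf("%-50q %v\n", l, kubeletStart.MatchString(l))
	}
}
```

The first three lines match and the last does not. Because the dots in the pattern are unescaped, they match any character, so the pattern is slightly more permissive than the literal text suggests.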

@ -8,7 +8,7 @@ Restart=always
RestartSec=10
ExecStart=/home/kubernetes/bin/node-problem-detector --v=2 --logtostderr --enable-k8s-exporter=false \
--exporter.stackdriver=/home/kubernetes/node-problem-detector/config/exporter/stackdriver-exporter.json \
--config.system-log-monitor=/home/kubernetes/node-problem-detector/config/kernel-monitor.json,/home/kubernetes/node-problem-detector/config/docker-monitor.json,/home/kubernetes/node-problem-detector/config/systemd-monitor.json \
--config.system-log-monitor=/home/kubernetes/node-problem-detector/config/kernel-monitor.json,/home/kubernetes/node-problem-detector/config/readonly-monitor.json,/home/kubernetes/node-problem-detector/config/docker-monitor.json,/home/kubernetes/node-problem-detector/config/systemd-monitor.json \
--config.custom-plugin-monitor=/home/kubernetes/node-problem-detector/config/kernel-monitor-counter.json,/home/kubernetes/node-problem-detector/config/systemd-monitor-counter.json \
--config.system-stats-monitor=/home/kubernetes/node-problem-detector/config/system-stats-monitor.json,/home/kubernetes/node-problem-detector/config/net-cgroup-system-stats-monitor.json

@ -20,6 +20,11 @@
"type": "temporary",
"reason": "CorruptContainerImageLayer",
"pattern": ".*failed to pull and unpack image.*failed to extract layer.*archive/tar: invalid tar header.*"
},
{
"type": "temporary",
"reason": "HCSEmptyLayerchain",
"pattern": ".*Failed to unmarshall layerchain json - invalid character '\\x00' looking for beginning of value*"
}
]
}

@ -13,7 +13,7 @@
{
"type": "temporary",
"reason": "WindowsDefenderThreatsDetected",
"path": "./config/plugin/windows_defender_problem.ps1",
"path": "C:\\etc\\kubernetes\\node-problem-detector\\config\\plugin\\windows_defender_problem.ps1",
"timeout": "3s"
}
]

@ -44,6 +44,9 @@
"disk/bytes_used": {
"displayName": "disk/bytes_used"
},
"disk/percent_used": {
"displayName": "disk/percent_used"
},
"disk/io_time": {
"displayName": "disk/io_time"
},
@ -88,6 +91,9 @@
},
"memory/unevictable_used": {
"displayName": "memory/unevictable_used"
},
"memory/percent_used": {
"displayName": "memory/percent_used"
}
}
}

@ -50,12 +50,6 @@ data:
"reason": "MemoryReadError",
"pattern": "CE memory read error .*"
},
{
"type": "permanent",
"condition": "KernelDeadlock",
"reason": "AUFSUmountHung",
"pattern": "task umount\\.aufs:\\w+ blocked for more than \\w+ seconds\\."
},
{
"type": "permanent",
"condition": "KernelDeadlock",
@ -70,6 +64,30 @@ data:
}
]
}
readonly-monitor.json: |
{
"plugin": "kmsg",
"logPath": "/dev/kmsg",
"lookback": "5m",
"bufferSize": 10,
"source": "readonly-monitor",
"metricsReporting": true,
"conditions": [
{
"type": "ReadonlyFilesystem",
"reason": "FilesystemIsNotReadOnly",
"message": "Filesystem is not read-only"
}
],
"rules": [
{
"type": "permanent",
"condition": "ReadonlyFilesystem",
"reason": "FilesystemIsReadOnly",
"pattern": "Remounting filesystem read-only"
}
]
}
docker-monitor.json: |
{
"plugin": "journald",

@ -0,0 +1,104 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-problem-detector
namespace: kube-system
labels:
app: node-problem-detector
spec:
selector:
matchLabels:
app: node-problem-detector
template:
metadata:
labels:
app: node-problem-detector
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
containers:
- name: node-problem-detector
command:
- /node-problem-detector
- --logtostderr
- --config.system-log-monitor=/config/kernel-monitor.json,/config/readonly-monitor.json,/config/docker-monitor.json
- --config.custom-plugin-monitor=/config/health-checker-kubelet.json
image: registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.19
resources:
limits:
cpu: 10m
memory: 80Mi
requests:
cpu: 10m
memory: 80Mi
imagePullPolicy: Always
securityContext:
privileged: true
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
volumeMounts:
- name: log
mountPath: /var/log
readOnly: true
- name: kmsg
mountPath: /dev/kmsg
readOnly: true
# Make sure node problem detector is in the same timezone
# with the host.
- name: localtime
mountPath: /etc/localtime
readOnly: true
- name: config
mountPath: /config
readOnly: true
- mountPath: /etc/machine-id
name: machine-id
readOnly: true
- mountPath: /run/systemd/system
name: systemd
- mountPath: /var/run/dbus/
name: dbus
mountPropagation: Bidirectional
volumes:
- name: log
# Config `log` to your system log directory
hostPath:
path: /var/log/
- name: kmsg
hostPath:
path: /dev/kmsg
- name: localtime
hostPath:
path: /etc/localtime
- name: config
configMap:
name: node-problem-detector-config
items:
- key: kernel-monitor.json
path: kernel-monitor.json
- key: readonly-monitor.json
path: readonly-monitor.json
- key: docker-monitor.json
path: docker-monitor.json
- name: machine-id
hostPath:
path: /etc/machine-id
type: "File"
- name: systemd
hostPath:
path: /run/systemd/system/
type: ""
- name: dbus
hostPath:
path: /var/run/dbus/
type: ""

@ -28,8 +28,8 @@ spec:
command:
- /node-problem-detector
- --logtostderr
- --config.system-log-monitor=/config/kernel-monitor.json,/config/docker-monitor.json
image: k8s.gcr.io/node-problem-detector/node-problem-detector:v0.8.7
- --config.system-log-monitor=/config/kernel-monitor.json,/config/readonly-monitor.json,/config/docker-monitor.json
image: registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.19
resources:
limits:
cpu: 10m
@ -60,6 +60,7 @@ spec:
- name: config
mountPath: /config
readOnly: true
serviceAccountName: node-problem-detector
volumes:
- name: log
# Config `log` to your system log directory
@ -77,6 +78,8 @@ spec:
items:
- key: kernel-monitor.json
path: kernel-monitor.json
- key: readonly-monitor.json
path: readonly-monitor.json
- key: docker-monitor.json
path: docker-monitor.json
tolerations:

deployment/rbac.yaml Normal file

@ -0,0 +1,19 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: node-problem-detector
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: npd-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:node-problem-detector
subjects:
- kind: ServiceAccount
name: node-problem-detector
namespace: kube-system

@ -1,9 +1,62 @@
# Custom Plugin Monitor
## Configuration
### Plugin Config
* `invoke_interval`: Interval at which custom plugins will be invoked.
* `timeout`: Time after which custom plugins invokation will be terminated and considered timeout.
* `timeout`: Time after which custom plugins invocation will be terminated and considered timeout.
* `max_output_length`: The maximum length of a custom plugin's standard output. NPD truncates the output to this length and uses it as the condition status message.
* `concurrency`: The plugin worker number, i.e., how many custom plugins will be invoked concurrently.
* `enable_message_change_based_condition_update`: Flag controls whether message change should result in a condition update.
* `enable_message_change_based_condition_update`: Flag controls whether message change should result in a condition update.
* `skip_initial_status`: Flag controls whether condition will be emitted during plugin initialization.
### Annotated Plugin Configuration Example
```
{
"plugin": "custom",
"pluginConfig": {
"invoke_interval": "30s",
"timeout": "5s",
"max_output_length": 80,
"concurrency": 3,
"enable_message_change_based_condition_update": false
},
"source": "ntp-custom-plugin-monitor",
"metricsReporting": true,
"conditions": [
{
"type": "NTPProblem",
"reason": "NTPIsUp", // This is the default reason shown when healthy
"message": "ntp service is up" // This is the default message shown when healthy
}
],
"rules": [
{
"type": "temporary", // These are not shown unless there's an
// event so they always relate to a problem.
// There are no defaults since there is nothing
// to show unless there's a problem.
"reason": "NTPIsDown", // This is the reason shown for this event
// and the message shown comes from stdout.
"path": "./config/plugin/check_ntp.sh",
"timeout": "3s"
},
{
"type": "permanent", // These are permanent and are shown in the Conditions section
// when running `kubectl describe node ...`
// They have default values shown above in the conditions section
// and also a reason for each specific trigger listed in this rules section.
// Message will come from default for healthy times
// and during unhealthy time message comes from stdout of the check.
"condition": "NTPProblem", // This is the key to connect to the corresponding condition listed above
"reason": "NTPIsDown", // and the reason shown for failures detected in this rule
// and message will be from stdout of the check.
"path": "./config/plugin/check_ntp.sh",
"timeout": "3s"
}
]
}
```
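The annotated example ties each `permanent` rule to a condition through the `condition` key. A minimal sketch of checking that link before deploying a config is shown here; the struct types and `checkRules` are hypothetical and are not NPD's own definitions, and the sample is a comment-free subset of the example because `encoding/json` does not accept `//` comments.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types for illustration; these are not NPD's own definitions.
type condition struct {
	Type    string `json:"type"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

type rule struct {
	Type      string `json:"type"`
	Condition string `json:"condition"`
	Reason    string `json:"reason"`
	Path      string `json:"path"`
	Timeout   string `json:"timeout"`
}

type monitorConfig struct {
	Plugin     string      `json:"plugin"`
	Source     string      `json:"source"`
	Conditions []condition `json:"conditions"`
	Rules      []rule      `json:"rules"`
}

// checkRules decodes a config and reports the first permanent rule whose
// condition is not declared in the conditions list.
func checkRules(raw []byte) error {
	var cfg monitorConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("decode config: %w", err)
	}
	declared := map[string]bool{}
	for _, c := range cfg.Conditions {
		declared[c.Type] = true
	}
	for i, r := range cfg.Rules {
		if r.Type == "permanent" && !declared[r.Condition] {
			return fmt.Errorf("rule %d: condition %q is not declared", i, r.Condition)
		}
	}
	return nil
}

// sample is a comment-free subset of the annotated example.
const sample = `{
  "plugin": "custom",
  "source": "ntp-custom-plugin-monitor",
  "conditions": [
    {"type": "NTPProblem", "reason": "NTPIsUp", "message": "ntp service is up"}
  ],
  "rules": [
    {"type": "temporary", "reason": "NTPIsDown", "path": "./config/plugin/check_ntp.sh", "timeout": "3s"},
    {"type": "permanent", "condition": "NTPProblem", "reason": "NTPIsDown", "path": "./config/plugin/check_ntp.sh", "timeout": "3s"}
  ]
}`

func main() {
	fmt.Println(checkRules([]byte(sample)))
}
```

Decoding with `encoding/json` also surfaces syntax problems such as trailing commas, which the decoder rejects.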

@ -4,6 +4,12 @@ These are notes to help follow a consistent release process. See something
important missing? Please submit a pull request to add anything else that would
be useful!
## Prerequisites
Ensure access to the container image [staging registry](https://console.cloud.google.com/gcr/images/k8s-staging-npd/global/node-problem-detector).
Add email to `k8s-infra-staging-npd` group in sig-node [groups.yaml](https://github.com/kubernetes/k8s.io/blob/main/groups/sig-node/groups.yaml).
See example https://github.com/kubernetes/k8s.io/pull/1599.
## Preparing for a release
There are a few steps that should be taken prior to creating the actual release
@ -11,37 +17,100 @@ itself.
1. Collect changes since last release. This can be done by looking directly at
merged commit messages (``git log [last_release_tag]...HEAD``), or by
viewing the changes on GitHub ([example:
https://github.com/kubernetes/node-problem-detector/compare/v0.8.6...master](https://github.com/kubernetes/node-problem-detector/compare/v0.8.6...master)).
viewing the changes on GitHub (example: https://github.com/kubernetes/node-problem-detector/compare/v0.8.15...master).
1. Based on the changes to be included in the release, determine what the next
2. Based on the changes to be included in the release, determine what the next
release number should be. We strive to follow [SemVer](https://semver.org/)
as much as possible.
1. Update [CHANGELOG](https://github.com/kubernetes/node-problem-detector/blob/master/CHANGELOG.md)
3. Update [CHANGELOG](https://github.com/kubernetes/node-problem-detector/blob/master/CHANGELOG.md)
with all significant changes.
## Create release
Once changes have been merged to the CHANGELOG, perform the actual release via
GitHub. When creating the release, make sure to include the following in the
body of the release:
### Create the new version tag
#### Option 1
```
# Use v0.8.17 as an example.
git clone git@github.com:kubernetes/node-problem-detector.git
cd node-problem-detector/
git tag v0.8.17
git push origin v0.8.17
```
#### Option 2
Update [version.txt](https://github.com/kubernetes/node-problem-detector/blob/master/version.txt)
(example https://github.com/kubernetes/node-problem-detector/pull/869).
### Build and push artifacts
This step builds NPD into a container image and tar files.
- The container image is pushed to the [staging registry](https://console.cloud.google.com/gcr/images/k8s-staging-npd/global/node-problem-detector).
You will promote the new image to registry.k8s.io later.
- The tar files are generated locally. You will upload those to GitHub in the
release note later.
**Note: You need the access mentioned in the [prerequisites](#prerequisites)
section to perform steps in this section.**
```
# One-time setup
sudo apt-get install libsystemd-dev gcc-aarch64-linux-gnu
cd node-problem-detector
make release
# Get SHA256 of the tar files. For example
sha256sum node-problem-detector-v0.8.17-linux_amd64.tar.gz
sha256sum node-problem-detector-v0.8.17-linux_arm64.tar.gz
sha256sum node-problem-detector-v0.8.17-windows_amd64.tar.gz
# Get MD5 of the tar files. For example
md5sum node-problem-detector-v0.8.17-linux_amd64.tar.gz
md5sum node-problem-detector-v0.8.17-linux_arm64.tar.gz
md5sum node-problem-detector-v0.8.17-windows_amd64.tar.gz
# Verify container image in staging registry and get SHA256.
docker pull gcr.io/k8s-staging-npd/node-problem-detector:v0.8.17
docker image ls gcr.io/k8s-staging-npd/node-problem-detector --digests
```
### Promote new NPD image to registry.k8s.io
1. Get the SHA256 from the new NPD image from the [staging registry](https://console.cloud.google.com/gcr/images/k8s-staging-npd/global/node-problem-detector)
or previous step.
2. Promote the NPD image to registry.k8s.io ([images.yaml](https://github.com/kubernetes/k8s.io/blob/main/registry.k8s.io/images/k8s-staging-npd/images.yaml), example https://github.com/kubernetes/k8s.io/pull/6523).
3. Verify the container image.
```
docker pull registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.17
docker image ls registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.17
```
### Create the release note
Go to https://github.com/kubernetes/node-problem-detector/releases, draft a new
release note and publish. Make sure to include the following in the body of the
release note:
1. For convenience, add a link to easily view the changes since the last
release (e.g.
[https://github.com/kubernetes/node-problem-detector/compare/v0.8.5...v0.8.6](https://github.com/kubernetes/node-problem-detector/compare/v0.8.5...v0.8.6)).
[https://github.com/kubernetes/node-problem-detector/compare/v0.8.15...v0.8.17](https://github.com/kubernetes/node-problem-detector/compare/v0.8.15...v0.8.17)).
1. There is no need to duplicate everything from the CHANGELOG, but include the
2. There is no need to duplicate everything from the CHANGELOG, but include the
most significant things so someone just viewing the release entry will have
an idea of what it includes.
1. Provide a link to the new image release (e.g. `Image:
k8s.gcr.io/node-problem-detector/node-problem-detector:v0.8.6`)
3. Provide a link to the new image release (e.g. `Image:
registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.17`)
4. Upload the tar files built from [previous step](#build-and-push-artifacts),
and include the SHA and MD5.
## Post release steps
1. Update image version in
[deployment/node-problem-detector.yaml](https://github.com/kubernetes/node-problem-detector/blob/422c088d623488be33aa697588655440c4e6a063/deployment/node-problem-detector.yaml#L32).
1. Update image version in node-problem-detector repo, so anyone deploying
directly from the repo deployment file will get the newest image deployed.
Example https://github.com/kubernetes/node-problem-detector/pull/897.
Update the image version in the deployment file so anyone deploying directly
from the repo deployment file will get the newest image deployed.
2. Update the NPD version in [kubernetes/kubernetes](https://github.com/kubernetes/kubernetes)
repo, so that kubernetes clusters use the new NPD version. Example
https://github.com/kubernetes/kubernetes/pull/123740.

135
go.mod

@ -1,41 +1,110 @@
module k8s.io/node-problem-detector
go 1.15
go 1.24.2
require (
cloud.google.com/go v0.45.1
code.cloudfoundry.org/clock v0.0.0-20180518195852-02e53af36e6c
contrib.go.opencensus.io/exporter/prometheus v0.0.0-20190427222117-f6cda26f80a3
contrib.go.opencensus.io/exporter/stackdriver v0.13.4
github.com/StackExchange/wmi v0.0.0-20181212234831-e0a55b97c705 // indirect
github.com/avast/retry-go v2.4.1+incompatible
github.com/cobaugh/osrelease v0.0.0-20181218015638-a93a0a55a249
github.com/coreos/go-systemd v0.0.0-20190321100706-95778dfbb74e
cloud.google.com/go/compute/metadata v0.6.0
contrib.go.opencensus.io/exporter/prometheus v0.4.2
contrib.go.opencensus.io/exporter/stackdriver v0.13.14
github.com/acobaugh/osrelease v0.1.0
github.com/avast/retry-go/v4 v4.6.1
github.com/coreos/go-systemd/v22 v22.5.0
github.com/euank/go-kmsg-parser v2.0.0+incompatible
github.com/go-ole/go-ole v1.2.4 // indirect
github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b
github.com/google/cadvisor v0.36.0
github.com/hpcloud/tail v1.0.0
github.com/onsi/ginkgo v1.10.3
github.com/onsi/gomega v1.7.1
github.com/pborman/uuid v1.2.0
github.com/prometheus/client_model v0.0.0-20190812154241-14fe0d1b01d4
github.com/prometheus/common v0.4.1
github.com/prometheus/procfs v0.2.0
github.com/shirou/gopsutil v2.19.12+incompatible
github.com/spf13/pflag v1.0.5
github.com/stretchr/testify v1.6.1
github.com/tedsuo/ifrit v0.0.0-20180802180643-bea94bb476cc // indirect
go.opencensus.io v0.22.4
golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45
golang.org/x/sys v0.0.0-20201211090839-8ad439b19e0f
google.golang.org/api v0.10.0
k8s.io/api v0.0.0-20190816222004-e3a6b8045b0b
k8s.io/apimachinery v0.0.0-20190816221834-a9f1d8a9c101
k8s.io/client-go v11.0.1-0.20190805182717-6502b5e7b1b5+incompatible
k8s.io/heapster v0.0.0-20180704153620-b25f8a16208f
k8s.io/kubernetes v1.14.6
k8s.io/test-infra v0.0.0-20190914015041-e1cbc3ccd91c
github.com/prometheus/client_model v0.6.2
github.com/prometheus/common v0.63.0
github.com/prometheus/procfs v0.16.1
github.com/shirou/gopsutil/v3 v3.24.5
github.com/spf13/pflag v1.0.6
github.com/stretchr/testify v1.10.0
go.opencensus.io v0.24.0
golang.org/x/sys v0.32.0
google.golang.org/api v0.230.0
k8s.io/api v0.33.0
k8s.io/apimachinery v0.33.0
k8s.io/client-go v0.33.0
k8s.io/klog/v2 v2.130.1
k8s.io/utils v0.0.0-20250321185631-1f6e0b77f77e
)
replace git.apache.org/thrift.git => github.com/apache/thrift v0.0.0-20180902110319-2566ecd5d999
require (
cloud.google.com/go/auth v0.16.0 // indirect
cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
cloud.google.com/go/monitoring v1.20.3 // indirect
cloud.google.com/go/trace v1.10.11 // indirect
github.com/aws/aws-sdk-go v1.44.72 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/census-instrumentation/opencensus-proto v0.4.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
github.com/emicklei/go-restful/v3 v3.11.0 // indirect
github.com/felixge/httpsnoop v1.0.4 // indirect
github.com/fsnotify/fsnotify v1.6.0 // indirect
github.com/fxamacker/cbor/v2 v2.7.0 // indirect
github.com/go-kit/log v0.2.1 // indirect
github.com/go-logfmt/logfmt v0.5.1 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-ole/go-ole v1.2.6 // indirect
github.com/go-openapi/jsonpointer v0.21.0 // indirect
github.com/go-openapi/jsonreference v0.20.2 // indirect
github.com/go-openapi/swag v0.23.0 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/google/gnostic-models v0.6.9 // indirect
github.com/google/go-cmp v0.7.0 // indirect
github.com/google/s2a-go v0.1.9 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.6 // indirect
github.com/googleapis/gax-go/v2 v2.14.1 // indirect
github.com/jmespath/go-jmespath v0.4.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/klauspost/compress v1.17.9 // indirect
github.com/lufia/plan9stats v0.0.0-20211012122336-39d0f177ccd0 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
github.com/power-devops/perfstat v0.0.0-20210106213030-5aafc221ea8c // indirect
github.com/prometheus/client_golang v1.20.4 // indirect
github.com/prometheus/prometheus v0.35.0 // indirect
github.com/prometheus/statsd_exporter v0.22.7 // indirect
github.com/shoenig/go-m1cpu v0.1.6 // indirect
github.com/tklauser/go-sysconf v0.3.12 // indirect
github.com/tklauser/numcpus v0.6.1 // indirect
github.com/x448/float16 v0.8.4 // indirect
github.com/yusufpapurcu/wmi v1.2.4 // indirect
go.opentelemetry.io/auto/sdk v1.1.0 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.60.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.60.0 // indirect
go.opentelemetry.io/otel v1.35.0 // indirect
go.opentelemetry.io/otel/metric v1.35.0 // indirect
go.opentelemetry.io/otel/trace v1.35.0 // indirect
golang.org/x/crypto v0.37.0 // indirect
golang.org/x/net v0.39.0 // indirect
golang.org/x/oauth2 v0.29.0 // indirect
golang.org/x/sync v0.13.0 // indirect
golang.org/x/term v0.31.0 // indirect
golang.org/x/text v0.24.0 // indirect
golang.org/x/time v0.11.0 // indirect
google.golang.org/genproto v0.0.0-20240730163845-b1a4ccb954bf // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20250218202821-56aae31c358a // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20250414145226-207652e42e2e // indirect
google.golang.org/grpc v1.72.0 // indirect
google.golang.org/protobuf v1.36.6 // indirect
gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect
gopkg.in/fsnotify.v1 v1.4.7 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/kube-openapi v0.0.0-20250318190949-c8a335a9a2ff // indirect
sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect
sigs.k8s.io/randfill v1.0.0 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.6.0 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)

1969
go.sum

File diff suppressed because it is too large.

46
hack/print-tar-sha-md5.sh Executable file

@ -0,0 +1,46 @@
#!/bin/bash
# Copyright 2024 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -o errexit
set -o nounset
set -o pipefail
VERSION="$1"
NPD_LINUX_AMD64=node-problem-detector-${VERSION}-linux_amd64.tar.gz
NPD_LINUX_ARM64=node-problem-detector-${VERSION}-linux_arm64.tar.gz
NPD_WINDOWS_AMD64=node-problem-detector-${VERSION}-windows_amd64.tar.gz
SHA_NPD_LINUX_AMD64=$(sha256sum ${NPD_LINUX_AMD64} | cut -d' ' -f1)
SHA_NPD_LINUX_ARM64=$(sha256sum ${NPD_LINUX_ARM64} | cut -d' ' -f1)
SHA_NPD_WINDOWS_AMD64=$(sha256sum ${NPD_WINDOWS_AMD64} | cut -d' ' -f1)
MD5_NPD_LINUX_AMD64=$(md5sum ${NPD_LINUX_AMD64} | cut -d' ' -f1)
MD5_NPD_LINUX_ARM64=$(md5sum ${NPD_LINUX_ARM64} | cut -d' ' -f1)
MD5_NPD_WINDOWS_AMD64=$(md5sum ${NPD_WINDOWS_AMD64} | cut -d' ' -f1)
echo
echo "**${NPD_LINUX_AMD64}**:"
echo "**SHA**: ${SHA_NPD_LINUX_AMD64}"
echo "**MD5**: ${MD5_NPD_LINUX_AMD64}"
echo
echo "**${NPD_LINUX_ARM64}**:"
echo "**SHA**: ${SHA_NPD_LINUX_ARM64}"
echo "**MD5**: ${MD5_NPD_LINUX_ARM64}"
echo
echo "**${NPD_WINDOWS_AMD64}**:"
echo "**SHA**: ${SHA_NPD_WINDOWS_AMD64}"
echo "**MD5**: ${MD5_NPD_WINDOWS_AMD64}"

32
hack/tag-release.sh Executable file

@ -0,0 +1,32 @@
#!/bin/bash -xe
# Copyright 2023 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
VERSION=$(cat version.txt)
if [[ ! "${VERSION}" =~ ^v([0-9]+[.][0-9]+)[.]([0-9]+)(-(alpha|beta)[.]([0-9]+))?$ ]]; then
echo "Version ${VERSION} must be 'vX.Y.Z', 'vX.Y.Z-alpha.N', or 'vX.Y.Z-beta.N'"
exit 1
fi
if [ "$(git tag -l "${VERSION}")" ]; then
echo "Tag ${VERSION} already exists"
exit 1
fi
git tag -a -m "Release ${VERSION}" "${VERSION}"
git push origin "${VERSION}"
echo "release_tag=refs/tags/${VERSION}" >> "${GITHUB_OUTPUT}"
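
The version check in `hack/tag-release.sh` accepts only tags with a leading `v`, three numeric components, and an optional `-alpha.N` or `-beta.N` suffix. As a rough illustration, the same pattern can be exercised in Go. `validVersion` is a hypothetical helper and the sample tags are made up; nothing below exists in the repository.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionPattern mirrors the regular expression used in hack/tag-release.sh.
var versionPattern = regexp.MustCompile(`^v([0-9]+[.][0-9]+)[.]([0-9]+)(-(alpha|beta)[.]([0-9]+))?$`)

// validVersion reports whether v would be accepted as a release tag.
// Hypothetical helper for illustration only.
func validVersion(v string) bool {
	return versionPattern.MatchString(v)
}

func main() {
	for _, v := range []string{"v0.8.17", "v1.2.3-beta.1", "0.8.17", "v1.2", "v1.2.3-rc.1"} {
		fmt.Printf("%-14s valid=%t\n", v, validVersion(v))
	}
}
```

Tags without the `v` prefix, with only two components, or with a suffix other than `alpha` or `beta` are rejected.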

30
hack/verify-gomod.sh Executable file

@ -0,0 +1,30 @@
#!/bin/bash
# Copyright 2024 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -o errexit
set -o nounset
set -o pipefail
make gomod
changes=$(git status --porcelain go.mod go.sum vendor/ tests/e2e/go.mod tests/e2e/go.sum || true)
if [ -n "${changes}" ]; then
echo "ERROR: go modules are not up to date; please run: make gomod"
echo "changed files:"
printf "%s\n" "${changes}"
echo "git diff:"
git --no-pager diff
exit 1
fi


@ -18,10 +18,10 @@ package custompluginmonitor
import (
"encoding/json"
"io/ioutil"
"os"
"time"
"github.com/golang/glog"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/pkg/custompluginmonitor/plugin"
cpmtypes "k8s.io/node-problem-detector/pkg/custompluginmonitor/types"
@ -47,7 +47,6 @@ type customPluginMonitor struct {
config cpmtypes.CustomPluginConfig
conditions []types.Condition
plugin *plugin.Plugin
resultChan <-chan cpmtypes.Result
statusChan chan *types.Status
tomb *tomb.Tomb
}
@ -58,27 +57,27 @@ func NewCustomPluginMonitorOrDie(configPath string) types.Monitor {
configPath: configPath,
tomb: tomb.NewTomb(),
}
f, err := ioutil.ReadFile(configPath)
f, err := os.ReadFile(configPath)
if err != nil {
glog.Fatalf("Failed to read configuration file %q: %v", configPath, err)
klog.Fatalf("Failed to read configuration file %q: %v", configPath, err)
}
err = json.Unmarshal(f, &c.config)
if err != nil {
glog.Fatalf("Failed to unmarshal configuration file %q: %v", configPath, err)
klog.Fatalf("Failed to unmarshal configuration file %q: %v", configPath, err)
}
// Apply configurations
err = (&c.config).ApplyConfiguration()
if err != nil {
glog.Fatalf("Failed to apply configuration for %q: %v", configPath, err)
klog.Fatalf("Failed to apply configuration for %q: %v", configPath, err)
}
// Validate configurations
err = c.config.Validate()
if err != nil {
glog.Fatalf("Failed to validate custom plugin config %+v: %v", c.config, err)
klog.Fatalf("Failed to validate custom plugin config %+v: %v", c.config, err)
}
glog.Infof("Finish parsing custom plugin monitor config file %s: %+v", c.configPath, c.config)
klog.Infof("Finish parsing custom plugin monitor config file %s: %+v", c.configPath, c.config)
c.plugin = plugin.NewPlugin(c.config)
// A 1000 size channel should be big enough.
@ -97,32 +96,39 @@ func initializeProblemMetricsOrDie(rules []*cpmtypes.CustomRule) {
if rule.Type == types.Perm {
err := problemmetrics.GlobalProblemMetricsManager.SetProblemGauge(rule.Condition, rule.Reason, false)
if err != nil {
glog.Fatalf("Failed to initialize problem gauge metrics for problem %q, reason %q: %v",
klog.Fatalf("Failed to initialize problem gauge metrics for problem %q, reason %q: %v",
rule.Condition, rule.Reason, err)
}
}
err := problemmetrics.GlobalProblemMetricsManager.IncrementProblemCounter(rule.Reason, 0)
if err != nil {
glog.Fatalf("Failed to initialize problem counter metrics for %q: %v", rule.Reason, err)
klog.Fatalf("Failed to initialize problem counter metrics for %q: %v", rule.Reason, err)
}
}
}
func (c *customPluginMonitor) Start() (<-chan *types.Status, error) {
glog.Infof("Start custom plugin monitor %s", c.configPath)
klog.Infof("Start custom plugin monitor %s", c.configPath)
go c.plugin.Run()
go c.monitorLoop()
return c.statusChan, nil
}
func (c *customPluginMonitor) Stop() {
glog.Infof("Stop custom plugin monitor %s", c.configPath)
klog.Infof("Stop custom plugin monitor %s", c.configPath)
c.tomb.Stop()
}
// monitorLoop is the main loop of customPluginMonitor.
// There is one customPluginMonitor and one plugin instance per configPath.
// Each runs its rules in parallel at the configured concurrency and interval.
func (c *customPluginMonitor) monitorLoop() {
c.initializeStatus()
c.initializeConditions()
if *c.config.PluginGlobalConfig.SkipInitialStatus {
klog.Infof("Skipping sending initial status. Using default conditions: %+v", c.conditions)
} else {
c.sendInitialStatus()
}
resultChan := c.plugin.GetResultChan()
@ -130,16 +136,16 @@ func (c *customPluginMonitor) monitorLoop() {
select {
case result, ok := <-resultChan:
if !ok {
glog.Errorf("Result channel closed: %s", c.configPath)
klog.Errorf("Result channel closed: %s", c.configPath)
return
}
glog.V(3).Infof("Receive new plugin result for %s: %+v", c.configPath, result)
klog.V(3).Infof("Receive new plugin result for %s: %+v", c.configPath, result)
status := c.generateStatus(result)
glog.V(3).Infof("New status generated: %+v", status)
klog.V(3).Infof("New status generated: %+v", status)
c.statusChan <- status
case <-c.tomb.Stopping():
c.plugin.Stop()
glog.Infof("Custom plugin monitor stopped: %s", c.configPath)
klog.Infof("Custom plugin monitor stopped: %s", c.configPath)
c.tomb.Done()
return
}
@ -232,6 +238,7 @@ func (c *customPluginMonitor) generateStatus(result cpmtypes.Result) *types.Stat
condition.Type,
status,
newReason,
newMessage,
timestamp,
)
@ -252,7 +259,7 @@ func (c *customPluginMonitor) generateStatus(result cpmtypes.Result) *types.Stat
err := problemmetrics.GlobalProblemMetricsManager.IncrementProblemCounter(
event.Reason, 1)
if err != nil {
glog.Errorf("Failed to update problem counter metrics for %q: %v",
klog.Errorf("Failed to update problem counter metrics for %q: %v",
event.Reason, err)
}
}
@ -260,7 +267,7 @@ func (c *customPluginMonitor) generateStatus(result cpmtypes.Result) *types.Stat
err := problemmetrics.GlobalProblemMetricsManager.SetProblemGauge(
condition.Type, condition.Reason, condition.Status == types.True)
if err != nil {
glog.Errorf("Failed to update problem gauge metrics for problem %q, reason %q: %v",
klog.Errorf("Failed to update problem gauge metrics for problem %q, reason %q: %v",
condition.Type, condition.Reason, err)
}
}
@ -273,7 +280,7 @@ func (c *customPluginMonitor) generateStatus(result cpmtypes.Result) *types.Stat
}
// Log only if condition has changed
if len(activeProblemEvents) != 0 || len(inactiveProblemEvents) != 0 {
glog.V(0).Infof("New status generated: %+v", status)
klog.V(0).Infof("New status generated: %+v", status)
}
return status
}
@ -289,11 +296,9 @@ func toConditionStatus(s cpmtypes.Status) types.ConditionStatus {
}
}
// initializeStatus initializes the internal condition and also reports it to the node problem detector.
func (c *customPluginMonitor) initializeStatus() {
// Initialize the default node conditions
c.conditions = initialConditions(c.config.DefaultConditions)
glog.Infof("Initialize condition generated: %+v", c.conditions)
// sendInitialStatus sends the initial status to the node problem detector.
func (c *customPluginMonitor) sendInitialStatus() {
klog.Infof("Sending initial status for %s with conditions: %+v", c.config.Source, c.conditions)
// Update the initial status
c.statusChan <- &types.Status{
Source: c.config.Source,
@ -301,6 +306,12 @@ func (c *customPluginMonitor) initializeStatus() {
}
}
// initializeConditions initializes the internal node conditions.
func (c *customPluginMonitor) initializeConditions() {
c.conditions = initialConditions(c.config.DefaultConditions)
klog.Infof("Initialized conditions for %s: %+v", c.configPath, c.conditions)
}
func initialConditions(defaults []types.Condition) []types.Condition {
conditions := make([]types.Condition, len(defaults))
copy(conditions, defaults)


@ -20,14 +20,13 @@ import (
"context"
"fmt"
"io"
"io/ioutil"
"os/exec"
"strings"
"sync"
"syscall"
"time"
"github.com/golang/glog"
"k8s.io/klog/v2"
cpmtypes "k8s.io/node-problem-detector/pkg/custompluginmonitor/types"
"k8s.io/node-problem-detector/pkg/util"
"k8s.io/node-problem-detector/pkg/util/tomb"
@ -61,7 +60,7 @@ func (p *Plugin) GetResultChan() <-chan cpmtypes.Result {
func (p *Plugin) Run() {
defer func() {
glog.Info("Stopping plugin execution")
klog.Info("Stopping plugin execution")
close(p.resultChan)
p.tomb.Done()
}()
@ -90,9 +89,10 @@ func (p *Plugin) Run() {
// run each rule in parallel and wait for them to complete
func (p *Plugin) runRules() {
glog.V(3).Info("Start to run custom plugins")
klog.V(3).Info("Start to run custom plugins")
for _, rule := range p.config.Rules {
// syncChan limits concurrent goroutines to configured PluginGlobalConfig.Concurrency value
p.syncChan <- struct{}{}
p.Add(1)
go func(rule *cpmtypes.CustomRule) {
@ -103,8 +103,12 @@ func (p *Plugin) runRules() {
start := time.Now()
exitStatus, message := p.run(*rule)
level := klog.Level(3)
if exitStatus != 0 {
level = klog.Level(2)
}
glog.V(3).Infof("Rule: %+v. Start time: %v. End time: %v. Duration: %v", rule, start, time.Now(), time.Since(start))
klog.V(level).Infof("Rule: %+v. Start time: %v. End time: %v. Duration: %v", rule, start, time.Now(), time.Since(start))
result := cpmtypes.Result{
Rule: rule,
@ -112,26 +116,27 @@ func (p *Plugin) runRules() {
Message: message,
}
// pipes result into resultChan which customPluginMonitor instance generates status from
p.resultChan <- result
// Let the result be logged at a higher verbosity level. If there is a change in status it is logged later.
glog.V(3).Infof("Add check result %+v for rule %+v", result, rule)
klog.V(level).Infof("Add check result %+v for rule %+v", result, rule)
}(rule)
}
p.Wait()
glog.V(3).Info("Finish running custom plugins")
klog.V(3).Info("Finish running custom plugins")
}
// readFromReader reads the maxBytes from the reader and drains the rest.
func readFromReader(reader io.ReadCloser, maxBytes int64) ([]byte, error) {
limitReader := io.LimitReader(reader, maxBytes)
data, err := ioutil.ReadAll(limitReader)
data, err := io.ReadAll(limitReader)
if err != nil {
return []byte{}, err
}
// Drain the reader
if _, err := io.Copy(ioutil.Discard, reader); err != nil {
if _, err := io.Copy(io.Discard, reader); err != nil {
return []byte{}, err
}
return data, nil
@ -152,16 +157,16 @@ func (p *Plugin) run(rule cpmtypes.CustomRule) (exitStatus cpmtypes.Status, outp
stdoutPipe, err := cmd.StdoutPipe()
if err != nil {
glog.Errorf("Error creating stdout pipe for plugin %q: error - %v", rule.Path, err)
klog.Errorf("Error creating stdout pipe for plugin %q: error - %v", rule.Path, err)
return cpmtypes.Unknown, "Error creating stdout pipe for plugin. Please check the error log"
}
stderrPipe, err := cmd.StderrPipe()
if err != nil {
glog.Errorf("Error creating stderr pipe for plugin %q: error - %v", rule.Path, err)
klog.Errorf("Error creating stderr pipe for plugin %q: error - %v", rule.Path, err)
return cpmtypes.Unknown, "Error creating stderr pipe for plugin. Please check the error log"
}
if err := cmd.Start(); err != nil {
glog.Errorf("Error in starting plugin %q: error - %v", rule.Path, err)
klog.Errorf("Error in starting plugin %q: error - %v", rule.Path, err)
return cpmtypes.Unknown, "Error in starting plugin. Please check the error log"
}
@ -177,9 +182,9 @@ func (p *Plugin) run(rule cpmtypes.CustomRule) (exitStatus cpmtypes.Status, outp
if ctx.Err() == context.Canceled {
return
}
glog.Errorf("Error in running plugin timeout %q", rule.Path)
klog.Errorf("Error in running plugin timeout %q", rule.Path)
if cmd.Process == nil || cmd.Process.Pid == 0 {
glog.Errorf("Error in cmd.Process check %q", rule.Path)
klog.Errorf("Error in cmd.Process check %q", rule.Path)
break
}
@ -189,7 +194,7 @@ func (p *Plugin) run(rule cpmtypes.CustomRule) (exitStatus cpmtypes.Status, outp
err := util.Kill(cmd)
if err != nil {
glog.Errorf("Error in kill process %d, %v", cmd.Process.Pid, err)
klog.Errorf("Error in kill process %d, %v", cmd.Process.Pid, err)
}
case <-waitChan:
return
@ -218,18 +223,18 @@ func (p *Plugin) run(rule cpmtypes.CustomRule) (exitStatus cpmtypes.Status, outp
wg.Wait()
if stdoutErr != nil {
glog.Errorf("Error reading stdout for plugin %q: error - %v", rule.Path, err)
klog.Errorf("Error reading stdout for plugin %q: error - %v", rule.Path, stdoutErr)
return cpmtypes.Unknown, "Error reading stdout for plugin. Please check the error log"
}
if stderrErr != nil {
glog.Errorf("Error reading stderr for plugin %q: error - %v", rule.Path, err)
klog.Errorf("Error reading stderr for plugin %q: error - %v", rule.Path, stderrErr)
return cpmtypes.Unknown, "Error reading stderr for plugin. Please check the error log"
}
if err := cmd.Wait(); err != nil {
if _, ok := err.(*exec.ExitError); !ok {
glog.Errorf("Error in waiting for plugin %q: error - %v. output - %q", rule.Path, err, string(stdout))
klog.Errorf("Error in waiting for plugin %q: error - %v. output - %q", rule.Path, err, string(stdout))
return cpmtypes.Unknown, "Error in waiting for plugin. Please check the error log"
}
}
@ -268,12 +273,12 @@ func (p *Plugin) run(rule cpmtypes.CustomRule) (exitStatus cpmtypes.Status, outp
// Stop the plugin.
func (p *Plugin) Stop() {
p.tomb.Stop()
glog.Info("Stop plugin execution")
klog.Info("Stop plugin execution")
}
func logPluginStderr(rule cpmtypes.CustomRule, logs string, logLevel glog.Level) {
func logPluginStderr(rule cpmtypes.CustomRule, logs string, logLevel klog.Level) {
if len(logs) != 0 {
glog.V(logLevel).Infof("Start logs from plugin %+v \n %s", rule, logs)
glog.V(logLevel).Infof("End logs from plugin %+v", rule)
klog.V(logLevel).Infof("Start logs from plugin %+v \n %s", rule, logs)
klog.V(logLevel).Infof("End logs from plugin %+v", rule)
}
}


@ -33,6 +33,7 @@ var (
defaultConcurrency = 3
defaultMessageChangeBasedConditionUpdate = false
defaultEnableMetricsReporting = true
defaultSkipInitialStatus = false
customPluginName = "custom"
)
@ -52,9 +53,11 @@ type pluginGlobalConfig struct {
Concurrency *int `json:"concurrency,omitempty"`
// EnableMessageChangeBasedConditionUpdate indicates whether NPD should enable message change based condition update.
EnableMessageChangeBasedConditionUpdate *bool `json:"enable_message_change_based_condition_update,omitempty"`
// SkipInitialStatus prevents the first status update with default conditions
SkipInitialStatus *bool `json:"skip_initial_status,omitempty"`
}
// Custom plugin config is the configuration of custom plugin monitor.
// CustomPluginConfig is the configuration of custom plugin monitor.
type CustomPluginConfig struct {
// Plugin is the name of plugin which is currently used.
// Currently supported: custom.
@ -105,6 +108,10 @@ func (cpc *CustomPluginConfig) ApplyConfiguration() error {
cpc.PluginGlobalConfig.EnableMessageChangeBasedConditionUpdate = &defaultMessageChangeBasedConditionUpdate
}
if cpc.PluginGlobalConfig.SkipInitialStatus == nil {
cpc.PluginGlobalConfig.SkipInitialStatus = &defaultSkipInitialStatus
}
for _, rule := range cpc.Rules {
if rule.TimeoutString != nil {
timeout, err := time.ParseDuration(*rule.TimeoutString)
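
The new `skip_initial_status` option follows the same optional-pointer pattern as the other global settings: a `*bool` stays `nil` when the key is absent from the JSON config, and `ApplyConfiguration` then points it at the default. A minimal sketch of that behaviour, assuming a trimmed-down struct (`globalConfig` and `parse` are hypothetical stand-ins, not the real types):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// globalConfig is a hypothetical, trimmed-down stand-in for pluginGlobalConfig.
type globalConfig struct {
	SkipInitialStatus *bool `json:"skip_initial_status,omitempty"`
}

var defaultSkipInitialStatus = false

// parse unmarshals raw JSON and fills in the default when the key is absent.
// Hypothetical helper for illustration only.
func parse(raw string) (globalConfig, error) {
	var c globalConfig
	if err := json.Unmarshal([]byte(raw), &c); err != nil {
		return c, err
	}
	if c.SkipInitialStatus == nil {
		c.SkipInitialStatus = &defaultSkipInitialStatus
	}
	return c, nil
}

func main() {
	for _, raw := range []string{`{}`, `{"skip_initial_status": true}`} {
		c, err := parse(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> skip=%t\n", raw, *c.SkipInitialStatus)
	}
}
```

Using a pointer lets the code tell "not set" apart from an explicit `false`, so existing configs keep the previous behaviour of sending the initial status.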


@ -33,6 +33,7 @@ func TestCustomPluginConfigApplyConfiguration(t *testing.T) {
concurrency := 2
messageChangeBasedConditionUpdate := true
disableMetricsReporting := false
disableInitialStatusUpdate := true
ruleTimeout := 1 * time.Second
ruleTimeoutString := ruleTimeout.String()
@ -62,6 +63,7 @@ func TestCustomPluginConfigApplyConfiguration(t *testing.T) {
MaxOutputLength: &defaultMaxOutputLength,
Concurrency: &defaultConcurrency,
EnableMessageChangeBasedConditionUpdate: &defaultMessageChangeBasedConditionUpdate,
SkipInitialStatus: &defaultSkipInitialStatus,
},
EnableMetricsReporting: &defaultEnableMetricsReporting,
Rules: []*CustomRule{
@ -91,6 +93,7 @@ func TestCustomPluginConfigApplyConfiguration(t *testing.T) {
MaxOutputLength: &defaultMaxOutputLength,
Concurrency: &defaultConcurrency,
EnableMessageChangeBasedConditionUpdate: &defaultMessageChangeBasedConditionUpdate,
SkipInitialStatus: &defaultSkipInitialStatus,
},
EnableMetricsReporting: &defaultEnableMetricsReporting,
},
@ -110,6 +113,7 @@ func TestCustomPluginConfigApplyConfiguration(t *testing.T) {
MaxOutputLength: &defaultMaxOutputLength,
Concurrency: &defaultConcurrency,
EnableMessageChangeBasedConditionUpdate: &defaultMessageChangeBasedConditionUpdate,
SkipInitialStatus: &defaultSkipInitialStatus,
},
EnableMetricsReporting: &defaultEnableMetricsReporting,
},
@ -129,6 +133,7 @@ func TestCustomPluginConfigApplyConfiguration(t *testing.T) {
MaxOutputLength: &maxOutputLength,
Concurrency: &defaultConcurrency,
EnableMessageChangeBasedConditionUpdate: &defaultMessageChangeBasedConditionUpdate,
SkipInitialStatus: &defaultSkipInitialStatus,
},
EnableMetricsReporting: &defaultEnableMetricsReporting,
},
@ -148,6 +153,7 @@ func TestCustomPluginConfigApplyConfiguration(t *testing.T) {
MaxOutputLength: &defaultMaxOutputLength,
Concurrency: &concurrency,
EnableMessageChangeBasedConditionUpdate: &defaultMessageChangeBasedConditionUpdate,
SkipInitialStatus: &defaultSkipInitialStatus,
},
EnableMetricsReporting: &defaultEnableMetricsReporting,
},
@ -167,6 +173,7 @@ func TestCustomPluginConfigApplyConfiguration(t *testing.T) {
MaxOutputLength: &defaultMaxOutputLength,
Concurrency: &defaultConcurrency,
EnableMessageChangeBasedConditionUpdate: &messageChangeBasedConditionUpdate,
SkipInitialStatus: &defaultSkipInitialStatus,
},
EnableMetricsReporting: &defaultEnableMetricsReporting,
},
@ -184,10 +191,30 @@ func TestCustomPluginConfigApplyConfiguration(t *testing.T) {
MaxOutputLength: &defaultMaxOutputLength,
Concurrency: &defaultConcurrency,
EnableMessageChangeBasedConditionUpdate: &defaultMessageChangeBasedConditionUpdate,
SkipInitialStatus: &defaultSkipInitialStatus,
},
EnableMetricsReporting: &disableMetricsReporting,
},
},
"disable status update during initialization": {
Orig: CustomPluginConfig{PluginGlobalConfig: pluginGlobalConfig{
SkipInitialStatus: &disableInitialStatusUpdate,
},
},
Wanted: CustomPluginConfig{
PluginGlobalConfig: pluginGlobalConfig{
InvokeIntervalString: &defaultInvokeIntervalString,
InvokeInterval: &defaultInvokeInterval,
TimeoutString: &defaultGlobalTimeoutString,
Timeout: &defaultGlobalTimeout,
MaxOutputLength: &defaultMaxOutputLength,
Concurrency: &defaultConcurrency,
EnableMessageChangeBasedConditionUpdate: &defaultMessageChangeBasedConditionUpdate,
SkipInitialStatus: &disableInitialStatusUpdate,
},
EnableMetricsReporting: &defaultEnableMetricsReporting,
},
},
}
for desp, utMeta := range utMetas {


@ -17,8 +17,9 @@ limitations under the License.
package types
import (
"k8s.io/node-problem-detector/pkg/types"
"time"
"k8s.io/node-problem-detector/pkg/types"
)
type Status int


@ -17,6 +17,7 @@ limitations under the License.
package condition
import (
"context"
"reflect"
"sync"
"time"
@ -25,10 +26,10 @@ import (
"k8s.io/node-problem-detector/pkg/types"
problemutil "k8s.io/node-problem-detector/pkg/util"
"k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/util/clock"
v1 "k8s.io/api/core/v1"
"k8s.io/utils/clock"
"github.com/golang/glog"
"k8s.io/klog/v2"
)
const (
@ -49,7 +50,7 @@ const (
// not. This addresses 3).
type ConditionManager interface {
// Start starts the condition manager.
Start()
Start(ctx context.Context)
// UpdateCondition updates a specific condition.
UpdateCondition(types.Condition)
// GetConditions returns all current conditions.
@ -67,7 +68,7 @@ type conditionManager struct {
// No lock is needed in `sync`, because it is in the same goroutine with the
// write operation.
sync.RWMutex
clock clock.Clock
clock clock.WithTicker
latestTry time.Time
resyncNeeded bool
client problemclient.Client
@ -78,18 +79,18 @@ type conditionManager struct {
}
// NewConditionManager creates a condition manager.
func NewConditionManager(client problemclient.Client, clock clock.Clock, heartbeatPeriod time.Duration) ConditionManager {
func NewConditionManager(client problemclient.Client, clockInUse clock.WithTicker, heartbeatPeriod time.Duration) ConditionManager {
return &conditionManager{
client: client,
clock: clock,
clock: clockInUse,
updates: make(map[string]types.Condition),
conditions: make(map[string]types.Condition),
heartbeatPeriod: heartbeatPeriod,
}
}
func (c *conditionManager) Start() {
go c.syncLoop()
func (c *conditionManager) Start(ctx context.Context) {
go c.syncLoop(ctx)
}
func (c *conditionManager) UpdateCondition(condition types.Condition) {
@ -110,15 +111,17 @@ func (c *conditionManager) GetConditions() []types.Condition {
return conditions
}
func (c *conditionManager) syncLoop() {
func (c *conditionManager) syncLoop(ctx context.Context) {
ticker := c.clock.NewTicker(updatePeriod)
defer ticker.Stop()
for {
select {
case <-ticker.C():
if c.needUpdates() || c.needResync() || c.needHeartbeat() {
c.sync()
c.sync(ctx)
}
case <-ctx.Done():
break
}
}
}
@ -150,16 +153,16 @@ func (c *conditionManager) needHeartbeat() bool {
}
// sync synchronizes node conditions with the apiserver.
func (c *conditionManager) sync() {
func (c *conditionManager) sync(ctx context.Context) {
c.latestTry = c.clock.Now()
c.resyncNeeded = false
conditions := []v1.NodeCondition{}
for i := range c.conditions {
conditions = append(conditions, problemutil.ConvertToAPICondition(c.conditions[i]))
}
if err := c.client.SetConditions(conditions); err != nil {
if err := c.client.SetConditions(ctx, conditions); err != nil {
// The conditions will be updated again in future sync
glog.Errorf("failed to update node conditions: %v", err)
klog.Errorf("failed to update node conditions: %v", err)
c.resyncNeeded = true
return
}

@ -17,6 +17,7 @@ limitations under the License.
package condition
import (
"context"
"fmt"
"testing"
"time"
@ -28,14 +29,14 @@ import (
problemutil "k8s.io/node-problem-detector/pkg/util"
v1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/util/clock"
testclock "k8s.io/utils/clock/testing"
)
const heartbeatPeriod = 1 * time.Minute
func newTestManager() (*conditionManager, *problemclient.FakeProblemClient, *clock.FakeClock) {
func newTestManager() (*conditionManager, *problemclient.FakeProblemClient, *testclock.FakeClock) {
fakeClient := problemclient.NewFakeProblemClient()
fakeClock := clock.NewFakeClock(time.Now())
fakeClock := testclock.NewFakeClock(time.Now())
manager := NewConditionManager(fakeClient, fakeClock, heartbeatPeriod)
return manager.(*conditionManager), fakeClient, fakeClock
}
@ -109,7 +110,7 @@ func TestResync(t *testing.T) {
m, fakeClient, fakeClock := newTestManager()
condition := newTestCondition("TestCondition")
m.conditions = map[string]types.Condition{condition.Type: condition}
m.sync()
m.sync(context.Background())
expected := []v1.NodeCondition{problemutil.ConvertToAPICondition(condition)}
assert.Nil(t, fakeClient.AssertConditions(expected), "Condition should be updated via client")
@ -118,7 +119,7 @@ func TestResync(t *testing.T) {
assert.False(t, m.needResync(), "Should not resync after resync period without resync needed")
fakeClient.InjectError("SetConditions", fmt.Errorf("injected error"))
m.sync()
m.sync(context.Background())
assert.False(t, m.needResync(), "Should not resync before resync period")
fakeClock.Step(resyncPeriod)
@ -129,7 +130,7 @@ func TestHeartbeat(t *testing.T) {
m, fakeClient, fakeClock := newTestManager()
condition := newTestCondition("TestCondition")
m.conditions = map[string]types.Condition{condition.Type: condition}
m.sync()
m.sync(context.Background())
expected := []v1.NodeCondition{problemutil.ConvertToAPICondition(condition)}
assert.Nil(t, fakeClient.AssertConditions(expected), "Condition should be updated via client")

@ -17,15 +17,16 @@ limitations under the License.
package k8sexporter
import (
"context"
"net"
"net/http"
_ "net/http/pprof"
"net/http/pprof"
"strconv"
"github.com/golang/glog"
"k8s.io/klog/v2"
"k8s.io/apimachinery/pkg/util/clock"
"k8s.io/apimachinery/pkg/util/wait"
"k8s.io/utils/clock"
"k8s.io/node-problem-detector/cmd/options"
"k8s.io/node-problem-detector/pkg/exporters/k8sexporter/condition"
@ -37,6 +38,8 @@ import (
type k8sExporter struct {
client problemclient.Client
conditionManager condition.ConditionManager
writeEvents bool
updateConditions bool
}
// NewExporterOrDie creates an exporter for Kubernetes apiserver exporting,
@ -44,35 +47,41 @@ type k8sExporter struct {
//
// Note that this function may be blocked (until a timeout occurs) before
// kube-apiserver becomes ready.
func NewExporterOrDie(npdo *options.NodeProblemDetectorOptions) types.Exporter {
func NewExporterOrDie(ctx context.Context, npdo *options.NodeProblemDetectorOptions) types.Exporter {
if !npdo.EnableK8sExporter {
return nil
}
c := problemclient.NewClientOrDie(npdo)
glog.Infof("Waiting for kube-apiserver to be ready (timeout %v)...", npdo.APIServerWaitTimeout)
if err := waitForAPIServerReadyWithTimeout(c, npdo); err != nil {
glog.Warningf("kube-apiserver did not become ready: timed out on waiting for kube-apiserver to return the node object: %v", err)
klog.Infof("Waiting for kube-apiserver to be ready (timeout %v)...", npdo.APIServerWaitTimeout)
if err := waitForAPIServerReadyWithTimeout(ctx, c, npdo); err != nil {
klog.Warningf("kube-apiserver did not become ready: timed out on waiting for kube-apiserver to return the node object: %v", err)
}
ke := k8sExporter{
client: c,
conditionManager: condition.NewConditionManager(c, clock.RealClock{}, npdo.K8sExporterHeartbeatPeriod),
writeEvents: npdo.K8sExporterWriteEvents,
updateConditions: npdo.K8sExporterUpdateNodeConditions,
}
ke.startHTTPReporting(npdo)
ke.conditionManager.Start()
ke.conditionManager.Start(ctx)
return &ke
}
func (ke *k8sExporter) ExportProblems(status *types.Status) {
for _, event := range status.Events {
ke.client.Eventf(util.ConvertToAPIEventType(event.Severity), status.Source, event.Reason, event.Message)
if ke.writeEvents {
for _, event := range status.Events {
ke.client.Eventf(util.ConvertToAPIEventType(event.Severity), status.Source, event.Reason, event.Message)
}
}
for _, cdt := range status.Conditions {
ke.conditionManager.UpdateCondition(cdt)
if ke.updateConditions {
for _, cdt := range status.Conditions {
ke.conditionManager.UpdateCondition(cdt)
}
}
}
@ -94,22 +103,30 @@ func (ke *k8sExporter) startHTTPReporting(npdo *options.NodeProblemDetectorOptio
util.ReturnHTTPJson(w, ke.conditionManager.GetConditions())
})
// register pprof
mux.HandleFunc("/debug/pprof/", pprof.Index)
mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
addr := net.JoinHostPort(npdo.ServerAddress, strconv.Itoa(npdo.ServerPort))
go func() {
err := http.ListenAndServe(addr, mux)
if err != nil {
glog.Fatalf("Failed to start server: %v", err)
klog.Fatalf("Failed to start server: %v", err)
}
}()
}
func waitForAPIServerReadyWithTimeout(c problemclient.Client, npdo *options.NodeProblemDetectorOptions) error {
return wait.PollImmediate(npdo.APIServerWaitInterval, npdo.APIServerWaitTimeout, func() (done bool, err error) {
func waitForAPIServerReadyWithTimeout(ctx context.Context, c problemclient.Client, npdo *options.NodeProblemDetectorOptions) error {
return wait.PollUntilContextTimeout(ctx, npdo.APIServerWaitInterval, npdo.APIServerWaitTimeout, true, func(ctx context.Context) (done bool, err error) {
// If NPD can get the node object from kube-apiserver, the server is
// ready and the RBAC permission is set correctly.
if _, err := c.GetNode(); err == nil {
return true, nil
if _, err := c.GetNode(ctx); err != nil {
klog.Errorf("Can't get node object: %v", err)
return false, err
}
return false, nil
return true, nil
})
}

@ -12,12 +12,12 @@
// See the License for the specific language governing permissions and
// limitations under the License.
package kubernetes
package problemclient
import (
"fmt"
"io/ioutil"
"net/url"
"os"
"strconv"
"k8s.io/apimachinery/pkg/runtime/schema"
@ -57,7 +57,7 @@ func getConfigOverrides(uri *url.URL) (*kubeClientCmd.ConfigOverrides, error) {
return &kubeConfigOverride, nil
}
func GetKubeClientConfig(uri *url.URL) (*kube_rest.Config, error) {
func getKubeClientConfig(uri *url.URL) (*kube_rest.Config, error) {
var (
kubeConfig *kube_rest.Config
err error
@ -137,7 +137,7 @@ func GetKubeClientConfig(uri *url.URL) (*kube_rest.Config, error) {
if useServiceAccount {
// If a readable service account token exists, then use it
if contents, err := ioutil.ReadFile(defaultServiceAccountFile); err == nil {
if contents, err := os.ReadFile(defaultServiceAccountFile); err == nil {
kubeConfig.BearerToken = string(contents)
}
}

@ -17,6 +17,7 @@ limitations under the License.
package problemclient
import (
"context"
"fmt"
"reflect"
"sync"
@ -60,7 +61,7 @@ func (f *FakeProblemClient) AssertConditions(expected []v1.NodeCondition) error
}
// SetConditions is a fake mimic of SetConditions; it only updates the internal condition cache.
func (f *FakeProblemClient) SetConditions(conditions []v1.NodeCondition) error {
func (f *FakeProblemClient) SetConditions(ctx context.Context, conditions []v1.NodeCondition) error {
f.Lock()
defer f.Unlock()
if err, ok := f.errors["SetConditions"]; ok {
@ -73,7 +74,7 @@ func (f *FakeProblemClient) SetConditions(conditions []v1.NodeCondition) error {
}
// GetConditions is a fake mimic of GetConditions, it returns the conditions cached internally.
func (f *FakeProblemClient) GetConditions(types []v1.NodeConditionType) ([]*v1.NodeCondition, error) {
func (f *FakeProblemClient) GetConditions(ctx context.Context, types []v1.NodeConditionType) ([]*v1.NodeCondition, error) {
f.Lock()
defer f.Unlock()
if err, ok := f.errors["GetConditions"]; ok {
@ -93,6 +94,6 @@ func (f *FakeProblemClient) GetConditions(types []v1.NodeConditionType) ([]*v1.N
func (f *FakeProblemClient) Eventf(eventType string, source, reason, messageFmt string, args ...interface{}) {
}
func (f *FakeProblemClient) GetNode() (*v1.Node, error) {
func (f *FakeProblemClient) GetNode(ctx context.Context) (*v1.Node, error) {
return nil, fmt.Errorf("GetNode() not implemented")
}

@ -17,24 +17,24 @@ limitations under the License.
package problemclient
import (
"context"
"encoding/json"
"fmt"
"net/url"
"os"
"path/filepath"
typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
"k8s.io/kubernetes/pkg/api/legacyscheme"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/clock"
clientset "k8s.io/client-go/kubernetes"
typedcorev1 "k8s.io/client-go/kubernetes/typed/core/v1"
"k8s.io/client-go/tools/record"
"k8s.io/client-go/util/retry"
"k8s.io/klog/v2"
"k8s.io/utils/clock"
"github.com/golang/glog"
"k8s.io/heapster/common/kubernetes"
"k8s.io/node-problem-detector/cmd/options"
"k8s.io/node-problem-detector/pkg/version"
)
@ -42,14 +42,14 @@ import (
// Client is the interface of problem client
type Client interface {
// GetConditions gets the specified conditions of the current node.
GetConditions(conditionTypes []v1.NodeConditionType) ([]*v1.NodeCondition, error)
GetConditions(ctx context.Context, conditionTypes []v1.NodeConditionType) ([]*v1.NodeCondition, error)
// SetConditions sets or updates conditions of the current node.
SetConditions(conditions []v1.NodeCondition) error
SetConditions(ctx context.Context, conditionTypes []v1.NodeCondition) error
// Eventf reports the event.
Eventf(eventType string, source, reason, messageFmt string, args ...interface{})
// GetNode returns the Node object of the node on which the
// node-problem-detector runs.
GetNode() (*v1.Node, error)
GetNode(ctx context.Context) (*v1.Node, error)
}
type nodeProblemClient struct {
@ -68,13 +68,14 @@ func NewClientOrDie(npdo *options.NodeProblemDetectorOptions) Client {
// we have checked it is a valid URI after command line argument is parsed.:)
uri, _ := url.Parse(npdo.ApiServerOverride)
cfg, err := kubernetes.GetKubeClientConfig(uri)
cfg, err := getKubeClientConfig(uri)
if err != nil {
panic(err)
}
cfg.UserAgent = fmt.Sprintf("%s/%s", filepath.Base(os.Args[0]), version.Version())
// TODO(random-liu): Set QPS Limit
cfg.QPS = npdo.QPS
cfg.Burst = npdo.Burst
c.client = clientset.NewForConfigOrDie(cfg).CoreV1()
c.nodeName = npdo.NodeName
c.eventNamespace = npdo.EventNamespace
@ -83,8 +84,8 @@ func NewClientOrDie(npdo *options.NodeProblemDetectorOptions) Client {
return c
}
func (c *nodeProblemClient) GetConditions(conditionTypes []v1.NodeConditionType) ([]*v1.NodeCondition, error) {
node, err := c.GetNode()
func (c *nodeProblemClient) GetConditions(ctx context.Context, conditionTypes []v1.NodeConditionType) ([]*v1.NodeCondition, error) {
node, err := c.GetNode(ctx)
if err != nil {
return nil, err
}
@ -99,7 +100,7 @@ func (c *nodeProblemClient) GetConditions(conditionTypes []v1.NodeConditionType)
return conditions, nil
}
func (c *nodeProblemClient) SetConditions(newConditions []v1.NodeCondition) error {
func (c *nodeProblemClient) SetConditions(ctx context.Context, newConditions []v1.NodeCondition) error {
for i := range newConditions {
// Each time we update the conditions, we update the heart beat time
newConditions[i].LastHeartbeatTime = metav1.NewTime(c.clock.Now())
@ -108,7 +109,15 @@ func (c *nodeProblemClient) SetConditions(newConditions []v1.NodeCondition) erro
if err != nil {
return err
}
return c.client.RESTClient().Patch(types.StrategicMergePatchType).Resource("nodes").Name(c.nodeName).SubResource("status").Body(patch).Do().Error()
return retry.OnError(retry.DefaultRetry,
func(error) bool {
return true
},
func() error {
_, err := c.client.Nodes().PatchStatus(ctx, c.nodeName, patch)
return err
},
)
}
func (c *nodeProblemClient) Eventf(eventType, source, reason, messageFmt string, args ...interface{}) {
@ -121,8 +130,10 @@ func (c *nodeProblemClient) Eventf(eventType, source, reason, messageFmt string,
recorder.Eventf(c.nodeRef, eventType, reason, messageFmt, args...)
}
func (c *nodeProblemClient) GetNode() (*v1.Node, error) {
return c.client.Nodes().Get(c.nodeName, metav1.GetOptions{})
func (c *nodeProblemClient) GetNode(ctx context.Context) (*v1.Node, error) {
// To reduce the load on APIServer & etcd, we are serving GET operations from
// apiserver cache (the data might be slightly delayed).
return c.client.Nodes().Get(ctx, c.nodeName, metav1.GetOptions{ResourceVersion: "0"})
}
// generatePatch generates condition patch
@ -137,8 +148,8 @@ func generatePatch(conditions []v1.NodeCondition) ([]byte, error) {
// getEventRecorder generates a recorder for specific node name and source.
func getEventRecorder(c typedcorev1.CoreV1Interface, namespace, nodeName, source string) record.EventRecorder {
eventBroadcaster := record.NewBroadcaster()
eventBroadcaster.StartLogging(glog.V(4).Infof)
recorder := eventBroadcaster.NewRecorder(legacyscheme.Scheme, v1.EventSource{Component: source, Host: nodeName})
eventBroadcaster.StartLogging(klog.V(4).Infof)
recorder := eventBroadcaster.NewRecorder(runtime.NewScheme(), v1.EventSource{Component: source, Host: nodeName})
eventBroadcaster.StartRecordingToSink(&typedcorev1.EventSinkImpl{Interface: c.Events(namespace)})
return recorder
}

@ -22,10 +22,10 @@ import (
"testing"
"time"
"k8s.io/api/core/v1"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/util/clock"
"k8s.io/client-go/tools/record"
testclock "k8s.io/utils/clock/testing"
"github.com/stretchr/testify/assert"
)
@ -40,7 +40,7 @@ func newFakeProblemClient() *nodeProblemClient {
nodeName: testNode,
// There is no proper fake for *client.Client for now
// TODO(random-liu): Add test for SetConditions when we have good fake for *client.Client
clock: &clock.FakeClock{},
clock: testclock.NewFakeClock(time.Now()),
recorders: make(map[string]record.EventRecorder),
nodeRef: getNodeRef("", testNode),
}

@ -22,8 +22,8 @@ import (
"strconv"
"contrib.go.opencensus.io/exporter/prometheus"
"github.com/golang/glog"
"go.opencensus.io/stats/view"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/cmd/options"
"k8s.io/node-problem-detector/pkg/types"
@ -40,13 +40,13 @@ func NewExporterOrDie(npdo *options.NodeProblemDetectorOptions) types.Exporter {
addr := net.JoinHostPort(npdo.PrometheusServerAddress, strconv.Itoa(npdo.PrometheusServerPort))
pe, err := prometheus.NewExporter(prometheus.Options{})
if err != nil {
glog.Fatalf("Failed to create Prometheus exporter: %v", err)
klog.Fatalf("Failed to create Prometheus exporter: %v", err)
}
go func() {
mux := http.NewServeMux()
mux.Handle("/metrics", pe)
if err := http.ListenAndServe(addr, mux); err != nil {
glog.Fatalf("Failed to start Prometheus scrape endpoint: %v", err)
klog.Fatalf("Failed to start Prometheus scrape endpoint: %v", err)
}
}()
view.RegisterExporter(pe)

@ -18,7 +18,7 @@ package gce
import (
"cloud.google.com/go/compute/metadata"
"github.com/golang/glog"
"k8s.io/klog/v2"
)
type Metadata struct {
@ -37,7 +37,7 @@ func (md *Metadata) HasMissingField() bool {
func (md *Metadata) PopulateFromGCE() error {
var err error
glog.Info("Fetching GCE metadata from metadata server")
klog.Info("Fetching GCE metadata from metadata server")
if md.ProjectID == "" {
md.ProjectID, err = metadata.ProjectID()
if err != nil {

@ -18,19 +18,19 @@ package stackdriverexporter
import (
"encoding/json"
"io/ioutil"
"os"
"path/filepath"
"reflect"
"time"
"contrib.go.opencensus.io/exporter/stackdriver"
monitoredres "contrib.go.opencensus.io/exporter/stackdriver/monitoredresource"
"github.com/golang/glog"
"github.com/spf13/pflag"
"go.opencensus.io/stats/view"
"google.golang.org/api/option"
"k8s.io/klog/v2"
"github.com/avast/retry-go"
"github.com/avast/retry-go/v4"
"k8s.io/node-problem-detector/pkg/exporters"
seconfig "k8s.io/node-problem-detector/pkg/exporters/stackdriver/config"
"k8s.io/node-problem-detector/pkg/types"
@ -54,6 +54,7 @@ var NPDMetricToSDMetric = map[metrics.MetricID]string{
metrics.CPULoad15m: "compute.googleapis.com/guest/cpu/load_15m",
metrics.DiskAvgQueueLenID: "compute.googleapis.com/guest/disk/queue_length",
metrics.DiskBytesUsedID: "compute.googleapis.com/guest/disk/bytes_used",
metrics.DiskPercentUsedID: "compute.googleapis.com/guest/disk/percent_used",
metrics.DiskIOTimeID: "compute.googleapis.com/guest/disk/io_time",
metrics.DiskMergedOpsCountID: "compute.googleapis.com/guest/disk/merged_operation_count",
metrics.DiskOpsBytesID: "compute.googleapis.com/guest/disk/operation_bytes_count",
@ -66,6 +67,7 @@ var NPDMetricToSDMetric = map[metrics.MetricID]string{
metrics.MemoryDirtyUsedID: "compute.googleapis.com/guest/memory/dirty_used",
metrics.MemoryPageCacheUsedID: "compute.googleapis.com/guest/memory/page_cache_used",
metrics.MemoryUnevictableUsedID: "compute.googleapis.com/guest/memory/unevictable_used",
metrics.MemoryPercentUsedID: "compute.googleapis.com/guest/memory/percent_used",
metrics.ProblemCounterID: "compute.googleapis.com/guest/system/problem_count",
metrics.ProblemGaugeID: "compute.googleapis.com/guest/system/problem_state",
metrics.OSFeatureID: "compute.googleapis.com/guest/system/os_feature_enabled",
@ -137,12 +139,12 @@ func (se *stackdriverExporter) setupOpenCensusViewExporterOrDie() {
DefaultMonitoringLabels: &globalLabels,
})
if err != nil {
glog.Fatalf("Failed to create Stackdriver OpenCensus view exporter: %v", err)
klog.Fatalf("Failed to create Stackdriver OpenCensus view exporter: %v", err)
}
exportPeriod, err := time.ParseDuration(se.config.ExportPeriod)
if err != nil {
glog.Fatalf("Failed to parse ExportPeriod %q: %v", se.config.ExportPeriod, err)
klog.Fatalf("Failed to parse ExportPeriod %q: %v", se.config.ExportPeriod, err)
}
view.SetReportingPeriod(exportPeriod)
@ -151,33 +153,33 @@ func (se *stackdriverExporter) setupOpenCensusViewExporterOrDie() {
func (se *stackdriverExporter) populateMetadataOrDie() {
if !se.config.GCEMetadata.HasMissingField() {
glog.Infof("Using GCE metadata specified in the config file: %+v", se.config.GCEMetadata)
klog.Infof("Using GCE metadata specified in the config file: %+v", se.config.GCEMetadata)
return
}
metadataFetchTimeout, err := time.ParseDuration(se.config.MetadataFetchTimeout)
if err != nil {
glog.Fatalf("Failed to parse MetadataFetchTimeout %q: %v", se.config.MetadataFetchTimeout, err)
klog.Fatalf("Failed to parse MetadataFetchTimeout %q: %v", se.config.MetadataFetchTimeout, err)
}
metadataFetchInterval, err := time.ParseDuration(se.config.MetadataFetchInterval)
if err != nil {
glog.Fatalf("Failed to parse MetadataFetchInterval %q: %v", se.config.MetadataFetchInterval, err)
klog.Fatalf("Failed to parse MetadataFetchInterval %q: %v", se.config.MetadataFetchInterval, err)
}
glog.Infof("Populating GCE metadata by querying GCE metadata server.")
klog.Infof("Populating GCE metadata by querying GCE metadata server.")
err = retry.Do(se.config.GCEMetadata.PopulateFromGCE,
retry.Delay(metadataFetchInterval),
retry.Attempts(uint(metadataFetchTimeout/metadataFetchInterval)),
retry.DelayType(retry.FixedDelay))
if err == nil {
glog.Infof("Using GCE metadata: %+v", se.config.GCEMetadata)
klog.Infof("Using GCE metadata: %+v", se.config.GCEMetadata)
return
}
if se.config.PanicOnMetadataFetchFailure {
glog.Fatalf("Failed to populate GCE metadata: %v", err)
klog.Fatalf("Failed to populate GCE metadata: %v", err)
} else {
glog.Errorf("Failed to populate GCE metadata: %v", err)
klog.Errorf("Failed to populate GCE metadata: %v", err)
}
}
@ -200,7 +202,7 @@ func (clo *commandLineOptions) SetFlags(fs *pflag.FlagSet) {
func NewExporterOrDie(clo types.CommandLineOptions) types.Exporter {
options, ok := clo.(*commandLineOptions)
if !ok {
glog.Fatalf("Wrong type for the command line options of Stackdriver Exporter: %s.", reflect.TypeOf(clo))
klog.Fatalf("Wrong type for the command line options of Stackdriver Exporter: %s.", reflect.TypeOf(clo))
}
if options.configPath == "" {
return nil
@ -209,17 +211,17 @@ func NewExporterOrDie(clo types.CommandLineOptions) types.Exporter {
se := stackdriverExporter{}
// Apply configurations.
f, err := ioutil.ReadFile(options.configPath)
f, err := os.ReadFile(options.configPath)
if err != nil {
glog.Fatalf("Failed to read configuration file %q: %v", options.configPath, err)
klog.Fatalf("Failed to read configuration file %q: %v", options.configPath, err)
}
err = json.Unmarshal(f, &se.config)
if err != nil {
glog.Fatalf("Failed to unmarshal configuration file %q: %v", options.configPath, err)
klog.Fatalf("Failed to unmarshal configuration file %q: %v", options.configPath, err)
}
se.config.ApplyConfiguration()
glog.Infof("Starting Stackdriver exporter %s", options.configPath)
klog.Infof("Starting Stackdriver exporter %s", options.configPath)
se.populateMetadataOrDie()
se.setupOpenCensusViewExporterOrDie()

@ -1,3 +1,4 @@
//go:build !disable_stackdriver_exporter
// +build !disable_stackdriver_exporter
/*

@ -17,9 +17,13 @@ limitations under the License.
package healthchecker
import (
"context"
"net/http"
"os/exec"
"strings"
"time"
"github.com/golang/glog"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/cmd/healthchecker/options"
"k8s.io/node-problem-detector/pkg/healthchecker/types"
)
@ -36,6 +40,7 @@ type healthChecker struct {
crictlPath string
healthCheckTimeout time.Duration
coolDownTime time.Duration
loopBackTime time.Duration
logPatternsToCheck map[string]int
}
@ -48,6 +53,7 @@ func NewHealthChecker(hco *options.HealthCheckerOptions) (types.HealthChecker, e
healthCheckTimeout: hco.HealthCheckTimeout,
coolDownTime: hco.CoolDownTime,
service: hco.Service,
loopBackTime: hco.LoopBackTime,
logPatternsToCheck: hco.LogPatterns.GetLogPatternCountMap(),
}
hc.healthCheckFunc = getHealthCheckFunc(hco)
@ -63,24 +69,26 @@ func (hc *healthChecker) CheckHealth() (bool, error) {
if err != nil {
return healthy, err
}
logPatternHealthy, err := logPatternHealthCheck(hc.service, hc.logPatternsToCheck)
logPatternHealthy, err := logPatternHealthCheck(hc.service, hc.loopBackTime, hc.logPatternsToCheck)
if err != nil {
return logPatternHealthy, err
}
if healthy && logPatternHealthy {
return true, nil
}
// The service is unhealthy.
// Attempt repair based on flag.
if hc.enableRepair {
// repair if the service has been up for the cool down period.
uptime, err := hc.uptimeFunc()
if err != nil {
glog.Infof("error in getting uptime for %v: %v\n", hc.component, err)
klog.Infof("error in getting uptime for %v: %v\n", hc.component, err)
return false, nil
}
glog.Infof("%v is unhealthy, component uptime: %v\n", hc.component, uptime)
klog.Infof("%v is unhealthy, component uptime: %v\n", hc.component, uptime)
if uptime > hc.coolDownTime {
glog.Infof("%v cooldown period of %v exceeded, repairing", hc.component, hc.coolDownTime)
klog.Infof("%v cooldown period of %v exceeded, repairing", hc.component, hc.coolDownTime)
hc.repairFunc()
}
}
@ -89,18 +97,21 @@ func (hc *healthChecker) CheckHealth() (bool, error) {
// logPatternHealthCheck checks for the provided logPattern occurrences in the service logs.
// Returns true if the pattern is empty or does not exist logThresholdCount times since start of service, false otherwise.
func logPatternHealthCheck(service string, logPatternsToCheck map[string]int) (bool, error) {
func logPatternHealthCheck(service string, loopBackTime time.Duration, logPatternsToCheck map[string]int) (bool, error) {
if len(logPatternsToCheck) == 0 {
return true, nil
}
uptimeFunc := getUptimeFunc(service)
klog.Infof("Getting uptime for service: %v\n", service)
uptime, err := uptimeFunc()
if err != nil {
klog.Warningf("Failed to get the uptime: %+v", err)
return true, err
}
logStartTime := time.Now().Add(-uptime).Format(types.LogParsingTimeLayout)
if err != nil {
return true, err
if loopBackTime > 0 && uptime > loopBackTime {
logStartTime = time.Now().Add(-loopBackTime).Format(types.LogParsingTimeLayout)
}
for pattern, count := range logPatternsToCheck {
healthy, err := checkForPattern(service, logStartTime, pattern, count)
@ -110,3 +121,65 @@ func logPatternHealthCheck(service string, logPatternsToCheck map[string]int) (b
}
return true, nil
}
// healthCheckEndpointOKFunc returns a function to check the status of an http endpoint
func healthCheckEndpointOKFunc(endpoint string, timeout time.Duration) func() (bool, error) {
return func() (bool, error) {
httpClient := http.Client{Timeout: timeout}
response, err := httpClient.Get(endpoint)
if err != nil || response.StatusCode != http.StatusOK {
return false, nil
}
return true, nil
}
}
// getHealthCheckFunc returns the health check function based on the component.
func getHealthCheckFunc(hco *options.HealthCheckerOptions) func() (bool, error) {
switch hco.Component {
case types.KubeletComponent:
return healthCheckEndpointOKFunc(types.KubeletHealthCheckEndpoint(), hco.HealthCheckTimeout)
case types.KubeProxyComponent:
return healthCheckEndpointOKFunc(types.KubeProxyHealthCheckEndpoint(), hco.HealthCheckTimeout)
case types.DockerComponent:
return func() (bool, error) {
if _, err := execCommand(hco.HealthCheckTimeout, getDockerPath(), "ps"); err != nil {
return false, nil
}
return true, nil
}
case types.CRIComponent:
return func() (bool, error) {
_, err := execCommand(
hco.HealthCheckTimeout,
hco.CriCtlPath,
"--timeout="+hco.CriTimeout.String(),
"--runtime-endpoint="+hco.CriSocketPath,
"pods",
"--latest",
)
if err != nil {
return false, nil
}
return true, nil
}
default:
klog.Warningf("Unsupported component: %v", hco.Component)
}
return nil
}
// execCommand executes the bash command and returns the (output, error) from command, error if timeout occurs.
func execCommand(timeout time.Duration, command string, args ...string) (string, error) {
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
cmd := exec.CommandContext(ctx, command, args...)
out, err := cmd.CombinedOutput()
if err != nil {
klog.Infof("command %v failed: %v, %s\n", cmd, err, string(out))
return "", err
}
return strings.TrimSuffix(string(out), "\n"), nil
}

@ -0,0 +1,49 @@
/*
Copyright 2023 The Kubernetes Authors All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package healthchecker
import (
"runtime"
"time"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/cmd/healthchecker/options"
)
// getUptimeFunc returns the time for which the given service has been running.
func getUptimeFunc(service string) func() (time.Duration, error) {
klog.Fatalf("getUptimeFunc is not supported in %s", runtime.GOOS)
return func() (time.Duration, error) { return time.Second, nil }
}
// getRepairFunc returns the repair function based on the component.
func getRepairFunc(hco *options.HealthCheckerOptions) func() {
klog.Fatalf("getRepairFunc is not supported in %s", runtime.GOOS)
return func() {}
}
// checkForPattern returns (true, nil) if logPattern occurs less than logCountThreshold number of times since last
// service restart. (false, nil) otherwise.
func checkForPattern(service, logStartTime, logPattern string, logCountThreshold int) (bool, error) {
klog.Fatalf("checkForPattern is not supported in %s", runtime.GOOS)
return false, nil
}
func getDockerPath() string {
klog.Fatalf("getDockerPath is not supported in %s", runtime.GOOS)
return ""
}

@ -17,15 +17,12 @@ limitations under the License.
package healthchecker
import (
"context"
"errors"
"net/http"
"os/exec"
"strconv"
"strings"
"time"
"github.com/golang/glog"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/cmd/healthchecker/options"
"k8s.io/node-problem-detector/pkg/healthchecker/types"
@ -59,6 +56,11 @@ func getUptimeFunc(service string) func() (time.Duration, error) {
// getRepairFunc returns the repair function based on the component.
func getRepairFunc(hco *options.HealthCheckerOptions) func() {
// Use `systemctl kill` instead of `systemctl restart` for the repair function.
// We start to rely on the kernel message difference for the two commands to
// indicate if the component restart is due to an administrative plan (restart)
// or a system issue that needs repair (kill).
// See https://github.com/kubernetes/node-problem-detector/issues/847.
switch hco.Component {
case types.DockerComponent:
// Use "docker ps" for docker health check. Not using crictl for docker to remove
@ -75,49 +77,6 @@ func getRepairFunc(hco *options.HealthCheckerOptions) func() {
}
}
// getHealthCheckFunc returns the health check function based on the component.
func getHealthCheckFunc(hco *options.HealthCheckerOptions) func() (bool, error) {
switch hco.Component {
case types.KubeletComponent:
return func() (bool, error) {
httpClient := http.Client{Timeout: hco.HealthCheckTimeout}
response, err := httpClient.Get(types.KubeletHealthCheckEndpoint)
if err != nil || response.StatusCode != http.StatusOK {
return false, nil
}
return true, nil
}
case types.DockerComponent:
return func() (bool, error) {
if _, err := execCommand(hco.HealthCheckTimeout, "docker", "ps"); err != nil {
return false, nil
}
return true, nil
}
case types.CRIComponent:
return func() (bool, error) {
if _, err := execCommand(hco.HealthCheckTimeout, hco.CriCtlPath, "--runtime-endpoint="+hco.CriSocketPath, "--image-endpoint="+hco.CriSocketPath, "pods"); err != nil {
return false, nil
}
return true, nil
}
}
return nil
}
// execCommand executes the bash command and returns the (output, error) from command, error if timeout occurs.
func execCommand(timeout time.Duration, command string, args ...string) (string, error) {
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
cmd := exec.CommandContext(ctx, command, args...)
out, err := cmd.Output()
if err != nil {
glog.Infof("command %v failed: %v, %v\n", cmd, err, out)
return "", err
}
return strings.TrimSuffix(string(out), "\n"), nil
}
// checkForPattern returns (true, nil) if logPattern occurs less than logCountThreshold number of times since last
// service restart. (false, nil) otherwise.
func checkForPattern(service, logStartTime, logPattern string, logCountThreshold int) (bool, error) {
@ -136,8 +95,12 @@ func checkForPattern(service, logStartTime, logPattern string, logCountThreshold
return true, err
}
if occurrences >= logCountThreshold {
glog.Infof("%s failed log pattern check, %s occurrences: %v", service, logPattern, occurrences)
klog.Infof("%s failed log pattern check, %s occurrences: %v", service, logPattern, occurrences)
return false, nil
}
return true, nil
}
func getDockerPath() string {
return "docker"
}

@ -20,6 +20,7 @@ import (
"testing"
"time"
"k8s.io/node-problem-detector/cmd/healthchecker/options"
"k8s.io/node-problem-detector/pkg/healthchecker/types"
)
@ -119,3 +120,38 @@ func TestHealthCheck(t *testing.T) {
})
}
}
func TestComponentsSupported(t *testing.T) {
for _, tc := range []struct {
description string
component string
}{
{
description: "Kube Proxy should be supported",
component: types.KubeProxyComponent,
},
{
description: "Kubelet should be supported",
component: types.KubeletComponent,
},
{
description: "Docker should be supported",
component: types.DockerComponent,
},
{
description: "CRI should be supported",
component: types.CRIComponent,
},
} {
t.Run(tc.description, func(t *testing.T) {
checkFunc := getHealthCheckFunc(&options.HealthCheckerOptions{
Component: tc.component,
})
if checkFunc == nil {
t.Errorf("component %v should be supported", tc.component)
}
})
}
}

@ -18,13 +18,12 @@ package healthchecker
import (
"fmt"
"net/http"
"os/exec"
"strconv"
"strings"
"time"
"github.com/golang/glog"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/cmd/healthchecker/options"
"k8s.io/node-problem-detector/pkg/healthchecker/types"
@ -34,12 +33,19 @@ import (
// getUptimeFunc returns the time for which the given service has been running.
func getUptimeFunc(service string) func() (time.Duration, error) {
return func() (time.Duration, error) {
// Using the WinEvent Log Objects to find the Service logs' time when the Service last entered running state.
// To calculate uptime more efficiently, we first look up the service's process id and read the process start time.
// If the process id does not exist (meaning the service is not running for some reason), we fall back to
// using the WinEvent Log Objects to find the Service logs' time when the Service last entered running state.
// In addition to filtering by logname=system, we also filter on event id=7036 to reduce the number of
// entries the next command, Where-Object, will have to look through. Event id 7036 messages indicate a stopped or running service.
// The powershell command formats the TimeCreated of the event log in RFC1123Pattern.
// However, because the time library parser does not recognize the ',' in this RFC1123Pattern format,
// it is manually removed before parsing it using the UptimeTimeLayout.
getTimeCreatedCmd := "(Get-WinEvent -Logname System | Where-Object {$_.Message -Match '.*(" + service +
").*(running).*'} | Select-Object -Property TimeCreated -First 1 | foreach {$_.TimeCreated.ToString('R')} | Out-String).Trim()"
getTimeCreatedCmd := `$ProcessId = (Get-WMIObject -Class Win32_Service -Filter "Name='` + service + `'" | Select-Object -ExpandProperty ProcessId);` +
`if ([string]::IsNullOrEmpty($ProcessId) -or $ProcessId -eq 0) { (Get-WinEvent -FilterHashtable @{logname='system';id=7036} ` +
`| Where-Object {$_.Message -match '.*(` + service + `).*(running).*'} | Select-Object -Property TimeCreated -First 1 | ` +
`foreach {$_.TimeCreated.ToUniversalTime().ToString('R')} | Out-String).Trim() } else { (Get-Process -Id $ProcessId | Select starttime | ` +
`foreach {$_.starttime.ToUniversalTime().ToString('R')} | Out-String).Trim() }`
out, err := powershell(getTimeCreatedCmd)
if err != nil {
return time.Duration(0), err
@ -64,49 +70,6 @@ func getRepairFunc(hco *options.HealthCheckerOptions) func() {
}
}
// getHealthCheckFunc returns the health check function based on the component.
func getHealthCheckFunc(hco *options.HealthCheckerOptions) func() (bool, error) {
switch hco.Component {
case types.KubeletComponent:
return healthCheckEndpointOKFunc(types.KubeletHealthCheckEndpoint, hco.HealthCheckTimeout)
case types.KubeProxyComponent:
return healthCheckEndpointOKFunc(types.KubeProxyHealthCheckEndpoint, hco.HealthCheckTimeout)
case types.DockerComponent:
return func() (bool, error) {
if _, err := execCommand("docker.exe", "ps"); err != nil {
return false, nil
}
return true, nil
}
case types.CRIComponent:
return func() (bool, error) {
if _, err := execCommand(hco.CriCtlPath, "--runtime-endpoint="+hco.CriSocketPath, "--image-endpoint="+hco.CriSocketPath, "pods"); err != nil {
return false, nil
}
return true, nil
}
}
return nil
}
// healthCheckEndpointOKFunc returns a function to check the status of an http endpoint
func healthCheckEndpointOKFunc(endpoint string, timeout time.Duration) func() (bool, error) {
return func() (bool, error) {
httpClient := http.Client{Timeout: timeout}
response, err := httpClient.Get(endpoint)
if err != nil || response.StatusCode != http.StatusOK {
return false, nil
}
return true, nil
}
}
// execCommand creates a new process, executes the command, and returns the (output, error) from command.
func execCommand(command string, args ...string) (string, error) {
cmd := util.Exec(command, args...)
return extractCommandOutput(cmd)
}
// powershell executes the arguments in powershell process and returns (output, error) from command.
func powershell(args ...string) (string, error) {
cmd := util.Powershell(args...)
@ -117,7 +80,7 @@ func powershell(args ...string) (string, error) {
func extractCommandOutput(cmd *exec.Cmd) (string, error) {
out, err := cmd.Output()
if err != nil {
glog.Infof("command %v failed: %v, %v\n", cmd, err, out)
klog.Infof("command %v failed: %v, %v\n", cmd, err, out)
return "", err
}
return strings.TrimSuffix(string(out), "\r\n"), nil
@ -138,8 +101,12 @@ func checkForPattern(service, logStartTime, logPattern string, logCountThreshold
return true, err
}
if occurrences >= logCountThreshold {
glog.Infof("%s failed log pattern check, %s occurrences: %v", service, logPattern, occurrences)
klog.Infof("%s failed log pattern check, %s occurrences: %v", service, logPattern, occurrences)
return false, nil
}
return true, nil
}
func getDockerPath() string {
return "docker.exe"
}

@ -18,6 +18,8 @@ package types
import (
"fmt"
"net"
"os"
"sort"
"strconv"
"strings"
@ -25,6 +27,8 @@ import (
)
const (
DefaultLoopBackTime = 0 * time.Minute
DefaultCriTimeout = 2 * time.Second
DefaultCoolDownTime = 2 * time.Minute
DefaultHealthCheckTimeout = 10 * time.Second
CmdTimeout = 10 * time.Second
@ -36,12 +40,57 @@ const (
ContainerdService = "containerd"
KubeProxyComponent = "kube-proxy"
KubeletHealthCheckEndpoint = "http://127.0.0.1:10248/healthz"
KubeProxyHealthCheckEndpoint = "http://127.0.0.1:10256/healthz"
LogPatternFlagSeparator = ":"
hostAddressKey = "HOST_ADDRESS"
kubeletPortKey = "KUBELET_PORT"
kubeProxyPortKey = "KUBEPROXY_PORT"
defaultHostAddress = "localhost"
defaultKubeletPort = "10248"
defaultKubeproxyPort = "10256"
)
var (
kubeletHealthCheckEndpoint string
kubeProxyHealthCheckEndpoint string
)
func init() {
setKubeEndpoints()
}
func setKubeEndpoints() {
var o string
hostAddress := defaultHostAddress
kubeletPort := defaultKubeletPort
kubeProxyPort := defaultKubeproxyPort
o = os.Getenv(hostAddressKey)
if o != "" {
hostAddress = o
}
o = os.Getenv(kubeletPortKey)
if o != "" {
kubeletPort = o
}
o = os.Getenv(kubeProxyPortKey)
if o != "" {
kubeProxyPort = o
}
kubeletHealthCheckEndpoint = fmt.Sprintf("http://%s/healthz", net.JoinHostPort(hostAddress, kubeletPort))
kubeProxyHealthCheckEndpoint = fmt.Sprintf("http://%s/healthz", net.JoinHostPort(hostAddress, kubeProxyPort))
}
func KubeProxyHealthCheckEndpoint() string {
return kubeProxyHealthCheckEndpoint
}
func KubeletHealthCheckEndpoint() string {
return kubeletHealthCheckEndpoint
}
type HealthChecker interface {
CheckHealth() (bool, error)
}
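
The endpoint construction in setKubeEndpoints can be sketched in isolation as follows. This is an illustrative reduction, not the repository's code: envOrDefault and healthzEndpoint are hypothetical helper names, while the environment keys and default values are the ones shown in the diff.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// envOrDefault returns the value of the environment variable key, or
// fallback when the variable is unset or empty. (Hypothetical helper;
// the diff inlines this logic in setKubeEndpoints.)
func envOrDefault(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// healthzEndpoint builds the /healthz URL for a host and port.
// net.JoinHostPort wraps IPv6 literals in brackets, which is why the
// IPv6 test case in the diff expects "[80:f4:16::1]".
func healthzEndpoint(host, port string) string {
	return fmt.Sprintf("http://%s/healthz", net.JoinHostPort(host, port))
}

func main() {
	host := envOrDefault("HOST_ADDRESS", "localhost")
	fmt.Println(healthzEndpoint(host, envOrDefault("KUBELET_PORT", "10248")))
	fmt.Println(healthzEndpoint(host, envOrDefault("KUBEPROXY_PORT", "10256")))
}
```

Plain string concatenation of host and port would produce an ambiguous URL for IPv6 addresses, so using net.JoinHostPort is the relevant design choice here.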

@ -1,23 +0,0 @@
/*
Copyright 2021 The Kubernetes Authors All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package types
const (
DefaultCriCtl = "/usr/bin/crictl"
DefaultCriSocketPath = "unix:///var/run/containerd/containerd.sock"
UptimeTimeLayout = "Mon 2006-01-02 15:04:05 MST"
)

@ -98,3 +98,101 @@ func TestLogPatternFlag(t *testing.T) {
})
}
}
func TestKubeEndpointConfiguration(t *testing.T) {
testCases := []struct {
name string
envConfig map[string]string
expectedKubeletEndpoint string
expectedKubeProxyEndpoint string
}{
{
name: "no overrides supplied",
envConfig: map[string]string{},
expectedKubeletEndpoint: "http://localhost:10248/healthz",
expectedKubeProxyEndpoint: "http://localhost:10256/healthz",
},
{
name: "HOST_ADDRESS override supplied",
envConfig: map[string]string{
"HOST_ADDRESS": "samplehost.testdomain.com",
},
expectedKubeletEndpoint: "http://samplehost.testdomain.com:10248/healthz",
expectedKubeProxyEndpoint: "http://samplehost.testdomain.com:10256/healthz",
},
{
name: "HOST_ADDRESS override supplied with IPv4",
envConfig: map[string]string{
"HOST_ADDRESS": "10.0.5.4",
},
expectedKubeletEndpoint: "http://10.0.5.4:10248/healthz",
expectedKubeProxyEndpoint: "http://10.0.5.4:10256/healthz",
},
{
name: "HOST_ADDRESS override supplied with IPv6",
envConfig: map[string]string{
"HOST_ADDRESS": "80:f4:16::1",
},
expectedKubeletEndpoint: "http://[80:f4:16::1]:10248/healthz",
expectedKubeProxyEndpoint: "http://[80:f4:16::1]:10256/healthz",
},
{
name: "KUBELET_PORT override supplied",
envConfig: map[string]string{
"KUBELET_PORT": "12345",
},
expectedKubeletEndpoint: "http://localhost:12345/healthz",
expectedKubeProxyEndpoint: "http://localhost:10256/healthz",
},
{
name: "KUBEPROXY_PORT override supplied",
envConfig: map[string]string{
"KUBEPROXY_PORT": "12345",
},
expectedKubeletEndpoint: "http://localhost:10248/healthz",
expectedKubeProxyEndpoint: "http://localhost:12345/healthz",
},
{
name: "HOST_ADDRESS and KUBELET_PORT override supplied",
envConfig: map[string]string{
"HOST_ADDRESS": "samplehost.testdomain.com",
"KUBELET_PORT": "12345",
},
expectedKubeletEndpoint: "http://samplehost.testdomain.com:12345/healthz",
expectedKubeProxyEndpoint: "http://samplehost.testdomain.com:10256/healthz",
},
{
name: "HOST_ADDRESS and KUBEPROXY_PORT override supplied",
envConfig: map[string]string{
"HOST_ADDRESS": "samplehost.testdomain.com",
"KUBEPROXY_PORT": "12345",
},
expectedKubeletEndpoint: "http://samplehost.testdomain.com:10248/healthz",
expectedKubeProxyEndpoint: "http://samplehost.testdomain.com:12345/healthz",
},
{
name: "HOST_ADDRESS, KUBELET_PORT and KUBEPROXY_PORT override supplied",
envConfig: map[string]string{
"HOST_ADDRESS": "10.0.10.1",
"KUBELET_PORT": "12345",
"KUBEPROXY_PORT": "12346",
},
expectedKubeletEndpoint: "http://10.0.10.1:12345/healthz",
expectedKubeProxyEndpoint: "http://10.0.10.1:12346/healthz",
},
}
for _, test := range testCases {
t.Run(test.name, func(t *testing.T) {
for key, val := range test.envConfig {
t.Setenv(key, val)
}
setKubeEndpoints()
kubeProxyHCEndpoint := KubeProxyHealthCheckEndpoint()
kubeletHCEndpoint := KubeletHealthCheckEndpoint()
assert.Equal(t, test.expectedKubeProxyEndpoint, kubeProxyHCEndpoint)
assert.Equal(t, test.expectedKubeletEndpoint, kubeletHCEndpoint)
})
}
}

@ -0,0 +1,25 @@
//go:build unix
/*
Copyright 2021 The Kubernetes Authors All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package types
const (
DefaultCriCtl = "/usr/bin/crictl"
DefaultCriSocketPath = "unix:///var/run/containerd/containerd.sock"
UptimeTimeLayout = "Mon 2006-01-02 15:04:05 MST"
)

@ -17,7 +17,7 @@ limitations under the License.
package types
const (
DefaultCriCtl = "C:/node/crictl.exe"
DefaultCriCtl = "C:/etc/kubernetes/node/bin/crictl.exe"
DefaultCriSocketPath = "npipe:////./pipe/containerd-containerd"
UptimeTimeLayout = "Mon 02 Jan 2006 15:04:05 MST"
LogParsingTimeFormat = "yyyy-MM-dd HH:mm:ss"

@ -1,3 +1,4 @@
//go:build journald
// +build journald
/*
@ -22,7 +23,7 @@ import (
"fmt"
"time"
"k8s.io/apimachinery/pkg/util/clock"
"k8s.io/utils/clock"
"k8s.io/node-problem-detector/cmd/logcounter/options"
"k8s.io/node-problem-detector/pkg/logcounter/types"
@ -39,10 +40,11 @@ const (
)
type logCounter struct {
logCh <-chan *systemtypes.Log
buffer systemlogmonitor.LogBuffer
pattern string
clock clock.Clock
logCh <-chan *systemtypes.Log
buffer systemlogmonitor.LogBuffer
pattern string
revertPattern string
clock clock.Clock
}
func NewJournaldLogCounter(options *options.LogCounterOptions) (types.LogCounter, error) {
@ -58,10 +60,11 @@ func NewJournaldLogCounter(options *options.LogCounterOptions) (types.LogCounter
return nil, fmt.Errorf("error watching journald: %v", err)
}
return &logCounter{
logCh: logCh,
buffer: systemlogmonitor.NewLogBuffer(bufferSize),
pattern: options.Pattern,
clock: clock.RealClock{},
logCh: logCh,
buffer: systemlogmonitor.NewLogBuffer(bufferSize),
pattern: options.Pattern,
revertPattern: options.RevertPattern,
clock: clock.RealClock{},
}, nil
}
@ -83,6 +86,9 @@ func (e *logCounter) Count() (count int, err error) {
if len(e.buffer.Match(e.pattern)) != 0 {
count++
}
if e.revertPattern != "" && len(e.buffer.Match(e.revertPattern)) != 0 {
count--
}
case <-e.clock.After(timeout):
// Don't block forever if we do not get any new messages
return
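
The effect of the new revertPattern field on the count can be illustrated with a simplified sketch. It assumes each log line is matched on its own with the standard regexp package, whereas the real code matches against a LogBuffer; netCount is a hypothetical name introduced only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// netCount is a hypothetical, simplified stand-in for Count(): every line
// matching pattern adds one, and every line matching revertPattern (when it
// is non-empty) subtracts one.
func netCount(lines []string, pattern, revertPattern string) (int, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return 0, err
	}
	var revertRe *regexp.Regexp
	if revertPattern != "" {
		if revertRe, err = regexp.Compile(revertPattern); err != nil {
			return 0, err
		}
	}
	count := 0
	for _, line := range lines {
		if re.MatchString(line) {
			count++
		}
		if revertRe != nil && revertRe.MatchString(line) {
			count--
		}
	}
	return count, nil
}

func main() {
	lines := []string{"disk error", "disk recovered", "disk error"}
	n, err := netCount(lines, "disk error", "disk recovered")
	fmt.Println(n, err)
}
```

An empty revert pattern leaves the count unchanged from the previous behavior, which matches the guard added in the diff.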

@ -1,3 +1,4 @@
//go:build journald
// +build journald
/*
@ -22,16 +23,16 @@ import (
"testing"
"time"
"k8s.io/apimachinery/pkg/util/clock"
testclock "k8s.io/utils/clock/testing"
"k8s.io/node-problem-detector/pkg/logcounter/types"
"k8s.io/node-problem-detector/pkg/systemlogmonitor"
systemtypes "k8s.io/node-problem-detector/pkg/systemlogmonitor/types"
)
func NewTestLogCounter(pattern string, startTime time.Time) (types.LogCounter, *clock.FakeClock, chan *systemtypes.Log) {
func NewTestLogCounter(pattern string, startTime time.Time) (types.LogCounter, *testclock.FakeClock, chan *systemtypes.Log) {
logCh := make(chan *systemtypes.Log)
clock := clock.NewFakeClock(startTime)
clock := testclock.NewFakeClock(startTime)
return &logCounter{
logCh: logCh,
buffer: systemlogmonitor.NewLogBuffer(bufferSize),

@ -19,7 +19,7 @@ package problemdaemon
import (
"fmt"
"github.com/golang/glog"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/pkg/types"
)
@ -58,7 +58,7 @@ func NewProblemDaemons(monitorConfigPaths types.ProblemDaemonConfigPathMap) []ty
for _, config := range *configs {
if _, ok := problemDaemonMap[config]; ok {
// Skip the config if it's duplicated.
glog.Warningf("Duplicated problem daemon configuration %q", config)
klog.Warningf("Duplicated problem daemon configuration %q", config)
continue
}
problemDaemonMap[config] = handlers[problemDaemonType].CreateProblemDaemonOrDie(config)

@ -17,16 +17,17 @@ limitations under the License.
package problemdetector
import (
"context"
"fmt"
"github.com/golang/glog"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/pkg/types"
)
// ProblemDetector collects statuses from all problem daemons, updates the node conditions, and sends node events.
type ProblemDetector interface {
Run(termCh <-chan error) error
Run(context.Context) error
}
type problemDetector struct {
@ -44,7 +45,7 @@ func NewProblemDetector(monitors []types.Monitor, exporters []types.Exporter) Pr
}
// Run starts the problem detector.
func (p *problemDetector) Run(termCh <-chan error) error {
func (p *problemDetector) Run(ctx context.Context) error {
// Start the log monitors one by one.
var chans []<-chan *types.Status
failureCount := 0
@ -52,7 +53,7 @@ func (p *problemDetector) Run(termCh <-chan error) error {
ch, err := m.Start()
if err != nil {
// Do not return error and keep on trying the following config files.
glog.Errorf("Failed to start problem daemon %v: %v", m, err)
klog.Errorf("Failed to start problem daemon %v: %v", m, err)
failureCount++
continue
}
@ -73,11 +74,11 @@ func (p *problemDetector) Run(termCh <-chan error) error {
}()
ch := groupChannel(chans)
glog.Info("Problem detector started")
klog.Info("Problem detector started")
for {
select {
case <-termCh:
case <-ctx.Done():
return nil
case status := <-ch:
for _, exporter := range p.exporters {

@ -17,6 +17,7 @@ limitations under the License.
package problemdetector
import (
"context"
"testing"
"k8s.io/node-problem-detector/pkg/types"
@ -24,7 +25,7 @@ import (
func TestEmpty(t *testing.T) {
pd := NewProblemDetector([]types.Monitor{}, []types.Exporter{})
if err := pd.Run(nil); err == nil {
if err := pd.Run(context.Background()); err == nil {
t.Error("expected error when running an empty problem detector")
}
}

@ -21,7 +21,7 @@ import (
"fmt"
"sync"
"github.com/golang/glog"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/pkg/util/metrics"
)
@ -56,7 +56,7 @@ func NewProblemMetricsManagerOrDie() *ProblemMetricsManager {
metrics.Sum,
[]string{"reason"})
if err != nil {
glog.Fatalf("Failed to create problem_counter metric: %v", err)
klog.Fatalf("Failed to create problem_counter metric: %v", err)
}
pmm.problemGauge, err = metrics.NewInt64Metric(
@ -67,7 +67,7 @@ func NewProblemMetricsManagerOrDie() *ProblemMetricsManager {
metrics.LastValue,
[]string{"type", "reason"})
if err != nil {
glog.Fatalf("Failed to create problem_gauge metric: %v", err)
klog.Fatalf("Failed to create problem_gauge metric: %v", err)
}
pmm.problemTypeToReason = make(map[string]string)

@ -37,7 +37,8 @@ with new rule definition:
"type": "temporary/permanent",
"condition": "NodeConditionOfPermanentIssue",
"reason": "CamelCaseShortReason",
"message": "regexp matching the issue in the log"
"pattern": "regexp matching the issue in the log",
"patternGeneratedMessageSuffix": "Please check the network connectivity and ensure that all required services are running. For more details, see our documentation at https://example.com/docs/troubleshooting."
}
```

@ -46,7 +46,7 @@ type MonitorConfig struct {
EnableMetricsReporting *bool `json:"metricsReporting,omitempty"`
}
// ApplyConfiguration applies default configurations.
// ApplyDefaultConfiguration applies default configurations.
func (mc *MonitorConfig) ApplyDefaultConfiguration() {
if mc.BufferSize == 0 {
mc.BufferSize = defaultBufferSize

@ -18,16 +18,16 @@ package systemlogmonitor
import (
"encoding/json"
"io/ioutil"
"fmt"
"os"
"time"
"github.com/golang/glog"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/pkg/problemdaemon"
"k8s.io/node-problem-detector/pkg/problemmetrics"
"k8s.io/node-problem-detector/pkg/systemlogmonitor/logwatchers"
watchertypes "k8s.io/node-problem-detector/pkg/systemlogmonitor/logwatchers/types"
logtypes "k8s.io/node-problem-detector/pkg/systemlogmonitor/types"
systemlogtypes "k8s.io/node-problem-detector/pkg/systemlogmonitor/types"
"k8s.io/node-problem-detector/pkg/types"
"k8s.io/node-problem-detector/pkg/util"
@ -50,7 +50,7 @@ type logMonitor struct {
buffer LogBuffer
config MonitorConfig
conditions []types.Condition
logCh <-chan *logtypes.Log
logCh <-chan *systemlogtypes.Log
output chan *types.Status
tomb *tomb.Tomb
}
@ -62,21 +62,21 @@ func NewLogMonitorOrDie(configPath string) types.Monitor {
tomb: tomb.NewTomb(),
}
f, err := ioutil.ReadFile(configPath)
f, err := os.ReadFile(configPath)
if err != nil {
glog.Fatalf("Failed to read configuration file %q: %v", configPath, err)
klog.Fatalf("Failed to read configuration file %q: %v", configPath, err)
}
err = json.Unmarshal(f, &l.config)
if err != nil {
glog.Fatalf("Failed to unmarshal configuration file %q: %v", configPath, err)
klog.Fatalf("Failed to unmarshal configuration file %q: %v", configPath, err)
}
// Apply default configurations
(&l.config).ApplyDefaultConfiguration()
err = l.config.ValidateRules()
if err != nil {
glog.Fatalf("Failed to validate %s matching rules %+v: %v", l.configPath, l.config.Rules, err)
klog.Fatalf("Failed to validate %s matching rules %+v: %v", l.configPath, l.config.Rules, err)
}
glog.Infof("Finish parsing log monitor config file %s: %+v", l.configPath, l.config)
klog.Infof("Finish parsing log monitor config file %s: %+v", l.configPath, l.config)
l.watcher = logwatchers.GetLogWatcherOrDie(l.config.WatcherConfig)
l.buffer = NewLogBuffer(l.config.BufferSize)
@ -96,19 +96,19 @@ func initializeProblemMetricsOrDie(rules []systemlogtypes.Rule) {
if rule.Type == types.Perm {
err := problemmetrics.GlobalProblemMetricsManager.SetProblemGauge(rule.Condition, rule.Reason, false)
if err != nil {
glog.Fatalf("Failed to initialize problem gauge metrics for problem %q, reason %q: %v",
klog.Fatalf("Failed to initialize problem gauge metrics for problem %q, reason %q: %v",
rule.Condition, rule.Reason, err)
}
}
err := problemmetrics.GlobalProblemMetricsManager.IncrementProblemCounter(rule.Reason, 0)
if err != nil {
glog.Fatalf("Failed to initialize problem counter metrics for %q: %v", rule.Reason, err)
klog.Fatalf("Failed to initialize problem counter metrics for %q: %v", rule.Reason, err)
}
}
}
func (l *logMonitor) Start() (<-chan *types.Status, error) {
glog.Infof("Start log monitor %s", l.configPath)
klog.Infof("Start log monitor %s", l.configPath)
var err error
l.logCh, err = l.watcher.Watch()
if err != nil {
@ -119,7 +119,7 @@ func (l *logMonitor) Start() (<-chan *types.Status, error) {
}
func (l *logMonitor) Stop() {
glog.Infof("Stop log monitor %s", l.configPath)
klog.Infof("Stop log monitor %s", l.configPath)
l.tomb.Stop()
}
@ -134,20 +134,20 @@ func (l *logMonitor) monitorLoop() {
select {
case log, ok := <-l.logCh:
if !ok {
glog.Errorf("Log channel closed: %s", l.configPath)
klog.Errorf("Log channel closed: %s", l.configPath)
return
}
l.parseLog(log)
case <-l.tomb.Stopping():
l.watcher.Stop()
glog.Infof("Log monitor stopped: %s", l.configPath)
klog.Infof("Log monitor stopped: %s", l.configPath)
return
}
}
}
// parseLog parses one log line.
func (l *logMonitor) parseLog(log *logtypes.Log) {
func (l *logMonitor) parseLog(log *systemlogtypes.Log) {
// Once there is new log, log monitor will push it into the log buffer and try
// to match each rule. If any rule is matched, log monitor will report a status.
l.buffer.Push(log)
@ -157,16 +157,16 @@ func (l *logMonitor) parseLog(log *logtypes.Log) {
continue
}
status := l.generateStatus(matched, rule)
glog.Infof("New status generated: %+v", status)
klog.Infof("New status generated: %+v", status)
l.output <- status
}
}
// generateStatus generates status from the logs.
func (l *logMonitor) generateStatus(logs []*logtypes.Log, rule systemlogtypes.Rule) *types.Status {
func (l *logMonitor) generateStatus(logs []*systemlogtypes.Log, rule systemlogtypes.Rule) *types.Status {
// We use the timestamp of the first log line as the timestamp of the status.
timestamp := logs[0].Timestamp
message := generateMessage(logs)
message := generateMessage(logs, rule.PatternGeneratedMessageSuffix)
var events []types.Event
var changedConditions []*types.Condition
if rule.Type == types.Temp {
@ -192,6 +192,7 @@ func (l *logMonitor) generateStatus(logs []*logtypes.Log, rule systemlogtypes.Ru
condition.Type,
types.True,
rule.Reason,
message,
timestamp,
))
}
@ -207,14 +208,14 @@ func (l *logMonitor) generateStatus(logs []*logtypes.Log, rule systemlogtypes.Ru
for _, event := range events {
err := problemmetrics.GlobalProblemMetricsManager.IncrementProblemCounter(event.Reason, 1)
if err != nil {
glog.Errorf("Failed to update problem counter metrics for %q: %v", event.Reason, err)
klog.Errorf("Failed to update problem counter metrics for %q: %v", event.Reason, err)
}
}
for _, condition := range changedConditions {
err := problemmetrics.GlobalProblemMetricsManager.SetProblemGauge(
condition.Type, condition.Reason, condition.Status == types.True)
if err != nil {
glog.Errorf("Failed to update problem gauge metrics for problem %q, reason %q: %v",
klog.Errorf("Failed to update problem gauge metrics for problem %q, reason %q: %v",
condition.Type, condition.Reason, err)
}
}
@ -232,7 +233,7 @@ func (l *logMonitor) generateStatus(logs []*logtypes.Log, rule systemlogtypes.Ru
func (l *logMonitor) initializeStatus() {
// Initialize the default node conditions
l.conditions = initialConditions(l.config.DefaultConditions)
glog.Infof("Initialize condition generated: %+v", l.conditions)
klog.Infof("Initialize condition generated: %+v", l.conditions)
// Update the initial status
l.output <- &types.Status{
Source: l.config.Source,
@ -250,10 +251,14 @@ func initialConditions(defaults []types.Condition) []types.Condition {
return conditions
}
func generateMessage(logs []*logtypes.Log) string {
func generateMessage(logs []*systemlogtypes.Log, patternGeneratedMessageSuffix string) string {
messages := []string{}
for _, log := range logs {
messages = append(messages, log.Message)
}
return concatLogs(messages)
logMessage := concatLogs(messages)
if patternGeneratedMessageSuffix != "" {
return fmt.Sprintf("%s; %s", logMessage, patternGeneratedMessageSuffix)
}
return logMessage
}
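
The suffix handling in generateMessage can be sketched without the surrounding package. The sketch assumes concatLogs joins messages with newlines, which is inferred from the expectations in TestGenerateMessage and is not confirmed by the diff; messageWithSuffix is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// messageWithSuffix joins the log messages with newlines (an assumption
// about concatLogs) and appends the rule's suffix after "; " when one is set.
func messageWithSuffix(messages []string, suffix string) string {
	joined := strings.Join(messages, "\n")
	if suffix == "" {
		return joined
	}
	return fmt.Sprintf("%s; %s", joined, suffix)
}

func main() {
	logs := []string{"First log message", "Second log message"}
	fmt.Println(messageWithSuffix(logs, ""))
	fmt.Println(messageWithSuffix(logs, "refer www.foo.com/docs for playbook on how to fix the issue"))
}
```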

@ -26,6 +26,7 @@ import (
"k8s.io/node-problem-detector/pkg/problemdaemon"
"k8s.io/node-problem-detector/pkg/problemmetrics"
logtypes "k8s.io/node-problem-detector/pkg/systemlogmonitor/types"
systemlogtypes "k8s.io/node-problem-detector/pkg/systemlogmonitor/types"
"k8s.io/node-problem-detector/pkg/types"
"k8s.io/node-problem-detector/pkg/util"
"k8s.io/node-problem-detector/pkg/util/metrics"
@ -84,6 +85,7 @@ func TestGenerateStatusForConditions(t *testing.T) {
testConditionA,
types.True,
"test reason",
"test message 1\ntest message 2",
time.Unix(1000, 1000),
)},
Conditions: []types.Condition{
@ -698,3 +700,40 @@ func TestInitializeProblemMetricsOrDie(t *testing.T) {
})
}
}
func TestGenerateMessage(t *testing.T) {
tests := []struct {
name string
logs []*systemlogtypes.Log
patternGeneratedMessageSuffix string
want string
}{
{
name: "No rule message",
logs: []*systemlogtypes.Log{
{Message: "First log message"},
{Message: "Second log message"},
},
patternGeneratedMessageSuffix: "",
want: "First log message\nSecond log message",
},
{
name: "With rule message",
logs: []*systemlogtypes.Log{
{Message: "First log message"},
{Message: "Second log message"},
},
patternGeneratedMessageSuffix: "refer www.foo.com/docs for playbook on how to fix the issue",
want: "First log message\nSecond log message; refer www.foo.com/docs for playbook on how to fix the issue",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := generateMessage(tt.logs, tt.patternGeneratedMessageSuffix)
if got != tt.want {
t.Errorf("generateMessage() = %v, want %v", got, tt.want)
}
})
}
}

@ -23,8 +23,7 @@ import (
"strings"
"time"
utilclock "code.cloudfoundry.org/clock"
"github.com/golang/glog"
"k8s.io/klog/v2"
"k8s.io/node-problem-detector/pkg/systemlogmonitor/logwatchers/types"
logtypes "k8s.io/node-problem-detector/pkg/systemlogmonitor/types"
@ -40,7 +39,6 @@ type filelogWatcher struct {
logCh chan *logtypes.Log
startTime time.Time
tomb *tomb.Tomb
clock utilclock.Clock
}
// NewSyslogWatcherOrDie creates a new log watcher. The function panics
@ -48,11 +46,11 @@ type filelogWatcher struct {
func NewSyslogWatcherOrDie(cfg types.WatcherConfig) types.LogWatcher {
uptime, err := util.GetUptimeDuration()
if err != nil {
glog.Fatalf("failed to get uptime: %v", err)
klog.Fatalf("failed to get uptime: %v", err)
}
startTime, err := util.GetStartTime(time.Now(), uptime, cfg.Lookback, cfg.Delay)
if err != nil {
glog.Fatalf("failed to get start time: %v", err)
klog.Fatalf("failed to get start time: %v", err)
}
return &filelogWatcher{
@ -62,7 +60,6 @@ func NewSyslogWatcherOrDie(cfg types.WatcherConfig) types.LogWatcher {
tomb: tomb.NewTomb(),
// A capacity 1000 buffer should be enough
logCh: make(chan *logtypes.Log, 1000),
clock: utilclock.NewClock(),
}
}
@ -77,7 +74,7 @@ func (s *filelogWatcher) Watch() (<-chan *logtypes.Log, error) {
}
s.reader = bufio.NewReader(r)
s.closer = r
glog.Info("Start watching filelog")
klog.Info("Start watching filelog")
go s.watchLoop()
return s.logCh, nil
}
@ -102,14 +99,14 @@ func (s *filelogWatcher) watchLoop() {
for {
select {
case <-s.tomb.Stopping():
glog.Infof("Stop watching filelog")
klog.Infof("Stop watching filelog")
return
default:
}
line, err := s.reader.ReadString('\n')
if err != nil && err != io.EOF {
glog.Errorf("Exiting filelog watch with error: %v", err)
klog.Errorf("Exiting filelog watch with error: %v", err)
return
}
buffer.WriteString(line)
@ -119,16 +116,28 @@ func (s *filelogWatcher) watchLoop() {
}
line = buffer.String()
buffer.Reset()
if s.filterSkipList(line) {
continue
}
log, err := s.translator.translate(strings.TrimSuffix(line, "\n"))
if err != nil {
glog.Warningf("Unable to parse line: %q, %v", line, err)
klog.Warningf("Unable to parse line: %q, %v", line, err)
continue
}
// Discard messages before start time.
if log.Timestamp.Before(s.startTime) {
glog.V(5).Infof("Throwing away msg %q before start time: %v < %v", log.Message, log.Timestamp, s.startTime)
klog.V(5).Infof("Throwing away msg %q before start time: %v < %v", log.Message, log.Timestamp, s.startTime)
continue
}
s.logCh <- log
}
}
func (s *filelogWatcher) filterSkipList(line string) bool {
for _, skipItem := range s.cfg.SkipList {
if strings.Contains(line, skipItem) {
return true
}
}
return false
}
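
The skip-list check is a plain substring match, so entries such as " audit:" rely on the leading space and trailing colon to avoid matching unrelated text. A self-contained sketch of the same logic, with shouldSkip as a hypothetical free-function version of filterSkipList:

```go
package main

import (
	"fmt"
	"strings"
)

// shouldSkip reports whether line contains any entry of skipList as a
// substring. It mirrors filterSkipList without the watcher receiver.
func shouldSkip(line string, skipList []string) bool {
	for _, item := range skipList {
		if strings.Contains(line, item) {
			return true
		}
	}
	return false
}

func main() {
	skipList := []string{" audit:", " kubelet:"}
	for _, line := range []string{
		"Jan 2 03:04:03 kernel: [0.000000] 1",
		"Jan 2 03:04:04 audit: [1.000000] 2",
	} {
		fmt.Printf("%q skipped: %v\n", line, shouldSkip(line, skipList))
	}
}
```

Because the match runs on the raw line before translation, skipped lines never reach the translator or the start-time check.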

@ -0,0 +1,29 @@
/*
Copyright 2023 The Kubernetes Authors All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package filelog
import (
"io"
"github.com/hpcloud/tail"
)
// getLogReader returns log reader for filelog log. Note that getLogReader doesn't look back
// to the rolled out logs.
func getLogReader(path string) (io.ReadCloser, error) {
return tail.OpenFile(path)
}

@ -19,9 +19,8 @@ package filelog
import (
"fmt"
"io"
"k8s.io/node-problem-detector/third_party/forked/cadvisor/tail"
"os"
"github.com/google/cadvisor/utils/tail"
)
// getLogReader returns log reader for filelog log. Note that getLogReader doesn't look back

@ -17,7 +17,6 @@ limitations under the License.
package filelog
import (
"io/ioutil"
"os"
"testing"
"time"
@ -26,8 +25,8 @@ import (
logtypes "k8s.io/node-problem-detector/pkg/systemlogmonitor/types"
"k8s.io/node-problem-detector/pkg/util"
"code.cloudfoundry.org/clock/fakeclock"
"github.com/stretchr/testify/assert"
testclock "k8s.io/utils/clock/testing"
)
// getTestPluginConfig returns a plugin config for test. Use configuration for
@ -43,7 +42,7 @@ func getTestPluginConfig() map[string]string {
func TestWatch(t *testing.T) {
// now is a fake time
now := time.Date(time.Now().Year(), time.January, 2, 3, 4, 5, 0, time.Local)
fakeClock := fakeclock.NewFakeClock(now)
fakeClock := testclock.NewFakeClock(now)
testCases := []struct {
uptime time.Duration
lookback string
@ -139,7 +138,7 @@ Jan 2 03:04:05 kernel: [2.000000] 3
}
for c, test := range testCases {
t.Logf("TestCase #%d: %#v", c+1, test)
f, err := ioutil.TempFile("", "log_watcher_test")
f, err := os.CreateTemp("", "log_watcher_test")
assert.NoError(t, err)
defer func() {
f.Close()
@ -156,8 +155,6 @@ Jan 2 03:04:05 kernel: [2.000000] 3
})
// Set the startTime.
w.(*filelogWatcher).startTime, _ = util.GetStartTime(fakeClock.Now(), test.uptime, test.lookback, test.delay)
// Set the fake clock.
w.(*filelogWatcher).clock = fakeClock
logCh, err := w.Watch()
assert.NoError(t, err)
defer w.Stop()
@ -170,7 +167,7 @@ Jan 2 03:04:05 kernel: [2.000000] 3
}
}
// The log channel should have already been drained
// There could stil be future messages sent into the channel, but the chance is really slim.
// There could still be future messages sent into the channel, but the chance is really slim.
timeout := time.After(100 * time.Millisecond)
select {
case log := <-logCh:
@ -179,3 +176,36 @@ Jan 2 03:04:05 kernel: [2.000000] 3
}
}
}
func TestFilterSkipList(t *testing.T) {
s := &filelogWatcher{
cfg: types.WatcherConfig{
SkipList: []string{
" audit:", " kubelet:",
},
},
}
testcase := []struct {
log string
expect bool
}{
{
log: `Jan 2 03:04:03 kernel: [0.000000] 1`,
expect: false,
},
{
log: `Jan 2 03:04:04 audit: [1.000000] 2`,
expect: true,
},
{
log: `Jan 2 03:04:05 kubelet: [2.000000] 3`,
expect: true,
},
}
for i, test := range testcase {
if s.filterSkipList(test.log) != test.expect {
t.Errorf("test case %d: expect %v but got %v", i, test.expect, s.filterSkipList(test.log))
}
}
}

Some files were not shown because too many files have changed in this diff