92 Commits

Author SHA1 Message Date
Harrison b8c28d00ff
Merge pull request #201 from HarrisonWAffel/retries-after-restart
Add ResetFailureCountOnServiceRestart
2024-10-17 12:25:14 -04:00
Harrison Affel fb4a027b4d Add ResetFailureCountOnServiceRestart, if true reset plan failure count after each restart of the system-agent 2024-10-16 14:12:27 -04:00
Harrison Affel 7300df0e0e Add tests and update CI 2024-10-01 15:06:03 -04:00
Harrison Affel bc9bd0b463 Windows updates 2024-09-23 17:44:11 -04:00
Jiaqi Luo befb1d33b2 Migrate from Drone to GitHub Action 2024-07-01 13:17:57 -07:00
Chris Kim 3bf716f8e0
Add support for CATTLE_AGENT_STRICT_VERIFY|STRICT_VERIFY environment variables to ensure kubeconfig CA data is valid (#171)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2024-06-27 11:29:39 -07:00
Peter Matseykanets 41c07d0600
Update Go to 1.21 and deps for k8s 1.27 (#152)
Ref: https://github.com/rancher/rancher/issues/43318
2024-02-26 16:21:27 -05:00
Chris Kim 806ef425e0
Add interlocks to ensure operations are not interrupted (#150)
* Add interlocks to ensure system-agent does not get restarted when it is applying a plan and does not start applying a plan when a restart is pending
* Remove s390x from drone file
* Don't always set CROSS to true when building

Signed-off-by: Chris Kim <oats87g@gmail.com>
2023-12-12 14:18:36 -08:00
Brad Davidson 3d8c2b53c8 Fix repeated time parse error on probes that have not yet run successfully
Check that last successful run time is not an empty string before trying to parse it.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-09-19 16:04:49 -07:00
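The guard described in this commit can be sketched in Go as follows. This is a minimal illustration, not the repository's code: the helper name `parseLastSuccess` and the RFC 3339 layout are assumptions, as is the idea that the last successful run time is stored as a plain string that stays empty until the probe first succeeds.

```go
package main

import (
	"fmt"
	"time"
)

// parseLastSuccess is a hypothetical helper. It returns ok=false and no error
// when the probe has never succeeded (empty string), so callers do not log a
// parse error on every pass. The RFC 3339 layout is an assumption.
func parseLastSuccess(raw string) (time.Time, bool, error) {
	if raw == "" {
		// Nothing recorded yet: skip parsing instead of failing.
		return time.Time{}, false, nil
	}
	t, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return time.Time{}, false, fmt.Errorf("parsing last successful run time %q: %w", raw, err)
	}
	return t, true, nil
}

func main() {
	_, ok, err := parseLastSuccess("")
	fmt.Println(ok, err) // false <nil>

	t, ok, err := parseLastSuccess("2023-09-19T16:04:49-07:00")
	fmt.Println(t.Year(), ok, err) // 2023 true <nil>
}
```

Returning a separate `ok` flag keeps "never ran" distinct from "ran but the stored value is malformed", which still surfaces as an error.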
Chris Kim 9e827a59b8
Add CATTLE_AGENT_ATTEMPT_NUMBER environment variable that corresponds to failure count for K8s plan application (#115)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2023-05-19 08:37:35 -07:00
Chris Kim e696ff63fe
Retry update with latest secret if plan still matches the applied plan (#114)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2023-05-04 17:01:12 -07:00
Chris Kim e57338eef9
Add error handling logic that handles edge cases to force the system-agent to restart if we encounter non-transient errors. Disallow the K8s watcher from manipulating a secret when the UID changes. (#112)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2023-04-19 08:42:44 -07:00
Chris Kim 24c523a440
Bump golang to 1.19.4-alpine3.17, rancher/wharfie to v0.5.3, and dapper to v0.6.0 (#102)
* Bump golang to 1.19.4-alpine3.17, rancher/wharfie to v0.5.3, and dapper to v0.6.0
* bump golangci-lint
* fix validate script
* fix CI for validation to run go fmt

Signed-off-by: Chris Kim <oats87g@gmail.com>
2023-01-06 09:19:35 -08:00
Jake Hyde 9f22484617 Add back TLSClientConfig to transport 2022-09-29 18:30:47 -04:00
Jake Hyde 7a0853f892 Add proxy to validate rest config 2022-09-26 21:17:18 -04:00
Jamie Phillips 29a9cda11c This fixes the TLS handshake on Windows. 2022-07-29 16:25:27 -04:00
Ross Kirkpatrick bbb696911e
Bump to go1.18 and client-go 1.24, remove windows-specific x509 logic (#86)
* initial 1.24 k8s support plus go1.18

* fix gocr and wharfie versioning

* bump dapper

* bump go version for builds to 1.18.3, bump alpine

* handle if

* fix wharfie and gocr version pins

* bump golangci-lint to 1.18 compat

* revert dapper bump for arm
2022-07-13 11:50:29 -04:00
Donnie Adams a509971a10 Fix nil-pointer dereference on windows context
Passing certContext to the deferred function ensures that the value will
be taken at that point. The issue is that it is nil when the deferred
function is called.

This change will capture the variable so its real value is passed
2022-05-06 13:57:08 -07:00
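The Go behaviour behind this fix is that arguments to a deferred call are evaluated when the `defer` statement executes, whereas a deferred closure reads the variable when it finally runs. The sketch below shows the difference with a stand-in `handle` type; the type and function names are hypothetical and do not come from the repository's Windows certificate code.

```go
package main

import "fmt"

// handle stands in for a resource such as a certificate context (hypothetical).
type handle struct{ name string }

// describe reports what a cleanup function would see.
func describe(h *handle) string {
	if h == nil {
		return "nil"
	}
	return h.name
}

// eagerArg passes h as an argument to the deferred function. The argument is
// evaluated at the defer statement, while h is still nil.
func eagerArg() (seen string) {
	var h *handle
	defer func(arg *handle) { seen = describe(arg) }(h)
	h = &handle{name: "ctx"}
	return ""
}

// capturedVar defers a closure that captures the variable h itself, so it
// reads the value h holds when the deferred function actually runs.
func capturedVar() (seen string) {
	var h *handle
	defer func() { seen = describe(h) }()
	h = &handle{name: "ctx"}
	return ""
}

func main() {
	fmt.Println(eagerArg())    // nil
	fmt.Println(capturedVar()) // ctx
}
```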
Ross Kirkpatrick 46fbba3a20
add windows support for root CA cert stores (#84)
* add windows support for system cert stores

* fix comment on unix prober

* ensure we loop over every certificate, add nil check

* clean up buffer logic

* nil check for unix prober

* fix goimports

* better code comments

* additional nil check

* check certcontext length

* init new cert pool in case of nil in prober
2022-05-05 17:42:29 -04:00
Chris Kim 5710abb984
Increase max periodic cooldown duration, tidy applyinator, and add debug messages (#81)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2022-04-13 15:16:51 -07:00
Chris Kim 2c80536ae1
add max-retries and periodic cooldown (#80)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2022-04-13 14:18:06 -07:00
Chris Kim 00181cd06b
Correctly pick up on failed apply (#79)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2022-03-08 10:52:51 -08:00
Chris Kim 05d9e51b0b
Move log messages around to prevent unnecessarily redundant messages (#78)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2022-03-08 06:38:07 -08:00
Chris Kim 414141a983
create directory for applied plans before listing the directory (#77)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2022-03-04 15:41:36 -07:00
Chris Kim ad6e3be9b8
Only write applied plan contents if the plan actually changes (#75)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2022-03-03 12:39:43 -07:00
Chris Kim b1f6aced9e
change field name (#73)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2022-02-15 13:03:21 -08:00
Chris Kim 278280e64b
Only set LastRunTime if periodic output was successful (#72)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2022-02-15 11:19:25 -08:00
Chris Kim 559b61591c
Set default period to 600 seconds for periodic instructions (#70)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2022-02-08 13:32:10 -08:00
Chris Kim bd24a4886e
Enhance system-agent for production readiness and periodic probe (#68)
* Add periodic instructions (and move existing instructions to OneTimeInstructions)
* Add retention policy for applied plan
* Add more clarity for log messages

Signed-off-by: Chris Kim <oats87g@gmail.com>
2022-02-08 08:53:06 -08:00
Chris Kim fda6e6a636 K8splan should run probes
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-08 10:39:56 -08:00
Jamie Phillips bd7429c081
Fixes various Windows-specific bugs discovered during testing.
File permissions on Windows don't behave the same as on Linux, so those needed adjusting.

Wharfie wasn't passing the OS information, so the correct image wasn't being pulled for Windows.

Signed-off-by: Jamie Phillips <jamie.phillips@suse.com>
2021-11-05 12:54:14 -04:00
Chris Kim 37031465c0
Fix unnecessary writing of successful output to failure in the event that the prior plan had failed but the current one is successful (#52)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-08-24 15:17:22 -07:00
Brian Downs 7662889c43
update type assertion to prevent panic (#50)
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-08-23 15:16:26 -07:00
Chris Kim 20d93d3bab
Add failure handling to system-agent (#51)
* Deal with failure
* Enhance system agent to store and handle failure cases

Signed-off-by: Chris Kim <oats87g@gmail.com>
Co-authored-by: Brian Downs <brian.downs@gmail.com>
2021-08-23 15:16:12 -07:00
Jamie Phillips 0a4420a4c2
Merge pull request #46 from phillipsj/feature/windows-compilation
Adding Windows builds and compiles.
2021-08-20 08:24:59 -04:00
Darren Shepherd 1c0a7aac71 Compare resourceVersions as int 2021-08-18 21:33:43 -07:00
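A short Go sketch of why comparing versions as integers matters: string comparison is lexical, so "10" sorts before "9". The helper name `isNewer` is hypothetical, and the sketch assumes the resourceVersion values being compared are decimal integers.

```go
package main

import (
	"fmt"
	"strconv"
)

// isNewer is a hypothetical helper that reports whether candidate is
// numerically greater than current. It assumes both values are decimal
// integers and returns an error otherwise.
func isNewer(candidate, current string) (bool, error) {
	c, err := strconv.ParseInt(candidate, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parsing candidate resourceVersion %q: %w", candidate, err)
	}
	p, err := strconv.ParseInt(current, 10, 64)
	if err != nil {
		return false, fmt.Errorf("parsing current resourceVersion %q: %w", current, err)
	}
	return c > p, nil
}

func main() {
	a, b := "10", "9"
	fmt.Println(a > b) // false: lexical comparison looks at '1' vs '9'

	newer, err := isNewer(a, b)
	fmt.Println(newer, err) // true <nil>
}
```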
Jamie Phillips ee713d8bfe
Adding Windows builds and compiles.
Signed-off-by: Jamie Phillips <jamie.phillips@suse.com>
2021-08-18 20:39:49 -04:00
Chris Kim cbfd68459e
Don't apply if resource version is incorrect (#43)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-08-06 14:34:09 -07:00
Darren Shepherd c8e51740fe
Wait for command I/O to complete before checking the command exit code (#42) 2021-08-03 10:29:13 -07:00
Chris Kim 8bb75cf28a
Remove CA data if initial connection attempt fails and provide more context when unable to connect to Rancher (#41)
* Nullify ca data if initial connection fails
* Add own validate KC
* Only perform PUT if secret is actually changed
* Change message from debug to info and fix imports

Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-08-03 10:28:33 -07:00
Chris Kim 81ca4e28c1 Check probes regardless of resource version
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-06-18 16:29:40 -04:00
Chris Kim bd8caa5944 Initialize probeStatuses if not already
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-06-18 15:40:42 -04:00
Chris Kim 34a1c42b97 Perform safer secret processing
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-06-18 14:18:23 -04:00
Chris Kim 5777eca782 Check K8s cluster is healthy before proceeding to watch for remote K8s plans
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-06-15 21:43:59 -04:00
Darren Shepherd 60aedcadbb Move probe logic to reusable method to be used from rancherd 2021-06-15 18:31:16 -07:00
Darren Shepherd f6a8502ce6 Ignore last scanner err
We don't want i/o error to fail execution of the command. Race conditions
can cause the program to exit successfully but the I/O to fail.
2021-06-15 18:30:48 -07:00
Chris Kim 80bc81bbf1 Turn off local plan parser by default and set directory and file permissions to be a little more restrictive
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-06-09 15:24:46 -04:00
Chris Kim b622b599bf Add empty working directory support
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-06-07 14:07:38 -04:00
Chris Kim 1faec3e116 simplify duration parsing to just multiplication
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-06-01 13:24:22 -04:00
Chris Kim 391cdf8014 use proper duration parsing to ensure that probes do not immediately timeout
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-06-01 13:19:50 -04:00
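The two duration commits above can be illustrated with a small Go sketch. It assumes, without confirmation from the commit messages, that probe timeouts are configured as whole seconds; the helper name `secondsToDuration` is hypothetical. Under that assumption, converting the integer directly to `time.Duration` yields nanoseconds, which would make a probe time out almost immediately, while multiplying by `time.Second` gives the intended value.

```go
package main

import (
	"fmt"
	"time"
)

// secondsToDuration is a hypothetical helper that converts a whole number of
// seconds into a time.Duration by multiplication.
func secondsToDuration(seconds int) time.Duration {
	return time.Duration(seconds) * time.Second
}

func main() {
	// A bare conversion treats the number as nanoseconds.
	fmt.Println(time.Duration(5)) // 5ns

	// Multiplying by time.Second gives the intended timeout.
	fmt.Println(secondsToDuration(5)) // 5s
}
```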