Compare commits


201 Commits
v4.3.1 ... main

Author SHA1 Message Date
Podman Bot cc7d9b2a26 Add certificate for mohanboddu from containers/automation_sandbox (PR #122)
Signed-off-by: Podman Bot <podman.bot@example.com>
2025-08-12 12:37:41 -04:00
Podman Bot 0af8676cb8 Add certificate for mohanboddu from containers/automation_sandbox (PR #122)
Signed-off-by: Podman Bot <podman.bot@example.com>
2025-08-12 12:35:04 -04:00
Matt Heon f55fe34cfb
Merge pull request #251 from mohanboddu/fix-html
Fixing the PR link in certificate_generator.html
2025-08-06 16:12:04 -04:00
Mohan Boddu 987689cc34 Fixing the PR link in certificate_generator.html
Signed-off-by: Mohan Boddu <mboddu@redhat.com>
2025-08-06 15:40:43 -04:00
Neil Smith cb12019fba
Add certificate generator for first-time contributors (#249)
Add certificate generator for first-time contributors

This adds a web-based certificate generator to celebrate first-time
contributors to containers organization projects. The generator includes:

- Interactive HTML interface for creating certificates
- Customizable certificate template with Podman branding
- Real-time preview and HTML download functionality

The certificates can be used to recognize and celebrate community
members who make their first contribution to any containers project.
2025-07-17 17:48:30 +02:00
Paul Holzinger e1231d1520
Merge pull request #248 from containers/renovate/urllib3-2.x
chore(deps): update dependency urllib3 to <2.5.1
2025-06-18 19:17:55 +02:00
renovate[bot] b0959cb192
chore(deps): update dependency urllib3 to <2.5.1
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2025-06-18 17:01:57 +00:00
Paul Holzinger 7f213bf685
Merge pull request #247 from Luap99/macos-go
Revert "mac_pw_pool: hotfix go install"
2025-06-05 21:09:23 +02:00
Paul Holzinger 79e68ef97c
Revert "mac_pw_pool: hotfix go install"
This reverts commit d805c0c822.

Podman should build on 5.5 and main again due to
db65baaa21

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2025-06-05 18:17:47 +02:00
Paul Holzinger aba42ca8ff
Merge pull request #246 from Luap99/macos-go
mac_pw_pool: hotfix go install
2025-05-07 20:15:05 +02:00
Paul Holzinger d805c0c822
mac_pw_pool: hotfix go install
We have to pin back the go version as it contains a regression that
causes podman compile failures.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2025-05-07 18:47:19 +02:00
Paul Holzinger e83dcfcabf
Merge pull request #243 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.6.2
2025-04-15 17:34:35 +02:00
renovate[bot] 7f13540563
[skip-ci] Update actions/upload-artifact action to v4.6.2
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2025-04-15 15:32:37 +00:00
Paul Holzinger 50c43af45e
Merge pull request #237 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.6.0
2025-04-15 17:32:13 +02:00
Paul Holzinger cd259102d4
Merge pull request #240 from containers/renovate/urllib3-2.x
chore(deps): update dependency urllib3 to <2.4.1
2025-04-15 17:31:51 +02:00
renovate[bot] 051f0951f1
chore(deps): update dependency urllib3 to <2.4.1
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2025-04-15 14:47:01 +00:00
Paul Holzinger e8a30ae1ea
Merge pull request #242 from Luap99/comment-action
github: fix wrong action call
2025-04-15 16:43:39 +02:00
Paul Holzinger a4888b2ce9
github: fix wrong action call
Missed one place where I had to replace the arguments.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2025-04-15 16:31:54 +02:00
Paul Holzinger 8faa8b216c
Merge pull request #241 from Luap99/comment-action
github: use thollander/actions-comment-pull-request
2025-04-15 15:45:36 +02:00
Paul Holzinger fd6f70913e
action: debug retrospective
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2025-04-15 15:33:19 +02:00
Paul Holzinger f3777be65b
github: use thollander/actions-comment-pull-request
jungwinter/comment doesn't seem to be actively maintained and makes use of
the deprecated set-output[1].

[1] https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2025-04-15 15:04:21 +02:00
Paul Holzinger 16f757f699
Merge pull request #239 from Luap99/go
renovate: update to go 1.23
2025-03-13 11:23:52 +01:00
Paul Holzinger 26ab1b7744
renovate: update to go 1.23
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2025-03-12 17:33:30 +01:00
Paul Holzinger 994ba027c2
Merge pull request #238 from Luap99/zstd
mac_pw_pool: add zstd
2025-02-18 15:23:54 +01:00
Paul Holzinger fa70d9e3af
ci: remove python3-flake8-docstrings
This package no longer exists in Fedora.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2025-02-18 15:07:33 +01:00
Paul Holzinger 3e2662f02b
mac_pw_pool: add zstd
The new macOS 15 base image does not contain it, and repo_prep in
podman is failing because we need it to compress the tar.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2025-02-18 14:55:33 +01:00
renovate[bot] 0f5226e050
[skip-ci] Update actions/upload-artifact action to v4.6.0
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2025-01-10 17:47:14 +00:00
Paul Holzinger 24800f0f77
Merge pull request #236 from containers/renovate/urllib3-2.x
chore(deps): update dependency urllib3 to <2.3.1
2025-01-06 18:47:10 +01:00
Paul Holzinger 5ae1659c96
Merge pull request #235 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.5.0
2025-01-06 18:46:47 +01:00
renovate[bot] 3c034bcadc
chore(deps): update dependency urllib3 to <2.3.1
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-12-22 10:19:41 +00:00
renovate[bot] 7067540a52
[skip-ci] Update actions/upload-artifact action to v4.5.0
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-12-17 22:43:35 +00:00
Paul Holzinger e3c74c2aa4
Merge pull request #234 from Luap99/renovate
renovate: remove edsantiago as default reviewer
2024-11-26 16:02:45 +01:00
Paul Holzinger 8c5bb22af7
Merge pull request #233 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.4.3
2024-11-26 15:54:18 +01:00
Paul Holzinger 3b33514d26
Merge pull request #231 from containers/renovate/urllib3-2.x
chore(deps): update dependency urllib3 to <2.2.4
2024-11-26 15:53:56 +01:00
Paul Holzinger 973aa8c2fe
renovate: remove edsantiago as default reviewer
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-11-26 15:50:34 +01:00
Ed Santiago 4d23dd41f0
Merge pull request #232 from Luap99/image-update-reviewers
renovate: update image update PR reviewers
2024-10-11 11:54:49 -06:00
renovate[bot] b9186a2b38
[skip-ci] Update actions/upload-artifact action to v4.4.3
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-10-11 12:24:16 +00:00
Paul Holzinger 8b1776b799
renovate: update image update PR reviewers
Chris no longer works on our team and has no time to review them. Add
Ed and myself as reviewers for these PRs (we already reviewed them
anyway) so we get a notification for all PRs and do not miss them.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-10-11 11:44:18 +02:00
Paul Holzinger 8218f24c4d
Merge pull request #226 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.4.0
2024-10-11 11:38:02 +02:00
Paul Holzinger 8f39f4b1af
Merge pull request #230 from containers/renovate/ubuntu-24.x
chore(deps): update dependency ubuntu to v24
2024-10-11 11:37:35 +02:00
renovate[bot] 99d1c2662e
chore(deps): update dependency urllib3 to <2.2.4
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-10-11 09:36:37 +00:00
Paul Holzinger 32b94cedea
Merge pull request #228 from containers/renovate/urllib3-2.x
chore(deps): update dependency urllib3 to v2
2024-10-11 11:36:21 +02:00
Paul Holzinger 5ad53bd723
Merge pull request #229 from cevich/rm_renovate_cevich
Remove renovate cevich auto-assign
2024-10-11 11:35:35 +02:00
renovate[bot] 24a62a63d3
chore(deps): update dependency ubuntu to v24
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-09-26 19:23:47 +00:00
Chris Evich ab1f7624a0
Remove renovate cevich auto-assign
Previously renovate auto-assigned all updates in this repo to cevich
who's no longer on the team.  Fix this, and update the container FQIN
comment to a non-docker-hub location (to avoid rate-limit restrictions).

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-09-05 17:28:17 -04:00
renovate[bot] 35a29e5dfe
chore(deps): update dependency urllib3 to v2
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-09-05 19:11:08 +00:00
Ed Santiago 657247095b
Merge pull request #227 from Luap99/go-1.22
renovate: update to go 1.22
2024-09-05 13:10:48 -06:00
Paul Holzinger cc18e81abf
fix skopeo exit code
A change[1] in skopeo made it exit with 2 if the image is not found, so
fix the test assumption here.

[1] 16b6f0ade5

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-09-04 18:49:58 +02:00
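
A minimal sketch of the assertion implied by the commit above. The only fact taken from the commit message is that newer skopeo exits 2 for a missing image; the image name and test shape are assumptions.

```
# Probe a deliberately nonexistent image and check the new exit code.
skopeo inspect docker://example.invalid/no/such/image:latest &> /dev/null
rc=$?

if [[ $rc -eq 2 ]]; then
    echo "PASS: skopeo exited 2 for a missing image (new behavior)"
else
    echo "FAIL: expected exit code 2, got $rc"
    exit 1
fi
```
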
Paul Holzinger d2e5f7815e
remove broken timebomb test
This test doesn't work when run before 1pm UTC, only after. We could
add +24 hours, but it is not clear what the purpose of this function is,
so just remove it. We know that timebomb seems to work well enough in
practice and regressions are unlikely.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-09-04 15:08:44 +02:00
Paul Holzinger 48c9554a6c
renovate: update to go 1.22
We have pinned the renovate go version to the lowest version we support,
as otherwise it will create PRs that update to a newer go version, which
we always want to take care of manually since such updates usually
include more changes. While this doesn't prevent renovate from creating
these PRs, they always fail as it cannot update to a newer go version,
so it is clear to reviewers what is going on.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-09-04 14:39:57 +02:00
Paul Holzinger 0a0bc4f395
renovate: remove CI:DOCS from linter updates
Podman no longer uses CI:DOCS as it skips based on source changes.
As such, this title doesn't add anything besides confusion about why it
is there.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
2024-09-04 14:30:52 +02:00
renovate[bot] b8969128d0
[skip-ci] Update actions/upload-artifact action to v4.4.0
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-08-30 19:20:40 +00:00
Chris Evich 4739c8921c
Merge pull request #222 from cevich/deduplicate_pw_pool_docs
De-duplicate PW pool readme
2024-08-12 14:02:37 -04:00
Chris Evich 34ea41cc7f
De-duplicate PW pool readme
Several sections and individual items were duplicated or did not belong
in this file.  They've been moved to the private google-doc linked
in the "Prerequisites" section and included in the monitoring
website `index.html`.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-08-12 13:32:45 -04:00
Ed Santiago ee5fba7664
Merge pull request #221 from cevich/mac_pw_pool_worker_docs
Add debugging section to PW pool docs
2024-08-12 06:33:09 -06:00
Chris Evich 34e2995cd7
Add debugging section to PW pool docs
Signed-off-by: Chris Evich <cevich@redhat.com>
2024-08-08 14:43:46 -04:00
Chris Evich 51a2c1fbed
Merge pull request #217 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.3.6
2024-08-06 15:46:07 -04:00
Chris Evich 718ecdb04e
Merge pull request #220 from cevich/mac_pw_pool_fix_max_tasks
Fix possible max-tasks PW pool cascade failure
2024-08-06 14:17:45 -04:00
Chris Evich 7ae84eb74c
Fix possible max-tasks PW pool cascade failure
For integrity and safety reasons, there are multiple guardrails in place
to limit the potential damage a rogue/broken/misconfigured worker
instance may cause. One of these restrictions is a maximum limit on the
number of tasks that a worker may execute. However, if the pool is
experiencing extraordinary utilization, it's possible that a large number
of workers could encounter this limit at/near the same time. Assuming the
pool load remains high, this will then further shorten the lifetime of
the remaining online instances.

Also:

* Double the limit on allowed tasks (12 was too small based on heavy
  utilization).
* Double the allowed setup time to account for network slowdowns.
* Show both the soft and hard uptime limits for each worker.
* Issue warning if worker exceeds soft uptime limit.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-08-06 14:09:59 -04:00
renovate[bot] d81a56f85b
[skip-ci] Update actions/upload-artifact action to v4.3.6
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-08-06 15:23:33 +00:00
Chris Evich 27f6f9363f
Merge pull request #216 from cevich/mac_pw_pool_fix_shutdown_timeout
Fix instance shutdown never timing out
2024-08-06 11:23:14 -04:00
Chris Evich 1b35e0e24d
Fix instance shutdown never timing out
To try and prevent terminating an instance while a CI task is running,
the shutdown script checks for the existence of an agent process.
Previously a timeout for this delay was calculated and stored, however
it was never actually used.  Fix this by aborting the delay after the
timeout has expired.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-08-05 17:00:18 -04:00
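
A rough sketch of the wait-with-timeout pattern the fix above describes: delay shutdown while a CI agent process exists, but give up once the stored timeout expires. The variable names and pgrep pattern are assumptions, not taken from the actual shutdown script.

```
# Delay shutdown while an agent process is running, but never past the timeout.
SHUTDOWN_DELAY_TIMEOUT=${SHUTDOWN_DELAY_TIMEOUT:-3600}  # seconds; assumed default
deadline=$(( $(date +%s) + SHUTDOWN_DELAY_TIMEOUT ))

while pgrep -f 'cirrus.*agent' > /dev/null; do  # hypothetical process pattern
    if (( $(date +%s) >= deadline )); then
        echo "Shutdown delay timed out; proceeding anyway." >> /dev/stderr
        break
    fi
    sleep 10
done
echo "Proceeding with instance shutdown."
```
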
Chris Evich 2c1ee35362
Merge pull request #211 from cevich/mac_pw_pool_fix_force_stagger
Fix extending PW beyond PW_MAX_HOURS
2024-08-05 16:59:30 -04:00
Chris Evich 447f70e9c7
Fix extending PW beyond PW_MAX_HOURS
Previously when using the `--force` option to `SetupInstances.sh` each
instance created would have its lifetime extended by
`$CREATE_STAGGER_HOURS`. For any instance beyond the first, that will
immediately put it beyond the `$PW_MAX_HOURS` hard-limit.  Eventually
this will result in multiple instances going offline at the same time,
which is undesirable.

Fix this by staggering instance lifetimes with decreasing values instead.
Include extra checks to make sure the value remains positive and sane.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-08-05 16:47:31 -04:00
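
A simplified sketch of the decreasing-stagger idea from the commit above: instead of adding `$CREATE_STAGGER_HOURS` to every forced instance (pushing later ones past `$PW_MAX_HOURS`), each successive instance gets a shorter lifetime, clamped so the value stays positive. The loop, values, and instance names are illustrative assumptions.

```
PW_MAX_HOURS=24          # assumed value for illustration
CREATE_STAGGER_HOURS=2   # assumed value for illustration
instances=(MacM1-1 MacM1-2 MacM1-3 MacM1-4)  # hypothetical instance names

for i in "${!instances[@]}"; do
    # Decrease each successive lifetime rather than extending it.
    lifetime_hours=$(( PW_MAX_HOURS - i * CREATE_STAGGER_HOURS ))
    (( lifetime_hours < 1 )) && lifetime_hours=1  # sanity: keep the value positive
    echo "${instances[$i]}: planned lifetime ${lifetime_hours}h"
done
```
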
Chris Evich 1809c5b6c0
Merge pull request #212 from cevich/mac_pw_pool_confirm_ssh_agent
Fail loudly when ssh-agent not running
2024-08-05 15:40:53 -04:00
Chris Evich c552d5bba1
Fail loudly when ssh-agent not running
The agent is required to keep the private key secure, since the local
and remote users have sudo access.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-08-05 15:34:16 -04:00
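
One possible shape of the "fail loudly" check described above; the messages and the `die` helper are assumptions. `ssh-add -l` exits 2 when it cannot reach an agent at all, which makes a convenient probe.

```
die() { echo "ERROR: $*" >> /dev/stderr; exit 1; }

[[ -n "$SSH_AUTH_SOCK" ]] || die "SSH_AUTH_SOCK is unset; is ssh-agent running?"

ssh-add -l &> /dev/null
# Exit status 2 specifically means the agent could not be contacted.
[[ $? -ne 2 ]] || die "Unable to communicate with ssh-agent at '$SSH_AUTH_SOCK'"

echo "ssh-agent is available."
```
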
Chris Evich 3568a50f52
Merge pull request #213 from cevich/mac_pw_pool_fix_pub_dns
Fix 'Expecting pub_dns to be set/non-empty' error
2024-08-05 15:31:57 -04:00
Chris Evich 436dceb68f
Fix 'Expecting pub_dns to be set/non-empty' error
While processing instances, if the script encounters an instance running
past PW_MAX_HOURS, it will force-terminate it.  However, this check was
happening before the script had looked up the required 'pub_dns' value.
Fix this by relocating the check.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-08-05 15:22:42 -04:00
Chris Evich 13be11668c
Merge pull request #214 from cevich/mac_pw_pool_fix_deadlock
Fix deadlock induced MacOS PW Pool collapse
2024-08-05 15:18:20 -04:00
Chris Evich 47a5015b07
Fix deadlock induced MacOS PW Pool collapse
Every night a script runs to check and possibly update all the scripts
in the repo.  When this happens, two important activities take place:

1. The script is restarted (presuming its own code changed).
2. The container running nginx (for the usage graph) is restarted.

For unknown reasons, possibly due to a system update, a pasta
(previously slirp4netns) sub-process spawned by podman is holding open
the lock-file required by both the maintenance script and the (very
important) `Cron.sh`.  This leads to a deadlock situation where
the entire pool becomes unmanaged since `Cron.sh` can't run.

To prevent unchecked nefarious/unintended use, all workers automatically
recycle themselves after some time should they become unmanaged.
Therefore, without `Cron.sh` operating, the entire pool will eventually
collapse.

Though complex, as a (hopefully) temporary fix, ensure all non-stdio FDs
are closed (in a sub-shell) prior to restarting the container.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-08-05 15:08:26 -04:00
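
A hedged sketch of the workaround described in the final paragraph above: restart the web container from a sub-shell that has closed all non-stdio file descriptors, so nothing spawned by podman can inherit (and keep holding) the lock file. The container name is hypothetical.

```
(
    # Close every FD above stderr for this sub-shell only; closing an
    # already-closed descriptor is harmless in bash.
    for fd in {3..255}; do
        eval "exec $fd>&-"
    done
    podman restart pw-pool-webserver  # hypothetical container name
)
```
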
Chris Evich b0dde0f4fc
Merge pull request #210 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.3.5
2024-08-05 13:40:40 -04:00
Chris Evich 689cfa189c
Merge pull request #215 from cevich/fix_build-push_test
Fix build-push CI env setup failure
2024-08-05 13:15:06 -04:00
Chris Evich bb3343c0c4
Fix build-push CI env setup failure
For whatever reason, the `registries.conf` alias setup is no longer
working and docker-hub rate-limiting is causing CI breakage.  Fix this
by simply pulling directly from the google proxy.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-08-05 13:07:03 -04:00
renovate[bot] b1d7d1d447
[skip-ci] Update actions/upload-artifact action to v4.3.5
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-08-02 16:59:31 +00:00
Chris Evich 256fefe0dd
Merge pull request #208 from cevich/libkrun_on_mac_pw_pool
Mac PW Pool: Install libkrun (krunkit)
2024-07-31 16:27:27 -04:00
Chris Evich 11359412d4
Mac PW Pool: Install libkrun (krunkit)
In order to test accessibility of the host GPU inside a podman machine
container, it's necessary to install support for krun.  However, since
the list of brew recipes is ever growing, split it up into sections with
comments explaining why each is necessary and what uses it.

Also fix a minor bug WRT re-running setup with softwareupdate already
disabled.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-07-31 16:06:27 -04:00
Chris Evich 378249996e
Merge pull request #209 from cevich/fix_renovate_validation
Fix running renovate-config-validator
2024-07-31 15:19:28 -04:00
Chris Evich 12b7b27dda
Fix running renovate-config-validator
Newer renovate container images place the binary elsewhere, resulting in
this check encountering a file-not-found error.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-07-31 15:06:40 -04:00
Chris Evich 720ba14043
Merge pull request #207 from cevich/manual_testing_mac
Mac PW Pool: Add testing helper script
2024-07-16 14:20:36 -04:00
Chris Evich a69abee410
Mac PW Pool: Add testing helper script
Previously a lot of intricate and painful steps were required to set up
a Mac dedicated-host for testing.  Make this process easier with a script
that does most of the work.  Update documentation accordingly.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-07-16 13:45:34 -04:00
Chris Evich 399120c350
Mac PW Pool: Allow variable DH name prefixes
Previously every dedicated-host and instance was named with the prefix
`MacM1`.  Support management of other sets of DHs with different
prefixes by turning this value into a variable.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-07-16 13:45:34 -04:00
Ed Santiago 4302d62c26
Merge pull request #206 from edsantiago/more-task-map
Simplify the new podman CI skips
2024-07-08 07:22:26 -06:00
Ed Santiago 8204fd5794 Simplify the new podman CI skips
They are now under only_if, not skip. And there's really no need
for individual names, just say "SKIP if not needed"

Also, add handling for 'skip CI=CI', currently used in minikube

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-07-08 07:11:37 -06:00
Chris Evich d0474a3847
Merge pull request #205 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.3.4
2024-07-05 15:23:31 -04:00
renovate[bot] 14fd648920
[skip-ci] Update actions/upload-artifact action to v4.3.4
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-07-05 17:14:46 +00:00
Ed Santiago 420ed9a467
Merge pull request #203 from edsantiago/automation-images
cirrus-task-map: tweaks for automation_images CI
2024-07-02 11:05:32 -06:00
Ed Santiago dc21cdf863 cirrus-task-map: add skips/only-ifs for automation_images
Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-07-02 10:55:32 -06:00
Ed Santiago b813ad7981 ImageMagick v7 deprecates "convert" command
Use "magick" instead, with a little shuffling of args

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-07-02 10:55:32 -06:00
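
For reference, the v6-to-v7 change mentioned above looks roughly like this; the file names and options are only illustrative.

```
# ImageMagick v6 style ('convert' is deprecated under v7):
convert utilization.png -resize 1024x768 utilization_small.png

# ImageMagick v7 style, same operation via the 'magick' entry point:
magick utilization.png -resize 1024x768 utilization_small.png
```
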
Ed Santiago 415e21b68b
Merge pull request #202 from edsantiago/sort-by-type
cirrus-task-map: uptodateize
2024-07-01 06:19:31 -06:00
Ed Santiago 8b9ae348a0 handle the new 2024-06-18 CI skips
Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-06-27 12:57:44 -06:00
Ed Santiago 663cb85121 task-map: sort jobs by task type
Now that it's just one huge parallel blob, change our sorting
so we cluster all the int/sys/machine tests together.

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-06-27 12:57:44 -06:00
Chris Evich 9c771bf862
Merge pull request #201 from cevich/doc_golang_ind_vuln_config
Unconfigure golang indirect vulnerability support
2024-06-25 11:09:02 -04:00
Chris Evich 13aaf6100f
Unconfigure golang indirect vulnerability support
Discovered by log analysis, Renovate will initially set up a vulnerable
golang indirect dep for immediate PR creation.  However, later on in
its run, PR creation will be disabled by the global indirect-golang
default setting (disabled).  Extensive review of `packageRules`
configuration shows no way to filter based on vulnerability status.
This would be the only conceivable way to override the default.

Fix this by replacing the misleading/useless config section with a
comment block indicating that indirect golang vulnerabilities must be
handled by hand.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-06-25 10:55:39 -04:00
Chris Evich 46d69a3969
Merge pull request #200 from cevich/add_mac_temp_docs
Add Mac PW Pool Launch Template docs
2024-06-07 11:01:05 -04:00
Chris Evich 081b9c3be5
Bump build-push test CI VM image
CentOS-stream 8 is EOL.

Also, use the latest buildah container image and update a build-push
test to cope with some minor behavior changes.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-06-07 10:52:54 -04:00
Chris Evich e4e0cdbd51
Add Mac PW Pool Launch Template docs
Signed-off-by: Chris Evich <cevich@redhat.com>
2024-06-07 10:52:54 -04:00
Chris Evich ae7f68a9ac
Merge pull request #199 from cevich/fewer_jobs
PW Pool: Reduce task-to-task corruption risk
2024-06-03 11:02:46 -04:00
Chris Evich 836d5a7487
PW Pool: Reduce task-to-task corruption risk
Previously instances would shut down and auto-terminate if the
controlling VM's `SetupInstances.sh` examined the remote worker
log and found >= `PW_MAX_TASKS` logged.  However, after examining the
production `Cron.log`, it was found that nowhere near this number of
tasks actually runs during `PW_MAX_HOURS`.  Cut the value in
half to lower the risk of one or more tasks corrupting processes or the
filesystem for other tasks.

Note: The eyeballed average number of tasks before timed auto-shutdown
was about 7.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-06-03 09:47:39 -04:00
Chris Evich 02d3c0a99c
Merge pull request #198 from cevich/more_mac_packages
Mac PW Pool: Install packages needed for skopeo CI
2024-05-31 09:55:21 -04:00
Chris Evich f750079c85
Mac PW Pool: Install packages needed for skopeo CI
Signed-off-by: Chris Evich <cevich@redhat.com>
2024-05-30 14:23:44 -04:00
Chris Evich 0eb6675f13
Merge pull request #197 from cevich/restrict_mac_sw_install
Mac PW Pool: Restrict software installation/updates
2024-05-29 12:55:46 -04:00
Chris Evich 3a39b5cafc
Mac PW Pool: Restrict software installation/updates
For whatever reason, non-admin users are permitted to install and update
software on Macs by default.  This is highly undesirable in a CI
environment, and especially so in one where the underlying resources are
shared across testing contexts.  Block this by altering system settings
to require admin access.

Further, through experimentation it was found that rosetta (which allows
arm64 macs to run x86_64 code) ignores the admin-required settings.  To
give pause to any users trying to run `softwareupdate`, move it out of
general reach.  This isn't a perfect solution, but it should at least
discourage all simple usage.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-05-29 11:46:50 -04:00
Chris Evich 8a0e087c4b
Update Mac PW Pool docs
Specifically, detail the manual testing steps.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-05-29 11:46:50 -04:00
Chris Evich c910e69c12
Merge pull request #196 from cevich/install_rosetta
Mac PW Pool: Install rosetta
2024-05-22 10:00:49 -04:00
Chris Evich 37e71d45af
Mac PW Pool: Install rosetta
Podman machine testing needs rosetta to confirm running x86_64 binaries.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-05-21 16:40:02 -04:00
Chris Evich 9a8a1a2413
Merge pull request #195 from cevich/ignore_go_toolchain_updates
Never update golang toolchain
2024-04-30 12:06:59 -04:00
Chris Evich 2e805276bb
Never update golang toolchain
Fixes: #193

Despite restrictions on `go` directive updates by Renovate, it was still
proposing updates to the `toolchain` directive.  In order to maintain
consistency across all projects, this value needs to be managed
manually.  Detect when Renovate is trying to update it and shut it down.

Ref: Upstream https://github.com/renovatebot/renovate PR 28476

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-04-29 16:02:40 -04:00
Chris Evich 5d234f1e4a
Merge pull request #192 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.3.3
2024-04-23 15:27:26 -04:00
Chris Evich badedd4968
Merge pull request #194 from cevich/fix_egrep
Minor: egrep fixes + more debugging
2024-04-23 10:36:37 -04:00
Chris Evich 2cdb0b15ee
Minor: More debugging
For some reason, it seems to still be possible for `get_manifest_tags()`
to return non-zero despite `result_json` being an empty list.  Add some
more debugging to the function to help figure out why.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-04-23 10:21:14 -04:00
Chris Evich f27c7ae6d9
Minor: Fix use of egrep + some shellcheck findings
Signed-off-by: Chris Evich <cevich@redhat.com>
2024-04-23 09:52:42 -04:00
Chris Evich d7a884b8cf
Merge pull request #191 from cevich/warn_empty
Fix build-push failing on empty push list
2024-04-22 14:00:32 -04:00
Chris Evich 9336e20516
Resolve build-push test TODO
The mentioned bug has long since been fixed.  This test should pass
despite there being no images present after mod-command runs.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-04-22 13:46:47 -04:00
Chris Evich 7feb7435c2
Fix build-push failing on empty push list
Prior to https://github.com/containers/image_build/pull 23 the
automation using `build-push.sh` always pushed its images.  This
obscured a bug that occurs when `fqin_names` is an empty string in
`get_manifest_tags()`.  In this case, the `grep` command will exit
non-zero, causing `push_images()` to:

```
die "Error retrieving set of manifest-list tags to push for '$FQIN'"
```

Fix this by adding an empty-string check and removing the unnecessary
`grep`.  Also, in `push_images()`, change `die "No FQIN(s) to be pushed."`
into a warning, since the condition should not be considered fatal.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-04-22 13:46:42 -04:00
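
A minimal sketch (not the actual `build-push.sh` code) of the guard described above: warn and return on an empty FQIN list instead of letting a `grep` over an empty string escalate into a fatal `die`. The `warn` helper and example FQINs are assumptions.

```
warn() { echo "WARNING: $*" >> /dev/stderr; }

push_images() {
    local fqin_names="$1"
    if [[ -z "$fqin_names" ]]; then
        warn "No FQIN(s) to be pushed."   # previously a fatal die
        return 0
    fi
    local fqin
    for fqin in $fqin_names; do
        echo "Pushing manifest-list tags for '$fqin' (placeholder)"
    done
}

push_images ""                       # now warns instead of dying
push_images "quay.io/example/image"  # hypothetical FQIN
```
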
Chris Evich 478b8d9d30
Minor: Fix build-push shellcheck findings
Signed-off-by: Chris Evich <cevich@redhat.com>
2024-04-22 13:46:42 -04:00
renovate[bot] 1bd2fbdfe3
[skip-ci] Update actions/upload-artifact action to v4.3.3
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-04-22 17:39:50 +00:00
Chris Evich d061d8061e
Merge pull request #190 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.3.2
2024-04-19 12:16:33 -04:00
renovate[bot] 13f6c9fb53
[skip-ci] Update actions/upload-artifact action to v4.3.2
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-04-18 16:41:32 +00:00
Chris Evich af1016e668
Merge pull request #189 from cevich/golang121
Bump golang to version 1.21
2024-04-17 14:02:08 -04:00
Chris Evich 74f8447d45
Bump golang to version 1.21
Lots of module updates are arriving which require this version; unblock
all repos that depend on it.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-04-17 13:38:25 -04:00
Chris Evich 3bf3cfd233
Merge pull request #188 from cevich/kill_inaccessable_instances
Fix unmanaged crashed/inaccessible worker
2024-04-05 11:17:56 -04:00
Chris Evich 428f06ed36
Fix unmanaged crashed/inaccessible worker
If a worker instance is inaccessible for an extended period of time,
it's a sign it may have crashed or been compromised in some way.
Previously, due to the order of status checks, this condition would not
be noticed for multiple days.  Fix this by relocating the `PW_MAX_HOURS`
check to the beginning of the worker-loop.  This will force-terminate
any timed-out instances regardless of all other status checks.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-04-05 10:59:23 -04:00
Chris Evich b9ce71232f
Merge pull request #187 from cevich/constrain_go
Add big-fat-warning re: golang 1.21+ toolchain
2024-03-15 13:08:27 -04:00
Chris Evich 36c2bc68e9
Add big-fat-warning re: golang 1.21+ toolchain
Signed-off-by: Chris Evich <cevich@redhat.com>
2024-03-15 09:22:50 -04:00
Chris Evich df5c5e90ac
Update to the github hosted container image
This prevents running into docker-hub rate limits

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-03-15 09:22:49 -04:00
Chris Evich 11026c20a3
Renovate config reformat/cleanup
Updating to the latest config linter reformats the entire config file.
Incorporate the new format, with some minor adjustments to comments.
No settings are actually changed here.  It's all cosmetic and
formatting.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-03-14 12:27:58 -04:00
Chris Evich 1f2ccedbfd
Merge pull request #186 from cevich/simplify_updates
Simplify pool maintenance script updates
2024-02-27 13:28:23 -05:00
Chris Evich 2c1a0c6c4c
Merge pull request #183 from cevich/docs_update
[skip-ci] Mac PW Pool script docs update
2024-02-27 13:27:36 -05:00
Chris Evich fb6ba4a224
Simplify pool maintenance script updates
Previously an unnecessarily complex mechanism was used to automatically
update the code on the Mac PW Pool maintenance VM.  Simplify this to a
short fixed time interval to improve reliability.  Also fix a minor bug
where the web container restarted attached rather than detached.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-02-27 13:14:02 -05:00
Chris Evich f12157050c
Merge pull request #184 from edsantiago/taskmap-shortcuts
cirrus-task-map: more shortcuts
2024-02-27 13:06:35 -05:00
Chris Evich 4353f8c5b1
Merge pull request #185 from cevich/stop_disk_indexing
Mac PW Pool: Stop indexing local disks
2024-02-21 13:25:35 -05:00
Chris Evich 86ddf63ac5
Mac PW Pool: Stop indexing local disks
There's no point to this operation on a CI machine, and it creates
non-deletable files for every user on the system.  Stop it for all
volumes, ignoring any failures.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-02-21 11:59:56 -05:00
Ed Santiago 948206e893 cirrus-task-map: more shortcuts
For handling recent (Feb 2024) changes to .cirrus.yml

Signed-off-by: Ed Santiago <santiago@redhat.com>
2024-02-19 08:28:13 -07:00
Chris Evich c0112c254c
Merge pull request #175 from cevich/stop_truncating_stdio
[5.0.0] Fix truncating stdio magic devices
2024-02-12 11:47:19 -05:00
Chris Evich 86660f745e
[5.0.0] Fix truncating stdio magic devices
Redirecting to `/dev/stderr` or `/dev/stdout` can have a normally
unintended side-effect when the caller wishes to send either of those
elsewhere (like an actual file).  Namely, it will truncate the file
before writing.  This is almost never the expected behavior.  Update all
redirects to magic devices to append instead.

N/B: These scripts are used far and wide.  On the off-chance some
downstream caller has previously depended on this side-effect, I'm
marking this commit as 'breaking' accordingly.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-02-12 10:49:20 -05:00
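
The difference in one example, assuming the caller invoked the script with `2> run.log`:

```
echo "this clobbers run.log"    >  /dev/stderr   # '>' re-opens the log file with truncation
echo "this appends to run.log"  >> /dev/stderr   # '>>' preserves whatever was logged before
```
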
Chris Evich 679575c7d1
Ignore deprecation warnings while running tests
Signed-off-by: Chris Evich <cevich@redhat.com>
2024-02-12 10:49:07 -05:00
Chris Evich 0e328d6db5
Merge pull request #182 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.3.1
2024-02-06 09:36:21 -05:00
Chris Evich 71ede1b334
Mac PW Pool script docs update
Signed-off-by: Chris Evich <cevich@redhat.com>
2024-02-06 09:34:47 -05:00
renovate[bot] 1f5d6b5691
[skip-ci] Update actions/upload-artifact action to v4.3.1
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-02-05 22:34:26 +00:00
Chris Evich f425d902df
Merge pull request #181 from cevich/log_maintenance
Synchronize maintenance script changes
2024-02-01 13:53:26 -05:00
Chris Evich d4f5d65014
Synchronize maintenance script changes
Previously, the automation repo was updated by a cron job without regard
to possibly currently-executing scripts.  This is bad.  Fix the situation
by only updating the repo while holding a `Cron.sh` lock, taking care
to restart the graph-presenting webserver container as required.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-02-01 12:27:07 -05:00
Chris Evich 0a0d617ee9
Merge pull request #180 from cevich/fix_podman_cmd
Minor: Update example crontab
2024-01-30 12:17:47 -05:00
Chris Evich 420d72a42e
Minor: Update example crontab
Also relocate usage-graph web container and logfile maintenance to
a dedicated script + crontab entry.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-01-30 12:07:37 -05:00
Chris Evich 907e840d64
Merge pull request #177 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.3.0
2024-01-24 14:50:53 -05:00
Chris Evich a19393dd92
Merge pull request #179 from cevich/fix_timebomb_test
Fix timebomb test using wrong basis
2024-01-24 13:15:26 -05:00
Chris Evich 72ed4a5532
Fix timebomb test using wrong basis
The "timebomb() function ignores TZ envar and forces UTC" test started
failing (triggering the bomb unintentionally).  Fixed by forcing the
in-line date-calculation to be based on UTC (which the test was
assuming previously).  Also updated the subsequent test similarly, for
consistency.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-01-24 13:04:29 -05:00
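
The gist of the UTC-versus-local pitfall behind the failing test, shown with GNU date; the expiry value is only an example.

```
expiry="2024-06-01"                   # example expiry date
local_secs=$(date -d "$expiry" +%s)   # interpreted in the caller's TZ
utc_secs=$(date -u -d "$expiry" +%s)  # interpreted as UTC, matching the function under test

# The two differ unless TZ=UTC, which is what skewed the test's expectations.
echo "local-TZ basis: $local_secs"
echo "UTC basis:      $utc_secs"
```
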
renovate[bot] 99a94ca880
[skip-ci] Update actions/upload-artifact action to v4.3.0
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-01-23 19:42:35 +00:00
Chris Evich 25651a0a31
Merge pull request #174 from cevich/timebomb
Add common timebomb function to mark workarounds
2024-01-23 12:02:59 -05:00
Chris Evich 47cf77670e
Add common timebomb function to mark workarounds
Because otherwise, as the saying goes:
    "There's nothing more permanent than temporary"

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-01-23 09:40:43 -05:00
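
A hedged sketch of what such a timebomb helper can look like (not the library's actual implementation): it fails loudly once a deadline passes, so a temporary workaround cannot quietly become permanent. The date and message are placeholders.

```
timebomb() {
    local expire_date="$1"   # e.g. "2099-01-01"
    local reason="$2"        # why the workaround exists
    local now expires
    now=$(date -u +%s)
    expires=$(date -u -d "$expire_date" +%s)
    if (( now >= expires )); then
        echo "TIMEBOMB EXPIRED ($expire_date): $reason" >> /dev/stderr
        exit 1
    fi
}

timebomb "2099-01-01" "Drop workaround for a hypothetical upstream bug"
echo "Workaround still within its grace period."
```
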
Chris Evich 7ce27001a4
Merge pull request #176 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.2.0
2024-01-22 10:16:36 -05:00
renovate[bot] d4314cc954
[skip-ci] Update actions/upload-artifact action to v4.2.0
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-01-18 21:58:23 +00:00
Chris Evich 92ed5911d6
Resolve a bunch of shellcheck findings
Signed-off-by: Chris Evich <cevich@redhat.com>
2024-01-16 15:43:40 -05:00
Chris Evich 93455e8a08
Fix script failure
Error: `line 0: Cannot load input from 'Utilization.gnuplot'`

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-01-16 12:49:22 -05:00
Chris Evich 778e26b27c
Merge pull request #173 from cevich/webplot
Output web page with utilization graph
2024-01-16 11:56:51 -05:00
Chris Evich 3cd711bba5
Output web page with utilization graph
This makes it easy to serve a simple website with the
graph, so more than one person may easily observe it.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-01-16 11:26:43 -05:00
Chris Evich 75c0f0bb47
Increase build-push test timeout
Network slowdowns can make package installs run slowly.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-01-16 11:26:43 -05:00
Chris Evich 22a0e4db8f
Merge pull request #172 from cevich/use_local_disk
Create, mount, and use local storage
2024-01-15 10:03:06 -05:00
Chris Evich 22fcddc3c2
Create, mount, and use local storage
Podman machine testing is very much storage-bound in terms of
performance.  The stock AWS setup uses networked storage for the system,
and a small local disk for `/tmp`.  However there is plenty of empty
space available on the local disk, and it's *MUCH* faster than network
storage.  Use this disk as the worker-user's home directory (where tests
run from).

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-01-15 09:54:15 -05:00
Chris Evich dfdb3ffd29
Merge pull request #171 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4.1.0
2024-01-12 16:02:55 -05:00
renovate[bot] 2441295d69
[skip-ci] Update actions/upload-artifact action to v4.1.0
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2024-01-12 18:01:25 +00:00
Chris Evich d74cf63fb4
Merge pull request #168 from cevich/simplify_pool_management
Improve/overhaul pool management/monitoring scripts
2024-01-11 14:20:51 -05:00
Chris Evich b182b9ba96
Resolve worker-testing TODO
This will allow executing tasks against the workers-under-test.

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-01-11 14:11:35 -05:00
Chris Evich a5b7947fed
Improve/overhaul pool management/monitoring scripts
The initial implementation was rushed into production with a minimum of
required features, and a corresponding amount of slop and bugs.  Attend to
a litany of needed improvements and enhancements:

* Add tracking of both started and completed tasks.
* Update utilization CSV entry addition to include tasks-ended
  (`taskf`).
* Update instance-ssh helper to support specifying by name or ID
* Fix multiple instance-ssh helper executions clashing over VNC port
  forwards.
* Update many comments
* Fix handling of case where no dedicated hosts or instances are found.
* Relocate `CREATE_STAGGER_HOURS` to `pw_lib.sh` and lower value to 2.
  This value should not include a margin representing boot/setup time.
  Also a lower value will allow for faster automated pool recovery
  should the entire thing collapse for some reason.
* Support dividing/managing a subset of all dedicated hosts and
  instances via a required tag and value.  This allows for easier
  testing of script changes w/o affecting the in-use (production) pool.
* Add check to confirm host name always matches instance name - in case
  a human screws this up.  Many/most of these management scripts
  otherwise assume the two name-tags always match.
* Update documentation for initializing a new set of dedicated hosts and
  instances.
* Forcibly terminate instances when certain exceptionally "bad" conditions
  are detected.  i.e. those which may signal a security breach or other
  issue the scripts will never be able to cope with.
* Add support for yanking an instance out of service by changing its
  `PWPoolReady` tag.  Allow re-adding instance when set `true` again.
* Reimplement max instance lifetime check.
* Implement a check on maximum completed tasks per instance.
* Stop outputting normal-status lines when examining instances.  Keep
  output to the bare minimum, unless there is some fault condition.
* Move the scheduled instance shutdown timer from the setup script into
  the instance maintenance script.  Add a check to confirm the sleep +
  shutdown process is running.
* Check and enforce a maximum amount of time `setup.sh` is allowed to
  run.
* Greatly simplify pool-listener service script.
* Simplify instance `setup.sh` script.
* Update utilization GNUplot command file to obtain the number
  of active workers from `dh_status`.  Extend the timespan of
  the graph.  Plot worker utilization as a percentage based on
  number of running tasks (instead of the total completed).

Signed-off-by: Chris Evich <cevich@redhat.com>
2024-01-11 14:11:35 -05:00
Chris Evich cac7b02d4f
Merge pull request #170 from containers/renovate/actions-upload-artifact-4.x
[skip-ci] Update actions/upload-artifact action to v4
2023-12-14 13:24:09 -05:00
renovate[bot] 4f066e397d
[skip-ci] Update actions/upload-artifact action to v4
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2023-12-14 17:09:24 +00:00
Chris Evich f7a85f3a80
Merge pull request #169 from cevich/service_pool_fix
Fix two pool service script failure-modes
2023-12-14 12:09:10 -05:00
Chris Evich 646016818c
Fix two pool service script failure-modes
Fix typo in calculating sleep seconds.  Remove mode `e` from the script,
so any failing command (e.g. a pgrep) doesn't cause the script to exit.
Also redirect null input into the shutdown command, since it can behave
oddly otherwise.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-12-14 12:01:21 -05:00
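
The two fixes in miniature, as a sketch; the sleep arithmetic and process pattern are assumptions about the general shape, not copies of the script.

```
# Deliberately NOT using 'set -e': a failing probe such as pgrep must not kill the script.
delay_minutes=5
delay_seconds=$(( delay_minutes * 60 ))   # the fixed typo was in a calculation like this
echo "Scheduling shutdown in ${delay_seconds} seconds."

if ! pgrep -f 'cirrus.*agent' > /dev/null; then  # hypothetical pattern; non-zero is fine here
    echo "No agent process found."
fi

# Feed shutdown from /dev/null so it cannot behave oddly waiting on stdin.
sudo shutdown -h "+${delay_minutes}" < /dev/null
```
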
Chris Evich 851d152282
Merge pull request #167 from cevich/ignore_released
Properly handle 'released' DH status
2023-12-08 10:12:24 -05:00
Chris Evich 9a08aa2aed
Properly handle 'released' DH status
This is set when somebody removes a slot.  There's currently no way for
that to happen except by human action.  Try not to freak an observer
out by presenting it as a failure of some sort.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-12-08 09:57:09 -05:00
Chris Evich 61556ac3e9
Merge pull request #166 from cevich/fix_sleep
Fix sleep typo + reduce times
2023-12-07 14:09:57 -05:00
Chris Evich e8b260f41d
Fix sleep typo + reduce times
The darwin version of sleep doesn't support any suffix, and breaks if
you use one.  Fix the script and adjust the timings so the loop runs
quicker.

This has been tested on the currently in-use pool.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-12-07 13:54:47 -05:00
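
For reference, the portability detail behind the fix: GNU coreutils `sleep` accepts unit suffixes, but the BSD/darwin `sleep` on the Macs takes only a plain number of seconds.

```
sleep 2m     # works with GNU coreutils on Linux, errors out on macOS/darwin
sleep 120    # plain seconds: works everywhere
```
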
Chris Evich 8d8e12b3dd
Merge pull request #165 from cevich/further_limit_dh_by_tag
Allow dividing DH pool based on tag name/value
2023-12-05 11:30:38 -05:00
Chris Evich a9eb5b1f12
Allow dividing DH pool based on tag name/value
With an active and in-use dedicated host pool, it's very hard to test
changes to management scripts.  Add support for filtering the list of
DH to operate on, based on a defined tag name and value.  This way,
inactive DH can be manually re-tagged (temporarily) to allow testing
script changes against them.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-12-05 11:00:55 -05:00
Chris Evich 20df1f7904
Merge pull request #164 from cevich/minor_fixes
A Collection of minor fixes
2023-12-01 11:16:26 -05:00
Chris Evich 111991e6eb
Fix pkill permission-denied failure
Signed-off-by: Chris Evich <cevich@redhat.com>
2023-12-01 10:51:08 -05:00
Chris Evich 67c74ffe7c
Remove unnecessary/dangerous -u option
Signed-off-by: Chris Evich <cevich@redhat.com>
2023-12-01 10:51:07 -05:00
Chris Evich 8b968401af
Fix a handful of shellcheck complaints
Signed-off-by: Chris Evich <cevich@redhat.com>
2023-12-01 10:51:07 -05:00
Chris Evich e368472ce7
Enable remote VNC access to mac instances
There are some mac tools that can ONLY be used on the GUI.  Setting this
up requires some specialized manual work.  Make this a bit easier by
removing a required step (i.e. ssh forwarding).

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-12-01 09:49:11 -05:00
Chris Evich 93962e6cf1
Merge pull request #163 from cevich/add_mac_management_goodies
Add mac management goodies
2023-11-29 15:11:13 -05:00
Chris Evich 32554b55cd
Add GNUPlot command file
Simply displays an auto-refreshing graph showing alive pool workers
divided by the total number of CI tasks run.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-29 14:38:50 -05:00
Chris Evich 90da395f0a
Add example pool management cron script
Also update docs regarding its use.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-29 14:38:50 -05:00
Chris Evich 2aea32e1a4
Merge pull request #162 from cevich/log_exp_time
Better logging of worker expiration
2023-11-29 11:44:06 -05:00
Chris Evich 3e8e4726f6
Better logging of worker expiration
It's helpful for operators to be aware of the expiration time for
workers.  Ensure this, along with any other `service_pool.sh` messages,
is logged.  Extract and display the logged expiration notice,
or a warning if missing.  The constant log-grep is secondarily
useful as indication of worker log-file manipulation.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-29 11:21:36 -05:00
Chris Evich cc10ff405a
Merge pull request #160 from cevich/force_pool
Workaround lengthy startup of many instances
2023-11-21 11:19:27 -05:00
Chris Evich 77f63d7765
Workaround lengthy startup of many instances
When a pool is empty of instances, the launch-stagger mechanism can
introduce a substantial delay to achieving a full-pool of active
workers.  This will negatively impact service availability and worker
utilization - likely resulting in CI tasks queuing.

Add a simple workaround for this condition with the addition of a
`--force` option.  When used, it will force instance creation on
all available dedicated hosts.  Similarly it will also force instance
setup, though with an extended shutdown delay timer.

Update documentation regarding this operational mode and its purpose.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-20 12:17:19 -05:00
Chris Evich 71622bfde6
Merge pull request #159 from cevich/mac_pw_pool_adjustments
PW pool management script adjustments
2023-11-20 10:58:37 -05:00
Chris Evich 723fbf1039
Fix last-launch time query failure behavior
If for whatever reason there is a failure in the query or search
for last-launch times, `$latest_launched` could be set to the current
time.  This will ultimately result in no instances being launched.  Fix
this by improved detection of an empty/null launch time in
`${launchtimes[@]}`.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-20 10:09:34 -05:00
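
A sketch of the guard implied above: treat an empty or null launch-time entry as "no previous launch found" rather than letting `$latest_launched` default to the current time. The array contents are example data and the rest is assumed.

```
latest_launched=0   # epoch seconds; 0 means "no prior launch found"
launchtimes=("2023-11-20T10:00:00+00:00" "" "null")   # example query results

for lt in "${launchtimes[@]}"; do
    # Skip empty or literal "null" entries instead of treating them as "now".
    [[ -z "$lt" || "$lt" == "null" ]] && continue
    secs=$(date -u -d "$lt" +%s)
    (( secs > latest_launched )) && latest_launched=$secs
done

if (( latest_launched == 0 )); then
    echo "WARNING: no prior instance launch times found" >> /dev/stderr
fi
```
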
Chris Evich d1a3503a7f
Minor: Adjust status message
The term "BUSY" implies the dedicated host is doing something else.
This is not the case for staggering launches.  Use a more descriptive
status indicator for this.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-17 14:50:27 -05:00
Chris Evich 3a9c2d4675
Fix truncating duplicated & redirected script output
For whatever reason, when a script that duplicates and redirects
stdout/stderr to a log-file calls one of the management scripts, the
log-file is truncated.  Updating output functions to append their output
seems to resolve this issue.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-16 16:41:21 -05:00
Chris Evich 7244323cef
Fix several DH management script bugs
Previously it was possible to fail to launch any instances due to bugs
and assumptions in the last-launch-time determination.  Fix this by
actually querying running instances, and searching for the most
recent launch time.  If there are no instances found, print a warning
operators may observe.  Also, fix missing `-t` option to several
readarray() calls.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-16 16:32:30 -05:00
Chris Evich d6ec0981eb
Alpha-sort dedicated host state file
Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-16 16:32:30 -05:00
Chris Evich c5b3a9a9e1
Record status details for each worker
Record the most recent status of all workers in a dedicated file.
Intended for use by humans or other automation.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-16 16:32:30 -05:00
Chris Evich 475167d677
Rename state file to better indicate content type
The file relates to dedicated hosts (DH), not persistent-workers (PW).

Also, don't exit non-zero if there is an error-status.  Rely on
consumers of the state file to take appropriate action.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-16 10:54:52 -05:00
Chris Evich d41b3455df
Merge pull request #158 from cevich/mac_pw_pool
Cirrus-CI persistent worker pool management
2023-11-15 09:39:50 -05:00
Chris Evich aba52cf01f
Cirrus-CI persistent worker pool management
Implement a set of scripts to help with management of a Cirrus-CI
persistent worker pool of M1 Mac instances on AWS EC2.

* Implement script to help monitor a set of M1 Mac dedicated hosts,
  creating new instances as slots become available.

* Implement a script to help monitor M1 Mac instances, deploying
  and executing a setup script on newly created instances.

* Implement a ssh-helper script for humans, to quickly access
  instances based on their EC2 instance ID.

* Implement a setup script intended to run on M1 Macs, to help
  configure and join them to a pre-existing worker pool.

* Implement a helper script intended to run on M1 Macs, to
  support developers with a CI-like environment.

* At this time, all scripts are intended for manual/human-supervised
  use.  Future commits may improve this and/or better support use
  inside automation.

* Add very basic/expedient documentation.

N/B: The majority of this content, including the EC2-side setup, has
been developed in a rush.  There are very likely major architecture,
design, and scripting bugs and shortfalls.  Some of these may be
addressed in future commits.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-11-14 13:45:44 -05:00
Chris Evich 6abea9345e
Merge pull request #154 from containers/renovate/actions-checkout-4.x
[skip-ci] Update actions/checkout action to v4
2023-10-20 14:08:40 -04:00
renovate[bot] b42bbe547b
[skip-ci] Update actions/checkout action to v4
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2023-10-19 11:15:11 +00:00
Chris Evich d277f04f02
Merge pull request #157 from cevich/minor_install_timestamp
Minor: Breadcrumb version and UTC timestamp
2023-09-26 16:58:23 -04:00
Chris Evich d4fb87ec3c
Minor: Breadcrumb version and UTC timestamp
Otherwise the timestamp is localized, which may be harder for humans
to relate/translate WRT other time-based items, for example Cirrus-CI
and GHA cron specifications.  Also add mention of the just-installed
version to the env file, to help with any needed auditing.

Signed-off-by: Chris Evich <cevich@redhat.com>
2023-09-26 11:23:44 -04:00
Chris Evich 6039ae9c96
Merge pull request #155 from containers/renovate/actions-upload-artifact-3.x
[skip-ci] Update actions/upload-artifact action to v3.1.3
2023-09-13 14:23:31 -04:00
renovate[bot] 849ff94def
[skip-ci] Update actions/upload-artifact action to v3.1.3
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2023-09-06 20:17:43 +00:00
49 changed files with 2737 additions and 208 deletions

View File

@@ -27,13 +27,11 @@ cirrus-ci/unit-test_task:
cirrus-ci/renovate_validation_task:
only_if: *not_docs
container:
image: docker.io/renovate/renovate:latest
env:
RCV: /usr/local/bin/renovate-config-validator
image: "ghcr.io/renovatebot/renovate:latest"
preset_validate_script:
- $RCV $CIRRUS_WORKING_DIR/renovate/defaults.json5
- renovate-config-validator $CIRRUS_WORKING_DIR/renovate/defaults.json5
repo_validate_script:
- $RCV $CIRRUS_WORKING_DIR/.github/renovate.json5
- renovate-config-validator $CIRRUS_WORKING_DIR/.github/renovate.json5
# This is the same setup as used for Buildah CI
gcp_credentials: ENCRYPTED[fc95bcc9f4506a3b0d05537b53b182e104d4d3979eedbf41cf54205be6397ca0bce0831d0d47580cf578dae5776548a5]
@@ -53,10 +51,10 @@ cirrus-ci/build-push_test_task:
# only stock, google-managed generic image. This also avoids needing to
# update custom-image last-used timestamps.
image_project: centos-cloud
image_family: centos-stream-8
timeout_in: 20
image_family: centos-stream-9
timeout_in: 30
env:
CIMG: quay.io/buildah/stable:v1.23.0
CIMG: quay.io/buildah/stable:latest
TEST_FQIN: quay.io/buildah/do_not_use
# Robot account credentials for test-push to
# $TEST_FQIN registry by build-push/test/testbuilds.sh

View File

@@ -12,7 +12,7 @@
podman run -it \
-v ./.github/renovate.json5:/usr/src/app/renovate.json5:z \
docker.io/renovate/renovate:latest \
ghcr.io/renovatebot/renovate:latest \
renovate-config-validator
3. Commit.
@@ -42,6 +42,4 @@
/*************************************************
*** Repository-specific configuration options ***
*************************************************/
// Don't leave dep. update. PRs "hanging", assign them to people.
"assignees": ["cevich"],
}

View File

@@ -20,7 +20,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Clone the repository code
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
persist-credentials: false
path: ./

View File

@@ -44,7 +44,7 @@ jobs:
GITHUB_TOKEN: ${{ github.token }}
- name: Clone latest main branch repository code
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
fetch-depth: 1
path: ./main
@@ -64,16 +64,14 @@ jobs:
- if: steps.retro.outputs.do_intg == 'true'
id: create_pr_comment
name: Create a status comment in the PR
# Ref: https://github.com/marketplace/actions/comment-action
uses: jungwinter/comment@v1
uses: thollander/actions-comment-pull-request@v3
with:
issue_number: '${{ steps.retro.outputs.prn }}'
type: 'create'
token: '${{ secrets.GITHUB_TOKEN }}'
pr-number: '${{ steps.retro.outputs.prn }}'
comment-tag: retro
# N/B: At the time of this comment, it is not possible to provide
# direct links to specific job-steps (here) nor links to artifact
# files. There are open RFE's for this capability to be added.
body: >-
message: >-
[Cirrus-CI Retrospective Github
Action](https://github.com/${{github.repository}}/actions/runs/${{github.run_id}})
has started. Running against
@@ -84,7 +82,7 @@
# block allow direct checkout of PR code.
- if: steps.retro.outputs.do_intg == 'true'
name: Clone all repository code
uses: actions/checkout@v3
uses: actions/checkout@v4
with:
# Get ALL available history to avoid problems during any run of
# 'git describe' from any script in the repo.
@@ -119,12 +117,11 @@ jobs:
- if: steps.retro.outputs.do_intg == 'true'
id: edit_pr_comment_build
name: Update status comment on PR
uses: jungwinter/comment@v1
uses: thollander/actions-comment-pull-request@v3
with:
type: 'edit'
comment_id: '${{ steps.create_pr_comment.outputs.id }}'
token: '${{ secrets.GITHUB_TOKEN }}'
body: >-
pr-number: '${{ steps.retro.outputs.prn }}'
comment-tag: retro
message: >-
Unit-testing passed (`${{ env.HELPER_LIB_TEST }}`)passed.
[Cirrus-CI Retrospective Github
Action](https://github.com/${{github.repository}}/actions/runs/${{github.run_id}})
@@ -135,12 +132,11 @@ jobs:
- if: steps.retro.outputs.do_intg == 'true'
id: edit_pr_comment_exec
name: Update status comment on PR again
uses: jungwinter/comment@v1
uses: thollander/actions-comment-pull-request@v3
with:
type: 'edit'
comment_id: '${{ steps.edit_pr_comment_build.outputs.id }}'
token: '${{ secrets.GITHUB_TOKEN }}'
body: >-
pr-number: '${{ steps.retro.outputs.prn }}'
comment-tag: retro
message: >-
Smoke testing passed [Cirrus-CI Retrospective Github
Action](https://github.com/${{github.repository}}/actions/runs/${{github.run_id}})
is triggering Cirrus-CI ${{ env.ACTION_TASK }} task.
@@ -154,12 +150,12 @@ jobs:
run: |
set +x
trap "history -c" EXIT
curl --request POST \
curl --fail-with-body --request POST \
--url https://api.cirrus-ci.com/graphql \
--header "Authorization: Bearer ${{ secrets.CIRRUS_API_TOKEN }}" \
--header 'content-type: application/json' \
--data '{"query":"mutation {\n trigger(input: {taskId: \"${{steps.retro.outputs.tid}}\", clientMutationId: \"${{env.UUID}}\"}) {\n clientMutationId\n task {\n name\n }\n }\n}"}' \
> ./test_artifacts/action_task_trigger.json
| tee ./test_artifacts/action_task_trigger.json
actual=$(jq --raw-output '.data.trigger.clientMutationId' ./test_artifacts/action_task_trigger.json)
echo "Verifying '$UUID' matches returned tracking value '$actual'"
@@ -167,12 +163,11 @@ jobs:
- if: steps.retro.outputs.do_intg == 'true'
name: Update comment on workflow success
uses: jungwinter/comment@v1
uses: thollander/actions-comment-pull-request@v3
with:
type: 'edit'
comment_id: '${{ steps.edit_pr_comment_exec.outputs.id }}'
token: '${{ secrets.GITHUB_TOKEN }}'
body: >-
pr-number: '${{ steps.retro.outputs.prn }}'
comment-tag: retro
message: >-
Successfully triggered [${{ env.ACTION_TASK }}
task](https://cirrus-ci.com/task/${{ steps.retro.outputs.tid }}?command=main#L0)
to indicate
@@ -183,12 +178,11 @@ jobs:
- if: failure() && steps.retro.outputs.do_intg == 'true'
name: Update comment on workflow failure
uses: jungwinter/comment@v1
uses: thollander/actions-comment-pull-request@v3
with:
type: 'edit'
comment_id: '${{ steps.create_pr_comment.outputs.id }}'
token: '${{ secrets.GITHUB_TOKEN }}'
body: >-
pr-number: '${{ steps.retro.outputs.prn }}'
comment-tag: retro
message: >-
The [Cirrus-CI Retrospective Github
Action](https://github.com/${{github.repository}}/actions/runs/${{github.run_id}})
failed against this PR's
@ -197,24 +191,22 @@ jobs:
# This can happen because of --force push, manual cancel button press, or some other cause.
- if: cancelled() && steps.retro.outputs.do_intg == 'true'
name: Update comment on workflow cancellation
uses: jungwinter/comment@v1
uses: thollander/actions-comment-pull-request@v3
with:
type: 'edit'
comment_id: '${{ steps.create_pr_comment.outputs.id }}'
token: '${{ secrets.GITHUB_TOKEN }}'
body: '[Cancelled](https://github.com/${{github.repository}}/pull/${{steps.retro.outputs.prn}}/commits/${{steps.retro.outputs.sha}})'
pr-number: '${{ steps.retro.outputs.prn }}'
comment-tag: retro
message: '[Cancelled](https://github.com/${{github.repository}}/pull/${{steps.retro.outputs.prn}}/commits/${{steps.retro.outputs.sha}})'
# Abnormal workflow ($ACTION-TASK task already ran / not paused on a PR).
- if: steps.retro.outputs.is_pr == 'true' && steps.retro.outputs.do_intg != 'true'
id: create_error_pr_comment
name: Create an error status comment in the PR
# Ref: https://github.com/marketplace/actions/comment-action
uses: jungwinter/comment@v1
uses: thollander/actions-comment-pull-request@v3
with:
issue_number: '${{ steps.retro.outputs.prn }}'
type: 'create'
token: '${{ secrets.GITHUB_TOKEN }}'
body: >-
pr-number: '${{ steps.retro.outputs.prn }}'
comment-tag: error
message: >-
***ERROR***: [cirrus-ci_retrospective
action](https://github.com/${{github.repository}}/actions/runs/${{github.run_id}})
found `${{ env.ACTION_TASK }}` task with unexpected `${{ steps.retro.outputs.tst }}`
@ -230,7 +222,7 @@ jobs:
# Provide an archive of files for debugging/analysis.
- if: always() && steps.retro.outputs.do_intg == 'true'
name: Archive event, build, and debugging output
uses: actions/upload-artifact@v3.1.2
uses: actions/upload-artifact@v4.6.2
with:
name: pr_${{ steps.retro.outputs.prn }}_debug.zip
path: ./test_artifacts

View File

@ -28,9 +28,9 @@ jobs:
fi
unit-tests: # N/B: Duplicates `ubuntu_unit_tests.yml` - templating not supported
runs-on: ubuntu-22.04
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
# Testing installer requires a full repo. history
fetch-depth: 0
@ -77,7 +77,7 @@ jobs:
tag_name: ${{ steps.get_tag.outputs.TAG_NAME }}
release_name: ${{ steps.get_tag.outputs.TAG_NAME }}
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
fetch-depth: 0
path: ./
@ -102,7 +102,7 @@ jobs:
REPO_USER: libpod
REPO_NAME: cirrus-ci_retrospective
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
fetch-depth: 0
path: ./
@ -145,7 +145,7 @@ jobs:
run: jq --indent 4 --color-output . ${{ github.event_path }}
- if: always()
uses: actions/upload-artifact@v3.1.2
uses: actions/upload-artifact@v4.6.2
name: Archive triggering event JSON
with:
name: event.json.zip

View File

@ -4,9 +4,9 @@ on: [push, pull_request]
jobs:
automation_unit-tests:
runs-on: ubuntu-22.04
runs-on: ubuntu-24.04
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
fetch-depth: 0
persist-credentials: false

View File

@ -80,7 +80,7 @@ if [[ -n "$AUTOMATION_LIB_PATH" ]]; then
else
(
echo "WARNING: It doesn't appear containers/automation common was installed."
) > /dev/stderr
) >> /dev/stderr
fi
...do stuff...

View File

@ -36,7 +36,7 @@ INSTALL_PREFIX="${INSTALL_PREFIX%%/}" # Make debugging path problems easier
# When installing as root, allow sourcing env. vars. from this file
INSTALL_ENV_FILEPATH="${INSTALL_ENV_FILEPATH:-/etc/automation_environment}"
# Used internally here and in unit-testing, do not change without a really, really good reason.
_ARGS="$@"
_ARGS="$*"
_MAGIC_JUJU=${_MAGIC_JUJU:-XXXXX}
_DEFAULT_MAGIC_JUJU=d41d844b68a14ee7b9e6a6bb88385b4d
@ -109,7 +109,8 @@ install_automation() {
fi
# Allow re-installing different versions, clean out old version if found
if [[ -d "$actual_inst_path" ]] && [[ -r "$actual_inst_path/AUTOMATION_VERSION" ]]; then
local installed_version=$(cat "$actual_inst_path/AUTOMATION_VERSION")
local installed_version
installed_version=$(<"$actual_inst_path/AUTOMATION_VERSION")
msg "Warning: Removing existing installed version '$installed_version'"
rm -rvf "$actual_inst_path"
elif [[ -d "$actual_inst_path" ]]; then
@ -125,8 +126,8 @@ install_automation() {
dbg "Configuring environment file $INSTALLATION_SOURCE/environment"
cat <<EOF>"$INSTALLATION_SOURCE/environment"
# Added on $(date --iso-8601=minutes) by $actual_inst_path/bin/$SCRIPT_FILENAME"
# Any manual modifications will be lost upon upgrade or reinstall.
# Added on $(date --utc --iso-8601=minutes) by $actual_inst_path/bin/$SCRIPT_FILENAME"
# for version '$AUTOMATION_VERSION'. Any manual modifications will be lost upon upgrade or reinstall.
export AUTOMATION_LIB_PATH="$actual_inst_path/lib"
export PATH="$PATH:$actual_inst_path/bin"
EOF
@ -217,7 +218,7 @@ check_args() {
msg " Use version '$MAGIC_LOCAL_VERSION' to install from local source."
msg " Use version 'latest' to install from current upstream"
exit 2
elif ! echo "$AUTOMATION_VERSION" | egrep -q "$arg_rx"; then
elif ! echo "$AUTOMATION_VERSION" | grep -E -q "$arg_rx"; then
msg "Error: '$AUTOMATION_VERSION' does not appear to be a valid version number"
exit 4
elif [[ -z "$_ARGS" ]] && [[ "$_MAGIC_JUJU" == "XXXXX" ]]; then
@ -254,6 +255,8 @@ elif [[ "$_MAGIC_JUJU" == "$_DEFAULT_MAGIC_JUJU" ]]; then
CHAIN_TO="$INSTALLATION_SOURCE/$arg/.install.sh"
if [[ -r "$CHAIN_TO" ]]; then
# Cannot assume common was installed system-wide
# AUTOMATION_LIB_PATH defined by anchors.sh
# shellcheck disable=SC2154
env AUTOMATION_LIB_PATH=$AUTOMATION_LIB_PATH \
AUTOMATION_VERSION=$AUTOMATION_VERSION \
INSTALLATION_SOURCE=$INSTALLATION_SOURCE \

View File

@ -20,10 +20,10 @@ runner_script_filename="$(basename $0)"
for test_subdir in $(find "$(realpath $(dirname $0)/../)" -type d -name test | sort -r); do
test_runner_filepath="$test_subdir/$runner_script_filename"
if [[ -x "$test_runner_filepath" ]] && [[ "$test_runner_filepath" != "$this_script_filepath" ]]; then
echo -e "\nExecuting $test_runner_filepath..." > /dev/stderr
echo -e "\nExecuting $test_runner_filepath..." >> /dev/stderr
$test_runner_filepath
else
echo -e "\nWARNING: Skipping $test_runner_filepath" > /dev/stderr
echo -e "\nWARNING: Skipping $test_runner_filepath" >> /dev/stderr
fi
done

View File

@ -22,7 +22,7 @@ if [[ ! -r "$AUTOMATION_LIB_PATH/common_lib.sh" ]]; then
echo "ERROR: Expecting \$AUTOMATION_LIB_PATH to contain the installation"
echo " directory path for the common automation tooling."
echo " Please refer to the README.md for installation instructions."
) > /dev/stderr
) >> /dev/stderr
exit 2 # Verified by tests
fi
@ -228,7 +228,8 @@ parse_args() {
dbg "Grabbing Context parameter: '$arg'."
CONTEXT=$(realpath -e -P $arg || die_help "$E_CONTEXT '$arg'")
else
# Properly handle any embedded special characters
# Hack: Allow array addition to handle any embedded special characters
# shellcheck disable=SC2207
BUILD_ARGS+=($(printf "%q" "$arg"))
fi
;;
@ -290,12 +291,12 @@ stage_notice() {
# N/B: It would be nice/helpful to resolve any env. vars. in '$@'
# for display. Unfortunately this is hard to do safely
# with (e.g.) eval echo "$@" :(
msg="$@"
msg="$*"
(
echo "############################################################"
echo "$msg"
echo "############################################################"
) > /dev/stderr
) >> /dev/stderr
}
BUILTIID="" # populated with the image-id on successful build
@ -322,7 +323,7 @@ parallel_build() {
# Keep user-specified BUILD_ARGS near the beginning so errors are easy to spot
# Provide a copy of the output in case something goes wrong in a complex build
stage_notice "Executing build command: '$RUNTIME build ${BUILD_ARGS[@]} ${_args[@]}'"
stage_notice "Executing build command: '$RUNTIME build ${BUILD_ARGS[*]} ${_args[*]}'"
"$RUNTIME" build "${BUILD_ARGS[@]}" "${_args[@]}"
}
@ -378,6 +379,8 @@ run_prepmod_cmd() {
local kind="$1"
shift
dbg "Exporting variables '$_CMD_ENV'"
# The indirect export is intentional here
# shellcheck disable=SC2163
export $_CMD_ENV
stage_notice "Executing $kind-command: " "$@"
bash -c "$@"
@ -402,14 +405,19 @@ get_manifest_tags() {
fi
dbg "Image listing json: $result_json"
if [[ -n "$result_json" ]]; then
if [[ -n "$result_json" ]]; then # N/B: value could be '[]'
# Rely on the caller to handle an empty list, ignore items missing a name key.
if ! fqin_names=$(jq -r '.[]? | .names[]?'<<<"$result_json"); then
die "Error obtaining image names from '$FQIN' manifest-list search result:
$result_json"
fi
grep "$FQIN"<<<"$fqin_names" | sort
dbg "Sorting fqin_names"
# Don't emit an empty newline when the list is empty
[[ -z "$fqin_names" ]] || \
sort <<<"$fqin_names"
fi
dbg "get_manifest_tags() returning successfully"
}
push_images() {
@ -420,10 +428,10 @@ push_images() {
# It's possible that --modcmd=* removed all images, make sure
# this is known to the caller.
if ! fqin_list=$(get_manifest_tags); then
die "Error retrieving set of manifest-list tags to push for '$FQIN'"
die "Retrieving set of manifest-list tags to push for '$FQIN'"
fi
if [[ -z "$fqin_list" ]]; then
die "No FQIN(s) to be pushed."
warn "No FQIN(s) to be pushed."
fi
if ((PUSH)); then
@ -446,7 +454,7 @@ push_images() {
# Handle requested help first before anything else
if grep -q -- '--help' <<<"$@"; then
echo "$E_USAGE" > /dev/stdout # allow grep'ing
echo "$E_USAGE" >> /dev/stdout # allow grep'ing
exit 0
fi

View File

@ -38,6 +38,6 @@ elif [[ "$1" == "info" ]]; then
elif [[ "$1" == "images" ]]; then
echo '[{"names":["localhost/foo/bar:latest"]}]'
else
echo "ERROR: Unexpected arg '$1' to fake_buildah.sh" > /dev/stderr
echo "ERROR: Unexpected arg '$1' to fake_buildah.sh" >> /dev/stderr
exit 9
fi

View File

@ -4,22 +4,16 @@
set -eo pipefail
# shellcheck disable=SC2154
if [[ "$CIRRUS_CI" == "true" ]]; then
# Cirrus-CI is setup (see .cirrus.yml) to run tests on CentOS
# for simplicity, but it has no native qemu-user-static. For
# the benefit of CI testing, cheat and use whatever random
# emulators are included in the container image.
# Workaround silly stupid hub rate-limiting
cat >> /etc/containers/registries.conf << EOF
[[registry]]
prefix="docker.io/library"
location="mirror.gcr.io"
EOF
# N/B: THIS IS NOT SAFE FOR PRODUCTION USE!!!!!
podman run --rm --privileged \
docker.io/multiarch/qemu-user-static:latest \
mirror.gcr.io/multiarch/qemu-user-static:latest \
--reset -p yes
elif [[ -x "/usr/bin/qemu-aarch64-static" ]]; then
# TODO: Better way to determine if kernel already setup?

View File

@ -4,7 +4,7 @@
# Any/all other usage is virtually guaranteed to fail and/or cause
# harm to the system.
for varname in RUNTIME TEST_FQIN BUILDAH_USERNAME BUILDAH_PASSWORD; do
for varname in RUNTIME SUBJ_FILEPATH TEST_CONTEXT TEST_SOURCE_DIRPATH TEST_FQIN BUILDAH_USERNAME BUILDAH_PASSWORD; do
value=${!varname}
if [[ -z "$value" ]]; then
echo "ERROR: Required \$$varname variable is unset/empty."
@ -13,6 +13,8 @@ for varname in RUNTIME TEST_FQIN BUILDAH_USERNAME BUILDAH_PASSWORD; do
done
unset value
# RUNTIME is defined by caller
# shellcheck disable=SC2154
$RUNTIME --version
test_cmd "Confirm $(basename $RUNTIME) is available" \
0 "buildah version .+" \
@ -23,7 +25,9 @@ test_cmd "Confirm skopeo is available" \
0 "skopeo version .+" \
skopeo --version
PREPCMD='echo "SpecialErrorMessage:$REGSERVER" > /dev/stderr && exit 42'
PREPCMD='echo "SpecialErrorMessage:$REGSERVER" >> /dev/stderr && exit 42'
# SUBJ_FILEPATH and TEST_CONTEXT are defined by caller
# shellcheck disable=SC2154
test_cmd "Confirm error output and exit(42) from --prepcmd" \
42 "SpecialErrorMessage:localhost" \
bash -c "$SUBJ_FILEPATH --nopush localhost/foo/bar $TEST_CONTEXT --prepcmd='$PREPCMD' 2>&1"
@ -53,7 +57,7 @@ test_cmd "Confirm manifest-list can be removed by name" \
$RUNTIME manifest rm containers-storage:localhost/foo/bar:latest
test_cmd "Verify expected partial failure when passing bogus architectures" \
125 "error creating build.+architecture staple" \
125 "no image found in image index for architecture" \
bash -c "A_DEBUG=1 $SUBJ_FILEPATH --arches=correct,horse,battery,staple localhost/foo/bar --nopush $TEST_CONTEXT 2>&1"
MODCMD='$RUNTIME tag $FQIN:latest $FQIN:9.8.7-testing'
@ -86,15 +90,12 @@ test_cmd "Verify tagged manifest image digest matches the same in latest" \
MODCMD='
set -x;
$RUNTIME images && \
$RUNTIME manifest rm containers-storage:$FQIN:latest && \
$RUNTIME manifest rm containers-storage:$FQIN:9.8.7-testing && \
$RUNTIME manifest rm $FQIN:latest && \
$RUNTIME manifest rm $FQIN:9.8.7-testing && \
echo "AllGone";
'
# TODO: Test fails due to: https://github.com/containers/buildah/issues/3490
# for now pretend it should exit(125) which will be caught when bug is fixed
# - causing it to exit(0) as it should
test_cmd "Verify --modcmd can execute a long string with substitutions" \
125 "AllGone" \
test_cmd "Verify --modcmd can execute command string that removes all tags" \
0 "AllGone.*No FQIN.+to be pushed" \
bash -c "A_DEBUG=1 $SUBJ_FILEPATH --modcmd='$MODCMD' localhost/foo/bar --nopush $TEST_CONTEXT 2>&1"
test_cmd "Verify previous --modcmd removed the 'latest' tagged image" \
@ -109,6 +110,8 @@ FAKE_VERSION=$RANDOM
MODCMD="set -ex;
\$RUNTIME tag \$FQIN:latest \$FQIN:$FAKE_VERSION;
\$RUNTIME manifest rm \$FQIN:latest;"
# TEST_FQIN and TEST_SOURCE_DIRPATH defined by caller
# shellcheck disable=SC2154
test_cmd "Verify e2e workflow w/ additional build-args" \
0 "Pushing $TEST_FQIN:$FAKE_VERSION" \
bash -c "env A_DEBUG=1 $SUBJ_FILEPATH \
@ -121,7 +124,7 @@ test_cmd "Verify e2e workflow w/ additional build-args" \
2>&1"
test_cmd "Verify latest tagged image was not pushed" \
1 "(Tag latest was deleted or has expired.)|(manifest unknown: manifest unknown)" \
2 'reading manifest latest in quay\.io/buildah/do_not_use: manifest unknown' \
skopeo inspect docker://$TEST_FQIN:latest
test_cmd "Verify architectures can be obtained from manifest list" \
@ -132,7 +135,7 @@ test_cmd "Verify architectures can be obtained from manifest list" \
for arch in amd64 s390x arm64 ppc64le; do
test_cmd "Verify $arch architecture present in $TEST_FQIN:$FAKE_VERSION" \
0 "" \
fgrep -qx "$arch" $TEST_TEMP/maniarches
grep -Fqx "$arch" $TEST_TEMP/maniarches
done
test_cmd "Verify pushed image can be removed" \

View File

@ -0,0 +1,27 @@
# Podman First-Time Contributor Certificate Generator
This directory contains a simple web-based certificate generator to celebrate first-time contributors to the Podman project.
## Files
- **`certificate_generator.html`** - Interactive web interface for creating certificates
- **`certificate_template.html`** - The certificate template used for generation
- **`first_pr.png`** - Podman logo/branding image used in certificates
## Usage
1. Open `certificate_generator.html` in a web browser
2. Fill in the contributor's details:
- Name
- Pull Request number
- Date (defaults to current date)
3. Preview the certificate in real-time
4. Click "Download Certificate" to save as HTML
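If you prefer to serve the files over HTTP rather than opening `certificate_generator.html` straight from disk, any static file server will do; a minimal sketch using Python's built-in server (the port is an arbitrary example, and this server is not part of the repository):
```bash
# From inside this directory, start a throwaway local web server
python3 -m http.server 8000
# Then browse to http://localhost:8000/certificate_generator.html
```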
## Purpose
These certificates are designed to recognize and celebrate community members who make their first contribution to the Podman project. The certificates feature Podman branding and can be customized for each contributor.
## Contributing
Feel free to improve the design, add features, or suggest enhancements to make the certificate generator even better for recognizing our amazing contributors!

View File

@ -0,0 +1,277 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Podman Certificate Generator</title>
<style>
@import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;600&display=swap');
@import url('https://fonts.googleapis.com/css2?family=Merriweather:wght@400;700;900&display=swap');
body {
font-family: 'Inter', sans-serif;
background-color: #f0f2f5;
margin: 0;
padding: 2rem;
}
.container {
display: grid;
grid-template-columns: 380px 1fr;
gap: 2rem;
max-width: 1600px;
margin: auto;
}
.form-panel {
background-color: white;
padding: 2rem;
border-radius: 8px;
box-shadow: 0 4px 12px rgba(0,0,0,0.1);
height: fit-content;
position: sticky;
top: 2rem;
}
.form-panel h2 {
margin-top: 0;
color: #333;
font-family: 'Merriweather', serif;
}
.form-group {
margin-bottom: 1.5rem;
}
.form-group label {
display: block;
margin-bottom: 0.5rem;
font-weight: 600;
color: #555;
}
.form-group input {
width: 100%;
padding: 0.75rem;
border: 1px solid #ccc;
border-radius: 4px;
box-sizing: border-box;
font-size: 1rem;
}
.action-buttons {
display: flex;
gap: 1rem;
margin-top: 1.5rem;
}
.action-buttons button {
flex-grow: 1;
padding: 0.75rem;
border: none;
border-radius: 4px;
font-size: 1rem;
font-weight: 600;
cursor: pointer;
transition: background-color 0.3s;
}
#downloadBtn {
background-color: #28a745;
color: white;
}
#downloadBtn:hover {
background-color: #218838;
}
.preview-panel {
display: flex;
justify-content: center;
align-items: flex-start;
}
/* Certificate Styles (copied from template and scaled) */
.certificate {
width: 800px;
height: 1100px;
background: #fdfaf0;
border: 2px solid #333;
position: relative;
box-shadow: 0 10px 30px rgba(0,0,0,0.2);
padding: 50px;
box-sizing: border-box;
display: flex;
flex-direction: column;
align-items: center;
font-family: 'Merriweather', serif;
transform: scale(0.8);
transform-origin: top center;
}
.party-popper { position: absolute; font-size: 40px; }
.top-left { top: 40px; left: 40px; }
.top-right { top: 40px; right: 40px; }
.main-title { font-size: 48px; font-weight: 900; color: #333; text-align: center; margin-top: 60px; line-height: 1.2; text-transform: uppercase; }
.subtitle { font-size: 24px; font-weight: 400; color: #333; text-align: center; margin-top: 30px; text-transform: uppercase; letter-spacing: 2px; }
.contributor-name { font-size: 56px; font-weight: 700; color: #333; text-align: center; margin: 15px 0 50px; }
.mascot-image { width: 450px; height: 450px; background-image: url('first_pr.png'); background-size: contain; background-repeat: no-repeat; background-position: center; margin-top: 20px; -webkit-print-color-adjust: exact; print-color-adjust: exact; }
.description { font-size: 22px; color: #333; line-height: 1.6; text-align: center; margin-top: 40px; }
.description strong { font-weight: 700; }
.footer { width: 100%; margin-top: auto; padding-top: 30px; border-top: 1px solid #ccc; display: flex; justify-content: space-between; align-items: flex-end; font-size: 16px; color: #333; }
.pr-info { text-align: left; }
.signature { text-align: right; font-style: italic; }
@media print {
body {
background: #fff;
margin: 0;
padding: 0;
}
.form-panel, .action-buttons {
display: none;
}
.container {
display: block;
margin: 0;
padding: 0;
}
.preview-panel {
padding: 0;
margin: 0;
}
.certificate {
transform: scale(1);
box-shadow: none;
width: 100%;
height: 100vh;
page-break-inside: avoid;
}
}
</style>
</head>
<body>
<div class="container">
<div class="form-panel">
<h2>Certificate Generator</h2>
<div class="form-group">
<label for="contributorName">Contributor Name</label>
<input type="text" id="contributorName" value="Mike McGrath">
</div>
<div class="form-group">
<label for="prNumber">PR Number</label>
<input type="text" id="prNumber" value="26393">
</div>
<div class="form-group">
<label for="mergeDate">Date</label>
<input type="text" id="mergeDate" value="June 13, 2025">
</div>
<div class="action-buttons">
<button id="downloadBtn">Download HTML</button>
</div>
</div>
<div class="preview-panel">
<div id="certificatePreview">
<!-- Certificate HTML will be injected here by script -->
</div>
</div>
</div>
<script>
const nameInput = document.getElementById('contributorName');
const prNumberInput = document.getElementById('prNumber');
const dateInput = document.getElementById('mergeDate');
const preview = document.getElementById('certificatePreview');
function generateCertificateHTML(name, prNumber, date) {
const prLink = `https://github.com/containers/podman/pull/${prNumber}`;
// This is the full, self-contained HTML for the certificate
return `
<div class="certificate">
<div class="party-popper top-left">🎉</div>
<div class="party-popper top-right">🎉</div>
<div class="main-title">Certificate of<br>Contribution</div>
<div class="subtitle">Awarded To</div>
<div class="contributor-name">${name}</div>
<div class="mascot-image"></div>
<div class="description">
For successfully submitting and merging their <strong>First Pull Request</strong> to the <strong>Podman project</strong>.<br>
Your contribution helps make open source better—one PR at a time!
</div>
<div class="footer">
<div class="pr-info">
<div>🔧 Merged PR: <a href="${prLink}" target="_blank">${prLink}</a></div>
<div style="margin-top: 5px;">${date}</div>
</div>
<div class="signature">
Keep hacking, keep contributing!<br>
The Podman Community
</div>
</div>
</div>
`;
}
function updatePreview() {
const name = nameInput.value || '[CONTRIBUTOR_NAME]';
const prNumber = prNumberInput.value || '[PR_NUMBER]';
const date = dateInput.value || '[DATE]';
preview.innerHTML = generateCertificateHTML(name, prNumber, date);
}
document.getElementById('downloadBtn').addEventListener('click', () => {
const name = nameInput.value || 'contributor';
const prNumber = prNumberInput.value || '00000';
const date = dateInput.value || 'Date';
const certificateHTML = generateCertificateHTML(name, prNumber, date);
const fullPageHTML = `
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Certificate for ${name}</title>
<style>
/* All the CSS from the generator page */
@import url('https://fonts.googleapis.com/css2?family=Merriweather:wght@400;700;900&display=swap');
body { margin: 20px; font-family: 'Merriweather', serif; background: #e0e0e0; }
.certificate {
transform: scale(1);
box-shadow: none;
margin: auto;
}
/* Paste all certificate-related styles here */
.certificate { width: 800px; height: 1100px; background: #fdfaf0; border: 2px solid #333; position: relative; padding: 50px; box-sizing: border-box; display: flex; flex-direction: column; align-items: center; }
.party-popper { position: absolute; font-size: 40px; }
.top-left { top: 40px; left: 40px; }
.top-right { top: 40px; right: 40px; }
.main-title { font-size: 48px; font-weight: 900; color: #333; text-align: center; margin-top: 60px; line-height: 1.2; text-transform: uppercase; }
.subtitle { font-size: 24px; font-weight: 400; color: #333; text-align: center; margin-top: 30px; text-transform: uppercase; letter-spacing: 2px; }
.contributor-name { font-size: 56px; font-weight: 700; color: #333; text-align: center; margin: 15px 0 50px; }
.mascot-image { width: 450px; height: 450px; background-image: url('first_pr.png'); background-size: contain; background-repeat: no-repeat; background-position: center; margin-top: 20px; -webkit-print-color-adjust: exact; print-color-adjust: exact; }
.description { font-size: 22px; color: #333; line-height: 1.6; text-align: center; margin-top: 40px; }
.description strong { font-weight: 700; }
.footer { width: 100%; margin-top: auto; padding-top: 30px; border-top: 1px solid #ccc; display: flex; justify-content: space-between; align-items: flex-end; font-size: 16px; color: #333; }
.pr-info { text-align: left; }
.signature { text-align: right; font-style: italic; }
@media print {
@page { size: A4 portrait; margin: 0; }
body, html { width: 100%; height: 100%; margin: 0; padding: 0; }
.certificate { width: 100%; height: 100%; box-shadow: none; transform: scale(1); }
}
</style>
</head>
<body>${certificateHTML}</body>
</html>
`;
const blob = new Blob([fullPageHTML], { type: 'text/html' });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `podman-contribution-certificate-${name.toLowerCase().replace(/\s+/g, '-')}.html`;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
});
// Add event listeners to update preview on input change
[nameInput, prNumberInput, dateInput].forEach(input => {
input.addEventListener('input', updatePreview);
});
// Initial preview generation
updatePreview();
</script>
</body>
</html>

View File

@ -0,0 +1,175 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Podman Certificate of Contribution</title>
<style>
@import url('https://fonts.googleapis.com/css2?family=Merriweather:wght@400;700;900&display=swap');
body {
margin: 0;
padding: 20px;
font-family: 'Merriweather', serif;
background: #e0e0e0;
display: flex;
justify-content: center;
align-items: center;
min-height: 100vh;
}
.certificate {
width: 800px;
height: 1100px;
background: #fdfaf0;
border: 2px solid #333;
position: relative;
box-shadow: 0 10px 30px rgba(0, 0, 0, 0.2);
padding: 50px;
box-sizing: border-box;
display: flex;
flex-direction: column;
align-items: center;
}
.party-popper {
position: absolute;
font-size: 40px;
}
.top-left {
top: 40px;
left: 40px;
}
.top-right {
top: 40px;
right: 40px;
}
.main-title {
font-size: 48px;
font-weight: 900;
color: #333;
text-align: center;
margin-top: 60px;
line-height: 1.2;
text-transform: uppercase;
}
.subtitle {
font-size: 24px;
font-weight: 400;
color: #333;
text-align: center;
margin-top: 30px;
text-transform: uppercase;
letter-spacing: 2px;
}
.contributor-name {
font-size: 56px;
font-weight: 700;
color: #333;
text-align: center;
margin: 15px 0 50px;
}
.mascot-image {
width: 450px;
height: 450px;
background-image: url('first_pr.png');
background-size: contain;
background-repeat: no-repeat;
background-position: center;
margin-top: 20px;
-webkit-print-color-adjust: exact;
print-color-adjust: exact;
}
.description {
font-size: 22px;
color: #333;
line-height: 1.6;
text-align: center;
margin-top: 40px;
}
.description strong {
font-weight: 700;
}
.footer {
width: 100%;
margin-top: auto;
padding-top: 30px;
border-top: 1px solid #ccc;
display: flex;
justify-content: space-between;
align-items: flex-end;
font-size: 16px;
color: #333;
}
.pr-info {
text-align: left;
}
.signature {
text-align: right;
font-style: italic;
}
@media print {
@page {
size: A4 portrait;
margin: 0;
}
body, html {
width: 100%;
height: 100%;
margin: 0;
padding: 0;
background: #fdfaf0;
}
.certificate {
width: 100%;
height: 100vh;
box-shadow: none;
transform: scale(1);
border-radius: 0;
page-break-inside: avoid;
}
}
</style>
</head>
<body>
<div class="certificate">
<div class="party-popper top-left">🎉</div>
<div class="party-popper top-right">🎉</div>
<div class="main-title">Certificate of<br>Contribution</div>
<div class="subtitle">Awarded To</div>
<div class="contributor-name">[CONTRIBUTOR_NAME]</div>
<div class="mascot-image"></div>
<div class="description">
For successfully submitting and merging their <strong>First Pull Request</strong> to the <strong>Podman project</strong>.<br>
Your contribution helps make open source better—one PR at a time!
</div>
<div class="footer">
<div class="pr-info">
<div>🔧 Merged PR: [PR_LINK]</div>
<div style="margin-top: 5px;">[DATE]</div>
</div>
<div class="signature">
Keep hacking, keep contributing!<br>
The Podman Community
</div>
</div>
</div>
</body>
</html>

Binary file not shown (new file, 578 KiB)

Binary file not shown (new file, 138 KiB)

Binary file not shown (new file, 138 KiB)

View File

@ -6,7 +6,7 @@ RUN microdnf update -y && \
perl-Test perl-Test-Simple perl-Test-Differences \
perl-YAML-LibYAML perl-FindBin \
python3 python3-virtualenv python3-pip gcc python3-devel \
python3-flake8 python3-pep8-naming python3-flake8-docstrings python3-flake8-import-order python3-flake8-polyfill python3-mccabe python3-pep8-naming && \
python3-flake8 python3-pep8-naming python3-flake8-import-order python3-flake8-polyfill python3-mccabe python3-pep8-naming && \
microdnf clean all && \
rm -rf /var/cache/dnf
# Required by perl

View File

@ -16,7 +16,7 @@ if [[ -z "$AUTOMATION_LIB_PATH" ]]; then
(
echo "ERROR: Expecting \$AUTOMATION_LIB_PATH to be defined with the"
echo " installation directory of automation tooling."
) > /dev/stderr
) >> /dev/stderr
exit 1
fi

View File

@ -16,4 +16,4 @@ PyYAML~=6.0
aiohttp[speedups]~=3.8
gql[requests]~=3.3
requests>=2,<3
urllib3<2.0.0
urllib3<2.5.1

View File

@ -160,6 +160,8 @@ class TestMain(unittest.TestCase):
fake_stdout = StringIO()
fake_stderr = StringIO()
with redirect_stderr(fake_stderr), redirect_stdout(fake_stdout):
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)
results = ccia.main(self.bid)
self.assertEqual(fake_stderr.getvalue(), '')
for line in fake_stdout.getvalue().splitlines():

View File

@ -64,7 +64,7 @@ curl_post() {
die "Expecting non-empty data argument"
[[ -n "$token" ]] || \
dbg "### Warning: \$GITHUB_TOKEN is empty, performing unauthenticated query" > /dev/stderr
dbg "### Warning: \$GITHUB_TOKEN is empty, performing unauthenticated query" >> /dev/stderr
# Don't expose secrets on any command-line
local headers_tmpf
local headers_tmpf=$(tmpfile headers)
@ -81,7 +81,7 @@ EOF
local curl_cmd="$CURL --silent --request POST --url $url --header @$headers_tmpf --data @$data_tmpf"
dbg "### Executing '$curl_cmd'"
local ret="0"
$curl_cmd > /dev/stdout || ret=$?
$curl_cmd >> /dev/stdout || ret=$?
# Don't leave secrets lying around in files
rm -f "$headers_tmpf" "$data_tmpf" &> /dev/null

View File

@ -197,11 +197,12 @@ sub write_img {
# Annotate: add signature line at lower left
# FIXME: include git repo info?
if (grep { -x "$_/convert" } split(":", $ENV{PATH})) {
if (grep { -x "$_/magick" } split(":", $ENV{PATH})) {
unlink $img_out_tmp;
my $signature = strftime("Generated %Y-%m-%dT%H:%M:%S%z by $ME v$VERSION", localtime);
my @cmd = (
"convert",
"magick",
$img_out,
'-family' => 'Courier',
'-pointsize' => '12',
# '-style' => 'Normal', # Argh! This gives us Bold!?
@ -209,7 +210,7 @@ sub write_img {
'-fill' => '#000',
'-gravity' => 'SouthWest',
"-annotate", "+5+5", $signature,
"$img_out" => "$img_out_tmp"
$img_out_tmp
);
if (system(@cmd) == 0) {
rename $img_out_tmp => $img_out;
@ -424,18 +425,44 @@ sub _size {
}
##############
# _by_size # sort helper, for putting big nodes at bottom
# _by_type # sort helper, for clustering int/sys/machine tests
##############
sub _by_size {
_size($a) <=> _size($b) ||
$a->{name} cmp $b->{name};
sub _by_type {
my $ax = $a->{name};
my $bx = $b->{name};
# The big test types, in the order we want to show them
my @types = qw(integration system bud machine);
my %type_order = map { $types[$_] => $_ } (0..$#types);
my $type_re = join('|', @types);
if ($ax =~ /($type_re)/) {
my $a_type = $1;
if ($bx =~ /($type_re)/) {
my $b_type = $1;
return $type_order{$a_type} <=> $type_order{$b_type}
|| $ax cmp $bx;
}
else {
# e.g., $b is "win installer", $a is in @types, $b < $a
return 1;
}
}
elsif ($bx =~ /($type_re)/) {
# e.g., $a is "win installer", $b is in @types, $a < $b
return -1;
}
# Neither a nor b is in @types
$ax cmp $bx;
}
sub depended_on_by {
my $self = shift;
if (my $d = $self->{_depended_on_by}) {
my @d = sort _by_size map { $self->{_tasklist}->find($_) } @$d;
my @d = sort _by_type map { $self->{_tasklist}->find($_) } @$d;
return @d;
}
return;
@ -755,12 +782,16 @@ sub _draw_boxes {
if (my $only_if = $task->{yml}{only_if}) {
$shape = 'record';
$label .= '|' if $label;
if ($only_if =~ /CI:DOCS.*CI:BUILD/) {
$label .= "[SKIP: CI:BUILD]\\l[SKIP: CI:DOCS]\\l";
}
elsif ($only_if =~ /CI:DOCS/) {
$label .= "[SKIP: CI:DOCS]\\l";
# Collapse whitespace, and remove leading/trailing
$only_if =~ s/[\s\n]+/ /g;
$only_if =~ s/^\s+|\s+$//g;
# 2024-06-18 Paul CI skips
if ($only_if =~ m{\$CIRRUS_PR\s+==\s+''\s+.*\$CIRRUS_CHANGE_TITLE.*CI:ALL.*changesInclude.*test}) {
$label .= "[SKIP if not needed]";
}
# 2020-10 used in automation_images repo
elsif ($only_if eq q{$CIRRUS_PR != ''}) {
$label .= "[only if PR]";
@ -803,7 +834,28 @@ sub _draw_boxes {
elsif ($only_if =~ /CIRRUS_BRANCH\s+==\s+'main'\s+&&\s+\$CIRRUS_CRON\s+==\s+''/) {
$label .= "[only on merge]";
}
elsif ($only_if =~ /CIRRUS_BRANCH\s+!=~\s+'v.*-rhel'\s+&&\s+\$CIRRUS_BASE_BRANCH\s+!=~\s+'v.*-rhel'/) {
$label .= "[only if no RHEL release]";
}
elsif ($only_if =~ /CIRRUS_CHANGE_TITLE.*CI:BUILD.*CIRRUS_CHANGE_TITLE.*CI:MACHINE/s) {
$label .= "[SKIP: CI:BUILD or CI:MACHINE]";
}
elsif ($only_if =~ /CIRRUS_CHANGE_TITLE\s+!=.*CI:MACHINE.*CIRRUS_BRANCH.*main.*CIRRUS_BASE_BRANCH.*main.*\)/s) {
$label .= "[only if: main]";
}
# automation_images
elsif ($only_if eq q{$CIRRUS_CRON == '' && $CIRRUS_BRANCH == $CIRRUS_DEFAULT_BRANCH}) {
$label .= "[only if DEFAULT_BRANCH and not cron]";
}
elsif ($only_if eq q{$CIRRUS_PR != '' && $CIRRUS_PR_LABELS !=~ ".*no_build-push.*"}) {
$label .= "[only if PR, but not no_build-push]";
}
elsif ($only_if eq q{$CIRRUS_CRON == 'lifecycle'}) {
$label .= "[only on cron=lifecycle]";
}
else {
warn "$ME: unexpected only_if: $only_if\n";
$label .= "[only if: $only_if]";
}
}
@ -818,10 +870,27 @@ sub _draw_boxes {
if (my $skip = $task->{yml}{skip}) {
$shape = 'record';
$label .= '|' if $label && $label !~ /SKIP/;
# Collapse whitespace, and remove leading/trailing
$skip =~ s/[\s\n]+/ /g;
$skip =~ s/^\s+|\s+$//g;
my @reasons;
push @reasons, 'BRANCH','TAG' if $skip =~ /CIRRUS_PR.*CIRRUS_TAG/;
push @reasons, 'TAG' if $skip eq q{$CIRRUS_TAG != ''};
push @reasons, 'CI:DOCS' if $skip =~ /CI:DOCS/;
# automation_images
if ($skip eq q{$CIRRUS_CHANGE_TITLE =~ '.*CI:DOCS.*' || $CIRRUS_CHANGE_TITLE =~ '.*CI:TOOLING.*'}) {
push @reasons, "CI:DOCS or CI:TOOLING";
}
elsif ($skip eq q{$CIRRUS_CHANGE_TITLE =~ '.*CI:DOCS.*'}) {
push @reasons, "CI:DOCS";
}
elsif ($skip eq '$CI == $CI') {
push @reasons, "DISABLED MANUALLY";
}
elsif ($skip) {
warn "$ME: unexpected skip '$skip'\n";
}
if (@reasons) {
$label .= join('', map { "[SKIP: $_]\\l" } @reasons);
}

View File

@ -90,14 +90,14 @@ end_task:
- "middle_2"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"real_name_of_initial" [shape=ellipse style=bold color=a fontcolor=a]
"real_name_of_initial" -> "end" [color=a]
"end" [shape=ellipse style=bold color=z fontcolor=z]
"real_name_of_initial" -> "middle_1" [color=a]
"middle_1" [shape=ellipse style=bold color=b fontcolor=b]
"middle_1" -> "end" [color=b]
"end" [shape=ellipse style=bold color=z fontcolor=z]
"real_name_of_initial" -> "middle_2" [color=a]
"middle_2" [shape=ellipse style=bold color=c fontcolor=c]
"middle_2" -> "end" [color=c]
"real_name_of_initial" -> "end" [color=a]
<<<<<<<<<<<<<<<<<< env interpolation 1
env:
@ -510,10 +510,12 @@ success_task:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
"automation" [shape=ellipse style=bold color=a fontcolor=a]
"automation" -> "success" [color=a]
"success" [shape=ellipse style=bold color="#000000" fillcolor="#00f000" style=filled fontcolor="#000000"]
"automation" -> "build" [color=a]
"build" [shape=record style=bold color="#0000f0" fillcolor="#f0f0f0" style=filled fontcolor="#0000f0" label="build\l|- Build for fedora-32\l- Build for fedora-31\l- Build for ubuntu-20\l- Build for ubuntu-19\l"]
"build" -> "alt_build" [color="#0000f0"]
"alt_build" [shape=record style=bold color="#0000f0" fillcolor="#f0f0f0" style=filled fontcolor="#0000f0" label="alt build\l|- Build Each Commit\l- Windows Cross\l- Build Without CGO\l- Build varlink API\l- Static build\l- Test build RPM\l"]
"alt_build" -> "success" [color="#0000f0"]
"success" [shape=ellipse style=bold color="#000000" fillcolor="#00f000" style=filled fontcolor="#000000"]
"build" -> "bindings" [color="#0000f0"]
"bindings" [shape=ellipse style=bold color=b fontcolor=b]
"bindings" -> "success" [color=b]
@ -526,25 +528,23 @@ success_task:
"build" -> "osx_cross" [color="#0000f0"]
"osx_cross" [shape=ellipse style=bold color=e fontcolor=e]
"osx_cross" -> "success" [color=e]
"build" -> "success" [color="#0000f0"]
"build" -> "swagger" [color="#0000f0"]
"swagger" [shape=ellipse style=bold color=f fontcolor=f]
"swagger" -> "success" [color=f]
"build" -> "unit_test" [color="#0000f0"]
"unit_test" [shape=record style=bold color="#000000" fillcolor="#f09090" style=filled fontcolor="#000000" label="unit test\l|- Unit tests on fedora-32\l- Unit tests on fedora-31\l- Unit tests on ubuntu-20\l- Unit tests on ubuntu-19\l"]
"unit_test" -> "success" [color="#f09090"]
"build" -> "validate" [color="#0000f0"]
"validate" [shape=record style=bold color="#00c000" fillcolor="#f0f0f0" style=filled fontcolor="#00c000" label="validate\l|= Validate fedora-32 Build\l"]
"validate" -> "success" [color="#00c000"]
"build" -> "vendor" [color="#0000f0"]
"vendor" [shape=ellipse style=bold color=g fontcolor=g]
"vendor" -> "success" [color=g]
"build" -> "unit_test" [color="#0000f0"]
"unit_test" [shape=record style=bold color="#000000" fillcolor="#f09090" style=filled fontcolor="#000000" label="unit test\l|- Unit tests on fedora-32\l- Unit tests on fedora-31\l- Unit tests on ubuntu-20\l- Unit tests on ubuntu-19\l"]
"unit_test" -> "success" [color="#f09090"]
"build" -> "alt_build" [color="#0000f0"]
"alt_build" [shape=record style=bold color="#0000f0" fillcolor="#f0f0f0" style=filled fontcolor="#0000f0" label="alt build\l|- Build Each Commit\l- Windows Cross\l- Build Without CGO\l- Build varlink API\l- Static build\l- Test build RPM\l"]
"alt_build" -> "success" [color="#0000f0"]
"build" -> "success" [color="#0000f0"]
"automation" -> "success" [color=a]
"ext_svc_check" [shape=ellipse style=bold color=h fontcolor=h]
"ext_svc_check" -> "success" [color=h]
"ext_svc_check" -> "build" [color=h]
"ext_svc_check" -> "success" [color=h]
"smoke" [shape=ellipse style=bold color=i fontcolor=i]
"smoke" -> "success" [color=i]
"smoke" -> "build" [color=i]
"smoke" -> "success" [color=i]

View File

@ -10,7 +10,7 @@ set -eo pipefail
SCRIPT_BASEDIR="$(basename $0)"
badusage() {
echo "Incorrect usage: $SCRIPT_BASEDIR) <command> [options]" > /dev/stderr
echo "Incorrect usage: $SCRIPT_BASEDIR) <command> [options]" >> /dev/stderr
echo "ERROR: $1"
exit 121
}

View File

@ -28,7 +28,7 @@ automation_version() {
if [[ -n "$_avcache" ]]; then
echo "$_avcache"
else
echo "Error determining version number" > /dev/stderr
echo "Error determining version number" >> /dev/stderr
exit 1
fi
}

View File

@ -3,6 +3,7 @@
# A Library of contextual console output-related operations.
# Intended for use by other scripts, not to be executed directly.
# shellcheck source=common/lib/defaults.sh
source $(dirname $(realpath "${BASH_SOURCE[0]}"))/defaults.sh
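# Typical usage from a consuming script (a sketch; the library filename used in
# this example is an assumption, not stated in this file):
#   source "$AUTOMATION_LIB_PATH/console_output.sh"
#   msg "Normal progress output (written to stderr)"
#   warn "Something looks wrong, continuing anyway"
#   A_DEBUG=1        # enables dbg() output
#   dbg "Extra detail, only shown while debugging"
#   die "Fatal problem" 2   # prints context and exits with status 2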
# helper, not intended for use outside this file
@ -10,10 +11,11 @@ _rel_path() {
if [[ -z "$1" ]]; then
echo "<stdin>"
else
local abs_path=$(realpath "$1")
local rel_path=$(realpath --relative-to=. $abs_path)
local abs_path_len=${#abs_path}
local rel_path_len=${#rel_path}
local abs_path rel_path abs_path_len rel_path_len
abs_path=$(realpath "$1")
rel_path=$(realpath --relative-to=. $abs_path)
abs_path_len=${#abs_path}
rel_path_len=${#rel_path}
if ((abs_path_len <= rel_path_len)); then
echo "$abs_path"
else
@ -24,9 +26,10 @@ _rel_path() {
# helper, not intended for use outside this file
_ctx() {
local shortest_source_path grandparent_func
# Caller's caller details
local shortest_source_path=$(_rel_path "${BASH_SOURCE[3]}")
local grandparent_func="${FUNCNAME[2]}"
shortest_source_path=$(_rel_path "${BASH_SOURCE[3]}")
grandparent_func="${FUNCNAME[2]}"
[[ -n "$grandparent_func" ]] || \
grandparent_func="main"
echo "$shortest_source_path:${BASH_LINENO[2]} in ${FUNCNAME[3]}()"
@ -34,9 +37,10 @@ _ctx() {
# helper, not intended for use outside this file.
_fmt_ctx() {
local stars="************************************************"
local prefix="${1:-no prefix given}"
local message="${2:-no message given}"
local stars prefix message
stars="************************************************"
prefix="${1:-no prefix given}"
message="${2:-no message given}"
echo "$stars"
echo "$prefix ($(_ctx))"
echo "$stars"
@ -44,37 +48,40 @@ _fmt_ctx() {
# Print a highly-visible message to stderr. Usage: warn <msg>
warn() {
_fmt_ctx "$WARNING_MSG_PREFIX ${1:-no warning message given}" > /dev/stderr
_fmt_ctx "$WARNING_MSG_PREFIX ${1:-no warning message given}" >> /dev/stderr
}
# Same as warn() but exit non-zero or with given exit code
# usage: die <msg> [exit-code]
die() {
_fmt_ctx "$ERROR_MSG_PREFIX ${1:-no error message given}" > /dev/stderr
_fmt_ctx "$ERROR_MSG_PREFIX ${1:-no error message given}" >> /dev/stderr
local exit_code=${2:-1}
((exit_code==0)) || \
exit $exit_code
}
dbg() {
local shortest_source_path
if ((A_DEBUG)); then
local shortest_source_path=$(_rel_path "${BASH_SOURCE[1]}")
shortest_source_path=$(_rel_path "${BASH_SOURCE[1]}")
(
echo
echo "$DEBUG_MSG_PREFIX ${1:-No debugging message given} ($shortest_source_path:${BASH_LINENO[0]} in ${FUNCNAME[1]}())"
) > /dev/stderr
) >> /dev/stderr
fi
}
msg() {
echo "${1:-No message specified}" &> /dev/stderr
echo "${1:-No message specified}" &>> /dev/stderr
}
# Mimic set +x for a single command, along with calling location and line.
showrun() {
local -a context
# Tried using readarray, it broke tests for some reason, too lazy to investigate.
# shellcheck disable=SC2207
context=($(caller 0))
echo "+ $@ # ${context[2]}:${context[0]} in ${context[1]}()" > /dev/stderr
echo "+ $* # ${context[2]}:${context[0]} in ${context[1]}()" >> /dev/stderr
"$@"
}
@ -109,7 +116,7 @@ show_env_vars() {
warn "The \$SECRET_ENV_RE var. unset/empty: Not filtering sensitive names!"
fi
for env_var_name in $(awk 'BEGIN{for(v in ENVIRON) print v}' | grep -Eiv "$filter_rx" | sort -u); do
for env_var_name in $(awk 'BEGIN{for(v in ENVIRON) print v}' | grep -Eiv "$filter_rx" | sort); do
line="${env_var_name}=${!env_var_name}"
msg " $line"

View File

@ -8,6 +8,8 @@ OS_REL_VER="${OS_REL_VER:-$OS_RELEASE_ID-$OS_RELEASE_VER}"
# Ensure no user-input prompts in an automation context
export DEBIAN_FRONTEND="${DEBIAN_FRONTEND:-noninteractive}"
# _TEST_UID only needed for unit-testing
# shellcheck disable=SC2154
if ((UID)) || ((_TEST_UID)); then
SUDO="${SUDO:-sudo}"
if [[ "$OS_RELEASE_ID" =~ (ubuntu)|(debian) ]]; then
@ -43,13 +45,51 @@ passthrough_envars() {
for envar in SECRET_ENV_RE PASSTHROUGH_ENV_EXACT PASSTHROUGH_ENV_ATSTART PASSTHROUGH_ENV_ANYWHERE passthrough_env_re; do
if [[ -z "${!envar}" ]]; then
echo "Error: Required env. var. \$$envar is unset or empty in call to passthrough_envars()" > /dev/stderr
echo "Error: Required env. var. \$$envar is unset or empty in call to passthrough_envars()" >> /dev/stderr
exit 1
fi
done
echo "Warning: Will pass env. vars. matching the following regex:
$passthrough_env_re" > /dev/stderr
$passthrough_env_re" >> /dev/stderr
compgen -A variable | grep -Ev "$SECRET_ENV_RE" | grep -E "$passthrough_env_re"
}
# On more occasions than we'd like, it's necessary to put temporary
# platform-specific workarounds in place. To help ensure they'll
# actually be temporary, it's useful to place a time limit on them.
# This function accepts two arguments:
# - A (required) future date of the form YYYYMMDD (UTC based).
# - An (optional) message string to display upon expiry of the timebomb.
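# Example call (the date and reason below are hypothetical):
#   timebomb 20251231 "Drop the libfoo workaround once the upstream fix is released"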
timebomb() {
local expire="$1"
if ! expr "$expire" : '[0-9]\{8\}$' > /dev/null; then
echo "timebomb: '$expire' must be UTC-based and of the form YYYYMMDD"
exit 1
fi
if [[ $(date -u +%Y%m%d) -lt $(date -u -d "$expire" +%Y%m%d) ]]; then
return
fi
declare -a frame
read -a frame < <(caller)
cat << EOF >> /dev/stderr
***********************************************************
* TIME BOMB EXPIRED!
*
* >> ${frame[1]}:${frame[0]}: ${2:-No reason given, tsk tsk}
*
* Temporary workaround expired on ${expire:0:4}-${expire:4:2}-${expire:6:2}.
*
* Please review the above source file and either remove the
* workaround or, if absolutely necessary, extend it.
*
* Please also check for other timebombs while you're at it.
***********************************************************
EOF
exit 1
}

View File

@ -6,6 +6,6 @@ set -e
cd $(dirname $0)
for testscript in test???-*.sh; do
echo -e "\nExecuting $testscript..." > /dev/stderr
echo -e "\nExecuting $testscript..." >> /dev/stderr
./$testscript
done

View File

@ -3,9 +3,14 @@
# Unit-tests for library script in the current directory
# Also verifies test script is derived from library filename
# shellcheck source-path=./
source $(dirname ${BASH_SOURCE[0]})/testlib.sh || exit 1
# Must be statically defined, 'source-path' directive can't work here.
# shellcheck source=../lib/platform.sh disable=SC2154
source "$TEST_DIR/$SUBJ_FILENAME" || exit 2
# For whatever reason, SCRIPT_PATH cannot be resolved.
# shellcheck disable=SC2154
test_cmd "Library $SUBJ_FILENAME is not executable" \
0 "" \
test ! -x "$SCRIPT_PATH/$SUBJ_FILENAME"
@ -26,8 +31,12 @@ done
for OS_RELEASE_ID in 'debian' 'ubuntu'; do
(
export _TEST_UID=$RANDOM # Normally $UID is read-only
# Must be statically defined, 'source-path' directive can't work here.
# shellcheck source=../lib/platform.sh disable=SC2154
source "$TEST_DIR/$SUBJ_FILENAME" || exit 2
# The point of this test is to confirm it's defined
# shellcheck disable=SC2154
test_cmd "The '\$SUDO' env. var. is non-empty when \$_TEST_UID is non-zero" \
0 "" \
test -n "$SUDO"
@ -56,12 +65,10 @@ test_cmd "The passthrough_envars() func. has output by default." \
# Test from a mostly empty environment to limit possibility of expr mismatch flakes
declare -a printed_envs
printed_envs=(\
$(env --ignore-environment PATH="$PATH" FOOBARBAZ="testing" \
SECRET_ENV_RE="(^PATH$)|(^BASH_FUNC)|(^_.*)|(FOOBARBAZ)|(SECRET_ENV_RE)" \
CI="true" AUTOMATION_LIB_PATH="$AUTOMATION_LIB_PATH" \
bash -c "source $TEST_DIR/$SUBJ_FILENAME && passthrough_envars")
)
readarray -t printed_envs <<<$(env --ignore-environment PATH="$PATH" FOOBARBAZ="testing" \
SECRET_ENV_RE="(^PATH$)|(^BASH_FUNC)|(^_.*)|(FOOBARBAZ)|(SECRET_ENV_RE)" \
CI="true" AUTOMATION_LIB_PATH="/path/to/some/place" \
bash -c "source $TEST_DIR/$SUBJ_FILENAME && passthrough_envars")
test_cmd "The passthrough_envars() func. w/ overriden \$SECRET_ENV_RE hides test variable." \
1 "0" \
@ -71,5 +78,23 @@ test_cmd "The passthrough_envars() func. w/ overriden \$SECRET_ENV_RE returns CI
0 "[1-9]+[0-9]*" \
expr match "${printed_envs[*]}" '.*CI.*'
test_cmd "timebomb() function requires at least one argument" \
1 "must be UTC-based and of the form YYYYMMDD" \
timebomb
TZ=UTC12 \
test_cmd "timebomb() function ignores TZ and compares < UTC-forced current date" \
1 "TIME BOMB EXPIRED" \
timebomb $(TZ=UTC date +%Y%m%d)
test_cmd "timebomb() alerts user when no description given" \
1 "No reason given" \
timebomb 00010101
EXPECTED_REASON="test${RANDOM}test"
test_cmd "timebomb() gives reason when one was provided" \
1 "$EXPECTED_REASON" \
timebomb 00010101 "$EXPECTED_REASON"
# Must be last call
exit_with_status

View File

@ -2,7 +2,7 @@
# This file is intended for sourcing by the cirrus-ci_retrospective workflow
# It should not be used under any other context.
source $(dirname $BASH_SOURCE[0])/github_common.sh || exit 1
source $(dirname ${BASH_SOURCE[0]})/github_common.sh || exit 1
# Cirrus-CI Build status codes that represent completion
COMPLETE_STATUS_RE='FAILED|COMPLETED|ABORTED|ERRORED'
@ -63,7 +63,7 @@ load_ccir() {
was_pr='true'
# Don't race vs another cirrus-ci build triggered _after_ GH action workflow started
# since both may share the same check_suite. e.g. task re-run or manual-trigger
if echo "$bst" | egrep -q "$COMPLETE_STATUS_RE"; then
if echo "$bst" | grep -E -q "$COMPLETE_STATUS_RE"; then
if [[ -n "$tst" ]] && [[ "$tst" == "PAUSED" ]]; then
dbg "Detected action status $tst"
do_intg='true'

mac_pw_pool/.gitignore vendored Normal file
View File

@ -0,0 +1,5 @@
/Cron.log
/utilization.csv
/dh_status.txt*
/pw_status.txt*
/html/utilization.png*

mac_pw_pool/AllocateTestDH.sh Executable file
View File

@ -0,0 +1,200 @@
#!/bin/bash
# This script is intended for use by humans to allocate a dedicated-host
# and create an instance on it for testing purposes. When executed,
# it will create a temporary clone of the repository with the necessary
# modifications to manipulate the test host. It's the user's responsibility
# to clean up this directory after manually removing the instance (see below).
#
# **Note**: Due to Apple/Amazon restrictions on the removal of these
# resources, cleanup must be done manually. You will need to shut down and
# terminate the instance, then wait 24 hours before releasing the
# dedicated-host. The hosts cost money whether or not an instance is running.
#
# The script assumes:
#
# * The current $USER value reflects your actual identity such that
# the test instance may be labeled appropriately for auditing.
# * The `aws` CLI tool is installed on $PATH.
# * Appropriate `~/.aws/credentials` credentials are setup.
# * The us-east-1 region is selected in `~/.aws/config`.
# * The $POOLTOKEN env. var. is set to value available from
# https://cirrus-ci.com/pool/1cf8c7f7d7db0b56aecd89759721d2e710778c523a8c91c7c3aaee5b15b48d05
# * The local ssh-agent is able to supply the appropriate private key (stored in BW).
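#
# Example session (the token value is a placeholder, not a real secret):
#   export POOLTOKEN=<value from the Cirrus-CI pool page above>
#   ./AllocateTestDH.sh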
set -eo pipefail
# shellcheck source-path=SCRIPTDIR
source $(dirname ${BASH_SOURCE[0]})/pw_lib.sh
# Support debugging all mac_pw_pool scripts or only this one
I_DEBUG="${I_DEBUG:-0}"
if ((I_DEBUG)); then
X_DEBUG=1
warn "Debugging enabled."
fi
dbg "\$USER=$USER"
[[ -n "$USER" ]] || \
die "The variable \$USER must not be empty"
[[ -n "$POOLTOKEN" ]] || \
die "The variable \$POOLTOKEN must not be empty"
INST_NAME="${USER}Testing"
LIB_DIRNAME=$(realpath --relative-to=$REPO_DIRPATH $LIB_DIRPATH)
# /tmp is usually a tmpfs, don't let an accidental reboot ruin
# access to a test DH/instance for a developer.
TMP_CLONE_DIRPATH="/var/tmp/${LIB_DIRNAME}_${INST_NAME}"
dbg "\$TMP_CLONE_DIRPATH=$TMP_CLONE_DIRPATH"
if [[ -d "$TMP_CLONE_DIRPATH" ]]; then
die "Found existing '$TMP_CLONE_DIRPATH', assuming in-use/relevant; If not, manual cleanup is required."
fi
msg "Creating temporary clone dir and transfering any uncommited files."
git clone --no-local --no-hardlinks --depth 1 --single-branch --no-tags --quiet "file://$REPO_DIRPATH" "$TMP_CLONE_DIRPATH"
declare -a uncommited_filepaths
readarray -t uncommited_filepaths <<<$(
pushd "$REPO_DIRPATH" &> /dev/null
# Obtaining uncommited relative staged filepaths
git diff --name-only HEAD
# Obtaining uncommited relative unstaged filepaths
git ls-files . --exclude-standard --others
popd &> /dev/null
)
dbg "Copying \$uncommited_filepaths[*]=${uncommited_filepaths[*]}"
for uncommited_file in "${uncommited_filepaths[@]}"; do
uncommited_file_src="$REPO_DIRPATH/$uncommited_file"
uncommited_file_dest="$TMP_CLONE_DIRPATH/$uncommited_file"
uncommited_file_dest_parent=$(dirname "$uncommited_file_dest")
#dbg "Working on uncommited file '$uncommited_file_src'"
if [[ -r "$uncommited_file_src" ]]; then
mkdir -p "$uncommited_file_dest_parent"
#dbg "$uncommited_file_src -> $uncommited_file_dest"
cp -a "$uncommited_file_src" "$uncommited_file_dest"
fi
done
declare -a modargs
# Format: <pw_lib.sh var name> <new value> <old value>
modargs=(
# Necessary to prevent in-production macs from trying to use testing instance
"DH_REQ_VAL $INST_NAME $DH_REQ_VAL"
# Necessary to make test dedicated host stand out when auditing the set in the console
"DH_PFX $INST_NAME $DH_PFX"
# The default launch template name includes $DH_PFX, ensure the production template name is used.
# N/B: The old/unmodified pw_lib.sh is still loaded for the running script
"TEMPLATE_NAME $TEMPLATE_NAME Cirrus${DH_PFX}PWinstance"
# Permit developer to use instance for up to 3 days max (orphan vm cleaning process will nail it after that).
"PW_MAX_HOURS 72 $PW_MAX_HOURS"
# Permit developer to execute as many Cirrus-CI tasks as they want w/o automatic shutdown.
"PW_MAX_TASKS 9999 $PW_MAX_TASKS"
)
for modarg in "${modargs[@]}"; do
set -- $modarg # Convert the "tuple" into the param args $1 $2...
dbg "Modifying pw_lib.sh \$$1 definition to '$2' (was '$3')"
sed -i -r -e "s/^$1=.*/$1=\"$2\"/" "$TMP_CLONE_DIRPATH/$LIB_DIRNAME/pw_lib.sh"
# Ensure future script invocations use the new values
unset $1
done
cd "$TMP_CLONE_DIRPATH/$LIB_DIRNAME"
source ./pw_lib.sh
# Before going any further, make sure there isn't an existing
# dedicated-host named ${INST_NAME}-0. If there is, it can
# be re-used instead of failing the script outright.
existing_dh_json=$(mktemp -p "." dh_allocate_XXXXX.json)
$AWS ec2 describe-hosts --filter "Name=tag:Name,Values=${INST_NAME}-0" --query 'Hosts[].HostId' > "$existing_dh_json"
if grep -Fqx '[]' "$existing_dh_json"; then
msg "Creating the dedicated host '${INST_NAME}-0'"
declare dh_allocate_json
dh_allocate_json=$(mktemp -p "." dh_allocate_XXXXX.json)
declare -a awsargs
# Word-splitting of $AWS is desirable
# shellcheck disable=SC2206
awsargs=(
$AWS
ec2 allocate-hosts
--availability-zone us-east-1a
--instance-type mac2.metal
--auto-placement off
--host-recovery off
--host-maintenance off
--quantity 1
--tag-specifications
"ResourceType=dedicated-host,Tags=[{Key=Name,Value=${INST_NAME}-0},{Key=$DH_REQ_TAG,Value=$DH_REQ_VAL},{Key=PWPoolReady,Value=true},{Key=automation,Value=false}]"
)
# N/B: Apple/Amazon require min allocation time of 24hours!
dbg "Executing: ${awsargs[*]}"
"${awsargs[@]}" > "$dh_allocate_json" || \
die "Provisioning new dedicated host $INST_NAME failed. Manual debugging & cleanup required."
dbg $(jq . "$dh_allocate_json")
dhid=$(jq -r -e '.HostIds[0]' "$dh_allocate_json")
[[ -n "$dhid" ]] || \
die "Obtaining DH ID of new host. Manual debugging & cleanup required."
# There's a small delay between allocating the dedicated host and LaunchInstances.sh
# being able to interact with it. There's no sensible way to monitor for this state :(
sleep 3s
else # A dedicated host already exists
dhid=$(jq -r -e '.[0]' "$existing_dh_json")
fi
# Normally allocation is fairly instant, but not always. Confirm we're able to actually
# launch a mac instance onto the dedicated host.
for ((attempt=1 ; attempt < 11 ; attempt++)); do
msg "Attempt #$attempt launching a new instance on dedicated host"
./LaunchInstances.sh --force
if grep -E "^${INST_NAME}-0 i-" dh_status.txt; then
attempt=-1 # signal success
break
fi
sleep 1s
done
[[ "$attempt" -eq -1 ]] || \
die "Failed to use LaunchInstances.sh. Manual debugging & cleanup required."
# At this point the script could call SetupInstances.sh in another loop
# but it takes about 20-minutes to complete. Also, the developer may
# not need it, they may simply want to ssh into the instance to poke
# around. i.e. they don't need to run any Cirrus-CI jobs on the test
# instance.
warn "---"
warn "NOT copying/running setup.sh to new instance (in case manual activities are desired)."
warn "---"
w="PLEASE REMEMBER TO terminate instance, wait two hours, then
remove the dedicated-host in the web console, or run
'aws ec2 release-hosts --host-ids=$dhid'."
msg "---"
msg "Dropping you into a shell inside a temp. repo clone:
($TMP_CLONE_DIRPATH/$LIB_DIRNAME)"
msg "---"
msg "Once it finishes booting (5m), you may use './InstanceSSH.sh ${INST_NAME}-0'
to access it. Otherwise to fully setup the instance for Cirrus-CI, you need
to execute './SetupInstances.sh' repeatedly until the ${INST_NAME}-0 line in
'pw_status.txt' includes the text 'complete alive'. That process can take 20+
minutes. Once alive, you may then use Cirrus-CI to test against this specific
instance with any 'persistent_worker' task having a label of
'$DH_REQ_TAG=$DH_REQ_VAL' set."
msg "---"
warn "$w"
export POOLTOKEN # ensure availability in sub-shell
bash -l
warn "$w"

mac_pw_pool/Cron.sh Executable file
View File

@ -0,0 +1,70 @@
#!/bin/bash
# Intended to be run from $HOME/devel/automation/mac_pw_pool/
# using a crontab like:
# # Every date/timestamp in PW Pool management is UTC-relative
# # make cron do the same for consistency.
# CRON_TZ=UTC
#
# PATH=/home/shared/.local/bin:/home/shared/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
#
# # Keep log from filling up disk & make sure webserver is running
# # (5am UTC is during a CI-activity lull)
# 59 4 * * * $HOME/devel/automation/mac_pw_pool/nightly_maintenance.sh &>> $CRONLOG
#
# # PW Pool management (usage drop-off from 03:00-15:00 UTC)
# POOLTOKEN=<from https://cirrus-ci.com/pool/1cf8c7f7d7db0b56aecd89759721d2e710778c523a8c91c7c3aaee5b15b48d05>
# CRONLOG=/home/shared/devel/automation/mac_pw_pool/Cron.log
# */5 * * * * /home/shared/devel/automation/mac_pw_pool/Cron.sh &>> $CRONLOG
# shellcheck disable=SC2154
[ "${FLOCKER}" != "$0" ] && exec env FLOCKER="$0" flock -e -w 300 "$0" "$0" "$@" || :
# shellcheck source=./pw_lib.sh
source $(dirname "${BASH_SOURCE[0]}")/pw_lib.sh
cd $SCRIPT_DIRPATH || die "Cannot enter '$SCRIPT_DIRPATH'"
# SSH agent required to provide key for accessing workers
# Started with `ssh-agent -s > /run/user/$UID/ssh-agent.env`
# followed by adding/unlocking the necessary keys.
# shellcheck disable=SC1090
source /run/user/$UID/ssh-agent.env
date -u -Iminutes
now_minutes=$(date -u +%M)
if (($now_minutes%10==0)); then
$SCRIPT_DIRPATH/LaunchInstances.sh
echo "Exit: $?"
fi
$SCRIPT_DIRPATH/SetupInstances.sh
echo "Exit: $?"
[[ -r "$PWSTATE" ]] || \
die "Can't read $PWSTATE to generate utilization data."
uzn_file="$SCRIPT_DIRPATH/utilization.csv"
# Run input through `date` to validate values are usable timestamps
timestamp=$(date -u -Iseconds -d \
$(grep -E '^# SetupInstances\.sh run ' "$PWSTATE" | \
awk '{print $4}'))
pw_state=$(grep -E -v '^($|#+| +)' "$PWSTATE")
n_workers=$(grep 'complete alive' <<<"$pw_state" | wc -l)
n_tasks=$(awk "BEGIN{B=0} /${DH_PFX}-[0-9]+ complete alive/{B+=\$4} END{print B}" <<<"$pw_state")
n_taskf=$(awk "BEGIN{E=0} /${DH_PFX}-[0-9]+ complete alive/{E+=\$5} END{print E}" <<<"$pw_state")
printf "%s,%i,%i,%i\n" "$timestamp" "$n_workers" "$n_tasks" "$n_taskf" | tee -a "$uzn_file"
# Prevent uncontrolled growth of utilization.csv. Assume this script
# runs every $interval minutes, keep only $history_hours worth of data.
interval_minutes=5
history_hours=36
lines_per_hour=$((60/$interval_minutes))
max_uzn_lines=$(($history_hours * $lines_per_hour))
tail -n $max_uzn_lines "$uzn_file" > "${uzn_file}.tmp"
mv "${uzn_file}.tmp" "$uzn_file"
# If possible, generate the webpage utilization graph
gnuplot -c Utilization.gnuplot || true

39
mac_pw_pool/InstanceSSH.sh Executable file

@ -0,0 +1,39 @@
#!/bin/bash
set -eo pipefail
# Helper for humans to access an existing instance. It depends on:
#
# * You know the instance-id or name.
# * All requirements listed in the top `LaunchInstances.sh` comment.
# * The local ssh-agent is able to supply the appropriate private key.
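#
# Example usage (instance name/ID values are hypothetical):
#   ./InstanceSSH.sh MacM1-7                      # interactive shell
#   ./InstanceSSH.sh i-0123456789abcdef0 uptime   # run a single command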
# shellcheck source-path=SCRIPTDIR
source $(dirname ${BASH_SOURCE[0]})/pw_lib.sh
SSH="ssh $SSH_ARGS" # N/B: library default nulls stdin
if nc -z localhost 5900; then
# Enable access to VNC if it's running
# ref: https://repost.aws/knowledge-center/ec2-mac-instance-gui-access
SSH+=" -L 5900:localhost:5900"
fi
[[ -n "$1" ]] || \
die "Must provide EC2 instance ID as first argument"
case "$1" in
i-*)
inst_json=$($AWS ec2 describe-instances --instance-ids "$1") ;;
*)
inst_json=$($AWS ec2 describe-instances --filter "Name=tag:Name,Values=$1") ;;
esac
inst_arg="$1"  # preserved for error messages below
shift
pub_dns=$(jq -r -e '.Reservations?[0]?.Instances?[0]?.PublicDnsName?' <<<"$inst_json")
if [[ -z "$pub_dns" ]] || [[ "$pub_dns" == "null" ]]; then
die "Instance '$inst_arg' does not exist, or does not have a public DNS address allocated (yet)."
fi
echo "+ $SSH ec2-user@$pub_dns $*" >> /dev/stderr
exec $SSH ec2-user@$pub_dns "$@"

310
mac_pw_pool/LaunchInstances.sh Executable file

@ -0,0 +1,310 @@
#!/bin/bash
set -eo pipefail
# Script intended to be executed by humans (and eventually automation) to
# ensure instances are launched from the current template version, on all
# available Cirrus-CI Persistent Worker M1 Mac dedicated hosts. These
# dedicated host (slots) are selected at runtime based on their possessing a
# 'true' value for their `PWPoolReady` tag. The script assumes:
#
# * The `aws` CLI tool is installed on $PATH.
# * Appropriate `~/.aws/credentials` credentials are setup.
# * The us-east-1 region is selected in `~/.aws/config`.
#
# N/B: Dedicated Host names and instance names are assumed to be identical,
# only the IDs differ.
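#
# Typical usage (--force is described in the README's Initialization section):
#   ./LaunchInstances.sh           # normal, staggered instance creation
#   ./LaunchInstances.sh --force   # bypass creation-stagger limits when bootstrapping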
# shellcheck source-path=SCRIPTDIR
source $(dirname ${BASH_SOURCE[0]})/pw_lib.sh
L_DEBUG="${L_DEBUG:-0}"
if ((L_DEBUG)); then
X_DEBUG=1
warn "Debugging enabled - temp. dir will not be cleaned up '$TEMPDIR' $(ctx 0)."
trap EXIT
fi
# Helper intended for use inside `name_hostid` loop.
# arg1 either "INST" or "HOST"
# arg2: Brief failure message
# arg3: Failure message details
handle_failure() {
[[ -n "$inststate" ]] || die "Expecting \$inststate to be set $(ctx 2)"
[[ -n "$name" ]] || die "Expecting \$name to be set $(ctx 2)"
if [[ "$1" != "INST" ]] && [[ "$1" != "HOST" ]]; then
die "Expecting either INST or HOST as argument $(ctx 2)"
fi
[[ -n "$2" ]] || die "Expecting brief failure message $(ctx 2)"
[[ -n "$3" ]] || die "Expecting detailed failure message $(ctx 2)"
warn "$2 $(ctx 2)"
(
# Script is sensitive to this first-line format
echo "# $name $1 ERROR: $2"
# Make it obvious which host/instance the details pertain to
awk -e '{print "# "$0}'<<<"$3"
) > "$inststate"
}
# Wrapper around handle_failure()
host_failure() {
[[ -r "$hostoutput" ]] || die "Expecting readable $hostoutput file $(ctx)"
handle_failure HOST "$1" "aws CLI output: $(<$hostoutput)"
}
inst_failure() {
[[ -r "$instoutput" ]] || die "Expecting readable $instoutput file $(ctx)"
handle_failure INST "$1" "aws CLI output: $(<$instoutput)"
}
# Find dedicated hosts to operate on.
dh_name_flt="Name=tag:Name,Values=${DH_PFX}-*"
dh_tag_flt="Name=tag:$DH_REQ_TAG,Values=$DH_REQ_VAL"
dh_qry='Hosts[].{HostID:HostId, Name:[Tags[?Key==`Name`].Value][] | [0]}'
dh_searchout="$TEMPDIR/hosts.output" # JSON or error message
if ! $AWS ec2 describe-hosts --filter "$dh_name_flt" "$dh_tag_flt" --query "$dh_qry" &> "$dh_searchout"; then
die "Searching for dedicated hosts $(ctx 0):
$(<$dh_searchout)"
fi
# Array item format: "<Name> <ID>"
dh_fmt='.[] | .Name +" "+ .HostID'
# Avoid always processing hosts in the same alpha-sorted order, as that would
# mean hosts at the end of the list consistently wait the longest for new
# instances to be created (see creation-stagger code below).
if ! readarray -t NAME2HOSTID <<<$(json_query "$dh_fmt" "$dh_searchout" | sort --random-sort); then
die "Extracting dedicated host 'Name' and 'HostID' fields $(ctx 0):
$(<$dh_searchout)"
fi
n_dh=0
n_dh_total=${#NAME2HOSTID[@]}
if [[ -z "${NAME2HOSTID[*]}" ]] || ! ((n_dh_total)); then
msg "No dedicated hosts found"
exit 0
fi
latest_launched="1970-01-01T00:00+00:00" # in case $DHSTATE is missing
dcmpfmt="+%Y%m%d%H%M" # date comparison format compatible with numeric 'test'
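# e.g. 2024-01-02T03:04+00:00 becomes 202401020304, so timestamps can be
# compared with the numeric -lt/-gt operators used below.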
# To find the latest instance launch time, script can't rely on reading
# $DHSTATE or $PWSTATE because they may not exist or be out of date.
# Search for all running instances by name and running state, returning
# their launch timestamps.
declare -a pw_filt
pw_filts=(
"Name=tag:Name,Values=${DH_PFX}-*"
'Name=tag:PWPoolReady,Values=true'
"Name=tag:$DH_REQ_TAG,Values=$DH_REQ_VAL"
'Name=instance-state-name,Values=running'
)
pw_query='Reservations[].Instances[].LaunchTime'
inst_lt_f=$TEMPDIR/inst_launch_times
dbg "Obtaining launch times for all running ${DH_PFX}-* instances"
dbg "$AWS ec2 describe-instances --filters '${pw_filts[*]}' --query '$pw_query' &> '$inst_lt_f'"
if ! $AWS ec2 describe-instances --filters "${pw_filts[@]}" --query "$pw_query" &> "$inst_lt_f"; then
die "Can not query instances:
$(<$inst_lt_f)"
else
declare -a launchtimes
if ! readarray -t launchtimes<<<$(json_query '.[]?' "$inst_lt_f") ||
[[ "${#launchtimes[@]}" -eq 0 ]] ||
[[ "${launchtimes[0]}" == "" ]]; then
warn "Found no running instances, this should not happen."
else
dbg "launchtimes=[${launchtimes[*]}]"
for launch_time in "${launchtimes[@]}"; do
if [[ "$launch_time" == "" ]] || [[ "$launch_time" == "null" ]]; then
warn "Ignoring empty/null instance launch time."
continue
fi
# Assume launch_time is never malformed
launched_hour=$(date -u -d "$launch_time" "$dcmpfmt")
latest_launched_hour=$(date -u -d "$latest_launched" "$dcmpfmt")
dbg "instance launched on $launched_hour; latest launch hour: $latest_launched_hour"
if [[ $launched_hour -gt $latest_launched_hour ]]; then
dbg "Updating latest launched timestamp"
latest_launched="$launch_time"
fi
done
fi
fi
# Increase readability for humans by always ensuring the two important
# date stamps line up regardless of the length of $n_dh_total.
_n_dh_sp=$(printf ' %.0s' $(seq 1 ${#n_dh_total}))
msg "Operating on $n_dh_total dedicated hosts at $(date -u -Iseconds)"
msg " ${_n_dh_sp}Last instance launch on $latest_launched"
echo -e "# $(basename ${BASH_SOURCE[0]}) run $(date -u -Iseconds)\n#" > "$TEMPDIR/$(basename $DHSTATE)"
# When initializing a new pool of workers, it would take many hours
# to wait for the staggered creation mechanism on each host. This
# would negatively impact worker utilization. Provide a workaround.
force=0
# shellcheck disable=SC2199
if [[ "$@" =~ --force ]]; then
warn "Forcing instance creation: Ignoring staggered creation limits."
force=1
fi
for name_hostid in "${NAME2HOSTID[@]}"; do
n_dh=$(($n_dh+1))
_I=" "
msg " " # make output easier to read
read -r name hostid junk<<<"$name_hostid"
msg "Working on Dedicated Host #$n_dh/$n_dh_total '$name' for HostID '$hostid'."
hostoutput="$TEMPDIR/${name}_host.output" # JSON or error message from aws describe-hosts
instoutput="$TEMPDIR/${name}_inst.output" # JSON or error message from aws describe-instance or run-instance
inststate="$TEMPDIR/${name}_inst.state" # Line to append to $DHSTATE
if ! $AWS ec2 describe-hosts --host-ids $hostid &> "$hostoutput"; then
host_failure "Failed to look up dedicated host."
continue
# Allow hosts to be taken out of service easily/manually by editing its tags.
# Also detect any JSON parsing problems in the output.
elif ! PWPoolReady=$(json_query '.Hosts?[0]?.Tags? | map(select(.Key == "PWPoolReady")) | .[].Value' "$hostoutput"); then
host_failure "Empty/null/failed JSON query of PWPoolReady tag."
continue
elif [[ "$PWPoolReady" != "true" ]]; then
msg "Dedicated host tag 'PWPoolReady' == '$PWPoolReady' != 'true'."
echo "# $name HOST DISABLED: PWPoolReady==$PWPoolReady" > "$inststate"
continue
fi
if ! hoststate=$(json_query '.Hosts?[0]?.State?' "$hostoutput"); then
host_failure "Empty/null/failed JSON query of dedicated host state."
continue
fi
if [[ "$hoststate" == "pending" ]] || \
[[ "$hoststate" == "under-assessment" ]] || \
[[ "$hoststate" == "released" ]]
then
# When an instance is terminated, its dedicated host goes into an unusable state
# for about 1-1/2 hours. There's absolutely nothing that can be done to avoid
# this or work around it. Ignore hosts in this state, assuming a later run of the
# script will start an instance on the (hopefully) available host.
#
# I have no idea what 'under-assessment' means, and it doesn't last as long as 'pending',
# but functionally it behaves the same.
#
# Hosts in 'released' state are about to go away, hopefully due to operator action.
# Don't treat this as an error.
msg "Dedicated host is untouchable due to '$hoststate' state."
# Reference the actual output text, in case of false-match or unexpected contents.
echo "# $name HOST BUSY: $hoststate" > "$inststate"
continue
elif [[ "$hoststate" != "available" ]]; then
# The "available" state means the host is ready for zero or more instances to be created.
# Detect all other states (they should be extremely rare).
host_failure "Unsupported dedicated host state '$hoststate'."
continue
fi
# Counter-intuitively, dedicated hosts can support more than one running instance. Mac
# instances are the exception, though this is not reflected anywhere in the JSON. Trying to start a new
# Mac instance on an already occupied host is bound to fail. Inconveniently this error
# will look an awful lot like many other types of errors, confusing any human examining
# $DHSTATE. Detect dedicated-hosts with existing instances.
InstanceId=$(set +e; jq -r '.Hosts?[0]?.Instances?[0].InstanceId?' "$hostoutput")
dbg "InstanceId='$InstanceId'"
# Stagger creation of instances by $CREATE_STAGGER_HOURS
launch_new=0
if [[ "$InstanceId" == "null" ]] || [[ "$InstanceId" == "" ]]; then
launch_threshold=$(date -u -Iseconds -d "$latest_launched + $CREATE_STAGGER_HOURS hours")
launch_threshold_hour=$(date -u -d "$launch_threshold" "$dcmpfmt")
now_hour=$(date -u "$dcmpfmt")
dbg "launch_threshold_hour=$launch_threshold_hour"
dbg " now_hour=$now_hour"
if [[ "$force" -eq 0 ]] && [[ $now_hour -lt $launch_threshold_hour ]]; then
msg "Cannot launch new instance until $launch_threshold"
echo "# $name HOST THROTTLE: Inst. creation delayed until $launch_threshold" > "$inststate"
continue
else
launch_new=1
fi
fi
if ((launch_new)); then
msg "Creating new $name instance on $name host."
if ! $AWS ec2 run-instances \
--launch-template LaunchTemplateName=${TEMPLATE_NAME} \
--tag-specifications \
"ResourceType=instance,Tags=[{Key=Name,Value=$name},{Key=$DH_REQ_TAG,Value=$DH_REQ_VAL},{Key=PWPoolReady,Value=true},{Key=automation,Value=true}]" \
--placement "HostId=$hostid" &> "$instoutput"; then
inst_failure "Failed to create new instance on available host."
continue
else
# Block further launches (assumes script is running in a 10m while loop).
latest_launched=$(date -u -Iseconds)
msg "Successfully created new instance; Waiting for 'running' state (~1m typical)..."
# N/B: New Mac instances take ~5-10m to actually become ssh-able
if ! InstanceId=$(json_query '.Instances?[0]?.InstanceId' "$instoutput"); then
inst_failure "Empty/null/failed JSON query of brand-new InstanceId"
continue
fi
# Instance "running" status is good enough for this script, and since network
# accessibility can take 5-20m post creation.
# Polls 40 times with 15-second delay (non-configurable).
if ! $AWS ec2 wait instance-running \
--instance-ids $InstanceId &> "${instoutput}.wait"; then
# inst_failure() would include unhelpful $instoutput detail
(
echo "# $name INST ERROR: Running-state timeout."
awk -e '{print "# "$0}' "${instoutput}.wait"
) > "$inststate"
continue
fi
fi
fi
# If an instance was created, $instoutput contents are already obsolete.
# If an existing instance, $instoutput doesn't exist.
if ! $AWS ec2 describe-instances --instance-ids $InstanceId &> "$instoutput"; then
inst_failure "Failed to describe host instance."
continue
fi
# Describe-instance has unnecessarily complex structure, simplify it.
if ! json_query '.Reservations?[0]?.Instances?[0]?' "$instoutput" > "${instoutput}.simple"; then
inst_failure "Empty/null/failed JSON simplification of describe-instances."
fi
mv "$instoutput" "${instoutput}.describe" # leave for debugging
mv "${instoutput}.simple" "${instoutput}"
msg "Parsing new or existing instance ($InstanceId) details."
if ! InstanceId=$(json_query '.InstanceId' $instoutput); then
inst_failure "Empty/null/failed JSON query of InstanceId"
continue
elif ! InstName=$(json_query '.Tags | map(select(.Key == "Name")) | .[].Value' $instoutput) || \
[[ "$InstName" != "$name" ]]; then
inst_failure "Inst. name '$InstName' != DH name '$name'"
elif ! LaunchTime=$(json_query '.LaunchTime' $instoutput); then
inst_failure "Empty/null/failed JSON query of LaunchTime"
continue
fi
echo "$name $InstanceId $LaunchTime" > "$inststate"
done
_I=""
msg " "
msg "Processing all dedicated host and instance states."
# Consuming state file in alpha-order is easier on human eyes
readarray -t NAME2HOSTID <<<$(json_query "$dh_fmt" "$dh_searchout" | sort)
for name_hostid in "${NAME2HOSTID[@]}"; do
read -r name hostid<<<"$name_hostid"
inststate="$TEMPDIR/${name}_inst.state"
[[ -r "$inststate" ]] || \
die "Expecting to find instance-state file $inststate for host '$name' $(ctx 0)."
cat "$inststate" >> "$TEMPDIR/$(basename $DHSTATE)"
done
dbg "Creating/updating state file"
if [[ -r "$DHSTATE" ]]; then
cp "$DHSTATE" "${DHSTATE}~"
fi
mv "$TEMPDIR/$(basename $DHSTATE)" "$DHSTATE"

138
mac_pw_pool/README.md Normal file

@ -0,0 +1,138 @@
# Cirrus-CI persistent worker maintenance
These scripts are intended to be used from a repository clone,
by cron, on an always-on cloud machine. They make a lot of
other assumptions, some of which may not be well documented.
Please see the comments at the top of each script for more
detailed/specific information.
## Prerequisites
* The `aws` binary present somewhere on `$PATH`.
* Standard AWS `credentials` and `config` files exist under `~/.aws`
and set the region to `us-east-1`.
* A copy of the ssh-key referenced by `CirrusMacM1PWinstance` launch template
under "Assumptions" below.
* The ssh-key has been added to a running ssh-agent.
* The running ssh-agent sh-compatible env. vars. are stored in
`/run/user/$UID/ssh-agent.env`
* The env. var. `POOLTOKEN` is set to the Cirrus-CI persistent worker pool
token value.
## Assumptions
* You've read all scripts in this directory, generally follow
their purpose, and meet any requirements stated within the
header comment.
* You've read the [private documentation](https://docs.google.com/document/d/1PX6UyqDDq8S72Ko9qe_K3zoV2XZNRQjGxPiWEkFmQQ4/edit)
and understand the safety/security section.
* You have permissions to access all referenced AWS resources.
* There are one or more dedicated hosts allocated, each having set:
* A name tag like `MacM1-<some number>` (NO SPACES!)
* The `mac2` instance family
* The `mac2.metal` instance type
* Disabled "Instance auto-placement", "Host recovery", and "Host maintenance"
* Quantity: 1
* Tags: `automation=false`, `purpose=prod`, and `PWPoolReady=true`
* The EC2 `CirrusMacM1PWinstance` instance-template exists and sets:
* Shutdown-behavior: terminate
* Same "key pair" referenced under `Prerequisites`
* All other required instance parameters complete
* A user-data script that shuts down the instance after 2 days.
## Operation (Theory)
The goal is to maintain sufficient alive/running/working instances
to service most Cirrus-CI tasks pointing at the pool. This is
best achieved with slower maintenance of hosts compared to setup
of ready instances. This is because hosts can be inaccessible for
up to 2 hours, but instances come up in ~10-20m, ready to run tasks.
Either hosts and/or instances may be removed from management by
setting "false" or removing their `PWPoolReady=true` tag. Otherwise,
the pool should be maintained by installing the crontab lines
indicated in the `Cron.sh` script.
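For reference, a minimal sketch of those crontab entries (the header comment
in `Cron.sh` is authoritative; paths and values below assume the layout it
describes, with the repo cloned under `/home/shared/devel/automation`):
```
CRON_TZ=UTC
PATH=/home/shared/.local/bin:/home/shared/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
POOLTOKEN=<value from the Cirrus-CI pool page>
CRONLOG=/home/shared/devel/automation/mac_pw_pool/Cron.log

# Keep log from filling up disk & make sure webserver is running
59 4 * * * /home/shared/devel/automation/mac_pw_pool/nightly_maintenance.sh &>> $CRONLOG

# PW Pool management
*/5 * * * * /home/shared/devel/automation/mac_pw_pool/Cron.sh &>> $CRONLOG
```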
Cirrus-CI will assign tasks (specially) targeted at the pool, to an
instance with a running listener (`cirrus worker run` process). If
there are none, the task will queue forever (there might be a 24-hour
timeout, I can't remember). From a PR perspective, there is little
control over which instance you get. It could easily be one where
a previous task barfed all over and rendered it unusable.
## Initialization
It is assumed that neither the `Cron.sh` nor any related maintenance
scripts are installed (in crontab) or currently running.
Once several dedicated hosts have been manually created, they
should initially have no instances on them. If left alone, the
maintenance scripts will eventually bring them all up, however
complete creation and setup will take many hours. This may be
bypassed by *manually* running `LaunchInstances.sh --force`.
In order to prevent all the instances from being recycled at the same
(future) time, the shutdown time installed by `SetupInstances.sh` also
needs to be adjusted. The operator should first wait about 20 minutes
for all new instances to fully boot, then call
`SetupInstances.sh --force`.
Now the `Cron.sh` cron-job may be installed, enabled and started.
## Manual Testing
Verifying changes to these scripts / cron-job must be done manually.
To support this, every dedicated host and instance has a `purpose`
tag, which must correspond to the value indicated in `pw_lib.sh`
and in the target repo `.cirrus.yml`. To test script and/or
CI changes:
1. Make sure you have locally met all requirements spelled out in the
header-comment of `AllocateTestDH.sh`.
1. Execute `AllocateTestDH.sh`. It will operate out of a temporary
clone of the repository to prevent pushing required test-modifications
upstream.
1. Repeatedly execute `SetupInstances.sh`. It will update `pw_status.txt`
with any warnings/errors. When successful, lines will include
the host name, "complete", and "alive" status strings.
1. If instance debugging is needed, the `InstanceSSH.sh` script may be
used. Simply pass the name of the host you want to access. Every
instance should have a `setup.log` file in the `ec2-user` homedir. There
should also be `/private/tmp/<name>-worker.log` with entries from the
pool listener process.
1. To test CI changes against the test instance(s), push a PR that includes
`.cirrus.yml` changes to the task's `persistent_worker` dictionary's
`purpose` attribute. Set the value the same as the tag in step 1.
1. When you're done with all testing, terminate the instance. Then wait
a full 24-hours before "releasing" the dedicated host. Both operations
can be performed using the AWS EC2 WebUI. Please remember to do the
release step, as the $-clock continues to run while it's allocated.
Note: Instances are set to auto-terminate on shutdown. They should
self-shutdown after 24 hours automatically. After termination for
any cause, there's about a 2-hour waiting period before a new instance
can be allocated. The `LaunchInstances.sh` script is able to deal with this
properly.
## Script Debugging Hints
* On each MacOS instance:
* The pool listener process (running as the worker user) keeps a log under `/private/tmp`. The
file includes the registered name of the worker. For example, on MacM1-7 you would find `/private/tmp/MacM1-7-worker.log`.
This log shows tasks taken on, completed, and any errors reported back from Cirrus-CI internals.
* In the ec2-user's home directory is a `setup.log` file. This stores the output from executing
`setup.sh`. It also contains any warnings/errors from the (very important) `service_pool.sh` script - which should
_always_ be running in the background.
* There are several drop-files in the `ec2-user` home directory which are checked by `SetupInstances.sh`
to record state. If removed, along with `setup.log`, the script will re-execute (a possibly newer version of) `setup.sh`.
* On the management host:
* Automated operations are setup and run by `Cron.sh`, and logged to `Cron.log`. When running scripts manually, `Cron.sh`
can serve as a template for the intended order of operations.
* Critical operations are protected by a mandatory, exclusive file lock on `mac_pw_pool/Cron.sh`. Should
there be a deadlock, management of the pool (by `Cron.sh`) will stop. However the effects of this will not be observed
until workers begin hitting their lifetime and/or task limits.
* Without intervention, the `nightly_maintenance.sh` script will update the containers/automation repo clone on the
management VM. This happens if the repo becomes out of sync by more than 7 days (or as defined in the script).
When the repo is updated, the `pw_pool_web` container will be restarted. The container will also be restarted if it's
found to not be running.

463
mac_pw_pool/SetupInstances.sh Executable file

@ -0,0 +1,463 @@
#!/bin/bash
set -eo pipefail
# Script intended to be executed by humans (and eventually automation)
# to provision any/all accessible Cirrus-CI Persistent Worker instances
# as they become available. This is intended to operate independently
# from `LaunchInstances.sh` so as to "hide" the nearly 2 hours of cumulative
# startup and termination wait times. This script depends on:
#
# * All requirements listed in the top `LaunchInstances.sh` comment.
# * The $DHSTATE file created/updated by `LaunchInstances.sh`.
# * The $POOLTOKEN env. var. is defined
# * The local ssh-agent is able to supply the appropriate private key.
# shellcheck source-path=SCRIPTDIR
source $(dirname ${BASH_SOURCE[0]})/pw_lib.sh
# Update temporary-dir status file for instance $name
# status type $1 and value $2. Where status type is
# 'setup', 'listener', 'tasks', 'taskf' or 'comment'.
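# For example (as used further below):
#   set_pw_status setup started
#   set_pw_status listener alive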
set_pw_status() {
[[ -n "$name" ]] || \
die "Expecting \$name to be set"
case $1 in
setup) ;;
listener) ;;
tasks) ;; # started
taskf) ;; # finished
ftasks) ;;
comment) ;;
*) die "Status type must be 'setup', 'listener', 'tasks', 'taskf' or 'comment'"
esac
if [[ "$1" != "comment" ]] && [[ -z "$2" ]]; then
die "Expecting comment text (status argument) to be non-empty."
fi
echo -n "$2" > $TEMPDIR/${name}.$1
}
# Wrapper around msg() and warn() which also set_pw_status() comment.
pwst_msg() { set_pw_status comment "$1"; msg "$1"; }
pwst_warn() { set_pw_status comment "$1"; warn "$1"; }
# Attempt to signal $SPOOL_SCRIPT to stop picking up new CI tasks but
# support PWPoolReady being reset to 'true' in the future to signal
# a new $SETUP_SCRIPT run. Cancel future $SHDWN_SCRIPT action.
# Requires both $pub_dns and $name are set
stop_listener(){
dbg "Attempting to stop pool listener and reset setup state"
$SSH ec2-user@$pub_dns rm -f \
"/private/tmp/${name}_cfg_*" \
"./.setup.done" \
"./.setup.started" \
"/var/tmp/shutdown.sh"
}
# Forcibly shutdown an instance immediately, printing warning and status
# comment from first argument. Requires $name, $instance_id, and $pub_dns
# to be set.
force_term(){
local varname
local termoutput
termoutput="$TEMPDIR/${name}_term.output"
local term_msg
term_msg="${1:-no inst_panic() message provided} Terminating immediately! $(ctx)"
for varname in name instance_id pub_dns; do
[[ -n "${!varname}" ]] || \
die "Expecting \$$varname to be set/non-empty."
done
# $SSH has built-in -n; ignore failure, inst may be in broken state already
echo "$term_msg" | ssh $SSH_ARGS ec2-user@$pub_dns sudo wall || true
# Set status and print warning message
pwst_warn "$term_msg"
# Instance is going to be terminated, immediately stop any attempts to
# restart listening for jobs. Ignore failure if unreachable for any reason -
# we/something else could have already started termination previously
stop_listener || true
# Termination can take a few minutes, block further use of instance immediately.
$AWS ec2 create-tags --resources $instance_id --tags "Key=PWPoolReady,Value=false" || true
# Prefer possibly recovering a broken pool over debug-ability.
if ! $AWS ec2 terminate-instances --instance-ids $instance_id &> "$termoutput"; then
# Possible if the instance recently/previously started termination process.
warn "Could not terminate instance $instance_id $(ctx 0):
$(<$termoutput)"
fi
}
# Set non-zero to enable debugging / prevent removal of temp. dir.
S_DEBUG="${S_DEBUG:-0}"
if ((S_DEBUG)); then
X_DEBUG=1
warn "Debugging enabled - temp. dir will not be cleaned up '$TEMPDIR' $(ctx 0)."
trap EXIT
fi
[[ -n "$POOLTOKEN" ]] || \
die "Expecting \$POOLTOKEN to be defined/non-empty $(ctx 0)."
[[ -r "$DHSTATE" ]] || \
die "Can't read from state file: $DHSTATE"
if [[ -z "$SSH_AUTH_SOCK" ]] || [[ -z "$SSH_AGENT_PID" ]]; then
die "Cannot access an ssh-agent. Please run 'ssh-agent -s > /run/user/$UID/ssh-agent.env' and 'ssh-add /path/to/required/key'."
fi
declare -a _dhstate
readarray -t _dhstate <<<$(grep -E -v '^($|#+| +)' "$DHSTATE" | sort)
n_inst=0
n_inst_total="${#_dhstate[@]}"
if [[ -z "${_dhstate[*]}" ]] || ! ((n_inst_total)); then
msg "No operable hosts found in $DHSTATE:
$(<$DHSTATE)"
# Assume this script is running in a loop, and unf. there are
# simply no dedicated-hosts in 'available' state.
exit 0
fi
# N/B: Assumes $DHSTATE represents reality
msg "Operating on $n_inst_total instances from $(head -1 $DHSTATE)"
echo -e "# $(basename ${BASH_SOURCE[0]}) run $(date -u -Iseconds)\n#" > "$TEMPDIR/$(basename $PWSTATE)"
# Previous instance state needed for some optional checks
declare -a _pwstate
n_pw_total=0
if [[ -r "$PWSTATE" ]]; then
readarray -t _pwstate <<<$(grep -E -v '^($|#+| +)' "$PWSTATE" | sort)
n_pw_total="${#_pwstate[@]}"
# Handle single empty-item array
if [[ -z "${_pwstate[*]}" ]] || ! ((n_pw_total)); then
_pwstate=()
n_pw_total=0
fi
fi
# Assuming the `--force` option was used to initialize a new pool of
# workers, then instances need to be configured with a staggered
# self-termination shutdown delay. This prevents all the instances
# from being terminated at the same time, potentially impacting
# CI usage.
runtime_hours_reduction=0
# shellcheck disable=SC2199
if [[ "$@" =~ --force ]]; then
warn "Forcing instance creation w/ staggered existence limits."
runtime_hours_reduction=$CREATE_STAGGER_HOURS
fi
for _dhentry in "${_dhstate[@]}"; do
read -r name instance_id launch_time junk<<<"$_dhentry"
_I=" "
msg " "
n_inst=$(($n_inst+1))
msg "Working on Instance #$n_inst/$n_inst_total '$name' with ID '$instance_id'."
# Clear buffers used for updating status files
n_started_tasks=0
n_finished_tasks=0
instoutput="$TEMPDIR/${name}_inst.output"
ncoutput="$TEMPDIR/${name}_nc.output"
logoutput="$TEMPDIR/${name}_log.output"
# Most operations below 'continue' looping on error. Ensure status files match.
set_pw_status tasks 0
set_pw_status taskf 0
set_pw_status setup error
set_pw_status listener error
set_pw_status comment ""
if ! $AWS ec2 describe-instances --instance-ids $instance_id &> "$instoutput"; then
pwst_warn "Could not query instance $instance_id $(ctx 0)."
continue
fi
dbg "Verifying required $DH_REQ_TAG=$DH_REQ_VAL"
tagq=".Reservations?[0]?.Instances?[0]?.Tags | map(select(.Key == \"$DH_REQ_TAG\")) | .[].Value"
if ! inst_tag=$(json_query "$tagq" "$instoutput"); then
pwst_warn "Could not look up instance $DH_REQ_TAG tag"
continue
fi
if [[ "$inst_tag" != "$DH_REQ_VAL" ]]; then
pwst_warn "Required inst. '$DH_REQ_TAG' tag != '$DH_REQ_VAL'"
continue
fi
dbg "Looking up instance name"
nameq='.Reservations?[0]?.Instances?[0]?.Tags | map(select(.Key == "Name")) | .[].Value'
if ! inst_name=$(json_query "$nameq" "$instoutput"); then
pwst_warn "Could not look up instance Name tag"
continue
fi
if [[ "$inst_name" != "$name" ]]; then
pwst_warn "Inst. name '$inst_name' != DH name '$name'"
continue
fi
dbg "Looking up public DNS"
if ! pub_dns=$(json_query '.Reservations?[0]?.Instances?[0]?.PublicDnsName?' "$instoutput"); then
pwst_warn "Could not lookup of public DNS for instance $instance_id $(ctx 0)"
continue
fi
# It's really important that instances have a defined and risk-relative
# short lifespan. Multiple mechanisms are in place to assist, but none
# are perfect. Ensure instances running for an excessive time are forcefully
# terminated as soon as possible from this script.
launch_epoch=$(date -u -d "$launch_time" +%s)
now_epoch=$(date -u +%s)
age_sec=$((now_epoch-launch_epoch))
hard_max_sec=$((PW_MAX_HOURS*60*60*2)) # double PW_MAX_HOURS
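# e.g. with the pw_lib.sh default PW_MAX_HOURS=24: hard_max_sec = 24*60*60*2 = 172800s (48h)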
dbg "launch_epoch=$launch_epoch"
dbg " now_epoch=$now_epoch"
dbg " age_sec=$age_sec"
dbg "hard_max_sec=$hard_max_sec"
# Soft time limit is enforced via 'sleep $PW_MAX_HOURS && shutdown' started during instance setup (below).
msg "Instance alive for $((age_sec/60/60)) hours (soft max: $PW_MAX_HOURS hard: $((hard_max_sec/60/60)))"
if [[ $age_sec -gt $hard_max_sec ]]; then
force_term "Excess instance lifetime; $(((age_sec - hard_max_sec)/60))m past hard max limit."
continue
elif [[ $age_sec -gt $((PW_MAX_HOURS*60*60)) ]]; then
pwst_warn "Instance alive longer than soft max. Investigation recommended."
fi
dbg "Attempting to contact '$name' at $pub_dns"
if ! nc -z -w 13 $pub_dns 22 &> "$ncoutput"; then
pwst_warn "Could not connect to port 22 on '$pub_dns' $(ctx 0)."
continue
fi
if ! $SSH ec2-user@$pub_dns true; then
pwst_warn "Could not ssh to 'ec2-user@$pub_dns' $(ctx 0)."
continue
fi
dbg "Check if instance should be managed"
if ! PWPoolReady=$(json_query '.Reservations?[0]?.Instances?[0]?.Tags? | map(select(.Key == "PWPoolReady")) | .[].Value' "$instoutput"); then
pwst_warn "Instance does not have a PWPoolReady tag"
PWPoolReady="absent"
fi
# Mechanism for a developer to manually debug operations w/o fear of new tasks or instance shutdown.
if [[ "$PWPoolReady" != "true" ]]; then
pwst_msg "Instance disabled via tag 'PWPoolReady' == '$PWPoolReady'."
set_pw_status setup disabled
set_pw_status listener disabled
(
set +e # All commands below are best-effort only!
dbg "Attempting to stop any pending shutdowns"
$SSH ec2-user@$pub_dns sudo pkill shutdown
stop_listener
dbg "Attempting to stop shutdown sleep "
$SSH ec2-user@$pub_dns pkill -u ec2-user -f "'bash -c sleep'"
if $SSH ec2-user@$pub_dns pgrep -u ec2-user -f service_pool.sh; then
sleep 10s # Allow service_pool to exit gracefully
fi
# N/B: This will not stop any currently running CI tasks.
dbg "Guarantee pool listener is dead"
$SSH ec2-user@$pub_dns sudo pkill -u ${name}-worker -f "'cirrus worker run'"
)
continue
fi
if ! $SSH ec2-user@$pub_dns test -r .setup.done; then
if ! $SSH ec2-user@$pub_dns test -r .setup.started; then
if $SSH ec2-user@$pub_dns test -r setup.log; then
# Can be caused by operator flipping PWPoolReady value on instance for debugging
pwst_warn "Setup log found, prior executions may have failed $(ctx 0)."
fi
pwst_msg "Setting up new instance"
# Ensure bash used for consistency && some ssh commands below
# don't play nicely with zsh.
$SSH ec2-user@$pub_dns sudo chsh -s /bin/bash ec2-user &> /dev/null
if ! $SCP $SETUP_SCRIPT $SPOOL_SCRIPT $SHDWN_SCRIPT ec2-user@$pub_dns:/var/tmp/; then
pwst_warn "Could not scp scripts to instance $(ctx 0)."
continue # try again next loop
fi
if ! $SCP $CIENV_SCRIPT ec2-user@$pub_dns:./; then
pwst_warn "Could not scp CI Env. script to instance $(ctx 0)."
continue # try again next loop
fi
if ! $SSH ec2-user@$pub_dns chmod +x "/var/tmp/*.sh" "./ci_env.sh"; then
pwst_warn "Could not chmod scripts $(ctx 0)."
continue # try again next loop
fi
# Keep runtime_hours_reduction w/in sensible, positive bounds.
if [[ $runtime_hours_reduction -ge $((PW_MAX_HOURS - CREATE_STAGGER_HOURS)) ]]; then
runtime_hours_reduction=$CREATE_STAGGER_HOURS
fi
shutdown_seconds=$((60*60*PW_MAX_HOURS - 60*60*runtime_hours_reduction))
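# e.g. with PW_MAX_HOURS=24 and runtime_hours_reduction=2 (one stagger step):
# 86400s - 7200s = 79200s, i.e. this instance recycles after 22 hours.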
[[ $shutdown_seconds -gt $((60*60*CREATE_STAGGER_HOURS)) ]] || \
die "Detected unacceptably short \$shutdown_seconds ($shutdown_seconds) value."
pwst_msg "Starting automatic instance recycling in $((shutdown_seconds/60/60)) hours"
# Darwin is really weird WRT active terminals and the shutdown
# command. Instead of installing a future shutdown, stick an
# immediate shutdown at the end of a long sleep. This is the
# simplest workaround I could find :S
# Darwin sleep only accepts seconds.
if ! $SSH ec2-user@$pub_dns bash -c \
"'sleep $shutdown_seconds && /var/tmp/shutdown.sh' </dev/null >>setup.log 2>&1 &"; then
pwst_warn "Could not start automatic instance recycling."
continue # try again next loop
fi
pwst_msg "Executing setup script."
# Run setup script in background b/c it takes ~10m to complete.
# N/B: This drops .setup.started and eventually (hopefully) .setup.done
if ! $SSH ec2-user@$pub_dns \
env POOLTOKEN=$POOLTOKEN \
bash -c "'/var/tmp/setup.sh $DH_REQ_TAG:\ $DH_REQ_VAL' </dev/null >>setup.log 2>&1 &"; then
# This is critical, no easy way to determine what broke.
force_term "Failed to start background setup script"
continue
fi
msg "Setup script started."
set_pw_status setup started
# No sense in incrementing if there was a failure running setup
# shellcheck disable=SC2199
if [[ "$@" =~ --force ]]; then
runtime_hours_reduction=$((runtime_hours_reduction + CREATE_STAGGER_HOURS))
fi
# Let setup run in the background
continue
fi
# Setup started in previous loop. Set to epoch on error.
since_timestamp=$($SSH ec2-user@$pub_dns tail -1 .setup.started || echo "@0")
since_epoch=$(date -u -d "$since_timestamp" +%s)
running_seconds=$((now_epoch-since_epoch))
# Be helpful to human monitors, show the last few lines from the log to help
# track progress and/or any errors/warnings.
pwst_msg "Setup incomplete; Running for $((running_seconds/60)) minutes (~10 typical)"
msg "setup.log tail: $($SSH ec2-user@$pub_dns tail -n 1 setup.log)"
if [[ $running_seconds -gt $SETUP_MAX_SECONDS ]]; then
force_term "Setup running for ${running_seconds}s, max ${SETUP_MAX_SECONDS}s."
fi
continue
fi
dbg "Instance setup has completed"
set_pw_status setup complete
# Spawned by setup.sh
dbg "Checking service_pool.sh script"
if ! $SSH ec2-user@$pub_dns pgrep -u ec2-user -q -f service_pool.sh; then
# This should not happen at this stage; Nefarious or uncontrolled activity?
force_term "Pool servicing script (service_pool.sh) is not running."
continue
fi
dbg "Checking cirrus listener"
state_fault=0
if ! $SSH ec2-user@$pub_dns pgrep -u "${name}-worker" -q -f "'cirrus worker run'"; then
# Don't try to examine prior state if there was none.
if ((n_pw_total)); then
for _pwentry in "${_pwstate[@]}"; do
read -r _name _setup_state _listener_state _tasks _taskf _junk <<<"$_pwentry"
dbg "Examining pw_state.txt entry '$_name' with listener state '$_listener_state'"
if [[ "$_name" == "$name" ]] && [[ "$_listener_state" != "alive" ]]; then
# service_pool.sh did not restart listener since last loop
# and node is not in maintenance mode (PWPoolReady == 'true')
force_term "Pool listener '$_listener_state' state fault."
state_fault=1
break
fi
done
fi
# The instance is in the process of shutting-down/terminating, move on to next instance.
if ((state_fault)); then
continue
fi
# Previous state didn't exist, or listener status was 'alive'.
# Process may have simply crashed, allow service_pool.sh time to restart it.
pwst_warn "Cirrus worker listener process NOT running, will recheck again $(ctx 0)."
# service_pool.sh should catch this and restart the listener. If not, the next time
# through this loop will force_term() the instance.
set_pw_status listener dead # service_pool.sh should restart listener
continue
else
set_pw_status listener alive
fi
dbg "Checking worker log"
logpath="/private/tmp/${name}-worker.log" # set in setup.sh
if ! $SSH ec2-user@$pub_dns cat "'$logpath'" &> "$logoutput"; then
# The "${name}-worker" user has write access to this log
force_term "Missing worker log $logpath."
continue
fi
dbg "Checking worker registration"
# First lines of log should always match this
if ! head -10 "$logoutput" | grep -q 'worker successfully registered'; then
# This could signal log manipulation by worker user, or it could be harmless.
pwst_warn "Missing registration log entry"
fi
# The CI user has write-access to this log file on the instance,
# make this known to humans in case they care.
n_started_tasks=$(grep -Ei 'started task [0-9]+' "$logoutput" | wc -l) || true
n_finished_tasks=$(grep -Ei 'task [0-9]+ completed' "$logoutput" | wc -l) || true
set_pw_status tasks $n_started_tasks
set_pw_status taskf $n_finished_tasks
msg "Apparent tasks started/finished/running: $n_started_tasks $n_finished_tasks $((n_started_tasks-n_finished_tasks)) (max $PW_MAX_TASKS)"
dbg "Checking apparent task limit"
# N/B: This is only enforced based on the _previous_ run of this script worker-count.
# Doing this on the _current_ alive worker count would add a lot of complexity.
if [[ "$n_finished_tasks" -gt $PW_MAX_TASKS ]] && [[ $n_pw_total -gt $PW_MIN_ALIVE ]]; then
# N/B: Termination based on _finished_ tasks, so if a task happens to be currently running
# it will very likely have _just_ started in the last few seconds. Cirrus will retry
# automatically on another worker.
force_term "Instance exceeded $PW_MAX_TASKS apparent tasks."
elif [[ $n_pw_total -le $PW_MIN_ALIVE ]]; then
pwst_warn "Not enforcing max-tasks limit, only $n_pw_total workers online last run."
fi
done
_I=""
msg " "
msg "Processing all persistent worker states."
for _dhentry in "${_dhstate[@]}"; do
read -r name otherstuff<<<"$_dhentry"
_f1=$name
_f2=$(<$TEMPDIR/${name}.setup)
_f3=$(<$TEMPDIR/${name}.listener)
_f4=$(<$TEMPDIR/${name}.tasks)
_f5=$(<$TEMPDIR/${name}.taskf)
_f6=$(<$TEMPDIR/${name}.comment)
[[ -z "$_f6" ]] || _f6=" # $_f6"
printf '%s %s %s %s %s%s\n' \
"$_f1" "$_f2" "$_f3" "$_f4" "$_f5" "$_f6" >> "$TEMPDIR/$(basename $PWSTATE)"
done
dbg "Creating/updating state file"
if [[ -r "$PWSTATE" ]]; then
cp "$PWSTATE" "${PWSTATE}~"
fi
mv "$TEMPDIR/$(basename $PWSTATE)" "$PWSTATE"

32
mac_pw_pool/Utilization.gnuplot

@ -0,0 +1,32 @@
# Intended to be run like: `gnuplot -p -c Utilization.gnuplot`
# Requires a file named `utilization.csv` produced by commands
# in `Cron.sh`.
#
# Format Ref: http://gnuplot.info/docs_5.5/Overview.html
set terminal png enhanced rounded size 1400,800 nocrop
set output 'html/utilization.png'
set title "Persistent Workers & Utilization"
set xdata time
set timefmt "%Y-%m-%dT%H:%M:%S+00:00"
set xtics nomirror rotate timedate
set xlabel "time/date"
set xrange [(system("date -u -Iseconds -d '26 hours ago'")):(system("date -u -Iseconds"))]
set ylabel "Workers Online"
set ytics border nomirror numeric
# Not practical to lookup $DH_PFX from pw_lib.sh
set yrange [0:(system("grep -E '^[a-zA-Z0-9]+-[0-9]' dh_status.txt | wc -l") * 1.5)]
set y2label "Worker Utilization"
set y2tics border nomirror numeric
set y2range [0:100]
set datafile separator comma
set grid
plot 'utilization.csv' using 1:2 axis x1y1 title "Workers" pt 7 ps 2, \
'' using 1:((($3-$4)/$2)*100) axis x1y2 title "Utilization" with lines lw 2

50
mac_pw_pool/ci_env.sh Executable file

@ -0,0 +1,50 @@
#!/bin/bash
# This script drops the caller into a bash shell inside an environment
# substantially similar to a Cirrus-CI task running on this host.
# The envars below may require adjustment to better fit them to
# current/ongoing development in podman's .cirrus.yml
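#
# Typical use (after SetupInstances.sh has copied this script into the
# ec2-user home directory): ssh to the instance and simply run ./ci_env.sh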
set -eo pipefail
# Not running as the pool worker user
if [[ "$USER" == "ec2-user" ]]; then
PWINST=$(curl -sSLf http://instance-data/latest/meta-data/tags/instance/Name)
PWUSER=$PWINST-worker
if [[ ! -d "/Users/$PWUSER" ]]; then
echo "Warnin: Instance hasn't been setup. Assuming caller will tend to this."
sudo sysadminctl -addUser $PWUSER
fi
sudo install -o $PWUSER "${BASH_SOURCE[0]}" "/Users/$PWUSER/"
exec sudo su -c "/Users/$PWUSER/$(basename ${BASH_SOURCE[0]})" - $PWUSER
fi
# Export all CI-critical envars defined below
set -a
CIRRUS_SHELL="/bin/bash"
CIRRUS_TASK_ID="0123456789"
CIRRUS_WORKING_DIR="$HOME/ci/task-${CIRRUS_TASK_ID}"
GOPATH="$CIRRUS_WORKING_DIR/.go"
GOCACHE="$CIRRUS_WORKING_DIR/.go/cache"
GOENV="$CIRRUS_WORKING_DIR/.go/support"
CONTAINERS_MACHINE_PROVIDER="applehv"
MACHINE_IMAGE="https://fedorapeople.org/groups/podman/testing/applehv/arm64/fedora-coreos-38.20230925.dev.0-applehv.aarch64.raw.gz"
GINKGO_TAGS="remote exclude_graphdriver_btrfs btrfs_noversion exclude_graphdriver_devicemapper containers_image_openpgp remote"
DEBUG_MACHINE="1"
ORIGINAL_HOME="$HOME"
HOME="$HOME/ci"
TMPDIR="/private/tmp/ci"
mkdir -p "$TMPDIR" "$CIRRUS_WORKING_DIR"
# Drop caller into the CI-like environment
cd "$CIRRUS_WORKING_DIR"
bash -il

20
mac_pw_pool/html/index.html

@ -0,0 +1,20 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Cirrus-CI Persistent Workers</title>
</head>
<body>
<center>
<a href="https://cirrus-ci.com/pool/1cf8c7f7d7db0b56aecd89759721d2e710778c523a8c91c7c3aaee5b15b48d05">
<img src="utilization.png">
</a>
<p>
<h3>
<a href="https://docs.google.com/document/d/1PX6UyqDDq8S72Ko9qe_K3zoV2XZNRQjGxPiWEkFmQQ4/edit">
Documentation
</a>
</h3>
</center>
</body>
</html>

69
mac_pw_pool/nightly_maintenance.sh

@ -0,0 +1,69 @@
#!/bin/bash
set -euo pipefail
cd $(dirname "${BASH_SOURCE[0]}")
SCRIPTNAME="$(basename ${BASH_SOURCE[0]})"
WEB_IMG="docker.io/library/nginx:latest"
CRONLOG="Cron.log"
CRONSCRIPT="Cron.sh"
KEEP_LINES=10000
REFRESH_REPO_EVERY=7 # days
# Do not set these manually; they are needed to control script execution.
_CNTNAME=pw_pool_web
_FLOCKER="${_FLOCKER:-notlocked}"
_RESTARTED_SCRIPT="${_RESTARTED_SCRIPT:-0}"
if [[ ! -r "$CRONLOG" ]] || [[ ! -r "$CRONSCRIPT" ]] || [[ ! -d "../.git" ]]; then
echo "ERROR: $SCRIPTNAME not executing from correct directory" >> /dev/stderr
exit 1
fi
relaunch_web_container() {
# Assume code change or image update, restart container.
(
# Prevent podman and/or sub-processes from inheriting the lock FD.
# This would deadlock all future runs of this script or Cron.sh
# Can't use `flock --close ...` here because it "hangs" in this context.
for fd_nr in $(/bin/ls /proc/self/fd/); do
[[ $fd_nr -ge 3 ]] || \
continue
# Bash doesn't allow direct substitution of the FD number
eval "exec $fd_nr>&-"
done
set -x
podman run --replace --name "$_CNTNAME" -d --rm --pull=newer -p 8080:80 \
-v $HOME/devel/automation/mac_pw_pool/html:/usr/share/nginx/html:ro,Z \
$WEB_IMG
)
echo "$SCRIPTNAME restarted pw_poolweb container"
}
# Don't perform maintenance while $CRONSCRIPT is running
[[ "${_FLOCKER}" != "$CRONSCRIPT" ]] && exec env _FLOCKER="$CRONSCRIPT" flock -e -w 300 "$CRONSCRIPT" "$0" "$@" || :
echo "$SCRIPTNAME running at $(date -u -Iseconds)"
if ! ((_RESTARTED_SCRIPT)); then
today=$(date -u +%d)
if ((today%REFRESH_REPO_EVERY)); then
git remote update && git reset --hard origin/main
# maintain the same flock
echo "$SCRIPTNAME updatedd code after $REFRESH_REPO_EVERY days, restarting script..."
env _RESTARTED_SCRIPT=1 _FLOCKER=$_FLOCKER "$0" "$@"
exit $? # all done
fi
fi
tail -n $KEEP_LINES $CRONLOG > ${CRONLOG}.tmp && mv ${CRONLOG}.tmp $CRONLOG
echo "$SCRIPTNAME rotated log"
# Always restart web-container when code changes, otherwise only if required
if ((_RESTARTED_SCRIPT)); then
relaunch_web_container
else
podman container exists "$_CNTNAME" || relaunch_web_container
fi

126
mac_pw_pool/pw_lib.sh Normal file

@ -0,0 +1,126 @@
# This library is intended to be sourced by other scripts inside this
# directory. All other usage contexts may lead to unintended outcomes.
# N/B: Dedicated Host names and instance names are assumed to be identical,
# only the IDs differ. Assumes the sourcing script defines a `dbg()`
# function.
SCRIPT_FILENAME=$(basename "$0") # N/B: Caller's arg0, not this library file path.
SCRIPT_DIRPATH=$(dirname "$0")
LIB_DIRPATH=$(dirname "${BASH_SOURCE[0]}")
REPO_DIRPATH=$(realpath "$LIB_DIRPATH/../")
TEMPDIR=$(mktemp -d -p '' "${SCRIPT_FILENAME}_XXXXX.tmp")
trap "rm -rf '$TEMPDIR'" EXIT
# Dedicated host name prefix; Actual name will have a "-<X>" (number) appended.
# N/B: ${DH_PFX}-<X> _MUST_ match dedicated host names as listed in dh_status.txt
# using the regex ^[a-zA-Z0-9]+-[0-9] (see Utilization.gnuplot)
DH_PFX="MacM1"
# Only manage dedicated hosts with the following tag & value
DH_REQ_TAG="purpose"
DH_REQ_VAL="prod"
# Path to file recording the most recent state of each dedicated host.
# Format is simply one line per dedicated host, with its name, instance id, start
# date/time separated by a space. Exceptional conditions are recorded as comments
# with the name and details. File is refreshed/overwritten each time script runs
# without any fatal/uncaught command-errors. Intended for reference by humans
# and/or other tooling.
DHSTATE="${DHSTATE:-$LIB_DIRPATH/dh_status.txt}"
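# Illustrative example of a $DHSTATE line (hypothetical name/ID; the third
# field is the LaunchTime value returned by the aws CLI):
#   MacM1-3 i-0123456789abcdef0 <LaunchTime>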
# Similar to $DHSTATE but records the status of each instance. Format is
# instance name, setup status, listener status, # started tasks, # finished tasks,
# or the word 'error' indicating a fault accessing the remote worker logfile.
# Optionally, there may be a final comment field, beginning with a # and text
# suggesting where there may be a fault.
# Possible status field values are as follows:
# setup - started, complete, disabled, error
# listener - alive, dead, disabled, error
PWSTATE="${PWSTATE:-$LIB_DIRPATH/pw_status.txt}"
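# Illustrative example of a $PWSTATE line (hypothetical values):
#   MacM1-3 complete alive 5 5 # optional comment text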
# At maximum possible creation-speed, there's approx. 2 hours of time between
# an instance going down, until another can be up and running again. Since
# instances are all shutdown/terminated on pre-set timers, it would hurt
# pool availability if multiple instances all went down at the same time.
# Therefore, host and instance creations will be staggered according
# to this interval.
CREATE_STAGGER_HOURS=2
# Instance shutdown controls (assumes terminate-on-shutdown behavior)
PW_MAX_HOURS=24 # Since successful configuration
PW_MAX_TASKS=24 # Logged by listener (N/B: Log can be manipulated by tasks!)
PW_MIN_ALIVE=3 # Bypass enforcement of $PW_MAX_TASKS if <= alive/operating workers
# How long to wait for setup.sh to finish running (drop a .setup.done file)
# before forcibly terminating.
SETUP_MAX_SECONDS=2400 # Typical time ~10 minutes, use 2x safety-factor.
# Name of launch template. Current/default version will be used.
# https://us-east-1.console.aws.amazon.com/ec2/home?region=us-east-1#LaunchTemplates:
TEMPLATE_NAME="${TEMPLATE_NAME:-Cirrus${DH_PFX}PWinstance}"
# Path to scripts to copy/execute on Darwin instances
SETUP_SCRIPT="$LIB_DIRPATH/setup.sh"
SPOOL_SCRIPT="$LIB_DIRPATH/service_pool.sh"
SHDWN_SCRIPT="$LIB_DIRPATH/shutdown.sh"
CIENV_SCRIPT="$LIB_DIRPATH/ci_env.sh"
# Set to 1 to enable debugging
X_DEBUG="${X_DEBUG:-0}"
# AWS CLI command and general args
AWS="aws --no-paginate --output=json --color=off --no-cli-pager --no-cli-auto-prompt"
# Common ssh/scp arguments
SSH_ARGS="-o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o CheckHostIP=no -F /dev/null -o LogLevel=ERROR -o ConnectTimeout=13"
# ssh/scp commands to run w/ arguments
SSH="${SSH:-ssh -n $SSH_ARGS}" # N/B: default nulls stdin
SCP="${SCP:-scp -q $SSH_ARGS}"
# Indentation to prefix msg/warn/die messages with to assist humans understanding context.
_I="${_I:-}"
# Print details $1 (defaults to 1) calls above the caller in the stack.
# usage e.x. $(ctx 0) - print details about current function
# $(ctx) - print details about current function's caller
# $(ctx 2) - print details about current functions's caller's caller.
ctx() {
local above level
above=${1:-1}
level=$((1+$above))
script=$(basename ${BASH_SOURCE[$level]})
echo "($script:${FUNCNAME[$level]}():${BASH_LINENO[$above]})"
}
msg() { echo "${_I}${1:-No text message provided}"; }
warn() { echo "${1:-No warning message provided}" | awk -e '{print "'"${_I}"'WARNING: "$0}' >> /dev/stderr; }
die() { echo "${1:-No error message provided}" | awk -e '{print "'"${_I}"'ERROR: "$0}' >> /dev/stderr; exit 1; }
dbg() {
if ((X_DEBUG)); then
msg "${1:-No debug message provided} $(ctx 1)" | awk -e '{print "'"${_I}"'DEBUG: "$0}' >> /dev/stderr
fi
}
# Obtain a JSON string value by running the provided query filter (arg 1) on
# JSON file (arg 2). Return non-zero on jq error (1), or if value is empty
# or null (2). Otherwise print value and return 0.
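# For example (as used in LaunchInstances.sh):
#   hoststate=$(json_query '.Hosts?[0]?.State?' "$hostoutput")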
jq_errf="$TEMPDIR/jq_error.output"
json_query() {
local value
local indent=" "
dbg "jq filter $1
$indent on $(basename $2) $(ctx)"
if ! value=$(jq -r "$1" "$2" 2>"$jq_errf"); then
dbg "$indent error: $(<$jq_errf)"
return 1
fi
if [[ -z "$value" ]] || [[ "$value" == "null" ]]; then
dbg "$indent result: Empty or null"
return 2
fi
dbg "$indent result: '$value'"
echo "$value"
return 0
}

79
mac_pw_pool/service_pool.sh

@ -0,0 +1,79 @@
#!/bin/bash
# Launch Cirrus-CI PW Pool listener & manager process.
# Intended to be called once from setup.sh on M1 Macs.
# Expects the $PWCFG, $PWUSER, $PWREADYURL and $PWREADY env. vars.
# to be set/exported (normally done by setup.sh before spawning
# this script).
set -o pipefail
msg() { echo "##### ${1:-No message message provided}"; }
die() { echo "ERROR: ${1:-No error message provided}"; exit 1; }
for varname in PWCFG PWUSER PWREADYURL PWREADY; do
varval="${!varname}"
[[ -n "$varval" ]] || \
die "Env. var. \$$varname is unset/empty."
done
[[ "$USER" == "ec2-user" ]] || \
die "Expecting to execute as 'ec2-user'."
# All operations assume this CWD
cd $HOME
# For whatever reason, when this script is run through ssh, the default
# environment isn't loaded automatically.
. /etc/profile
# This can be leftover under certain conditions
# shellcheck disable=SC2154
sudo pkill -u $PWUSER -f "cirrus worker run" || true
# Configuring a launchd agent to run the worker process is a major
# PITA and seems to require rebooting the instance. Work around
# this with a really hacky loop masquerading as a system service.
# envar exported to us
# shellcheck disable=SC2154
while [[ -r $PWCFG ]] && [[ "$PWREADY" == "true" ]]; do # Remove file or change tag to shutdown this "service"
# The $PWUSER has access to kill its own listener, or it could crash.
if ! pgrep -u $PWUSER -f -q "cirrus worker run"; then
# FIXME: CI Tasks will execute as $PWUSER and ordinarily would have
# read access to $PWCFG file containing $POOLTOKEN. While not
# disastrous, it's desirable to not leak potentially sensitive
# values. Work around this by keeping the file unreadable by
# $PWUSER except for a brief period while starting up.
sudo chmod 0644 $PWCFG
msg "$(date -u -Iseconds) Starting PW pool listener as $PWUSER"
# This is intended for user's setup.log
# shellcheck disable=SC2024
sudo su -l $PWUSER -c "/opt/homebrew/bin/cirrus worker run --file $PWCFG &" >>setup.log 2>&1 &
sleep 10 # eek!
sudo chmod 0600 $PWCFG
fi
# This can fail on occasion for some reason
# envar exported to us
# shellcheck disable=SC2154
if ! PWREADY=$(curl -sSLf $PWREADYURL); then
PWREADY="recheck"
fi
# Avoid re-launch busy-wait
sleep 10
# Second-chance
if [[ "$PWREADY" == "recheck" ]] && ! PWREADY=$(curl -sSLf $PWREADYURL); then
msg "Failed twice to obtain PWPoolReady instance tag. Disabling listener."
rm -f "$PWCFG"
break
fi
done
set +e
msg "Configuration file not readable; PWPoolReady tag '$PWREADY'."
msg "Terminating $PWUSER PW pool listner process"
# N/B: This will _not_ stop the cirrus agent (i.e. a running task)
sudo pkill -u $PWUSER -f "cirrus worker run"

251
mac_pw_pool/setup.sh Normal file

@ -0,0 +1,251 @@
#!/bin/bash
# Setup and launch Cirrus-CI PW Pool node. It must be called
# with the env. var. `$POOLTOKEN` set. It is assumed to be
# running on a fresh AWS EC2 mac2.metal instance as `ec2-user`
# The instance must have both "metadata" and "Allow tags in
# metadata" options enabled. The instance must set the
# "terminate" option for "shutdown behavior".
#
# This script should be called with a single argument string,
# of the label YAML to configure. For example "purpose: prod"
#
# N/B: Under special circumstances, this script (possibly with modifications)
# can be executed more than once. All operations which modify state/config.
# must be wrapped in conditional checks.
set -eo pipefail
GVPROXY_RELEASE_URL="https://github.com/containers/gvisor-tap-vsock/releases/latest/download/gvproxy-darwin"
STARTED_FILE="$HOME/.setup.started"
COMPLETION_FILE="$HOME/.setup.done"
# Ref: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
PWNAME=$(curl -sSLf http://instance-data/latest/meta-data/tags/instance/Name)
PWREADYURL="http://instance-data/latest/meta-data/tags/instance/PWPoolReady"
PWREADY=$(curl -sSLf $PWREADYURL)
PWUSER=$PWNAME-worker
rm -f /private/tmp/*_cfg_*
PWCFG=$(mktemp /private/tmp/${PWNAME}_cfg_XXXXXXXX)
PWLOG="/private/tmp/${PWUSER}.log"
msg() { echo "##### ${1:-No message message provided}"; }
die() { echo "ERROR: ${1:-No error message provided}"; exit 1; }
die_if_empty() {
local tagname
tagname="$1"
[[ -n "$tagname" ]] || \
die "Unexpectedly empty instance '$tagname' tag, is metadata tag access enabled?"
}
[[ -n "$POOLTOKEN" ]] || \
die "Must be called with non-empty \$POOLTOKEN set."
[[ "$#" -ge 1 ]] || \
die "Must be called with a 'label: value' string argument"
echo "$1" | grep -i -q -E '^[a-z0-9]+:[ ]?[a-z0-9]+' || \
die "First argument must be a string in the format 'name: value'. Not: '$1'"
msg "Configuring pool worker for '$1' tasks."
[[ ! -r "$COMPLETION_FILE" ]] || \
die "Appears setup script already ran at '$(cat $COMPLETION_FILE)'"
[[ "$USER" == "ec2-user" ]] || \
die "Expecting to execute as 'ec2-user'."
die_if_empty PWNAME
die_if_empty PWREADY
[[ "$PWREADY" == "true" ]] || \
die "Found PWPoolReady tag not set 'true', aborting setup."
# All operations assume this CWD
cd $HOME
# Checked by instance launch script to monitor setup status & progress
msg $(date -u -Iseconds | tee "$STARTED_FILE")
msg "Configuring paths"
grep -q homebrew /etc/paths || \
echo -e "/opt/homebrew/bin\n/opt/homebrew/opt/coreutils/libexec/gnubin\n$(cat /etc/paths)" \
| sudo tee /etc/paths > /dev/null
# For whatever reason, when this script is run through ssh, the default
# environment isn't loaded automatically.
. /etc/profile
msg "Installing podman-machine, testing, and CI deps. (~5-10m install time)"
if [[ ! -x /usr/local/bin/gvproxy ]]; then
declare -a brew_taps
declare -a brew_formulas
brew_taps=(
# Required to use upstream vfkit
cfergeau/crc
# Required to use upstream krunkit
slp/krunkit
)
brew_formulas=(
# Necessary for worker-pool participation + task execution
cirruslabs/cli/cirrus
# Necessary for building podman|buildah|skopeo
go go-md2man coreutils pkg-config pstree gpgme
# Necessary to compress the podman repo tar
zstd
# Necessary for testing podman-machine
vfkit
# Necessary for podman-machine libkrun CI testing
krunkit
)
# msg() includes a ##### prefix, ensure this text is simply
# associated with the prior msg() output.
echo " Adding taps[] ${brew_taps[*]}"
echo " before installing formulas[] ${brew_formulas[*]}"
for brew_tap in "${brew_taps[@]}"; do
brew tap $brew_tap
done
brew install "${brew_formulas[@]}"
# Normally gvproxy is installed along with "podman" brew. CI Tasks
# on this instance will be running from source builds, so gvproxy must
# be installed from the upstream release.
curl -sSLfO "$GVPROXY_RELEASE_URL"
sudo install -o root -g staff -m 0755 gvproxy-darwin /usr/local/bin/gvproxy
rm gvproxy-darwin
fi
msg "Setting up hostname"
# Make host easier to identify from CI logs (default is some
# random internal EC2 dns name).
if [[ "$(uname -n)" != "$PWNAME" ]]; then
sudo hostname $PWNAME
sudo scutil --set HostName $PWNAME
sudo scutil --set ComputerName $PWNAME
fi
msg "Adding/Configuring PW User"
if ! id "$PWUSER" &> /dev/null; then
sudo sysadminctl -addUser $PWUSER
fi
msg "Setting up local storage volume for PW User"
if ! mount | grep -q "$PWUSER"; then
# User can't remove own pre-existing homedir crap during cleanup
sudo rm -rf /Users/$PWUSER/*
sudo rm -rf /Users/$PWUSER/.??*
# This is really clunky, but seems the best that Apple Inc. can support.
# Show what is being worked with to assist debugging
diskutil list virtual
local_storage_volume=$(diskutil list virtual | \
grep -m 1 -B 5 "InternalDisk" | \
grep -m 1 -E '^/dev/disk[0-9].+synthesized' | \
awk '{print $1}')
(
set -x
# Fail hard if $local_storage_volume is invalid, otherwise show details to assist debugging
diskutil info "$local_storage_volume"
# CI $TEMPDIR - critical for podman-machine storage performance
ci_tempdir="/private/tmp/ci"
mkdir -p "$ci_tempdir"
sudo diskutil apfs addVolume "$local_storage_volume" APFS "ci_tempdir" -mountpoint "$ci_tempdir"
sudo chown $PWUSER:staff "$ci_tempdir"
sudo chmod 1770 "$ci_tempdir"
# CI-user's $HOME - not critical but might as well make it fast while we're
# adding filesystems anyway.
ci_homedir="/Users/$PWUSER"
sudo diskutil apfs addVolume "$local_storage_volume" APFS "ci_homedir" -mountpoint "$ci_homedir"
sudo chown $PWUSER:staff "$ci_homedir"
sudo chmod 0750 "$ci_homedir"
df -h
)
# Disk indexing is useless on a CI system, and creates un-deletable
# files wherever $TEMPDIR happens to be pointing. Ignore any
# individual volume failures that have an unknown state.
sudo mdutil -a -i off || true
# User likely has pre-existing system processes trying to use
# the (now) over-mounted home directory.
sudo pkill -u $PWUSER || true
fi
msg "Setting up Rosetta"
# Rosetta 2 enables arm64 Mac to use Intel Apps. Only install if not present.
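# The arch(1) probe below fails if Rosetta cannot yet run x86_64 binaries.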
if ! arch -arch x86_64 /usr/bin/uname -m; then
sudo softwareupdate --install-rosetta --agree-to-license
echo -n "Confirming rosetta is functional"
if ! arch -arch x86_64 /usr/bin/uname -m; then
die "Rosetta installed but non-functional, see setup log for details."
fi
fi
msg "Restricting appstore/software install to admin-only"
# Abuse the symlink's existence as a condition for running `sudo defaults write ...`
# since checking the state of those values is complex.
if [[ ! -L /usr/local/bin/softwareupdate ]]; then
# Ref: https://developer.apple.com/documentation/devicemanagement/softwareupdate
sudo defaults write com.apple.SoftwareUpdate restrict-software-update-require-admin-to-install -bool true
sudo defaults write com.apple.appstore restrict-store-require-admin-to-install -bool true
# Unfortunately, interacting with the rosetta installer seems to bypass both of the
# above settings, even when run as a regular non-admin user. However, it's
# also desirable to limit use of the utility in a CI environment generally.
# Since /usr/sbin is read-only, but /usr/local is read-write and appears first
# in $PATH, deploy a really fragile hack as an imperfect workaround.
sudo ln -sf /usr/bin/false /usr/local/bin/softwareupdate
fi
# FIXME: Semi-secret POOLTOKEN value should not be in this file.
# ref: https://github.com/cirruslabs/cirrus-cli/discussions/662
cat << EOF | sudo tee $PWCFG > /dev/null
---
name: "$PWNAME"
token: "$POOLTOKEN"
labels:
$1
log:
file: "${PWLOG}"
security:
allowed-isolations:
none: {}
EOF
sudo chown ${USER}:staff $PWCFG
# Monitored by instance launch script
echo "# Log created $(date -u -Iseconds) - do not manually remove or modify!" > $PWLOG
sudo chown ${USER}:staff $PWLOG
sudo chmod g+rw $PWLOG
if ! pgrep -q -f service_pool.sh; then
# Allow service_pool.sh access to these values
export PWCFG
export PWUSER
export PWREADYURL
export PWREADY
msg "Spawning listener supervisor process."
/var/tmp/service_pool.sh </dev/null >>setup.log 2>&1 &
disown %-1
else
msg "Warning: Listener supervisor already running"
fi
# Monitored by instance launch script
date -u -Iseconds >> "$COMPLETION_FILE"
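For orientation, here is a minimal sketch of how the setup script above might be driven remotely. It only illustrates the contract the script enforces (a non-empty $POOLTOKEN, a single 'label: value' argument, execution as ec2-user); the instance address, script path, token value, and label are assumptions, not taken from the actual launch tooling.
# Hypothetical invocation; address, path, token, and label are placeholders.
INSTANCE_ADDR='198.51.100.10'
POOLTOKEN='example-pool-token'
ssh "ec2-user@${INSTANCE_ADDR}" \
    "export POOLTOKEN='${POOLTOKEN}'; bash /var/tmp/setup.sh 'purpose: prod'"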

mac_pw_pool/shutdown.sh Normal file
@ -0,0 +1,38 @@
#!/bin/bash
# Script intended to be called by automation only.
# Should never be called from any other context.
# Log on the off-chance it somehow helps somebody debug something one day
(
echo "Starting ${BASH_SOURCE[0]} at $(date -u -Iseconds)"
PWNAME=$(uname -n)
PWUSER=$PWNAME-worker
if id -u "$PWUSER" &> /dev/null; then
# Try to not reboot while a CI task is running.
# Cirrus-CI imposes a hard-timeout of 2-hours.
now=$(date -u +%s)
timeout_at=$((now+60*60*2))
echo "Waiting up to 2 hours for any pre-existing cirrus agent (i.e. running task)"
while pgrep -u $PWUSER -q -f "cirrus-ci-agent"; do
if [[ $(date -u +%s) -gt $timeout_at ]]; then
echo "Timeout waiting for cirrus-ci-agent to terminate"
break
fi
echo "Found cirrus-ci-agent still running, waiting..."
sleep 60
done
fi
echo "Initiating shutdown at $(date -u -Iseconds)"
# This script is run with a sleep in front of it
# as a workaround for darwin's shutdown-command
# terminal weirdness.
sudo shutdown -h now "Automatic instance recycling"
) < /dev/null >> setup.log 2>&1
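Given the comment above about running the script with a sleep in front of it, the caller presumably detaches it roughly as sketched below; the exact command, script path, and delay are assumptions rather than something taken from this repository.
# Assumed shape of the automation's call, per the sleep workaround noted above;
# the address, path, and 5-second delay are illustrative.
INSTANCE_ADDR='198.51.100.10'
ssh "ec2-user@${INSTANCE_ADDR}" \
    'nohup bash -c "sleep 5; bash /var/tmp/shutdown.sh" >/dev/null 2>&1 &'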

renovate/defaults.json5

@ -4,21 +4,20 @@ Validate this file before commiting with (from repository root):
podman run -it \
-v ./renovate/defaults.json5:/usr/src/app/renovate.json5:z \
docker.io/renovate/renovate:latest \
ghcr.io/renovatebot/renovate:latest \
renovate-config-validator
and/or use the pre-commit hook: https://github.com/renovatebot/pre-commit-hooks
*/
{
"$schema": "https://docs.renovatebot.com/renovate-schema.json",
"description": "This is a basic preset intended\
for reuse to reduce the amount of boiler-plate\
configuration that otherwise would need to be\
duplicated. It should be referenced from other\
repositories renovate config under the 'extends'\
section as:\
github>containers/automation//renovate/defaults.json5\
section as: github>containers/automation//renovate/defaults.json5\
(optionally with a '#X.Y.Z' version-tag suffix).",
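As the description above says, downstream repositories pull this preset in through their 'extends' setting. Below is a minimal sketch of such a consumer config, written as a shell heredoc; the config path and '#v1.2.3' tag are placeholders.
# Hypothetical downstream renovate config extending this preset.
mkdir -p .github
cat > .github/renovate.json5 <<'EOF'
{
  "extends": ["github>containers/automation//renovate/defaults.json5#v1.2.3"]
}
EOF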
/*************************************************
@ -34,7 +33,7 @@ Validate this file before commiting with (from repository root):
":gitSignOff",
// Always rebase dep. update PRs from `main` when PR is stale
":rebaseStalePrs",
":rebaseStalePrs"
],
// The default setting is ambiguous, explicitly base schedules on UTC
@ -49,6 +48,7 @@ Validate this file before commiting with (from repository root):
// Default setting is an "empty" schedule. Explicitly set this
// such that security-alert PRs may be opened immediately.
"vulnerabilityAlerts": {
// Distinguish PRs from regular dependency updates
"labels": ["dependencies", "security"],
// Force-enable renovate management of deps. which are otherwise
@ -57,16 +57,13 @@ Validate this file before commiting with (from repository root):
// (last-match wins rule).
"enabled": true,
// Indirect dependencies are disabled by default for the `gomod` manager.
// However, for vulnerability updates we may want them even if they break
// during renovate's automatic top-level `go mod tidy`.
"packageRules": [
{
"matchManagers": ["gomod"],
"matchDepTypes": ["indirect"],
"enabled": true,
}
]
// Note: As of 2024-06-25 indirect golang dependency handling is
// broken in Renovate, and disabled by default. This affects
// vulnerabilityAlerts in that if the dep is 'indirect' no PR
// will ever open; it must be handled manually. Attempting
// to enable indirect deps (for golang) in this section will
// not work; it will always be overridden by the global golang
// indirect dep. setting.
},
// On a busy repo, automatic-rebasing will swamp the CI system.
@ -78,8 +75,12 @@ Validate this file before commiting with (from repository root):
***** Manager-specific configuration options *****
**************************************************/
"regexManagers": [
"customManagers": [
// Track the latest CI VM images by tag on the containers/automation_images
// repo. Propose updates when newer tag available compared to what is
// referenced in a repo's .cirrus.yml file.
{
"customType": "regex",
"fileMatch": "^.cirrus.yml$",
// Expected version format: c<automation_images IMG_SFX value>
// For example `c20230120t152650z-f37f36u2204`
@ -87,28 +88,49 @@ Validate this file before commiting with (from repository root):
"depNameTemplate": "containers/automation_images",
"datasourceTemplate": "github-tags",
"versioningTemplate": "loose",
"autoReplaceStringTemplate": "c{{{newVersion}}}",
"autoReplaceStringTemplate": "c{{{newVersion}}}"
},
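For context, the manager above targets version strings of that c<IMG_SFX> form inside a repository's .cirrus.yml. The fragment below shows the kind of line it rewrites, wrapped in a shell heredoc; the variable name is an assumption and the value is copied from the comment's own example.
# Illustrative .cirrus.yml fragment of the sort the manager above updates.
cat <<'EOF'
env:
    IMAGE_SUFFIX: "c20230120t152650z-f37f36u2204"
EOF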
// For skopeo and podman, manage the golangci-lint version as
// referenced in their Makefile.
{
"customType": "regex",
"fileMatch": "^Makefile$",
// make ignores whitespace around the value; make renovate do the same.
"matchStrings": ["GOLANGCI_LINT_VERSION\\s+:=\\s+(?<currentValue>.+)\\s*"],
"matchStrings": [
"GOLANGCI_LINT_VERSION\\s+:=\\s+(?<currentValue>.+)\\s*"
],
"depNameTemplate": "golangci/golangci-lint",
"datasourceTemplate": "github-releases",
"versioningTemplate": "semver-coerced",
// Podman's installer script will puke if there's a 'v' prefix, as represented
// in upstream golangci/golangci-lint releases.
"extractVersionTemplate": "v(?<version>.+)",
"extractVersionTemplate": "v(?<version>.+)"
}
],
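To make the Makefile-based manager above concrete, its matchStrings pattern matches assignments of the shape shown below; the grep is only a rough stand-in for Renovate's regex engine, and the version number is made up.
# Rough approximation of the GOLANGCI_LINT_VERSION matcher above.
printf 'GOLANGCI_LINT_VERSION := 1.55.2\n' | \
    grep -E 'GOLANGCI_LINT_VERSION[[:space:]]+:=[[:space:]]+.+'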
/*************************************************
***** Language-specific configuration options ****
**************************************************/
// ***** ATTENTION WARNING CAUTION DANGER ***** //
// Go versions 1.21 and later will AUTO-UPDATE based on _module_
// _requirements_. Ref: https://go.dev/doc/toolchain
// Because the many different projects covered by this config build under
// different distros and distro-versions, golang version consistency
// is desirable across build outputs. In golang 1.21 and later,
// it's possible to pin the version in each project using the
// toolchain go.mod directive. This should be done to prevent
// unwanted auto-updates.
// Ref: Upstream discussion https://github.com/golang/go/issues/65847
"constraints": {"go": "1.23"},
// N/B: LAST MATCHING RULE WINS; match statements are ANDed together.
// https://docs.renovatebot.com/configuration-options/#packagerules
"packageRules": [
/*************************************************
***** Rust-specific configuration options *****
*************************************************/
****** Rust-specific configuration options *******
**************************************************/
{
"matchCategories": ["rust"],
// Update both Cargo.toml and Cargo.lock when possible
@ -124,12 +146,12 @@ Validate this file before commiting with (from repository root):
"rangeStrategy": "bump",
// These packages roll updates far too often, slow them down.
// Ref: https://github.com/containers/netavark/issues/772
"schedule": ["after 1am and before 11am on the first day of the month"],
"schedule": ["after 1am and before 11am on the first day of the month"]
},
/*************************************************
***** Python-specific configuration options *****
*************************************************/
****** Python-specific configuration options *****
**************************************************/
{
"matchCategories": ["python"],
// Preserve (but continue to upgrade) any existing SemVer ranges.
@ -137,23 +159,17 @@ Validate this file before commiting with (from repository root):
},
/*************************************************
***** Golang-specific configuration options *****
*************************************************/
****** Golang-specific configuration options *****
**************************************************/
{
"matchCategories": ["golang"],
// disabled by default, safe to enable since "tidy" enforced by CI.
"postUpdateOptions": ["gomodTidy"],
// In case a version in use is retracted, allow going backwards.
// N/B: This is NOT compatible with pseudo versions, see below.
"rollbackPrs": false,
// Preserve (but continue to upgrade) any existing SemVer ranges.
"rangeStrategy": "replace",
// N/B: LAST MATCHING RULE WINS
// https://docs.renovatebot.com/configuration-options/#packagerules
"rangeStrategy": "replace"
},
// Golang pseudo-version packages will spam with every Commit ID change.
@ -161,7 +177,7 @@ Validate this file before commiting with (from repository root):
{
"matchCategories": ["golang"],
"matchUpdateTypes": ["digest"],
"schedule": ["after 1am and before 11am on the first day of the month"],
"schedule": ["after 1am and before 11am on the first day of the month"]
},
// Package version retraction (https://go.dev/ref/mod#go-mod-file-retract)
@ -174,6 +190,17 @@ Validate this file before commiting with (from repository root):
"allowedVersions": "!/v((1.0.0)|(1.0.1))$/"
},
// Skip updating the go.mod toolchain directive, humans will manage this.
{
"matchCategories": ["golang"],
"matchDepTypes": ["toolchain"],
"enabled": false
},
/*************************************************
************ CI configuration options ************
**************************************************/
// Github-action updates cannot consistently be tested in a PR.
// This is caused by an unfixable architecture-flaw: Execution
// context always depends on trigger, and we (obviously) can't know
@ -190,19 +217,13 @@ Validate this file before commiting with (from repository root):
// example, flagging an important TODO or FIXME item. Or, where CI VM
// images are split across multiple IMG_SFX values that all need to be updated.
{
"matchManagers": ["regex"],
"matchFileNames": [".cirrus.yml"], // full-path exact-match
"matchManagers": ["custom.regex"],
"matchFileNames": [".cirrus.yml"],
"groupName": "CI VM Image",
// Somebody(s) need to check image update PRs as soon as they open.
"reviewers": ["cevich"],
"reviewers": ["Luap99"],
// Don't wait, roll out CI VM Updates immediately
"schedule": ["at any time"],
"schedule": ["at any time"]
},
// Add CI:DOCS prefix to skip unnecessary tests for golangci updates in podman CI.
{
"matchPackageNames": ["golangci/golangci-lint"],
"commitMessagePrefix": "[CI:DOCS]",
},
],
]
}