1) We were calling Wait() on the child process but also calling Wait4(),
which raced and caused an occasional error or panic.
2) While testing (1), I observed occasional hangs. I traced them to a
SIGWINCH that masked a SIGCHLD, causing the hang. Both seem fixed.
Added a manual test script.
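A minimal sketch of both fixes, assuming a Unix system and Go's os/signal package. The names reapAll and runAndReap are hypothetical, not git-sync's code. Children are reaped in exactly one place (Wait4, never cmd.Wait()), and SIGCHLD gets its own buffered channel. signal.Notify drops a signal rather than block when the channel is full, so a shared channel that is already holding a SIGWINCH is one way a SIGCHLD can be lost:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

// reapAll collects every child that has already exited, without blocking.
// Standard signals are not queued, so several exits can arrive as a single
// SIGCHLD; keep calling Wait4 until nothing else is ready.
func reapAll() map[int]syscall.WaitStatus {
	reaped := map[int]syscall.WaitStatus{}
	for {
		var status syscall.WaitStatus
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue
		}
		if err != nil || pid <= 0 {
			// ECHILD: no children left; pid == 0: children exist but none exited.
			return reaped
		}
		reaped[pid] = status
	}
}

// runAndReap starts a command and waits for it using only Wait4, so there
// is a single reaper and no race with cmd.Wait().
func runAndReap(name string, args ...string) (syscall.WaitStatus, error) {
	// Dedicated SIGCHLD channel: nothing else can fill it up.
	sigchld := make(chan os.Signal, 1)
	signal.Notify(sigchld, syscall.SIGCHLD)
	defer signal.Stop(sigchld)
	// Other signals get their own channel so they cannot crowd out SIGCHLD.
	other := make(chan os.Signal, 8)
	signal.Notify(other, syscall.SIGWINCH)
	defer signal.Stop(other)

	cmd := exec.Command(name, args...)
	if err := cmd.Start(); err != nil {
		return 0, err
	}
	child := cmd.Process.Pid
	for {
		select {
		case <-sigchld:
			if st, ok := reapAll()[child]; ok {
				return st, nil
			}
		case s := <-other:
			fmt.Println("ignoring", s)
		}
	}
}

func main() {
	st, err := runAndReap("sh", "-c", "exit 3")
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println("exit status:", st.ExitStatus())
}
```

Looping on WNOHANG after each SIGCHLD, instead of doing one blocking wait per signal, is what makes a coalesced or dropped SIGCHLD harmless: the next signal that does arrive drains every exited child.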
It specifies an HTTP URL that returns a username and password, which
are used to authenticate access to the git repo.
This is mainly used for git repos that accept a dynamic password (for
example, an OAuth bearer token). Because a dynamic password might expire
soon, fetching it is part of the main syncRepo loop.
A typical use case is working with a sidecar called gce-node-auth on GKE,
which uses the GCE service account's OAuth token as the password to
access Cloud Source Repo.
See the repo below for how it works.
https://github.com/cydu-cloud/gce-node-auth/blob/master/git-sync-with-gce-node-auth.yaml
This detects when it is running as PID 1 and becomes an init process.
Specifically, this means handling SIGCHLD and reaping processes
(otherwise they become zombies), and forwarding signals to the "real"
process.
We fork and re-exec ourselves so that we only get *this* SIGCHLD for
orphaned processes (re-parented to PID 1), and not the real events from
running things like git or ssh.
The old code exited on any error seen during the first sync attempt.
This didn't prove useful in practice, so this removes that special case.
This may make git-sync slower to recover after a user fixes a
non-retryable error, since flMaxSyncFailures failures are now needed
before the pod fails. That trade-off may still make sense in practice.
Fixes #161, in a different way than proposed in PR #162.
* Create a git-sync user to run as, with an entry in /etc/passwd and a
writable home directory.
* Remove our own validation of key perms; let SSH do that.
* Update docs.