In my refactoring to use `go-git`'s `Tree` objects, I missed an edge case: symlinks get resolved relative to the Git root, but our `Tree` object is rooted at a subdirectory.
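To make the edge case concrete, here is a minimal sketch of resolving a symlink target relative to the subdirectory tree instead of the Git root. It is not the code from this change: `resolveInSubtree` and the example paths are hypothetical, and it only uses the standard library's `path` package.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveInSubtree is a hypothetical helper (not part of bashbrew or go-git).
// linkPath is the symlink's location relative to the subtree root, NOT the
// Git root. The target is resolved against the directory containing the link.
// ok is false when the target is absolute or climbs out of the subtree, in
// which case the link can only be stored verbatim, not followed.
func resolveInSubtree(linkPath, target string) (resolved string, ok bool) {
	if path.IsAbs(target) {
		return "", false
	}
	resolved = path.Join(path.Dir(linkPath), target)
	if resolved == ".." || strings.HasPrefix(resolved, "../") {
		return "", false
	}
	return resolved, true
}

func main() {
	// A link that stays inside the subtree.
	inside, ok := resolveInSubtree("scripts/run", "../entrypoint.sh")
	fmt.Println(inside, ok)

	// A link that points above the subtree root: valid in Git, but not
	// resolvable from a Tree rooted at the subdirectory.
	outside, ok := resolveInSubtree("scripts", "../../scripts")
	fmt.Println(outside, ok)
}
```

The point of the sketch is only that the reference directory matters: the same target string resolves differently depending on whether paths are anchored at the repository root or at the subdirectory.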
This also finally adds `bashbrew context` as an explicit subcommand so that issues with this code are easier to test and debug: we can generate the actual tarball and compare it to previous versions, to versions generated by `git archive`, etc.
As-is, this generates checksums identical to those from 0cde8de57d/sources.sh (L90-L96), by design. We'll wait to do any cache bust there until we implement `Dockerfile`/context filtering:
```console
$ bashbrew cat varnish:stable --format '{{ .TagEntry.GitCommit }} {{ .TagEntry.Directory }}'
0c295b528f28a98650fb2580eab6d34b30b165c4 stable/debian
$ git -C "$BASHBREW_CACHE/git" archive 0c295b528f28a98650fb2580eab6d34b30b165c4:stable/debian/ | ./tar-scrubber | sha256sum
3aef5ac859b23d65dfe5e9f2a47750e9a32852222829cfba762a870c1473fad6
$ bashbrew cat --format '{{ .ArchGitChecksum arch .TagEntry }}' varnish:stable
3aef5ac859b23d65dfe5e9f2a47750e9a32852222829cfba762a870c1473fad6
```
(Choosing `varnish:stable` there because it currently has [some 100% valid dangling symlinks](6b1c6ffedc/stable/debian/scripts) that tripped up my code beautifully 💕)
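For readers unfamiliar with why the checksums can match at all, the following is a rough sketch of the general idea: a tarball with sorted entries and fixed header metadata hashes to the same value regardless of input order, and a dangling symlink is just a header with a `Linkname`. This is an assumption-laden illustration, not the normalization that `tar-scrubber` or `bashbrew` actually performs; the `entry` type and `tarChecksum` function are hypothetical.

```go
package main

import (
	"archive/tar"
	"bytes"
	"crypto/sha256"
	"fmt"
	"sort"
)

// entry is a hypothetical, simplified description of one tar member.
type entry struct {
	Name     string
	Body     string // file contents (ignored for symlinks)
	Linkname string // non-empty marks a symlink; the target need not exist
}

// tarChecksum writes the entries into an in-memory tar in sorted order with
// fixed modes and a zero modification time, then returns the hex SHA-256 of
// the resulting bytes.
func tarChecksum(entries []entry) (string, error) {
	sorted := append([]entry(nil), entries...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	for _, e := range sorted {
		hdr := &tar.Header{Name: e.Name, Format: tar.FormatUSTAR}
		if e.Linkname != "" {
			hdr.Typeflag = tar.TypeSymlink
			hdr.Linkname = e.Linkname
			hdr.Mode = 0o777
		} else {
			hdr.Typeflag = tar.TypeReg
			hdr.Mode = 0o644
			hdr.Size = int64(len(e.Body))
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return "", err
		}
		if hdr.Typeflag == tar.TypeReg {
			if _, err := tw.Write([]byte(e.Body)); err != nil {
				return "", err
			}
		}
	}
	if err := tw.Close(); err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", sha256.Sum256(buf.Bytes())), nil
}

func main() {
	a, err := tarChecksum([]entry{
		{Name: "Dockerfile", Body: "FROM scratch\n"},
		{Name: "scripts", Linkname: "../../scripts"}, // dangling is fine
	})
	if err != nil {
		panic(err)
	}
	b, err := tarChecksum([]entry{
		{Name: "scripts", Linkname: "../../scripts"},
		{Name: "Dockerfile", Body: "FROM scratch\n"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("same checksum regardless of input order:", a == b)
}
```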
From a performance perspective (which was the original reason for looking into and implementing this), running the `meta-scripts/sources.sh` script against `--all` takes ~18.5m on my local system, versus ~4.5m for this new pure-Go implementation.
Since Docker's image store can't represent these, we round-trip them through our self-managed (or external) containerd image store, which also makes pushing more efficient.
In the case of base images (`debian`, `alpine`, `ubuntu`, etc), using a `Dockerfile` as our method of ingestion doesn't really buy us very much. It made sense at the time it was implemented ("all `Dockerfile`, all the time"), but at this point they're all some variation on `FROM scratch \n ADD foo.tar.xz / \n CMD ["/bin/some-shell"]`, and cannot reasonably be "rebuilt" when their base image changes (which is one of the key functions of the official images) since they _are_ the base images in question.
Functionally, consuming a tarball in this way isn't _that_ much different from consuming a raw tarball that's part of, say, an OCI image layout (https://github.com/opencontainers/image-spec/blob/v1.0.2/image-layout.md) -- it's some tarball plus some metadata about what to do with it.
For less trivial images, there's a significant difference (and I'm not proposing to use this for anything beyond simple one-layer base images), but for a single layer this would be basically identical.
As a more specific use case, the Debian `rootfs.tar.xz` files are currently [100% reproducible](https://github.com/debuerreotype/debuerreotype). Unfortunately, some of that gets lost when it gets imported into Docker, and thus it takes some additional effort to get from the Docker-generated rootfs back to the original debuerreotype-generated file.
This adds the ability to consume an OCI image directly, going one step further so that the image digest is fully reproducible as well. That makes it easier to trace a given published image back to the reproducible source generated by the upstream tooling (especially if a given image is also pushed by the maintainer elsewhere).
Here's an example `oci-debian` file I was using for testing this:
Maintainers: Foo (@bar)
GitRepo: https://github.com/tianon/docker-debian-artifacts.git
GitFetch: refs/heads/oci-arm32v5
Architectures: arm32v5
GitCommit: d6ac440e7760b6b16e3d3da6f2b56736b9c10065
Builder: oci-import
File: index.json

Tags: bullseye, bullseye-20221114, 11.5, 11, latest
Directory: bullseye/oci

Tags: bullseye-slim, bullseye-20221114-slim, 11.5-slim, 11-slim
Directory: bullseye/slim/oci
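
As a rough illustration of the "some tarball plus some metadata" point, the sketch below parses an OCI image layout's `index.json` and maps the referenced manifest digest to its blob path. It is not the `oci-import` implementation: the type and function names are hypothetical, the JSON field names and the `blobs/<alg>/<hex>` layout come from the OCI image spec, and the sample digest and size are placeholders, not real values from the repository above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Minimal subsets of the OCI image-spec index and descriptor types.
type platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
	Variant      string `json:"variant,omitempty"`
}

type descriptor struct {
	MediaType string    `json:"mediaType"`
	Digest    string    `json:"digest"`
	Size      int64     `json:"size"`
	Platform  *platform `json:"platform,omitempty"`
}

type index struct {
	SchemaVersion int          `json:"schemaVersion"`
	Manifests     []descriptor `json:"manifests"`
}

// sampleIndex is placeholder data for illustration only.
const sampleIndex = `{
  "schemaVersion": 2,
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:0000000000000000000000000000000000000000000000000000000000000000",
      "size": 1234,
      "platform": {"architecture": "arm", "os": "linux", "variant": "v5"}
    }
  ]
}`

// parseIndex decodes index.json and rejects unexpected schema versions.
func parseIndex(data []byte) (*index, error) {
	var idx index
	if err := json.Unmarshal(data, &idx); err != nil {
		return nil, err
	}
	if idx.SchemaVersion != 2 {
		return nil, fmt.Errorf("unsupported schemaVersion %d", idx.SchemaVersion)
	}
	return &idx, nil
}

// blobPath maps "sha256:abc..." to "blobs/sha256/abc..." inside the layout.
func blobPath(digest string) (string, error) {
	alg, hex, ok := strings.Cut(digest, ":")
	if !ok || alg == "" || hex == "" {
		return "", fmt.Errorf("invalid digest %q", digest)
	}
	return "blobs/" + alg + "/" + hex, nil
}

func main() {
	idx, err := parseIndex([]byte(sampleIndex))
	if err != nil {
		panic(err)
	}
	for _, m := range idx.Manifests {
		p, err := blobPath(m.Digest)
		if err != nil {
			panic(err)
		}
		fmt.Println(m.MediaType, "->", p)
	}
}
```

Because the manifest is addressed by digest, consuming the layout as-is keeps that digest unchanged, which is what makes the reproducibility argument above work.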