This means slightly more typing in "zero-value" cases (`nil` vs `dockerfile.Metadata{}`), but the tradeoff is that it's simpler to use and reason about (and all the struct members are pointer-type map/slice values anyhow, so copying the struct is still pretty cheap).
This also swaps the scanner error handling to return the partially parsed Metadata object alongside the scanner error -- the error already tells us the object isn't fully complete data, so it's fair/fine to return and will likely just be ignored by the caller instead. This also allows us to get to 100% code coverage. 👀
This also updates our "treat `oci-import` just like `FROM scratch`" code to *actually* parse `FROM scratch` so we can't accidentally cause "missing data" bugs there in the future, and I implemented that using `sync.OnceValues` which requires upgrading to Go 1.21, but IMO that's a worthwhile tradeoff (because `sync.OnceValues` makes that code so clean/simple).
This also finally adds `bashbrew context` as an explicit subcommand so that issues with this code are easier to test/debug (so we can generate the actual tarball and compare it to previous versions of it, versions generated by `git archive`, etc).
As-is, this currently generates verbatim identical checksums to 0cde8de57d/sources.sh (L90-L96) (by design). We'll wait to do any cache bust there until we implement `Dockerfile`/context filtering:
```console
$ bashbrew cat varnish:stable --format '{{ .TagEntry.GitCommit }} {{ .TagEntry.Directory }}'
0c295b528f28a98650fb2580eab6d34b30b165c4 stable/debian
$ git -C "$BASHBREW_CACHE/git" archive 0c295b528f28a98650fb2580eab6d34b30b165c4:stable/debian/ | ./tar-scrubber | sha256sum
3aef5ac859b23d65dfe5e9f2a47750e9a32852222829cfba762a870c1473fad6
$ bashbrew cat --format '{{ .ArchGitChecksum arch .TagEntry }}' varnish:stable
3aef5ac859b23d65dfe5e9f2a47750e9a32852222829cfba762a870c1473fad6
```
(Choosing `varnish:stable` there because it currently has [some 100% valid dangling symlinks](6b1c6ffedc/stable/debian/scripts) that tripped up my code beautifully 💕)
From a performance perspective (which was the original reason for looking into / implementing this), running the `meta-scripts/sources.sh` script against `--all` vs this, my local system gets ~18.5m vs ~4.5m (faster being this new pure-Go implementation).
In the case of base images (`debian`, `alpine`, `ubuntu`, etc), using a `Dockerfile` as our method of ingestion doesn't really buy us very much. It made sense at the time it was implemented ("all `Dockerfile`, all the time"), but at this point they're all some variation on `FROM scratch \n ADD foo.tar.xz / \n CMD ["/bin/some-shell"]`, and cannot reasonably be "rebuilt" when their base image changes (which is one of the key functions of the official images) since they _are_ the base images in question.
Functionally, consuming a tarball in this way isn't _that_ much different from consuming a raw tarball that's part of, say, an OCI image layout (https://github.com/opencontainers/image-spec/blob/v1.0.2/image-layout.md) -- it's some tarball plus some metadata about what to do with it.
For less trivial images, there's a significant difference (and I'm not proposing to use this for anything beyond simple one-layer base images), but for a single layer this would be basically identical.
As a more specific use case, the Debian `rootfs.tar.xz` files are currently [100% reproducible](https://github.com/debuerreotype/debuerreotype). Unfortunately, some of that gets lost when it gets imported into Docker, and thus it takes some additional effort to get from the Docker-generated rootfs back to the original debuerreotype-generated file.
This adds the ability to consume an OCI image directly, to go even further and have a 100% fully reproducible image digest as well, which makes it easier to trace a given published image back to the reproducible source generated by the upstream tooling (especially if a given image is also pushed by the maintainer elsewhere).
Here's an example `oci-debian` file I was using for testing this:
Maintainers: Foo (@bar)
GitRepo: https://github.com/tianon/docker-debian-artifacts.git
GitFetch: refs/heads/oci-arm32v5
Architectures: arm32v5
GitCommit: d6ac440e7760b6b16e3d3da6f2b56736b9c10065
Builder: oci-import
File: index.json
Tags: bullseye, bullseye-20221114, 11.5, 11, latest
Directory: bullseye/oci
Tags: bullseye-slim, bullseye-20221114-slim, 11.5-slim, 11-slim
Directory: bullseye/slim/oci
This command will, given a remote image reference, look up the list of platforms from it and match them to supported bashbrew architectures (providing content descriptors for each).
Also, refactor registry code to be more correct: previously, this couldn't fetch from Docker without `DOCKERHUB_PUBLIC_PROXY` (see `registry-1.docker.io` change) and was ignoring content digests. Now it works correctly with or without `DOCKERHUB_PUBLIC_PROXY`, verifies the size of every object it pulls, verifies the digest, _and_ should continue working with the in-progress Moby containerd-integration (where the local image ID becomes the digest of the manifest or index instead of the digest of the config blob as it is today).
This time, they are distinct implementations because the problem they are solving is inherently different.
For listing children of a given name, we *have* to walk the entire library (since we only have tag -> FROM mappings, not the reverse, which is fundamentally the question that "children" answers).
On the flip side, listing the parents of a given name is as straightforward as looking up the FROM values and walking until we can't anymore.
In my own testing, these new implementations are significantly more correct, and handle more edge cases (including things we couldn't support before like `bashbrew children --depth=1 scratch`, `bashbrew children mcr.microsoft.com/windows/servercore`, etc).
They also more correctly handle edge cases like tags that are `FROM` a "`SharedTag`" such that they don't walk up/down both sides of the tree (for example, `orientdb:3.2` -> `FROM eclipse-temurin:8-jdk`, which is both Windows *and* Linux, even though `orientdb:3.2` is Linux-only).