For now, these emulate the support that `git archive` has for them (namely, that they present as empty directories, which are otherwise impossible to create/represent in Git).
In Go 1.24, there's a new `go vet` rule that complains about `Example*` test functions that don't follow the documented naming convention. Combine that with `go test` running `go vet` by default, and you've got a perfect storm.
This renames our "example" tests to satisfy Go's naming convention.
This means slightly more typing in "zero-value" cases (`nil` vs `dockerfile.Metadata{}`), but the tradeoff is that it's simpler to use and reason about (and all the struct members are pointer-type map/slice values anyhow, so copying the struct is still pretty cheap).
This also swaps the scanner error handling to return the partially parsed Metadata object alongside the scanner error -- the error already tells us the object isn't fully complete data, so it's fair/fine to return and will likely just be ignored by the caller instead. This also allows us to get to 100% code coverage. 👀
This also updates our "treat `oci-import` just like `FROM scratch`" code to *actually* parse `FROM scratch` so we can't accidentally cause "missing data" bugs there in the future, and I implemented that using `sync.OnceValues` which requires upgrading to Go 1.21, but IMO that's a worthwhile tradeoff (because `sync.OnceValues` makes that code so clean/simple).
There were some very minor/subtle bugs in how I implemented continuation that wouldn't affect any real-world parsing we did, but still bothered me because I'm me. This fixes them (and further increases test coverage as a result).
In my refactoring to use `go-git`'s `Tree` objects, I missed this edge case (that symlinks get resolved to be relative to the Git root, but our `Tree` object is a subdirectory).
This also finally adds `bashbrew context` as an explicit subcommand so that issues with this code are easier to test/debug (so we can generate the actual tarball and compare it to previous versions of it, versions generated by `git archive`, etc).
As-is, this currently generates verbatim identical checksums to 0cde8de57d/sources.sh (L90-L96) (by design). We'll wait to do any cache bust there until we implement `Dockerfile`/context filtering:
```console
$ bashbrew cat varnish:stable --format '{{ .TagEntry.GitCommit }} {{ .TagEntry.Directory }}'
0c295b528f28a98650fb2580eab6d34b30b165c4 stable/debian
$ git -C "$BASHBREW_CACHE/git" archive 0c295b528f28a98650fb2580eab6d34b30b165c4:stable/debian/ | ./tar-scrubber | sha256sum
3aef5ac859b23d65dfe5e9f2a47750e9a32852222829cfba762a870c1473fad6
$ bashbrew cat --format '{{ .ArchGitChecksum arch .TagEntry }}' varnish:stable
3aef5ac859b23d65dfe5e9f2a47750e9a32852222829cfba762a870c1473fad6
```
(Choosing `varnish:stable` there because it currently has [some 100% valid dangling symlinks](6b1c6ffedc/stable/debian/scripts) that tripped up my code beautifully 💕)
From a performance perspective (which was the original reason for looking into / implementing this), running the `meta-scripts/sources.sh` script against `--all` vs this, my local system gets ~18.5m vs ~4.5m (faster being this new pure-Go implementation).
Since Docker's image store can't represent these, we round trip them through our self-managed (or external) containerd image store, which also makes pushing more efficient.
In the case of base images (`debian`, `alpine`, `ubuntu`, etc), using a `Dockerfile` as our method of ingestion doesn't really buy us very much. It made sense at the time it was implemented ("all `Dockerfile`, all the time"), but at this point they're all some variation on `FROM scratch \n ADD foo.tar.xz / \n CMD ["/bin/some-shell"]`, and cannot reasonably be "rebuilt" when their base image changes (which is one of the key functions of the official images) since they _are_ the base images in question.
Functionally, consuming a tarball in this way isn't _that_ much different from consuming a raw tarball that's part of, say, an OCI image layout (https://github.com/opencontainers/image-spec/blob/v1.0.2/image-layout.md) -- it's some tarball plus some metadata about what to do with it.
For less trivial images, there's a significant difference (and I'm not proposing to use this for anything beyond simple one-layer base images), but for a single layer this would be basically identical.
As a more specific use case, the Debian `rootfs.tar.xz` files are currently [100% reproducible](https://github.com/debuerreotype/debuerreotype). Unfortunately, some of that gets lost when it gets imported into Docker, and thus it takes some additional effort to get from the Docker-generated rootfs back to the original debuerreotype-generated file.
This adds the ability to consume an OCI image directly, to go even further and have a 100% fully reproducible image digest as well, which makes it easier to trace a given published image back to the reproducible source generated by the upstream tooling (especially if a given image is also pushed by the maintainer elsewhere).
Here's an example `oci-debian` file I was using for testing this:
Maintainers: Foo (@bar)
GitRepo: https://github.com/tianon/docker-debian-artifacts.git
GitFetch: refs/heads/oci-arm32v5
Architectures: arm32v5
GitCommit: d6ac440e7760b6b16e3d3da6f2b56736b9c10065
Builder: oci-import
File: index.json
Tags: bullseye, bullseye-20221114, 11.5, 11, latest
Directory: bullseye/oci
Tags: bullseye-slim, bullseye-20221114-slim, 11.5-slim, 11-slim
Directory: bullseye/slim/oci
See https://golang.org/doc/go1.12#text/template, especially:
> If a user-defined function called by a template panics, the panic is now caught and returned as an error by the `Execute` or `ExecuteTemplate` method.