To simplify deployment of the log validator, this allows wildcards
(using go's filepath.Glob) to be included in the file paths.
In order to detect new files, a new background goroutine polls the glob
patterns every minute for matches.
Because the "monitor" function is running in its own goroutine, a lock
is needed to ensure it's not trying to add new tailers while shutdown is
happening.
The main function in log-validator is overly big, so this refactors it
in preparation for adding support for file globs, which is started in
the (currently draft) #7134
This PR should be functionally a no-op, except for one change:
The 1-per-second ratelimiter is moved to be per-file, so it's
1-per-second-per-file. I think this more closely aligns with what we'd
want, as we could potentially miss that a file had bad lines in it if it
was overwhelmed by another log file at the same time.
For "ordinary" errors like "file not found" for some part of the config,
we would prefer to log an error and exit without logging about a panic
and printing a stack trace.
To achieve that, we want to call `defer AuditPanic()` once, at the top
of `cmd/boulder`'s main. That's so early that we haven't yet parsed the
config, which means we haven't yet initialized a logger. We compromise:
`AuditPanic` now calls `log.Get()`, which will retrieve the configured
logger if one has been set up, or will create a default one (which logs
to stderr/stdout).
AuditPanic and Fail/FailOnError now cooperate: Fail/FailOnError panic
with a special type, and AuditPanic checks for that type and prints a
simple message before exiting when it's present.
This PR also coincidentally fixes a bug: panicking didn't previously
cause the program to exit with nonzero status, because it recovered the
panic but then did not explicitly exit nonzero.
Fixes#6933
Previously, we allocated a buffer of 8192 bytes to write the stack trace
into. Presumably the intent was to set the _capacity_ to 8192 bytes but
leave the length at 0:
make([]byte, 0, 8192)
However, the code as written set the length to 8192 as well. This meant
that any time we logged a stack trace from a panic, if the stack trace
was less than 8192 bytes, we'd additionally log a bunch of zero bytes.
Setting the capacity was premature optimization anyhow.
Fixes#6931
When the "integration" build tag is set, reduce the stdout prefix to
just a short timestamp, log level, and process name. The other details
(e.g. date, datacenter, and hostname) are not relevant in CI, and only
serve to clutter the logs.
Part of https://github.com/letsencrypt/boulder/issues/6890
This moves command() from cmd/ into core.Command() and uses it from the
log and main package, ensuring we have a single implementation of
path.Base(os.Args[0]) instead of scattering that method around. I put it
in core/util.go as similar enough functions like the BuildID and
revision info already live there, though I am not entirely sure it's the
right place.
This came up in @aarongable's review of #6750, where he pointed out I
was manually hardcoding commands instead of using command() or similar.
The Debug, Info, Warn, and Err log levels are now helper functions that
call their respective <LogLevel>f function. Each <LogLevel>f function
takes a variadic amount of format specifiers to pass to auditAtLevel or
logAtLevel which in turn will perform a nil check on the variadic input
parameter 'a' and call/ignore an Sprintf accordingly.
Fixes#6131
- Log counts of Authzs where CAA was rechecked/reused.
- Move the CAA recheck duration to a single variable in the RA.
- Add new method `InfoObject` to our logger.
Fixes#6560
Part of #6623
Explicitly inform go vet about the names of our logging methods
which should be checked in the same way as fmt.Printf is. Although
go vet can often find such functions on its own, it can't find these
ones because log.Logger is an interface, not a struct.
In addition, fix several format string mistakes caught by go vet.
This uncovered a bug! The stdout logger was truncating the microseconds part
of its timestamp if the last digit was zero. Fixed that. Also coerced the
stdout logger to use UTC.
To run the checker over our integration test logs, I changed t.sh to use
an explicit name for the container that runs boulder during the tests,
and pulled logs from that container after the tests.
Previously, the reloader relied on user code to log errors and changes.
But this led to gaps (for instance the RA not logging reloads of the rate
limit file). Instead, make the reloader responsible for this.
This allowed removing the error callback, at the minor cost of removing
the status gauge for the ECDSA allowlist.
Honeycomb was emitting logs directly to stderr like this:
```
WARN: Missing API Key.
WARN: Dataset is ignored in favor of service name. Data will be sent to service name: boulder
```
Fix this by providing a fake API key and replacing "dataset" with "serviceName" in configs. Also add missing Honeycomb configs for crl-updater.
For stdout-only logger, include checksums and escape newlines.
Debug and Info messages still go to stdout.
Fix the CAA integration test, which asserted that stderr should be empty
when caa-log-checker finds a problem. That used to be the case because
we never logged to stderr, but now it is the case.
Update the logging docs.
Fixes#6324
Right now, Boulder expects to be able to connect to syslog, and panics
if it's not available. We'd like to be able to log to stdout/stderr as a
replacement for syslog.
- Add a detailed timestamp (down to microseconds, same as we collect in
prod via syslog).
- Remove the escape codes for colorizing output.
- Report the severity level numerically rather than with a letter prefix.
Add locking for stdout/stderr and syslog logs. Neither the [syslog] package
nor the [os] package document concurrency-safety, and the Go rule is: if
it's not documented to be concurrent-safe, it's not. Notably the [log.Logger]
package is documented to be concurrent-safe, and a look at its implementation
shows it uses a Mutex internally.
Remove places that use the singleton `blog.Get()`, and instead pass through
a logger from main in all the places that need it.
[syslog]: https://pkg.go.dev/log/syslog
[os]: https://pkg.go.dev/os
[log.Logger]: https://pkg.go.dev/log#Logger
- Fix cert-checker to use the syslog and stdout logging facilities it
reads from the config file instead of having them hard-coded to zero.
- Fix cert-checker to handle a nil response from mariadb if no records
are found.
- Fix comment in log.go to correctly describe when the initialize function
and therefore default values would be used.
Fixes#6067
We have decided that we don't like the if err := call(); err != nil
syntax, because it creates confusing scopes, but we have not cleaned up
all existing instances of that syntax. However, we have now found a
case where that syntax enables a bug: It caused readers to believe that
a later err = call() statement was assigning to an already-declared err
in the local scope, when in fact it was assigning to an
already-declared err in the parent scope of a closure. This caused our
ineffassign and staticcheck linters to be unable to analyze the
lifetime of the err variable, and so they did not complain when we
never checked the actual value of that error.
This change standardizes on the two-line error checking syntax
everywhere, so that we can more easily ensure that our linters are
correctly analyzing all error assignments.
If we can't write to stdout we prefer to panic immediately rather than
potentially lose logs we capture from redirecting stdout as a syslog backup.
A unit test is included to verify the panic behaviour. Prior to the `log` diff
in this branch the test failed because the non-nil `err` result from
`fmt.Printf` was being away:
```
=== RUN TestStdoutFailure
=== PAUSE TestStdoutFailure
=== CONT TestStdoutFailure
FAIL github.com/letsencrypt/boulder/log 0.011s
FAIL
```
After the `log` package diff in this branch is applied the test passes.
I additionally tested this end-to-end by redirecting stdout to a full
filesystem volume mounted into the Boulder docker image. It provoked the
expected panic when a component tried to write to stdout and the filesystem was
full.
Adds a CRC32-IEEE checksum to our log lines. At most this adds 8 bytes per line, and at least adds 2 bytes. Given this a relatively minor change I haven't bothered flagging it, although if we have anything in place that assumes the current structure of log lines we may want to add a flag in order to prevent immediate breakage before things can be altered.
Fixes#4474.
A very large number of the logger calls are of the form log.Function(fmt.Sprintf(...)).
Rather than sprinkling fmt.Sprintf at every logger call site, provide formatting versions
of the logger functions and call these directly with the format and arguments.
While here remove some unnecessary trailing newlines and calls to String/Error.
Use `t.Helper` and `t.Fatalf` instead of our own versions.
Remove some unused or single-user helpers.
Make the output of `AssetUnmarshaledEquals` clearer by showing one line per field.
In `log/log.go` when we fail to write to syslog we log the message to STDERR but without a newline. This causes all of the STDERR logs (when we reconnect to syslog) to be delivered where they go to be on a single line which causes some debugging pain.
This removes the config and code to output to statsd.
- Change `cmd.StatsAndLogging` to output a `Scope`, not a `Statter`.
- Remove the prefixing of component name (e.g. "VA") in front of stats; this was stripped by `autoProm` but now no longer needs to be.
- Delete vendored statsd client.
- Delete `MockStatter` (generated by gomock) and `mocks.Statter` (hand generated) in favor of mocking `metrics.Scope`, which is the interface we now use everywhere.
- Remove a few unused methods on `metrics.Scope`, and update its generated mock.
- Refactor `autoProm` and add `autoRegisterer`, which can be included in a `metrics.Scope`, avoiding global state. `autoProm` now registers everything with the `prometheus.Registerer` it is given.
- Change va_test.go's `setup()` to not return a stats object; instead the individual tests that care about stats override `va.stats` directly.
Fixes#2639, #2733.
* Fix all errcheck errors
* Add errcheck to test.sh
* Add a new sa.Rollback method to make handling errors in rollbacks easier.
This also causes a behavior change in the VA. If a HTTP connection is
abruptly closed after serving the headers for a non-200 response, the
reported error will be the read failure instead of the non-200.
- Remove error signatures from log methods. This means fewer places where errcheck will show ignored errors.
- Pull in latest cfssl to be compatible with errorless log messages.
- Reduce the number of message priorities we support to just those we actually use.
- AuditNotice -> AuditInfo
- Remove InfoObject (only one use, switched to Info)
- Remove EmergencyExit and related functions in favor of panic
- Remove SyslogWriter / AuditLogger separate types in favor of a single interface, Logger, that has all the logging methods on it.
- Merge mock log into logger. This allows us to unexport the internals but still override them in the mock.
- Shorten names to be compatible with Go style: New, Set, Get, Logger, NewMock, etc.
- Use a shorter log format for stdout logs.
- Remove "... Starting" log messages. We have better information in the "Versions" message logged at startup.
Motivation: The AuditLogger / SyslogWriter distinction was confusing and exposed internals only necessary for tests. Some components accepted one type and some accepted the other. This made it hard to consistently use mock loggers in tests. Also, the unnecessarily fat interface for AuditLogger made it hard to meaningfully mock out.
* Change panics in audit logger to log.Fatalf.
These errors can happen if the syslog apparatus is unavailable, so are not that
out of the ordinary. log.Fatalf produces less noise and makes it easier to see
what's wrong.
* Fix imports.
`godoc` associates the docs with the function or type listed after
the word "Example"; in this case "Errors" doesn't match anything in
github.com/letsencrypt/boulder/log, so the example doesn't appear in the
generated godoc.
Changes the function name to ExampleAuditLogger so it shows up under the
AuditLogger type.
Here's a screenshot:
https://monosnap.com/file/iXHqZHlnAjXnXPkAWUR5J1jMggOHLQ.png
This won't affect prod, where we use syslog. But for dev and test, where we log
to stdout, the colors may help us find errors and warnings in the logs.
This also adds the log level as an explicit part of the stdout logging.
Consolidate initialization of stats and logging from each main.go into cmd
package.
Define a new config parameter, `StdoutLevel`, that determines the maximum log
level that will be printed to stdout. It can be set to 6 to inhibit debug
messages, or 0 to print only emergency messages, or -1 to print no messages at
all.
Remove the existing config parameter `Tag`. Instead, choose the tag from the
basename of the currently running process. Previously all Boulder log messages
had the tag "boulder", but now they will be differentiated by process, like
"boulder-wfe".
Shorten the date format used in stdout logging, and add the current binary's
basename.
Consolidate setup function in audit-logger_test.go.
Note: Most CLI binaries now get their stats and logging from the parameters of
Action. However, a few of our binaries don't use our custom AppShell, and
instead use codegangsta/cli directly. For those binaries, we export the new
StatsAndLogging method from cmd.
Fixes https://github.com/letsencrypt/boulder/issues/852
Found when poking at the new containerized Travis CI build with
@rolandshoemaker. This was causing strange panics far away from here
when code would try to log audits when there was no syslog available.