Now that we're running in parallel, sometimes AWS eventual consistency
causes us problems. We now retry up to 3 times, sleeping 10 seconds in
between each run even when we aren't making progress.
If there is an error performing a task, we will reattempt it as long as
forward progress is still being made (i.e. at least one other task
completed successfully)
This makes everything more reliable (though we should still fix these
problems), but it also lays the groundwork for parallel execution.