* Point optimizer to `tf.keras.optimizers.legacy.Optimizer` to be compatible with the Keras optimizer migration
* small fix
* add version control
* small fix
* Update discriminative_layer_training.py
* fix version control
* small fix
* move optimizer class to __init__.py
* small fix
* fix problems
* small fix
* Rename BaseOptimizer to KerasLegacyOptimizer
* exclude keras optimizer from type check
* fix import
Calling `_get_decay` once per variable can greatly slow down kernel
launches, as it contains control flow. By putting decay into the apply
state we only need to calculate it once per step. The same optimization
technique is used in the Keras optimizer for the learning rate scheduler.
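The "compute once per step" pattern described above can be sketched without TensorFlow. `TinyOptimizer`, `_prepare_local`, and `_update_var` below are hypothetical stand-ins for the real optimizer machinery; the cached dict plays the role of `apply_state`:

```python
# Framework-free sketch: instead of recomputing a decay value inside every
# per-variable update (which, in TF, would re-emit control-flow ops), the
# value is computed once per step in _prepare_local and cached, then each
# per-variable update only does a dict lookup.

class TinyOptimizer:
    def __init__(self, decay_fn, lr=0.1):
        self.decay_fn = decay_fn  # e.g. a schedule containing control flow
        self.lr = lr
        self._apply_state = {}

    def _prepare_local(self, step):
        # Called once per step: cache decay alongside other coefficients.
        self._apply_state["decay"] = self.decay_fn(step)

    def _update_var(self, var):
        # Called once per variable: reads the cached value, no recomputation.
        decay = self._apply_state["decay"]
        return var - self.lr * decay * var

    def apply_gradients(self, variables, step):
        self._prepare_local(step)
        return [self._update_var(v) for v in variables]
```

However many variables the optimizer tracks, `decay_fn` runs exactly once per call to `apply_gradients`.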
* Remove `sequential_update` from AverageWrapper
In TF 2.0, `sequential_update` is redundant. This allows
the removal of `tf.control_dependencies` from
average_wrapper and its subclasses: moving_average
and stochastic_weight_averaging.
* Revert "Remove `sequential_update` from AverageWrapper"
This reverts commit 7cf4201d83.
* Remove `tf.control_dependencies` from AverageWrapper
Add deprecation warning for `sequential_update`.
* Set type of sequential_update to Optional[bool]
`sequential_update` is no longer part of the optimizer's
configuration. Loading an older configuration raises a
DeprecationWarning.
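The deprecation path can be sketched as follows. `AverageWrapperSketch` is a hypothetical stand-in for the real wrapper, showing only how a no-op `sequential_update` argument can warn without being stored in the configuration:

```python
import warnings
from typing import Optional

# Sketch of accepting a deprecated, no-op argument for backward
# compatibility: `sequential_update` has no effect anymore, but older
# saved configurations may still pass it, so we warn instead of failing.

class AverageWrapperSketch:
    def __init__(self, sequential_update: Optional[bool] = None):
        if sequential_update is not None:
            warnings.warn(
                "`sequential_update` is deprecated and has no effect; "
                "updates are always sequential in TF 2.x.",
                DeprecationWarning,
            )
        # Intentionally not stored: the argument is no longer part of
        # the optimizer configuration.
```

Passing `sequential_update` (even `True`, its old default) warns; omitting it is silent.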
* black format
* optimizers typing
* minor
* add missing check
* make `scale_fn` consistent with others
* correct `scale_fn`
* add missing name
* remove unused import
* revision based on comments
* add typecheck
* using typing.Type
* Keep all files being worked on in pull requests.
* Two files failed to be parsed.
* Formatted with black all the files not being worked on at the moment.
* Added e231 rule.
* Create new AverageWrapper optimizer
* SWA and MovingAverage extend AverageWrapper and implement
`average_op(var, average_var)`
* Add support for BatchNorm
* Add fit_bn in optimizers.utils
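The refactor in these commits can be sketched framework-free. The class and method names (`AverageWrapper`, `average_op`, `MovingAverage`, `SWA`) mirror the commits, but the arithmetic below is illustrative, not the TensorFlow Addons implementation:

```python
# Base class owns the shared averaging loop; subclasses supply only the
# per-variable update rule via average_op(var, average_var).

class AverageWrapper:
    def __init__(self, variables):
        self.averages = list(variables)
        self.num_snapshots = 1  # initial weights count as the first snapshot

    def average_op(self, var, average_var):
        raise NotImplementedError

    def update_averages(self, variables):
        self.averages = [
            self.average_op(v, a) for v, a in zip(variables, self.averages)
        ]
        self.num_snapshots += 1


class MovingAverage(AverageWrapper):
    def __init__(self, variables, decay=0.9):
        super().__init__(variables)
        self.decay = decay

    def average_op(self, var, average_var):
        # Exponential moving average of the weights.
        return self.decay * average_var + (1.0 - self.decay) * var


class SWA(AverageWrapper):
    def average_op(self, var, average_var):
        # Running arithmetic mean (stochastic weight averaging).
        n = self.num_snapshots
        return (average_var * n + var) / (n + 1)
```

Putting the loop in the base class means each averaging strategy is a one-method subclass, which is the point of the `average_op(var, average_var)` hook.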