Merge pull request #1451 from dashpole/memcg

Propose solution to make memory cgroup events effective.
2018-03-08 10:48:55 -08:00 · 9bf594e91a
parent 17885f2747 f8494e93bc
commit 9bf594e91a
1 changed file with 12 additions and 6 deletions
--- a/contributors/design-proposals/node/kubelet-eviction.md
+++ b/contributors/design-proposals/node/kubelet-eviction.md
@@ -191,6 +191,18 @@ signal.  If that signal is observed as being satisfied for longer than the
 specified period, the `kubelet` will initiate eviction to attempt to
 reclaim the resource that has met its eviction threshold.

+### Memory CGroup Notifications
+
+When the `kubelet` is started with `--experimental-kernel-memcg-notification=true`,
+it uses cgroup events on the `memory.usage_in_bytes` file to trigger the eviction manager.
+Combined with on-demand metrics, this lets the `kubelet` trigger the eviction manager,
+collect metrics, and respond with evictions much more quickly than with the sync loop alone.
+
+To do this, the eviction manager periodically measures `total_inactive_file` and resets the memory cgroup
+threshold for `usage_in_bytes` to `mem_capacity - eviction_hard + total_inactive_file`.
+The threshold is therefore crossed when
+`usage_in_bytes - total_inactive_file = mem_capacity - eviction_hard`.
+
 ### Disk

 Let's assume the operator started the `kubelet` with the following:
@@ -457,9 +469,3 @@ In general, it should be strongly recommended that `DaemonSet` not
 create `BestEffort` pods to avoid being identified as a candidate pod
 for eviction. Instead `DaemonSet` should ideally include Guaranteed pods only.

-## Known issues
-
-### kubelet may evict more pods than needed
-
-The pod eviction may evict more pods than needed due to stats collection timing gap. This can be mitigated by adding
-the ability to get root container stats on an on-demand basis (https://github.com/google/cadvisor/issues/1247) in the future.
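
The threshold arithmetic described in the "Memory CGroup Notifications" section above can be illustrated with a small Go sketch. This is not the kubelet's implementation: the function names `memcgThreshold` and `thresholdCrossed`, the use of `int64` byte counts, and the sample values are all assumptions made for illustration. It only restates the formula from the proposal text.

```go
package main

import "fmt"

const miB int64 = 1 << 20

// memcgThreshold returns the usage_in_bytes value at which the cgroup
// notification should fire, following the formula in the proposal:
//   mem_capacity - eviction_hard + total_inactive_file
// (hypothetical helper; all arguments are in bytes).
func memcgThreshold(capacity, evictionHard, inactiveFile int64) int64 {
	return capacity - evictionHard + inactiveFile
}

// thresholdCrossed reports whether usage, excluding inactive file pages,
// has reached mem_capacity - eviction_hard (hypothetical helper).
func thresholdCrossed(usage, inactiveFile, capacity, evictionHard int64) bool {
	return usage-inactiveFile >= capacity-evictionHard
}

func main() {
	// Sample values chosen only for illustration.
	capacity := 4096 * miB
	evictionHard := 500 * miB
	inactiveFile := 200 * miB

	threshold := memcgThreshold(capacity, evictionHard, inactiveFile)
	fmt.Printf("threshold: %d MiB\n", threshold/miB) // prints "threshold: 3796 MiB"

	// At exactly the threshold, usage minus inactive file pages equals
	// capacity minus the hard eviction limit.
	fmt.Println(thresholdCrossed(threshold, inactiveFile, capacity, evictionHard)) // prints "true"
}
```

Because `total_inactive_file` changes over time, a threshold computed this way goes stale, which is why the proposal has the eviction manager recompute it periodically.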