SIG Instrumentation Meeting Notes 2016 (Archive)


2016-12-15

Agenda:

Notes:

2016-12-08

Warning: This meeting will be about logging. If you are not interested, please skip.

Agenda

Meeting notes: https://gist.github.com/leahnp/463501f6dfe39f6f21ea5d3ebcb787d7

2016-12-01

Agenda

  • Heapster needs your help
    • [sross] Need to come up with map of sinks to maintainers
      • Maybe consider dropping sinks without maintainers
    • [sross] Need a statement of plans for Heapster
      • [sross] Putting Heapster into maintenance mode; what does maintenance mode entail, and should we continue accepting sinks?
      • [piosz] to write something up and send it out
  • [mwringe] What is the plan/timeline for the monitoring pipeline work?
    • [piosz] The plan is to start work in Q2 2017, unless anyone else can help
      • [piosz] major missing component is discovery summarizer
      • [sross] we (Red Hat) are willing to help out in this area

[Cancelled] 2016-11-24: Thanksgiving in US

[Cancelled] 2016-11-17: no meeting week

[Cancelled] 2016-11-10: Kubecon

[Cancelled] 2016-11-03

2016-10-27

Agenda

  • F2f meeting about monitoring in Seattle during KubeCon (on Monday Nov 7th)

2016-10-20

Warning: This meeting will be about logging. If you are not interested, please skip.

Agenda

  • f2f meeting about logging in Seattle during KubeCon (probably on Monday Nov 7th)
    • There is going to be a Kubernetes dev summit (Nov 10th) meeting on logging
  • Group administrivia: frequency? Length? Topics?
  • Current state of logging in Kubernetes
  • What's going on with logging?

Notes

Developers Summit - 45 minute unconference topic on the future of logging

  • moderated by Vishnu and Patrick

  • open to anyone who is attending the Kubernetes Developers Conference

Discussion of Face to Face meeting - Piotr and Patrick to sync up offline

Frequency: every three weeks. We are going to skip next week / push back one week, since the next meeting falls during the KubeCon Developers Summit.

  • There will be an announcement for exactly when the next meeting is

Logging Discussion Topics:

  • logging volumes (proposal started by David Cowden - https://docs.google.com/document/d/1K2hh7nQ9glYzGE-5J7oKBB7oK3S_MKqwCISXZK-sB2Q/edit#)

  • hot loop logging and verbosity for scalability issues (see the sketch after this list).

    • how to detect spammy instances

    • how to not let this wreck the cluster

  • general dissatisfaction with the logging facility

  • structured logging kubernetes wide for consistent consumption

  • application log type detection

    • what metadata do we need to carry through a logging pipeline to identify a source system (e.g. MySQL, user application)

    • what do logging vendors need supplied to aid in this
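
The hot-loop item above is essentially a rate-limiting problem. As a purely illustrative sketch (this is not an existing Kubernetes mechanism; the limits are arbitrary), a component could guard a noisy code path with a token bucket and emit a one-line summary instead of the suppressed output:

```go
// Illustrative only: rate-limit a hot logging loop so one spammy instance
// cannot overwhelm the logging pipeline.
package main

import (
	"log"

	"golang.org/x/time/rate"
)

func main() {
	// Allow at most 10 log lines per second with a burst of 20.
	limiter := rate.NewLimiter(rate.Limit(10), 20)
	dropped := 0

	for i := 0; i < 100000; i++ {
		if limiter.Allow() {
			log.Printf("processing item %d", i)
		} else {
			dropped++ // count suppressed lines instead of emitting them
		}
	}
	if dropped > 0 {
		log.Printf("suppressed %d log lines in hot loop", dropped)
	}
}
```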

Current logging pipelines

  • fluentd direct to GCP or Elasticsearch (ES)

  • fluentd to Kafka to fluentd to ES

Action Items

  • Piotr & Patrick to determine f2f details

  • Try to get logging vendors to join the SIG

[Cancelled] 2016-10-13

2016-10-06

Agenda

  • No response from SIG API Machinery (moving to next meeting)
  • Continue discussion on monitoring architecture
    • Agreed on a versioned, well-defined API
    • REST API vs. query language
    • A webhook model was suggested for the APIs (like Auth in Kube today)
      • [sross] has concerns over discoverability of webhooks
      • Webhook vs API server is largely an implementation question
      • will decide on discovery vs webhook for consumption once we get the API design in place
    • [sross] will propose an API design for the custom metrics API and historical metrics API
  • Discuss roadmap
    • Discussed briefly, please go read afterwards
    • [sross] to lead push on custom metrics design/implementation for 1.5
    • 1.5 API features will be mainly implemented in terms of Heapster
  • Looking forward to a one-click install of 3rd-party monitoring (possibly Prometheus, but as an out-of-the-box, one-command setup; possible choices for deployment: helm, kpm)
  • Logging discussion feasibility conversation (i.e., is this a reasonable place to have discussions about logging)
    • This may be a reasonable place for logging discussions, if we explicitly note which meetings will discuss logging (and/or when logging will be discussed)
    • May also just want to create a separate SIG
    • [decarr] mentioned CRI discussion on logging and metrics
      • Outcome was that we should sync with SIG node on that, but it should probably stay more in SIG node

2016-09-29

Agenda

Notes

  • Main metrics pipeline used by Kubernetes components

  • Separate operator-defined monitoring pipeline for user-exposed monitoring

    • Generally collects core metrics redundantly/independently
  • Should it be possible to implement the core metrics pipeline on top of the custom monitoring system?

    • As long as one implements the core metrics API, one could swap it out for scheduler etc.
  • Upstream Kubernetes would test against the stable core pipeline

  • Replaceable != pluggable: the entire thing gets replaced in a custom scenario

  • Master Metrics API is part of the main Kubernetes API

    • Should further APIs, e.g. for historical metrics, also be in that group?
    • Discussion for sig-apimachinery
  • Should Infrastore be part of core Kubernetes?

    • Provides historic time series data about the system
    • Would require implementing a subset of a TSDB
    • Not an implemented component, just an API
  • What are core metrics exactly?

    • CPU, memory, disk
    • What about network and ingress?
    • Resource estimator would not read from master metrics API but collect information itself (e.g. from kubelet)

2016-09-22

Agenda

Notes

  • Kubesnap demo by Andrzej Kuriata, Intel (slides):
    • Daemon set in k8s
    • Integration with Heapster
  • Mission Statement:
    • Enough people to coordinate, but small enough to be focused
    • List of people actually doing development/design in the scope of this sig
    • Scratchpad before a meeting to set up discussion of features ahead of time
    • SIG Autoscaling discussed and committed to features/metrics in previous meetings
    • A plan for an API for 1.5?

2016-09-15

Agenda

Notes

  • Meeting frequency - defer until ownership clarified
  • Ownership: SIG Autoscaling vs. SIG Instrumentation
    • Triggering issue: https://github.com/kubernetes/kubernetes/issues/31784
    • HPA is consumer of Master Metrics API (also kubectl top, scheduler, UI)
    • Could potentially be relevant to monitoring as well
    • Make distinction between metrics used by the cluster and metrics about the cluster
    • One SIG lead cares about system-level metrics, one about the external/monitoring side. Is this a good setup for the SIG to handle both areas?
    • Follow up with mission statement on the mailing list taking these things into account
  • Kube-state-metrics v0.2.0 was released with many more metrics.

2016-09-08

Agenda

  • Sylvain Boily showing their monitoring solution

Notes

  • Demo by Sylvain on their monitoring setup using InfluxDB+Grafana+Kapacitor
    • Scraping metrics from Heapster, Eventer, and apiserver
  • Separation of apiserver metrics vs. kube-state-metrics (see the sketch after this list)
    • The apiserver exposes metrics on /metrics about the running state of the apiserver process
      • How many requests came in from clients? What was their latency?
      • Outbound latency to the etcd cluster?
    • Kube-state-metrics aims to provide metrics on logical state of the entire Kubernetes cluster
      • How many deployments exist?
      • How many restarts did pod X have?
      • How many available/desired pods does a deployment have?
      • How much capacity does node X have?
  • Separation of Heapster vs. kube-state-metrics
    • Heapster holds metrics about the characteristics of things running on Kubernetes, used by other system components.
    • Currently Heapster asks the Kubelet for cAdvisor metrics, whereas kube-state-metrics collects information from the apiserver
  • Should Eventer information be consolidated with kube-state-metrics?
  • Should we look into the creation of a monitoring namespace / service for all other namespaces to use?
  • Should monitoring be available out of the box with a Kubernetes installation in a private datacenter?
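
As background for the apiserver vs. kube-state-metrics distinction above: both expose their metrics in the Prometheus text format on a /metrics endpoint. A minimal sketch using the Prometheus Go client follows; the metric names are illustrative, not the real apiserver or kube-state-metrics names.

```go
// Sketch of the /metrics pattern: one process-level metric (apiserver-style)
// and one cluster-state metric (kube-state-metrics-style), both illustrative.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// About the serving process itself: request counts and latency.
	requestLatency = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name: "example_request_duration_seconds",
			Help: "Latency of inbound client requests.",
		},
		[]string{"verb"},
	)
	// About objects in the cluster rather than about the process.
	deploymentsAvailable = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "example_deployment_replicas_available",
			Help: "Available replicas per deployment.",
		},
		[]string{"namespace", "deployment"},
	)
)

func main() {
	prometheus.MustRegister(requestLatency, deploymentsAvailable)

	// Hard-coded values keep the example self-contained; a real component
	// would update these from its request handlers and an apiserver watch.
	requestLatency.WithLabelValues("GET").Observe(0.042)
	deploymentsAvailable.WithLabelValues("default", "frontend").Set(3)

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```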

2016-09-01

Agenda

Notes

  • Matthias Rampke giving an intro to their Kubernetes monitoring setup
    • Currently running Prometheus generally outside of Kubernetes
      • Easy migration path from previous infrastructure
    • Still using DNS as service discovery instead of Kubernetes API
    • Sharded Prometheus servers by team for application monitoring
    • Severe lack of metrics around Kubernetes cluster state itself
    • Long-term vision (1yr): all services and their dependencies running inside of Kubernetes
      • Prometheus part of that via a standard configuration
      • Easy to spin up monitoring for new components
  • People use Heapster because it gives them all metrics in one component
  • Something as easy to deploy as Heapster would be useful
  • Three sets of metrics
    • Those useful only for monitoring (e.g. number of pods)
    • Metrics for auto-scaling (CPU, custom app metrics)
    • Those that fit both
  • Make Prometheus a first-class citizen/best practice for exposing custom auto-scaling metrics? (see the sketch after this list)
  • Overlap between auto-scaling and monitoring metrics seems generally fine
    • Storing them twice is okay; auto-scaling metrics are far fewer
  • Kube-state-metrics
    • Keep it as a playground or fold it into the controller manager?
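
On the question of making Prometheus the best practice for exposing custom auto-scaling metrics, a hypothetical application-side sketch (the metric name and update loop are made up) could look like this:

```go
// Hypothetical application code exposing a custom metric that an autoscaler
// could consume, alongside the CPU/memory metrics the core pipeline collects.
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// queueDepth is the kind of application-specific scaling signal discussed above.
var queueDepth = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "example_work_queue_depth",
	Help: "Number of items waiting to be processed.",
})

func main() {
	prometheus.MustRegister(queueDepth)

	// Simulate the application updating the gauge as work arrives and drains.
	go func() {
		for i := 0; ; i++ {
			queueDepth.Set(float64(i % 50))
			time.Sleep(time.Second)
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```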

2016-08-25

Notes

  • CoreOS would like to see
    • more instrumentation as insight into cluster
    • Remove orthogonal features in, for example, cAdvisor
  • Red Hat:
    • Good out-of-the-box solution for cluster observability, component interaction
    • Collaboration with sig-autoscaling
  • SoundCloud:
    • Prometheus originated at SoundCloud
    • Bare-metal Kubernetes setup: separation of monitoring
    • Separation of Heapster and overall Kubernetes architecture
    • How are people instrumenting around Kubernetes?
  • Mirantis:
    • Scalability of monitoring solutions
    • More metadata from the kubelet “stats” API: labels are missing, for example
    • Also interested in “Separation of Heapster and overall Kubernetes architecture” (from SoundCloud)
    • Extended insight into OpenStack & Kubernetes
    • During our scalability tests we want to measure Kubernetes behaviour against a set of defined metrics
  • Intel:
    • Integration of snap into kubernetes
    • Help deliver monitoring goals

Where should guides for flavors of monitoring live?

→ ad hoc currently, not all the same

→ best practices in the community

Where are we and where do we want to go? → A Google doc will be set up

Next meeting: Discuss the Google doc; Matthias from SoundCloud will give insight into how they are using Prometheus to monitor Kubernetes and its pain points.

Next time we will use Zoom, as the Hangouts limit is 10 participants.

Kubernetes monitoring architecture (requires joining https://groups.google.com/forum/#!forum/kubernetes-sig-node): https://docs.google.com/document/d/1HMvhhtV3Xow85iZdowJ7GMsryU6pvjOzruqcJYY9MMI/edit?ts=57b0eec1#heading=h.gav7ymlujqys