<!DOCTYPE html><html lang="en" itemscope itemtype="https://schema.org/WebPage" style="overflow-y: scroll;"><head><meta charset="utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width, initial-scale=1"><meta name="title" content="Mixer and the SPOF Myth"><meta name="og:title" content="Mixer and the SPOF Myth"><meta name="og:image" content="/v0.3/img/logo.png"/><meta name="theme-color" content="#466BB0"/><meta name="description" content="Mixer's effect on reliability and latency"><meta name="og:description" content="Mixer's effect on reliability and latency"><title>Istioldie 0.3 / Mixer and the SPOF Myth</title><script> window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};ga.l=+new Date; ga('create', 'UA-98480406-2', 'auto'); ga('send', 'pageview'); </script> <script async src='https://www.google-analytics.com/analytics.js'></script><link rel="alternate" type="application/rss+xml" title="Istio Blog RSS" href="/v0.3/feed.xml"><link rel="shortcut icon" href="/v0.3/favicons/favicon.ico" ><link rel="apple-touch-icon" href="/v0.3/favicons/apple-touch-icon-180x180.png" sizes="180x180"><link rel="icon" type="image/png" href="/v0.3/favicons/favicon-16x16.png" sizes="16x16"><link rel="icon" type="image/png" href="/v0.3/favicons/favicon-32x32.png" sizes="32x32"><link rel="icon" type="image/png" href="/v0.3/favicons/android-36x36.png" sizes="36x36"><link rel="icon" type="image/png" href="/v0.3/favicons/android-48x48.png" sizes="48x48"><link rel="icon" type="image/png" href="/v0.3/favicons/android-72x72.png" sizes="72x72"><link rel="icon" type="image/png" href="/v0.3/favicons/android-96x196.png" sizes="96x196"><link rel="icon" type="image/png" href="/v0.3/favicons/android-144x144.png" sizes="144x144"><link rel="icon" type="image/png" href="/v0.3/favicons/android-192x192.png" sizes="192x192"><link rel="manifest" href="/v0.3/manifest.json"><meta name="apple-mobile-web-app-title" content="Istio"><meta 
name="application-name" content="Istio"><link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Roboto:400,100,100italic,300,300italic,400italic,500,500italic,700,700italic,900,900italic"><link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css"><link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css"><link rel="stylesheet" href="/v0.3/css/all.css"><link rel="stylesheet" href="/v0.3/css/prism.css"></head><body class="language-unknown"> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script><div class="nav-hero-container" style="z-index: 200000;"><nav id="header-nav" class="navbar navbar-inverse" role="navigation" style="z-index: 200000;"><div class="container"><div class="row"><div class="col-md-11 nofloat center-block "><div class="navbar-header"> <button type="button" class="hamburger navbar-toggle collapsed" data-toggle="collapse" data-target="#navbar-collapse-1" aria-expanded="false"> <span class="sr-only">Toggle navigation</span> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <a class="navbar-brand" href="/v0.3/"><div> <img src="/v0.3/img/istio-logo.svg" alt="Istio Logo" height="54px"/> <span class="brand-name">Istioldie 0.3</span></div></a></div><div class="collapse navbar-collapse" id="navbar-collapse-1"><ul class="nav navbar-nav navbar-right"><li><a href="/v0.3/about" >About</a></li><li><a href="/v0.3/blog" class='current'>Blog</a></li><li><a href="/v0.3/docs/welcome" >Docs</a></li><li><a href="/v0.3/help" >Help</a></li><li><a href="/v0.3/community" >Community</a></li><li class="dropdown"> <a class="dropdown-toggle" data-toggle="dropdown" href=""> <i class='fa fa-lg fa-cog'></i> <span class="caret"></span> </a><ul class="dropdown-menu"><h6 class="dropdown-header">Other versions of this site</h6><li> <a href="https://istio.io">Current Release</a></li><li> <a 
href="https://preliminary.istio.io">Next Release</a></li><li> <a href="https://archive.istio.io">Older Releases</a></li></ul></li><li><form name="cse" id="searchbox_demo" class="navbar-form navbar-right" role="search"> <input type="hidden" name="cx" value="013699703217164175118:iwwf17ikgf4" /> <input type="hidden" name="ie" value="utf-8" /> <input type="hidden" name="hl" value="en" /><div class="form-group"><div class="input-group"> <input name="q" class="form-control search-box" type="text" size="30" /><div class="input-group-addon"> <span class="btn-search glyphicon glyphicon-search"></span></div></div></div></form> <script type="text/javascript" src="https://www.google.com/cse/brand?form=searchbox_demo"></script></li></ul></div></div></div></div></nav></div><div class="container"><div class="row"><div class="col-sm-12 col-md-10 col-lg-7 nofloat center-block markdown"><article class="post-wrapper"><h1>Mixer and the SPOF Myth</h1><div class="postdate"> Posted on Thursday, December 07 2017.</div><div id="toc" class="toc"></div><div class="content"><p>Because <a href="/v0.3/docs/concepts/policy-and-control/mixer.html">Mixer</a> is in the request path, it is natural to question how it impacts overall system availability and latency. A common refrain we hear when people first glance at Istio architecture diagrams is “Isn’t this just introducing a single point of failure?”</p><p>In this post, we’ll dig deeper and cover the design principles that underpin Mixer and the surprising fact that Mixer actually increases overall mesh availability and reduces average request latency.</p><p>Istio’s use of Mixer has two main benefits in terms of overall system availability and latency:</p><ul><li><p><strong>Increased SLO</strong>. Mixer insulates proxies and services from infrastructure backend failures, enabling higher effective mesh availability. 
The mesh as a whole tends to experience a lower rate of failure when interacting with the infrastructure backends than it would if Mixer were not present.</p></li><li><p><strong>Reduced Latency</strong>. Through aggressive use of shared multi-level caches and sharding, Mixer reduces average observed latencies across the mesh.</p></li></ul><p>We’ll explain this in more detail below.</p><h2 id="how-we-got-here">How we got here</h2><p>For many years at Google, we’ve been using an internal API &amp; service management system to handle the many APIs exposed by Google. This system has been fronting the world’s biggest services (Google Maps, YouTube, Gmail, etc.) and sustains a peak rate of hundreds of millions of QPS. Although this system has served us well, it had problems keeping up with Google’s rapid growth, and it became clear that a new architecture was needed in order to tamp down ballooning operational costs.</p><p>In 2014, we started an initiative to create a replacement architecture that would scale better. The result has proven extremely successful and has been gradually deployed throughout Google, saving millions of dollars a month in ops costs in the process.</p><p>The older system was built around a centralized fleet of fairly heavy proxies into which all incoming traffic would flow, before being forwarded to the services where the real work was done. The newer architecture jettisons the shared proxy design and instead consists of a very lean and efficient distributed sidecar proxy sitting next to service instances, along with a shared fleet of sharded control plane intermediaries:</p><figure><img src="./img/mixer-spof-myth-1.svg" alt="Google System Topology" title="Google System Topology" /><figcaption>Google's API &amp; Service Management System</figcaption></figure><p>Look familiar? Of course: it’s just like Istio! Istio was conceived as a second generation of this distributed proxy architecture. 
We took the core lessons from this internal system, generalized many of the concepts by working with our partners, and created Istio.</p><h2 id="architecture-recap">Architecture recap</h2><p>As shown in the diagram below, Mixer sits between the mesh and the infrastructure backends that support it:</p><figure><img src="./img/mixer-spof-myth-2.svg" alt="Istio Topology" title="Istio Topology" /><figcaption>Istio Topology</figcaption></figure><p>The Envoy sidecar logically calls Mixer before each request to perform precondition checks, and after each request to report telemetry. The sidecar has local caching such that a relatively large percentage of precondition checks can be performed from cache. Additionally, the sidecar buffers outgoing telemetry such that it only actually needs to call Mixer once every several thousand requests. Whereas precondition checks are synchronous to request processing, telemetry reports are done asynchronously with a fire-and-forget pattern.</p><p>At a high level, Mixer provides:</p><ul><li><p><strong>Backend Abstraction</strong>. Mixer insulates the Istio components and services within the mesh from the implementation details of individual infrastructure backends.</p></li><li><p><strong>Intermediation</strong>. Mixer allows operators to have fine-grained control over all interactions between the mesh and the infrastructure backends.</p></li></ul><p>However, even beyond these purely functional aspects, Mixer has other characteristics that provide the system with additional benefits.</p><h2 id="mixer-slo-booster">Mixer: SLO booster</h2><p>Contrary to the claim that Mixer is a SPOF and can therefore lead to mesh outages, we believe it in fact improves the effective availability of a mesh. How can that be? There are three basic characteristics at play:</p><ul><li><p><strong>Statelessness</strong>. Mixer is stateless in that it doesn’t manage any persistent storage of its own.</p></li><li><p><strong>Hardening</strong>. 
Mixer proper is designed to be a highly reliable component. The design intent is to achieve &gt; 99.999% uptime for any individual Mixer instance.</p></li><li><p><strong>Caching and Buffering</strong>. Mixer is designed to accumulate a large amount of transient, ephemeral state.</p></li></ul><p>The sidecar proxies that sit next to each service instance in the mesh must necessarily be frugal in terms of memory consumption, which constrains the possible amount of local caching and buffering. Mixer, however, lives independently and can use considerably larger caches and output buffers. Mixer thus acts as a highly-scaled and highly-available second-level cache for the sidecars.</p><p>Mixer’s expected availability is considerably higher than that of most infrastructure backends (those often have availability of perhaps 99.9%). Its local caches and buffers help mask infrastructure backend failures by allowing Mixer to continue operating even when a backend has become unresponsive.</p><h2 id="mixer-latency-slasher">Mixer: Latency slasher</h2><p>As we explained above, the Istio sidecars generally have fairly effective first-level caching. They can serve the majority of their traffic from cache. Mixer provides a much greater shared pool of second-level cache, which helps Mixer contribute to a lower average per-request latency.</p><p>While it’s busy cutting down latency, Mixer is also inherently cutting down the number of calls your mesh makes to infrastructure backends. Depending on how you’re paying for these backends, this might end up saving you some cash by cutting down the effective QPS to the backends.</p><h2 id="work-ahead">Work ahead</h2><p>We have opportunities ahead to continue improving the system in many ways.</p><h3 id="config-canaries">Config canaries</h3><p>Mixer is highly scaled so it is generally resistant to individual instance failures. 
However, Mixer is still susceptible to cascading failures when a poison configuration is deployed that causes all Mixer instances to crash at basically the same time (yeah, that would be a bad day). To prevent this from happening, config changes can be canaried to a small set of Mixer instances, and then more broadly rolled out.</p><p>Mixer doesn’t yet do canarying of config changes, but we expect this to come online as part of Istio’s ongoing work on reliable config distribution.</p><h3 id="cache-tuning">Cache tuning</h3><p>We have yet to fine-tune the sizes of the sidecar and Mixer caches. This work will focus on achieving the highest performance possible using the least amount of resources.</p><h3 id="cache-sharing">Cache sharing</h3><p>At the moment, each Mixer instance operates independently of all other instances. A request handled by one Mixer instance will not leverage data cached in a different instance. We will eventually experiment with a distributed cache such as memcached or Redis in order to provide a much larger mesh-wide shared cache, and further reduce the number of calls to infrastructure backends.</p><h3 id="sharding">Sharding</h3><p>In very large meshes, the load on Mixer can be great. There can be a large number of Mixer instances, each straining to keep caches primed to satisfy incoming traffic. We expect to eventually introduce intelligent sharding such that Mixer instances become slightly specialized in handling particular data streams in order to increase the likelihood of cache hits. In other words, sharding helps improve cache efficiency by routing related traffic to the same Mixer instance over time, rather than randomly dispatching to any available Mixer instance.</p><h2 id="conclusion">Conclusion</h2><p>Practical experience at Google showed that the model of a slim sidecar proxy and a large shared caching control plane intermediary hits a sweet spot, delivering excellent perceived availability and latency. 
We’ve taken the lessons learned there and applied them to create more sophisticated and effective caching, prefetching, and buffering strategies in Istio. We’ve also optimized the communication protocols to reduce overhead when a cache miss does occur.</p><p>Mixer is still young. As of Istio 0.3, we haven’t really done significant performance work within Mixer itself. This means that when a request misses the sidecar cache, we spend more time in Mixer responding to requests than we should. We’re doing a lot of work to improve this in the coming months, reducing the overhead that Mixer imposes in the synchronous precondition check case.</p><p>We hope this post makes you appreciate the inherent benefits that Mixer brings to Istio. Don’t hesitate to post comments or questions to <a href="https://groups.google.com/forum/#!forum/istio-integrations">istio-integrations@</a>.</p></div><div class="content-attribution"> The Mixer Crew</div></article></div></div></div><footer><div class="container"><div class="row"><div class="col-lg-2 col-md-2 col-sm-2"></div><div class="col-lg-3 col-md-3 col-sm-3 col-xs-12 center-block"><ul><li><a class="header" href="/v0.3/docs/welcome">Docs</a></li><li><a href="/v0.3/docs/concepts">Concepts</a></li><li><a href="/v0.3/docs/setup">Setup</a></li><li><a href="/v0.3/docs/tasks">Tasks</a></li><li><a href="/v0.3/docs/guides">Guides</a></li><li><a href="/v0.3/docs/reference">Reference</a></li></ul></div><div class="col-lg-3 col-md-3 col-sm-3 col-xs-12 center-block"><ul><li><a class="header" href="/v0.3/help">Help</a></li><li><a href="/v0.3/faq">FAQ</a></li><li><a href="/v0.3/glossary">Glossary</a></li><li><a href="/v0.3/troubleshooting">Troubleshooting</a></li><li><a href="/v0.3/bugs">Report Bugs</a></li><li><a href="https://github.com/istio/istio.github.io/issues/new?title=Issue with _posts/2017-12-07-mixer-spof-myth.md">Doc Bugs &amp; Gaps</a></li><li><a href="https://github.com/istio/istio.github.io/edit/master/_posts/2017-12-07-mixer-spof-myth.md">Edit This 
Page</a></li></ul></div><div class="col-lg-3 col-md-3 col-sm-3 col-xs-12 center-block"><ul><li> <a class="header" href="/v0.3/community">Community</a></li><li> <a href="https://groups.google.com/forum/#!forum/istio-users" target="_blank" rel="noopener">User</a> | <a href="https://groups.google.com/forum/#!forum/istio-dev" target="_blank" rel="noopener">Dev Mailing Lists</a></li><li><a href="https://twitter.com/IstioMesh" target="_blank" rel="noopener">Twitter</a></li><li><a href="https://stackoverflow.com/questions/tagged/istio" target="_blank" rel="noopener">Stack Overflow</a></li><li><a href="https://github.com/istio/community" target="_blank" rel="noopener">GitHub</a></li><li><a href="https://github.com/istio/community/blob/master/WORKING-GROUPS.md" target="_blank" rel="noopener">Working Groups</a></li></ul></div><div class="col-lg-1 col-md-1 col-sm-1"></div></div><div class="row"><p class="description small text-center"> Istio 0.3, Copyright &copy; 2017 Istio Authors<br> Archived on 08-Dec-2017</p></div></div></footer><script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-validate/1.15.0/jquery.validate.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.form/4.2.1/jquery.form.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-visible/1.2.0/jquery.visible.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/jqueryui/1.12.1/jquery-ui.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/slick-carousel/1.6.0/slick.min.js"></script> <script src="https://cdnjs.cloudflare.com/ajax/libs/clipboard.js/1.7.1/clipboard.min.js"></script> <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script> <script src="/v0.3/js/common.min.js"></script> <script src="/v0.3/js/search.js"></script> <script src="/v0.3/js/prism.min.js"></script></body></html>