Compare commits

...

15 Commits

Author SHA1 Message Date
Knative Automation 1898a2a10c
[release-1.16] Upgrade to latest dependencies (#8488)
upgrade to latest dependencies

bumping knative.dev/hack 30344ae...b5e4ff8:
  > b5e4ff8 [release-1.16] Update GKE version to 1.29 (# 415)
  > 6cb0feb [release-1.16] Refactor release script to gh CLI (# 410)
bumping knative.dev/hack/schema 30344ae...b5e4ff8:
  > b5e4ff8 [release-1.16] Update GKE version to 1.29 (# 415)
  > 6cb0feb [release-1.16] Refactor release script to gh CLI (# 410)
bumping knative.dev/reconciler-test 09111f0...f4bd4f5:
  > f4bd4f5 upgrade to latest dependencies (# 779)

Signed-off-by: Knative Automation <automation@knative.team>
2025-02-20 12:17:48 +00:00
Knative Prow Robot d59d3b68f8
[release-1.16] Scheduler: Resync reserved periodically to keep state consistent (#8452)
Scheduler: Resync reserved periodically to keep state consistent

Add resyncReserved, which removes deleted vPods from reserved to keep the
state consistent when leadership changes (Promote / Demote).

`initReserved` is not enough since the vPod lister can be stale.
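
A minimal sketch of the resync idea, with invented types and field names (the real scheduler state is more involved): reserved entries whose vPod no longer exists are dropped from the in-memory map, so stale entries do not survive a Promote/Demote cycle.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// vPodKey identifies a virtual pod; both types are invented stand-ins for
// the scheduler's real state.
type vPodKey string

type reservedState struct {
	mu       sync.Mutex
	reserved map[vPodKey]map[string]int32 // vPod -> pod name -> reserved vreplicas
}

// resyncReserved drops reserved entries whose vPod no longer exists, so the
// in-memory state stays consistent with the lister.
func (s *reservedState) resyncReserved(listVPods func() map[vPodKey]bool) {
	existing := listVPods()
	s.mu.Lock()
	defer s.mu.Unlock()
	for key := range s.reserved {
		if !existing[key] {
			delete(s.reserved, key)
		}
	}
}

func main() {
	s := &reservedState{reserved: map[vPodKey]map[string]int32{
		"ns/deleted-source": {"pod-0": 2},
		"ns/live-source":    {"pod-1": 1},
	}}

	// Run the resync on a ticker; a single tick is shown here.
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	<-ticker.C
	s.resyncReserved(func() map[vPodKey]bool {
		return map[vPodKey]bool{"ns/live-source": true}
	})
	fmt.Println(s.reserved) // map[ns/live-source:map[pod-1:1]]
}
```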

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
2025-02-11 15:22:50 +00:00
Knative Prow Robot aae7a34a35
[release-1.16] Add `sinks.knative.dev` to namespaced ClusterRole (#8434)
Add `sinks.knative.dev` to namespaced ClusterRole

These are the roles that users can use to give their developers access
to Knative Eventing resources, and the sinks group was missing.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
2025-01-31 07:32:57 +00:00
Knative Prow Robot 7289df9b75
[release-1.16] fix: remove duplicated observedGeneration from jobsinks.sinks.knative.dev (#8423)
fix: remove duplicated observedGeneration from jobsinks.sinks.knative.dev

Co-authored-by: Fabian-K <fabiankajzar@googlemail.com>
2025-01-22 18:28:10 +00:00
Knative Prow Robot 2a46ff568f
[release-1.16] Reduce mt-broker-controller memory usage with namespaced endpoint informer (#8422)
* Reduce mt-broker-controller memory usage with namespaced endpoint informer

Currently, the mt-broker-controller uses a cluster-wide endpoints
informer, but it only needs endpoints in the `SYSTEM_NAMESPACE`.

Using the namespaced informer factory ensures that the watcher
is only watching endpoints in the `knative-eventing` (also known as
`SYSTEM_NAMESPACE`) namespace.
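
A minimal client-go sketch of the same idea, not the actual knative.dev/pkg injection wiring used by the controller: the endpoints informer comes from a factory scoped to the system namespace, so only that namespace is listed, watched, and cached.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a kubeconfig at the default location; this is a sketch only.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// A factory scoped to the system namespace: the endpoints informer it
	// produces only lists/watches "knative-eventing" instead of caching
	// every Endpoints object in the cluster.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client,
		10*time.Minute,
		informers.WithNamespace("knative-eventing"),
	)
	endpointsInformer := factory.Core().V1().Endpoints()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	eps, err := endpointsInformer.Lister().Endpoints("knative-eventing").List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("cached %d Endpoints objects in knative-eventing\n", len(eps))
}
```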

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Start informer

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
2025-01-22 17:20:11 +00:00
Knative Automation bb92b8c414
[release-1.16] Upgrade to latest dependencies (#8409)
upgrade to latest dependencies

bumping knative.dev/hack/schema 05b2fb3...30344ae:
  > 30344ae Export KO_FLAGS for consuming scripts (# 402)
bumping knative.dev/hack 05b2fb3...30344ae:
  > 30344ae Export KO_FLAGS for consuming scripts (# 402)

Signed-off-by: Knative Automation <automation@knative.team>
2025-01-15 06:53:03 +00:00
Knative Prow Robot 7da3cee603
[release-1.16] Scheduler: LastOrdinal based on replicas instead of FreeCap (#8394)
Scheduler: LastOrdinal based on replicas instead of FreeCap

When scaling down and compacting, basing the last ordinal on the
free-capacity structure leads to a lastOrdinal that is off by one, since
`FreeCap` might contain the free capacity of unschedulable pods.

We have to keep including unschedulable pods in `FreeCap` because a pod
with a lower ordinal can become unschedulable for external reasons (for
example, its node is shut down) and its vreplicas then need to be
rescheduled; during that time period we still want to consider those
pods when compacting. Once all vpods that were on the lost pod have been
rescheduled, `FreeCap` will only include schedulable pods.
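
A minimal sketch with hypothetical helpers (not the scheduler's real state code) contrasting the two derivations: the replica count gives the correct last ordinal, while the keys of a `FreeCap`-style map can still include an unschedulable pod and come out one too high.

```go
package main

import "fmt"

// lastOrdinalFromReplicas derives the highest usable ordinal from the
// StatefulSet replica count: pods are named <sts>-0 .. <sts>-(replicas-1).
func lastOrdinalFromReplicas(replicas int32) int32 {
	return replicas - 1
}

// lastOrdinalFromFreeCap derives it from a free-capacity map keyed by pod
// ordinal. If the map still holds an entry for an unschedulable pod (for
// example ordinal 3 while replicas == 3), the result is one too high.
func lastOrdinalFromFreeCap(freeCap map[int32]int32) int32 {
	last := int32(-1)
	for ordinal := range freeCap {
		if ordinal > last {
			last = ordinal
		}
	}
	return last
}

func main() {
	// Ordinal 3 is unschedulable but still tracked for compaction.
	freeCap := map[int32]int32{0: 0, 1: 2, 2: 5, 3: 10}
	fmt.Println(lastOrdinalFromReplicas(3))      // 2
	fmt.Println(lastOrdinalFromFreeCap(freeCap)) // 3, off by one
}
```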

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
2024-12-19 14:30:11 +00:00
Knative Prow Robot ee786eeee8
[release-1.16] Register eventshub image for JobSink (#8391)
Register eventshub image for JobSink

The package must be registered so that ImageProducer can map it to the
right image and replace it in the final yaml.
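
A hypothetical sketch of the mapping step (the registry type and image names here are invented for illustration, not the reconciler-test API): a package path is registered, and the image producer later substitutes the built image reference wherever that path appears in the generated YAML.

```go
package main

import (
	"fmt"
	"strings"
)

// imageProducer maps registered Go package paths to built image references
// and rewrites any occurrence in the final manifest.
type imageProducer struct {
	registry map[string]string // package path -> image reference
}

func (p *imageProducer) register(pkg, image string) { p.registry[pkg] = image }

func (p *imageProducer) rewrite(manifest string) string {
	for pkg, image := range p.registry {
		manifest = strings.ReplaceAll(manifest, pkg, image)
	}
	return manifest
}

func main() {
	p := &imageProducer{registry: map[string]string{}}
	p.register("knative.dev/reconciler-test/cmd/eventshub", "ko.local/eventshub:latest")

	manifest := "image: knative.dev/reconciler-test/cmd/eventshub"
	fmt.Println(p.rewrite(manifest)) // image: ko.local/eventshub:latest
}
```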

Co-authored-by: Martin Gencur <mgencur@redhat.com>
2024-12-19 06:10:10 +00:00
Knative Prow Robot 852ae3ba28
[release-1.16] Remove conversion webhook config in EventPolicy CRD (#8381)
Remove conversion webhook config in EventPolicy CRD

As we don't have multiple EventPolicy versions yet, we don't need the conversion webhook configuration in the EventPolicy CRD.

Co-authored-by: Christoph Stäbler <cstabler@redhat.com>
2024-12-12 10:47:53 +00:00
Knative Prow Robot 9740b12837
[release-1.16] MT-Broker: return retriable status code based on the state to leverage retries (#8367)
* MT-Broker: return appropriate status code based on the state to leverage retries

The ingress and filter deployments were returning 400 even when a given
resource (like a trigger, broker, or subscription) wasn't found;
however, this is a common case where the lister cache simply hasn't
caught up with the latest state yet.
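
A minimal sketch of the status-code mapping, assuming a hypothetical lookup helper: a NotFound error, which is often just a stale informer cache, maps to 404 so the sender can retry, while other lookup errors keep a non-retriable code.

```go
package main

import (
	"fmt"
	"net/http"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// statusForGetError maps a lister/Get error to an HTTP status code.
// NotFound becomes 404 instead of a terminal 400.
func statusForGetError(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case apierrors.IsNotFound(err):
		return http.StatusNotFound
	default:
		return http.StatusBadRequest
	}
}

func main() {
	notFound := apierrors.NewNotFound(
		schema.GroupResource{Group: "eventing.knative.dev", Resource: "triggers"},
		"my-trigger",
	)
	fmt.Println(statusForGetError(notFound)) // 404
	fmt.Println(statusForGetError(nil))      // 200
}
```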

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Fix unit tests

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
2024-12-03 13:27:44 +00:00
Knative Prow Robot 96ab579ab5
[release-1.16] fix: rename `job-sink` to `job_sink` (#8339)
fix: rename `job-sink` to `job_sink`

Co-authored-by: Yates <yates.lyc@gmail.com>
2024-11-21 16:22:02 +00:00
Knative Prow Robot 526343206b
[release-1.16] JobSink: Delete secrets associated with jobs when jobs are deleted (#8332)
* JobSink: Delete secrets associated with jobs when jobs are deleted

As reported in https://github.com/knative/eventing/issues/8323, old
JobSink secrets lead to old events being processed again while new
events are lost.

Using an OwnerReference and Kubernetes garbage collection, a secret
created for a given event is now bound to the lifecycle of its Job, so
that when the job is deleted, the associated secret is deleted as well.
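
A minimal sketch of the ownership wiring with invented names (the full handler change appears in the diff below): the Secret holding the serialized event carries an OwnerReference pointing at the Job, so Kubernetes garbage collection removes the Secret together with the Job.

```go
package main

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

// secretOwnedByJob builds the Secret that stores the serialized event and
// makes the Job its owner, so deleting the Job garbage-collects the Secret.
func secretOwnedByJob(job *batchv1.Job, eventBytes []byte) *corev1.Secret {
	return &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{
			Name:      job.Name,
			Namespace: job.Namespace,
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion:         "batch/v1",
				Kind:               "Job",
				Name:               job.Name,
				UID:                job.UID,
				Controller:         ptr.To(true),
				BlockOwnerDeletion: ptr.To(false),
			}},
		},
		Immutable: ptr.To(true),
		Data:      map[string][]byte{"event": eventBytes},
		Type:      corev1.SecretTypeOpaque,
	}
}

func main() {
	job := &batchv1.Job{ObjectMeta: metav1.ObjectMeta{
		Name: "my-jobsink-abc123", Namespace: "default", UID: "0000-1111",
	}}
	secret := secretOwnedByJob(job, []byte(`{"specversion":"1.0"}`))
	fmt.Println(secret.OwnerReferences[0].Kind) // Job
}
```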

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Fix jobsink name generator + add unit and fuzz tests

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Fix e2e test

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Lint

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
2024-11-19 09:01:00 +00:00
Knative Prow Robot a16468471b
[release-1.16] Add jobsinks-addressable-resolver cluster role (#8302)
Add jobsinks-addressable-resolver cluster role

This will ensure that all ServiceAccounts that are bound to the
"addressable-resolver" ClusterRole can read JobSinks.

Fixes issues like this for SinkBindings:
```
{"level":"error","ts":"2024-11-04T08:06:16.160Z","logger":"eventing-webhook","caller":"sinkbinding/sinkbinding.go:87",
"msg":"Failed to get Addressable from Destination:
%!w(*fmt.wrapError=&{failed to get lister for
sinks.knative.dev/v1alpha1,
Resource=jobsinks: jobsinks.sinks.knative.dev is forbidden:
User \"system:serviceaccount:knative-eventing:eventing-webhook\"
cannot list resource \"jobsinks\" in API group \"sinks.knative.dev\"
```

Co-authored-by: Martin Gencur <mgencur@redhat.com>
2024-11-19 07:19:59 +00:00
Knative Prow Robot d0c035b338
[release-1.16] Add observedGeneration in JobSink OpenAPI schema (#8300)
Add observedGeneration in JobSink OpenAPI schema

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
2024-11-04 12:08:55 +00:00
Knative Prow Robot 6db9011037
[release-1.16] Scheduler: MAXFILLUP strategy will spread vreplicas across multiple pods (#8291)
* Scheduler: MAXFILLUP strategy will spread vreplicas across multiple pods

The MAXFILLUP algorithm was using an affinity strategy, meaning that
it would prioritize adding new vreplicas to pods that already hold
replicas of the same resource.

However, the downside is that if that one pod goes down or gets
rescheduled, the entire resource is down and produces no events. By
spreading vreplicas across multiple real replicas we get better
availability.
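
A minimal sketch, with invented names and none of the real scheduler plumbing, of the spreading idea: new vreplicas are distributed across the schedulable pods instead of being packed onto the pod that already holds the resource, limiting the blast radius of a single pod going down.

```go
package main

import "fmt"

// spreadPlacements distributes count vreplicas across the given pods as
// evenly as possible, instead of packing them onto a single pod.
func spreadPlacements(pods []string, count int32) map[string]int32 {
	placements := make(map[string]int32, len(pods))
	if len(pods) == 0 {
		return placements
	}
	for i := int32(0); i < count; i++ {
		placements[pods[int(i)%len(pods)]]++
	}
	return placements
}

func main() {
	pods := []string{"adapter-0", "adapter-1", "adapter-2"}
	// If adapter-0 is lost, only 3 of 7 vreplicas are affected rather than all 7.
	fmt.Println(spreadPlacements(pods, 7)) // map[adapter-0:3 adapter-1:2 adapter-2:2]
}
```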

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Remove configurable HA scheduler, fix reserved replicas logic

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Log reserved

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Handle unschedulable pods and always start from reserved regardless of the placements

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Add reserved + overcommit

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Add benchmark + reduce OrdinalFromPodName calls

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Handle unschedulable pods

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
2024-10-30 06:43:49 +00:00
57 changed files with 1671 additions and 6316 deletions

View File

@ -20,6 +20,7 @@ import (
"context" "context"
"crypto/md5" //nolint:gosec "crypto/md5" //nolint:gosec
"crypto/tls" "crypto/tls"
"encoding/hex"
"fmt" "fmt"
"log" "log"
"net/http" "net/http"
@ -65,7 +66,7 @@ import (
"knative.dev/eventing/pkg/utils" "knative.dev/eventing/pkg/utils"
) )
const component = "job-sink" const component = "job_sink"
func main() { func main() {
@ -231,11 +232,11 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
return return
} }
id := toIdHashLabelValue(event.Source(), event.ID()) jobName := toJobName(ref.Name, event.Source(), event.ID())
logger.Debug("Getting job for event", zap.String("URI", r.RequestURI), zap.String("id", id)) logger.Debug("Getting job for event", zap.String("URI", r.RequestURI), zap.String("jobName", jobName))
jobs, err := h.k8s.BatchV1().Jobs(js.GetNamespace()).List(r.Context(), metav1.ListOptions{ jobs, err := h.k8s.BatchV1().Jobs(js.GetNamespace()).List(r.Context(), metav1.ListOptions{
LabelSelector: jobLabelSelector(ref, id), LabelSelector: jobLabelSelector(ref, jobName),
Limit: 1, Limit: 1,
}) })
if err != nil { if err != nil {
@ -256,56 +257,21 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
return return
} }
jobName := kmeta.ChildName(ref.Name, id)
logger.Debug("Creating secret for event", zap.String("URI", r.RequestURI), zap.String("jobName", jobName))
jobSinkUID := js.GetUID()
or := metav1.OwnerReference{
APIVersion: sinksv.SchemeGroupVersion.String(),
Kind: sinks.JobSinkResource.Resource,
Name: js.GetName(),
UID: jobSinkUID,
Controller: ptr.Bool(true),
BlockOwnerDeletion: ptr.Bool(false),
}
secret := &corev1.Secret{
TypeMeta: metav1.TypeMeta{},
ObjectMeta: metav1.ObjectMeta{
Name: jobName,
Namespace: ref.Namespace,
Labels: map[string]string{
sinks.JobSinkIDLabel: id,
sinks.JobSinkNameLabel: ref.Name,
},
OwnerReferences: []metav1.OwnerReference{or},
},
Immutable: ptr.Bool(true),
Data: map[string][]byte{"event": eventBytes},
Type: corev1.SecretTypeOpaque,
}
_, err = h.k8s.CoreV1().Secrets(ref.Namespace).Create(r.Context(), secret, metav1.CreateOptions{})
if err != nil && !apierrors.IsAlreadyExists(err) {
logger.Warn("Failed to create secret", zap.Error(err))
w.Header().Add("Reason", err.Error())
w.WriteHeader(http.StatusInternalServerError)
return
}
logger.Debug("Creating job for event", zap.String("URI", r.RequestURI), zap.String("jobName", jobName))
job := js.Spec.Job.DeepCopy() job := js.Spec.Job.DeepCopy()
job.Name = jobName job.Name = jobName
if job.Labels == nil { if job.Labels == nil {
job.Labels = make(map[string]string, 4) job.Labels = make(map[string]string, 4)
} }
job.Labels[sinks.JobSinkIDLabel] = id job.Labels[sinks.JobSinkIDLabel] = jobName
job.Labels[sinks.JobSinkNameLabel] = ref.Name job.Labels[sinks.JobSinkNameLabel] = ref.Name
job.OwnerReferences = append(job.OwnerReferences, or) job.OwnerReferences = append(job.OwnerReferences, metav1.OwnerReference{
APIVersion: sinksv.SchemeGroupVersion.String(),
Kind: sinks.JobSinkResource.Resource,
Name: js.GetName(),
UID: js.GetUID(),
Controller: ptr.Bool(true),
BlockOwnerDeletion: ptr.Bool(false),
})
var mountPathName string var mountPathName string
for i := range job.Spec.Template.Spec.Containers { for i := range job.Spec.Template.Spec.Containers {
found := false found := false
@ -346,14 +312,66 @@ func (h *Handler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
}) })
} }
_, err = h.k8s.BatchV1().Jobs(ref.Namespace).Create(r.Context(), job, metav1.CreateOptions{}) logger.Debug("Creating job for event",
if err != nil { zap.String("URI", r.RequestURI),
zap.String("jobName", jobName),
zap.Any("job", job),
)
createdJob, err := h.k8s.BatchV1().Jobs(ref.Namespace).Create(r.Context(), job, metav1.CreateOptions{})
if err != nil && !apierrors.IsAlreadyExists(err) {
logger.Warn("Failed to create job", zap.Error(err)) logger.Warn("Failed to create job", zap.Error(err))
w.Header().Add("Reason", err.Error()) w.Header().Add("Reason", err.Error())
w.WriteHeader(http.StatusInternalServerError) w.WriteHeader(http.StatusInternalServerError)
return return
} }
if apierrors.IsAlreadyExists(err) {
logger.Debug("Job already exists", zap.String("URI", r.RequestURI), zap.String("jobName", jobName))
}
secret := &corev1.Secret{
TypeMeta: metav1.TypeMeta{},
ObjectMeta: metav1.ObjectMeta{
Name: jobName,
Namespace: ref.Namespace,
Labels: map[string]string{
sinks.JobSinkIDLabel: jobName,
sinks.JobSinkNameLabel: ref.Name,
},
OwnerReferences: []metav1.OwnerReference{
{
APIVersion: "batch/v1",
Kind: "Job",
Name: createdJob.Name,
UID: createdJob.UID,
Controller: ptr.Bool(true),
BlockOwnerDeletion: ptr.Bool(false),
},
},
},
Immutable: ptr.Bool(true),
Data: map[string][]byte{"event": eventBytes},
Type: corev1.SecretTypeOpaque,
}
logger.Debug("Creating secret for event",
zap.String("URI", r.RequestURI),
zap.String("jobName", jobName),
zap.Any("secret.metadata", secret.ObjectMeta),
)
_, err = h.k8s.CoreV1().Secrets(ref.Namespace).Create(r.Context(), secret, metav1.CreateOptions{})
if err != nil && !apierrors.IsAlreadyExists(err) {
logger.Warn("Failed to create secret", zap.Error(err))
w.Header().Add("Reason", err.Error())
w.WriteHeader(http.StatusInternalServerError)
return
}
if apierrors.IsAlreadyExists(err) {
logger.Debug("Secret already exists", zap.String("URI", r.RequestURI), zap.String("jobName", jobName))
}
w.Header().Add("Location", locationHeader(ref, event.Source(), event.ID())) w.Header().Add("Location", locationHeader(ref, event.Source(), event.ID()))
w.WriteHeader(http.StatusAccepted) w.WriteHeader(http.StatusAccepted)
@ -391,8 +409,7 @@ func (h *Handler) handleGet(ctx context.Context, w http.ResponseWriter, r *http.
eventSource := parts[6] eventSource := parts[6]
eventID := parts[8] eventID := parts[8]
id := toIdHashLabelValue(eventSource, eventID) jobName := toJobName(ref.Name, eventSource, eventID)
jobName := kmeta.ChildName(ref.Name, id)
job, err := h.k8s.BatchV1().Jobs(ref.Namespace).Get(r.Context(), jobName, metav1.GetOptions{}) job, err := h.k8s.BatchV1().Jobs(ref.Namespace).Get(r.Context(), jobName, metav1.GetOptions{})
if err != nil { if err != nil {
@ -445,6 +462,7 @@ func jobLabelSelector(ref types.NamespacedName, id string) string {
return fmt.Sprintf("%s=%s,%s=%s", sinks.JobSinkIDLabel, id, sinks.JobSinkNameLabel, ref.Name) return fmt.Sprintf("%s=%s,%s=%s", sinks.JobSinkIDLabel, id, sinks.JobSinkNameLabel, ref.Name)
} }
func toIdHashLabelValue(source, id string) string { func toJobName(js string, source, id string) string {
return utils.ToDNS1123Subdomain(fmt.Sprintf("%s", md5.Sum([]byte(fmt.Sprintf("%s-%s", source, id))))) //nolint:gosec h := md5.Sum([]byte(source + id)) //nolint:gosec
return kmeta.ChildName(js+"-", utils.ToDNS1123Subdomain(hex.EncodeToString(h[:])))
} }

cmd/jobsink/main_test.go (new file, 106 lines)
View File

@ -0,0 +1,106 @@
/*
Copyright 2024 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package main
import (
"testing"
"k8s.io/apimachinery/pkg/api/validation"
"knative.dev/eventing/pkg/utils"
)
type testCase struct {
JobSinkName string
Source string
Id string
}
func TestToJobName(t *testing.T) {
testcases := []testCase{
{
JobSinkName: "job-sink-success",
Source: "mysource3/myservice",
Id: "2234-5678",
},
{
JobSinkName: "a",
Source: "0",
Id: "0",
},
}
for _, tc := range testcases {
t.Run(tc.JobSinkName+"_"+tc.Source+"_"+tc.Id, func(t *testing.T) {
if errs := validation.NameIsDNS1035Label(tc.JobSinkName, false); len(errs) != 0 {
t.Errorf("Invalid JobSinkName: %v", errs)
}
name := toJobName(tc.JobSinkName, tc.Source, tc.Id)
doubleName := toJobName(tc.JobSinkName, tc.Source, tc.Id)
if name != doubleName {
t.Errorf("Before: %q, after: %q", name, doubleName)
}
if got := utils.ToDNS1123Subdomain(name); got != name {
t.Errorf("ToDNS1123Subdomain(Want) returns a different result, Want: %q, Got: %q", name, got)
}
if errs := validation.NameIsDNS1035Label(name, false); len(errs) != 0 {
t.Errorf("toJobName produced invalid name %q given %q, %q, %q: errors: %#v", name, tc.JobSinkName, tc.Source, tc.Id, errs)
}
})
}
}
func FuzzToJobName(f *testing.F) {
testcases := []testCase{
{
JobSinkName: "job-sink-success",
Source: "mysource3/myservice",
Id: "2234-5678",
},
{
JobSinkName: "a",
Source: "0",
Id: "0",
},
}
for _, tc := range testcases {
f.Add(tc.JobSinkName, tc.Source, tc.Id) // Use f.Add to provide a seed corpus
}
f.Fuzz(func(t *testing.T, js, source, id string) {
if errs := validation.NameIsDNSLabel(js, false); len(errs) != 0 {
t.Skip("Prerequisite: invalid jobsink name")
}
name := toJobName(js, source, id)
doubleName := toJobName(js, source, id)
if name != doubleName {
t.Errorf("Before: %q, after: %q", name, doubleName)
}
if got := utils.ToDNS1123Subdomain(name); got != name {
t.Errorf("ToDNS1123Subdomain(Want) returns a different result, Want: %q, Got: %q", name, got)
}
if errs := validation.NameIsDNSLabel(name, false); len(errs) != 0 {
t.Errorf("toJobName produced invalid name %q given %q, %q, %q: errors: %#v", name, js, source, id, errs)
}
})
}

View File

@ -209,11 +209,3 @@ spec:
- knative - knative
- eventing - eventing
scope: Namespaced scope: Namespaced
conversion:
strategy: Webhook
webhook:
conversionReviewVersions: ["v1", "v1beta1"]
clientConfig:
service:
name: eventing-webhook
namespace: knative-eventing

View File

@ -144,3 +144,25 @@ rules:
- get - get
- list - list
- watch - watch
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: jobsinks-addressable-resolver
labels:
duck.knative.dev/addressable: "true"
app.kubernetes.io/version: devel
app.kubernetes.io/name: knative-eventing
# Do not use this role directly. These rules will be added to the "addressable-resolver" role.
rules:
- apiGroups:
- sinks.knative.dev
resources:
- jobsinks
- jobsinks/status
verbs:
- get
- list
- watch

View File

@ -79,6 +79,19 @@ rules:
--- ---
kind: ClusterRole kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1 apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: knative-sinks-namespaced-admin
labels:
rbac.authorization.k8s.io/aggregate-to-admin: "true"
app.kubernetes.io/version: devel
app.kubernetes.io/name: knative-eventing
rules:
- apiGroups: ["sinks.knative.dev"]
resources: ["*"]
verbs: ["*"]
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata: metadata:
name: knative-eventing-namespaced-edit name: knative-eventing-namespaced-edit
labels: labels:
@ -86,7 +99,7 @@ metadata:
app.kubernetes.io/version: devel app.kubernetes.io/version: devel
app.kubernetes.io/name: knative-eventing app.kubernetes.io/name: knative-eventing
rules: rules:
- apiGroups: ["eventing.knative.dev", "messaging.knative.dev", "sources.knative.dev", "flows.knative.dev", "bindings.knative.dev"] - apiGroups: ["eventing.knative.dev", "messaging.knative.dev", "sources.knative.dev", "flows.knative.dev", "bindings.knative.dev", "sinks.knative.dev"]
resources: ["*"] resources: ["*"]
verbs: ["create", "update", "patch", "delete"] verbs: ["create", "update", "patch", "delete"]
--- ---
@ -99,6 +112,6 @@ metadata:
app.kubernetes.io/version: devel app.kubernetes.io/version: devel
app.kubernetes.io/name: knative-eventing app.kubernetes.io/name: knative-eventing
rules: rules:
- apiGroups: ["eventing.knative.dev", "messaging.knative.dev", "sources.knative.dev", "flows.knative.dev", "bindings.knative.dev"] - apiGroups: ["eventing.knative.dev", "messaging.knative.dev", "sources.knative.dev", "flows.knative.dev", "bindings.knative.dev", "sinks.knative.dev"]
resources: ["*"] resources: ["*"]
verbs: ["get", "list", "watch"] verbs: ["get", "list", "watch"]

go.mod (6 changed lines)
View File

@ -43,10 +43,10 @@ require (
k8s.io/apiserver v0.30.3 k8s.io/apiserver v0.30.3
k8s.io/client-go v0.30.3 k8s.io/client-go v0.30.3
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8 k8s.io/utils v0.0.0-20240711033017-18e509b52bc8
knative.dev/hack v0.0.0-20241010131451-05b2fb30cb4d knative.dev/hack v0.0.0-20250220110655-b5e4ff820460
knative.dev/hack/schema v0.0.0-20241010131451-05b2fb30cb4d knative.dev/hack/schema v0.0.0-20250220110655-b5e4ff820460
knative.dev/pkg v0.0.0-20241021183759-9b9d535af5ad knative.dev/pkg v0.0.0-20241021183759-9b9d535af5ad
knative.dev/reconciler-test v0.0.0-20241015093232-09111f0f1364 knative.dev/reconciler-test v0.0.0-20250217113355-f4bd4f5199d4
sigs.k8s.io/yaml v1.4.0 sigs.k8s.io/yaml v1.4.0
) )

go.sum (12 changed lines)
View File

@ -839,14 +839,14 @@ k8s.io/kube-openapi v0.0.0-20240808142205-8e686545bdb8 h1:1Wof1cGQgA5pqgo8MxKPtf
k8s.io/kube-openapi v0.0.0-20240808142205-8e686545bdb8/go.mod h1:Os6V6dZwLNii3vxFpxcNaTmH8LJJBkOTg1N0tOA0fvA= k8s.io/kube-openapi v0.0.0-20240808142205-8e686545bdb8/go.mod h1:Os6V6dZwLNii3vxFpxcNaTmH8LJJBkOTg1N0tOA0fvA=
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8 h1:pUdcCO1Lk/tbT5ztQWOBi5HBgbBP1J8+AsQnQCKsi8A= k8s.io/utils v0.0.0-20240711033017-18e509b52bc8 h1:pUdcCO1Lk/tbT5ztQWOBi5HBgbBP1J8+AsQnQCKsi8A=
k8s.io/utils v0.0.0-20240711033017-18e509b52bc8/go.mod h1:OLgZIPagt7ERELqWJFomSt595RzquPNLL48iOWgYOg0= k8s.io/utils v0.0.0-20240711033017-18e509b52bc8/go.mod h1:OLgZIPagt7ERELqWJFomSt595RzquPNLL48iOWgYOg0=
knative.dev/hack v0.0.0-20241010131451-05b2fb30cb4d h1:aCfX7kwkvgGxXXGbso5tLqdwQmzBkJ9d+EIRwksKTvk= knative.dev/hack v0.0.0-20250220110655-b5e4ff820460 h1:N82WjXiv6RlXnA+qV4cA2tUbTnE3B6C3BWE+dcM/F9A=
knative.dev/hack v0.0.0-20241010131451-05b2fb30cb4d/go.mod h1:R0ritgYtjLDO9527h5vb5X6gfvt5LCrJ55BNbVDsWiY= knative.dev/hack v0.0.0-20250220110655-b5e4ff820460/go.mod h1:R0ritgYtjLDO9527h5vb5X6gfvt5LCrJ55BNbVDsWiY=
knative.dev/hack/schema v0.0.0-20241010131451-05b2fb30cb4d h1:N+UlBE8F8LJUh/m6cYSwzqdqNg65BD9jbWoWO9nfqEA= knative.dev/hack/schema v0.0.0-20250220110655-b5e4ff820460 h1:/f7BC9meo4clcGFDTTm2FjTeGg+rTswG/C033/Ih+JE=
knative.dev/hack/schema v0.0.0-20241010131451-05b2fb30cb4d/go.mod h1:jRH/sx6mwwuMVhvJgnzSaoYA1N4qaIkJa+zxEGtVA5I= knative.dev/hack/schema v0.0.0-20250220110655-b5e4ff820460/go.mod h1:jRH/sx6mwwuMVhvJgnzSaoYA1N4qaIkJa+zxEGtVA5I=
knative.dev/pkg v0.0.0-20241021183759-9b9d535af5ad h1:Nrjtr2H168rJeamH4QdyLMV1lEKHejNhaj1ymgQMfLk= knative.dev/pkg v0.0.0-20241021183759-9b9d535af5ad h1:Nrjtr2H168rJeamH4QdyLMV1lEKHejNhaj1ymgQMfLk=
knative.dev/pkg v0.0.0-20241021183759-9b9d535af5ad/go.mod h1:StJI72GWcm/iErmk4RqFJiOo8RLbVqPbHxUqeVwAzeo= knative.dev/pkg v0.0.0-20241021183759-9b9d535af5ad/go.mod h1:StJI72GWcm/iErmk4RqFJiOo8RLbVqPbHxUqeVwAzeo=
knative.dev/reconciler-test v0.0.0-20241015093232-09111f0f1364 h1:DIc+vbaFKOSGktPXJ1MaXIXoDjlmUIXQkHiZaPcYGbQ= knative.dev/reconciler-test v0.0.0-20250217113355-f4bd4f5199d4 h1:1/0+amRPHW+a2mpcLuK+B6GRG1vfFAKdpUXkMiV31D4=
knative.dev/reconciler-test v0.0.0-20241015093232-09111f0f1364/go.mod h1:PVRnK/YQo9s3foRtut00oAxvCPc9f/qV2PApZh/rMPw= knative.dev/reconciler-test v0.0.0-20250217113355-f4bd4f5199d4/go.mod h1:4QLrovEwvj/Xx5YGEcNTDAEsRK8fEW+xud1ie4HbmcU=
rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8= rsc.io/binaryregexp v0.2.0/go.mod h1:qTv7/COck+e2FymRvadv62gMdZztPaShugOCi3I+8D8=
rsc.io/quote/v3 v3.1.0/go.mod h1:yEA65RcK8LyAZtP9Kv3t0HmxON59tX3rD+tICJqUlj0= rsc.io/quote/v3 v3.1.0/go.mod h1:yEA65RcK8LyAZtP9Kv3t0HmxON59tX3rD+tICJqUlj0=
rsc.io/sampler v1.3.0/go.mod h1:T1hPZKmBbMNahiBKFy5HrXp6adAjACjK9JXDnKaTXpA= rsc.io/sampler v1.3.0/go.mod h1:T1hPZKmBbMNahiBKFy5HrXp6adAjACjK9JXDnKaTXpA=

View File

@ -35,4 +35,4 @@ wait_until_pods_running knative-eventing || fail_test "Pods in knative-eventing
header "Running tests" header "Running tests"
go test -tags=e2e -v -timeout=30m -run="${test_name}" "${test_dir}" || fail_test "Test(s) failed" go test -tags=e2e -v -timeout=30m -parallel=12 -run="${test_name}" "${test_dir}" || fail_test "Test(s) failed"

View File

@ -26,6 +26,8 @@ import (
"net/http" "net/http"
"time" "time"
apierrors "k8s.io/apimachinery/pkg/api/errors"
messagingv1 "knative.dev/eventing/pkg/apis/messaging/v1" messagingv1 "knative.dev/eventing/pkg/apis/messaging/v1"
messaginginformers "knative.dev/eventing/pkg/client/informers/externalversions/messaging/v1" messaginginformers "knative.dev/eventing/pkg/client/informers/externalversions/messaging/v1"
"knative.dev/eventing/pkg/reconciler/broker/resources" "knative.dev/eventing/pkg/reconciler/broker/resources"
@ -178,19 +180,17 @@ func (h *Handler) ServeHTTP(writer http.ResponseWriter, request *http.Request) {
} }
trigger, err := h.getTrigger(triggerRef) trigger, err := h.getTrigger(triggerRef)
if apierrors.IsNotFound(err) {
h.logger.Info("Unable to find the Trigger", zap.Error(err), zap.Any("triggerRef", triggerRef))
writer.WriteHeader(http.StatusNotFound)
return
}
if err != nil { if err != nil {
h.logger.Info("Unable to get the Trigger", zap.Error(err), zap.Any("triggerRef", triggerRef)) h.logger.Info("Unable to get the Trigger", zap.Error(err), zap.Any("triggerRef", triggerRef))
writer.WriteHeader(http.StatusBadRequest) writer.WriteHeader(http.StatusBadRequest)
return return
} }
subscription, err := h.getSubscription(features, trigger)
if err != nil {
h.logger.Info("Unable to get the Subscription of the Trigger", zap.Error(err), zap.Any("triggerRef", triggerRef))
writer.WriteHeader(http.StatusInternalServerError)
return
}
event, err := cehttp.NewEventFromHTTPRequest(request) event, err := cehttp.NewEventFromHTTPRequest(request)
if err != nil { if err != nil {
h.logger.Warn("failed to extract event from request", zap.Error(err)) h.logger.Warn("failed to extract event from request", zap.Error(err))
@ -216,6 +216,18 @@ func (h *Handler) ServeHTTP(writer http.ResponseWriter, request *http.Request) {
if features.IsOIDCAuthentication() { if features.IsOIDCAuthentication() {
h.logger.Debug("OIDC authentication is enabled") h.logger.Debug("OIDC authentication is enabled")
subscription, err := h.getSubscription(features, trigger)
if apierrors.IsNotFound(err) {
h.logger.Info("Unable to find the Subscription for trigger", zap.Error(err), zap.Any("triggerRef", triggerRef))
writer.WriteHeader(http.StatusNotFound)
return
}
if err != nil {
h.logger.Info("Unable to get the Subscription of the Trigger", zap.Error(err), zap.Any("triggerRef", triggerRef))
writer.WriteHeader(http.StatusInternalServerError)
return
}
audience := FilterAudience audience := FilterAudience
if subscription.Status.Auth == nil || subscription.Status.Auth.ServiceAccountName == nil { if subscription.Status.Auth == nil || subscription.Status.Auth.ServiceAccountName == nil {
@ -266,6 +278,11 @@ func (h *Handler) handleDispatchToReplyRequest(ctx context.Context, trigger *eve
} }
broker, err := h.brokerLister.Brokers(brokerNamespace).Get(brokerName) broker, err := h.brokerLister.Brokers(brokerNamespace).Get(brokerName)
if apierrors.IsNotFound(err) {
h.logger.Info("Unable to get the Broker", zap.Error(err))
writer.WriteHeader(http.StatusNotFound)
return
}
if err != nil { if err != nil {
h.logger.Info("Unable to get the Broker", zap.Error(err)) h.logger.Info("Unable to get the Broker", zap.Error(err))
writer.WriteHeader(http.StatusBadRequest) writer.WriteHeader(http.StatusBadRequest)
@ -311,6 +328,11 @@ func (h *Handler) handleDispatchToDLSRequest(ctx context.Context, trigger *event
brokerNamespace = trigger.Namespace brokerNamespace = trigger.Namespace
} }
broker, err := h.brokerLister.Brokers(brokerNamespace).Get(brokerName) broker, err := h.brokerLister.Brokers(brokerNamespace).Get(brokerName)
if apierrors.IsNotFound(err) {
h.logger.Info("Unable to get the Broker", zap.Error(err))
writer.WriteHeader(http.StatusNotFound)
return
}
if err != nil { if err != nil {
h.logger.Info("Unable to get the Broker", zap.Error(err)) h.logger.Info("Unable to get the Broker", zap.Error(err))
writer.WriteHeader(http.StatusBadRequest) writer.WriteHeader(http.StatusBadRequest)
@ -331,6 +353,9 @@ func (h *Handler) handleDispatchToDLSRequest(ctx context.Context, trigger *event
Audience: broker.Status.DeadLetterSinkAudience, Audience: broker.Status.DeadLetterSinkAudience,
} }
} }
if target == nil {
return
}
reportArgs := &ReportArgs{ reportArgs := &ReportArgs{
ns: trigger.Namespace, ns: trigger.Namespace,

View File

@ -27,11 +27,12 @@ import (
"testing" "testing"
"time" "time"
"knative.dev/eventing/pkg/eventingtls"
filteredFactory "knative.dev/pkg/client/injection/kube/informers/factory/filtered" filteredFactory "knative.dev/pkg/client/injection/kube/informers/factory/filtered"
"knative.dev/pkg/configmap" "knative.dev/pkg/configmap"
"knative.dev/pkg/system" "knative.dev/pkg/system"
"knative.dev/eventing/pkg/eventingtls"
messagingv1 "knative.dev/eventing/pkg/apis/messaging/v1" messagingv1 "knative.dev/eventing/pkg/apis/messaging/v1"
"knative.dev/eventing/pkg/reconciler/broker/resources" "knative.dev/eventing/pkg/reconciler/broker/resources"
@ -64,10 +65,11 @@ import (
eventpolicyinformerfake "knative.dev/eventing/pkg/client/injection/informers/eventing/v1alpha1/eventpolicy/fake" eventpolicyinformerfake "knative.dev/eventing/pkg/client/injection/informers/eventing/v1alpha1/eventpolicy/fake"
subscriptioninformerfake "knative.dev/eventing/pkg/client/injection/informers/messaging/v1/subscription/fake" subscriptioninformerfake "knative.dev/eventing/pkg/client/injection/informers/messaging/v1/subscription/fake"
// Fake injection client
_ "knative.dev/eventing/pkg/client/injection/informers/eventing/v1alpha1/eventpolicy/fake"
_ "knative.dev/pkg/client/injection/kube/client/fake" _ "knative.dev/pkg/client/injection/kube/client/fake"
_ "knative.dev/pkg/client/injection/kube/informers/factory/filtered/fake" _ "knative.dev/pkg/client/injection/kube/informers/factory/filtered/fake"
// Fake injection client
_ "knative.dev/eventing/pkg/client/injection/informers/eventing/v1alpha1/eventpolicy/fake"
) )
const ( const (
@ -121,7 +123,7 @@ func TestReceiver(t *testing.T) {
expectedStatus: http.StatusBadRequest, expectedStatus: http.StatusBadRequest,
}, },
"Path too long": { "Path too long": {
request: httptest.NewRequest(http.MethodPost, "/triggers/test-namespace/test-trigger/extra", nil), request: httptest.NewRequest(http.MethodPost, "/triggers/test-namespace/test-trigger/uuid/extra/extra", nil),
expectedStatus: http.StatusBadRequest, expectedStatus: http.StatusBadRequest,
}, },
"Path without prefix": { "Path without prefix": {
@ -130,7 +132,7 @@ func TestReceiver(t *testing.T) {
}, },
"Trigger.Get fails": { "Trigger.Get fails": {
// No trigger exists, so the Get will fail. // No trigger exists, so the Get will fail.
expectedStatus: http.StatusBadRequest, expectedStatus: http.StatusNotFound,
}, },
"Trigger doesn't have SubscriberURI": { "Trigger doesn't have SubscriberURI": {
triggers: []*eventingv1.Trigger{ triggers: []*eventingv1.Trigger{

View File

@ -23,6 +23,7 @@ import (
"strings" "strings"
"time" "time"
apierrors "k8s.io/apimachinery/pkg/api/errors"
"k8s.io/utils/ptr" "k8s.io/utils/ptr"
opencensusclient "github.com/cloudevents/sdk-go/observability/opencensus/v2/client" opencensusclient "github.com/cloudevents/sdk-go/observability/opencensus/v2/client"
@ -226,6 +227,11 @@ func (h *Handler) ServeHTTP(writer http.ResponseWriter, request *http.Request) {
} }
broker, err := h.getBroker(brokerName, brokerNamespace) broker, err := h.getBroker(brokerName, brokerNamespace)
if apierrors.IsNotFound(err) {
h.Logger.Warn("Failed to retrieve broker", zap.Error(err))
writer.WriteHeader(http.StatusNotFound)
return
}
if err != nil { if err != nil {
h.Logger.Warn("Failed to retrieve broker", zap.Error(err)) h.Logger.Warn("Failed to retrieve broker", zap.Error(err))
writer.WriteHeader(http.StatusBadRequest) writer.WriteHeader(http.StatusBadRequest)
@ -315,7 +321,7 @@ func (h *Handler) receive(ctx context.Context, headers http.Header, event *cloud
channelAddress, err := h.getChannelAddress(brokerObj) channelAddress, err := h.getChannelAddress(brokerObj)
if err != nil { if err != nil {
h.Logger.Warn("could not get channel address from broker", zap.Error(err)) h.Logger.Warn("could not get channel address from broker", zap.Error(err))
return http.StatusBadRequest, kncloudevents.NoDuration return http.StatusInternalServerError, kncloudevents.NoDuration
} }
opts := []kncloudevents.SendOption{ opts := []kncloudevents.SendOption{

View File

@ -26,11 +26,12 @@ import (
"testing" "testing"
"time" "time"
"knative.dev/eventing/pkg/eventingtls"
filteredconfigmapinformer "knative.dev/pkg/client/injection/kube/informers/core/v1/configmap/filtered/fake" filteredconfigmapinformer "knative.dev/pkg/client/injection/kube/informers/core/v1/configmap/filtered/fake"
filteredFactory "knative.dev/pkg/client/injection/kube/informers/factory/filtered" filteredFactory "knative.dev/pkg/client/injection/kube/informers/factory/filtered"
"knative.dev/pkg/system" "knative.dev/pkg/system"
"knative.dev/eventing/pkg/eventingtls"
"github.com/cloudevents/sdk-go/v2/client" "github.com/cloudevents/sdk-go/v2/client"
"github.com/cloudevents/sdk-go/v2/event" "github.com/cloudevents/sdk-go/v2/event"
cehttp "github.com/cloudevents/sdk-go/v2/protocol/http" cehttp "github.com/cloudevents/sdk-go/v2/protocol/http"
@ -54,10 +55,11 @@ import (
brokerinformerfake "knative.dev/eventing/pkg/client/injection/informers/eventing/v1/broker/fake" brokerinformerfake "knative.dev/eventing/pkg/client/injection/informers/eventing/v1/broker/fake"
eventpolicyinformerfake "knative.dev/eventing/pkg/client/injection/informers/eventing/v1alpha1/eventpolicy/fake" eventpolicyinformerfake "knative.dev/eventing/pkg/client/injection/informers/eventing/v1alpha1/eventpolicy/fake"
// Fake injection client
_ "knative.dev/eventing/pkg/client/injection/informers/eventing/v1alpha1/eventpolicy/fake"
_ "knative.dev/pkg/client/injection/kube/client/fake" _ "knative.dev/pkg/client/injection/kube/client/fake"
_ "knative.dev/pkg/client/injection/kube/informers/factory/filtered/fake" _ "knative.dev/pkg/client/injection/kube/informers/factory/filtered/fake"
// Fake injection client
_ "knative.dev/eventing/pkg/client/injection/informers/eventing/v1alpha1/eventpolicy/fake"
) )
const ( const (
@ -223,9 +225,9 @@ func TestHandler_ServeHTTP(t *testing.T) {
method: nethttp.MethodPost, method: nethttp.MethodPost,
uri: "/ns/name", uri: "/ns/name",
body: getValidEvent(), body: getValidEvent(),
statusCode: nethttp.StatusBadRequest, statusCode: nethttp.StatusInternalServerError,
handler: handler(), handler: handler(),
reporter: &mockReporter{StatusCode: nethttp.StatusBadRequest, EventDispatchTimeReported: false}, reporter: &mockReporter{StatusCode: nethttp.StatusInternalServerError, EventDispatchTimeReported: false},
defaulter: broker.TTLDefaulter(logger, 100), defaulter: broker.TTLDefaulter(logger, 100),
brokers: []*eventingv1.Broker{ brokers: []*eventingv1.Broker{
withUninitializedAnnotations(makeBroker("name", "ns")), withUninitializedAnnotations(makeBroker("name", "ns")),

View File

@ -25,11 +25,11 @@ import (
"k8s.io/client-go/tools/cache" "k8s.io/client-go/tools/cache"
"knative.dev/pkg/apis" "knative.dev/pkg/apis"
configmapinformer "knative.dev/pkg/client/injection/kube/informers/core/v1/configmap" configmapinformer "knative.dev/pkg/client/injection/kube/informers/core/v1/configmap"
endpointsinformer "knative.dev/pkg/client/injection/kube/informers/core/v1/endpoints"
"knative.dev/pkg/configmap" "knative.dev/pkg/configmap"
"knative.dev/pkg/controller" "knative.dev/pkg/controller"
"knative.dev/pkg/injection/clients/dynamicclient" "knative.dev/pkg/injection/clients/dynamicclient"
secretinformer "knative.dev/pkg/injection/clients/namespacedkube/informers/core/v1/secret" secretinformer "knative.dev/pkg/injection/clients/namespacedkube/informers/core/v1/secret"
namespacedinformerfactory "knative.dev/pkg/injection/clients/namespacedkube/informers/factory"
"knative.dev/pkg/logging" "knative.dev/pkg/logging"
pkgreconciler "knative.dev/pkg/reconciler" pkgreconciler "knative.dev/pkg/reconciler"
"knative.dev/pkg/resolver" "knative.dev/pkg/resolver"
@ -69,7 +69,12 @@ func NewController(
logger := logging.FromContext(ctx) logger := logging.FromContext(ctx)
brokerInformer := brokerinformer.Get(ctx) brokerInformer := brokerinformer.Get(ctx)
subscriptionInformer := subscriptioninformer.Get(ctx) subscriptionInformer := subscriptioninformer.Get(ctx)
endpointsInformer := endpointsinformer.Get(ctx)
endpointsInformer := namespacedinformerfactory.Get(ctx).Core().V1().Endpoints()
if err := controller.StartInformers(ctx.Done(), endpointsInformer.Informer()); err != nil {
logger.Fatalw("Failed to start namespaced endpoints informer", zap.Error(err))
}
configmapInformer := configmapinformer.Get(ctx) configmapInformer := configmapinformer.Get(ctx)
secretInformer := secretinformer.Get(ctx) secretInformer := secretinformer.Get(ctx)
eventPolicyInformer := eventpolicyinformer.Get(ctx) eventPolicyInformer := eventpolicyinformer.Get(ctx)

View File

@ -28,10 +28,6 @@ import (
utilrand "k8s.io/apimachinery/pkg/util/rand" utilrand "k8s.io/apimachinery/pkg/util/rand"
clientgotesting "k8s.io/client-go/testing" clientgotesting "k8s.io/client-go/testing"
"k8s.io/utils/ptr" "k8s.io/utils/ptr"
fakeeventingclient "knative.dev/eventing/pkg/client/injection/client/fake"
jobsinkreconciler "knative.dev/eventing/pkg/client/injection/reconciler/sinks/v1alpha1/jobsink"
. "knative.dev/eventing/pkg/reconciler/testing/v1"
. "knative.dev/eventing/pkg/reconciler/testing/v1alpha1"
"knative.dev/pkg/apis" "knative.dev/pkg/apis"
duckv1 "knative.dev/pkg/apis/duck/v1" duckv1 "knative.dev/pkg/apis/duck/v1"
v1 "knative.dev/pkg/client/injection/ducks/duck/v1/addressable" v1 "knative.dev/pkg/client/injection/ducks/duck/v1/addressable"
@ -40,6 +36,12 @@ import (
logtesting "knative.dev/pkg/logging/testing" logtesting "knative.dev/pkg/logging/testing"
"knative.dev/pkg/network" "knative.dev/pkg/network"
. "knative.dev/pkg/reconciler/testing" . "knative.dev/pkg/reconciler/testing"
"knative.dev/eventing/pkg/apis/sinks/v1alpha1"
fakeeventingclient "knative.dev/eventing/pkg/client/injection/client/fake"
jobsinkreconciler "knative.dev/eventing/pkg/client/injection/reconciler/sinks/v1alpha1/jobsink"
. "knative.dev/eventing/pkg/reconciler/testing/v1"
. "knative.dev/eventing/pkg/reconciler/testing/v1alpha1"
) )
const ( const (
@ -198,6 +200,36 @@ func TestReconcile(t *testing.T) {
}, },
}, },
}, },
{
Name: "Successful reconciliation, observed generation",
Key: testKey,
Objects: []runtime.Object{
NewJobSink(jobSinkName, testNamespace,
func(sink *v1alpha1.JobSink) {
sink.Generation = 4242
},
WithJobSinkJob(testJob("")),
WithInitJobSinkConditions),
},
WantErr: false,
WantCreates: []runtime.Object{
testJob("test-jobSinkml6mm"),
},
WantStatusUpdates: []clientgotesting.UpdateActionImpl{
{
Object: NewJobSink(jobSinkName, testNamespace,
WithJobSinkJob(testJob("")),
WithJobSinkAddressableReady(),
WithJobSinkJobStatusSelector(),
WithJobSinkAddress(&jobSinkAddressable),
func(sink *v1alpha1.JobSink) {
sink.Generation = 4242
sink.Status.ObservedGeneration = 4242
},
WithJobSinkEventPoliciesReadyBecauseOIDCDisabled()),
},
},
},
} }
logger := logtesting.TestLogger(t) logger := logtesting.TestLogger(t)

View File

@ -1,147 +1,72 @@
# Knative Eventing Multi-Tenant Scheduler with High-Availability # Knative Eventing Multi-Tenant Scheduler with High-Availability
An eventing source instance (for example, [KafkaSource](https://github.com/knative-extensions/eventing-kafka/tree/main/pkg/source), [RedisStreamSource](https://github.com/knative-extensions/eventing-redis/tree/main/source), etc) gets materialized as a virtual pod (**vpod**) and can be scaled up and down by increasing or decreasing the number of virtual pod replicas (**vreplicas**). A vreplica corresponds to a resource in the source that can replicated for maximum distributed processing (for example, number of consumers running in a consumer group). An eventing source instance (for example, KafkaSource, etc) gets materialized as a virtual pod (*
*vpod**) and can be scaled up and down by increasing or decreasing the number of virtual pod
replicas (**vreplicas**). A vreplica corresponds to a resource in the source that can replicated for
maximum distributed processing (for example, number of consumers running in a consumer group).
The vpod multi-tenant [scheduler](#1scheduler) is responsible for placing vreplicas onto real Kubernetes pods. Each pod is limited in capacity and can hold a maximum number of vreplicas. The scheduler takes a list of (source, # of vreplicas) tuples and computes a set of Placements. Placement info are added to the source status. The vpod multi-tenant [scheduler](#scheduler) is responsible for placing vreplicas onto real
Kubernetes pods. Each pod is limited in capacity and can hold a maximum number of vreplicas. The
scheduler takes a list of (source, # of vreplicas) tuples and computes a set of Placements.
Placement info are added to the source status.
Scheduling strategies rely on pods having a sticky identity (StatefulSet replicas) and the current [State](#4state-collector) of the cluster. Scheduling strategies rely on pods having a sticky identity (StatefulSet replicas) and the
current [State](#state-collector) of the cluster.
When a vreplica cannot be scheduled it is added to the list of pending vreplicas. The [Autoscaler](#3autoscaler) monitors this list and allocates more pods for placing it.
To support high-availability the scheduler distributes vreplicas uniformly across failure domains such as zones/nodes/pods containing replicas from a StatefulSet.
## General Scheduler Requirements
1. High Availability: Vreplicas for a source must be evenly spread across domains to reduce impact of failure when a zone/node/pod goes unavailable for scheduling.*
2. Equal event consumption: Vreplicas for a source must be evenly spread across adapter pods to provide an equal rate of processing events. For example, Kafka broker spreads partitions equally across pods so if vreplicas arent equally spread, pods with fewer vreplicas will consume events slower than others.
3. Pod spread not more than available resources: Vreplicas for a source must be evenly spread across pods such that the total number of pods with placements does not exceed the number of resources available from the source (for example, number of Kafka partitions for the topic it's consuming from). Else, the additional pods have no resources (Kafka partitions) to consume events from and could waste Kubernetes resources.
* Note: StatefulSet anti-affinity rules guarantee new pods to be scheduled on a new zone and node.
## Components: ## Components:
### 1.Scheduler ### Scheduler
The scheduling framework has a pluggable architecture where plugins are registered and compiled into the scheduler. It allows many scheduling features to be implemented as plugins, while keeping the scheduling "core" simple and maintainable.
Scheduling happens in a series of stages: The scheduler allocates as many as vreplicas as possible into the lowest possible StatefulSet
ordinal
number before triggering the autoscaler when no more capacity is left to schedule vpods.
1. **Filter**: These plugins (predicates) are used to filter out pods where a vreplica cannot be placed. If any filter plugin marks the pod as infeasible, the remaining plugins will not be called for that pod. A vreplica is marked as unschedulable if no pods pass all the filters. ### Autoscaler
2. **Score**: These plugins (priorities) provide a score to each pod that has passed the filtering phase. Scheduler will then select the pod with the highest weighted scores sum. The autoscaler scales up pod replicas of the statefulset adapter when there are vreplicas pending to
be scheduled, and scales down if there are unused pods.
Scheduler must be Knative generic with its core functionality implemented as core plugins. Anything specific to an eventing source will be implemented as separate plugins (for example, number of Kafka partitions) ### State Collector
It allocates one vreplica at a time by filtering and scoring schedulable pods. Current state information about the cluster is collected after placing each vreplica and during
intervals. Cluster information include computing the free capacity for each pod, list of schedulable
pods (unschedulable pods are pods that are marked for eviction for compacting, number of pods (
stateful set replicas), total number of vreplicas in each pod for each vpod (spread).
A vreplica can be unschedulable for several reasons such as pods not having enough capacity, constraints cannot be fulfilled, etc. ### Evictor
### 2.Descheduler Autoscaler periodically attempts to compact veplicas into a smaller number of free replicas with
lower ordinals. Vreplicas placed on higher ordinal pods are evicted and rescheduled to pods with a
Similar to scheduler but has its own set of priorities (no predicates today). lower ordinal using the same scheduling strategies.
### 3.Autoscaler
The autoscaler scales up pod replicas of the statefulset adapter when there are vreplicas pending to be scheduled, and scales down if there are unused pods. It takes into consideration a scaling factor that is based on number of domains for HA.
### 4.State Collector
Current state information about the cluster is collected after placing each vreplica and during intervals. Cluster information include computing the free capacity for each pod, list of schedulable pods (unschedulable pods are pods that are marked for eviction for compacting, and pods that are on unschedulable nodes (cordoned or unreachable nodes), number of pods (stateful set replicas), number of available nodes, number of zones, a node to zone map, total number of vreplicas in each pod for each vpod (spread), total number of vreplicas in each node for each vpod (spread), total number of vreplicas in each zone for each vpod (spread), etc.
### 5.Reservation
Scheduler also tracks vreplicas that have been placed (ie. scheduled) but haven't been committed yet to its vpod status. These reserved veplicas are taken into consideration when computing cluster's state for scheduling the next vreplica.
### 6.Evictor
Autoscaler periodically attempts to compact veplicas into a smaller number of free replicas with lower ordinals. Vreplicas placed on higher ordinal pods are evicted and rescheduled to pods with a lower ordinal using the same scheduling strategies.
## Scheduler Profile
### Predicates:
1. **PodFitsResources**: check if a pod has enough capacity [CORE]
2. **NoMaxResourceCount**: check if total number of placement pods exceed available resources [KAFKA]. It has an argument `NumPartitions` to configure the plugin with the total number of Kafka partitions.
3. **EvenPodSpread**: check if resources are evenly spread across pods [CORE]. It has an argument `MaxSkew` to configure the plugin with an allowed skew factor.
### Priorities:
1. **AvailabilityNodePriority**: make sure resources are evenly spread across nodes [CORE]. It has an argument `MaxSkew` to configure the plugin with an allowed skew factor.
2. **AvailabilityZonePriority**: make sure resources are evenly spread across zones [CORE]. It has an argument `MaxSkew` to configure the plugin with an allowed skew factor.
3. **LowestOrdinalPriority**: make sure vreplicas are placed on free smaller ordinal pods to minimize resource usage [CORE]
**Example ConfigMap for config-scheduler:**
```
data:
predicates: |+
[
{"Name": "PodFitsResources"},
{"Name": "NoMaxResourceCount",
"Args": "{\"NumPartitions\": 100}"},
{"Name": "EvenPodSpread",
"Args": "{\"MaxSkew\": 2}"}
]
priorities: |+
[
{"Name": "AvailabilityZonePriority",
"Weight": 10,
"Args": "{\"MaxSkew\": 2}"},
{"Name": "LowestOrdinalPriority",
"Weight": 2}
]
```
## Descheduler Profile:
### Priorities:
1. **RemoveWithAvailabilityNodePriority**: make sure resources are evenly spread across nodes [CORE]
2. **RemoveWithAvailabilityZonePriority**: make sure resources are evenly spread across zones [CORE]
3. **HighestOrdinalPriority**: make sure vreps are removed from higher ordinal pods to minimize resource usage [CORE]
**Example ConfigMap for config-descheduler:**
```
data:
priorities: |+
[
{"Name": "RemoveWithEvenPodSpreadPriority",
"Weight": 10,
"Args": "{\"MaxSkew\": 2}"},
{"Name": "RemoveWithAvailabilityZonePriority",
"Weight": 10,
"Args": "{\"MaxSkew\": 2}"},
{"Name": "RemoveWithHighestOrdinalPriority",
"Weight": 2}
]
```
## Normal Operation ## Normal Operation
1. **Busy scheduler**: 1. **Busy scheduler**:
Scheduler can be very busy allocating the best placements for multiple eventing sources at a time using the scheduler predicates and priorities configured. During this time, the cluster could see statefulset replicas increasing, as the autoscaler computes how many more pods are needed to complete scheduling successfully. Also, the replicas could be decreasing during idle time, either caused by less events flowing through the system, or the evictor compacting vreplicas placements into a smaller number of pods or the deletion of event sources. The current placements are stored in the eventing source's status field for observability. Scheduler can be very busy allocating the best placements for multiple eventing sources at a time
using the scheduler predicates and priorities configured. During this time, the cluster could see
statefulset replicas increasing, as the autoscaler computes how many more pods are needed to
complete scheduling successfully. Also, the replicas could be decreasing during idle time, either
caused by less events flowing through the system, or the evictor compacting vreplicas placements
into a smaller number of pods or the deletion of event sources. The current placements are stored in
the eventing source's status field for observability.
2. **Software upgrades**: 2. **Software upgrades**:
We can expect periodic software version upgrades or fixes to be performed on the Kubernetes cluster running the scheduler or on the Knative framework installed. Either of these scenarios could involve graceful rebooting of nodes and/or reapplying of controllers, adapters and other resources. We can expect periodic software version upgrades or fixes to be performed on the Kubernetes cluster
running the scheduler or on the Knative framework installed. Either of these scenarios could involve
graceful rebooting of nodes and/or reapplying of controllers, adapters and other resources.
All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica scheduler. All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica
(For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member changes.) scheduler.
(For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member
TODO: Measure latencies in events processing using a performance tool (KPerf eventing). changes.)
3. **No more cluster resources**: 3. **No more cluster resources**:
When there are no resources available on existing nodes in the cluster to schedule more pods and the autoscaler continues to scale up replicas, the new pods are left in a Pending state till cluster size is increased. Nothing to do for the scheduler until then. When there are no resources available on existing nodes in the cluster to schedule more pods and the
autoscaler continues to scale up replicas, the new pods are left in a Pending state till cluster
size is increased. Nothing to do for the scheduler until then.
## Disaster Recovery ## Disaster Recovery
@ -149,91 +74,14 @@ Some failure scenarios are described below:
1. **Pod failure**: 1. **Pod failure**:
When a pod/replica in a StatefulSet goes down due to some reason (but its node and zone are healthy), a new replica is spun up by the StatefulSet with the same pod identity (pod can come up on a different node) almost immediately. When a pod/replica in a StatefulSet goes down due to some reason (but its node and zone are
healthy), a new replica is spun up by the StatefulSet with the same pod identity (pod can come up on
a different node) almost immediately.
All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica scheduler. All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica
(For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member changes.) scheduler.
(For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member
TODO: Measure latencies in events processing using a performance tool (KPerf eventing). changes.)
2. **Node failure (graceful)**:
When a node is rebooted for upgrades etc, running pods on the node will be evicted (drained), gracefully terminated and rescheduled on a different node. The drained node will be marked as unschedulable by K8 (`node.Spec.Unschedulable` = True) after its cordoning.
```
k describe node knative-worker4
Name: knative-worker4
CreationTimestamp: Mon, 30 Aug 2021 11:13:11 -0400
Taints: none
Unschedulable: true
```
All existing vreplica placements will still be valid and no rebalancing will be done by the vreplica scheduler.
(For Kafka, its broker may trigger a rebalancing of partitions due to consumer group member changes.)
TODO: Measure latencies in events processing using a performance tool (KPerf eventing).
New vreplicas will not be scheduled on pods running on this cordoned node.
3. **Node failure (abrupt)**:
When a node goes down unexpectedly due to some physical machine failure (network isolation/ loss, CPU issue, power loss, etc), the node controller does the following few steps
Pods running on the failed node receives a NodeNotReady Warning event
```
k describe pod kafkasource-mt-adapter-5 -n knative-eventing
Name: kafkasource-mt-adapter-5
Namespace: knative-eventing
Priority: 0
Node: knative-worker4/172.18.0.3
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 11m default-scheduler Successfully assigned knative-eventing/kafkasource-mt-adapter-5 to knative-worker4
Normal Pulled 11m kubelet Container image
Normal Created 11m kubelet Created container receive-adapter
Normal Started 11m kubelet Started container receive-adapter
Warning NodeNotReady 3m48s node-controller Node is not ready
```
Failing node is tainted with the following Key:Condition: by the node controller if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds)
```
k describe node knative-worker4
Name: knative-worker4
Taints: node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule
Unschedulable: false
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal NodeNotSchedulable 5m42s kubelet Node knative-worker4 status is now: NodeNotSchedulable
Normal NodeSchedulable 2m31s kubelet Node knative-worker4 status is now: NodeSchedulable
```
```
k get nodes
NAME STATUS ROLES AGE VERSION
knative-control-plane Ready control-plane,master 7h23m v1.21.1
knative-worker Ready <none> 7h23m v1.21.1
knative-worker2 Ready <none> 7h23m v1.21.1
knative-worker3 Ready <none> 7h23m v1.21.1
knative-worker4 NotReady <none> 7h23m v1.21.1
```
After a timeout period (`pod-eviction-timeout`, 5 minutes by default), the pods move to the Terminating state.
Since the StatefulSet now sets `terminationGracePeriodSeconds: 0` (see the sketch below), the terminating pods are immediately restarted on another functioning node. A new replica is spun up with the same ordinal.
While the failing node is unreachable (~5 minutes), the vreplicas placed on that pod aren't available to process work from the eventing source. (Theory) The consumption rate drops and Kafka eventually triggers a rebalancing of partitions. Also, KEDA will scale up the number of consumers to resolve the processing lag. A scale-up causes the Eventing scheduler to rebalance the total vreplicas for that source across the available running pods.
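A minimal sketch of the setting referred to above, expressed with the Kubernetes `apps/v1` Go types; `adapterStatefulSet` is an illustrative helper and only the relevant field is filled in, so this is not the shipped manifest.
```
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// adapterStatefulSet shows the relevant bit: a zero grace period lets the
// StatefulSet replace a terminating pod (same ordinal) without waiting.
func adapterStatefulSet() *appsv1.StatefulSet {
	grace := int64(0) // terminationGracePeriodSeconds: 0
	return &appsv1.StatefulSet{
		Spec: appsv1.StatefulSetSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					TerminationGracePeriodSeconds: &grace,
				},
			},
		},
	}
}

func main() {
	sts := adapterStatefulSet()
	fmt.Printf("terminationGracePeriodSeconds: %d\n", *sts.Spec.Template.Spec.TerminationGracePeriodSeconds)
}
```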
4. **Zone failure**:
All nodes running in the failing zone become unavailable for scheduling; they will either be tainted with `unreachable` or have `Unschedulable` set in their spec.
See the node failure scenarios above for what happens to vreplica placements; a sketch for spotting such a zone follows below.
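A minimal sketch, assuming access to the cluster's node list and the `scheduler.ZoneLabel` / `scheduler.UnknownZone` constants used elsewhere in this package, that treats a zone as unavailable when none of its nodes is schedulable; the function and the exact health check are illustrative, not the scheduler's actual logic.
```
package sketch

import (
	v1 "k8s.io/api/core/v1"

	"knative.dev/eventing/pkg/scheduler"
)

// unavailableZones reports zones in which every node is either cordoned or
// tainted as unreachable by the node controller.
func unavailableZones(nodes []*v1.Node) map[string]bool {
	healthyByZone := map[string]bool{}
	for _, node := range nodes {
		zone, ok := node.GetLabels()[scheduler.ZoneLabel]
		if !ok || zone == "" {
			zone = scheduler.UnknownZone
		}
		healthy := !node.Spec.Unschedulable
		for _, taint := range node.Spec.Taints {
			if taint.Key == "node.kubernetes.io/unreachable" {
				healthy = false
			}
		}
		if _, seen := healthyByZone[zone]; !seen {
			healthyByZone[zone] = false
		}
		if healthy {
			healthyByZone[zone] = true
		}
	}
	failed := map[string]bool{}
	for zone, anyHealthy := range healthyByZone {
		if !anyHealthy {
			failed[zone] = true
		}
	}
	return failed
}
```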
## References:
@ -246,7 +94,6 @@ See node failure scenarios above for what happens to vreplica placements.
* https://medium.com/tailwinds-navigator/kubernetes-tip-how-statefulsets-behave-differently-than-deployments-when-node-fails-d29e36bca7d5
* https://kubernetes.io/docs/concepts/architecture/nodes/#node-controller
---
To learn more about Knative, please visit the

@ -14,5 +14,5 @@ See the License for the specific language governing permissions and
limitations under the License.
*/
// Package scheduler is responsible for placing virtual pod (VPod) replicas within real pods.
package scheduler

@ -1,88 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package factory
import (
"fmt"
state "knative.dev/eventing/pkg/scheduler/state"
)
// RegistryFP is a collection of all available filter plugins.
type RegistryFP map[string]state.FilterPlugin
// RegistrySP is a collection of all available scoring plugins.
type RegistrySP map[string]state.ScorePlugin
var (
FilterRegistry = make(RegistryFP)
ScoreRegistry = make(RegistrySP)
)
// Register adds a new plugin to the registry. If a plugin with the same name
// exists, it returns an error.
func RegisterFP(name string, factory state.FilterPlugin) error {
if _, ok := FilterRegistry[name]; ok {
return fmt.Errorf("a filter plugin named %v already exists", name)
}
FilterRegistry[name] = factory
return nil
}
// Unregister removes an existing plugin from the registry. If no plugin with
// the provided name exists, it returns an error.
func UnregisterFP(name string) error {
if _, ok := FilterRegistry[name]; !ok {
return fmt.Errorf("no filter plugin named %v exists", name)
}
delete(FilterRegistry, name)
return nil
}
func GetFilterPlugin(name string) (state.FilterPlugin, error) {
if f, exist := FilterRegistry[name]; exist {
return f, nil
}
return nil, fmt.Errorf("no fitler plugin named %v exists", name)
}
// Register adds a new plugin to the registry. If a plugin with the same name
// exists, it returns an error.
func RegisterSP(name string, factory state.ScorePlugin) error {
if _, ok := ScoreRegistry[name]; ok {
return fmt.Errorf("a score plugin named %v already exists", name)
}
ScoreRegistry[name] = factory
return nil
}
// Unregister removes an existing plugin from the registry. If no plugin with
// the provided name exists, it returns an error.
func UnregisterSP(name string) error {
if _, ok := ScoreRegistry[name]; !ok {
return fmt.Errorf("no score plugin named %v exists", name)
}
delete(ScoreRegistry, name)
return nil
}
func GetScorePlugin(name string) (state.ScorePlugin, error) {
if f, exist := ScoreRegistry[name]; exist {
return f, nil
}
return nil, fmt.Errorf("no score plugin named %v exists", name)
}

@ -17,7 +17,6 @@ limitations under the License.
package scheduler
import (
"k8s.io/apimachinery/pkg/util/sets"
duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1"
)
@ -29,24 +28,3 @@ func GetTotalVReplicas(placements []duckv1alpha1.Placement) int32 {
}
return r
}
// GetPlacementForPod returns the placement corresponding to podName
func GetPlacementForPod(placements []duckv1alpha1.Placement, podName string) *duckv1alpha1.Placement {
for i := 0; i < len(placements); i++ {
if placements[i].PodName == podName {
return &placements[i]
}
}
return nil
}
// GetPodCount returns the number of pods with the given placements
func GetPodCount(placements []duckv1alpha1.Placement) int {
set := sets.NewString()
for _, p := range placements {
if p.VReplicas > 0 {
set.Insert(p.PodName)
}
}
return set.Len()
}

@ -62,95 +62,3 @@ func TestGetTotalVReplicas(t *testing.T) {
})
}
}
func TestGetPlacementForPod(t *testing.T) {
ps1 := []duckv1alpha1.Placement{{PodName: "p", VReplicas: 2}}
ps2 := []duckv1alpha1.Placement{{PodName: "p", VReplicas: 2}, {PodName: "p2", VReplicas: 4}}
testCases := []struct {
name string
podName string
placements []duckv1alpha1.Placement
expected *duckv1alpha1.Placement
}{
{
name: "nil placements",
podName: "p",
placements: nil,
expected: nil,
},
{
name: "empty placements",
podName: "p",
placements: []duckv1alpha1.Placement{},
expected: nil,
},
{
name: "one placement",
placements: ps1,
podName: "p",
expected: &ps1[0],
}, {
name: "mayne placements",
placements: ps2,
podName: "p2",
expected: &ps2[1],
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
got := GetPlacementForPod(tc.placements, tc.podName)
if got != tc.expected {
t.Errorf("got %v, want %v", got, tc.expected)
}
})
}
}
func TestPodCount(t *testing.T) {
testCases := []struct {
name string
placements []duckv1alpha1.Placement
expected int
}{
{
name: "nil placements",
placements: nil,
expected: 0,
},
{
name: "empty placements",
placements: []duckv1alpha1.Placement{},
expected: 0,
},
{
name: "one pod",
placements: []duckv1alpha1.Placement{{PodName: "d", VReplicas: 2}},
expected: 1,
},
{
name: "two pods",
placements: []duckv1alpha1.Placement{
{PodName: "p1", VReplicas: 2},
{PodName: "p2", VReplicas: 6},
{PodName: "p1", VReplicas: 6}},
expected: 2,
},
{
name: "three pods, one with no vreplicas",
placements: []duckv1alpha1.Placement{
{PodName: "p1", VReplicas: 2},
{PodName: "p2", VReplicas: 6},
{PodName: "p1", VReplicas: 0}},
expected: 2,
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
got := GetPodCount(tc.placements)
if got != tc.expected {
t.Errorf("got %v, want %v", got, tc.expected)
}
})
}
}

@ -1,111 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package availabilitynodepriority
import (
"context"
"encoding/json"
"math"
"strings"
"k8s.io/apimachinery/pkg/types"
"knative.dev/eventing/pkg/scheduler/factory"
state "knative.dev/eventing/pkg/scheduler/state"
"knative.dev/pkg/logging"
)
// AvailabilityNodePriority is a score plugin that favors pods that create an even spread of resources across nodes for HA
type AvailabilityNodePriority struct {
}
// Verify AvailabilityNodePriority Implements ScorePlugin Interface
var _ state.ScorePlugin = &AvailabilityNodePriority{}
// Name of the plugin
const Name = state.AvailabilityNodePriority
const (
ErrReasonInvalidArg = "invalid arguments"
ErrReasonNoResource = "node does not exist"
)
func init() {
factory.RegisterSP(Name, &AvailabilityNodePriority{})
}
// Name returns name of the plugin
func (pl *AvailabilityNodePriority) Name() string {
return Name
}
// Score invoked at the score extension point. The "score" returned in this function is higher for nodes that create an even spread across nodes.
func (pl *AvailabilityNodePriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) {
logger := logging.FromContext(ctx).With("Score", pl.Name())
var score uint64 = 0
spreadArgs, ok := args.(string)
if !ok {
logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name())
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
skewVal := state.AvailabilityNodePriorityArgs{}
decoder := json.NewDecoder(strings.NewReader(spreadArgs))
decoder.DisallowUnknownFields()
if err := decoder.Decode(&skewVal); err != nil {
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
if states.Replicas > 0 { //need at least a pod to compute spread
var skew int32
_, nodeName, err := states.GetPodInfo(state.PodNameFromOrdinal(states.StatefulSetName, podID))
if err != nil {
return score, state.NewStatus(state.Error, ErrReasonNoResource)
}
currentReps := states.NodeSpread[key][nodeName] //get #vreps on this node
for otherNodeName := range states.NodeToZoneMap { //compare with #vreps on other nodes
if otherNodeName != nodeName {
otherReps := states.NodeSpread[key][otherNodeName]
if skew = (currentReps + 1) - otherReps; skew < 0 {
skew = skew * int32(-1)
}
//logger.Infof("Current Node %v with %d and Other Node %v with %d causing skew %d", nodeName, currentReps, otherNodeName, otherReps, skew)
if skew > skewVal.MaxSkew {
logger.Infof("Pod %d in node %v will cause an uneven node spread %v with other node %v", podID, nodeName, states.NodeSpread[key], otherNodeName)
}
score = score + uint64(skew)
}
}
score = math.MaxUint64 - score //lesser skews get higher score
}
return score, state.NewStatus(state.Success)
}
// ScoreExtensions of the Score plugin.
func (pl *AvailabilityNodePriority) ScoreExtensions() state.ScoreExtensions {
return pl
}
// NormalizeScore invoked after scoring all pods.
func (pl *AvailabilityNodePriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status {
return nil
}

@ -1,257 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package availabilitynodepriority
import (
"fmt"
"math"
"reflect"
"testing"
"github.com/stretchr/testify/assert"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
listers "knative.dev/eventing/pkg/reconciler/testing/v1"
"knative.dev/eventing/pkg/scheduler"
state "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing"
kubeclient "knative.dev/pkg/client/injection/kube/client/fake"
)
const (
testNs = "test-ns"
sfsName = "statefulset-name"
vpodName = "source-name"
vpodNamespace = "source-namespace"
numNodes = 3
)
func TestScore(t *testing.T) {
testCases := []struct {
name string
state *state.State
vpod types.NamespacedName
replicas int32
podID int32
expected *state.Status
expScore uint64
args interface{}
}{
{
name: "no vpods, no pods",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 0,
NodeSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: 0,
args: "{\"MaxSkew\": 2}",
},
{
name: "no vpods, no pods, bad arg",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 0,
ZoneSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 0,
expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg),
expScore: 0,
args: "{\"MaxSkewness\": 2}",
},
{
name: "no vpods, no pods, no resource",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
ZoneSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 1,
expected: state.NewStatus(state.Error, ErrReasonNoResource),
expScore: 0,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, one zone, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 5,
},
},
},
replicas: 1,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 18,
args: "{\"MaxSkew\": 2}",
},
{
name: "two vpods, one zone, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 5,
},
{Name: vpodName + "-1", Namespace: vpodNamespace + "-1"}: {
"node1": 4,
},
},
},
replicas: 1,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 18,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, two zones, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 2, NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 5,
"node1": 5,
"node2": 3,
},
}},
replicas: 2,
podID: 1,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 10,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, three zones, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 3, NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 5,
"node1": 4,
"node2": 3,
},
}},
replicas: 3,
podID: 1,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 7,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, five pods, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 5, NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 8,
"node1": 4,
"node2": 3,
},
}},
replicas: 5,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 20,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, four pods, unknown zone",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{
StatefulSetName: sfsName,
Replicas: 4,
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 8,
"node1": 4,
"node2": 3,
},
},
SchedulablePods: []int32{0, 1, 2}, //Pod 3 in unknown zone not included in schedulable pods
NumNodes: 4, //Includes unknown zone
},
replicas: 4,
podID: 3,
expected: state.NewStatus(state.Success), //Not failing the plugin
expScore: math.MaxUint64 - 12,
args: "{\"MaxSkew\": 2}",
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t)
var plugin = &AvailabilityNodePriority{}
name := plugin.Name()
assert.Equal(t, name, state.AvailabilityNodePriority)
nodelist := make([]*v1.Node, 0)
podlist := make([]runtime.Object, 0)
for i := int32(0); i < numNodes; i++ {
nodeName := "node" + fmt.Sprint(i)
zoneName := "zone" + fmt.Sprint(i)
node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
nodelist = append(nodelist, node)
}
nodeName := "node" + fmt.Sprint(numNodes) //Node in unknown zone
node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNodeNoLabel(nodeName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
nodelist = append(nodelist, node)
for i := int32(0); i < tc.replicas; i++ {
nodeName := "node" + fmt.Sprint(i)
podName := sfsName + "-" + fmt.Sprint(i)
pod, err := kubeclient.Get(ctx).CoreV1().Pods(testNs).Create(ctx, tscheduler.MakePod(testNs, podName, nodeName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
podlist = append(podlist, pod)
}
nodeToZoneMap := make(map[string]string)
for i := 0; i < len(nodelist); i++ {
node := nodelist[i]
zoneName, ok := node.GetLabels()[scheduler.ZoneLabel]
if ok && zoneName != "" {
nodeToZoneMap[node.Name] = zoneName
} else {
nodeToZoneMap[node.Name] = scheduler.UnknownZone
}
}
lsp := listers.NewListers(podlist)
tc.state.PodLister = lsp.GetPodLister().Pods(testNs)
tc.state.NodeToZoneMap = nodeToZoneMap
score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID)
if score != tc.expScore {
t.Errorf("unexpected score, got %v, want %v", score, tc.expScore)
}
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected status, got %v, want %v", status, tc.expected)
}
})
}
}

@ -1,115 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package availabilityzonepriority
import (
"context"
"encoding/json"
"math"
"strings"
"k8s.io/apimachinery/pkg/types"
"knative.dev/eventing/pkg/scheduler/factory"
state "knative.dev/eventing/pkg/scheduler/state"
"knative.dev/pkg/logging"
)
// AvailabilityZonePriority is a score plugin that favors pods that create an even spread of resources across zones for HA
type AvailabilityZonePriority struct {
}
// Verify AvailabilityZonePriority Implements ScorePlugin Interface
var _ state.ScorePlugin = &AvailabilityZonePriority{}
// Name of the plugin
const Name = state.AvailabilityZonePriority
const (
ErrReasonInvalidArg = "invalid arguments"
ErrReasonNoResource = "zone does not exist"
)
func init() {
factory.RegisterSP(Name, &AvailabilityZonePriority{})
}
// Name returns name of the plugin
func (pl *AvailabilityZonePriority) Name() string {
return Name
}
// Score invoked at the score extension point. The "score" returned in this function is higher for zones that create an even spread across zones.
func (pl *AvailabilityZonePriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) {
logger := logging.FromContext(ctx).With("Score", pl.Name())
var score uint64 = 0
spreadArgs, ok := args.(string)
if !ok {
logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name())
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
skewVal := state.AvailabilityZonePriorityArgs{}
decoder := json.NewDecoder(strings.NewReader(spreadArgs))
decoder.DisallowUnknownFields()
if err := decoder.Decode(&skewVal); err != nil {
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
if states.Replicas > 0 { //need at least a pod to compute spread
var skew int32
zoneMap := make(map[string]struct{})
for _, zoneName := range states.NodeToZoneMap {
zoneMap[zoneName] = struct{}{}
}
zoneName, _, err := states.GetPodInfo(state.PodNameFromOrdinal(states.StatefulSetName, podID))
if err != nil {
return score, state.NewStatus(state.Error, ErrReasonNoResource)
}
currentReps := states.ZoneSpread[key][zoneName] //get #vreps on this zone
for otherZoneName := range zoneMap { //compare with #vreps on other zones
if otherZoneName != zoneName {
otherReps := states.ZoneSpread[key][otherZoneName]
if skew = (currentReps + 1) - otherReps; skew < 0 {
skew = skew * int32(-1)
}
//logger.Infof("Current Zone %v with %d and Other Zone %v with %d causing skew %d", zoneName, currentReps, otherZoneName, otherReps, skew)
if skew > skewVal.MaxSkew { //score low
logger.Infof("Pod %d in zone %v will cause an uneven zone spread %v with other zone %v", podID, zoneName, states.ZoneSpread[key], otherZoneName)
}
score = score + uint64(skew)
}
}
score = math.MaxUint64 - score //lesser skews get higher score
}
return score, state.NewStatus(state.Success)
}
// ScoreExtensions of the Score plugin.
func (pl *AvailabilityZonePriority) ScoreExtensions() state.ScoreExtensions {
return pl
}
// NormalizeScore invoked after scoring all pods.
func (pl *AvailabilityZonePriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status {
return nil
}

@ -1,260 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package availabilityzonepriority
import (
"fmt"
"math"
"reflect"
"testing"
"github.com/stretchr/testify/assert"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
listers "knative.dev/eventing/pkg/reconciler/testing/v1"
"knative.dev/eventing/pkg/scheduler"
state "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing"
kubeclient "knative.dev/pkg/client/injection/kube/client/fake"
)
const (
testNs = "test-ns"
sfsName = "statefulset-name"
vpodName = "source-name"
vpodNamespace = "source-namespace"
numZones = 3
numNodes = 6
)
func TestScore(t *testing.T) {
testCases := []struct {
name string
state *state.State
vpod types.NamespacedName
replicas int32
podID int32
expected *state.Status
expScore uint64
args interface{}
}{
{
name: "no vpods, no pods",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 0,
ZoneSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: 0,
args: "{\"MaxSkew\": 2}",
},
{
name: "no vpods, no pods, bad arg",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 0,
ZoneSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 0,
expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg),
expScore: 0,
args: "{\"MaxSkewness\": 2}",
},
{
name: "no vpods, no pods, no resource",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
ZoneSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 1,
expected: state.NewStatus(state.Error, ErrReasonNoResource),
expScore: 0,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, one zone, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 5,
},
},
},
replicas: 1,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 18,
args: "{\"MaxSkew\": 2}",
},
{
name: "two vpods, one zone, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 5,
},
{Name: vpodName + "-1", Namespace: vpodNamespace + "-1"}: {
"zone1": 4,
},
},
},
replicas: 1,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 18,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, two zones, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 2, ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 5,
"zone1": 5,
"zone2": 3,
},
}},
replicas: 2,
podID: 1,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 10,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, three zones, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 3, ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 5,
"zone1": 4,
"zone2": 3,
},
}},
replicas: 3,
podID: 1,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 7,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, five pods, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 5, ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 8,
"zone1": 4,
"zone2": 3,
},
}},
replicas: 5,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 20,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, seven nodes/pods, unknown zone",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{
StatefulSetName: sfsName,
Replicas: 7,
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 8,
"zone1": 4,
"zone2": 3,
},
},
SchedulablePods: []int32{0, 1, 2, 3, 4, 5}, //Pod 6 in unknown zone not included in schedulable pods
NumZones: 4, //Includes unknown zone
},
replicas: 7,
podID: 6,
expected: state.NewStatus(state.Success), //Not failing the plugin
expScore: math.MaxUint64 - 12,
args: "{\"MaxSkew\": 2}",
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t)
var plugin = &AvailabilityZonePriority{}
name := plugin.Name()
assert.Equal(t, name, state.AvailabilityZonePriority)
nodelist := make([]*v1.Node, 0)
podlist := make([]runtime.Object, 0)
for i := int32(0); i < numZones; i++ {
for j := int32(0); j < numNodes/numZones; j++ {
nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i)
zoneName := "zone" + fmt.Sprint(i)
node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
nodelist = append(nodelist, node)
}
}
nodeName := "node" + fmt.Sprint(numNodes) //Node in unknown zone
node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNodeNoLabel(nodeName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
nodelist = append(nodelist, node)
for i := int32(0); i < tc.replicas; i++ {
nodeName := "node" + fmt.Sprint(i)
podName := sfsName + "-" + fmt.Sprint(i)
pod, err := kubeclient.Get(ctx).CoreV1().Pods(testNs).Create(ctx, tscheduler.MakePod(testNs, podName, nodeName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
podlist = append(podlist, pod)
}
nodeToZoneMap := make(map[string]string)
for i := 0; i < len(nodelist); i++ {
node := nodelist[i]
zoneName, ok := node.GetLabels()[scheduler.ZoneLabel]
if ok && zoneName != "" {
nodeToZoneMap[node.Name] = zoneName
} else {
nodeToZoneMap[node.Name] = scheduler.UnknownZone
}
}
lsp := listers.NewListers(podlist)
tc.state.PodLister = lsp.GetPodLister().Pods(testNs)
tc.state.NodeToZoneMap = nodeToZoneMap
score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID)
if score != tc.expScore {
t.Errorf("unexpected score, got %v, want %v", score, tc.expScore)
}
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected status, got %v, want %v", status, tc.expected)
}
})
}
}

@ -1,151 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package evenpodspread
import (
"context"
"encoding/json"
"math"
"strings"
"k8s.io/apimachinery/pkg/types"
"knative.dev/eventing/pkg/scheduler/factory"
state "knative.dev/eventing/pkg/scheduler/state"
"knative.dev/pkg/logging"
)
// EvenPodSpread is a filter or score plugin that picks/favors pods that create an equal spread of resources across pods
type EvenPodSpread struct {
}
// Verify EvenPodSpread Implements FilterPlugin and ScorePlugin Interface
var _ state.FilterPlugin = &EvenPodSpread{}
var _ state.ScorePlugin = &EvenPodSpread{}
// Name of the plugin
const (
Name = state.EvenPodSpread
ErrReasonInvalidArg = "invalid arguments"
ErrReasonUnschedulable = "pod will cause an uneven spread"
)
func init() {
factory.RegisterFP(Name, &EvenPodSpread{})
factory.RegisterSP(Name, &EvenPodSpread{})
}
// Name returns name of the plugin
func (pl *EvenPodSpread) Name() string {
return Name
}
// Filter invoked at the filter extension point.
func (pl *EvenPodSpread) Filter(ctx context.Context, args interface{}, states *state.State, key types.NamespacedName, podID int32) *state.Status {
logger := logging.FromContext(ctx).With("Filter", pl.Name())
spreadArgs, ok := args.(string)
if !ok {
logger.Errorf("Filter args %v for predicate %q are not valid", args, pl.Name())
return state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
skewVal := state.EvenPodSpreadArgs{}
decoder := json.NewDecoder(strings.NewReader(spreadArgs))
decoder.DisallowUnknownFields()
if err := decoder.Decode(&skewVal); err != nil {
return state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
if states.Replicas > 0 { //need at least a pod to compute spread
currentReps := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, podID)] //get #vreps on this podID
var skew int32
for _, otherPodID := range states.SchedulablePods { //compare with #vreps on other pods
if otherPodID != podID {
otherReps := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, otherPodID)]
if otherReps == 0 && states.Free(otherPodID) <= 0 { //other pod fully occupied by other vpods - so ignore
continue
}
if skew = (currentReps + 1) - otherReps; skew < 0 {
skew = skew * int32(-1)
}
//logger.Infof("Current Pod %d with %d and Other Pod %d with %d causing skew %d", podID, currentReps, otherPodID, otherReps, skew)
if skew > skewVal.MaxSkew {
logger.Infof("Unschedulable! Pod %d will cause an uneven spread %v with other pod %v", podID, states.PodSpread[key], otherPodID)
return state.NewStatus(state.Unschedulable, ErrReasonUnschedulable)
}
}
}
}
return state.NewStatus(state.Success)
}
// Score invoked at the score extension point. The "score" returned in this function is higher for pods that create an even spread across pods.
func (pl *EvenPodSpread) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) {
logger := logging.FromContext(ctx).With("Score", pl.Name())
var score uint64 = 0
spreadArgs, ok := args.(string)
if !ok {
logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name())
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
skewVal := state.EvenPodSpreadArgs{}
decoder := json.NewDecoder(strings.NewReader(spreadArgs))
decoder.DisallowUnknownFields()
if err := decoder.Decode(&skewVal); err != nil {
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
if states.Replicas > 0 { //need at least a pod to compute spread
currentReps := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, podID)] //get #vreps on this podID
var skew int32
for _, otherPodID := range states.SchedulablePods { //compare with #vreps on other pods
if otherPodID != podID {
otherReps := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, otherPodID)]
if otherReps == 0 && states.Free(otherPodID) == 0 { //other pod fully occupied by other vpods - so ignore
continue
}
if skew = (currentReps + 1) - otherReps; skew < 0 {
skew = skew * int32(-1)
}
//logger.Infof("Current Pod %d with %d and Other Pod %d with %d causing skew %d", podID, currentReps, otherPodID, otherReps, skew)
if skew > skewVal.MaxSkew {
logger.Infof("Pod %d will cause an uneven spread %v with other pod %v", podID, states.PodSpread[key], otherPodID)
}
score = score + uint64(skew)
}
}
score = math.MaxUint64 - score //lesser skews get higher score
}
return score, state.NewStatus(state.Success)
}
// ScoreExtensions of the Score plugin.
func (pl *EvenPodSpread) ScoreExtensions() state.ScoreExtensions {
return pl
}
// NormalizeScore invoked after scoring all pods.
func (pl *EvenPodSpread) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status {
return nil
}

@ -1,198 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package evenpodspread
import (
"math"
"reflect"
"testing"
"github.com/stretchr/testify/assert"
"k8s.io/apimachinery/pkg/types"
state "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing"
)
func TestFilter(t *testing.T) {
testCases := []struct {
name string
state *state.State
vpod types.NamespacedName
podID int32
expScore uint64
expected *state.Status
onlyFilter bool
args interface{}
}{
{
name: "no vpods, no pods",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: "pod-name", Replicas: 0, PodSpread: map[types.NamespacedName]map[string]int32{}},
podID: 0,
expected: state.NewStatus(state.Success),
expScore: 0,
args: "{\"MaxSkew\": 2}",
},
{
name: "no vpods, no pods, bad arg",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: "pod-name", Replicas: 0, PodSpread: map[types.NamespacedName]map[string]int32{}},
podID: 0,
expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg),
expScore: 0,
args: "{\"MaxSkewness\": 2}",
},
{
name: "one vpod, one pod, same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 1,
SchedulablePods: []int32{int32(0)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
},
},
},
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64,
args: "{\"MaxSkew\": 2}",
},
{
name: "two vpods, one pod, same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 1,
SchedulablePods: []int32{int32(0)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
},
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: {
"pod-name-0": 4,
},
},
},
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, two pods,same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 2,
SchedulablePods: []int32{int32(0), int32(1)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
"pod-name-1": 5,
},
}},
podID: 1,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 1,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, five pods, same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 5,
SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3), int32(4)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
"pod-name-1": 4,
"pod-name-2": 3,
"pod-name-3": 4,
"pod-name-4": 5,
},
}},
podID: 1,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 3,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, five pods, same pod filter unschedulable",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 5,
SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3), int32(4)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 7,
"pod-name-1": 4,
"pod-name-2": 3,
"pod-name-3": 4,
"pod-name-4": 5,
},
}},
podID: 0,
expected: state.NewStatus(state.Unschedulable, ErrReasonUnschedulable),
onlyFilter: true,
args: "{\"MaxSkew\": 2}",
},
{
name: "two vpods, two pods, one pod full",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 2,
SchedulablePods: []int32{int32(0), int32(1)},
FreeCap: []int32{int32(3), int32(0)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
},
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: {
"pod-name-0": 2,
"pod-name-1": 10,
},
},
},
podID: 0,
expected: state.NewStatus(state.Success),
onlyFilter: true,
args: "{\"MaxSkew\": 2}",
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t)
var plugin = &EvenPodSpread{}
name := plugin.Name()
assert.Equal(t, name, state.EvenPodSpread)
status := plugin.Filter(ctx, tc.args, tc.state, tc.vpod, tc.podID)
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected status, got %v, want %v", status, tc.expected)
}
if !tc.onlyFilter {
score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID)
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected state, got %v, want %v", status, tc.expected)
}
if score != tc.expScore {
t.Errorf("unexpected score, got %v, want %v", score, tc.expScore)
}
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected status, got %v, want %v", status, tc.expected)
}
}
})
}
}

View File

@ -1,61 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package lowestordinalpriority
import (
"context"
"math"
"k8s.io/apimachinery/pkg/types"
"knative.dev/eventing/pkg/scheduler/factory"
state "knative.dev/eventing/pkg/scheduler/state"
)
// LowestOrdinalPriority is a score plugin that favors pods that have a lower ordinal
type LowestOrdinalPriority struct {
}
// Verify LowestOrdinalPriority Implements ScorePlugin Interface
var _ state.ScorePlugin = &LowestOrdinalPriority{}
// Name of the plugin
const Name = state.LowestOrdinalPriority
func init() {
factory.RegisterSP(Name, &LowestOrdinalPriority{})
}
// Name returns name of the plugin
func (pl *LowestOrdinalPriority) Name() string {
return Name
}
// Score invoked at the score extension point. The "score" returned in this function is higher for pods with lower ordinal values.
func (pl *LowestOrdinalPriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) {
score := math.MaxUint64 - uint64(podID) //lower ordinals get higher score
return score, state.NewStatus(state.Success)
}
// ScoreExtensions of the Score plugin.
func (pl *LowestOrdinalPriority) ScoreExtensions() state.ScoreExtensions {
return pl
}
// NormalizeScore invoked after scoring all pods.
func (pl *LowestOrdinalPriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status {
return nil
}

@ -1,114 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package lowestordinalpriority
import (
"math"
"reflect"
"testing"
"github.com/stretchr/testify/assert"
"k8s.io/apimachinery/pkg/types"
state "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing"
)
func TestScore(t *testing.T) {
testCases := []struct {
name string
state *state.State
podID int32
expScore uint64
expected *state.Status
}{
{
name: "no vpods",
state: &state.State{LastOrdinal: -1},
podID: 0,
expScore: math.MaxUint64,
expected: state.NewStatus(state.Success),
},
{
name: "one vpods free",
state: &state.State{LastOrdinal: 0},
podID: 0,
expScore: math.MaxUint64,
expected: state.NewStatus(state.Success),
},
{
name: "two vpods free",
state: &state.State{LastOrdinal: 0},
podID: 1,
expScore: math.MaxUint64 - 1,
expected: state.NewStatus(state.Success),
},
{
name: "one vpods not free",
state: &state.State{LastOrdinal: 1},
podID: 0,
expScore: math.MaxUint64,
expected: state.NewStatus(state.Success),
},
{
name: "one vpods not free",
state: &state.State{LastOrdinal: 1},
podID: 1,
expScore: math.MaxUint64 - 1,
expected: state.NewStatus(state.Success),
},
{
name: "many vpods, no gaps",
state: &state.State{LastOrdinal: 1},
podID: 2,
expScore: math.MaxUint64 - 2,
expected: state.NewStatus(state.Success),
},
{
name: "many vpods, with gaps",
state: &state.State{LastOrdinal: 2},
podID: 0,
expScore: math.MaxUint64,
expected: state.NewStatus(state.Success),
},
{
name: "many vpods, with gaps",
state: &state.State{LastOrdinal: 2},
podID: 1000,
expScore: math.MaxUint64 - 1000,
expected: state.NewStatus(state.Success),
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t)
var plugin = &LowestOrdinalPriority{}
var args interface{}
name := plugin.Name()
assert.Equal(t, name, state.LowestOrdinalPriority)
score, status := plugin.Score(ctx, args, tc.state, tc.state.SchedulablePods, types.NamespacedName{}, tc.podID)
if score != tc.expScore {
t.Errorf("unexpected score, got %v, want %v", score, tc.expScore)
}
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected status, got %v, want %v", status, tc.expected)
}
})
}
}

@ -1,61 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package podfitsresources
import (
"context"
"k8s.io/apimachinery/pkg/types"
"knative.dev/eventing/pkg/scheduler/factory"
state "knative.dev/eventing/pkg/scheduler/state"
"knative.dev/pkg/logging"
)
// PodFitsResources is a plugin that filters pods that do not have sufficient free capacity for a vreplica to be placed on it
type PodFitsResources struct {
}
// Verify PodFitsResources Implements FilterPlugin Interface
var _ state.FilterPlugin = &PodFitsResources{}
// Name of the plugin
const Name = state.PodFitsResources
const (
ErrReasonUnschedulable = "pod at full capacity"
)
func init() {
factory.RegisterFP(Name, &PodFitsResources{})
}
// Name returns name of the plugin
func (pl *PodFitsResources) Name() string {
return Name
}
// Filter invoked at the filter extension point.
func (pl *PodFitsResources) Filter(ctx context.Context, args interface{}, states *state.State, key types.NamespacedName, podID int32) *state.Status {
logger := logging.FromContext(ctx).With("Filter", pl.Name())
if len(states.FreeCap) == 0 || states.Free(podID) > 0 { //vpods with no placements or pods with positive free cap
return state.NewStatus(state.Success)
}
logger.Infof("Unschedulable! Pod %d has no free capacity %v", podID, states.FreeCap)
return state.NewStatus(state.Unschedulable, ErrReasonUnschedulable)
}

@ -1,96 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package podfitsresources
import (
"reflect"
"testing"
"github.com/stretchr/testify/assert"
"k8s.io/apimachinery/pkg/types"
state "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing"
)
func TestFilter(t *testing.T) {
testCases := []struct {
name string
state *state.State
podID int32
expected *state.Status
err error
}{
{
name: "no vpods",
state: &state.State{Capacity: 10, FreeCap: []int32{}, LastOrdinal: -1},
podID: 0,
expected: state.NewStatus(state.Success),
},
{
name: "one vpods free",
state: &state.State{Capacity: 10, FreeCap: []int32{int32(9)}, LastOrdinal: 0},
podID: 0,
expected: state.NewStatus(state.Success),
},
{
name: "one vpods free",
state: &state.State{Capacity: 10, FreeCap: []int32{int32(10)}, LastOrdinal: 0},
podID: 1,
expected: state.NewStatus(state.Success),
},
{
name: "one vpods not free",
state: &state.State{Capacity: 10, FreeCap: []int32{int32(0)}, LastOrdinal: 0},
podID: 0,
expected: state.NewStatus(state.Unschedulable, ErrReasonUnschedulable),
},
{
name: "many vpods, no gaps",
state: &state.State{Capacity: 10, FreeCap: []int32{int32(0), int32(5), int32(5)}, LastOrdinal: 2},
podID: 0,
expected: state.NewStatus(state.Unschedulable, ErrReasonUnschedulable),
},
{
name: "many vpods, with gaps",
state: &state.State{Capacity: 10, FreeCap: []int32{int32(9), int32(10), int32(5), int32(10)}, LastOrdinal: 2},
podID: 0,
expected: state.NewStatus(state.Success),
},
{
name: "many vpods, with gaps and reserved vreplicas",
state: &state.State{Capacity: 10, FreeCap: []int32{int32(4), int32(10), int32(5), int32(0)}, LastOrdinal: 2},
podID: 3,
expected: state.NewStatus(state.Unschedulable, ErrReasonUnschedulable),
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t)
var plugin = &PodFitsResources{}
var args interface{}
name := plugin.Name()
assert.Equal(t, name, state.PodFitsResources)
status := plugin.Filter(ctx, args, tc.state, types.NamespacedName{}, tc.podID)
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected state, got %v, want %v", status, tc.expected)
}
})
}
}

@ -1,113 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package removewithavailabilitynodepriority
import (
"context"
"encoding/json"
"math"
"strings"
"k8s.io/apimachinery/pkg/types"
"knative.dev/eventing/pkg/scheduler/factory"
state "knative.dev/eventing/pkg/scheduler/state"
"knative.dev/pkg/logging"
)
// RemoveWithAvailabilityNodePriority is a score plugin that favors pods that create an even spread of resources across nodes for HA
type RemoveWithAvailabilityNodePriority struct {
}
// Verify RemoveWithAvailabilityNodePriority Implements ScorePlugin Interface
var _ state.ScorePlugin = &RemoveWithAvailabilityNodePriority{}
// Name of the plugin
const Name = state.RemoveWithAvailabilityNodePriority
const (
ErrReasonInvalidArg = "invalid arguments"
ErrReasonNoResource = "node does not exist"
)
func init() {
factory.RegisterSP(Name, &RemoveWithAvailabilityNodePriority{})
}
// Name returns name of the plugin
func (pl *RemoveWithAvailabilityNodePriority) Name() string {
return Name
}
// Score invoked at the score extension point. The "score" returned in this function is higher for nodes that create an even spread across nodes.
func (pl *RemoveWithAvailabilityNodePriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) {
logger := logging.FromContext(ctx).With("Score", pl.Name())
var score uint64 = 0
spreadArgs, ok := args.(string)
if !ok {
logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name())
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
skewVal := state.AvailabilityNodePriorityArgs{}
decoder := json.NewDecoder(strings.NewReader(spreadArgs))
decoder.DisallowUnknownFields()
if err := decoder.Decode(&skewVal); err != nil {
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
if states.Replicas > 0 { //need at least a pod to compute spread
var skew int32
_, nodeName, err := states.GetPodInfo(state.PodNameFromOrdinal(states.StatefulSetName, podID))
if err != nil {
return score, state.NewStatus(state.Error, ErrReasonNoResource)
}
currentReps := states.NodeSpread[key][nodeName] //get #vreps on this node
for otherNodeName := range states.NodeToZoneMap { //compare with #vreps on other pods
if otherNodeName != nodeName {
otherReps, ok := states.NodeSpread[key][otherNodeName]
if !ok {
continue //node does not exist in current placement, so move on
}
if skew = (currentReps - 1) - otherReps; skew < 0 {
skew = skew * int32(-1)
}
//logger.Infof("Current Node %v with %d and Other Node %v with %d causing skew %d", nodeName, currentReps, otherNodeName, otherReps, skew)
if skew > skewVal.MaxSkew { //score low
logger.Infof("Pod %d in node %v will cause an uneven node spread %v with other node %v", podID, nodeName, states.NodeSpread[key], otherNodeName)
}
score = score + uint64(skew)
}
}
score = math.MaxUint64 - score //lesser skews get higher score
}
return score, state.NewStatus(state.Success)
}
// ScoreExtensions of the Score plugin.
func (pl *RemoveWithAvailabilityNodePriority) ScoreExtensions() state.ScoreExtensions {
return pl
}
// NormalizeScore invoked after scoring all pods.
func (pl *RemoveWithAvailabilityNodePriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status {
return nil
}

@ -1,231 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package removewithavailabilitynodepriority
import (
"fmt"
"math"
"reflect"
"testing"
"github.com/stretchr/testify/assert"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
listers "knative.dev/eventing/pkg/reconciler/testing/v1"
"knative.dev/eventing/pkg/scheduler"
state "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing"
kubeclient "knative.dev/pkg/client/injection/kube/client/fake"
)
const (
testNs = "test-ns"
sfsName = "statefulset-name"
vpodName = "source-name"
vpodNamespace = "source-namespace"
numZones = 3
numNodes = 6
)
func TestScore(t *testing.T) {
testCases := []struct {
name string
state *state.State
vpod types.NamespacedName
replicas int32
podID int32
expected *state.Status
expScore uint64
args interface{}
}{
{
name: "no vpods, no pods",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 0,
NodeSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: 0,
args: "{\"MaxSkew\": 2}",
},
{
name: "no vpods, no pods, bad arg",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 0,
NodeSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 0,
expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg),
expScore: 0,
args: "{\"MaxSkewness\": 2}",
},
{
name: "no vpods, no pods, no resource",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
NodeSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 1,
expected: state.NewStatus(state.Error, ErrReasonNoResource),
expScore: 0,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, one node, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 5,
},
},
},
replicas: 1,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64,
args: "{\"MaxSkew\": 2}",
},
{
name: "two vpods, one node, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 5,
},
{Name: vpodName + "-1", Namespace: vpodNamespace + "-1"}: {
"node1": 4,
},
},
},
replicas: 1,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, two nodes, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 2, NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 5,
"node1": 5,
"node2": 3,
},
}},
replicas: 2,
podID: 1,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 2,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, three nodes, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 3, NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 5,
"node1": 4,
"node2": 3,
},
}},
replicas: 3,
podID: 1,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 2,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, five pods, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 5, NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"node0": 8,
"node1": 4,
"node2": 3,
},
}},
replicas: 5,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 7,
args: "{\"MaxSkew\": 2}",
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t)
var plugin = &RemoveWithAvailabilityNodePriority{}
name := plugin.Name()
assert.Equal(t, name, state.RemoveWithAvailabilityNodePriority)
nodelist := make([]*v1.Node, 0)
podlist := make([]runtime.Object, 0)
for i := int32(0); i < numZones; i++ {
for j := int32(0); j < numNodes/numZones; j++ {
nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i)
zoneName := "zone" + fmt.Sprint(i)
node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
nodelist = append(nodelist, node)
}
}
for i := int32(0); i < tc.replicas; i++ {
nodeName := "node" + fmt.Sprint(i)
podName := sfsName + "-" + fmt.Sprint(i)
pod, err := kubeclient.Get(ctx).CoreV1().Pods(testNs).Create(ctx, tscheduler.MakePod(testNs, podName, nodeName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
podlist = append(podlist, pod)
}
nodeToZoneMap := make(map[string]string)
for i := 0; i < len(nodelist); i++ {
node := nodelist[i]
zoneName, ok := node.GetLabels()[scheduler.ZoneLabel]
if !ok {
continue //ignore node that doesn't have zone info (maybe a test setup or control node)
}
nodeToZoneMap[node.Name] = zoneName
}
lsp := listers.NewListers(podlist)
tc.state.PodLister = lsp.GetPodLister().Pods(testNs)
tc.state.NodeToZoneMap = nodeToZoneMap
score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID)
if score != tc.expScore {
t.Errorf("unexpected score, got %v, want %v", score, tc.expScore)
}
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected status, got %v, want %v", status, tc.expected)
}
})
}
}
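For reference, with numZones = 3 and numNodes = 6 the setup loop above creates node0 through node5; with those constants the node index works out to j*3 + i while the zone label is zone<i>, so node<n> sits in zone<n mod 3>. Each test pod statefulset-name-<i> is pinned to node<i> and therefore lands in zone<i mod 3>.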

View File

@ -1,118 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package removewithavailabilityzonepriority
import (
"context"
"encoding/json"
"math"
"strings"
"k8s.io/apimachinery/pkg/types"
"knative.dev/eventing/pkg/scheduler/factory"
state "knative.dev/eventing/pkg/scheduler/state"
"knative.dev/pkg/logging"
)
// RemoveWithAvailabilityZonePriority is a score plugin that favors removing vreplicas from pods while keeping the spread of vreplicas across zones even, for HA
type RemoveWithAvailabilityZonePriority struct {
}
// Verify RemoveWithAvailabilityZonePriority Implements ScorePlugin Interface
var _ state.ScorePlugin = &RemoveWithAvailabilityZonePriority{}
// Name of the plugin
const Name = state.RemoveWithAvailabilityZonePriority
const (
ErrReasonInvalidArg = "invalid arguments"
ErrReasonNoResource = "zone does not exist"
)
func init() {
factory.RegisterSP(Name, &RemoveWithAvailabilityZonePriority{})
}
// Name returns name of the plugin
func (pl *RemoveWithAvailabilityZonePriority) Name() string {
return Name
}
// Score invoked at the score extension point. The "score" returned in this function is higher for pods whose removal keeps the vreplica spread across zones even.
func (pl *RemoveWithAvailabilityZonePriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) {
logger := logging.FromContext(ctx).With("Score", pl.Name())
var score uint64 = 0
spreadArgs, ok := args.(string)
if !ok {
logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name())
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
skewVal := state.AvailabilityZonePriorityArgs{}
decoder := json.NewDecoder(strings.NewReader(spreadArgs))
decoder.DisallowUnknownFields()
if err := decoder.Decode(&skewVal); err != nil {
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
if states.Replicas > 0 { //need at least a pod to compute spread
var skew int32
zoneMap := make(map[string]struct{})
for _, zoneName := range states.NodeToZoneMap {
zoneMap[zoneName] = struct{}{}
}
zoneName, _, err := states.GetPodInfo(state.PodNameFromOrdinal(states.StatefulSetName, podID))
if err != nil {
return score, state.NewStatus(state.Error, ErrReasonNoResource)
}
currentReps := states.ZoneSpread[key][zoneName] //get #vreps on this zone
for otherZoneName := range zoneMap { //compare with #vreps on other pods
if otherZoneName != zoneName {
otherReps, ok := states.ZoneSpread[key][otherZoneName]
if !ok {
continue //zone does not exist in current placement, so move on
}
if skew = (currentReps - 1) - otherReps; skew < 0 {
skew = skew * int32(-1)
}
//logger.Infof("Current Zone %v with %d and Other Zone %v with %d causing skew %d", zoneName, currentReps, otherZoneName, otherReps, skew)
if skew > skewVal.MaxSkew { //score low
logger.Infof("Pod %d in zone %v will cause an uneven zone spread %v with other zone %v", podID, zoneName, states.ZoneSpread[key], otherZoneName)
}
score = score + uint64(skew)
}
}
score = math.MaxUint64 - score //lesser skews get higher score
}
return score, state.NewStatus(state.Success)
}
// ScoreExtensions of the Score plugin.
func (pl *RemoveWithAvailabilityZonePriority) ScoreExtensions() state.ScoreExtensions {
return pl
}
// NormalizeScore invoked after scoring all pods.
func (pl *RemoveWithAvailabilityZonePriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status {
return nil
}
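Both removal priorities reject malformed arguments by decoding them strictly, which is why the "MaxSkewness" test cases return ErrReasonInvalidArg. A minimal sketch of that decoding, using a local stand-in for state.AvailabilityZonePriorityArgs:

package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// availabilityZonePriorityArgs is a local copy of the single-field args struct;
// the real plugin decodes into state.AvailabilityZonePriorityArgs.
type availabilityZonePriorityArgs struct {
	MaxSkew int32
}

func decodeArgs(raw string) (availabilityZonePriorityArgs, error) {
	var args availabilityZonePriorityArgs
	decoder := json.NewDecoder(strings.NewReader(raw))
	decoder.DisallowUnknownFields() // unknown keys become decode errors
	err := decoder.Decode(&args)
	return args, err
}

func main() {
	if args, err := decodeArgs(`{"MaxSkew": 2}`); err == nil {
		fmt.Println("ok, MaxSkew =", args.MaxSkew) // ok, MaxSkew = 2
	}
	if _, err := decodeArgs(`{"MaxSkewness": 2}`); err != nil {
		fmt.Println("rejected:", err) // the plugin maps this to ErrReasonInvalidArg
	}
}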

View File

@ -1,231 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package removewithavailabilityzonepriority
import (
"fmt"
"math"
"reflect"
"testing"
"github.com/stretchr/testify/assert"
v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
listers "knative.dev/eventing/pkg/reconciler/testing/v1"
"knative.dev/eventing/pkg/scheduler"
state "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing"
kubeclient "knative.dev/pkg/client/injection/kube/client/fake"
)
const (
testNs = "test-ns"
sfsName = "statefulset-name"
vpodName = "source-name"
vpodNamespace = "source-namespace"
numZones = 3
numNodes = 6
)
func TestScore(t *testing.T) {
testCases := []struct {
name string
state *state.State
vpod types.NamespacedName
replicas int32
podID int32
expected *state.Status
expScore uint64
args interface{}
}{
{
name: "no vpods, no pods",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 0,
ZoneSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: 0,
args: "{\"MaxSkew\": 2}",
},
{
name: "no vpods, no pods, bad arg",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 0,
ZoneSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 0,
expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg),
expScore: 0,
args: "{\"MaxSkewness\": 2}",
},
{
name: "no vpods, no pods, no resource",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
ZoneSpread: map[types.NamespacedName]map[string]int32{}},
replicas: 0,
podID: 1,
expected: state.NewStatus(state.Error, ErrReasonNoResource),
expScore: 0,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, one zone, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 5,
},
},
},
replicas: 1,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64,
args: "{\"MaxSkew\": 2}",
},
{
name: "two vpods, one zone, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 1,
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 5,
},
{Name: vpodName + "-1", Namespace: vpodNamespace + "-1"}: {
"zone1": 4,
},
},
},
replicas: 1,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, two zones, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 2, ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 5,
"zone1": 5,
"zone2": 3,
},
}},
replicas: 2,
podID: 1,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 2,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, three zones, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 3, ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 5,
"zone1": 4,
"zone2": 3,
},
}},
replicas: 3,
podID: 1,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 2,
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, five pods, same pod filter",
vpod: types.NamespacedName{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"},
state: &state.State{StatefulSetName: sfsName, Replicas: 5, ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNamespace + "-0"}: {
"zone0": 8,
"zone1": 4,
"zone2": 3,
},
}},
replicas: 5,
podID: 0,
expected: state.NewStatus(state.Success),
expScore: math.MaxUint64 - 7,
args: "{\"MaxSkew\": 2}",
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t)
var plugin = &RemoveWithAvailabilityZonePriority{}
name := plugin.Name()
assert.Equal(t, name, state.RemoveWithAvailabilityZonePriority)
nodelist := make([]*v1.Node, 0)
podlist := make([]runtime.Object, 0)
for i := int32(0); i < numZones; i++ {
for j := int32(0); j < numNodes/numZones; j++ {
nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i)
zoneName := "zone" + fmt.Sprint(i)
node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
nodelist = append(nodelist, node)
}
}
for i := int32(0); i < tc.replicas; i++ {
nodeName := "node" + fmt.Sprint(i)
podName := sfsName + "-" + fmt.Sprint(i)
pod, err := kubeclient.Get(ctx).CoreV1().Pods(testNs).Create(ctx, tscheduler.MakePod(testNs, podName, nodeName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
podlist = append(podlist, pod)
}
nodeToZoneMap := make(map[string]string)
for i := 0; i < len(nodelist); i++ {
node := nodelist[i]
zoneName, ok := node.GetLabels()[scheduler.ZoneLabel]
if !ok {
continue //ignore node that doesn't have zone info (maybe a test setup or control node)
}
nodeToZoneMap[node.Name] = zoneName
}
lsp := listers.NewListers(podlist)
tc.state.PodLister = lsp.GetPodLister().Pods(testNs)
tc.state.NodeToZoneMap = nodeToZoneMap
score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID)
if score != tc.expScore {
t.Errorf("unexpected score, got %v, want %v", score, tc.expScore)
}
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected status, got %v, want %v", status, tc.expected)
}
})
}
}

View File

@ -1,106 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package removewithevenpodspreadpriority
import (
"context"
"encoding/json"
"math"
"strings"
"k8s.io/apimachinery/pkg/types"
"knative.dev/eventing/pkg/scheduler/factory"
state "knative.dev/eventing/pkg/scheduler/state"
"knative.dev/pkg/logging"
)
// RemoveWithEvenPodSpreadPriority is a score plugin that favors pods whose removal preserves an even spread of vreplicas across pods
type RemoveWithEvenPodSpreadPriority struct {
}
// Verify RemoveWithEvenPodSpreadPriority Implements ScorePlugin Interface
var _ state.ScorePlugin = &RemoveWithEvenPodSpreadPriority{}
// Name of the plugin
const (
Name = state.RemoveWithEvenPodSpreadPriority
ErrReasonInvalidArg = "invalid arguments"
ErrReasonUnschedulable = "pod will cause an uneven spread"
)
func init() {
factory.RegisterSP(Name, &RemoveWithEvenPodSpreadPriority{})
}
// Name returns name of the plugin
func (pl *RemoveWithEvenPodSpreadPriority) Name() string {
return Name
}
// Score invoked at the score extension point. The "score" returned in this function is higher for pods whose removal keeps the vreplica spread across pods even.
func (pl *RemoveWithEvenPodSpreadPriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) {
logger := logging.FromContext(ctx).With("Score", pl.Name())
var score uint64 = 0
spreadArgs, ok := args.(string)
if !ok {
logger.Errorf("Scoring args %v for priority %q are not valid", args, pl.Name())
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
skewVal := state.EvenPodSpreadArgs{}
decoder := json.NewDecoder(strings.NewReader(spreadArgs))
decoder.DisallowUnknownFields()
if err := decoder.Decode(&skewVal); err != nil {
return 0, state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
if states.Replicas > 0 { //need at least a pod to compute spread
currentReps := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, podID)] //get #vreps on this podID
var skew int32
for _, otherPodID := range states.SchedulablePods { //compare with #vreps on other pods
if otherPodID != podID {
otherReps, ok := states.PodSpread[key][state.PodNameFromOrdinal(states.StatefulSetName, otherPodID)]
if !ok {
continue //pod does not exist in current placement, so move on
}
if skew = (currentReps - 1) - otherReps; skew < 0 {
skew = skew * int32(-1)
}
//logger.Infof("Current Pod %v with %d and Other Pod %v with %d causing skew %d", podID, currentReps, otherPodID, otherReps, skew)
if skew > skewVal.MaxSkew { //score low
logger.Infof("Pod %d will cause an uneven spread %v with other pod %v", podID, states.PodSpread[key], otherPodID)
}
score = score + uint64(skew)
}
}
score = math.MaxUint64 - score //lesser skews get higher score
}
return score, state.NewStatus(state.Success)
}
// ScoreExtensions of the Score plugin.
func (pl *RemoveWithEvenPodSpreadPriority) ScoreExtensions() state.ScoreExtensions {
return pl
}
// NormalizeScore invoked after scoring all pods.
func (pl *RemoveWithEvenPodSpreadPriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status {
return nil
}

View File

@ -1,166 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package removewithevenpodspreadpriority
import (
"math"
"reflect"
"testing"
"github.com/stretchr/testify/assert"
"k8s.io/apimachinery/pkg/types"
state "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing"
)
func TestFilter(t *testing.T) {
testCases := []struct {
name string
state *state.State
vpod types.NamespacedName
podID int32
expected *state.Status
expScore uint64
args interface{}
}{
{
name: "no vpods, no pods",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: "pod-name", Replicas: 0, PodSpread: map[types.NamespacedName]map[string]int32{}},
podID: 0,
expScore: 0,
expected: state.NewStatus(state.Success),
args: "{\"MaxSkew\": 2}",
},
{
name: "no vpods, no pods, bad arg",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: "pod-name", Replicas: 0, PodSpread: map[types.NamespacedName]map[string]int32{}},
podID: 0,
expScore: 0,
expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg),
args: "{\"MaxSkewness\": 2}",
},
{
name: "one vpod, one pod, same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 1,
SchedulablePods: []int32{int32(0)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
},
},
},
podID: 0,
expScore: math.MaxUint64,
expected: state.NewStatus(state.Success),
args: "{\"MaxSkew\": 2}",
},
{
name: "two vpods, one pod, same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 1,
SchedulablePods: []int32{int32(0)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
},
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: {
"pod-name-0": 4,
},
},
},
podID: 0,
expScore: math.MaxUint64,
expected: state.NewStatus(state.Success),
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, two pods,same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 2,
SchedulablePods: []int32{int32(0), int32(1)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
"pod-name-1": 5,
},
}},
podID: 1,
expScore: math.MaxUint64 - 1,
expected: state.NewStatus(state.Success),
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, five pods, same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 5,
SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3), int32(4)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
"pod-name-1": 4,
"pod-name-2": 3,
"pod-name-3": 4,
"pod-name-4": 5,
},
}},
podID: 1,
expScore: math.MaxUint64 - 5,
expected: state.NewStatus(state.Success),
args: "{\"MaxSkew\": 2}",
},
{
name: "one vpod, five pods, same pod filter diff pod",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", Replicas: 6,
SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3), int32(4), int32(5)},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 10,
"pod-name-1": 4,
"pod-name-2": 3,
"pod-name-3": 4,
"pod-name-4": 5,
},
}},
podID: 0,
expScore: math.MaxUint64 - 20,
expected: state.NewStatus(state.Success),
args: "{\"MaxSkew\": 2}",
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t)
var plugin = &RemoveWithEvenPodSpreadPriority{}
name := plugin.Name()
assert.Equal(t, name, state.RemoveWithEvenPodSpreadPriority)
score, status := plugin.Score(ctx, tc.args, tc.state, tc.state.SchedulablePods, tc.vpod, tc.podID)
if score != tc.expScore {
t.Errorf("unexpected score, got %v, want %v", score, tc.expScore)
}
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected status, got %v, want %v", status, tc.expected)
}
})
}
}
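Worked by hand, the last test case above follows directly from the skew formula in the plugin: removing one vreplica from pod-name-0 leaves 9 there, pod-name-5 has no entry in PodSpread and is skipped, and the remaining skews are |9-4| + |9-3| + |9-4| + |9-5| = 5 + 6 + 5 + 4 = 20, giving expScore = math.MaxUint64 - 20.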

View File

@ -1,60 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package removewithhighestordinalpriority
import (
"context"
"k8s.io/apimachinery/pkg/types"
"knative.dev/eventing/pkg/scheduler/factory"
state "knative.dev/eventing/pkg/scheduler/state"
)
// RemoveWithHighestOrdinalPriority is a score plugin that favors pods that have a higher ordinal
type RemoveWithHighestOrdinalPriority struct {
}
// Verify RemoveWithHighestOrdinalPriority Implements ScorePlugin Interface
var _ state.ScorePlugin = &RemoveWithHighestOrdinalPriority{}
// Name of the plugin
const Name = state.RemoveWithHighestOrdinalPriority
func init() {
factory.RegisterSP(Name, &RemoveWithHighestOrdinalPriority{})
}
// Name returns name of the plugin
func (pl *RemoveWithHighestOrdinalPriority) Name() string {
return Name
}
// Score invoked at the score extension point. The "score" returned in this function is higher for pods with higher ordinal values.
func (pl *RemoveWithHighestOrdinalPriority) Score(ctx context.Context, args interface{}, states *state.State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *state.Status) {
score := uint64(podID) //higher ordinals get higher score
return score, state.NewStatus(state.Success)
}
// ScoreExtensions of the Score plugin.
func (pl *RemoveWithHighestOrdinalPriority) ScoreExtensions() state.ScoreExtensions {
return pl
}
// NormalizeScore invoked after scoring all pods.
func (pl *RemoveWithHighestOrdinalPriority) NormalizeScore(ctx context.Context, states *state.State, scores state.PodScoreList) *state.Status {
return nil
}
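Because the score is simply the ordinal, a descheduler that drains the highest-scoring pod first removes vreplicas from the highest ordinals, letting the statefulset compact towards lower ordinals. A minimal sketch of that selection, assuming the framework picks the maximum score from a state.PodScoreList-like slice (pickForRemoval and podScore are local stand-ins for illustration):

package main

import "fmt"

// podScore is a local stand-in for state.PodScore.
type podScore struct {
	ID    int32
	Score uint64
}

// pickForRemoval returns the pod with the highest score.
func pickForRemoval(scores []podScore) int32 {
	best := scores[0]
	for _, s := range scores[1:] {
		if s.Score > best.Score {
			best = s
		}
	}
	return best.ID
}

func main() {
	// Scores produced by Score() above for pods 0..2: score == uint64(podID).
	scores := []podScore{{ID: 0, Score: 0}, {ID: 1, Score: 1}, {ID: 2, Score: 2}}
	fmt.Println(pickForRemoval(scores)) // 2: the highest ordinal is drained first
}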

View File

@ -1,113 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package removewithhighestordinalpriority
import (
"reflect"
"testing"
"github.com/stretchr/testify/assert"
"k8s.io/apimachinery/pkg/types"
state "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing"
)
func TestScore(t *testing.T) {
testCases := []struct {
name string
state *state.State
podID int32
expScore uint64
expected *state.Status
}{
{
name: "no vpods",
state: &state.State{LastOrdinal: -1},
podID: 0,
expScore: 0,
expected: state.NewStatus(state.Success),
},
{
name: "one vpods free",
state: &state.State{LastOrdinal: 0},
podID: 0,
expScore: 0,
expected: state.NewStatus(state.Success),
},
{
name: "two vpods free",
state: &state.State{LastOrdinal: 0},
podID: 1,
expScore: 1,
expected: state.NewStatus(state.Success),
},
{
name: "one vpods not free",
state: &state.State{LastOrdinal: 1},
podID: 0,
expScore: 0,
expected: state.NewStatus(state.Success),
},
{
name: "one vpods not free",
state: &state.State{LastOrdinal: 1},
podID: 1,
expScore: 01,
expected: state.NewStatus(state.Success),
},
{
name: "many vpods, no gaps",
state: &state.State{LastOrdinal: 1},
podID: 2,
expScore: 2,
expected: state.NewStatus(state.Success),
},
{
name: "many vpods, with gaps",
state: &state.State{LastOrdinal: 2},
podID: 0,
expScore: 0,
expected: state.NewStatus(state.Success),
},
{
name: "many vpods, with gaps",
state: &state.State{LastOrdinal: 2},
podID: 1000,
expScore: 1000,
expected: state.NewStatus(state.Success),
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t)
var plugin = &RemoveWithHighestOrdinalPriority{}
var args interface{}
name := plugin.Name()
assert.Equal(t, name, state.RemoveWithHighestOrdinalPriority)
score, status := plugin.Score(ctx, args, tc.state, tc.state.SchedulablePods, types.NamespacedName{}, tc.podID)
if score != tc.expScore {
t.Errorf("unexpected score, got %v, want %v", score, tc.expScore)
}
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected status, got %v, want %v", status, tc.expected)
}
})
}
}

View File

@ -1,78 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package nomaxresourcecount
import (
"context"
"encoding/json"
"strings"
"k8s.io/apimachinery/pkg/types"
"knative.dev/eventing/pkg/scheduler/factory"
state "knative.dev/eventing/pkg/scheduler/state"
"knative.dev/pkg/logging"
)
// NoMaxResourceCount plugin filters pods that cause the total number of pods with placements to exceed the total partition count.
type NoMaxResourceCount struct {
}
// Verify NoMaxResourceCount Implements FilterPlugin Interface
var _ state.FilterPlugin = &NoMaxResourceCount{}
// Name of the plugin
const Name = state.NoMaxResourceCount
const (
ErrReasonInvalidArg = "invalid arguments"
ErrReasonUnschedulable = "pod increases total # of pods beyond partition count"
)
func init() {
factory.RegisterFP(Name, &NoMaxResourceCount{})
}
// Name returns name of the plugin
func (pl *NoMaxResourceCount) Name() string {
return Name
}
// Filter invoked at the filter extension point.
func (pl *NoMaxResourceCount) Filter(ctx context.Context, args interface{}, states *state.State, key types.NamespacedName, podID int32) *state.Status {
logger := logging.FromContext(ctx).With("Filter", pl.Name())
resourceCountArgs, ok := args.(string)
if !ok {
logger.Errorf("Filter args %v for predicate %q are not valid", args, pl.Name())
return state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
resVal := state.NoMaxResourceCountArgs{}
decoder := json.NewDecoder(strings.NewReader(resourceCountArgs))
decoder.DisallowUnknownFields()
if err := decoder.Decode(&resVal); err != nil {
return state.NewStatus(state.Unschedulable, ErrReasonInvalidArg)
}
podName := state.PodNameFromOrdinal(states.StatefulSetName, podID)
if _, ok := states.PodSpread[key][podName]; !ok && ((len(states.PodSpread[key]) + 1) > resVal.NumPartitions) { //pod not in vrep's partition map and counting this new pod towards total pod count
logger.Infof("Unschedulable! Pod %d filtered due to total pod count %v exceeding partition count", podID, len(states.PodSpread[key])+1)
return state.NewStatus(state.Unschedulable, ErrReasonUnschedulable)
}
return state.NewStatus(state.Success)
}
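In other words, a pod that is not yet part of the vpod's placement map is rejected when counting it would push the number of placed pods beyond the configured partition count. A minimal standalone sketch of that check (exceedsPartitionCount is illustrative, not part of the plugin):

package main

import "fmt"

// exceedsPartitionCount reports whether adding podName as a new placement for a
// vpod would exceed numPartitions, mirroring the Filter logic above.
func exceedsPartitionCount(podSpread map[string]int32, podName string, numPartitions int) bool {
	if _, placed := podSpread[podName]; placed {
		return false // an already-placed pod does not increase the total
	}
	return len(podSpread)+1 > numPartitions
}

func main() {
	spread := map[string]int32{
		"pod-name-0": 7, "pod-name-1": 4, "pod-name-2": 3, "pod-name-3": 4, "pod-name-4": 5,
	}
	// Mirrors the "unschedulable" test case below: 5 partitions, candidate pod-name-5.
	fmt.Println(exceedsPartitionCount(spread, "pod-name-5", 5)) // true  -> Unschedulable
	fmt.Println(exceedsPartitionCount(spread, "pod-name-1", 5)) // false -> Success
}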

View File

@ -1,146 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package nomaxresourcecount
import (
"reflect"
"testing"
"github.com/stretchr/testify/assert"
"k8s.io/apimachinery/pkg/types"
state "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing"
)
func TestFilter(t *testing.T) {
testCases := []struct {
name string
state *state.State
vpod types.NamespacedName
podID int32
expected *state.Status
args interface{}
}{
{
name: "no vpods, no pods",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: "pod-name", LastOrdinal: -1, PodSpread: map[types.NamespacedName]map[string]int32{}},
podID: 0,
expected: state.NewStatus(state.Success),
args: "{\"NumPartitions\": 5}",
},
{
name: "no vpods, no pods, bad arg",
vpod: types.NamespacedName{},
state: &state.State{StatefulSetName: "pod-name", LastOrdinal: -1, PodSpread: map[types.NamespacedName]map[string]int32{}},
podID: 0,
expected: state.NewStatus(state.Unschedulable, ErrReasonInvalidArg),
args: "{\"NumParts\": 5}",
},
{
name: "one vpod, one pod, same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", LastOrdinal: 0,
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
},
},
},
podID: 0,
expected: state.NewStatus(state.Success),
args: "{\"NumPartitions\": 5}",
},
{
name: "two vpods, one pod, same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", LastOrdinal: 0,
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
},
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: {
"pod-name-0": 4,
},
},
},
podID: 0,
expected: state.NewStatus(state.Success),
args: "{\"NumPartitions\": 5}",
},
{
name: "one vpod, two pods,same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", LastOrdinal: 1, PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
"pod-name-1": 5,
},
}},
podID: 1,
expected: state.NewStatus(state.Success),
args: "{\"NumPartitions\": 5}",
},
{
name: "one vpod, five pods, same pod filter",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", LastOrdinal: 4, PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 5,
"pod-name-1": 4,
"pod-name-2": 3,
"pod-name-3": 4,
"pod-name-4": 5,
},
}},
podID: 1,
expected: state.NewStatus(state.Success),
args: "{\"NumPartitions\": 5}",
},
{
name: "one vpod, five pods, same pod filter unschedulable",
vpod: types.NamespacedName{Name: "vpod-name-0", Namespace: "vpod-ns-0"},
state: &state.State{StatefulSetName: "pod-name", LastOrdinal: 2, PodSpread: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"pod-name-0": 7,
"pod-name-1": 4,
"pod-name-2": 3,
"pod-name-3": 4,
"pod-name-4": 5,
},
}},
podID: 5,
expected: state.NewStatus(state.Unschedulable, ErrReasonUnschedulable),
args: "{\"NumPartitions\": 5}",
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t)
var plugin = &NoMaxResourceCount{}
name := plugin.Name()
assert.Equal(t, name, state.NoMaxResourceCount)
status := plugin.Filter(ctx, tc.args, tc.state, tc.vpod, tc.podID)
if !reflect.DeepEqual(status, tc.expected) {
t.Errorf("unexpected state, got %v, want %v", status, tc.expected)
}
})
}
}

View File

@ -30,63 +30,20 @@ import (
duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1" duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1"
) )
type SchedulerPolicyType string
const ( const (
// MAXFILLUP policy type adds vreplicas to existing pods to fill them up before adding to new pods
MAXFILLUP SchedulerPolicyType = "MAXFILLUP"
// PodAnnotationKey is an annotation used by the scheduler to be informed of pods // PodAnnotationKey is an annotation used by the scheduler to be informed of pods
// being evicted and not use it for placing vreplicas // being evicted and not use it for placing vreplicas
PodAnnotationKey = "eventing.knative.dev/unschedulable" PodAnnotationKey = "eventing.knative.dev/unschedulable"
) )
const (
ZoneLabel = "topology.kubernetes.io/zone"
UnknownZone = "unknown"
)
const (
// MaxWeight is the maximum weight that can be assigned for a priority.
MaxWeight uint64 = 10
// MinWeight is the minimum weight that can be assigned for a priority.
MinWeight uint64 = 0
)
// Policy describes a struct of a policy resource.
type SchedulerPolicy struct {
// Holds the information to configure the fit predicate functions.
Predicates []PredicatePolicy `json:"predicates"`
// Holds the information to configure the priority functions.
Priorities []PriorityPolicy `json:"priorities"`
}
// PredicatePolicy describes a struct of a predicate policy.
type PredicatePolicy struct {
// Identifier of the predicate policy
Name string `json:"name"`
// Holds the parameters to configure the given predicate
Args interface{} `json:"args"`
}
// PriorityPolicy describes a struct of a priority policy.
type PriorityPolicy struct {
// Identifier of the priority policy
Name string `json:"name"`
// The numeric multiplier for the pod scores that the priority function generates
// The weight should be a positive integer
Weight uint64 `json:"weight"`
// Holds the parameters to configure the given priority function
Args interface{} `json:"args"`
}
// VPodLister is the function signature for returning a list of VPods // VPodLister is the function signature for returning a list of VPods
type VPodLister func() ([]VPod, error) type VPodLister func() ([]VPod, error)
// Evictor allows for vreplicas to be evicted. // Evictor allows for vreplicas to be evicted.
// For instance, the evictor is used by the statefulset scheduler to // For instance, the evictor is used by the statefulset scheduler to
// move vreplicas to a pod with a lower ordinal. // move vreplicas to a pod with a lower ordinal.
//
// pod might be `nil`.
type Evictor func(pod *corev1.Pod, vpod VPod, from *duckv1alpha1.Placement) error type Evictor func(pod *corev1.Pod, vpod VPod, from *duckv1alpha1.Placement) error
// Scheduler is responsible for placing VPods into real Kubernetes pods // Scheduler is responsible for placing VPods into real Kubernetes pods
@ -109,6 +66,8 @@ func (f SchedulerFunc) Schedule(ctx context.Context, vpod VPod) ([]duckv1alpha1.
// VPod represents virtual replicas placed into real Kubernetes pods // VPod represents virtual replicas placed into real Kubernetes pods
// The scheduler is responsible for placing VPods // The scheduler is responsible for placing VPods
type VPod interface { type VPod interface {
GetDeletionTimestamp() *metav1.Time
// GetKey returns the VPod key (namespace/name). // GetKey returns the VPod key (namespace/name).
GetKey() types.NamespacedName GetKey() types.NamespacedName
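Since the diff above documents that the pod handed to an Evictor might be nil, a hedged sketch of a conforming implementation (logEvictions is illustrative only; it assumes the exported scheduler.Evictor, scheduler.VPod and duckv1alpha1.Placement types shown in this file, and a real implementation would do more than log):

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1"
	"knative.dev/eventing/pkg/scheduler"
)

// logEvictions returns an Evictor that only logs the requested eviction.
func logEvictions() scheduler.Evictor {
	return func(pod *corev1.Pod, vpod scheduler.VPod, from *duckv1alpha1.Placement) error {
		podName := "<gone>"
		if pod != nil { // pod might be nil when the replica no longer exists
			podName = pod.Name
		}
		fmt.Printf("evicting %d vreplicas of %s from placement %s (pod %s)\n",
			from.VReplicas, vpod.GetKey(), from.PodName, podName)
		return nil
	}
}

func main() {
	_ = logEvictions() // handed to the scheduler, which invokes it when evicting vreplicas
}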

View File

@ -17,14 +17,10 @@ limitations under the License.
package state package state
import ( import (
"context"
"math"
"strconv" "strconv"
"strings" "strings"
"time"
"k8s.io/apimachinery/pkg/types" "k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/wait"
"knative.dev/eventing/pkg/scheduler" "knative.dev/eventing/pkg/scheduler"
) )
@ -36,7 +32,7 @@ func PodNameFromOrdinal(name string, ordinal int32) string {
func OrdinalFromPodName(podName string) int32 { func OrdinalFromPodName(podName string) int32 {
ordinal, err := strconv.ParseInt(podName[strings.LastIndex(podName, "-")+1:], 10, 32) ordinal, err := strconv.ParseInt(podName[strings.LastIndex(podName, "-")+1:], 10, 32)
if err != nil { if err != nil {
return math.MaxInt32 panic(podName + " is not a valid pod name")
} }
return int32(ordinal) return int32(ordinal)
} }
@ -50,31 +46,3 @@ func GetVPod(key types.NamespacedName, vpods []scheduler.VPod) scheduler.VPod {
} }
return nil return nil
} }
func SatisfyZoneAvailability(feasiblePods []int32, states *State) bool {
zoneMap := make(map[string]struct{})
var zoneName string
var err error
for _, podID := range feasiblePods {
zoneName, _, err = states.GetPodInfo(PodNameFromOrdinal(states.StatefulSetName, podID))
if err != nil {
continue
}
zoneMap[zoneName] = struct{}{}
}
return len(zoneMap) == int(states.NumZones)
}
func SatisfyNodeAvailability(feasiblePods []int32, states *State) bool {
nodeMap := make(map[string]struct{})
var nodeName string
var err error
for _, podID := range feasiblePods {
wait.PollUntilContextTimeout(context.Background(), 50*time.Millisecond, 5*time.Second, true, func(ctx context.Context) (bool, error) {
_, nodeName, err = states.GetPodInfo(PodNameFromOrdinal(states.StatefulSetName, podID))
return err == nil, nil
})
nodeMap[nodeName] = struct{}{}
}
return len(nodeMap) == int(states.NumNodes)
}
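Pod names and ordinals are converted back and forth throughout the state package, and the change above makes OrdinalFromPodName panic on a malformed name instead of returning math.MaxInt32. A minimal sketch of the parsing under the "<statefulset-name>-<ordinal>" convention (ordinalFromPodName is a local copy for illustration):

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ordinalFromPodName parses the trailing "-<ordinal>" suffix, as in the
// function above, panicking on names that do not follow the convention.
func ordinalFromPodName(podName string) int32 {
	ordinal, err := strconv.ParseInt(podName[strings.LastIndex(podName, "-")+1:], 10, 32)
	if err != nil {
		panic(podName + " is not a valid pod name")
	}
	return int32(ordinal)
}

func main() {
	fmt.Println(ordinalFromPodName("statefulset-name-3")) // 3
	// ordinalFromPodName("not-a-number") would now panic rather than return math.MaxInt32.
}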

View File

@ -1,209 +0,0 @@
/*
Copyright 2021 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package state
import (
"context"
"errors"
"strings"
"k8s.io/apimachinery/pkg/types"
)
const (
PodFitsResources = "PodFitsResources"
NoMaxResourceCount = "NoMaxResourceCount"
EvenPodSpread = "EvenPodSpread"
AvailabilityNodePriority = "AvailabilityNodePriority"
AvailabilityZonePriority = "AvailabilityZonePriority"
LowestOrdinalPriority = "LowestOrdinalPriority"
RemoveWithEvenPodSpreadPriority = "RemoveWithEvenPodSpreadPriority"
RemoveWithAvailabilityNodePriority = "RemoveWithAvailabilityNodePriority"
RemoveWithAvailabilityZonePriority = "RemoveWithAvailabilityZonePriority"
RemoveWithHighestOrdinalPriority = "RemoveWithHighestOrdinalPriority"
)
// Plugin is the parent type for all the scheduling framework plugins.
type Plugin interface {
Name() string
}
type FilterPlugin interface {
Plugin
// Filter is called by the scheduler.
// All FilterPlugins should return "Success" to declare that
// the given pod fits the vreplica.
Filter(ctx context.Context, args interface{}, state *State, key types.NamespacedName, podID int32) *Status
}
// ScoreExtensions is an interface for Score extended functionality.
type ScoreExtensions interface {
// NormalizeScore is called for all pod scores produced by the same plugin's "Score"
// method. A successful run of NormalizeScore will update the scores list and return
// a success status.
NormalizeScore(ctx context.Context, state *State, scores PodScoreList) *Status
}
type ScorePlugin interface {
Plugin
// Score is called by the scheduler.
// All ScorePlugins should return "Success" unless the args are invalid.
Score(ctx context.Context, args interface{}, state *State, feasiblePods []int32, key types.NamespacedName, podID int32) (uint64, *Status)
// ScoreExtensions returns a ScoreExtensions interface if it implements one, or nil if it does not
ScoreExtensions() ScoreExtensions
}
// NoMaxResourceCountArgs holds arguments used to configure the NoMaxResourceCount plugin.
type NoMaxResourceCountArgs struct {
NumPartitions int
}
// EvenPodSpreadArgs holds arguments used to configure the EvenPodSpread plugin.
type EvenPodSpreadArgs struct {
MaxSkew int32
}
// AvailabilityZonePriorityArgs holds arguments used to configure the AvailabilityZonePriority plugin.
type AvailabilityZonePriorityArgs struct {
MaxSkew int32
}
// AvailabilityNodePriorityArgs holds arguments used to configure the AvailabilityNodePriority plugin.
type AvailabilityNodePriorityArgs struct {
MaxSkew int32
}
// Code is the Status code/type which is returned from plugins.
type Code int
// These are predefined codes used in a Status.
const (
// Success means that plugin ran correctly and found pod schedulable.
Success Code = iota
// Unschedulable is used when a plugin finds a pod unschedulable due to not satisfying the predicate.
Unschedulable
// Error is used for internal plugin errors, unexpected input, etc.
Error
)
// Status indicates the result of running a plugin.
type Status struct {
code Code
reasons []string
err error
}
// Code returns code of the Status.
func (s *Status) Code() Code {
if s == nil {
return Success
}
return s.code
}
// Message returns a concatenated message on reasons of the Status.
func (s *Status) Message() string {
if s == nil {
return ""
}
return strings.Join(s.reasons, ", ")
}
// NewStatus makes a Status out of the given arguments and returns its pointer.
func NewStatus(code Code, reasons ...string) *Status {
s := &Status{
code: code,
reasons: reasons,
}
if code == Error {
s.err = errors.New(s.Message())
}
return s
}
// AsStatus wraps an error in a Status.
func AsStatus(err error) *Status {
return &Status{
code: Error,
reasons: []string{err.Error()},
err: err,
}
}
// AsError returns nil if the status is a success; otherwise returns an "error" object
// with a concatenated message on reasons of the Status.
func (s *Status) AsError() error {
if s.IsSuccess() {
return nil
}
if s.err != nil {
return s.err
}
return errors.New(s.Message())
}
// IsSuccess returns true if and only if "Status" is nil or Code is "Success".
func (s *Status) IsSuccess() bool {
return s.Code() == Success
}
// IsError returns true if and only if "Status" is "Error".
func (s *Status) IsError() bool {
return s.Code() == Error
}
// IsUnschedulable returns true if "Status" is Unschedulable
func (s *Status) IsUnschedulable() bool {
return s.Code() == Unschedulable
}
type PodScore struct {
ID int32
Score uint64
}
type PodScoreList []PodScore
// PluginToPodScores declares a map from plugin name to its PodScoreList.
type PluginToPodScores map[string]PodScoreList
// PluginToStatus maps plugin name to status. Currently used to identify which Filter plugin
// returned which status.
type PluginToStatus map[string]*Status
// Merge merges the statuses in the map into one. The resulting status code has the following
// precedence: Error, Unschedulable, Success
func (p PluginToStatus) Merge() *Status {
if len(p) == 0 {
return nil
}
finalStatus := NewStatus(Success)
for _, s := range p {
if s.Code() == Error {
finalStatus.err = s.AsError()
}
if s.Code() > finalStatus.code {
finalStatus.code = s.Code()
}
finalStatus.reasons = append(finalStatus.reasons, s.reasons...)
}
return finalStatus
}
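The precedence encoded by Merge is simply "highest code wins": Error (2) beats Unschedulable (1), which beats Success (0), with reasons concatenated along the way. A minimal sketch of that precedence using trimmed-down local stand-ins for the Code values:

package main

import "fmt"

type code int

const (
	success code = iota
	unschedulable
	errored
)

// merge mirrors PluginToStatus.Merge: the highest code across plugins wins.
func merge(statuses map[string]code) code {
	final := success
	for _, c := range statuses {
		if c > final {
			final = c
		}
	}
	return final
}

func main() {
	fmt.Println(merge(map[string]code{"A": unschedulable, "B": success})) // 1 (Unschedulable)
	fmt.Println(merge(map[string]code{"A": unschedulable, "B": errored})) // 2 (Error)
}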

View File

@ -1,87 +0,0 @@
/*
Copyright 2020 The Knative Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
package state
import (
"errors"
"testing"
)
func TestStatus(t *testing.T) {
testCases := []struct {
name string
status *Status
code Code
err error
}{
{
name: "success",
status: NewStatus(Success),
},
{
name: "error",
status: NewStatus(Error),
code: Error,
},
{
name: "error as status",
status: AsStatus(errors.New("invalid arguments")),
code: Error,
},
{
name: "unschedulable",
status: NewStatus(Unschedulable, "invalid arguments"),
code: Unschedulable,
err: NewStatus(Unschedulable, "invalid arguments").AsError(),
},
}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
if tc.status.IsSuccess() && tc.status.Code() != tc.code && tc.status.AsError() != tc.err {
t.Errorf("unexpected code, got %v, want %v", tc.status.code, tc.code)
} else if tc.status.IsUnschedulable() && tc.status.Code() != tc.code && tc.status.AsError() != tc.err {
t.Errorf("unexpected code/msg, got %v, want %v, got %v, want %v", tc.status.code, tc.code, tc.status.AsError().Error(), tc.err.Error())
} else if tc.status.IsError() && tc.status.Code() != tc.code && tc.status.AsError() != tc.err {
t.Errorf("unexpected code/msg, got %v, want %v, got %v, want %v", tc.status.code, tc.code, tc.status.AsError().Error(), tc.err.Error())
}
})
}
}
func TestStatusMerge(t *testing.T) {
ps := PluginToStatus{"A": NewStatus(Success), "B": NewStatus(Success)}
if !ps.Merge().IsSuccess() {
t.Errorf("unexpected status from merge")
}
ps = PluginToStatus{"A": NewStatus(Success), "B": NewStatus(Error)}
if !ps.Merge().IsError() {
t.Errorf("unexpected status from merge")
}
ps = PluginToStatus{"A": NewStatus(Unschedulable), "B": NewStatus(Error)}
if !ps.Merge().IsError() {
t.Errorf("unexpected status from merge")
}
ps = PluginToStatus{"A": NewStatus(Unschedulable), "B": NewStatus(Success)}
if !ps.Merge().IsUnschedulable() {
t.Errorf("unexpected status from merge")
}
}

View File

@ -19,14 +19,12 @@ package state
import ( import (
"context" "context"
"encoding/json" "encoding/json"
"errors"
"math" "math"
"strconv" "strconv"
"go.uber.org/zap" "go.uber.org/zap"
v1 "k8s.io/api/core/v1" v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/labels"
"k8s.io/apimachinery/pkg/types" "k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/sets" "k8s.io/apimachinery/pkg/util/sets"
corev1 "k8s.io/client-go/listers/core/v1" corev1 "k8s.io/client-go/listers/core/v1"
@ -39,46 +37,27 @@ type StateAccessor interface {
// State returns the current state (snapshot) about placed vpods // State returns the current state (snapshot) about placed vpods
// Take into account reserved vreplicas and update `reserved` to reflect // Take into account reserved vreplicas and update `reserved` to reflect
// the current state. // the current state.
State(ctx context.Context, reserved map[types.NamespacedName]map[string]int32) (*State, error) State(ctx context.Context) (*State, error)
} }
// state provides information about the current scheduling of all vpods // state provides information about the current scheduling of all vpods
// It is used by the scheduler and the autoscaler // It is used by the scheduler and the autoscaler
type State struct { type State struct {
// free tracks the free capacity of each pod. // free tracks the free capacity of each pod.
//
// Including pods that might not exist anymore, it reflects the free capacity determined by
// placements in the vpod status.
FreeCap []int32 FreeCap []int32
// schedulable pods tracks the pods that aren't being evicted. // schedulable pods tracks the pods that aren't being evicted.
SchedulablePods []int32 SchedulablePods []int32
// LastOrdinal is the ordinal index corresponding to the last statefulset replica
// with placed vpods.
LastOrdinal int32
// Pod capacity. // Pod capacity.
Capacity int32 Capacity int32
// Replicas is the (cached) number of statefulset replicas. // Replicas is the (cached) number of statefulset replicas.
Replicas int32 Replicas int32
// Number of available zones in cluster
NumZones int32
// Number of available nodes in cluster
NumNodes int32
// Scheduling policy type for placing vreplicas on pods
SchedulerPolicy scheduler.SchedulerPolicyType
// Scheduling policy plugin for placing vreplicas on pods
SchedPolicy *scheduler.SchedulerPolicy
// De-scheduling policy plugin for removing vreplicas from pods
DeschedPolicy *scheduler.SchedulerPolicy
// Mapping node names of nodes currently in cluster to their zone info
NodeToZoneMap map[string]string
StatefulSetName string StatefulSetName string
PodLister corev1.PodNamespaceLister PodLister corev1.PodNamespaceLister
@ -86,12 +65,6 @@ type State struct {
// Stores for each vpod, a map of podname to number of vreplicas placed on that pod currently // Stores for each vpod, a map of podname to number of vreplicas placed on that pod currently
PodSpread map[types.NamespacedName]map[string]int32 PodSpread map[types.NamespacedName]map[string]int32
// Stores for each vpod, a map of nodename to total number of vreplicas placed on all pods running on that node currently
NodeSpread map[types.NamespacedName]map[string]int32
// Stores for each vpod, a map of zonename to total number of vreplicas placed on all pods located in that zone currently
ZoneSpread map[types.NamespacedName]map[string]int32
// Pending tracks the number of virtual replicas that haven't been scheduled yet // Pending tracks the number of virtual replicas that haven't been scheduled yet
// because there wasn't enough free capacity. // because there wasn't enough free capacity.
Pending map[types.NamespacedName]int32 Pending map[types.NamespacedName]int32
@ -114,7 +87,7 @@ func (s *State) SetFree(ordinal int32, value int32) {
s.FreeCap[int(ordinal)] = value s.FreeCap[int(ordinal)] = value
} }
// freeCapacity returns the number of vreplicas that can be used, // FreeCapacity returns the number of vreplicas that can be used,
// up to the last ordinal // up to the last ordinal
func (s *State) FreeCapacity() int32 { func (s *State) FreeCapacity() int32 {
t := int32(0) t := int32(0)
@ -124,20 +97,6 @@ func (s *State) FreeCapacity() int32 {
return t return t
} }
func (s *State) GetPodInfo(podName string) (zoneName string, nodeName string, err error) {
pod, err := s.PodLister.Get(podName)
if err != nil {
return zoneName, nodeName, err
}
nodeName = pod.Spec.NodeName
zoneName, ok := s.NodeToZoneMap[nodeName]
if !ok {
return zoneName, nodeName, errors.New("could not find zone")
}
return zoneName, nodeName, nil
}
func (s *State) IsSchedulablePod(ordinal int32) bool { func (s *State) IsSchedulablePod(ordinal int32) bool {
for _, x := range s.SchedulablePods { for _, x := range s.SchedulablePods {
if x == ordinal { if x == ordinal {
@ -151,32 +110,24 @@ func (s *State) IsSchedulablePod(ordinal int32) bool {
type stateBuilder struct { type stateBuilder struct {
vpodLister scheduler.VPodLister vpodLister scheduler.VPodLister
capacity int32 capacity int32
schedulerPolicy scheduler.SchedulerPolicyType
nodeLister corev1.NodeLister
statefulSetCache *scheduler.ScaleCache statefulSetCache *scheduler.ScaleCache
statefulSetName string statefulSetName string
podLister corev1.PodNamespaceLister podLister corev1.PodNamespaceLister
schedPolicy *scheduler.SchedulerPolicy
deschedPolicy *scheduler.SchedulerPolicy
} }
// NewStateBuilder returns a StateAccessor recreating the state from scratch each time it is requested // NewStateBuilder returns a StateAccessor recreating the state from scratch each time it is requested
func NewStateBuilder(sfsname string, lister scheduler.VPodLister, podCapacity int32, schedulerPolicy scheduler.SchedulerPolicyType, schedPolicy, deschedPolicy *scheduler.SchedulerPolicy, podlister corev1.PodNamespaceLister, nodeLister corev1.NodeLister, statefulSetCache *scheduler.ScaleCache) StateAccessor { func NewStateBuilder(sfsname string, lister scheduler.VPodLister, podCapacity int32, podlister corev1.PodNamespaceLister, statefulSetCache *scheduler.ScaleCache) StateAccessor {
return &stateBuilder{ return &stateBuilder{
vpodLister: lister, vpodLister: lister,
capacity: podCapacity, capacity: podCapacity,
schedulerPolicy: schedulerPolicy,
nodeLister: nodeLister,
statefulSetCache: statefulSetCache, statefulSetCache: statefulSetCache,
statefulSetName: sfsname, statefulSetName: sfsname,
podLister: podlister, podLister: podlister,
schedPolicy: schedPolicy,
deschedPolicy: deschedPolicy,
} }
} }
func (s *stateBuilder) State(ctx context.Context, reserved map[types.NamespacedName]map[string]int32) (*State, error) { func (s *stateBuilder) State(ctx context.Context) (*State, error) {
vpods, err := s.vpodLister() vpods, err := s.vpodLister()
if err != nil { if err != nil {
return nil, err return nil, err
@ -191,44 +142,12 @@ func (s *stateBuilder) State(ctx context.Context, reserved map[types.NamespacedN
return nil, err return nil, err
} }
free := make([]int32, 0) freeCap := make([]int32, 0)
pending := make(map[types.NamespacedName]int32, 4) pending := make(map[types.NamespacedName]int32, 4)
expectedVReplicasByVPod := make(map[types.NamespacedName]int32, len(vpods)) expectedVReplicasByVPod := make(map[types.NamespacedName]int32, len(vpods))
schedulablePods := sets.NewInt32() schedulablePods := sets.NewInt32()
last := int32(-1)
// keep track of (vpod key, podname) pairs with existing placements
withPlacement := make(map[types.NamespacedName]map[string]bool)
podSpread := make(map[types.NamespacedName]map[string]int32) podSpread := make(map[types.NamespacedName]map[string]int32)
nodeSpread := make(map[types.NamespacedName]map[string]int32)
zoneSpread := make(map[types.NamespacedName]map[string]int32)
//Build the node to zone map
nodes, err := s.nodeLister.List(labels.Everything())
if err != nil {
return nil, err
}
nodeToZoneMap := make(map[string]string)
zoneMap := make(map[string]struct{})
for i := 0; i < len(nodes); i++ {
node := nodes[i]
if isNodeUnschedulable(node) {
// Ignore node that is currently unschedulable.
continue
}
zoneName, ok := node.GetLabels()[scheduler.ZoneLabel]
if ok && zoneName != "" {
nodeToZoneMap[node.Name] = zoneName
zoneMap[zoneName] = struct{}{}
} else {
nodeToZoneMap[node.Name] = scheduler.UnknownZone
zoneMap[scheduler.UnknownZone] = struct{}{}
}
}
for podId := int32(0); podId < scale.Spec.Replicas && s.podLister != nil; podId++ { for podId := int32(0); podId < scale.Spec.Replicas && s.podLister != nil; podId++ {
pod, err := s.podLister.Get(PodNameFromOrdinal(s.statefulSetName, podId)) pod, err := s.podLister.Get(PodNameFromOrdinal(s.statefulSetName, podId))
@ -242,24 +161,13 @@ func (s *stateBuilder) State(ctx context.Context, reserved map[types.NamespacedN
continue continue
} }
node, err := s.nodeLister.Get(pod.Spec.NodeName)
if err != nil {
return nil, err
}
if isNodeUnschedulable(node) {
// Node is marked as Unschedulable - CANNOT SCHEDULE VREPS on a pod running on this node.
logger.Debugw("Pod is on an unschedulable node", zap.Any("pod", node))
continue
}
// Pod has no annotation or not annotated as unschedulable and // Pod has no annotation or not annotated as unschedulable and
// not on an unschedulable node, so add to feasible // not on an unschedulable node, so add to feasible
schedulablePods.Insert(podId) schedulablePods.Insert(podId)
} }
for _, p := range schedulablePods.List() { for _, p := range schedulablePods.List() {
free, last = s.updateFreeCapacity(logger, free, last, PodNameFromOrdinal(s.statefulSetName, p), 0) freeCap = s.updateFreeCapacity(logger, freeCap, PodNameFromOrdinal(s.statefulSetName, p), 0)
} }
// Getting current state from existing placements for all vpods // Getting current state from existing placements for all vpods
@ -269,21 +177,13 @@ func (s *stateBuilder) State(ctx context.Context, reserved map[types.NamespacedN
pending[vpod.GetKey()] = pendingFromVPod(vpod) pending[vpod.GetKey()] = pendingFromVPod(vpod)
expectedVReplicasByVPod[vpod.GetKey()] = vpod.GetVReplicas() expectedVReplicasByVPod[vpod.GetKey()] = vpod.GetVReplicas()
withPlacement[vpod.GetKey()] = make(map[string]bool)
podSpread[vpod.GetKey()] = make(map[string]int32) podSpread[vpod.GetKey()] = make(map[string]int32)
nodeSpread[vpod.GetKey()] = make(map[string]int32)
zoneSpread[vpod.GetKey()] = make(map[string]int32)
for i := 0; i < len(ps); i++ { for i := 0; i < len(ps); i++ {
podName := ps[i].PodName podName := ps[i].PodName
vreplicas := ps[i].VReplicas vreplicas := ps[i].VReplicas
// Account for reserved vreplicas freeCap = s.updateFreeCapacity(logger, freeCap, podName, vreplicas)
vreplicas = withReserved(vpod.GetKey(), podName, vreplicas, reserved)
free, last = s.updateFreeCapacity(logger, free, last, podName, vreplicas)
withPlacement[vpod.GetKey()][podName] = true
pod, err := s.podLister.Get(podName) pod, err := s.podLister.Get(podName)
if err != nil { if err != nil {
@ -291,47 +191,24 @@ func (s *stateBuilder) State(ctx context.Context, reserved map[types.NamespacedN
} }
if pod != nil && schedulablePods.Has(OrdinalFromPodName(pod.GetName())) { if pod != nil && schedulablePods.Has(OrdinalFromPodName(pod.GetName())) {
nodeName := pod.Spec.NodeName //node name for this pod
zoneName := nodeToZoneMap[nodeName] //zone name for this pod
podSpread[vpod.GetKey()][podName] = podSpread[vpod.GetKey()][podName] + vreplicas podSpread[vpod.GetKey()][podName] = podSpread[vpod.GetKey()][podName] + vreplicas
nodeSpread[vpod.GetKey()][nodeName] = nodeSpread[vpod.GetKey()][nodeName] + vreplicas
zoneSpread[vpod.GetKey()][zoneName] = zoneSpread[vpod.GetKey()][zoneName] + vreplicas
} }
} }
} }
// Account for reserved vreplicas with no prior placements state := &State{
for key, ps := range reserved { FreeCap: freeCap,
for podName, rvreplicas := range ps { SchedulablePods: schedulablePods.List(),
if wp, ok := withPlacement[key]; ok { Capacity: s.capacity,
if _, ok := wp[podName]; ok { Replicas: scale.Spec.Replicas,
// already accounted for StatefulSetName: s.statefulSetName,
continue PodLister: s.podLister,
} PodSpread: podSpread,
Pending: pending,
pod, err := s.podLister.Get(podName) ExpectedVReplicaByVPod: expectedVReplicasByVPod,
if err != nil {
logger.Warnw("Failed to get pod", zap.String("podName", podName), zap.Error(err))
}
if pod != nil && schedulablePods.Has(OrdinalFromPodName(pod.GetName())) {
nodeName := pod.Spec.NodeName //node name for this pod
zoneName := nodeToZoneMap[nodeName] //zone name for this pod
podSpread[key][podName] = podSpread[key][podName] + rvreplicas
nodeSpread[key][nodeName] = nodeSpread[key][nodeName] + rvreplicas
zoneSpread[key][zoneName] = zoneSpread[key][zoneName] + rvreplicas
}
}
free, last = s.updateFreeCapacity(logger, free, last, podName, rvreplicas)
}
} }
state := &State{FreeCap: free, SchedulablePods: schedulablePods.List(), LastOrdinal: last, Capacity: s.capacity, Replicas: scale.Spec.Replicas, NumZones: int32(len(zoneMap)), NumNodes: int32(len(nodeToZoneMap)), logger.Infow("cluster state info", zap.Any("state", state))
SchedulerPolicy: s.schedulerPolicy, SchedPolicy: s.schedPolicy, DeschedPolicy: s.deschedPolicy, NodeToZoneMap: nodeToZoneMap, StatefulSetName: s.statefulSetName, PodLister: s.podLister,
PodSpread: podSpread, NodeSpread: nodeSpread, ZoneSpread: zoneSpread, Pending: pending, ExpectedVReplicaByVPod: expectedVReplicasByVPod}
logger.Infow("cluster state info", zap.Any("state", state), zap.Any("reserved", toJSONable(reserved)))
return state, nil return state, nil
} }
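
The rewritten State() above now derives the scheduler snapshot purely from the vpod placements and the pod lister: per-ordinal free capacity and per-vpod pod spread, with the node/zone bookkeeping and the reserved argument gone. The following is a minimal, self-contained sketch of that accumulation, using simplified placeholder types (ordinals instead of pod names) rather than the repository's actual structures:

package main

import "fmt"

// placement mirrors the (pod ordinal, vreplicas) pairs the state builder reads
// from each vpod; the real code uses duckv1alpha1.Placement and pod names.
type placement struct {
	ordinal   int
	vreplicas int32
}

// buildState is a simplified sketch of the accumulation done in State():
// every placement reduces the free capacity of its pod and adds to the
// per-vpod pod spread.
func buildState(capacity int32, vpods map[string][]placement) (freeCap []int32, podSpread map[string]map[int]int32) {
	podSpread = make(map[string]map[int]int32, len(vpods))
	for key, placements := range vpods {
		podSpread[key] = make(map[int]int32, len(placements))
		for _, p := range placements {
			// grow freeCap so the placement's ordinal is addressable, defaulting to full capacity
			for p.ordinal >= len(freeCap) {
				freeCap = append(freeCap, capacity)
			}
			freeCap[p.ordinal] -= p.vreplicas
			podSpread[key][p.ordinal] += p.vreplicas
		}
	}
	return freeCap, podSpread
}

func main() {
	freeCap, spread := buildState(10, map[string][]placement{
		"ns/vpod-0": {{ordinal: 0, vreplicas: 1}, {ordinal: 2, vreplicas: 5}},
		"ns/vpod-1": {{ordinal: 1, vreplicas: 2}},
	})
	fmt.Println(freeCap, spread) // [9 8 5] map[ns/vpod-0:map[0:1 2:5] ns/vpod-1:map[1:2]]
}
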
@ -343,23 +220,19 @@ func pendingFromVPod(vpod scheduler.VPod) int32 {
return int32(math.Max(float64(0), float64(expected-scheduled))) return int32(math.Max(float64(0), float64(expected-scheduled)))
} }
func (s *stateBuilder) updateFreeCapacity(logger *zap.SugaredLogger, free []int32, last int32, podName string, vreplicas int32) ([]int32, int32) { func (s *stateBuilder) updateFreeCapacity(logger *zap.SugaredLogger, freeCap []int32, podName string, vreplicas int32) []int32 {
ordinal := OrdinalFromPodName(podName) ordinal := OrdinalFromPodName(podName)
free = grow(free, ordinal, s.capacity) freeCap = grow(freeCap, ordinal, s.capacity)
free[ordinal] -= vreplicas freeCap[ordinal] -= vreplicas
// Assert the pod is not overcommitted // Assert the pod is not overcommitted
if free[ordinal] < 0 { if overcommit := freeCap[ordinal]; overcommit < 0 {
// This should not happen anymore. Log as an error but do not interrupt the current scheduling. // This should not happen anymore. Log as an error but do not interrupt the current scheduling.
logger.Warnw("pod is overcommitted", zap.String("podName", podName), zap.Int32("free", free[ordinal])) logger.Warnw("pod is overcommitted", zap.String("podName", podName), zap.Int32("overcommit", overcommit))
} }
if ordinal > last { return freeCap
last = ordinal
}
return free, last
} }
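
updateFreeCapacity above now only maintains the free-capacity slice (the lastOrdinal tracking is removed) and logs overcommit instead of failing. A small standalone sketch of the same grow-then-subtract behaviour, with a plain function standing in for the stateBuilder method:

package main

import (
	"fmt"
	"log"
)

// updateFreeCap mirrors the shape of updateFreeCapacity: grow the slice so the
// ordinal is addressable, subtract the placed vreplicas, and warn (but do not
// fail) when the pod ends up overcommitted.
func updateFreeCap(freeCap []int32, ordinal, capacity, vreplicas int32) []int32 {
	for int(ordinal) >= len(freeCap) {
		freeCap = append(freeCap, capacity)
	}
	freeCap[ordinal] -= vreplicas
	if overcommit := freeCap[ordinal]; overcommit < 0 {
		log.Printf("pod ordinal %d is overcommitted by %d vreplicas", ordinal, -overcommit)
	}
	return freeCap
}

func main() {
	freeCap := updateFreeCap(nil, 1, 10, 4)    // [10 6]
	freeCap = updateFreeCap(freeCap, 1, 10, 8) // logs overcommit, leaves [10 -2]
	fmt.Println(freeCap)
}
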
func (s *State) TotalPending() int32 { func (s *State) TotalPending() int32 {
@ -392,27 +265,6 @@ func grow(slice []int32, ordinal int32, def int32) []int32 {
return slice return slice
} }
func withReserved(key types.NamespacedName, podName string, committed int32, reserved map[types.NamespacedName]map[string]int32) int32 {
if reserved != nil {
if rps, ok := reserved[key]; ok {
if rvreplicas, ok := rps[podName]; ok {
if committed == rvreplicas {
// new placement has been committed.
delete(rps, podName)
if len(rps) == 0 {
delete(reserved, key)
}
} else {
// new placement hasn't been committed yet. Adjust locally
// needed for descheduling vreps using policies
return rvreplicas
}
}
}
}
return committed
}
func isPodUnschedulable(pod *v1.Pod) bool { func isPodUnschedulable(pod *v1.Pod) bool {
annotVal, ok := pod.ObjectMeta.Annotations[scheduler.PodAnnotationKey] annotVal, ok := pod.ObjectMeta.Annotations[scheduler.PodAnnotationKey]
unschedulable, err := strconv.ParseBool(annotVal) unschedulable, err := strconv.ParseBool(annotVal)
@ -423,75 +275,32 @@ func isPodUnschedulable(pod *v1.Pod) bool {
return isMarkedUnschedulable || isPending return isMarkedUnschedulable || isPending
} }
func isNodeUnschedulable(node *v1.Node) bool {
noExec := &v1.Taint{
Key: "node.kubernetes.io/unreachable",
Effect: v1.TaintEffectNoExecute,
}
noSched := &v1.Taint{
Key: "node.kubernetes.io/unreachable",
Effect: v1.TaintEffectNoSchedule,
}
return node.Spec.Unschedulable ||
contains(node.Spec.Taints, noExec) ||
contains(node.Spec.Taints, noSched)
}
func contains(taints []v1.Taint, taint *v1.Taint) bool {
for _, v := range taints {
if v.MatchTaint(taint) {
return true
}
}
return false
}
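
With the node taint helpers removed, pod-level filtering is what decides schedulability. The sketch below approximates isPodUnschedulable; the annotation key and the "pending means no node assigned" check are illustrative assumptions, not the scheduler package's actual definitions:

package main

import (
	"fmt"
	"strconv"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podAnnotationKey stands in for scheduler.PodAnnotationKey; the real key is
// defined in the scheduler package and is not reproduced here.
const podAnnotationKey = "example.dev/unschedulable"

// isPodUnschedulable mirrors the annotation half of the check above: a pod is
// skipped when it is explicitly annotated as unschedulable or (assumption) has
// not been assigned to a node yet. The node taint check is dropped by this change.
func isPodUnschedulable(pod *v1.Pod) bool {
	annotVal := pod.Annotations[podAnnotationKey]
	unschedulable, err := strconv.ParseBool(annotVal)
	isMarkedUnschedulable := err == nil && unschedulable
	isPending := pod.Spec.NodeName == ""
	return isMarkedUnschedulable || isPending
}

func main() {
	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:        "statefulset-name-0",
			Annotations: map[string]string{podAnnotationKey: "true"},
		},
		Spec: v1.PodSpec{NodeName: "node-0"},
	}
	fmt.Println(isPodUnschedulable(pod)) // true
}
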
func (s *State) MarshalJSON() ([]byte, error) { func (s *State) MarshalJSON() ([]byte, error) {
type S struct { type S struct {
FreeCap []int32 `json:"freeCap"` FreeCap []int32 `json:"freeCap"`
SchedulablePods []int32 `json:"schedulablePods"` SchedulablePods []int32 `json:"schedulablePods"`
LastOrdinal int32 `json:"lastOrdinal"` Capacity int32 `json:"capacity"`
Capacity int32 `json:"capacity"` Replicas int32 `json:"replicas"`
Replicas int32 `json:"replicas"` StatefulSetName string `json:"statefulSetName"`
NumZones int32 `json:"numZones"` PodSpread map[string]map[string]int32 `json:"podSpread"`
NumNodes int32 `json:"numNodes"` Pending map[string]int32 `json:"pending"`
NodeToZoneMap map[string]string `json:"nodeToZoneMap"`
StatefulSetName string `json:"statefulSetName"`
PodSpread map[string]map[string]int32 `json:"podSpread"`
NodeSpread map[string]map[string]int32 `json:"nodeSpread"`
ZoneSpread map[string]map[string]int32 `json:"zoneSpread"`
SchedulerPolicy scheduler.SchedulerPolicyType `json:"schedulerPolicy"`
SchedPolicy *scheduler.SchedulerPolicy `json:"schedPolicy"`
DeschedPolicy *scheduler.SchedulerPolicy `json:"deschedPolicy"`
Pending map[string]int32 `json:"pending"`
} }
sj := S{ sj := S{
FreeCap: s.FreeCap, FreeCap: s.FreeCap,
SchedulablePods: s.SchedulablePods, SchedulablePods: s.SchedulablePods,
LastOrdinal: s.LastOrdinal,
Capacity: s.Capacity, Capacity: s.Capacity,
Replicas: s.Replicas, Replicas: s.Replicas,
NumZones: s.NumZones,
NumNodes: s.NumNodes,
NodeToZoneMap: s.NodeToZoneMap,
StatefulSetName: s.StatefulSetName, StatefulSetName: s.StatefulSetName,
PodSpread: toJSONable(s.PodSpread), PodSpread: ToJSONable(s.PodSpread),
NodeSpread: toJSONable(s.NodeSpread),
ZoneSpread: toJSONable(s.ZoneSpread),
SchedulerPolicy: s.SchedulerPolicy,
SchedPolicy: s.SchedPolicy,
DeschedPolicy: s.DeschedPolicy,
Pending: toJSONablePending(s.Pending), Pending: toJSONablePending(s.Pending),
} }
return json.Marshal(sj) return json.Marshal(sj)
} }
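
MarshalJSON above now serializes the slimmer state, and the helper exported as ToJSONable (shown just below) flattens the NamespacedName-keyed spread maps into string keys so they can be marshalled and logged. A minimal sketch of that flattening:

package main

import (
	"encoding/json"
	"fmt"

	"k8s.io/apimachinery/pkg/types"
)

// toJSONable follows the same idea as ToJSONable: JSON object keys must be
// strings, so NamespacedName keys are flattened to their "namespace/name"
// String() form before marshalling.
func toJSONable(ps map[types.NamespacedName]map[string]int32) map[string]map[string]int32 {
	r := make(map[string]map[string]int32, len(ps))
	for k, v := range ps {
		r[k.String()] = v
	}
	return r
}

func main() {
	spread := map[types.NamespacedName]map[string]int32{
		{Namespace: "vpod-ns-0", Name: "vpod-name-0"}: {"statefulset-name-0": 1},
	}
	b, _ := json.Marshal(toJSONable(spread))
	fmt.Println(string(b)) // {"vpod-ns-0/vpod-name-0":{"statefulset-name-0":1}}
}
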
func toJSONable(ps map[types.NamespacedName]map[string]int32) map[string]map[string]int32 { func ToJSONable(ps map[types.NamespacedName]map[string]int32) map[string]map[string]int32 {
r := make(map[string]map[string]int32, len(ps)) r := make(map[string]map[string]int32, len(ps))
for k, v := range ps { for k, v := range ps {
r[k.String()] = v r[k.String()] = v


@ -46,521 +46,31 @@ const (
func TestStateBuilder(t *testing.T) { func TestStateBuilder(t *testing.T) {
testCases := []struct { testCases := []struct {
name string name string
replicas int32 replicas int32
pendingReplicas int32 pendingReplicas int32
vpods [][]duckv1alpha1.Placement vpods [][]duckv1alpha1.Placement
expected State expected State
freec int32 freec int32
schedulerPolicyType scheduler.SchedulerPolicyType err error
schedulerPolicy *scheduler.SchedulerPolicy
deschedulerPolicy *scheduler.SchedulerPolicy
reserved map[types.NamespacedName]map[string]int32
nodes []*v1.Node
err error
}{ }{
{ {
name: "no vpods", name: "no vpods",
replicas: int32(0), replicas: int32(0),
vpods: [][]duckv1alpha1.Placement{}, vpods: [][]duckv1alpha1.Placement{},
expected: State{Capacity: 10, FreeCap: []int32{}, SchedulablePods: []int32{}, LastOrdinal: -1, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, Pending: map[types.NamespacedName]int32{}, ExpectedVReplicaByVPod: map[types.NamespacedName]int32{}}, expected: State{Capacity: 10, FreeCap: []int32{}, SchedulablePods: []int32{}, StatefulSetName: sfsName, Pending: map[types.NamespacedName]int32{}, ExpectedVReplicaByVPod: map[types.NamespacedName]int32{}},
freec: int32(0), freec: int32(0),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "one vpods", name: "one vpods",
replicas: int32(1), replicas: int32(1),
vpods: [][]duckv1alpha1.Placement{{{PodName: "statefulset-name-0", VReplicas: 1}}}, vpods: [][]duckv1alpha1.Placement{{{PodName: "statefulset-name-0", VReplicas: 1}}},
expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, LastOrdinal: 0, Replicas: 1, NumNodes: 1, NumZones: 1, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName, expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, Replicas: 1, StatefulSetName: sfsName,
NodeToZoneMap: map[string]string{"node-0": "zone-0"},
PodSpread: map[types.NamespacedName]map[string]int32{ PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: { {Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1, "statefulset-name-0": 1,
}, },
}, },
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"node-0": 1,
},
},
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"zone-0": 1,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
},
},
freec: int32(9),
schedulerPolicyType: scheduler.MAXFILLUP,
nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0")},
},
{
name: "many vpods, no gaps",
replicas: int32(3),
vpods: [][]duckv1alpha1.Placement{
{{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}},
{{PodName: "statefulset-name-1", VReplicas: 2}},
{{PodName: "statefulset-name-1", VReplicas: 3}, {PodName: "statefulset-name-0", VReplicas: 1}},
},
expected: State{Capacity: 10, FreeCap: []int32{int32(8), int32(5), int32(5)}, SchedulablePods: []int32{int32(0), int32(1), int32(2)}, LastOrdinal: 2, Replicas: 3, NumNodes: 3, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName,
NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2"},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1,
"statefulset-name-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"statefulset-name-1": 2,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"statefulset-name-0": 1,
"statefulset-name-1": 3,
},
},
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"node-0": 1,
"node-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"node-1": 2,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"node-0": 1,
"node-1": 3,
},
},
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"zone-0": 1,
"zone-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"zone-1": 2,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"zone-0": 1,
"zone-1": 3,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 0,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 0,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
},
freec: int32(18),
schedulerPolicyType: scheduler.MAXFILLUP,
nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2")},
},
{
name: "many vpods, unschedulable pending pods (statefulset-name-0)",
replicas: int32(3),
pendingReplicas: int32(1),
vpods: [][]duckv1alpha1.Placement{
{{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}},
{{PodName: "statefulset-name-1", VReplicas: 2}},
{{PodName: "statefulset-name-1", VReplicas: 3}, {PodName: "statefulset-name-0", VReplicas: 1}},
},
expected: State{Capacity: 10, FreeCap: []int32{int32(8), int32(5), int32(5)}, SchedulablePods: []int32{int32(1), int32(2)}, LastOrdinal: 2, Replicas: 3, NumNodes: 3, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName,
NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2"},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"statefulset-name-1": 2,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"statefulset-name-1": 3,
},
},
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"node-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"node-1": 2,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"node-1": 3,
},
},
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"zone-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"zone-1": 2,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"zone-1": 3,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 0,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 0,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
},
freec: int32(10),
schedulerPolicyType: scheduler.MAXFILLUP,
nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2")},
},
{
name: "many vpods, with gaps",
replicas: int32(4),
vpods: [][]duckv1alpha1.Placement{
{{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}},
{{PodName: "statefulset-name-1", VReplicas: 0}},
{{PodName: "statefulset-name-1", VReplicas: 0}, {PodName: "statefulset-name-3", VReplicas: 0}},
},
expected: State{Capacity: 10, FreeCap: []int32{int32(9), int32(10), int32(5), int32(10)}, SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3)}, LastOrdinal: 3, Replicas: 4, NumNodes: 4, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName,
NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2", "node-3": "zone-0"},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1,
"statefulset-name-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"statefulset-name-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"statefulset-name-1": 0,
"statefulset-name-3": 0,
},
},
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"node-0": 1,
"node-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"node-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"node-1": 0,
"node-3": 0,
},
},
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"zone-0": 1,
"zone-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"zone-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"zone-0": 0,
"zone-1": 0,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
},
freec: int32(34),
schedulerPolicyType: scheduler.MAXFILLUP,
nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2"), tscheduler.MakeNode("node-3", "zone-0")},
},
{
name: "many vpods, with gaps and reserved vreplicas",
replicas: int32(4),
vpods: [][]duckv1alpha1.Placement{
{{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}},
{{PodName: "statefulset-name-1", VReplicas: 0}},
{{PodName: "statefulset-name-1", VReplicas: 0}, {PodName: "statefulset-name-3", VReplicas: 0}},
},
expected: State{Capacity: 10, FreeCap: []int32{int32(3), int32(10), int32(5), int32(10)}, SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3)}, LastOrdinal: 3, Replicas: 4, NumNodes: 4, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName,
NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2", "node-3": "zone-0"},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 2,
"statefulset-name-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"statefulset-name-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"statefulset-name-1": 0,
"statefulset-name-3": 0,
},
},
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"node-0": 2,
"node-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"node-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"node-1": 0,
"node-3": 0,
},
},
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"zone-0": 2,
"zone-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"zone-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"zone-0": 0,
"zone-1": 0,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
},
freec: int32(28),
reserved: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 2,
"statefulset-name-2": 5,
},
{Name: vpodName + "-3", Namespace: vpodNs + "-3"}: {
"statefulset-name-0": 5,
},
},
schedulerPolicyType: scheduler.MAXFILLUP,
nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2"), tscheduler.MakeNode("node-3", "zone-0")},
},
{
name: "many vpods, with gaps and reserved vreplicas on existing and new placements, fully committed",
replicas: int32(4),
vpods: [][]duckv1alpha1.Placement{
{{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}},
{{PodName: "statefulset-name-1", VReplicas: 0}},
{{PodName: "statefulset-name-1", VReplicas: 0}, {PodName: "statefulset-name-3", VReplicas: 0}},
},
expected: State{Capacity: 10, FreeCap: []int32{int32(4), int32(7), int32(5), int32(10), int32(5)}, SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3)}, LastOrdinal: 4, Replicas: 4, NumNodes: 4, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName,
NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2", "node-3": "zone-0"},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1,
"statefulset-name-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"statefulset-name-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"statefulset-name-1": 0,
"statefulset-name-3": 0,
},
},
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"node-0": 1,
"node-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"node-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"node-1": 0,
"node-3": 0,
},
},
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"zone-0": 1,
"zone-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"zone-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"zone-0": 0,
"zone-1": 0,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
},
freec: int32(26),
reserved: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-3", Namespace: "vpod-ns-3"}: {
"statefulset-name-4": 5,
},
{Name: "vpod-name-4", Namespace: "vpod-ns-4"}: {
"statefulset-name-0": 5,
"statefulset-name-1": 3,
},
},
schedulerPolicyType: scheduler.MAXFILLUP,
nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2"), tscheduler.MakeNode("node-3", "zone-0")},
},
{
name: "many vpods, with gaps and reserved vreplicas on existing and new placements, partially committed",
replicas: int32(5),
vpods: [][]duckv1alpha1.Placement{
{{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}},
{{PodName: "statefulset-name-1", VReplicas: 0}},
{{PodName: "statefulset-name-1", VReplicas: 0}, {PodName: "statefulset-name-3", VReplicas: 0}},
},
expected: State{Capacity: 10, FreeCap: []int32{int32(4), int32(7), int32(5), int32(10), int32(2)}, SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3), int32(4)}, LastOrdinal: 4, Replicas: 5, NumNodes: 5, NumZones: 3, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName,
NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": "zone-1", "node-2": "zone-2", "node-3": "zone-0", "node-4": "zone-1"},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1,
"statefulset-name-2": 5,
"statefulset-name-4": 8,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"statefulset-name-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"statefulset-name-1": 0,
"statefulset-name-3": 0,
},
},
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"node-0": 1,
"node-2": 5,
"node-4": 8,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"node-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"node-1": 0,
"node-3": 0,
},
},
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"zone-0": 1,
"zone-1": 8,
"zone-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"zone-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"zone-0": 0,
"zone-1": 0,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
},
freec: int32(28),
reserved: map[types.NamespacedName]map[string]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: {
"statefulset-name-4": 8,
},
{Name: "vpod-name-4", Namespace: "vpod-ns-4"}: {
"statefulset-name-0": 5,
"statefulset-name-1": 3,
},
},
schedulerPolicyType: scheduler.MAXFILLUP,
nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNode("node-1", "zone-1"), tscheduler.MakeNode("node-2", "zone-2"), tscheduler.MakeNode("node-3", "zone-0"), tscheduler.MakeNode("node-4", "zone-1")},
},
{
name: "three vpods but one tainted and one with no zone label",
replicas: int32(1),
vpods: [][]duckv1alpha1.Placement{{{PodName: "statefulset-name-0", VReplicas: 1}}},
expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, LastOrdinal: 0, Replicas: 1, NumNodes: 2, NumZones: 2, SchedulerPolicy: scheduler.MAXFILLUP, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName,
NodeToZoneMap: map[string]string{"node-0": "zone-0", "node-1": scheduler.UnknownZone},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1,
},
},
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"node-0": 1,
},
},
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"zone-0": 1,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
},
},
freec: int32(9),
schedulerPolicyType: scheduler.MAXFILLUP,
nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0"), tscheduler.MakeNodeNoLabel("node-1"), tscheduler.MakeNodeTainted("node-2", "zone-2")},
},
{
name: "one vpod (HA)",
replicas: int32(1),
vpods: [][]duckv1alpha1.Placement{{{PodName: "statefulset-name-0", VReplicas: 1}}},
expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, LastOrdinal: 0, Replicas: 1, NumNodes: 1, NumZones: 1, SchedPolicy: &scheduler.SchedulerPolicy{}, DeschedPolicy: &scheduler.SchedulerPolicy{}, StatefulSetName: sfsName,
NodeToZoneMap: map[string]string{"node-0": "zone-0"},
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1,
},
},
NodeSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"node-0": 1,
},
},
ZoneSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"zone-0": 1,
},
},
Pending: map[types.NamespacedName]int32{ Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0, {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
}, },
@ -569,17 +79,148 @@ func TestStateBuilder(t *testing.T) {
}, },
}, },
freec: int32(9), freec: int32(9),
schedulerPolicy: &scheduler.SchedulerPolicy{ },
Predicates: []scheduler.PredicatePolicy{ {
{Name: "PodFitsResources"}, name: "many vpods, no gaps",
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"}, replicas: int32(3),
vpods: [][]duckv1alpha1.Placement{
{{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}},
{{PodName: "statefulset-name-1", VReplicas: 2}},
{{PodName: "statefulset-name-1", VReplicas: 3}, {PodName: "statefulset-name-0", VReplicas: 1}},
},
expected: State{Capacity: 10, FreeCap: []int32{int32(8), int32(5), int32(5)}, SchedulablePods: []int32{int32(0), int32(1), int32(2)}, Replicas: 3, StatefulSetName: sfsName,
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1,
"statefulset-name-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"statefulset-name-1": 2,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"statefulset-name-0": 1,
"statefulset-name-1": 3,
},
}, },
Priorities: []scheduler.PriorityPolicy{ Pending: map[types.NamespacedName]int32{
{Name: "AvailabilityNodePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"}, {Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
{Name: "LowestOrdinalPriority", Weight: 5}, {Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 0,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 0,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
}, },
}, },
nodes: []*v1.Node{tscheduler.MakeNode("node-0", "zone-0")}, freec: int32(18),
},
{
name: "many vpods, unschedulable pending pods (statefulset-name-0)",
replicas: int32(3),
pendingReplicas: int32(1),
vpods: [][]duckv1alpha1.Placement{
{{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}},
{{PodName: "statefulset-name-1", VReplicas: 2}},
{{PodName: "statefulset-name-1", VReplicas: 3}, {PodName: "statefulset-name-0", VReplicas: 1}},
},
expected: State{Capacity: 10, FreeCap: []int32{int32(8), int32(5), int32(5)}, SchedulablePods: []int32{int32(1), int32(2)}, Replicas: 3, StatefulSetName: sfsName,
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"statefulset-name-1": 2,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"statefulset-name-1": 3,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 0,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 0,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
},
freec: int32(10),
},
{
name: "many vpods, with gaps",
replicas: int32(4),
vpods: [][]duckv1alpha1.Placement{
{{PodName: "statefulset-name-0", VReplicas: 1}, {PodName: "statefulset-name-2", VReplicas: 5}},
{{PodName: "statefulset-name-1", VReplicas: 0}},
{{PodName: "statefulset-name-1", VReplicas: 0}, {PodName: "statefulset-name-3", VReplicas: 0}},
},
expected: State{Capacity: 10, FreeCap: []int32{int32(9), int32(10), int32(5), int32(10)}, SchedulablePods: []int32{int32(0), int32(1), int32(2), int32(3)}, Replicas: 4, StatefulSetName: sfsName,
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1,
"statefulset-name-2": 5,
},
{Name: vpodName + "-1", Namespace: vpodNs + "-1"}: {
"statefulset-name-1": 0,
},
{Name: vpodName + "-2", Namespace: vpodNs + "-2"}: {
"statefulset-name-1": 0,
"statefulset-name-3": 0,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
{Name: "vpod-name-1", Namespace: "vpod-ns-1"}: 1,
{Name: "vpod-name-2", Namespace: "vpod-ns-2"}: 1,
},
},
freec: int32(34),
},
{
name: "three vpods but one tainted and one with no zone label",
replicas: int32(1),
vpods: [][]duckv1alpha1.Placement{{{PodName: "statefulset-name-0", VReplicas: 1}}},
expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, Replicas: 1, StatefulSetName: sfsName,
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
},
},
freec: int32(9),
},
{
name: "one vpod (HA)",
replicas: int32(1),
vpods: [][]duckv1alpha1.Placement{{{PodName: "statefulset-name-0", VReplicas: 1}}},
expected: State{Capacity: 10, FreeCap: []int32{int32(9)}, SchedulablePods: []int32{int32(0)}, Replicas: 1, StatefulSetName: sfsName,
PodSpread: map[types.NamespacedName]map[string]int32{
{Name: vpodName + "-0", Namespace: vpodNs + "-0"}: {
"statefulset-name-0": 1,
},
},
Pending: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 0,
},
ExpectedVReplicaByVPod: map[types.NamespacedName]int32{
{Name: "vpod-name-0", Namespace: "vpod-ns-0"}: 1,
},
},
freec: int32(9),
}, },
} }
@ -587,7 +228,6 @@ func TestStateBuilder(t *testing.T) {
t.Run(tc.name, func(t *testing.T) { t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t) ctx, _ := tscheduler.SetupFakeContext(t)
vpodClient := tscheduler.NewVPodClient() vpodClient := tscheduler.NewVPodClient()
nodelist := make([]runtime.Object, 0, len(tc.nodes))
podlist := make([]runtime.Object, 0, tc.replicas) podlist := make([]runtime.Object, 0, tc.replicas)
if tc.pendingReplicas > tc.replicas { if tc.pendingReplicas > tc.replicas {
@ -610,14 +250,6 @@ func TestStateBuilder(t *testing.T) {
} }
} }
for i := 0; i < len(tc.nodes); i++ {
node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tc.nodes[i], metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
nodelist = append(nodelist, node)
}
for i := tc.replicas - 1; i >= 0; i-- { for i := tc.replicas - 1; i >= 0; i-- {
var pod *v1.Pod var pod *v1.Pod
var err error var err error
@ -641,12 +273,11 @@ func TestStateBuilder(t *testing.T) {
} }
lsp := listers.NewListers(podlist) lsp := listers.NewListers(podlist)
lsn := listers.NewListers(nodelist)
scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5}) scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5})
stateBuilder := NewStateBuilder(sfsName, vpodClient.List, int32(10), tc.schedulerPolicyType, &scheduler.SchedulerPolicy{}, &scheduler.SchedulerPolicy{}, lsp.GetPodLister().Pods(testNs), lsn.GetNodeLister(), scaleCache) stateBuilder := NewStateBuilder(sfsName, vpodClient.List, int32(10), lsp.GetPodLister().Pods(testNs), scaleCache)
state, err := stateBuilder.State(ctx, tc.reserved) state, err := stateBuilder.State(ctx)
if err != nil { if err != nil {
t.Fatal("unexpected error", err) t.Fatal("unexpected error", err)
} }
@ -658,15 +289,6 @@ func TestStateBuilder(t *testing.T) {
if tc.expected.PodSpread == nil { if tc.expected.PodSpread == nil {
tc.expected.PodSpread = make(map[types.NamespacedName]map[string]int32) tc.expected.PodSpread = make(map[types.NamespacedName]map[string]int32)
} }
if tc.expected.NodeSpread == nil {
tc.expected.NodeSpread = make(map[types.NamespacedName]map[string]int32)
}
if tc.expected.ZoneSpread == nil {
tc.expected.ZoneSpread = make(map[types.NamespacedName]map[string]int32)
}
if tc.expected.NodeToZoneMap == nil {
tc.expected.NodeToZoneMap = make(map[string]string)
}
if !reflect.DeepEqual(*state, tc.expected) { if !reflect.DeepEqual(*state, tc.expected) {
diff := cmp.Diff(tc.expected, *state, cmpopts.IgnoreInterfaces(struct{ corev1.PodNamespaceLister }{})) diff := cmp.Diff(tc.expected, *state, cmpopts.IgnoreInterfaces(struct{ corev1.PodNamespaceLister }{}))
t.Errorf("unexpected state, got %v, want %v\n(-want, +got)\n%s", *state, tc.expected, diff) t.Errorf("unexpected state, got %v, want %v\n(-want, +got)\n%s", *state, tc.expected, diff)
@ -675,14 +297,6 @@ func TestStateBuilder(t *testing.T) {
if state.FreeCapacity() != tc.freec { if state.FreeCapacity() != tc.freec {
t.Errorf("unexpected free capacity, got %d, want %d", state.FreeCapacity(), tc.freec) t.Errorf("unexpected free capacity, got %d, want %d", state.FreeCapacity(), tc.freec)
} }
if tc.schedulerPolicy != nil && !SatisfyZoneAvailability(state.SchedulablePods, state) {
t.Errorf("unexpected state, got %v, want %v", *state, tc.expected)
}
if tc.schedulerPolicy != nil && !SatisfyNodeAvailability(state.SchedulablePods, state) {
t.Errorf("unexpected state, got %v, want %v", *state, tc.expected)
}
}) })
} }
} }


@ -26,8 +26,10 @@ import (
"go.uber.org/zap" "go.uber.org/zap"
v1 "k8s.io/api/core/v1" v1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types" "k8s.io/apimachinery/pkg/types"
"k8s.io/utils/integer"
"knative.dev/pkg/logging" "knative.dev/pkg/logging"
"knative.dev/pkg/reconciler" "knative.dev/pkg/reconciler"
@ -62,7 +64,8 @@ type autoscaler struct {
evictor scheduler.Evictor evictor scheduler.Evictor
// capacity is the total number of virtual replicas available per pod. // capacity is the total number of virtual replicas available per pod.
capacity int32 capacity int32
minReplicas int32
// refreshPeriod is how often the autoscaler tries to scale down the statefulset // refreshPeriod is how often the autoscaler tries to scale down the statefulset
refreshPeriod time.Duration refreshPeriod time.Duration
@ -113,6 +116,7 @@ func newAutoscaler(cfg *Config, stateAccessor st.StateAccessor, statefulSetCache
evictor: cfg.Evictor, evictor: cfg.Evictor,
trigger: make(chan context.Context, 1), trigger: make(chan context.Context, 1),
capacity: cfg.PodCapacity, capacity: cfg.PodCapacity,
minReplicas: cfg.MinReplicas,
refreshPeriod: cfg.RefreshPeriod, refreshPeriod: cfg.RefreshPeriod,
retryPeriod: cfg.RetryPeriod, retryPeriod: cfg.RetryPeriod,
lock: new(sync.Mutex), lock: new(sync.Mutex),
@ -188,7 +192,7 @@ func (a *autoscaler) doautoscale(ctx context.Context, attemptScaleDown bool) err
logger := logging.FromContext(ctx).With("component", "autoscaler") logger := logging.FromContext(ctx).With("component", "autoscaler")
ctx = logging.WithLogger(ctx, logger) ctx = logging.WithLogger(ctx, logger)
state, err := a.stateAccessor.State(ctx, a.getReserved()) state, err := a.stateAccessor.State(ctx)
if err != nil { if err != nil {
logger.Info("error while refreshing scheduler state (will retry)", zap.Error(err)) logger.Info("error while refreshing scheduler state (will retry)", zap.Error(err))
return err return err
@ -205,46 +209,15 @@ func (a *autoscaler) doautoscale(ctx context.Context, attemptScaleDown bool) err
zap.Int32("replicas", scale.Spec.Replicas), zap.Int32("replicas", scale.Spec.Replicas),
zap.Any("state", state)) zap.Any("state", state))
var scaleUpFactor, newreplicas, minNumPods int32 newReplicas := integer.Int32Max(int32(math.Ceil(float64(state.TotalExpectedVReplicas())/float64(state.Capacity))), a.minReplicas)
scaleUpFactor = 1 // Non-HA scaling
if state.SchedPolicy != nil && contains(nil, state.SchedPolicy.Priorities, st.AvailabilityZonePriority) { //HA scaling across zones
scaleUpFactor = state.NumZones
}
if state.SchedPolicy != nil && contains(nil, state.SchedPolicy.Priorities, st.AvailabilityNodePriority) { //HA scaling across nodes
scaleUpFactor = state.NumNodes
}
newreplicas = state.LastOrdinal + 1 // Ideal number
if state.SchedulerPolicy == scheduler.MAXFILLUP {
newreplicas = int32(math.Ceil(float64(state.TotalExpectedVReplicas()) / float64(state.Capacity)))
} else {
// Take into account pending replicas and pods that are already filled (for even pod spread)
pending := state.TotalPending()
if pending > 0 {
// Make sure to allocate enough pods for holding all pending replicas.
if state.SchedPolicy != nil && contains(state.SchedPolicy.Predicates, nil, st.EvenPodSpread) && len(state.FreeCap) > 0 { //HA scaling across pods
leastNonZeroCapacity := a.minNonZeroInt(state.FreeCap)
minNumPods = int32(math.Ceil(float64(pending) / float64(leastNonZeroCapacity)))
} else {
minNumPods = int32(math.Ceil(float64(pending) / float64(a.capacity)))
}
newreplicas += int32(math.Ceil(float64(minNumPods)/float64(scaleUpFactor)) * float64(scaleUpFactor))
}
if newreplicas <= state.LastOrdinal {
// Make sure to never scale down past the last ordinal
newreplicas = state.LastOrdinal + scaleUpFactor
}
}
// Only scale down if permitted // Only scale down if permitted
if !attemptScaleDown && newreplicas < scale.Spec.Replicas { if !attemptScaleDown && newReplicas < scale.Spec.Replicas {
newreplicas = scale.Spec.Replicas newReplicas = scale.Spec.Replicas
} }
if newreplicas != scale.Spec.Replicas { if newReplicas != scale.Spec.Replicas {
scale.Spec.Replicas = newreplicas scale.Spec.Replicas = newReplicas
logger.Infow("updating adapter replicas", zap.Int32("replicas", scale.Spec.Replicas)) logger.Infow("updating adapter replicas", zap.Int32("replicas", scale.Spec.Replicas))
_, err = a.statefulSetCache.UpdateScale(ctx, a.statefulSetName, scale, metav1.UpdateOptions{}) _, err = a.statefulSetCache.UpdateScale(ctx, a.statefulSetName, scale, metav1.UpdateOptions{})
@ -255,12 +228,12 @@ func (a *autoscaler) doautoscale(ctx context.Context, attemptScaleDown bool) err
} else if attemptScaleDown { } else if attemptScaleDown {
// since the number of replicas hasn't changed and time has approached to scale down, // since the number of replicas hasn't changed and time has approached to scale down,
// take the opportunity to compact the vreplicas // take the opportunity to compact the vreplicas
return a.mayCompact(logger, state, scaleUpFactor) return a.mayCompact(logger, state)
} }
return nil return nil
} }
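
doautoscale above reduces the replica target to a single rule: enough pods to hold all expected vreplicas at the per-pod capacity, never below the configured minimum, and no scale-down unless permitted. A self-contained sketch of that calculation (using a plain comparison in place of k8s.io/utils/integer):

package main

import (
	"fmt"
	"math"
)

// desiredReplicas mirrors the scaling rule: ceil(total expected vreplicas /
// per-pod capacity), floored at minReplicas, and never below the current
// replica count unless a scale-down attempt is permitted.
func desiredReplicas(totalExpected, capacity, minReplicas, current int32, attemptScaleDown bool) int32 {
	n := int32(math.Ceil(float64(totalExpected) / float64(capacity)))
	if n < minReplicas {
		n = minReplicas
	}
	if !attemptScaleDown && n < current {
		n = current
	}
	return n
}

func main() {
	fmt.Println(desiredReplicas(45, 10, 0, 3, false)) // 5: scale up
	fmt.Println(desiredReplicas(12, 10, 0, 3, false)) // 3: scale down not permitted
	fmt.Println(desiredReplicas(12, 10, 0, 3, true))  // 2: scale down permitted
}
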
func (a *autoscaler) mayCompact(logger *zap.SugaredLogger, s *st.State, scaleUpFactor int32) error { func (a *autoscaler) mayCompact(logger *zap.SugaredLogger, s *st.State) error {
// This avoids a too aggressive scale down by adding a "grace period" based on the refresh // This avoids a too aggressive scale down by adding a "grace period" based on the refresh
// period // period
@ -275,53 +248,24 @@ func (a *autoscaler) mayCompact(logger *zap.SugaredLogger, s *st.State, scaleUpF
} }
logger.Debugw("Trying to compact and scale down", logger.Debugw("Trying to compact and scale down",
zap.Int32("scaleUpFactor", scaleUpFactor),
zap.Any("state", s), zap.Any("state", s),
) )
// when there is only one pod there is nothing to move or number of pods is just enough! // Determine if there are vpods that need compaction
if s.LastOrdinal < 1 || len(s.SchedulablePods) <= int(scaleUpFactor) { if s.Replicas != int32(len(s.FreeCap)) {
return nil a.lastCompactAttempt = time.Now()
} err := a.compact(s)
if err != nil {
if s.SchedulerPolicy == scheduler.MAXFILLUP { return fmt.Errorf("vreplicas compaction failed: %w", err)
// Determine if there is enough free capacity to
// move all vreplicas placed in the last pod to pods with a lower ordinal
freeCapacity := s.FreeCapacity() - s.Free(s.LastOrdinal)
usedInLastPod := s.Capacity - s.Free(s.LastOrdinal)
if freeCapacity >= usedInLastPod {
a.lastCompactAttempt = time.Now()
err := a.compact(s, scaleUpFactor)
if err != nil {
return fmt.Errorf("vreplicas compaction failed (scaleUpFactor %d): %w", scaleUpFactor, err)
}
}
// only do 1 replica at a time to avoid overloading the scheduler with too many
// rescheduling requests.
} else if s.SchedPolicy != nil {
//Below calculation can be optimized to work for recovery scenarios when nodes/zones are lost due to failure
freeCapacity := s.FreeCapacity()
usedInLastXPods := s.Capacity * scaleUpFactor
for i := int32(0); i < scaleUpFactor && s.LastOrdinal-i >= 0; i++ {
freeCapacity = freeCapacity - s.Free(s.LastOrdinal-i)
usedInLastXPods = usedInLastXPods - s.Free(s.LastOrdinal-i)
}
if (freeCapacity >= usedInLastXPods) && //remaining pods can hold all vreps from evicted pods
(s.Replicas-scaleUpFactor >= scaleUpFactor) { //remaining # of pods is enough for HA scaling
a.lastCompactAttempt = time.Now()
err := a.compact(s, scaleUpFactor)
if err != nil {
return fmt.Errorf("vreplicas compaction failed (scaleUpFactor %d): %w", scaleUpFactor, err)
}
} }
} }
// only do 1 replica at a time to avoid overloading the scheduler with too many
// rescheduling requests.
return nil return nil
} }
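
mayCompact above replaces the policy-specific capacity checks with a single trigger: compact when the free-capacity slice covers a different number of ordinals than the desired replica count. A minimal sketch of that condition, assuming FreeCap holds one entry per ordinal up to the highest one that is schedulable or carries placements:

package main

import "fmt"

// needsCompaction mirrors the simplified trigger: when FreeCap's length differs
// from the desired replica count, some vreplicas sit on ordinals outside the
// target StatefulSet size and compaction should run.
func needsCompaction(replicas int32, freeCap []int32) bool {
	return replicas != int32(len(freeCap))
}

func main() {
	fmt.Println(needsCompaction(3, []int32{8, 5, 5}))     // false: everything fits in 3 pods
	fmt.Println(needsCompaction(2, []int32{8, 5, 5, 10})) // true: ordinals 2 and 3 must be drained
}
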
func (a *autoscaler) compact(s *st.State, scaleUpFactor int32) error { func (a *autoscaler) compact(s *st.State) error {
var pod *v1.Pod var pod *v1.Pod
vpods, err := a.vpodLister() vpods, err := a.vpodLister()
if err != nil { if err != nil {
@ -331,47 +275,20 @@ func (a *autoscaler) compact(s *st.State, scaleUpFactor int32) error {
for _, vpod := range vpods { for _, vpod := range vpods {
placements := vpod.GetPlacements() placements := vpod.GetPlacements()
for i := len(placements) - 1; i >= 0; i-- { //start from the last placement for i := len(placements) - 1; i >= 0; i-- { //start from the last placement
for j := int32(0); j < scaleUpFactor; j++ { ordinal := st.OrdinalFromPodName(placements[i].PodName)
ordinal := st.OrdinalFromPodName(placements[i].PodName)
if ordinal == s.LastOrdinal-j { if ordinal >= s.Replicas {
pod, err = s.PodLister.Get(placements[i].PodName) pod, err = s.PodLister.Get(placements[i].PodName)
if err != nil { if err != nil && !apierrors.IsNotFound(err) {
return fmt.Errorf("failed to get pod %s: %w", placements[i].PodName, err) return fmt.Errorf("failed to get pod %s: %w", placements[i].PodName, err)
} }
err = a.evictor(pod, vpod, &placements[i]) err = a.evictor(pod, vpod, &placements[i])
if err != nil { if err != nil {
return fmt.Errorf("failed to evict pod %s: %w", pod.Name, err) return fmt.Errorf("failed to evict pod %s: %w", pod.Name, err)
}
} }
} }
} }
} }
return nil return nil
} }
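
compact above evicts any placement whose pod ordinal is at or beyond the desired replica count, walking placements from the end. The sketch below illustrates that rule; the suffix-parsing helper is an assumption standing in for st.OrdinalFromPodName:

package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ordinalFromPodName stands in for st.OrdinalFromPodName: StatefulSet pods are
// named <name>-<ordinal>, so the ordinal is the numeric suffix.
func ordinalFromPodName(podName string) int32 {
	i := strings.LastIndex(podName, "-")
	ordinal, err := strconv.ParseInt(podName[i+1:], 10, 32)
	if err != nil {
		return -1
	}
	return int32(ordinal)
}

// placementsToEvict mirrors the compaction rule: any placement whose pod
// ordinal is at or beyond the desired replica count lives on a pod that is
// being scaled away, so its vreplicas must be evicted and rescheduled.
func placementsToEvict(podNames []string, replicas int32) []string {
	evict := make([]string, 0, len(podNames))
	for i := len(podNames) - 1; i >= 0; i-- { // start from the last placement
		if ordinalFromPodName(podNames[i]) >= replicas {
			evict = append(evict, podNames[i])
		}
	}
	return evict
}

func main() {
	placements := []string{"statefulset-name-0", "statefulset-name-2", "statefulset-name-3"}
	fmt.Println(placementsToEvict(placements, 2)) // [statefulset-name-3 statefulset-name-2]
}
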
func contains(preds []scheduler.PredicatePolicy, priors []scheduler.PriorityPolicy, name string) bool {
for _, v := range preds {
if v.Name == name {
return true
}
}
for _, v := range priors {
if v.Name == name {
return true
}
}
return false
}
func (a *autoscaler) minNonZeroInt(slice []int32) int32 {
min := a.capacity
for _, v := range slice {
if v < min && v > 0 {
min = v
}
}
return min
}


@ -41,7 +41,6 @@ import (
duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1" duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1"
"knative.dev/eventing/pkg/scheduler" "knative.dev/eventing/pkg/scheduler"
"knative.dev/eventing/pkg/scheduler/state" "knative.dev/eventing/pkg/scheduler/state"
st "knative.dev/eventing/pkg/scheduler/state"
tscheduler "knative.dev/eventing/pkg/scheduler/testing" tscheduler "knative.dev/eventing/pkg/scheduler/testing"
) )
@ -51,15 +50,12 @@ const (
func TestAutoscaler(t *testing.T) { func TestAutoscaler(t *testing.T) {
testCases := []struct { testCases := []struct {
name string name string
replicas int32 replicas int32
vpods []scheduler.VPod vpods []scheduler.VPod
scaleDown bool scaleDown bool
wantReplicas int32 wantReplicas int32
schedulerPolicyType scheduler.SchedulerPolicyType reserved map[types.NamespacedName]map[string]int32
schedulerPolicy *scheduler.SchedulerPolicy
deschedulerPolicy *scheduler.SchedulerPolicy
reserved map[types.NamespacedName]map[string]int32
}{ }{
{ {
name: "no replicas, no placements, no pending", name: "no replicas, no placements, no pending",
@ -67,8 +63,7 @@ func TestAutoscaler(t *testing.T) {
vpods: []scheduler.VPod{ vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 0, nil), tscheduler.NewVPod(testNs, "vpod-1", 0, nil),
}, },
wantReplicas: int32(0), wantReplicas: int32(0),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "no replicas, no placements, with pending", name: "no replicas, no placements, with pending",
@ -76,8 +71,7 @@ func TestAutoscaler(t *testing.T) {
vpods: []scheduler.VPod{ vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 5, nil), tscheduler.NewVPod(testNs, "vpod-1", 5, nil),
}, },
wantReplicas: int32(1), wantReplicas: int32(1),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "no replicas, with placements, no pending", name: "no replicas, with placements, no pending",
@ -87,8 +81,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(2), wantReplicas: int32(2),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "no replicas, with placements, with pending, enough capacity", name: "no replicas, with placements, with pending, enough capacity",
@ -98,8 +91,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(2), wantReplicas: int32(2),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "no replicas, with placements, with pending, not enough capacity", name: "no replicas, with placements, with pending, not enough capacity",
@ -109,8 +101,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(3), wantReplicas: int32(3),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, no placements, no pending, scale down", name: "with replicas, no placements, no pending, scale down",
@ -118,17 +109,15 @@ func TestAutoscaler(t *testing.T) {
vpods: []scheduler.VPod{ vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 0, nil), tscheduler.NewVPod(testNs, "vpod-1", 0, nil),
}, },
scaleDown: true, scaleDown: true,
wantReplicas: int32(0), wantReplicas: int32(0),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, no placements, no pending, scale down (no vpods)", name: "with replicas, no placements, no pending, scale down (no vpods)",
replicas: int32(3), replicas: int32(3),
vpods: []scheduler.VPod{}, vpods: []scheduler.VPod{},
scaleDown: true, scaleDown: true,
wantReplicas: int32(0), wantReplicas: int32(0),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, no placements, with pending, scale down", name: "with replicas, no placements, with pending, scale down",
@ -136,9 +125,8 @@ func TestAutoscaler(t *testing.T) {
vpods: []scheduler.VPod{ vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 5, nil), tscheduler.NewVPod(testNs, "vpod-1", 5, nil),
}, },
scaleDown: true, scaleDown: true,
wantReplicas: int32(1), wantReplicas: int32(1),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, no placements, with pending, scale down disabled", name: "with replicas, no placements, with pending, scale down disabled",
@ -146,9 +134,8 @@ func TestAutoscaler(t *testing.T) {
vpods: []scheduler.VPod{ vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 5, nil), tscheduler.NewVPod(testNs, "vpod-1", 5, nil),
}, },
scaleDown: false, scaleDown: false,
wantReplicas: int32(3), wantReplicas: int32(3),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, no placements, with pending, scale up", name: "with replicas, no placements, with pending, scale up",
@ -156,8 +143,7 @@ func TestAutoscaler(t *testing.T) {
vpods: []scheduler.VPod{ vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 45, nil), tscheduler.NewVPod(testNs, "vpod-1", 45, nil),
}, },
wantReplicas: int32(5), wantReplicas: int32(5),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, no placements, with pending, no change", name: "with replicas, no placements, with pending, no change",
@ -165,8 +151,7 @@ func TestAutoscaler(t *testing.T) {
vpods: []scheduler.VPod{ vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 25, nil), tscheduler.NewVPod(testNs, "vpod-1", 25, nil),
}, },
wantReplicas: int32(3), wantReplicas: int32(3),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, with placements, no pending, no change", name: "with replicas, with placements, no pending, no change",
@ -176,8 +161,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(2), wantReplicas: int32(2),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, with placements, with reserved", name: "with replicas, with placements, with reserved",
@ -187,8 +171,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(5)}, {PodName: "statefulset-name-0", VReplicas: int32(5)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(2), wantReplicas: int32(2),
schedulerPolicyType: scheduler.MAXFILLUP,
reserved: map[types.NamespacedName]map[string]int32{ reserved: map[types.NamespacedName]map[string]int32{
{Namespace: testNs, Name: "vpod-1"}: { {Namespace: testNs, Name: "vpod-1"}: {
"statefulset-name-0": 8, "statefulset-name-0": 8,
@ -203,8 +186,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(2)}, {PodName: "statefulset-name-0", VReplicas: int32(2)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(3), wantReplicas: int32(3),
schedulerPolicyType: scheduler.MAXFILLUP,
reserved: map[types.NamespacedName]map[string]int32{ reserved: map[types.NamespacedName]map[string]int32{
{Namespace: testNs, Name: "vpod-1"}: { {Namespace: testNs, Name: "vpod-1"}: {
"statefulset-name-0": 9, "statefulset-name-0": 9,
@ -219,8 +201,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(5)}, {PodName: "statefulset-name-0", VReplicas: int32(5)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(3), wantReplicas: int32(3),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, with placements, with pending (scale up)", name: "with replicas, with placements, with pending (scale up)",
@ -233,8 +214,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(5)}, {PodName: "statefulset-name-0", VReplicas: int32(5)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(4), wantReplicas: int32(4),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, with placements, with pending (scale up), 1 over capacity", name: "with replicas, with placements, with pending (scale up), 1 over capacity",
@ -247,8 +227,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(5)}, {PodName: "statefulset-name-0", VReplicas: int32(5)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(5), wantReplicas: int32(5),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, with placements, with pending, attempt scale down", name: "with replicas, with placements, with pending, attempt scale down",
@ -258,9 +237,8 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(5)}, {PodName: "statefulset-name-0", VReplicas: int32(5)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(3), wantReplicas: int32(3),
scaleDown: true, scaleDown: true,
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, with placements, no pending, scale down", name: "with replicas, with placements, no pending, scale down",
@ -270,9 +248,8 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
scaleDown: true, scaleDown: true,
wantReplicas: int32(2), wantReplicas: int32(2),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, with placements, with pending, enough capacity", name: "with replicas, with placements, with pending, enough capacity",
@ -282,8 +259,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(2), wantReplicas: int32(2),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, with placements, with pending, not enough capacity", name: "with replicas, with placements, with pending, not enough capacity",
@ -293,8 +269,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
wantReplicas: int32(3), wantReplicas: int32(3),
schedulerPolicyType: scheduler.MAXFILLUP,
}, },
{ {
name: "with replicas, with placements, no pending, round up capacity", name: "with replicas, with placements, no pending, round up capacity",
@ -307,105 +282,7 @@ func TestAutoscaler(t *testing.T) {
{PodName: "statefulset-name-3", VReplicas: int32(1)}, {PodName: "statefulset-name-3", VReplicas: int32(1)},
{PodName: "statefulset-name-4", VReplicas: int32(1)}}), {PodName: "statefulset-name-4", VReplicas: int32(1)}}),
}, },
wantReplicas: int32(5),
schedulerPolicyType: scheduler.MAXFILLUP,
},
{
name: "with replicas, with placements, with pending, enough capacity, with Predicates and Zone Priorities",
replicas: int32(2),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 18, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}),
},
wantReplicas: int32(5), wantReplicas: int32(5),
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"},
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
},
{
name: "with replicas, with placements, with pending, enough capacity, with Predicates and Node Priorities",
replicas: int32(2),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 18, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}),
},
wantReplicas: int32(8),
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "AvailabilityNodePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"},
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
},
{
name: "with replicas, with placements, with pending, enough capacity, with Pod Predicates and Priorities",
replicas: int32(2),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 18, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}),
},
wantReplicas: int32(4),
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
},
{
name: "with replicas, with placements, with pending, enough capacity, with Pod Predicates and Zone Priorities",
replicas: int32(2),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 18, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}),
},
wantReplicas: int32(5),
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"},
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
},
{
name: "with replicas, with placements, with pending, enough capacity, with Pod Predicates and Node Priorities",
replicas: int32(2),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 18, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}),
},
wantReplicas: int32(8),
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "AvailabilityNodePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"},
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
}, },
} }
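With the policy plumbing removed, the autoscaler test cases above only exercise a handful of fields. A sketch of the reduced test-case shape, inferred from the cases in this hunk and assuming the imports already present in the test file (the real struct in the tree may declare more fields or a different order):

type autoscalerTestCase struct {
    name         string
    replicas     int32
    vpods        []scheduler.VPod
    scaleDown    bool
    wantReplicas int32
    // pre-reserved vreplicas per vpod key and pod name
    reserved map[types.NamespacedName]map[string]int32
}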
@ -413,22 +290,9 @@ func TestAutoscaler(t *testing.T) {
t.Run(tc.name, func(t *testing.T) { t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t) ctx, _ := tscheduler.SetupFakeContext(t)
nodelist := make([]runtime.Object, 0, numZones)
podlist := make([]runtime.Object, 0, tc.replicas) podlist := make([]runtime.Object, 0, tc.replicas)
vpodClient := tscheduler.NewVPodClient() vpodClient := tscheduler.NewVPodClient()
for i := int32(0); i < numZones; i++ {
for j := int32(0); j < numNodes/numZones; j++ {
nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i)
zoneName := "zone" + fmt.Sprint(i)
node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
nodelist = append(nodelist, node)
}
}
for i := int32(0); i < int32(math.Max(float64(tc.wantReplicas), float64(tc.replicas))); i++ { for i := int32(0); i < int32(math.Max(float64(tc.wantReplicas), float64(tc.replicas))); i++ {
nodeName := "node" + fmt.Sprint(i) nodeName := "node" + fmt.Sprint(i)
podName := sfsName + "-" + fmt.Sprint(i) podName := sfsName + "-" + fmt.Sprint(i)
@ -440,19 +304,14 @@ func TestAutoscaler(t *testing.T) {
} }
var lspp v1.PodNamespaceLister var lspp v1.PodNamespaceLister
var lsnn v1.NodeLister
if len(podlist) != 0 { if len(podlist) != 0 {
lsp := listers.NewListers(podlist) lsp := listers.NewListers(podlist)
lspp = lsp.GetPodLister().Pods(testNs) lspp = lsp.GetPodLister().Pods(testNs)
} }
if len(nodelist) != 0 {
lsn := listers.NewListers(nodelist)
lsnn = lsn.GetNodeLister()
}
scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5}) scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5})
stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, tc.schedulerPolicyType, tc.schedulerPolicy, tc.deschedulerPolicy, lspp, lsnn, scaleCache) stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, lspp, scaleCache)
sfsClient := kubeclient.Get(ctx).AppsV1().StatefulSets(testNs) sfsClient := kubeclient.Get(ctx).AppsV1().StatefulSets(testNs)
_, err := sfsClient.Create(ctx, tscheduler.MakeStatefulset(testNs, sfsName, tc.replicas), metav1.CreateOptions{}) _, err := sfsClient.Create(ctx, tscheduler.MakeStatefulset(testNs, sfsName, tc.replicas), metav1.CreateOptions{})
@ -511,9 +370,8 @@ func TestAutoscalerScaleDownToZero(t *testing.T) {
}) })
vpodClient := tscheduler.NewVPodClient() vpodClient := tscheduler.NewVPodClient()
ls := listers.NewListers(nil)
scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5}) scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5})
stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, scheduler.MAXFILLUP, &scheduler.SchedulerPolicy{}, &scheduler.SchedulerPolicy{}, nil, ls.GetNodeLister(), scaleCache) stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, nil, scaleCache)
sfsClient := kubeclient.Get(ctx).AppsV1().StatefulSets(testNs) sfsClient := kubeclient.Get(ctx).AppsV1().StatefulSets(testNs)
_, err := sfsClient.Create(ctx, tscheduler.MakeStatefulset(testNs, sfsName, 10), metav1.CreateOptions{}) _, err := sfsClient.Create(ctx, tscheduler.MakeStatefulset(testNs, sfsName, 10), metav1.CreateOptions{})
@ -571,13 +429,10 @@ func TestAutoscalerScaleDownToZero(t *testing.T) {
func TestCompactor(t *testing.T) { func TestCompactor(t *testing.T) {
testCases := []struct { testCases := []struct {
name string name string
replicas int32 replicas int32
vpods []scheduler.VPod vpods []scheduler.VPod
-		schedulerPolicyType scheduler.SchedulerPolicyType
-		wantEvictions       map[types.NamespacedName][]duckv1alpha1.Placement
-		schedulerPolicy     *scheduler.SchedulerPolicy
-		deschedulerPolicy   *scheduler.SchedulerPolicy
+		wantEvictions       map[types.NamespacedName][]duckv1alpha1.Placement
}{ }{
{ {
name: "no replicas, no placements, no pending", name: "no replicas, no placements, no pending",
@ -585,8 +440,7 @@ func TestCompactor(t *testing.T) {
vpods: []scheduler.VPod{ vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 0, nil), tscheduler.NewVPod(testNs, "vpod-1", 0, nil),
}, },
-			schedulerPolicyType: scheduler.MAXFILLUP,
 			wantEvictions:       nil,
}, },
{ {
name: "one vpod, with placements in 2 pods, compacted", name: "one vpod, with placements in 2 pods, compacted",
@ -596,8 +450,7 @@ func TestCompactor(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, },
-			schedulerPolicyType: scheduler.MAXFILLUP,
 			wantEvictions:       nil,
}, },
{ {
name: "one vpod, with placements in 2 pods, compacted edge", name: "one vpod, with placements in 2 pods, compacted edge",
@ -607,8 +460,7 @@ func TestCompactor(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(3)}}), {PodName: "statefulset-name-1", VReplicas: int32(3)}}),
}, },
-			schedulerPolicyType: scheduler.MAXFILLUP,
 			wantEvictions:       nil,
}, },
{ {
name: "one vpod, with placements in 2 pods, not compacted", name: "one vpod, with placements in 2 pods, not compacted",
@ -618,10 +470,7 @@ func TestCompactor(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(2)}}), {PodName: "statefulset-name-1", VReplicas: int32(2)}}),
}, },
-			schedulerPolicyType: scheduler.MAXFILLUP,
-			wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{
-				{Name: "vpod-1", Namespace: testNs}: {{PodName: "statefulset-name-1", VReplicas: int32(2)}},
-			},
+			wantEvictions: nil,
}, },
{ {
name: "multiple vpods, with placements in multiple pods, compacted", name: "multiple vpods, with placements in multiple pods, compacted",
@ -635,8 +484,7 @@ func TestCompactor(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(2)}, {PodName: "statefulset-name-0", VReplicas: int32(2)},
{PodName: "statefulset-name-2", VReplicas: int32(7)}}), {PodName: "statefulset-name-2", VReplicas: int32(7)}}),
}, },
-			schedulerPolicyType: scheduler.MAXFILLUP,
 			wantEvictions:       nil,
}, },
{ {
name: "multiple vpods, with placements in multiple pods, not compacted", name: "multiple vpods, with placements in multiple pods, not compacted",
@ -650,266 +498,49 @@ func TestCompactor(t *testing.T) {
{PodName: "statefulset-name-0", VReplicas: int32(2)}, {PodName: "statefulset-name-0", VReplicas: int32(2)},
{PodName: "statefulset-name-2", VReplicas: int32(7)}}), {PodName: "statefulset-name-2", VReplicas: int32(7)}}),
}, },
schedulerPolicyType: scheduler.MAXFILLUP,
wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{
{Name: "vpod-2", Namespace: testNs}: {{PodName: "statefulset-name-2", VReplicas: int32(7)}},
},
},
{
name: "no replicas, no placements, no pending, with Predicates and Priorities",
replicas: int32(0),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 0, nil),
},
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
wantEvictions: nil, wantEvictions: nil,
}, },
{ {
name: "one vpod, with placements in 2 pods, compacted, with Predicates and Priorities", name: "multiple vpods, scale down, with placements in multiple pods, compacted",
replicas: int32(2),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 15, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}),
},
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
wantEvictions: nil,
},
{
name: "one vpod, with placements in 2 pods, compacted edge, with Predicates and Priorities",
replicas: int32(2),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 11, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(3)}}),
},
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
wantEvictions: nil,
},
{
name: "one vpod, with placements in 2 pods, not compacted, with Predicates and Priorities",
replicas: int32(2), replicas: int32(2),
vpods: []scheduler.VPod{ vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 10, []duckv1alpha1.Placement{ tscheduler.NewVPod(testNs, "vpod-1", 10, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)}, {PodName: "statefulset-name-0", VReplicas: int32(3)},
{PodName: "statefulset-name-1", VReplicas: int32(2)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)}}),
}, tscheduler.NewVPod(testNs, "vpod-2", 10, []duckv1alpha1.Placement{
schedulerPolicy: &scheduler.SchedulerPolicy{ {PodName: "statefulset-name-0", VReplicas: int32(7)},
Predicates: []scheduler.PredicatePolicy{ {PodName: "statefulset-name-2", VReplicas: int32(3)}}),
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "LowestOrdinalPriority", Weight: 5},
},
}, },
wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{ wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{
{Name: "vpod-1", Namespace: testNs}: {{PodName: "statefulset-name-1", VReplicas: int32(2)}}, {Name: "vpod-2", Namespace: testNs}: {{PodName: "statefulset-name-2", VReplicas: int32(3)}},
}, },
}, },
{ {
name: "multiple vpods, with placements in multiple pods, compacted, with Predicates and Priorities", name: "multiple vpods, scale down multiple, with placements in multiple pods, compacted",
replicas: int32(3), replicas: int32(1),
// pod-0:6, pod-1:8, pod-2:7
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 12, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(4)},
{PodName: "statefulset-name-1", VReplicas: int32(8)}}),
tscheduler.NewVPod(testNs, "vpod-2", 9, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(2)},
{PodName: "statefulset-name-2", VReplicas: int32(7)}}),
},
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
wantEvictions: nil,
},
{
name: "multiple vpods, with placements in multiple pods, not compacted, with Predicates and Priorities",
replicas: int32(3),
// pod-0:6, pod-1:7, pod-2:7
vpods: []scheduler.VPod{ vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 6, []duckv1alpha1.Placement{ tscheduler.NewVPod(testNs, "vpod-1", 6, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(4)}, {PodName: "statefulset-name-0", VReplicas: int32(3)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}), {PodName: "statefulset-name-1", VReplicas: int32(7)},
tscheduler.NewVPod(testNs, "vpod-2", 15, []duckv1alpha1.Placement{ {PodName: "statefulset-name-2", VReplicas: int32(6)},
{PodName: "statefulset-name-0", VReplicas: int32(2)}, }),
{PodName: "statefulset-name-2", VReplicas: int32(7)}}), tscheduler.NewVPod(testNs, "vpod-2", 3, []duckv1alpha1.Placement{
}, {PodName: "statefulset-name-0", VReplicas: int32(7)},
schedulerPolicy: &scheduler.SchedulerPolicy{ {PodName: "statefulset-name-2", VReplicas: int32(3)},
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{
{Name: "vpod-2", Namespace: testNs}: {{PodName: "statefulset-name-2", VReplicas: int32(7)}},
},
},
{
name: "no replicas, no placements, no pending, with Predicates and HA Priorities",
replicas: int32(0),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 0, nil),
},
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"},
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
wantEvictions: nil,
},
{
name: "one vpod, with placements in 2 pods, compacted, with Predicates and HA Priorities",
replicas: int32(2),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 15, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(7)}}),
},
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"},
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
wantEvictions: nil,
},
{
name: "one vpod, with placements in 2 pods, compacted edge, with Predicates and HA Priorities",
replicas: int32(2),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 11, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(3)}}),
},
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"},
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
wantEvictions: nil,
},
{
name: "one vpod, with placements in 3 pods, compacted, with Predicates and HA Priorities",
replicas: int32(3),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 14, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(8)},
{PodName: "statefulset-name-1", VReplicas: int32(2)},
{PodName: "statefulset-name-2", VReplicas: int32(4)}}),
},
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"},
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
wantEvictions: nil,
},
{
name: "multiple vpods, with placements in multiple pods, compacted, with Predicates and HA Priorities",
replicas: int32(3),
// pod-0:6, pod-1:8, pod-2:7
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 12, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(4)},
{PodName: "statefulset-name-1", VReplicas: int32(8)}}),
tscheduler.NewVPod(testNs, "vpod-2", 9, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(2)},
{PodName: "statefulset-name-2", VReplicas: int32(7)}}),
},
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "AvailabilityNodePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"},
{Name: "LowestOrdinalPriority", Weight: 5},
},
},
wantEvictions: nil,
},
{
name: "multiple vpods, with placements in multiple pods, not compacted, with Predicates and HA Priorities",
replicas: int32(6),
vpods: []scheduler.VPod{
tscheduler.NewVPod(testNs, "vpod-1", 16, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(4)},
{PodName: "statefulset-name-1", VReplicas: int32(2)},
{PodName: "statefulset-name-2", VReplicas: int32(2)},
{PodName: "statefulset-name-3", VReplicas: int32(2)}, {PodName: "statefulset-name-3", VReplicas: int32(2)},
{PodName: "statefulset-name-4", VReplicas: int32(3)}, {PodName: "statefulset-name-10", VReplicas: int32(1)},
{PodName: "statefulset-name-5", VReplicas: int32(3)}}), }),
tscheduler.NewVPod(testNs, "vpod-2", 11, []duckv1alpha1.Placement{
{PodName: "statefulset-name-0", VReplicas: int32(2)},
{PodName: "statefulset-name-1", VReplicas: int32(4)},
{PodName: "statefulset-name-2", VReplicas: int32(5)}}),
},
schedulerPolicy: &scheduler.SchedulerPolicy{
Predicates: []scheduler.PredicatePolicy{
{Name: "PodFitsResources"},
{Name: "EvenPodSpread", Args: "{\"MaxSkew\": 1}"},
},
Priorities: []scheduler.PriorityPolicy{
{Name: "AvailabilityZonePriority", Weight: 10, Args: "{\"MaxSkew\": 1}"},
{Name: "LowestOrdinalPriority", Weight: 5},
},
}, },
wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{ wantEvictions: map[types.NamespacedName][]duckv1alpha1.Placement{
{Name: "vpod-1", Namespace: testNs}: {{PodName: "statefulset-name-5", VReplicas: int32(3)}, {PodName: "statefulset-name-4", VReplicas: int32(3)}, {PodName: "statefulset-name-3", VReplicas: int32(2)}}, {Name: "vpod-1", Namespace: testNs}: {
{PodName: "statefulset-name-2", VReplicas: int32(6)},
{PodName: "statefulset-name-1", VReplicas: int32(7)},
},
{Name: "vpod-2", Namespace: testNs}: {
{PodName: "statefulset-name-10", VReplicas: int32(1)},
{PodName: "statefulset-name-3", VReplicas: int32(2)},
{PodName: "statefulset-name-2", VReplicas: int32(3)},
},
}, },
}, },
} }
@ -918,21 +549,9 @@ func TestCompactor(t *testing.T) {
t.Run(tc.name, func(t *testing.T) { t.Run(tc.name, func(t *testing.T) {
ctx, _ := tscheduler.SetupFakeContext(t) ctx, _ := tscheduler.SetupFakeContext(t)
nodelist := make([]runtime.Object, 0, numZones)
podlist := make([]runtime.Object, 0, tc.replicas) podlist := make([]runtime.Object, 0, tc.replicas)
vpodClient := tscheduler.NewVPodClient() vpodClient := tscheduler.NewVPodClient()
for i := int32(0); i < numZones; i++ {
for j := int32(0); j < numNodes/numZones; j++ {
nodeName := "node" + fmt.Sprint((j*((numNodes/numZones)+1))+i)
zoneName := "zone" + fmt.Sprint(i)
node, err := kubeclient.Get(ctx).CoreV1().Nodes().Create(ctx, tscheduler.MakeNode(nodeName, zoneName), metav1.CreateOptions{})
if err != nil {
t.Fatal("unexpected error", err)
}
nodelist = append(nodelist, node)
}
}
for i := int32(0); i < tc.replicas; i++ { for i := int32(0); i < tc.replicas; i++ {
nodeName := "node" + fmt.Sprint(i) nodeName := "node" + fmt.Sprint(i)
podName := sfsName + "-" + fmt.Sprint(i) podName := sfsName + "-" + fmt.Sprint(i)
@ -949,9 +568,8 @@ func TestCompactor(t *testing.T) {
} }
lsp := listers.NewListers(podlist) lsp := listers.NewListers(podlist)
lsn := listers.NewListers(nodelist)
scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5}) scaleCache := scheduler.NewScaleCache(ctx, testNs, kubeclient.Get(ctx).AppsV1().StatefulSets(testNs), scheduler.ScaleCacheConfig{RefreshPeriod: time.Minute * 5})
stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, tc.schedulerPolicyType, tc.schedulerPolicy, tc.deschedulerPolicy, lsp.GetPodLister().Pods(testNs), lsn.GetNodeLister(), scaleCache) stateAccessor := state.NewStateBuilder(sfsName, vpodClient.List, 10, lsp.GetPodLister().Pods(testNs), scaleCache)
evictions := make(map[types.NamespacedName][]duckv1alpha1.Placement) evictions := make(map[types.NamespacedName][]duckv1alpha1.Placement)
recordEviction := func(pod *corev1.Pod, vpod scheduler.VPod, from *duckv1alpha1.Placement) error { recordEviction := func(pod *corev1.Pod, vpod scheduler.VPod, from *duckv1alpha1.Placement) error {
@ -975,21 +593,12 @@ func TestCompactor(t *testing.T) {
vpodClient.Append(vpod) vpodClient.Append(vpod)
} }
-			state, err := stateAccessor.State(ctx, nil)
+			state, err := stateAccessor.State(ctx)
 			if err != nil {
 				t.Fatalf("unexpected error: %v", err)
 			}
-			var scaleUpFactor int32
-			if tc.schedulerPolicy != nil && contains(nil, tc.schedulerPolicy.Priorities, st.AvailabilityZonePriority) { //HA scaling across zones
-				scaleUpFactor = state.NumZones
-			} else if tc.schedulerPolicy != nil && contains(nil, tc.schedulerPolicy.Priorities, st.AvailabilityNodePriority) { //HA scalingacross nodes
-				scaleUpFactor = state.NumNodes
-			} else {
-				scaleUpFactor = 1 // Non-HA scaling
-			}
-			if err := autoscaler.mayCompact(logging.FromContext(ctx), state, scaleUpFactor); err != nil {
+			if err := autoscaler.mayCompact(logging.FromContext(ctx), state); err != nil {
 				t.Fatal(err)
 			}


@ -18,21 +18,20 @@ package statefulset
import ( import (
"context" "context"
"crypto/rand"
"fmt" "fmt"
"math/big"
"sort" "sort"
"sync" "sync"
"sync/atomic"
"time" "time"
"go.uber.org/zap" "go.uber.org/zap"
appsv1 "k8s.io/api/apps/v1" appsv1 "k8s.io/api/apps/v1"
"k8s.io/apimachinery/pkg/types" "k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/sets"
"k8s.io/client-go/informers" "k8s.io/client-go/informers"
clientappsv1 "k8s.io/client-go/kubernetes/typed/apps/v1" clientappsv1 "k8s.io/client-go/kubernetes/typed/apps/v1"
corev1listers "k8s.io/client-go/listers/core/v1" corev1listers "k8s.io/client-go/listers/core/v1"
"k8s.io/client-go/tools/cache" "k8s.io/client-go/tools/cache"
"k8s.io/utils/integer"
"knative.dev/pkg/logging" "knative.dev/pkg/logging"
"knative.dev/pkg/reconciler" "knative.dev/pkg/reconciler"
@ -41,19 +40,7 @@ import (
duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1" duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1"
"knative.dev/eventing/pkg/scheduler" "knative.dev/eventing/pkg/scheduler"
"knative.dev/eventing/pkg/scheduler/factory"
st "knative.dev/eventing/pkg/scheduler/state" st "knative.dev/eventing/pkg/scheduler/state"
_ "knative.dev/eventing/pkg/scheduler/plugins/core/availabilitynodepriority"
_ "knative.dev/eventing/pkg/scheduler/plugins/core/availabilityzonepriority"
_ "knative.dev/eventing/pkg/scheduler/plugins/core/evenpodspread"
_ "knative.dev/eventing/pkg/scheduler/plugins/core/lowestordinalpriority"
_ "knative.dev/eventing/pkg/scheduler/plugins/core/podfitsresources"
_ "knative.dev/eventing/pkg/scheduler/plugins/core/removewithavailabilitynodepriority"
_ "knative.dev/eventing/pkg/scheduler/plugins/core/removewithavailabilityzonepriority"
_ "knative.dev/eventing/pkg/scheduler/plugins/core/removewithevenpodspreadpriority"
_ "knative.dev/eventing/pkg/scheduler/plugins/core/removewithhighestordinalpriority"
_ "knative.dev/eventing/pkg/scheduler/plugins/kafka/nomaxresourcecount"
) )
type GetReserved func() map[types.NamespacedName]map[string]int32 type GetReserved func() map[types.NamespacedName]map[string]int32
@ -65,19 +52,16 @@ type Config struct {
ScaleCacheConfig scheduler.ScaleCacheConfig `json:"scaleCacheConfig"` ScaleCacheConfig scheduler.ScaleCacheConfig `json:"scaleCacheConfig"`
// PodCapacity max capacity for each StatefulSet's pod. // PodCapacity max capacity for each StatefulSet's pod.
PodCapacity int32 `json:"podCapacity"` PodCapacity int32 `json:"podCapacity"`
// MinReplicas is the minimum replicas of the statefulset.
MinReplicas int32 `json:"minReplicas"`
// Autoscaler refresh period // Autoscaler refresh period
RefreshPeriod time.Duration `json:"refreshPeriod"` RefreshPeriod time.Duration `json:"refreshPeriod"`
// Autoscaler retry period // Autoscaler retry period
RetryPeriod time.Duration `json:"retryPeriod"` RetryPeriod time.Duration `json:"retryPeriod"`
SchedulerPolicy scheduler.SchedulerPolicyType `json:"schedulerPolicy"`
SchedPolicy *scheduler.SchedulerPolicy `json:"schedPolicy"`
DeschedPolicy *scheduler.SchedulerPolicy `json:"deschedPolicy"`
Evictor scheduler.Evictor `json:"-"` Evictor scheduler.Evictor `json:"-"`
VPodLister scheduler.VPodLister `json:"-"` VPodLister scheduler.VPodLister `json:"-"`
NodeLister corev1listers.NodeLister `json:"-"`
// Pod lister for statefulset: StatefulSetNamespace / StatefulSetName // Pod lister for statefulset: StatefulSetNamespace / StatefulSetName
PodLister corev1listers.PodNamespaceLister `json:"-"` PodLister corev1listers.PodNamespaceLister `json:"-"`
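For consumers of this package, the Config now carries only capacity and timing knobs plus the listers; the policy fields (SchedulerPolicy, SchedPolicy, DeschedPolicy), MinReplicas and NodeLister are gone. A rough wiring sketch under that assumption; the StatefulSet name, capacity and periods below are made-up values, and the listers/evictor are whatever the calling controller already has:

import (
    "context"
    "time"

    corev1listers "k8s.io/client-go/listers/core/v1"
    "knative.dev/pkg/system"

    "knative.dev/eventing/pkg/scheduler"
    "knative.dev/eventing/pkg/scheduler/statefulset"
)

func newScheduler(ctx context.Context, vpodLister scheduler.VPodLister,
    podLister corev1listers.PodNamespaceLister, evictor scheduler.Evictor) (scheduler.Scheduler, error) {
    return statefulset.New(ctx, &statefulset.Config{
        StatefulSetNamespace: system.Namespace(),
        StatefulSetName:      "my-dispatcher", // hypothetical StatefulSet name
        PodCapacity:          20,
        RefreshPeriod:        100 * time.Second,
        RetryPeriod:          10 * time.Second,
        ScaleCacheConfig:     scheduler.ScaleCacheConfig{RefreshPeriod: 5 * time.Minute},
        Evictor:              evictor,
        VPodLister:           vpodLister,
        PodLister:            podLister,
    })
}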
@ -93,7 +77,7 @@ func New(ctx context.Context, cfg *Config) (scheduler.Scheduler, error) {
scaleCache := scheduler.NewScaleCache(ctx, cfg.StatefulSetNamespace, kubeclient.Get(ctx).AppsV1().StatefulSets(cfg.StatefulSetNamespace), cfg.ScaleCacheConfig) scaleCache := scheduler.NewScaleCache(ctx, cfg.StatefulSetNamespace, kubeclient.Get(ctx).AppsV1().StatefulSets(cfg.StatefulSetNamespace), cfg.ScaleCacheConfig)
stateAccessor := st.NewStateBuilder(cfg.StatefulSetName, cfg.VPodLister, cfg.PodCapacity, cfg.SchedulerPolicy, cfg.SchedPolicy, cfg.DeschedPolicy, cfg.PodLister, cfg.NodeLister, scaleCache) stateAccessor := st.NewStateBuilder(cfg.StatefulSetName, cfg.VPodLister, cfg.PodCapacity, cfg.PodLister, scaleCache)
var getReserved GetReserved var getReserved GetReserved
cfg.getReserved = func() map[types.NamespacedName]map[string]int32 { cfg.getReserved = func() map[types.NamespacedName]map[string]int32 {
@ -118,14 +102,6 @@ func New(ctx context.Context, cfg *Config) (scheduler.Scheduler, error) {
type Pending map[types.NamespacedName]int32 type Pending map[types.NamespacedName]int32
func (p Pending) Total() int32 {
t := int32(0)
for _, vr := range p {
t += vr
}
return t
}
// StatefulSetScheduler is a scheduler placing VPod into statefulset-managed set of pods // StatefulSetScheduler is a scheduler placing VPod into statefulset-managed set of pods
type StatefulSetScheduler struct { type StatefulSetScheduler struct {
statefulSetName string statefulSetName string
@ -139,6 +115,11 @@ type StatefulSetScheduler struct {
// replicas is the (cached) number of statefulset replicas. // replicas is the (cached) number of statefulset replicas.
replicas int32 replicas int32
// isLeader signals whether a given Scheduler instance is leader or not.
// The autoscaler is considered the leader when ephemeralLeaderElectionObject is in a
// bucket where we've been promoted.
isLeader atomic.Bool
// reserved tracks vreplicas that have been placed (ie. scheduled) but haven't been // reserved tracks vreplicas that have been placed (ie. scheduled) but haven't been
// committed yet (ie. not appearing in vpodLister) // committed yet (ie. not appearing in vpodLister)
reserved map[types.NamespacedName]map[string]int32 reserved map[types.NamespacedName]map[string]int32
@ -152,14 +133,79 @@ var (
// Promote implements reconciler.LeaderAware. // Promote implements reconciler.LeaderAware.
func (s *StatefulSetScheduler) Promote(b reconciler.Bucket, enq func(reconciler.Bucket, types.NamespacedName)) error { func (s *StatefulSetScheduler) Promote(b reconciler.Bucket, enq func(reconciler.Bucket, types.NamespacedName)) error {
if !b.Has(ephemeralLeaderElectionObject) {
return nil
}
// The demoted bucket has the ephemeralLeaderElectionObject, so we are not leader anymore.
// Flip the flag after running initReserved.
defer s.isLeader.Store(true)
if v, ok := s.autoscaler.(reconciler.LeaderAware); ok { if v, ok := s.autoscaler.(reconciler.LeaderAware); ok {
return v.Promote(b, enq) return v.Promote(b, enq)
} }
if err := s.initReserved(); err != nil {
return err
}
return nil
}
func (s *StatefulSetScheduler) initReserved() error {
s.reservedMu.Lock()
defer s.reservedMu.Unlock()
vPods, err := s.vpodLister()
if err != nil {
return fmt.Errorf("failed to list vPods during init: %w", err)
}
s.reserved = make(map[types.NamespacedName]map[string]int32, len(vPods))
for _, vPod := range vPods {
if !vPod.GetDeletionTimestamp().IsZero() {
continue
}
s.reserved[vPod.GetKey()] = make(map[string]int32, len(vPod.GetPlacements()))
for _, placement := range vPod.GetPlacements() {
s.reserved[vPod.GetKey()][placement.PodName] += placement.VReplicas
}
}
return nil
}
// resyncReserved removes deleted vPods from reserved to keep the state consistent when leadership
// changes (Promote / Demote).
// initReserved is not enough since the vPod lister can be stale.
func (s *StatefulSetScheduler) resyncReserved() error {
if !s.isLeader.Load() {
return nil
}
vPods, err := s.vpodLister()
if err != nil {
return fmt.Errorf("failed to list vPods during reserved resync: %w", err)
}
vPodsByK := vPodsByKey(vPods)
s.reservedMu.Lock()
defer s.reservedMu.Unlock()
for key := range s.reserved {
vPod, ok := vPodsByK[key]
if !ok || vPod == nil {
delete(s.reserved, key)
}
}
return nil return nil
} }
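The vPodsByKey helper used by resyncReserved is not shown in this hunk. A minimal implementation consistent with how it is used here might look like the following (a sketch, not necessarily the exact code in the tree):

func vPodsByKey(vPods []scheduler.VPod) map[types.NamespacedName]scheduler.VPod {
    byKey := make(map[types.NamespacedName]scheduler.VPod, len(vPods))
    for _, vPod := range vPods {
        byKey[vPod.GetKey()] = vPod
    }
    return byKey
}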
// Demote implements reconciler.LeaderAware. // Demote implements reconciler.LeaderAware.
func (s *StatefulSetScheduler) Demote(b reconciler.Bucket) { func (s *StatefulSetScheduler) Demote(b reconciler.Bucket) {
if !b.Has(ephemeralLeaderElectionObject) {
return
}
// The demoted bucket has the ephemeralLeaderElectionObject, so we are not leader anymore.
defer s.isLeader.Store(false)
if v, ok := s.autoscaler.(reconciler.LeaderAware); ok { if v, ok := s.autoscaler.(reconciler.LeaderAware); ok {
v.Demote(b) v.Demote(b)
} }
@ -170,7 +216,7 @@ func newStatefulSetScheduler(ctx context.Context,
stateAccessor st.StateAccessor, stateAccessor st.StateAccessor,
autoscaler Autoscaler) *StatefulSetScheduler { autoscaler Autoscaler) *StatefulSetScheduler {
-	scheduler := &StatefulSetScheduler{
+	s := &StatefulSetScheduler{
statefulSetNamespace: cfg.StatefulSetNamespace, statefulSetNamespace: cfg.StatefulSetNamespace,
statefulSetName: cfg.StatefulSetName, statefulSetName: cfg.StatefulSetName,
statefulSetClient: kubeclient.Get(ctx).AppsV1().StatefulSets(cfg.StatefulSetNamespace), statefulSetClient: kubeclient.Get(ctx).AppsV1().StatefulSets(cfg.StatefulSetNamespace),
@ -188,13 +234,16 @@ func newStatefulSetScheduler(ctx context.Context,
informers.WithNamespace(cfg.StatefulSetNamespace), informers.WithNamespace(cfg.StatefulSetNamespace),
) )
-	sif.Apps().V1().StatefulSets().Informer().
+	_, err := sif.Apps().V1().StatefulSets().Informer().
 		AddEventHandler(cache.FilteringResourceEventHandler{
 			FilterFunc: controller.FilterWithNameAndNamespace(cfg.StatefulSetNamespace, cfg.StatefulSetName),
 			Handler: controller.HandleAll(func(i interface{}) {
-				scheduler.updateStatefulset(ctx, i)
+				s.updateStatefulset(ctx, i)
 			}),
 		})
+	if err != nil {
+		logging.FromContext(ctx).Fatalw("Failed to register informer", zap.Error(err))
+	}
sif.Start(ctx.Done()) sif.Start(ctx.Done())
_ = sif.WaitForCacheSync(ctx.Done()) _ = sif.WaitForCacheSync(ctx.Done())
@ -204,7 +253,18 @@ func newStatefulSetScheduler(ctx context.Context,
sif.Shutdown() sif.Shutdown()
}() }()
-	return scheduler
+	go func() {
+		for {
+			select {
+			case <-ctx.Done():
+				return
+			case <-time.After(cfg.RefreshPeriod * 3):
+				_ = s.resyncReserved()
+			}
+		}
+	}()
+
+	return s
} }
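The constructor above also starts a background loop that calls resyncReserved every three refresh periods. Purely as an illustration of the same shape with a reusable timer (the change itself uses time.After, which is fine at this low frequency), a standalone sketch:

func runResyncLoop(ctx context.Context, period time.Duration, resync func() error) {
    ticker := time.NewTicker(period)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            _ = resync() // errors are already logged by the resync function in this sketch's assumption
        }
    }
}

// Usage, matching the goroutine above:
//   go runResyncLoop(ctx, cfg.RefreshPeriod*3, s.resyncReserved)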
func (s *StatefulSetScheduler) Schedule(ctx context.Context, vpod scheduler.VPod) ([]duckv1alpha1.Placement, error) { func (s *StatefulSetScheduler) Schedule(ctx context.Context, vpod scheduler.VPod) ([]duckv1alpha1.Placement, error) {
@ -214,9 +274,6 @@ func (s *StatefulSetScheduler) Schedule(ctx context.Context, vpod scheduler.VPod
defer s.reservedMu.Unlock() defer s.reservedMu.Unlock()
placements, err := s.scheduleVPod(ctx, vpod) placements, err := s.scheduleVPod(ctx, vpod)
if placements == nil {
return placements, err
}
sort.SliceStable(placements, func(i int, j int) bool { sort.SliceStable(placements, func(i int, j int) bool {
return st.OrdinalFromPodName(placements[i].PodName) < st.OrdinalFromPodName(placements[j].PodName) return st.OrdinalFromPodName(placements[i].PodName) < st.OrdinalFromPodName(placements[j].PodName)
@ -234,30 +291,42 @@ func (s *StatefulSetScheduler) scheduleVPod(ctx context.Context, vpod scheduler.
// Get the current placements state // Get the current placements state
// Quite an expensive operation but safe and simple. // Quite an expensive operation but safe and simple.
-	state, err := s.stateAccessor.State(ctx, s.reserved)
+	state, err := s.stateAccessor.State(ctx)
 	if err != nil {
 		logger.Debug("error while refreshing scheduler state (will retry)", zap.Error(err))
 		return nil, err
 	}

-	// Clean up reserved from removed resources that don't appear in the vpod list anymore and have
-	// no pending resources.
-	reserved := make(map[types.NamespacedName]map[string]int32)
-	for k, v := range s.reserved {
-		if pendings, ok := state.Pending[k]; ok {
-			if pendings == 0 {
-				reserved[k] = map[string]int32{}
-			} else {
-				reserved[k] = v
-			}
-		}
-	}
-	s.reserved = reserved
-
-	logger.Debugw("scheduling", zap.Any("state", state))
-
-	existingPlacements := vpod.GetPlacements()
-	var left int32
+	reservedByPodName := make(map[string]int32, 2)
+	for _, v := range s.reserved {
+		for podName, vReplicas := range v {
+			v, _ := reservedByPodName[podName]
+			reservedByPodName[podName] = vReplicas + v
+		}
+	}
+
+	// Use reserved placements as starting point, if we have them.
+	existingPlacements := make([]duckv1alpha1.Placement, 0)
+	if placements, ok := s.reserved[vpod.GetKey()]; ok {
+		existingPlacements = make([]duckv1alpha1.Placement, 0, len(placements))
+		for podName, n := range placements {
+			existingPlacements = append(existingPlacements, duckv1alpha1.Placement{
+				PodName:   podName,
+				VReplicas: n,
+			})
+		}
+	}
+
+	sort.SliceStable(existingPlacements, func(i int, j int) bool {
+		return st.OrdinalFromPodName(existingPlacements[i].PodName) < st.OrdinalFromPodName(existingPlacements[j].PodName)
+	})
+
+	logger.Debugw("scheduling state",
+		zap.Any("state", state),
+		zap.Any("reservedByPodName", reservedByPodName),
+		zap.Any("reserved", st.ToJSONable(s.reserved)),
+		zap.Any("vpod", vpod),
+	)
// Remove unschedulable or adjust overcommitted pods from placements // Remove unschedulable or adjust overcommitted pods from placements
var placements []duckv1alpha1.Placement var placements []duckv1alpha1.Placement
@ -272,23 +341,26 @@ func (s *StatefulSetScheduler) scheduleVPod(ctx context.Context, vpod scheduler.
} }
// Handle overcommitted pods. // Handle overcommitted pods.
-		if state.Free(ordinal) < 0 {
+		reserved, _ := reservedByPodName[p.PodName]
+		if state.Capacity-reserved < 0 {
 			// vr > free => vr: 9, overcommit 4 -> free: 0, vr: 5, pending: +4
 			// vr = free => vr: 4, overcommit 4 -> free: 0, vr: 0, pending: +4
 			// vr < free => vr: 3, overcommit 4 -> free: -1, vr: 0, pending: +3
-			overcommit := -state.FreeCap[ordinal]
+			overcommit := -(state.Capacity - reserved)
 			logger.Debugw("overcommit", zap.Any("overcommit", overcommit), zap.Any("placement", p))
 			if p.VReplicas >= overcommit {
 				state.SetFree(ordinal, 0)
 				state.Pending[vpod.GetKey()] += overcommit
+				reservedByPodName[p.PodName] -= overcommit
 				p.VReplicas = p.VReplicas - overcommit
 			} else {
 				state.SetFree(ordinal, p.VReplicas-overcommit)
 				state.Pending[vpod.GetKey()] += p.VReplicas
+				reservedByPodName[p.PodName] -= p.VReplicas
 				p.VReplicas = 0
 			}
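The worked numbers in the comments above can be restated as a tiny standalone function. This is only an illustration of the arithmetic, not scheduler API; capacity, reserved and vr stand in for state.Capacity, reservedByPodName[pod] and p.VReplicas:

func adjustOvercommit(capacity, reserved, vr int32) (newVr, pending int32) {
    if capacity-reserved >= 0 {
        return vr, 0 // pod not overcommitted, placement kept as-is
    }
    overcommit := -(capacity - reserved)
    if vr >= overcommit {
        return vr - overcommit, overcommit // e.g. vr=9, overcommit=4 -> vr=5, pending+=4
    }
    return 0, vr // e.g. vr=3, overcommit=4 -> vr=0, pending+=3
}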
@ -314,52 +386,26 @@ func (s *StatefulSetScheduler) scheduleVPod(ctx context.Context, vpod scheduler.
return placements, nil return placements, nil
} }
-	if state.SchedulerPolicy != "" {
-		// Need less => scale down
-		if tr > vpod.GetVReplicas() {
-			logger.Debugw("scaling down", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()),
-				zap.Any("placements", placements),
-				zap.Any("existingPlacements", existingPlacements))
-			placements = s.removeReplicas(tr-vpod.GetVReplicas(), placements)
-			// Do not trigger the autoscaler to avoid unnecessary churn
-			return placements, nil
-		}
-		// Need more => scale up
-		logger.Debugw("scaling up", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()),
-			zap.Any("placements", placements),
-			zap.Any("existingPlacements", existingPlacements))
-		placements, left = s.addReplicas(state, vpod.GetVReplicas()-tr, placements)
-	} else { //Predicates and priorities must be used for scheduling
-		// Need less => scale down
-		if tr > vpod.GetVReplicas() && state.DeschedPolicy != nil {
-			logger.Infow("scaling down", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()),
-				zap.Any("placements", placements),
-				zap.Any("existingPlacements", existingPlacements))
-			placements = s.removeReplicasWithPolicy(ctx, vpod, tr-vpod.GetVReplicas(), placements)
-			// Do not trigger the autoscaler to avoid unnecessary churn
-			return placements, nil
-		}
-		if state.SchedPolicy != nil {
-			// Need more => scale up
-			// rebalancing needed for all vreps most likely since there are pending vreps from previous reconciliation
-			// can fall here when vreps scaled up or after eviction
-			logger.Infow("scaling up with a rebalance (if needed)", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()),
-				zap.Any("placements", placements),
-				zap.Any("existingPlacements", existingPlacements))
-			placements, left = s.rebalanceReplicasWithPolicy(ctx, vpod, vpod.GetVReplicas(), placements)
-		}
-	}
+	// Need less => scale down
+	if tr > vpod.GetVReplicas() {
+		logger.Debugw("scaling down", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()),
+			zap.Any("placements", placements),
+			zap.Any("existingPlacements", existingPlacements))
+		placements = s.removeReplicas(tr-vpod.GetVReplicas(), placements)
+		// Do not trigger the autoscaler to avoid unnecessary churn
+		return placements, nil
+	}
+
+	// Need more => scale up
+	logger.Debugw("scaling up", zap.Int32("vreplicas", tr), zap.Int32("new vreplicas", vpod.GetVReplicas()),
+		zap.Any("placements", placements),
+		zap.Any("existingPlacements", existingPlacements))
+	placements, left := s.addReplicas(state, reservedByPodName, vpod, vpod.GetVReplicas()-tr, placements)
if left > 0 { if left > 0 {
// Give time for the autoscaler to do its job // Give time for the autoscaler to do its job
logger.Infow("not enough pod replicas to schedule") logger.Infow("not enough pod replicas to schedule")
@ -370,12 +416,6 @@ func (s *StatefulSetScheduler) scheduleVPod(ctx context.Context, vpod scheduler.
s.autoscaler.Autoscale(ctx) s.autoscaler.Autoscale(ctx)
} }
if state.SchedulerPolicy == "" && state.SchedPolicy != nil {
logger.Info("reverting to previous placements")
s.reservePlacements(vpod, existingPlacements) // rebalancing doesn't care about new placements since all vreps will be re-placed
return existingPlacements, s.notEnoughPodReplicas(left) // requeue to wait for the autoscaler to do its job
}
return placements, s.notEnoughPodReplicas(left) return placements, s.notEnoughPodReplicas(left)
} }
@ -384,344 +424,6 @@ func (s *StatefulSetScheduler) scheduleVPod(ctx context.Context, vpod scheduler.
return placements, nil return placements, nil
} }
func toJSONable(pending map[types.NamespacedName]int32) map[string]int32 {
r := make(map[string]int32, len(pending))
for k, v := range pending {
r[k.String()] = v
}
return r
}
func (s *StatefulSetScheduler) rebalanceReplicasWithPolicy(ctx context.Context, vpod scheduler.VPod, diff int32, placements []duckv1alpha1.Placement) ([]duckv1alpha1.Placement, int32) {
s.makeZeroPlacements(vpod, placements)
placements, diff = s.addReplicasWithPolicy(ctx, vpod, diff, make([]duckv1alpha1.Placement, 0)) //start fresh with a new placements list
return placements, diff
}
func (s *StatefulSetScheduler) removeReplicasWithPolicy(ctx context.Context, vpod scheduler.VPod, diff int32, placements []duckv1alpha1.Placement) []duckv1alpha1.Placement {
logger := logging.FromContext(ctx).Named("remove replicas with policy")
numVreps := diff
for i := int32(0); i < numVreps; i++ { //deschedule one vreplica at a time
state, err := s.stateAccessor.State(ctx, s.reserved)
if err != nil {
logger.Info("error while refreshing scheduler state (will retry)", zap.Error(err))
return placements
}
feasiblePods := s.findFeasiblePods(ctx, state, vpod, state.DeschedPolicy)
feasiblePods = s.removePodsNotInPlacement(vpod, feasiblePods)
if len(feasiblePods) == 1 { //nothing to score, remove vrep from that pod
placementPodID := feasiblePods[0]
logger.Infof("Selected pod #%v to remove vreplica #%v from", placementPodID, i)
placements = s.removeSelectionFromPlacements(placementPodID, placements)
state.SetFree(placementPodID, state.Free(placementPodID)+1)
s.reservePlacements(vpod, placements)
continue
}
priorityList, err := s.prioritizePods(ctx, state, vpod, feasiblePods, state.DeschedPolicy)
if err != nil {
logger.Info("error while scoring pods using priorities", zap.Error(err))
s.reservePlacements(vpod, placements)
break
}
placementPodID, err := s.selectPod(priorityList)
if err != nil {
logger.Info("error while selecting the placement pod", zap.Error(err))
s.reservePlacements(vpod, placements)
break
}
logger.Infof("Selected pod #%v to remove vreplica #%v from", placementPodID, i)
placements = s.removeSelectionFromPlacements(placementPodID, placements)
state.SetFree(placementPodID, state.Free(placementPodID)+1)
s.reservePlacements(vpod, placements)
}
return placements
}
func (s *StatefulSetScheduler) removeSelectionFromPlacements(placementPodID int32, placements []duckv1alpha1.Placement) []duckv1alpha1.Placement {
newPlacements := make([]duckv1alpha1.Placement, 0, len(placements))
for i := 0; i < len(placements); i++ {
ordinal := st.OrdinalFromPodName(placements[i].PodName)
if placementPodID == ordinal {
if placements[i].VReplicas == 1 {
// remove the entire placement
} else {
newPlacements = append(newPlacements, duckv1alpha1.Placement{
PodName: placements[i].PodName,
VReplicas: placements[i].VReplicas - 1,
})
}
} else {
newPlacements = append(newPlacements, duckv1alpha1.Placement{
PodName: placements[i].PodName,
VReplicas: placements[i].VReplicas,
})
}
}
return newPlacements
}
func (s *StatefulSetScheduler) addReplicasWithPolicy(ctx context.Context, vpod scheduler.VPod, diff int32, placements []duckv1alpha1.Placement) ([]duckv1alpha1.Placement, int32) {
logger := logging.FromContext(ctx).Named("add replicas with policy")
numVreps := diff
for i := int32(0); i < numVreps; i++ { //schedule one vreplica at a time (find most suitable pod placement satisying predicates with high score)
// Get the current placements state
state, err := s.stateAccessor.State(ctx, s.reserved)
if err != nil {
logger.Info("error while refreshing scheduler state (will retry)", zap.Error(err))
return placements, diff
}
if s.replicas == 0 { //no pods to filter
logger.Infow("no pods available in statefulset")
s.reservePlacements(vpod, placements)
diff = numVreps - i //for autoscaling up
break //end the iteration for all vreps since there are not pods
}
feasiblePods := s.findFeasiblePods(ctx, state, vpod, state.SchedPolicy)
if len(feasiblePods) == 0 { //no pods available to schedule this vreplica
logger.Info("no feasible pods available to schedule this vreplica")
s.reservePlacements(vpod, placements)
diff = numVreps - i //for autoscaling up and possible rebalancing
break
}
/* if len(feasiblePods) == 1 { //nothing to score, place vrep on that pod (Update: for HA, must run HA scorers)
placementPodID := feasiblePods[0]
logger.Infof("Selected pod #%v for vreplica #%v ", placementPodID, i)
placements = s.addSelectionToPlacements(placementPodID, placements)
//state.SetFree(placementPodID, state.Free(placementPodID)-1)
s.reservePlacements(vpod, placements)
diff--
continue
} */
priorityList, err := s.prioritizePods(ctx, state, vpod, feasiblePods, state.SchedPolicy)
if err != nil {
logger.Info("error while scoring pods using priorities", zap.Error(err))
s.reservePlacements(vpod, placements)
diff = numVreps - i //for autoscaling up and possible rebalancing
break
}
placementPodID, err := s.selectPod(priorityList)
if err != nil {
logger.Info("error while selecting the placement pod", zap.Error(err))
s.reservePlacements(vpod, placements)
diff = numVreps - i //for autoscaling up and possible rebalancing
break
}
logger.Infof("Selected pod #%v for vreplica #%v", placementPodID, i)
placements = s.addSelectionToPlacements(placementPodID, placements)
state.SetFree(placementPodID, state.Free(placementPodID)-1)
s.reservePlacements(vpod, placements)
diff--
}
return placements, diff
}
func (s *StatefulSetScheduler) addSelectionToPlacements(placementPodID int32, placements []duckv1alpha1.Placement) []duckv1alpha1.Placement {
seen := false
for i := 0; i < len(placements); i++ {
ordinal := st.OrdinalFromPodName(placements[i].PodName)
if placementPodID == ordinal {
seen = true
placements[i].VReplicas = placements[i].VReplicas + 1
}
}
if !seen {
placements = append(placements, duckv1alpha1.Placement{
PodName: st.PodNameFromOrdinal(s.statefulSetName, placementPodID),
VReplicas: 1,
})
}
return placements
}
// findFeasiblePods finds the pods that fit the filter plugins
func (s *StatefulSetScheduler) findFeasiblePods(ctx context.Context, state *st.State, vpod scheduler.VPod, policy *scheduler.SchedulerPolicy) []int32 {
feasiblePods := make([]int32, 0)
for _, podId := range state.SchedulablePods {
statusMap := s.RunFilterPlugins(ctx, state, vpod, podId, policy)
status := statusMap.Merge()
if status.IsSuccess() {
feasiblePods = append(feasiblePods, podId)
}
}
return feasiblePods
}
// removePodsNotInPlacement removes pods that do not have vreplicas placed
func (s *StatefulSetScheduler) removePodsNotInPlacement(vpod scheduler.VPod, feasiblePods []int32) []int32 {
newFeasiblePods := make([]int32, 0)
for _, e := range vpod.GetPlacements() {
for _, podID := range feasiblePods {
if podID == st.OrdinalFromPodName(e.PodName) { //if pod is in current placement list
newFeasiblePods = append(newFeasiblePods, podID)
}
}
}
return newFeasiblePods
}
// prioritizePods prioritizes the pods by running the score plugins, which return a score for each pod.
// The scores from each plugin are added together to make the score for that pod.
func (s *StatefulSetScheduler) prioritizePods(ctx context.Context, states *st.State, vpod scheduler.VPod, feasiblePods []int32, policy *scheduler.SchedulerPolicy) (st.PodScoreList, error) {
logger := logging.FromContext(ctx).Named("prioritize all feasible pods")
// If no priority configs are provided, then all pods will have a score of one
result := make(st.PodScoreList, 0, len(feasiblePods))
if !s.HasScorePlugins(states, policy) {
for _, podID := range feasiblePods {
result = append(result, st.PodScore{
ID: podID,
Score: 1,
})
}
return result, nil
}
scoresMap, scoreStatus := s.RunScorePlugins(ctx, states, vpod, feasiblePods, policy)
if !scoreStatus.IsSuccess() {
logger.Infof("FAILURE! Cannot score feasible pods due to plugin errors %v", scoreStatus.AsError())
return nil, scoreStatus.AsError()
}
// Summarize all scores.
for i := range feasiblePods {
result = append(result, st.PodScore{ID: feasiblePods[i], Score: 0})
for j := range scoresMap {
result[i].Score += scoresMap[j][i].Score
}
}
return result, nil
}
// selectPod takes a prioritized list of pods and then picks one
func (s *StatefulSetScheduler) selectPod(podScoreList st.PodScoreList) (int32, error) {
if len(podScoreList) == 0 {
return -1, fmt.Errorf("empty priority list") //no selected pod
}
maxScore := podScoreList[0].Score
selected := podScoreList[0].ID
cntOfMaxScore := int64(1)
for _, ps := range podScoreList[1:] {
if ps.Score > maxScore {
maxScore = ps.Score
selected = ps.ID
cntOfMaxScore = 1
} else if ps.Score == maxScore { //if equal scores, randomly picks one
cntOfMaxScore++
randNum, err := rand.Int(rand.Reader, big.NewInt(cntOfMaxScore))
if err != nil {
return -1, fmt.Errorf("failed to generate random number")
}
if randNum.Int64() == int64(0) {
selected = ps.ID
}
}
}
return selected, nil
}
// RunFilterPlugins runs the set of configured Filter plugins for a vrep on the given pod.
// If any of these plugins doesn't return "Success", the pod is not suitable for placing the vrep.
// Meanwhile, the failure message and status are set for the given pod.
func (s *StatefulSetScheduler) RunFilterPlugins(ctx context.Context, states *st.State, vpod scheduler.VPod, podID int32, policy *scheduler.SchedulerPolicy) st.PluginToStatus {
logger := logging.FromContext(ctx).Named("run all filter plugins")
statuses := make(st.PluginToStatus)
for _, plugin := range policy.Predicates {
pl, err := factory.GetFilterPlugin(plugin.Name)
if err != nil {
logger.Error("Could not find filter plugin in Registry: ", plugin.Name)
continue
}
//logger.Infof("Going to run filter plugin: %s using state: %v ", pl.Name(), states)
pluginStatus := s.runFilterPlugin(ctx, pl, plugin.Args, states, vpod, podID)
if !pluginStatus.IsSuccess() {
if !pluginStatus.IsUnschedulable() {
errStatus := st.NewStatus(st.Error, fmt.Sprintf("running %q filter plugin for pod %q failed with: %v", pl.Name(), podID, pluginStatus.Message()))
return map[string]*st.Status{pl.Name(): errStatus} //TODO: if one plugin fails, then no more plugins are run
}
statuses[pl.Name()] = pluginStatus
return statuses
}
}
return statuses
}
func (s *StatefulSetScheduler) runFilterPlugin(ctx context.Context, pl st.FilterPlugin, args interface{}, states *st.State, vpod scheduler.VPod, podID int32) *st.Status {
status := pl.Filter(ctx, args, states, vpod.GetKey(), podID)
return status
}
// RunScorePlugins runs the set of configured scoring plugins. It returns a list that stores for each scoring plugin name the corresponding PodScoreList(s).
// It also returns *Status, which is set to non-success if any of the plugins returns a non-success status.
func (s *StatefulSetScheduler) RunScorePlugins(ctx context.Context, states *st.State, vpod scheduler.VPod, feasiblePods []int32, policy *scheduler.SchedulerPolicy) (st.PluginToPodScores, *st.Status) {
logger := logging.FromContext(ctx).Named("run all score plugins")
pluginToPodScores := make(st.PluginToPodScores, len(policy.Priorities))
for _, plugin := range policy.Priorities {
pl, err := factory.GetScorePlugin(plugin.Name)
if err != nil {
logger.Error("Could not find score plugin in registry: ", plugin.Name)
continue
}
//logger.Infof("Going to run score plugin: %s using state: %v ", pl.Name(), states)
pluginToPodScores[pl.Name()] = make(st.PodScoreList, len(feasiblePods))
for index, podID := range feasiblePods {
score, pluginStatus := s.runScorePlugin(ctx, pl, plugin.Args, states, feasiblePods, vpod, podID)
if !pluginStatus.IsSuccess() {
errStatus := st.NewStatus(st.Error, fmt.Sprintf("running %q scoring plugin for pod %q failed with: %v", pl.Name(), podID, pluginStatus.AsError()))
return pluginToPodScores, errStatus //TODO: if one plugin fails, then no more plugins are run
}
score = score * plugin.Weight //WEIGHED SCORE VALUE
//logger.Infof("scoring plugin %q produced score %v for pod %q: %v", pl.Name(), score, podID, pluginStatus)
pluginToPodScores[pl.Name()][index] = st.PodScore{
ID: podID,
Score: score,
}
}
status := pl.ScoreExtensions().NormalizeScore(ctx, states, pluginToPodScores[pl.Name()]) //NORMALIZE SCORES FOR ALL FEASIBLE PODS
if !status.IsSuccess() {
errStatus := st.NewStatus(st.Error, fmt.Sprintf("running %q scoring plugin failed with: %v", pl.Name(), status.AsError()))
return pluginToPodScores, errStatus
}
}
return pluginToPodScores, st.NewStatus(st.Success)
}
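For context, each score plugin returns a raw value per feasible pod, which is multiplied by the plugin weight and then normalized across all feasible pods before the per-plugin lists are summed in prioritizePods. A hedged sketch of that weight-then-normalize step, with a hypothetical 0..100 scaling (the real NormalizeScore is plugin-specific):

package main

import "fmt"

// normalizeScores scales raw scores to a 0..100 range relative to the highest
// score, a typical shape for a ScoreExtensions().NormalizeScore step.
func normalizeScores(scores []uint64) []uint64 {
    var max uint64
    for _, s := range scores {
        if s > max {
            max = s
        }
    }
    normalized := make([]uint64, len(scores))
    if max == 0 {
        return normalized
    }
    for i, s := range scores {
        normalized[i] = s * 100 / max
    }
    return normalized
}

func main() {
    // Hypothetical raw scores for three feasible pods from a single plugin,
    // weighted as in RunScorePlugins before normalization.
    weight := uint64(2)
    raw := []uint64{3, 7, 5}
    weighted := make([]uint64, len(raw))
    for i, s := range raw {
        weighted[i] = s * weight
    }
    fmt.Println(normalizeScores(weighted)) // [42 100 71]
}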
func (s *StatefulSetScheduler) runScorePlugin(ctx context.Context, pl st.ScorePlugin, args interface{}, states *st.State, feasiblePods []int32, vpod scheduler.VPod, podID int32) (uint64, *st.Status) {
score, status := pl.Score(ctx, args, states, feasiblePods, vpod.GetKey(), podID)
return score, status
}
// HasScorePlugins returns true if at least one score plugin is defined.
func (s *StatefulSetScheduler) HasScorePlugins(state *st.State, policy *scheduler.SchedulerPolicy) bool {
return len(policy.Priorities) > 0
}
func (s *StatefulSetScheduler) removeReplicas(diff int32, placements []duckv1alpha1.Placement) []duckv1alpha1.Placement {
newPlacements := make([]duckv1alpha1.Placement, 0, len(placements))
for i := len(placements) - 1; i > -1; i-- {
@ -739,55 +441,110 @@ func (s *StatefulSetScheduler) removeReplicas(diff int32, placements []duckv1alp
return newPlacements
}
func (s *StatefulSetScheduler) addReplicas(states *st.State, reservedByPodName map[string]int32, vpod scheduler.VPod, diff int32, placements []duckv1alpha1.Placement) ([]duckv1alpha1.Placement, int32) {
if states.Replicas <= 0 {
return placements, diff
}
newPlacements := make([]duckv1alpha1.Placement, 0, len(placements))
// Preserve existing placements
for _, p := range placements {
newPlacements = append(newPlacements, *p.DeepCopy())
}
candidates := s.candidatesOrdered(states, vpod, placements)
// Spread replicas in as many candidates as possible.
foundFreeCandidate := true
for diff > 0 && foundFreeCandidate {
foundFreeCandidate = false
for _, ordinal := range candidates {
if diff <= 0 {
break
}
podName := st.PodNameFromOrdinal(states.StatefulSetName, ordinal)
reserved, _ := reservedByPodName[podName]
// Is there space?
if states.Capacity-reserved > 0 {
foundFreeCandidate = true
allocation := int32(1)
newPlacements = upsertPlacements(newPlacements, duckv1alpha1.Placement{
PodName: st.PodNameFromOrdinal(states.StatefulSetName, ordinal),
VReplicas: allocation,
})
diff -= allocation
reservedByPodName[podName] += allocation
}
}
}
if len(newPlacements) == 0 {
return nil, diff
}
return newPlacements, diff
}
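A standalone sketch of the spreading loop in addReplicas, assuming plain pod names and a flat capacity value in place of states.Capacity and reservedByPodName; it hands out one vreplica per candidate per pass until the demand is met or no candidate has free capacity:

package main

import "fmt"

// spread assigns `diff` virtual replicas one at a time across candidate pods,
// respecting a fixed per-pod capacity, in the same round-robin fashion as addReplicas.
func spread(candidates []string, capacity int32, reserved map[string]int32, diff int32) map[string]int32 {
    placements := map[string]int32{}
    foundFreeCandidate := true
    for diff > 0 && foundFreeCandidate {
        foundFreeCandidate = false
        for _, pod := range candidates {
            if diff <= 0 {
                break
            }
            if capacity-reserved[pod] > 0 {
                foundFreeCandidate = true
                placements[pod]++ // upsert one vreplica on this pod
                reserved[pod]++
                diff--
            }
        }
    }
    return placements
}

func main() {
    reserved := map[string]int32{"pod-0": 9, "pod-1": 0, "pod-2": 0}
    // With capacity 10, pod-0 can take 1 more vreplica; the rest spill over to pod-1 and pod-2.
    fmt.Println(spread([]string{"pod-0", "pod-1", "pod-2"}, 10, reserved, 5))
}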
func (s *StatefulSetScheduler) candidatesOrdered(states *st.State, vpod scheduler.VPod, placements []duckv1alpha1.Placement) []int32 {
existingPlacements := sets.New[string]()
candidates := make([]int32, len(states.SchedulablePods))
firstIdx := 0
lastIdx := len(candidates) - 1
// De-prioritize existing placements pods, add existing placements to the tail of the candidates.
// Start from the last one so that within the "existing replicas" group, we prioritize lower ordinals
// to reduce compaction.
for i := len(placements) - 1; i >= 0; i-- {
placement := placements[i]
ordinal := st.OrdinalFromPodName(placement.PodName)
if !states.IsSchedulablePod(ordinal) {
continue
}
// This should really never happen as placements are de-duped, however, better to handle
// edge cases in case the prerequisite doesn't hold in the future.
if existingPlacements.Has(placement.PodName) {
continue
}
candidates[lastIdx] = ordinal
lastIdx--
existingPlacements.Insert(placement.PodName)
}
// Prioritize reserved placements that don't appear in the committed placements.
if reserved, ok := s.reserved[vpod.GetKey()]; ok {
for podName := range reserved {
if !states.IsSchedulablePod(st.OrdinalFromPodName(podName)) {
continue
}
if existingPlacements.Has(podName) {
continue
}
candidates[firstIdx] = st.OrdinalFromPodName(podName)
firstIdx++
existingPlacements.Insert(podName)
}
}
// Add all the ordinals to the candidates list.
// De-prioritize the last ordinals over lower ordinals so that we reduce the chances for compaction.
for ordinal := s.replicas - 1; ordinal >= 0; ordinal-- {
if !states.IsSchedulablePod(ordinal) {
continue
}
podName := st.PodNameFromOrdinal(states.StatefulSetName, ordinal)
if existingPlacements.Has(podName) {
continue
}
candidates[lastIdx] = ordinal
lastIdx--
}
return candidates
}
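A hedged illustration of the candidate order produced above, with plain ordinals standing in for the scheduler state: reserved-but-uncommitted pods come first, free ordinals fill the middle (lowest first, to limit later compaction), and pods that already hold committed placements go last:

package main

import "fmt"

// orderCandidates mirrors candidatesOrdered with plain ordinals: reserved-only
// pods first, then free ordinals (ascending), then pods that already appear in
// the committed placements.
func orderCandidates(schedulable []int32, placed, reservedOnly map[int32]bool) []int32 {
    candidates := make([]int32, len(schedulable))
    first, last := 0, len(candidates)-1
    seen := map[int32]bool{}
    // Existing placements are appended from the tail.
    for i := len(schedulable) - 1; i >= 0; i-- {
        o := schedulable[i]
        if placed[o] && !seen[o] {
            candidates[last] = o
            last--
            seen[o] = true
        }
    }
    // Reserved-but-uncommitted pods are prioritized at the head.
    for _, o := range schedulable {
        if reservedOnly[o] && !seen[o] {
            candidates[first] = o
            first++
            seen[o] = true
        }
    }
    // Remaining free ordinals fill the middle, highest ordinal written last
    // so that lower ordinals come first within this group.
    for i := len(schedulable) - 1; i >= 0; i-- {
        o := schedulable[i]
        if !seen[o] {
            candidates[last] = o
            last--
        }
    }
    return candidates
}

func main() {
    schedulable := []int32{0, 1, 2, 3, 4}
    placed := map[int32]bool{1: true}
    reservedOnly := map[int32]bool{3: true}
    fmt.Println(orderCandidates(schedulable, placed, reservedOnly)) // [3 0 2 4 1]
}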
func (s *StatefulSetScheduler) updateStatefulset(ctx context.Context, obj interface{}) {
statefulset, ok := obj.(*appsv1.StatefulSet)
if !ok {
@ -808,31 +565,17 @@ func (s *StatefulSetScheduler) updateStatefulset(ctx context.Context, obj interf
func (s *StatefulSetScheduler) reservePlacements(vpod scheduler.VPod, placements []duckv1alpha1.Placement) {
if len(placements) == 0 { // clear our old placements in reserved
delete(s.reserved, vpod.GetKey())
return
}
s.reserved[vpod.GetKey()] = make(map[string]int32, len(placements))
for _, p := range placements {
s.reserved[vpod.GetKey()][p.PodName] = p.VReplicas
}
}
func (s *StatefulSetScheduler) makeZeroPlacements(vpod scheduler.VPod, placements []duckv1alpha1.Placement) {
newPlacements := make([]duckv1alpha1.Placement, len(placements))
for i := 0; i < len(placements); i++ {
newPlacements[i].PodName = placements[i].PodName
newPlacements[i].VReplicas = 0
}
// This is necessary to make sure State() zeroes out initial pod/node/zone spread and
// free capacity when there are existing placements for a vpod
s.reservePlacements(vpod, newPlacements)
}
// newNotEnoughPodReplicas returns an error explaining what is the problem, what are the actions we're taking
// to try to fix it (retry), wrapping a controller.requeueKeyError which signals to ReconcileKind to requeue the
// object after a given delay.
@ -859,3 +602,26 @@ func (s *StatefulSetScheduler) Reserved() map[types.NamespacedName]map[string]in
return r
}
func upsertPlacements(placements []duckv1alpha1.Placement, placement duckv1alpha1.Placement) []duckv1alpha1.Placement {
found := false
for i := range placements {
if placements[i].PodName == placement.PodName {
placements[i].VReplicas = placements[i].VReplicas + placement.VReplicas
found = true
break
}
}
if !found {
placements = append(placements, placement)
}
return placements
}
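Illustrative usage of the upsert helper, with a simplified placement type: repeated upserts for the same pod accumulate vreplicas instead of appending duplicate entries:

package main

import "fmt"

// placement is a simplified stand-in for duckv1alpha1.Placement.
type placement struct {
    PodName   string
    VReplicas int32
}

// upsert adds the given placement, merging vreplica counts when the pod already appears.
func upsert(placements []placement, p placement) []placement {
    for i := range placements {
        if placements[i].PodName == p.PodName {
            placements[i].VReplicas += p.VReplicas
            return placements
        }
    }
    return append(placements, p)
}

func main() {
    ps := []placement{{PodName: "pod-0", VReplicas: 2}}
    ps = upsert(ps, placement{PodName: "pod-0", VReplicas: 1})
    ps = upsert(ps, placement{PodName: "pod-1", VReplicas: 1})
    fmt.Println(ps) // [{pod-0 3} {pod-1 1}]
}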
func vPodsByKey(vPods []scheduler.VPod) map[types.NamespacedName]scheduler.VPod {
r := make(map[types.NamespacedName]scheduler.VPod, len(vPods))
for _, vPod := range vPods {
r[vPod.GetKey()] = vPod
}
return r
}

File diff suppressed because it is too large


@ -17,6 +17,8 @@ limitations under the License.
package testing
import (
"math/rand"
duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1"
"knative.dev/eventing/pkg/scheduler"
)
@ -51,3 +53,10 @@ func (s *VPodClient) Append(vpod scheduler.VPod) {
func (s *VPodClient) List() ([]scheduler.VPod, error) {
return s.lister()
}
func (s *VPodClient) Random() scheduler.VPod {
s.store.lock.Lock()
defer s.store.lock.Unlock()
return s.store.vpods[rand.Intn(len(s.store.vpods))]
}


@ -25,15 +25,14 @@ import (
"k8s.io/apimachinery/pkg/types" "k8s.io/apimachinery/pkg/types"
"knative.dev/pkg/controller" "knative.dev/pkg/controller"
duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1"
"knative.dev/eventing/pkg/scheduler"
appsv1 "k8s.io/api/apps/v1" appsv1 "k8s.io/api/apps/v1"
autoscalingv1 "k8s.io/api/autoscaling/v1" autoscalingv1 "k8s.io/api/autoscaling/v1"
v1 "k8s.io/api/core/v1" v1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
gtesting "k8s.io/client-go/testing" gtesting "k8s.io/client-go/testing"
duckv1alpha1 "knative.dev/eventing/pkg/apis/duck/v1alpha1"
kubeclient "knative.dev/pkg/client/injection/kube/client/fake" kubeclient "knative.dev/pkg/client/injection/kube/client/fake"
_ "knative.dev/pkg/client/injection/kube/informers/apps/v1/statefulset/fake" _ "knative.dev/pkg/client/injection/kube/informers/apps/v1/statefulset/fake"
rectesting "knative.dev/pkg/reconciler/testing" rectesting "knative.dev/pkg/reconciler/testing"
@ -58,6 +57,10 @@ func NewVPod(ns, name string, vreplicas int32, placements []duckv1alpha1.Placeme
}
}
func (d *sampleVPod) GetDeletionTimestamp() *metav1.Time {
return nil
}
func (d *sampleVPod) GetKey() types.NamespacedName {
return d.key
}
@ -74,45 +77,6 @@ func (d *sampleVPod) GetResourceVersion() string {
return d.rsrcversion
}
func MakeNode(name, zonename string) *v1.Node {
obj := &v1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Labels: map[string]string{
scheduler.ZoneLabel: zonename,
},
},
}
return obj
}
func MakeNodeNoLabel(name string) *v1.Node {
obj := &v1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: name,
},
}
return obj
}
func MakeNodeTainted(name, zonename string) *v1.Node {
obj := &v1.Node{
ObjectMeta: metav1.ObjectMeta{
Name: name,
Labels: map[string]string{
scheduler.ZoneLabel: zonename,
},
},
Spec: v1.NodeSpec{
Taints: []v1.Taint{
{Key: "node.kubernetes.io/unreachable", Effect: v1.TaintEffectNoExecute},
{Key: "node.kubernetes.io/unreachable", Effect: v1.TaintEffectNoSchedule},
},
},
}
return obj
}
func MakeStatefulset(ns, name string, replicas int32) *appsv1.StatefulSet {
obj := &appsv1.StatefulSet{
ObjectMeta: metav1.ObjectMeta{
@ -143,7 +107,7 @@ func MakePod(ns, name, nodename string) *v1.Pod {
func SetupFakeContext(t testing.TB) (context.Context, context.CancelFunc) {
ctx, cancel, informers := rectesting.SetupFakeContextWithCancel(t)
err := controller.StartInformers(ctx.Done(), informers...)
if err != nil {


@ -24,6 +24,7 @@ import (
cetest "github.com/cloudevents/sdk-go/v2/test" cetest "github.com/cloudevents/sdk-go/v2/test"
"github.com/google/uuid" "github.com/google/uuid"
batchv1 "k8s.io/api/batch/v1" batchv1 "k8s.io/api/batch/v1"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime/schema" "k8s.io/apimachinery/pkg/runtime/schema"
"k8s.io/apimachinery/pkg/util/wait" "k8s.io/apimachinery/pkg/util/wait"
@ -42,11 +43,14 @@ import (
"knative.dev/eventing/test/rekt/resources/jobsink" "knative.dev/eventing/test/rekt/resources/jobsink"
) )
func Success() *feature.Feature { func Success(jobSinkName string) *feature.Feature {
f := feature.NewFeature() f := feature.NewFeature()
sink := feature.MakeRandomK8sName("sink") sink := feature.MakeRandomK8sName("sink")
jobSink := feature.MakeRandomK8sName("jobsink") jobSink := feature.MakeRandomK8sName("jobsink")
if jobSinkName != "" {
jobSink = jobSinkName
}
source := feature.MakeRandomK8sName("source") source := feature.MakeRandomK8sName("source")
sinkURL := &apis.URL{Scheme: "http", Host: sink} sinkURL := &apis.URL{Scheme: "http", Host: sink}
@ -78,6 +82,32 @@ func Success() *feature.Feature {
return f return f
} }
func DeleteJobsCascadeSecretsDeletion(jobSink string) *feature.Feature {
f := feature.NewFeature()
f.Setup("Prerequisite: At least one secret for jobsink present", verifySecretsForJobSink(jobSink, func(secrets *corev1.SecretList) bool {
return len(secrets.Items) > 0
}))
f.Requirement("delete jobs for jobsink", func(ctx context.Context, t feature.T) {
policy := metav1.DeletePropagationBackground
err := kubeclient.Get(ctx).BatchV1().
Jobs(environment.FromContext(ctx).Namespace()).
DeleteCollection(ctx, metav1.DeleteOptions{PropagationPolicy: &policy}, metav1.ListOptions{
LabelSelector: fmt.Sprintf("%s=%s", sinks.JobSinkNameLabel, jobSink),
})
if err != nil {
t.Error(err)
}
})
f.Assert("No secrets for jobsink are present", verifySecretsForJobSink(jobSink, func(secrets *corev1.SecretList) bool {
return len(secrets.Items) == 0
}))
return f
}
func SuccessTLS() *feature.Feature {
f := feature.NewFeature()
@ -234,3 +264,27 @@ func AtLeastOneJobIsComplete(jobSinkName string) feature.StepFn {
t.Errorf("No job is complete:\n%v", string(bytes))
}
}
func verifySecretsForJobSink(jobSink string, verify func(secrets *corev1.SecretList) bool) feature.StepFn {
return func(ctx context.Context, t feature.T) {
interval, timeout := environment.PollTimingsFromContext(ctx)
lastSecretList := &corev1.SecretList{}
err := wait.PollUntilContextTimeout(ctx, interval, timeout, true, func(ctx context.Context) (bool, error) {
var err error
lastSecretList, err = kubeclient.Get(ctx).CoreV1().
Secrets(environment.FromContext(ctx).Namespace()).
List(ctx, metav1.ListOptions{
LabelSelector: fmt.Sprintf("%s=%s", sinks.JobSinkNameLabel, jobSink),
})
if err != nil {
return false, fmt.Errorf("failed to list secrets: %w", err)
}
return verify(lastSecretList), nil
})
if err != nil {
bytes, _ := json.Marshal(lastSecretList)
t.Errorf("failed to wait for no secrets: %v\nSecret list:\n%s", err, string(bytes))
}
}
}
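The assertion above is a poll-and-verify pattern: list the secrets on every tick and stop once the verify callback is satisfied or the environment timeout expires. A standalone sketch of the same pattern, assuming only k8s.io/apimachinery's wait package and a hypothetical condition in place of the secret listing:

package main

import (
    "context"
    "fmt"
    "time"

    "k8s.io/apimachinery/pkg/util/wait"
)

// waitForCondition polls check until it reports done, the context is cancelled,
// or the timeout elapses, mirroring the verifySecretsForJobSink step above.
func waitForCondition(ctx context.Context, interval, timeout time.Duration, check func(ctx context.Context) (bool, error)) error {
    return wait.PollUntilContextTimeout(ctx, interval, timeout, true, check)
}

func main() {
    // Hypothetical condition: pretend the secrets are gone after a couple of seconds.
    deadline := time.Now().Add(2 * time.Second)
    err := waitForCondition(context.Background(), 500*time.Millisecond, 10*time.Second,
        func(ctx context.Context) (bool, error) {
            return time.Now().After(deadline), nil // stand-in for "secret list is empty"
        })
    fmt.Println("wait result:", err)
}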


@ -46,7 +46,23 @@ func TestJobSinkSuccess(t *testing.T) {
environment.Managed(t),
)
env.Test(ctx, t, jobsink.Success(""))
}
func TestJobSinkDeleteJobCascadeSecretDeletion(t *testing.T) {
t.Parallel()
ctx, env := global.Environment(
knative.WithKnativeNamespace(system.Namespace()),
knative.WithLoggingConfig,
knative.WithTracingConfig,
k8s.WithEventListener,
environment.Managed(t),
)
jobSinkName := feature.MakeRandomK8sName("jobsink")
env.Test(ctx, t, jobsink.Success(jobSinkName))
env.Test(ctx, t, jobsink.DeleteJobsCascadeSecretsDeletion(jobSinkName))
}
func TestJobSinkSuccessTLS(t *testing.T) {


@ -73,6 +73,9 @@ func Install(name string, opts ...manifest.CfgFn) feature.StepFn {
fn(cfg)
}
if err := registerImage(ctx); err != nil {
t.Fatal(err)
}
if _, err := manifest.InstallYamlFS(ctx, yamlEmbed, cfg); err != nil {
t.Fatal(err)
}
@ -223,3 +226,10 @@ func GoesReadySimple(name string) *feature.Feature {
return f
}
func registerImage(ctx context.Context) error {
im := eventshub.ImageFromContext(ctx)
reg := environment.RegisterPackage(im)
_, err := reg(ctx, environment.FromContext(ctx))
return err
}


@ -21,7 +21,7 @@ source "$(dirname "${BASH_SOURCE[0]:-$0}")/library.sh"
# Default Kubernetes version to use for GKE, if not overridden with
# the `--cluster-version` parameter.
readonly GKE_DEFAULT_CLUSTER_VERSION="1.29"
# Dumps the k8s api server metrics. Spins up a proxy, waits a little bit and
# dumps the metrics to ${ARTIFACTS}/k8s.metrics.txt


@ -75,7 +75,7 @@ RELEASE_NOTES=""
RELEASE_BRANCH=""
RELEASE_GCS_BUCKET="knative-nightly/${REPO_NAME}"
RELEASE_DIR=""
export KO_FLAGS="-P --platform=all"
VALIDATION_TESTS="./test/presubmit-tests.sh"
ARTIFACTS_TO_PUBLISH=""
FROM_NIGHTLY_RELEASE=""
@ -90,11 +90,10 @@ export GOFLAGS="-ldflags=-s -ldflags=-w"
export GITHUB_TOKEN=""
readonly IMAGES_REFS_FILE="${IMAGES_REFS_FILE:-$(mktemp -d)/images_refs.txt}"
# Convenience function to run the GitHub CLI tool `gh`.
# Parameters: $1..$n - arguments to gh.
function gh_tool() {
go_run github.com/cli/cli/v2/cmd/gh@v2.65.0 "$@"
}
# Shortcut to "git push" that handles authentication.
@ -193,7 +192,7 @@ function prepare_dot_release() {
# Support tags in two formats
# - knative-v1.0.0
# - v1.0.0
releases="$(gh_tool release list --json tagName --jq '.[].tagName' | cut -d '-' -f2)"
echo "Current releases are: ${releases}"
[[ $? -eq 0 ]] || abort "cannot list releases"
# If --release-branch passed, restrict to that release
@ -218,7 +217,7 @@ function prepare_dot_release() {
# Ensure there are new commits in the branch, otherwise we don't create a new release
setup_branch
# Use the original tag (ie. potentially with a knative- prefix) when determining the last version commit sha
local github_tag="$(gh_tool release list --json tagName --jq '.[].tagName' | grep "${last_version}")"
local last_release_commit="$(git rev-list -n 1 "${github_tag}")"
local last_release_commit_filtered="$(git rev-list --invert-grep --grep "\[skip-dot-release\]" -n 1 "${github_tag}")"
local release_branch_commit="$(git rev-list -n 1 upstream/"${RELEASE_BRANCH}")"
@ -239,7 +238,7 @@ function prepare_dot_release() {
# If --release-notes not used, copy from the latest release
if [[ -z "${RELEASE_NOTES}" ]]; then
RELEASE_NOTES="$(mktemp)"
gh_tool release view "${github_tag}" --json "body" --jq '.body' > "${RELEASE_NOTES}"
echo "Release notes from ${last_version} copied to ${RELEASE_NOTES}"
fi
}
@ -640,18 +639,12 @@ function set_latest_to_highest_semver() {
local last_version release_id # don't combine with assignment else $? will be 0
last_version="$(gh_tool release list --json tagName --jq '.[].tagName' | cut -d'-' -f2 | grep '^v[0-9]\+\.[0-9]\+\.[0-9]\+$'| sort -r -V | head -1)"
if ! [[ $? -eq 0 ]]; then
abort "cannot list releases"
fi
gh_tool release edit "knative-${last_version}" --latest > /dev/null || abort "error setting $last_version to 'latest'"
echo "Github release ${last_version} set as 'latest'"
}
@ -742,12 +735,14 @@ function publish_to_github() {
local description="$(mktemp)"
local attachments_dir="$(mktemp -d)"
local commitish=""
local target_branch=""
local github_tag="knative-${TAG}"
# Copy files to a separate dir
# shellcheck disable=SC2068
for artifact in $@; do
cp ${artifact} "${attachments_dir}"/
attachments+=("${artifact}#$(basename ${artifact})")
done
echo -e "${title}\n" > "${description}"
if [[ -n "${RELEASE_NOTES}" ]]; then
@ -774,13 +769,16 @@ function publish_to_github() {
git tag -a "${github_tag}" -m "${title}"
git_push tag "${github_tag}"
[[ -n "${RELEASE_BRANCH}" ]] && target_branch="--target=${RELEASE_BRANCH}"
for i in {2..0}; do
# shellcheck disable=SC2068
gh_tool release create \
"${github_tag}" \
--title "${title}" \
--notes-file "${description}" \
"${target_branch}" \
${attachments[@]} && return 0
if [[ "${i}" -gt 0 ]]; then
echo "Error publishing the release, retrying in 15s..."
sleep 15

vendor/modules.txt

@ -1069,10 +1069,10 @@ k8s.io/utils/pointer
k8s.io/utils/ptr
k8s.io/utils/strings/slices
k8s.io/utils/trace
# knative.dev/hack v0.0.0-20250220110655-b5e4ff820460
## explicit; go 1.21
knative.dev/hack
# knative.dev/hack/schema v0.0.0-20250220110655-b5e4ff820460
## explicit; go 1.21
knative.dev/hack/schema/commands
knative.dev/hack/schema/docs
@ -1223,7 +1223,7 @@ knative.dev/pkg/webhook/resourcesemantics
knative.dev/pkg/webhook/resourcesemantics/conversion
knative.dev/pkg/webhook/resourcesemantics/defaulting
knative.dev/pkg/webhook/resourcesemantics/validation
# knative.dev/reconciler-test v0.0.0-20250217113355-f4bd4f5199d4
## explicit; go 1.22.0
knative.dev/reconciler-test/cmd/eventshub
knative.dev/reconciler-test/pkg/environment