feat: add node autonomy duration to lifecycle controller #2201

Merged: 2 commits merged into openyurtio:master on Nov 26, 2024


@tnsimon (Contributor) commented Nov 22, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Resolves #2196

Which issue(s) this PR fixes:

Fixes #2196

Special notes for your reviewer:

/assign @rambohe-ch

Does this PR introduce a user-facing change?

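Adds a new node annotation, node.openyurt.io/autonomy-duration, whose value is mapped to tolerationSeconds on the not-ready and unreachable tolerations of pods on that node, so that pods are evicted once a failed node has been unavailable for that long.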
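
The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the controller code from this PR: the `Toleration` struct is a trimmed-down local stand-in for the Kubernetes type, the helper names `tolerationSecondsFromAnnotation` and `applyAutonomyDuration` are hypothetical, and rejecting negative durations is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Toleration is a hypothetical, trimmed-down stand-in for the Kubernetes
// toleration type, kept local so the sketch has no external dependencies.
type Toleration struct {
	Key               string
	Operator          string
	Effect            string
	TolerationSeconds *int64
}

const autonomyDurationAnnotation = "node.openyurt.io/autonomy-duration"

// tolerationSecondsFromAnnotation parses a duration string such as "30s"
// and returns it as whole seconds. Rejecting negative values is an
// assumption of this sketch.
func tolerationSecondsFromAnnotation(value string) (int64, error) {
	d, err := time.ParseDuration(value)
	if err != nil {
		return 0, fmt.Errorf("invalid %s value %q: %w", autonomyDurationAnnotation, value, err)
	}
	if d < 0 {
		return 0, fmt.Errorf("invalid %s value %q: must not be negative", autonomyDurationAnnotation, value)
	}
	return int64(d / time.Second), nil
}

// applyAutonomyDuration sets TolerationSeconds on the NoExecute tolerations
// for the not-ready and unreachable taints and leaves all others untouched.
func applyAutonomyDuration(tolerations []Toleration, seconds int64) {
	for i := range tolerations {
		t := &tolerations[i]
		if t.Effect != "NoExecute" {
			continue
		}
		if t.Key == "node.kubernetes.io/not-ready" || t.Key == "node.kubernetes.io/unreachable" {
			s := seconds
			t.TolerationSeconds = &s
		}
	}
}

func main() {
	annotations := map[string]string{autonomyDurationAnnotation: "30s"}
	tolerations := []Toleration{
		{Key: "node.kubernetes.io/not-ready", Operator: "Exists", Effect: "NoExecute"},
		{Key: "node.kubernetes.io/unreachable", Operator: "Exists", Effect: "NoExecute"},
	}

	seconds, err := tolerationSecondsFromAnnotation(annotations[autonomyDurationAnnotation])
	if err != nil {
		fmt.Println(err)
		return
	}
	applyAutonomyDuration(tolerations, seconds)
	for _, t := range tolerations {
		fmt.Printf("%s tolerationSeconds=%d\n", t.Key, *t.TolerationSeconds)
	}
}
```

With the annotation value 30s, both tolerations end up with tolerationSeconds set to 30, which matches the pod spec shown in the walkthrough in this description.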

other Note

# Annotate the edge node
k get nodes vagrant -o yaml
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.openyurt.io/autonomy-duration: 30s

# Schedule workload
k get pods -o wide
NAME                               READY   STATUS    RESTARTS   AGE     IP               NODE                                NOMINATED NODE   READINESS GATES
netutils-6f94558c58-kvcwq          1/1     Running   0          3m50s   10.244.1.41      vagrant                             <none>           <none>
netutils-6f94558c58-pz9cp          1/1     Running   0          3m50s   10.244.1.248     vagrant                             <none>           <none>

# Show tolerationSeconds on pod
k get pods netutils-6f94558c58-kvcwq -o yaml
...
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 30
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 30

# Take down the network on the edge node
$ iptables -A INPUT -i eth0 -j DROP

# After 30 seconds: the 2 original pods are Terminating and 2 replacement pods are Pending
k get pods -o wide
NAME                               READY   STATUS        RESTARTS   AGE     IP               NODE                                NOMINATED NODE   READINESS GATES
netutils-6f94558c58-hsqv4          0/1     Pending       0          11s     <none>           <none>                              <none>           <none>
netutils-6f94558c58-kvcwq          1/1     Terminating   0          6m27s   10.244.1.41      vagrant                             <none>           <none>
netutils-6f94558c58-m88n7          0/1     Pending       0          11s     <none>           <none>                              <none>           <none>
netutils-6f94558c58-pz9cp          1/1     Terminating   0          6m27s   10.244.1.248     vagrant                             <none>           <none>

# Restore the node
k get nodes
NAME                                STATUS   ROLES    AGE    VERSION
vagrant                             Ready    <none>   16m    v1.29.9

# Check pods
$ k get po -o wide
NAME                               READY   STATUS    RESTARTS            AGE    IP               NODE                                NOMINATED NODE   READINESS GATES
netutils-6f94558c58-hsqv4          1/1     Running   0                   3m1s   10.244.1.16      vagrant                             <none>           <none>
netutils-6f94558c58-m88n7          1/1     Running   0                   3m1s   10.244.1.143     vagrant                             <none>           <none>

# On edge - the hash matches desired state
root@vagrant:/home/vagrant# c ps
CONTAINER           IMAGE               CREATED                  STATE               NAME                                     ATTEMPT             POD ID              POD
35a02917e7019       c7b1228ff9eb8       54 seconds ago           Running             netutils                                 0                   989e75d06e97f       netutils-6f94558c58-hsqv4
6d33ebc02a5f3       c7b1228ff9eb8       56 seconds ago           Running             netutils                                 0                   8eed0a481574b       netutils-6f94558c58-m88n7

codecov bot commented Nov 23, 2024

Codecov Report

Attention: Patch coverage is 92.59259% with 2 lines in your changes missing coverage. Please review.

Project coverage is 45.13%. Comparing base (7763e7c) to head (c6723a4).
Report is 42 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rtcoordinator/podbinding/pod_binding_controller.go | 90.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2201       +/-   ##
===========================================
- Coverage   58.93%   45.13%   -13.80%     
===========================================
  Files         210      402      +192     
  Lines       18968    27753     +8785     
===========================================
+ Hits        11179    12527     +1348     
- Misses       6707    13997     +7290     
- Partials     1082     1229      +147     
| Flag | Coverage Δ |
|---|---|
| unittests | 45.13% <92.59%> (-13.80%) ⬇️ |


@zyjhtangtang
Contributor

/LGTM

@rambohe-ch added the approved and lgtm labels on Nov 26, 2024
@rambohe-ch merged commit 09078b4 into openyurtio:master on Nov 26, 2024
11 of 12 checks passed
Successfully merging this pull request may close these issues.

[feature request]Improve pod eviction policy during node heartbeat lost
3 participants