Zero downtime upgrade for Azure VMSS

Siddharth Singha Roy (Microsoft Employee)
2023-03-15T10:31:20.9466667+00:00

We want to achieve a zero-downtime upgrade for our Azure Virtual Machine Scale Set (VMSS). The scale set runs a lightweight web server that handles long-running connections, and its upgrade policy is set to Rolling. During a rolling upgrade, VMs are updated in batches without any connection draining, so all existing connections on a VM are terminated when that VM is upgraded. Our goal is to keep existing connections alive during the upgrade: VMSS should wait for all connections on a VM to drain before updating it.
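
To make "wait for all connections to drain" concrete, the web server has to know how many connections are still open. The following is a minimal Python sketch, assuming a threaded server that calls hypothetical `opened()`/`closed()` hooks around each client connection; `ConnectionTracker` is an illustrative name, not part of any Azure SDK or web server.

```python
import threading


class ConnectionTracker:
    """Thread-safe count of open connections on one instance (illustrative)."""

    def __init__(self) -> None:
        self._active = 0
        self._cond = threading.Condition()

    def opened(self) -> None:
        """Call when a client connection is accepted."""
        with self._cond:
            self._active += 1

    def closed(self) -> None:
        """Call when a client connection ends."""
        with self._cond:
            self._active -= 1
            if self._active == 0:
                self._cond.notify_all()  # wake anyone waiting for the drain

    @property
    def active(self) -> int:
        with self._cond:
            return self._active

    def wait_until_drained(self, timeout: float) -> bool:
        """Block until no connections remain or `timeout` seconds pass.

        Returns True if the instance drained, False if the timeout hit first.
        """
        with self._cond:
            return self._cond.wait_for(lambda: self._active == 0, timeout=timeout)


# Usage: one long-running connection that finishes after 0.1 s.
tracker = ConnectionTracker()
tracker.opened()
threading.Timer(0.1, tracker.closed).start()
print(tracker.wait_until_drained(timeout=5))  # True
```

A drain step run before an instance is upgraded could call `wait_until_drained` once the instance has stopped receiving new traffic; the timeout bounds how long a stuck connection can hold up the upgrade.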

Does Azure VMSS provide a native solution for this, or is there a recommended approach for this scenario?

Azure Virtual Machine Scale Sets

Accepted answer
  deherman-MSFT (Microsoft Employee)
    2023-03-15T21:36:24.6366667+00:00

    @Siddharth Singha Roy

    VMSS currently has no native feature that drains connections during an upgrade. However, here are some workarounds you can try:

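    • One method is to make the VMs you are about to upgrade fail their health probe, so the load balancer or application gateway marks them as “unhealthy” and sends them no new traffic, while existing connections remain active. Azure Load Balancer probes come from 168.63.129.16, so you can block inbound traffic from that address on the probe port only; do not block the address entirely, because the VM agent (which Custom Script Extension and Run Command depend on), DNS, and DHCP also use it. Application Gateway probes come from the gateway's own instances, so behind a gateway it is simpler to make the probe endpoint return an error. You can use a Custom Script Extension or Run Command to fail the probe before upgrading and restore it afterwards.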
    • Another method is to use Azure Application Gateway with connection draining enabled in the backend HTTP settings. This feature lets you gracefully remove backend pool members during planned service updates: a member being removed receives no new requests and is given a timeout period to finish processing existing requests before they are terminated. The drain timeout can be set from 1 to 3600 seconds.
    • A third method is to use instance protection for VMSS instances. Protection from scale-in keeps an instance from being deleted by scale-in operations, and protection from scale set actions additionally keeps scale-set-level operations, such as applying model updates, from being applied to it. You can enable protection on instances that still have active connections, remove it once they have drained, and then upgrade them.
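
The second and third methods both come down to small property changes on Azure resources. As a hedged sketch, the Python helpers below build the relevant JSON fragments: `connectionDraining` (with `enabled` and `drainTimeoutInSec`) inside an Application Gateway backend HTTP setting, and `protectionPolicy` (with `protectFromScaleIn` and `protectFromScaleSetActions`) on an individual VMSS instance. The helper function names are hypothetical, how you submit the fragments (ARM template, REST call, or SDK) is left open, and the property names should be checked against the current Azure resource schema before use.

```python
import json


def app_gateway_connection_draining(timeout_sec: int) -> dict:
    """Fragment for an Application Gateway backend HTTP setting's properties."""
    # The drain timeout must be between 1 and 3600 seconds.
    if not 1 <= timeout_sec <= 3600:
        raise ValueError("drain timeout must be between 1 and 3600 seconds")
    return {"connectionDraining": {"enabled": True, "drainTimeoutInSec": timeout_sec}}


def vmss_instance_protection(protect: bool) -> dict:
    """Fragment for a single VMSS instance's properties."""
    return {
        "protectionPolicy": {
            # Keeps scale-in from deleting this instance.
            "protectFromScaleIn": protect,
            # Also keeps scale-set-level actions, such as model updates
            # (upgrades), from being applied to this instance.
            "protectFromScaleSetActions": protect,
        }
    }


# Usage: drain for up to 10 minutes; protect an instance that is still busy.
print(json.dumps(app_gateway_connection_draining(600), indent=2))
print(json.dumps(vmss_instance_protection(True), indent=2))
```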

    For product feedback and feature requests, please use our feedback forum, where the community can add their voice and upvote popular ideas. The forum is monitored and responded to by our product teams.

    https://feedback.azure.com

    Hope this helps. Let me know if you still have questions and I will do my best to assist.



1 additional answer

  Martin Søstrand Helgesen
    2024-04-10T07:00:25.49+00:00

    We use the health probe approach, and it works quite well.
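
One way to fault the health probe on demand, without touching firewall rules, is to point the probe at an endpoint whose answer the deployment tooling controls. A minimal Python sketch, assuming the probe is configured to hit `GET /health` and that a flag file marks the instance as draining; the `DRAIN_FLAG` path, the `serve` helper, and port 8080 are hypothetical.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical flag file: deployment tooling creates it to start draining
# and deletes it after the instance has been updated and warmed up.
DRAIN_FLAG = os.environ.get("DRAIN_FLAG", "/tmp/drain.flag")


def health_status(flag_path: str) -> tuple[int, bytes]:
    """Return (HTTP status, body) for the health probe."""
    if os.path.exists(flag_path):
        # A 503 makes the probe fail, so no new traffic is routed here.
        return 503, b"draining"
    return 200, b"ok"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_status(DRAIN_FLAG)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Silence per-request logging; probes arrive every few seconds.
        pass


def serve(port: int = 8080) -> None:
    """Serve /health until the process is stopped."""
    ThreadingHTTPServer(("", port), HealthHandler).serve_forever()
```

Creating the flag file makes the endpoint return 503, which an Azure Load Balancer HTTP probe (anything other than 200 fails) and Application Gateway's default status match (200-399) both treat as unhealthy; deleting the file restores 200. The endpoint only affects where new traffic goes; connections that are already open are left alone.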

    In a legacy environment, we have hundreds of classic IIS web apps on Windows VMs behind Azure Application Gateway. For each VM, our rolling deployment process deliberately fails the health probe, waits for the connections to drop, deploys, makes warm-up calls, and then re-enables the probe.
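
The per-instance sequence described above (fault the probe, wait for connections to drop, deploy, warm up, re-enable) can be sketched in Python as follows. This is not the poster's actual tooling: the five callables are hypothetical hooks you would supply (for example through Run Command or your own deployment agent), and the `drain_timeout` and `poll_interval` defaults are arbitrary assumptions.

```python
import time
from typing import Callable


def rolling_deploy_one(
    instance: str,
    fail_probe: Callable[[str], None],
    active_connections: Callable[[str], int],
    deploy: Callable[[str], None],
    warm_up: Callable[[str], None],
    restore_probe: Callable[[str], None],
    drain_timeout: float = 600.0,
    poll_interval: float = 5.0,
) -> bool:
    """Drain, deploy, warm up, and re-enable one instance.

    Returns True if the instance drained before `drain_timeout`,
    False if the deploy went ahead with connections still open.
    """
    fail_probe(instance)  # load balancer stops routing new traffic here

    # Poll until the open-connection count reaches zero or time runs out.
    deadline = time.monotonic() + drain_timeout
    drained = active_connections(instance) == 0
    while not drained and time.monotonic() < deadline:
        time.sleep(poll_interval)
        drained = active_connections(instance) == 0

    # If deploy or warm_up raises, the probe stays faulted on purpose,
    # so a broken instance does not rejoin the pool.
    deploy(instance)
    warm_up(instance)  # prime caches before real traffic returns
    restore_probe(instance)  # instance rejoins the pool
    return drained
```

The sketch deploys even if the drain timeout expires, similar in spirit to a drain timeout on the load balancer side, and returns `False` so the caller can log it. Because each call touches only one instance's probe, several such loops can run side by side without editing the gateway configuration.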

    This also works with multiple simultaneous deployments, which would be hard to do if each deployment had to update the Application Gateway (AGW) resource.
