add detailed explanation of GameServer hot-update (#70)

Signed-off-by: ChrisLiu <chrisliu1995@163.com>
ChrisLiu 2023-06-29 16:06:33 +08:00 committed by GitHub
commit f2b81a993f

## Feature overview
In gaming scenarios, game server scripts and scene resources are hot-update files and are often deployed as sidecar containers in pods.
When these files need to be updated, we want the main program on the engine side of the game server to keep running unaffected.
However, in a native Kubernetes cluster, updating any container in a pod causes the pod to be recreated, which does not meet the requirements of hot-update scenarios.
Game server updates are a crucial part of game server application delivery.
As a stateful type of workload, game servers often place higher demands on cloud-native infrastructure during updates.
This article introduces how to use OKG's in-place update capability to achieve hot updates of game servers.
## GameServer & Container
Before introducing the hot update method, we may need to clarify the relationship between game servers and containers.
In the concept of OKG, a game server (GameServer) can contain multiple containers, each container serving a different function and corresponding to different container images.
Of course, a game server can also contain only one container.
Whether a game server contains one or multiple containers corresponds to two different architectural concepts.
A game server with only one container is closer to the management approach of a virtual machine.
Neither state management nor hot updates of minor versions rely on the capabilities of Kubernetes; they follow the traditional operations approach.
For example, the single container of such a game server may contain multiple processes, scripts, or configuration files.
The game engine's resident process is usually released as a new version by building a new container, while updates to scripts, resources, or configurations often rely on mounting object storage or on dynamic pulling by self-developed programs.
Whether an update has taken effect is determined by the business itself, and the entire process is carried out in a non-cloud-native manner.
We call this type of game server a rich container. Hot updates of rich containers have the following problems:
- Cloud-native version management of scripts, resources, and configuration files is impossible. Because the container image has not changed, operations personnel cannot tell which version of the script files is running in the current container. After a game is launched, minor versions iterate very frequently; when a fault occurs, it is difficult to locate problems in a system without version management, which greatly increases operational complexity.
- The update status is difficult to determine. Even if the files in the container have been replaced, it is hard to confirm whether the new hot-update files were fully in place when the reload command was executed. Tracking the success or failure of each update falls to the operations personnel, which also adds operational complexity.
- Gradual upgrades are impossible. To control the blast radius of an update, it is often necessary to update less important game servers first and roll out the rest only after confirmation. However, neither mounting object storage nor pulling files with a program can easily achieve a gradual release, and a problem in a full release has a significant impact.
- When a container fails, the recreated pod pulls the old image, so the hot-update files are not persisted.
For the hot update scenario of game servers, a more ideal approach is to use a multi-container game server architecture, where the hot update part is deployed as a sidecar container along with the main container in the same game server (GameServer), and the two share the hot update files through emptyDir. Only the sidecar container needs to be updated during updates. In this way, the hot update of game servers will be carried out in a cloud-native manner:
- The sidecar container image has version attributes, which solves the version management problem.
- After a Kubernetes container update succeeds, the container is in the Ready state, so whether the sidecar update succeeded can be observed.
- OKG provides various update strategies; the objects to be released can be controlled according to the release requirements to achieve a gradual release.
- Even if the container restarts abnormally, the hot-update files persist because they are baked into the image.
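Concretely, the sidecar pattern described above comes down to two containers mounting the same emptyDir volume. A minimal fragment is sketched below; the container names `game-engine` and `game-script` are illustrative only, and the full, working examples with real images follow later in this article.

```yaml
# Minimal sketch: both containers mount the same emptyDir, so files shipped
# in the script sidecar's image become visible to the engine container.
containers:
  - name: game-engine          # main container; reads the hot-update scripts
    volumeMounts:
      - name: shared-dir
        mountPath: /var/www/html/js
  - name: game-script          # sidecar; carries the hot-update files in its image
    volumeMounts:
      - name: shared-dir
        mountPath: /app/scripts
volumes:
  - name: shared-dir
    emptyDir: {}
```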
## Hot updates of game servers based on in-place update
### In-place update
In standard Kubernetes, an application is updated by changing the Image field in the resource object.
However, for pods managed by a native workload such as Deployment or StatefulSet, updating the Image causes the pod to be recreated: the lifecycle of the pod is coupled with the lifecycle of its containers, and the multi-container hot-update architecture described above becomes impracticable under native Kubernetes workloads.
OKG's GameServerSet provides an in-place upgrade capability: it can update a specific container without recreating the game server, keeping the lifecycle of the game server as a whole unchanged.
While the sidecar container is being updated, the game server runs normally and players are not affected.
As shown in the figure below, the blue part is the hot update part, and the orange part is the non-hot update part.
After we update the Game Script container from version V1 to version V2, the entire pod will not be rebuilt, and the orange part will not be affected.
The Game Engine runs smoothly and normally.
![hot-update.png](../../images/hot-update.png)
## Example
### Example Usage
In this article, we use the 2048 web game as an example to show how to update game scripts without affecting the lifecycle of the game server.
Deploy the game server with a sidecar container, using GameServerSet as the game server workload, and set:
- In-place upgrade as the pod update policy (podUpdatePolicy: InPlaceIfPossible)
- The AlibabaCloud-SLB network model to expose the service
- Two containers: app-2048, the main container that carries the main game logic, and sidecar, the companion container that stores the hot-update files. The two containers share a file directory through an emptyDir volume.
- On startup, the sidecar synchronizes the files in the hot-update directory (/app/js) to the shared directory (/app/scripts), and then sleeps without exiting.
- The app-2048 container uses the game scripts under /var/www/html/js.
```bash
cat <<EOF | kubectl apply -f -
apiVersion: game.kruise.io/v1alpha1
kind: GameServerSet
metadata:
  name: gss-2048
  namespace: default
spec:
  replicas: 1
  updateStrategy:
    rollingUpdate:
      podUpdatePolicy: InPlaceIfPossible
  network:
    networkType: AlibabaCloud-SLB
    networkConf:
      - name: SlbIds
        value: lb-bp1oqahx3jnr7j3f6vyp8
      - name: PortProtocols
        value: 80/TCP
  gameServerTemplate:
    spec:
      containers:
        - image: registry.cn-beijing.aliyuncs.com/acs/2048:v1.0
          name: app-2048
          volumeMounts:
            - name: shared-dir
              mountPath: /var/www/html/js
        - image: registry.cn-beijing.aliyuncs.com/acs/2048-sidecar:v1.0
          name: sidecar
          args:
            - bash
            - -c
            - rsync -aP /app/js/* /app/scripts/ && while true; do echo 11;sleep 2; done
          volumeMounts:
            - name: shared-dir
              mountPath: /app/scripts
      volumes:
        - name: shared-dir
          emptyDir: {}
EOF
```
One GameServer and its corresponding Pod are created:
```bash
kubectl get gs
NAME         STATE   OPSSTATE   DP    UP   AGE
gss-2048-0   Ready   None       0     0    13s
kubectl get pod
NAME         READY   STATUS    RESTARTS   AGE
gss-2048-0   2/2     Running   0          13s
```
At this point, access the game webpage (for how to access the game server, refer to the network model documentation). When the game ends, the message `Game over!` is displayed:
![2048-v1.png](../../images/2048-v1.png)
Next, we want to update the game script so that the message displayed when the game ends becomes `*_* Game over!`
Modify the corresponding script file html_actuator.js and build a new sidecar image tagged v2.0. (In production, this step can be completed by a CI pipeline.)
After building and pushing the image, we only need to update the corresponding container image tag in the GameServerSet:
```bash
kubectl edit gss gss-2048
...
      - image: registry.cn-beijing.aliyuncs.com/acs/2048-sidecar:v2.0
        name: sidecar
...
```
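For scripted or CI-driven rollouts, the same change can be applied non-interactively with `kubectl patch` instead of `kubectl edit`. This is a sketch, not part of the original example; it assumes the sidecar is the second container (index 1) in the template, as in the YAML above.

```bash
# Patch only the sidecar image; the app-2048 container is untouched.
kubectl patch gss gss-2048 --type=json -p '[
  {"op": "replace",
   "path": "/spec/gameServerTemplate/spec/containers/1/image",
   "value": "registry.cn-beijing.aliyuncs.com/acs/2048-sidecar:v2.0"}
]'
```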
After a while, the GameServer changes from Updating back to Ready. The pod has been updated in place: its RESTARTS count is now 1, but its AGE has not been reset. The hot update of the game server is complete.
```bash
kubectl get pod
NAME         READY   STATUS    RESTARTS      AGE
gss-2048-0   2/2     Running   1 (33s ago)   8m55s
```
At this point, execute the reload command on the app-2048 container.
```bash
kubectl exec gss-2048-0 -c app-2048 -- /usr/sbin/nginx -s reload
```
Open an incognito browser, play the game, and the updated message will be displayed when the game ends.
![2048-v2.png](../../images/2048-v2.png)
### Reload methods after file hot update
In the example above, the reload was performed on a single pod with an exec command.
When managing game servers in batches, however, reloading this way becomes cumbersome. Below are a few methods for reloading files after a hot update, for reference.
#### Manual batch reload
When all game servers have been updated and are Ready, use the batch management tool `kubectl-pexec` to run the reload command inside the containers via exec in batches.
#### Track the hot update file directory through inotify
inotify is a Linux file monitoring system framework. Through inotify, the main game server business container can listen for changes in files in the hot update file directory, triggering an update.
To use inotify, you need to install inotify-tools in the container:
```bash
apt-get install inotify-tools
```
Taking the 2048 game as an example, on top of the original image, the app-2048 container watches the /var/www/html/js/ directory and automatically executes the reload command when it detects a file change.
The script is shown below and can be run when the container starts.
Note that the reload command should be idempotent.
```shell
inotifywait -mrq --timefmt '%d/%m/%y %H:%M' --format '%T %w%f%e' -e modify,delete,create,attrib /var/www/html/js/ | while read file
do
  /usr/sbin/nginx -s reload
  echo "reload successfully"
done
```
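How the watcher script ends up in the image depends on the build pipeline. A hypothetical Dockerfile for such an image could look like the following; the `watch.sh` filename and the foreground nginx entrypoint are assumptions for illustration, not details of the published image.

```dockerfile
# Hypothetical build: add inotify-tools and the watcher script on top of the
# original 2048 image, then run the watcher next to nginx.
FROM registry.cn-beijing.aliyuncs.com/acs/2048:v1.0
RUN apt-get update && apt-get install -y inotify-tools
COPY watch.sh /watch.sh
RUN chmod +x /watch.sh
# Start the watcher in the background, then keep nginx in the foreground.
CMD ["/bin/bash", "-c", "/watch.sh & exec /usr/sbin/nginx -g 'daemon off;'"]
```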
The above script is baked into the image, producing a new image `registry.cn-beijing.aliyuncs.com/acs/2048:v1.0-inotify`.
Repeating the experiment with this image (all other fields unchanged), we can see that after the sidecar image is replaced with v2.0, the whole hot-update process completes without manually running the reload command.
The complete YAML is as follows:
```yaml
apiVersion: game.kruise.io/v1alpha1
kind: GameServerSet
metadata:
  name: gss-2048
  namespace: default
spec:
  replicas: 1
  updateStrategy:
    rollingUpdate:
      podUpdatePolicy: InPlaceIfPossible
  network:
    networkType: AlibabaCloud-SLB
    networkConf:
      - name: SlbIds
        value: lb-bp1oqahx3jnr7j3f6vyp8
      - name: PortProtocols
        value: 80/TCP
  gameServerTemplate:
    spec:
      containers:
        - image: registry.cn-beijing.aliyuncs.com/acs/2048:v1.0-inotify
          name: app-2048
          volumeMounts:
            - name: shared-dir
              mountPath: /var/www/html/js
        - image: registry.cn-beijing.aliyuncs.com/acs/2048-sidecar:v1.0 # Replace with v2.0 during hot update
          name: sidecar
          args:
            - bash
            - -c
            - rsync -aP /app/js/* /app/scripts/ && while true; do echo 11;sleep 2; done
          volumeMounts:
            - name: shared-dir
              mountPath: /app/scripts
      volumes:
        - name: shared-dir
          emptyDir: {}
```
#### Triggering an HTTP request from the sidecar
The main game server business container exposes an HTTP interface, and the sidecar sends a reload request to 127.0.0.1 after it starts successfully.
Because the containers in a pod share the same network namespace, the main container reloads the files when it receives the request.
Taking the 2048 game as an example, on top of the original images:
- The app-2048 container adds a reload interface. Below is an example of the JS code:
```js
var http = require('http');
var exec = require('child_process').exec;

var server = http.createServer(function(req, res) {
  if (req.url === '/reload') {
    exec('/usr/sbin/nginx -s reload', function(error, stdout, stderr) {
      if (error) {
        console.error('exec error: ' + error);
        res.statusCode = 500;
        res.end('Error: ' + error.message);
        return;
      }
      console.log('stdout: ' + stdout);
      console.error('stderr: ' + stderr);
      res.statusCode = 200;
      res.end();
    });
  } else {
    res.statusCode = 404;
    res.end('Not found');
  }
});

server.listen(3000, function() {
  console.log('Server is running on port 3000');
});
```
- At the same time, the sidecar container adds a request script, request.sh. After the container starts, the postStart hook is used to send the request, as shown below:
```yaml
...
        name: sidecar
        lifecycle:
          postStart:
            exec:
              command:
                - bash
                - -c
                - ./request.sh
...
```
The corresponding request.sh script is shown below. It has a retry mechanism and exits only after confirming that the reload succeeded.
```shell
#!/bin/bash
# Send the HTTP request in a loop until the server returns a success response
while true; do
  response=$(curl -s -w "%{http_code}" http://localhost:3000/reload)
  if [[ $response -eq 200 ]]; then
    echo "Server reloaded successfully!"
    break
  else
    echo "Server reload failed, response code: $response"
  fi
  sleep 1
done
```
In this way, the reload happens automatically after the files are updated.
The above programs are baked into the images, and the following new images are built:
- `registry.cn-beijing.aliyuncs.com/acs/2048:v1.0-http`
- `registry.cn-beijing.aliyuncs.com/acs/2048-sidecar:v1.0-http`
- `registry.cn-beijing.aliyuncs.com/acs/2048-sidecar:v2.0-http`
Replace the images and run the experiment again (note that the sidecar in the YAML needs the added lifecycle field).
After the sidecar image is replaced from v1.0-http with v2.0-http, the whole hot-update process completes without manually running the reload command.
The complete YAML is as follows:
```yaml
kind: GameServerSet
metadata:
name: gss-2048
namespace: default
spec:
replicas: 1
updateStrategy:
rollingUpdate:
podUpdatePolicy: InPlaceIfPossible
network:
networkType: AlibabaCloud-SLB
networkConf:
- name: SlbIds
value: lb-bp1oqahx3jnr7j3f6vyp8
- name: PortProtocols
value: 80/TCP
gameServerTemplate:
spec:
containers:
- image: registry.cn-beijing.aliyuncs.com/acs/2048:v1.0-http
name: app-2048
volumeMounts:
- name: shared-dir
mountPath: /var/www/html/js
- image: registry.cn-beijing.aliyuncs.com/acs/2048-sidecar:v1.0-http #Replace with v2.0-http during hot update
name: sidecar
lifecycle:
postStart:
exec:
command:
- bash
- -c
- ./request.sh
args:
- bash
- -c
- rsync -aP /app/js/* /app/scripts/ && while true; do echo 11;sleep 2; done
volumeMounts:
- name: shared-dir
mountPath: /app/scripts
volumes:
- name: shared-dir
emptyDir: {}
```
#### Fully managed hot reload
OKG has the ability to trigger command execution inside containers.
Based on this capability, OKG can provide fully automated hot updates so that users no longer need to concern themselves with hot-reload details.
If you have such a requirement, you can submit an issue on GitHub and discuss the roadmap of the OKG hot-reload feature with community developers.
### In-place hot update with server downtime
In gaming, a hot update in the narrow sense is an update that keeps the server running without affecting players' normal gameplay.
However, in some scenarios, updates that do stop the server also rely on the in-place upgrade capability.
#### Network metadata remains unchanged
The stateful nature of game servers is often reflected in their network information.
Because each game server is unique and cannot sit behind Kubernetes Service load balancing, game developers often implement routing and dispatch mechanisms based on IP addresses.
In this case, the game server's IP information must not change during an update. OKG's in-place upgrade capability meets this requirement.
#### Shared memory is not lost
After a game server is created and scheduled to a host, the game business can use shared memory to reduce data-write latency, effectively adding a local cache layer to the game server.
During an update there may be a brief service interruption, but thanks to the cache the game server terminates and starts quickly, so downtime is greatly reduced.
This shared-memory approach also relies on OKG's in-place upgrade capability to ensure that the cached data is not lost.
