This module provides proactive health detection for back-end nodes. A back-end node can be an Nginx upstream server (both http upstream and stream upstream are supported), registered automatically when the upstream config is parsed, or a node added dynamically through the RESTful API.
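Nodes taken from the upstream config are registered automatically when the config is parsed, so that new requests go straight to a healthy back-end node. Nodes can also be registered dynamically through the RESTful API so that their health status can be watched in real time.
The API can dynamically add or delete back-end nodes, modify a node's detection policy, and query node status.
git clone https://github.com/nginx/nginx.git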
git clone https://github.com/alexzzh/ngx_health_detect_module.git
cd nginx/;
git checkout branches/stable-x.x.x
# apply the patch, or adjust the nginx code by hand according to the patch file
git apply ../ngx_health_detect_module/patch/nginx_healthdetect_for_nginx_x.xx+.patch
auto/configure --with-stream --add-module=../ngx_health_detect_module
make && make install
If the patch directory has no patch for your nginx version, or you need a patch for a customized nginx, you can make one quickly with the following steps:
1 git clone https://github.com/nginx/nginx.git (or clone your customized nginx repo)
2 cd nginx (or your customized nginx directory)
3 git checkout branches/stable-x.y.z (the target version)
4 adjust the nginx source code according to a patch for another version, e.g. nginx_healthdetect_for_nginx_1.26+.patch
5 git diff * > nginx_healthdetect_for_nginx_x.y+.patch
6 upload the patch to this repo if you want
nginx.conf example
user root;
worker_processes 4;
error_log logs/error.log info;
#pid logs/nginx.pid;
events {
worker_connections 32768;
}
http {
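health_detect_shm_size 10m; # size of the shared memory that stores back-end node detection policies and health status
health_detect_max_history_status_count 5; # number of historical status changes kept per back-end node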
server {
listen 641;
server_name localhost;
location /http_api {
root html;
index index.html index.htm;
health_detect_dynamic_api check_only=false; # exposes the RESTful API for the http module
}
location /tcp_api {
root html;
index index.html index.htm;
stream_health_detect_dynamic_api check_only=false; # exposes the RESTful API for the stream module
}
location /build-in {
proxy_pass http://httpbackend;
}
}
upstream httpbackend {
server 1.1.1.1:11111 max_fails=0 fail_timeout=20s;
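# enable this module for the current upstream and set the policy fields used for automatically registered nodes
health_detect_check type=http alert_method=syslog rise=2 fall=3 interval=1000 timeout=5000 keepalive=true keepalive_time=500000;
# when the detection type is http, the HTTP response codes expected from the back end
health_detect_http_expect_alive http_2xx http_3xx;
# when the detection type is http, send the request with keep-alive enabled; note that keep-alive is only meaningful when the keepalive field of the "health_detect_check" directive is true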
health_detect_http_send "GET / HTTP/1.0\r\nConnection: keep-alive\r\n\r\n";
}
}
stream {
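health_detect_shm_size 10m; # size of the shared memory that stores back-end node detection policies and health status
health_detect_max_history_status_count 10; # number of historical status changes kept per back-end node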
server {
listen 642 ;
proxy_pass tcpbackend;
}
upstream tcpbackend {
server 2.2.2.2:22222 max_fails=0 fail_timeout=20s;
# enable this module for the current upstream and set the policy fields used for automatically registered nodes
health_detect_check type=tcp alert_method=syslog rise=2 fall=3 interval=1000 timeout=5000 keepalive=true keepalive_time=500000;
}
}
Syntax
{"type":"tcp|http","peer_addr":"ip:port","send_content":"xxx","alert_method":"log|syslog","expect_response_status":"http_2xx|http_3xx|http_4xx|http_5xx","interval":milliseconds,"timeout":milliseconds , "keepalive": "true"|"false", "keepalive_time": milliseconds , "rise":count, "fall":count, "default_down": "true"|"false"}
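Only "type" and "peer_addr" are required fields; the other fields take their default values when not specified.
Default values (first line for type tcp, second line for type http):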
{"send_content":"","alert_method":"log","expect_response_status":"","interval":30000,"timeout":3000 , "keepalive": "false", "keepalive_time": 3600000 , "rise":1, "fall":2, "default_down":"false"}
{"send_content":"GET / HTTP/1.0\r\nConnection:close\r\n\r\n","alert_method":"log","expect_response_status":"http_2xx","interval":30000,"timeout":3000 , "keepalive": "true", "keepalive_time": 3600000 , "rise":1, "fall":2, "default_down":"false"}
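The rule that only "type" and "peer_addr" are required can be illustrated with a small client-side helper that fills in the remaining fields before a policy is sent. This is a minimal sketch, not part of the module: `build_policy` is a hypothetical name, the default values are copied from the two objects above, and the mapping of those objects to the tcp and http types is an assumption based on their contents.

```python
import json

# Defaults copied from the two default objects listed above. Which object
# belongs to which probe type is an assumption based on their contents.
TCP_DEFAULTS = {
    "send_content": "", "alert_method": "log", "expect_response_status": "",
    "interval": 30000, "timeout": 3000, "keepalive": "false",
    "keepalive_time": 3600000, "rise": 1, "fall": 2, "default_down": "false",
}
HTTP_DEFAULTS = {
    **TCP_DEFAULTS,
    "send_content": "GET / HTTP/1.0\r\nConnection:close\r\n\r\n",
    "expect_response_status": "http_2xx",
    "keepalive": "true",
}


def build_policy(policy_type, peer_addr, **overrides):
    """Return a full policy dict: required fields, defaults, then overrides."""
    if policy_type not in ("tcp", "http"):
        raise ValueError("type must be 'tcp' or 'http'")
    defaults = HTTP_DEFAULTS if policy_type == "http" else TCP_DEFAULTS
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {"type": policy_type, "peer_addr": peer_addr, **defaults, **overrides}


if __name__ == "__main__":
    # Only interval is overridden; every other field keeps its default.
    print(json.dumps(build_policy("tcp", "10.0.229.100:34001", interval=5000)))
```

The resulting dict can be serialized with `json.dumps` and used as the request body of an add request.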
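Parameter details
To use an HTTP keepalive persistent connection, the content to send must be set to "GET / HTTP/1.0\r\nConnection:keep-alive\r\n\r\n". Short connections are recommended.
When send_content specifies HTTP keepalive, the persistent connection must also be enabled. Persistent connections are not recommended: once a TCP persistent connection is established, the liveness check uses the peek function, and peek keeps succeeding even if a firewall then drops the request packets, until keepalive_time is exceeded. The detected status may be wrong during that period; setting a shorter "keepalive_time" reduces the impact of this problem.
ip:port/http_api/control?cmd=add&name=node_name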
ip:port/http_api/control?cmd=delete&name=node_name
ip:port/http_api/control?cmd=delete_all
ip:port/http_api/control?cmd=status&name=node_name[&format=json|html]
ip:port/http_api/control?cmd=status_all[&status=down|up][&format=json|html]
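The control endpoints listed above all share one shape: a `control` path under the API location plus a `cmd` query parameter and optional extras. A minimal sketch of a URL builder follows; `control_url` is a hypothetical helper, and the `/http_api` prefix is taken from the sample config rather than fixed by the module.

```python
from urllib.parse import urlencode


def control_url(host, cmd, name=None, api_path="/http_api", **params):
    """Build a control URL such as http://host/http_api/control?cmd=status&name=n1."""
    query = {"cmd": cmd}
    if name is not None:
        query["name"] = name
    query.update(params)  # optional extras, e.g. format="json" or status="down"
    return f"http://{host}{api_path}/control?{urlencode(query)}"


if __name__ == "__main__":
    print(control_url("10.0.229.99:641", "status", name="nginx100", format="html"))
    print(control_url("10.0.229.99:641", "status_all", status="down"))
```

Building the query string with `urlencode` avoids the shell-quoting problems that `&` causes when URLs are typed by hand.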
curl -X POST -H 'Content-Type: application/json' -d '{"type":"http","peer_addr":"10.0.229.100:34001","send_content":"GET / HTTP/1.0\r\nConnection:keep-alive\r\n\r\n","alert_method":"log","expect_response":"http_2xx","check_interval":5000,"check_timeout":3000, "need_keepalive": 1, "keepalive_time": 200000, "rise":1, "fall":2}' '10.0.229.99:641/http_api/control?cmd=add&name=nginx4001'
add or update node success
curl -X DELETE '10.0.229.99:641/http_api/control?cmd=delete&name=nginx4001'
delete node success
curl -X DELETE '10.0.229.99:641/http_api/control?cmd=delete_all'
delete all node success
curl http://10.0.229.99:641/http_api/control?cmd=status_all
{
"total": 151,
"up": 150,
"down": 1,
"max": 6000,
"items": [
{"name": "nginx81","addr": "10.0.229.100:30081","access_time": 2023/05/06 16:50:04, "status": "up"},
{"name": "nginx66","addr": "10.0.229.100:30066","access_time": 2023/05/06 16:50:04, "status": "up"},
{"name": "nginx85","addr": "10.0.229.100:30085","access_time": 2023/05/06 16:50:04, "status": "up"},
{"name": "nginx62","addr": "10.0.229.100:30062","access_time": 2023/05/06 16:50:04, "status": "up"},
{"name": "nginx37","addr": "10.0.229.100:30037","access_time": 2023/05/06 16:50:04, "status": "up"},
{"name": "nginx107","addr": "10.0.229.100:30107","access_time": 2023/05/06 16:50:01, "status": "down"},
{"name": "nginx103","addr": "10.0.229.100:30103","access_time": 2023/05/06 16:50:01, "status": "down"},
...
}
curl 'http://10.0.229.99:641/http_api/control?cmd=status_all&format=html'
curl http://10.0.229.99:641/http_api/control?cmd=status\&name=nginx100
{"peer_name": "nginx100",
"type": "http",
"peer_addr": "10.0.229.100:30100",
"alert_method": "tcp",
"expect_response_status": "http_2xx ",
"check_interval": "5000",
"check_timeout": "3000",
"need_keepalive": "1",
"keepalive_time": "200000",
"rise": "1",
"fall": "2",
"send_content": "GET / HTTP/1.0 Connection:keep-alive ",
"access_time": "2023/05/06 16:54:27",
"latest_status": "up",
"max_status_count": "5",
"history_status": {
"current_status_count": "1",
"items": [
{"access_time": 2023/05/06 16:50:01, "status": "up",}
]
}}
curl http://10.0.229.99:641/http_api/control?cmd=status\&name=nginx100\&format=html
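Syntax: health_detect_dynamic_api check_only=false|true;
Default: health_detect_dynamic_api check_only=false
Context: http, server, location
Enables the dynamic RESTful API. With check_only=true, the API only supports querying back-end node status; this is the usual setting when all back-end nodes come from the upstream config. With check_only=false, the API can also dynamically add, delete, and modify back-end nodes and change their detection policies.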
Syntax: health_detect_shm_size size;
Default: health_detect_shm_size 10m
Context: http/main, stream/main
Sets the size of the shared memory that stores back-end node detection policies and health status.
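Syntax: health_detect_max_history_status_count count
Default: health_detect_max_history_status_count 5
Context: http/main, stream/main
Sets how many historical status changes are recorded per back-end node. An LRU scheme keeps the most recent count changes together with their timestamps.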
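Keeping only the most recent count status changes can be pictured with a bounded queue. The sketch below is an illustration only: it uses a Python `deque` as a stand-in for whatever structure the module keeps in shared memory, and `StatusHistory` is a hypothetical name.

```python
from collections import deque
from datetime import datetime


class StatusHistory:
    """Keep the most recent `max_count` status changes with timestamps."""

    def __init__(self, max_count=5):
        # A deque with maxlen drops the oldest entry once it is full.
        self.items = deque(maxlen=max_count)

    def record(self, status, when=None):
        """Store a change; return False if the status did not change."""
        if self.items and self.items[-1][1] == status:
            return False
        self.items.append((when or datetime.now(), status))
        return True


if __name__ == "__main__":
    history = StatusHistory(max_count=2)
    for status in ("up", "up", "down", "up"):
        history.record(status)
    # Only the two latest changes remain.
    print([status for _, status in history.items])
```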
Syntax: health_detect_check type=http|tcp [alert_method=log|syslog] [interval=milliseconds] [timeout=milliseconds] [rise=count] [fall=count] [default_down=true|false] [keepalive=true|false] [keepalive_time=milliseconds];
Default: health_detect_check type=tcp alert_method=log interval=30000 timeout=5000 rise=1 fall=2 default_down=false keepalive=false keepalive_time=3600000;
Context: http/upstream, stream/upstream
Adding this directive to an upstream block under http or stream enables health checks for the back-end nodes of that upstream. The fields have the same meaning as the corresponding detection policy fields.
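The rise and fall counts are not explained further here. The sketch below assumes the conventional meaning used by similar health checkers: a node is marked up after rise consecutive successful probes and marked down after fall consecutive failed probes. `PeerState` is a hypothetical name, and the module's real logic may differ.

```python
class PeerState:
    """Track up/down state from probe results using rise/fall thresholds."""

    def __init__(self, rise=1, fall=2, default_down=False):
        self.rise = rise
        self.fall = fall
        self.up = not default_down
        self.ok_streak = 0
        self.fail_streak = 0

    def observe(self, success):
        """Feed one probe result and return the resulting state (True = up)."""
        if success:
            self.ok_streak += 1
            self.fail_streak = 0
            if not self.up and self.ok_streak >= self.rise:
                self.up = True
        else:
            self.fail_streak += 1
            self.ok_streak = 0
            if self.up and self.fail_streak >= self.fall:
                self.up = False
        return self.up


if __name__ == "__main__":
    state = PeerState(rise=2, fall=3)
    print([state.observe(ok) for ok in (False, False, False, True, True)])
```

Under this assumption, larger rise and fall values make the state less sensitive to a single flaky probe, at the cost of slower reaction.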
Syntax: health_detect_http_expect_alive http_2xx|http_3xx|http_4xx|http_5xx;
Default: health_detect_http_expect_alive http_2xx|http_3xx
Context: http/upstream, stream/upstream
When the detection type is http, sets the HTTP response codes expected from the back end.
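Matching a response against classes such as http_2xx or http_3xx comes down to reading the status code from the first line of the response. The following is a minimal sketch with hypothetical helper names, not the module's parser.

```python
def status_class(status_line):
    """Map a status line like 'HTTP/1.1 204 No Content' to 'http_2xx'."""
    parts = status_line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/") or not parts[1].isdigit():
        return None  # not a well-formed HTTP status line
    code = int(parts[1])
    if 200 <= code <= 599:
        return f"http_{code // 100}xx"
    return None  # 1xx and out-of-range codes have no matching class here


def is_alive(status_line, expected=("http_2xx", "http_3xx")):
    """Return True if the response falls into one of the expected classes."""
    return status_class(status_line) in expected


if __name__ == "__main__":
    print(is_alive("HTTP/1.0 302 Found"))
    print(is_alive("HTTP/1.1 503 Service Unavailable"))
```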
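Syntax: health_detect_http_send xxx;
Default: health_detect_http_send "GET / HTTP/1.0\r\nConnection: close\r\n\r\n";
Context: http/upstream, stream/upstream
When the detection type is http, sets the content of the HTTP request to send, for example to enable keep-alive. Note that enabling keep-alive is only meaningful when the keepalive field of the "health_detect_check" directive is true.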
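For persistent probe connections, the parameter notes mention that liveness is checked with a peek on the socket. The sketch below shows what such a check can look like with Python's standard `socket` module; `peek_alive` is a hypothetical name and this is not the module's C code. It also shows why the check is weak: a peek only inspects local socket state, so it cannot notice packets that are silently dropped on the path.

```python
import socket


def peek_alive(sock):
    """Return False if the peer closed the connection, True otherwise."""
    sock.setblocking(False)
    try:
        # MSG_PEEK looks at pending data without removing it from the buffer.
        data = sock.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        return True   # nothing pending, the connection still looks open
    except OSError:
        return False  # reset or another socket error
    return data != b""  # an empty read means the peer closed the connection


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(peek_alive(a))  # peer is open
    b.close()
    print(peek_alive(a))  # peer has closed
    a.close()
```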
cat /proc/cpuinfo
model name : Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
cat /proc/meminfo
MemTotal: 7924144 kB
MemFree: 3156588 kB
Back-end nodes | Detection type | Connection (persistent/short) | Detection interval (s) | Worker processes | CPU usage (single core) | Memory usage |
---|---|---|---|---|---|---|
8000 | tcp | persistent | 1 | 4 | 5% | 0.4% |
8000 | http | persistent | 1 | 4 | 10% | 0.8% |
8000 | tcp | persistent | 5 | 4 | 1%-2% | 0.4% |
8000 | http | persistent | 5 | 4 | 2%-7% | 0.8% |
8000 | tcp | short | 1 | 4 | 10% | 0.4% |
8000 | http | short | 1 | 4 | 20% | 0.8% |
8000 | tcp | short | 5 | 4 | 3%-5% | 0.4% |
8000 | http | short | 5 | 4 | 5% | 0.8% |
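This project is still under development and being improved. Code contributions and bug reports are welcome; let's make it better together. If you would like to help develop it or have questions, you can contact me:
QQ: 122968309
mail: [email protected]
Report bugs
Submit your fix patches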
This module is licensed under the BSD license.
Copyright (C) 2023, by Alex zhang [email protected]
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.