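Distributed Rate Limiting in Practice: Nginx + Tomcat + Guava + Redis Lua

This article was written in September 2021. At that time Sentinel was already deployed at scale across the Alibaba ecosystem, while small and mid-sized companies still mostly relied on in-house solutions built on Redis and Lua.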

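1. Layered rate-limiting architecture

Rate limiting is not something you do once. It has to be enforced layer by layer, from the edge to the core: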

                    ┌──────────┐
  User request ──>  │   CDN    │  Layer 1: CDN rate limiting (by region / ISP)
                    └────┬─────┘
                         ▼
                    ┌──────────┐
                    │  Nginx   │  Layer 2: edge gateway rate limiting (IP / domain)
                    └────┬─────┘
                         ▼
                    ┌──────────┐
                    │  Tomcat  │  Layer 3: web container limiting (maxThreads)
                    └────┬─────┘
                         ▼
                    ┌──────────┐
                    │ Gateway  │  Layer 4: Spring Cloud Gateway / Sentinel
                    └────┬─────┘
                         ▼
                    ┌──────────┐
                    │Single app│  Layer 5: Guava RateLimiter
                    └────┬─────┘
                         ▼
                    ┌──────────┐
                    │  Redis   │  Layer 6: Redis + Lua global rate limiting
                    │ (shared) │
                    └──────────┘

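Core idea: the closer a layer is to the user, the coarser its limit (IP / domain); the closer it is to the business logic, the finer its limit (endpoint / user).

2. Nginx rate limiting (edge gateway layer)

2.1 Core directives

Nginx ships with two rate-limiting modules:

limit_req: limits the request rate (QPS)
limit_conn: limits the number of concurrent connections

2.2 Complete configuration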

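# ===== Step 1: define the rate-limiting zones (http block) =====

# Request rate per client IP: 10 requests per second
limit_req_zone $binary_remote_addr zone=iplimit:20m rate=10r/s;

# Request rate per virtual server: 1 request per second
limit_req_zone $server_name zone=serverlimit:10m rate=1r/s;

# Connection-limit zone keyed by client IP
limit_conn_zone $binary_remote_addr zone=perip:20m;

# Connection-limit zone keyed by virtual server
limit_conn_zone $server_name zone=perserver:20m;

# ===== Step 2: apply the zones in a location =====

server {
    server_name www.test.com;

    # API rate limiting
    location /access-limit/ {
        proxy_pass http://127.0.0.1:8080/;

        # Per-IP rate limit: burst=2 tolerates 2 requests above the rate,
        # nodelay serves them immediately instead of pacing them
        limit_req zone=iplimit burst=2 nodelay;
        # Per-server rate limit
        limit_req zone=serverlimit burst=2 nodelay;
        # At most 100 concurrent connections per IP (syntax: limit_conn <zone> <number>)
        limit_conn perip 100;
        # At most 100 concurrent connections per server
        limit_conn perserver 100;
        # Return 504 when a limit is hit (the default is 503)
        limit_req_status 504;
        limit_conn_status 504;
    }

    # Throttled downloads
    location /download/ {
        # The first 100 MB are sent at full speed
        limit_rate_after 100m;
        # After that, the transfer rate is limited to 256 KB/s
        limit_rate 256k;
    }
}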

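2.3 Key parameters

burst=2: up to 2 requests above the rate are tolerated and queued; anything beyond the burst is rejected
nodelay: requests within the burst are processed immediately instead of being delayed to match the rate (delaying is the default); requests beyond the burst are rejected either way, with 503 by default or 504 under the limit_req_status setting above
rate=10r/s: 10 requests per second (per-minute rates such as 30r/m are also supported)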
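
To make the interplay of rate and burst concrete, here is a minimal Java sketch of the leaky-bucket style accounting that limit_req performs when nodelay is set. It is a simplified model for illustration, not Nginx source code. The class name LimitReqModel and its fields are invented for this example, and the clock is passed in explicitly so the behaviour is deterministic.

```java
/**
 * Simplified model of limit_req with nodelay (illustrative only, not Nginx code).
 * "Excess" measures how far ahead of the configured rate a client currently is,
 * in thousandths of a request.
 */
final class LimitReqModel {
    private final long rateMilli;   // allowed rate in milli-requests per second
    private final long burstMilli;  // tolerated excess in milli-requests
    private long excessMilli;       // current excess
    private long lastMs;            // arrival time of the last accepted request
    private boolean seen;           // false until the first request arrives

    LimitReqModel(int ratePerSecond, int burst) {
        this.rateMilli = ratePerSecond * 1000L;
        this.burstMilli = burst * 1000L;
    }

    /** Returns true if a request arriving at nowMs is accepted. */
    boolean allow(long nowMs) {
        if (!seen) {
            // The first request only initialises the state
            seen = true;
            lastMs = nowMs;
            return true;
        }
        long elapsedMs = Math.max(0, nowMs - lastMs);
        // Drain the excess at the configured rate, then add this request
        long excess = excessMilli - rateMilli * elapsedMs / 1000 + 1000;
        if (excess < 0) {
            excess = 0;
        }
        if (excess > burstMilli) {
            return false;  // beyond the burst: rejected, state left unchanged
        }
        excessMilli = excess;
        lastMs = nowMs;
        return true;
    }
}
```

With rate=10r/s and burst=2, four requests arriving in the same millisecond give accept, accept, accept, reject in this model: one request at the rate plus two burst slots. One more request is accepted 100 ms later, once a slot has drained.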

3. Tomcat rate limiting (web container layer)

3.1 Core configuration

In Tomcat 8.5 the maximum number of request-processing threads is configured in conf/server.xml:

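<!-- maxThreads: maximum number of request-processing threads -->
<!-- acceptCount: length of the waiting queue -->
<!-- (XML does not allow comments inside a tag, so they are placed here) -->
<Connector port="8080"
           protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="150"
           acceptCount="100"
           redirectPort="8443" />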

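How it works:

When the number of concurrent requests exceeds maxThreads (150 in the configuration above; Tomcat's default is 200), further requests wait in a queue
Once the waiting queue (acceptCount, default 100) is full, new requests are refused
The result is a built-in form of rate limiting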
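
The same "fixed workers plus bounded queue, then refuse" behaviour can be reproduced with a plain JDK thread pool. The sketch below is only an analogy: Tomcat's real queueing happens inside the connector and the operating system, and the class ContainerLimitDemo and its method are invented for this illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ContainerLimitDemo {

    /** Submits `requests` long-running tasks and returns how many were accepted. */
    static int countAccepted(int maxThreads, int acceptCount, int requests) {
        // core == max, so every accepted task either gets a thread or a queue slot
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(acceptCount),
                new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1);
        int accepted = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    // Each task blocks until released, simulating a slow request
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                    accepted++;
                } catch (RejectedExecutionException e) {
                    // All threads busy and the queue is full: the request is refused
                }
            }
        } finally {
            release.countDown();  // let the blocked tasks finish
            pool.shutdown();
        }
        return accepted;
    }
}
```

Calling countAccepted(4, 3, 20) accepts 7 of the 20 submissions: 4 are running, 3 are queued, and the rest are refused.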

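3.2 Tuning advice

Tip: maxThreads defaults to 200 (Tomcat 8.5.42; the example above sets it to 150), but bigger is not always better:

Each thread reserves its own stack, typically about 1 MB by default on a 64-bit JVM (native memory, outside the heap)
The more threads, the heavier the GC load
OS limits: commonly quoted ceilings are about 2000 threads per process on Windows and about 1000 on Linux; the real limit depends on system configuration (for example ulimit settings and available memory)

Rule of thumb: on an 8-core, 16 GB machine, maxThreads=400 is a reasonable setting. Beyond that, GC jitter starts to show.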

4. Single-application rate limiting (Guava RateLimiter)

4.1 Add the dependency

<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.1-jre</version>
</dependency>

4.2 Basic usage

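import com.google.common.util.concurrent.RateLimiter;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.time.LocalDateTime;
import java.util.concurrent.TimeUnit;

@Slf4j
@RestController
@RequestMapping("/limit")
public class LimitController {

    // Rate-limit policy: 2 requests per second
    private final RateLimiter limiter = RateLimiter.create(2.0);

    @GetMapping("/test1")
    public String testLimiter() {
        // Degrade if no token can be obtained within 500 ms
        boolean tryAcquire = limiter.tryAcquire(500, TimeUnit.MILLISECONDS);

        if (!tryAcquire) {
            log.warn("Falling back to degraded response, time {}", LocalDateTime.now());
            return "Too many people are queuing right now, please try again later!";
        }

        log.info("Token acquired, time {}", LocalDateTime.now());
        return "Request succeeded";
    }
}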

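4.3 Guava rate-limiting modes

Mode	Factory method	Characteristics
Smooth bursty (SmoothBursty)	`RateLimiter.create(permitsPerSecond)`	Runs at the full rate from the start
Smooth warm-up (SmoothWarmingUp)	`RateLimiter.create(permitsPerSecond, warmupPeriod, unit)`	Has a warm-up period after start and ramps up gradually to the configured rate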

4.4 Custom annotation + AOP (per-endpoint rate limiting)

Annotation:

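import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.TimeUnit;

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD})
@Documented
public @interface Limit {
    String key() default "";                 // resource key
    double permitsPerSecond();               // permits per second (QPS)
    long timeout();                          // maximum time to wait for a permit
    TimeUnit timeunit() default TimeUnit.MILLISECONDS;  // unit of timeout
    String msg() default "System busy, please try again later.";
}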

AOP aspect:

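import com.google.common.collect.Maps;
import com.google.common.util.concurrent.RateLimiter;
import lombok.extern.slf4j.Slf4j;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.stereotype.Component;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;

import javax.servlet.http.HttpServletResponse;
import java.lang.reflect.Method;
import java.util.Map;

@Slf4j
@Aspect
@Component
public class LimitAop {

    // One RateLimiter per resource key, created lazily on first use
    private final Map<String, RateLimiter> limitMap = Maps.newConcurrentMap();

    @Around("@annotation(com.example.limit.Limit)")
    public Object around(ProceedingJoinPoint joinPoint) throws Throwable {
        MethodSignature signature = (MethodSignature) joinPoint.getSignature();
        Method method = signature.getMethod();
        Limit limit = method.getAnnotation(Limit.class);

        if (limit != null) {
            // Fall back to the method signature when no key is set, so that methods
            // relying on the default "" do not all end up sharing one limiter
            String key = limit.key().isEmpty() ? method.toGenericString() : limit.key();
            RateLimiter rateLimiter = limitMap.computeIfAbsent(
                key, k -> RateLimiter.create(limit.permitsPerSecond())
            );

            // Try to take a token within the configured timeout
            boolean acquire = rateLimiter.tryAcquire(limit.timeout(), limit.timeunit());
            if (!acquire) {
                log.debug("Token bucket={}, failed to acquire a token", key);
                this.responseFail(limit.msg());
                // The response has already been written; skip the target method
                return null;
            }
        }
        return joinPoint.proceed();
    }

    // ResultData, ReturnCode and WebUtils are project-specific helpers (not shown in this article)
    private void responseFail(String msg) {
        HttpServletResponse response = ((ServletRequestAttributes)
            RequestContextHolder.getRequestAttributes()).getResponse();
        ResultData<Object> resultData = ResultData.fail(ReturnCode.LIMIT_ERROR.getCode(), msg);
        WebUtils.writeJson(response, resultData);
    }
}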

Controller:

@GetMapping("/test2")
@Limit(key = "limit2", permitsPerSecond = 1, timeout = 500,
       timeunit = TimeUnit.MILLISECONDS, msg = "Queued, please wait")
public String limit2() {
    log.info("limit2 acquired a token");
    return "ok";
}

4.5 Per-IP rate limiting

Idea: one RateLimiter per IP, cached in a LoadingCache.

Rate-limiting utility class:

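import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.RateLimiter;

import javax.servlet.http.HttpServletRequest;
import java.lang.reflect.Method;
import java.util.concurrent.TimeUnit;

// Note: the RateLimit annotation used here exposes name() and perSecond(). It is not
// shown in this article and is different from the RateLimit annotation in section 5.4.
public class RateLimitUtil {
    /** Cache: at most 1000 keys, each expiring 1 day after it was created */
    private static final LoadingCache<String, RateLimiter> limitCaches = CacheBuilder.newBuilder()
            .maximumSize(1000)
            .expireAfterWrite(1, TimeUnit.DAYS)
            .build(new CacheLoader<String, RateLimiter>() {
                @Override
                public RateLimiter load(String key) {
                    double perSecond = RateLimitUtil.getCacheKeyPerSecond(key);
                    return RateLimiter.create(perSecond);
                }
            });

    /** Returns the limiter for a cache key, creating it on first use */
    public static RateLimiter getRateLimiter(String cacheKey) {
        return limitCaches.getUnchecked(cacheKey);
    }

    /** Unique key = resourceName:ip:perSecond */
    public static String generateCacheKey(Method method, HttpServletRequest request) {
        RateLimit rateLimit = method.getAnnotation(RateLimit.class);
        StringBuilder cacheKey = new StringBuilder(rateLimit.name() + ":");
        cacheKey.append(getIpAddr(request) + ":");
        cacheKey.append(rateLimit.perSecond());
        return cacheKey.toString();
    }

    public static double getCacheKeyPerSecond(String cacheKey) {
        // perSecond is the last segment; IPv6 addresses contain ':' themselves,
        // so a fixed split index would pick the wrong part
        return Double.parseDouble(cacheKey.substring(cacheKey.lastIndexOf(':') + 1));
    }

    public static String getIpAddr(HttpServletRequest request) {
        // These headers are set by the client or by proxies; only trust them
        // when the application sits behind a proxy you control
        String ip = request.getHeader("x-forwarded-for");
        if (ip == null || ip.length() == 0 || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getHeader("Proxy-Client-IP");
        }
        if (ip == null || ip.length() == 0 || "unknown".equalsIgnoreCase(ip)) {
            ip = request.getRemoteAddr();
        }
        // With several proxies the header is a comma-separated list; take the first IP
        if (ip != null && ip.indexOf(',') > 0) {
            ip = ip.substring(0, ip.indexOf(',')).trim();
        }
        return ip;
    }
}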

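5. Distributed rate limiting (Redis + Lua token bucket)

5.1 Why Redis-based limiting is needed

Guava RateLimiter lives inside a single process. In a cluster every node has its own independent limiter: with 3 nodes each configured for 100 QPS, the cluster as a whole may actually admit 300 QPS.

Redis-based limiting means the whole cluster shares a single rate-limit counter.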
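
The arithmetic behind the 300 QPS figure can be checked with a small simulation. The sketch below counts how many requests get through within one second when each node keeps its own counter, compared with all nodes sharing one counter (the role Redis plays). The class LocalVsGlobalLimit and its method names are invented for this illustration, and a simple per-second counter stands in for the real limiter.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LocalVsGlobalLimit {

    /** Each node counts on its own: every node admits up to limitPerNode requests. */
    static int admittedWithLocalCounters(int nodes, int limitPerNode, int requestsPerNode) {
        int admitted = 0;
        for (int n = 0; n < nodes; n++) {
            AtomicInteger local = new AtomicInteger();  // one counter per node
            for (int r = 0; r < requestsPerNode; r++) {
                if (local.incrementAndGet() <= limitPerNode) {
                    admitted++;
                }
            }
        }
        return admitted;
    }

    /** All nodes increment the same counter, so the limit applies to the cluster total. */
    static int admittedWithSharedCounter(int nodes, int globalLimit, int requestsPerNode) {
        AtomicInteger shared = new AtomicInteger();  // one counter for the whole cluster
        int admitted = 0;
        for (int n = 0; n < nodes; n++) {
            for (int r = 0; r < requestsPerNode; r++) {
                if (shared.incrementAndGet() <= globalLimit) {
                    admitted++;
                }
            }
        }
        return admitted;
    }
}
```

With 3 nodes, a limit of 100 and 200 incoming requests per node, the local version admits 300 requests in total while the shared version admits 100.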

5.2 Token bucket algorithm (Lua implementation)

-- KEYS[1]: unique ID of the endpoint
-- ARGV[1]: max_token (bucket capacity)
-- ARGV[2]: token_rate (tokens produced per second)
-- ARGV[3]: current_time (current timestamp in milliseconds)

local ratelimit_info = redis.call('HMGET', KEYS[1], 'last_time', 'current_token')
local last_time = tonumber(ratelimit_info[1])
local current_token = tonumber(ratelimit_info[2])
local max_token = tonumber(ARGV[1])
local token_rate = tonumber(ARGV[2])
local current_time = tonumber(ARGV[3])

if current_token == nil or last_time == nil then
    -- First request for this key: start with a full bucket
    current_token = max_token
    last_time = current_time
else
    -- Add token_rate tokens for every full 1000 ms elapsed since last_time.
    -- last_time only moves forward by whole periods, so steady traffic
    -- cannot keep postponing the refill.
    local periods = math.floor((current_time - last_time) / 1000)
    if periods > 0 then
        -- Cap at the bucket capacity
        current_token = math.min(max_token, current_token + periods * token_rate)
        last_time = last_time + periods * 1000
    end
end

local result = 0
if current_token > 0 then
    -- A token is available: take it
    result = 1
    current_token = current_token - 1
end

redis.call('HMSET', KEYS[1], 'last_time', last_time, 'current_token', current_token)
return result

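Key points:

Each endpoint's key (for example uri:/api/order/create) maps to one Redis hash
The hash stores last_time (the timestamp the script last recorded) and current_token (the number of tokens currently in the bucket)
token_rate tokens are added for every 1000 ms that elapse, up to max_token
The script returns 0 when no token is available; otherwise it returns 1 and decrements the token count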
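
The bookkeeping described in these points can be tried out without a Redis server. The following Java sketch is a simplified in-memory model of the same arithmetic (refill per elapsed 1000 ms, cap at capacity, take one token), not a line-by-line port of the script. The class StepRefillBucket is invented for this illustration, and the clock is passed in so the result is reproducible.

```java
import java.util.HashMap;
import java.util.Map;

final class StepRefillBucket {
    // Per key: [0] = last_time in ms, [1] = current_token
    private final Map<String, long[]> state = new HashMap<>();

    /** Returns true if a token was taken for `key` at time nowMs. */
    synchronized boolean tryAcquire(String key, long maxToken, long tokenRate, long nowMs) {
        long[] s = state.get(key);
        if (s == null) {
            // First request for this key: start with a full bucket
            s = new long[] {nowMs, maxToken};
            state.put(key, s);
        } else {
            // Add tokenRate tokens for every full 1000 ms elapsed, capped at maxToken
            long periods = (nowMs - s[0]) / 1000;
            if (periods > 0) {
                s[1] = Math.min(maxToken, s[1] + periods * tokenRate);
                s[0] += periods * 1000;
            }
        }
        if (s[1] > 0) {
            s[1]--;
            return true;
        }
        return false;
    }
}
```

With a capacity of 3 and a rate of 1 token per second, the first three calls at t=0 succeed and the fourth fails. One more call succeeds at t=1000, and after a long idle period the bucket is full again but never holds more than 3 tokens.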

5.3 Spring Boot integration

Redis serialization configuration:

@Bean(value = "redisTemplate")
@Primary
public RedisTemplate<String, Object> redisTemplate(RedisConnectionFactory factory) {
    RedisTemplate<String, Object> template = new RedisTemplate<>();
    template.setConnectionFactory(factory);
    
    ObjectMapper objectMapper = new ObjectMapper();
    objectMapper.setSerializationInclusion(JsonInclude.Include.NON_NULL);
    
    StringRedisSerializer stringRedisSerializer = new StringRedisSerializer();
    Jackson2JsonRedisSerializer<Object> jsonRedisSerializer = new Jackson2JsonRedisSerializer<>(Object.class);
    jsonRedisSerializer.setObjectMapper(objectMapper);
    
    template.setKeySerializer(stringRedisSerializer);
    template.setHashKeySerializer(stringRedisSerializer);
    template.setValueSerializer(jsonRedisSerializer);
    template.setHashValueSerializer(jsonRedisSerializer);
    return template;
}

Rate-limiting utility class:

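import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;

import java.util.Collections;
import java.util.List;

public class RedisLimiterUtils {

    // ApplicationContextUtils is a project-specific holder for the Spring context (not shown here)
    private static StringRedisTemplate stringRedisTemplate =
        ApplicationContextUtils.applicationContext.getBean(StringRedisTemplate.class);

    // LUA_SCRIPT is expected to hold the text of the Lua script from section 5.2;
    // its definition is not shown in this article

    /**
     * Tries to take a token.
     * @param key  request ID (for example the endpoint URI)
     * @param max  bucket capacity
     * @param rate tokens generated per second
     * @return true if a token was obtained, false if the request is rate limited
     */
    public static boolean tryAcquire(String key, int max, int rate) {
        List<String> keyList = Collections.singletonList(key);

        DefaultRedisScript<Long> script = new DefaultRedisScript<>();
        script.setResultType(Long.class);
        script.setScriptText(LUA_SCRIPT);

        // Arguments are passed as strings and converted with tonumber() in the script
        Long result = stringRedisTemplate.execute(
            script, keyList,
            Integer.toString(max),
            Integer.toString(rate),
            Long.toString(System.currentTimeMillis())
        );

        return Long.valueOf(1).equals(result);
    }
}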

5.4 Annotation + interceptor

Annotation:

@Inherited
@Target({ElementType.TYPE, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
public @interface RateLimit {
    int capacity() default 100;   // bucket capacity
    int rate() default 10;         // tokens per second
}

Interceptor:

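import org.springframework.core.annotation.AnnotationUtils;
import org.springframework.stereotype.Component;
import org.springframework.web.method.HandlerMethod;
import org.springframework.web.servlet.HandlerInterceptor;

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.lang.reflect.Method;
import java.util.Objects;

@Component
public class RateLimiterIntercept implements HandlerInterceptor {

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response,
                             Object handler) throws Exception {
        if (handler instanceof HandlerMethod) {
            HandlerMethod handlerMethod = (HandlerMethod) handler;
            Method method = handlerMethod.getMethod();

            // Look for the annotation on the method first, then on the controller class
            RateLimit rateLimit = AnnotationUtils.findAnnotation(method, RateLimit.class);
            if (Objects.isNull(rateLimit)) {
                rateLimit = AnnotationUtils.findAnnotation(
                    handlerMethod.getBean().getClass(), RateLimit.class);
            }

            // Not annotated: no rate limiting for this handler
            if (Objects.isNull(rateLimit)) return true;

            // Take a token; throw if none is available
            // (TimeOutException is a project-specific exception, not shown here)
            if (!RedisLimiterUtils.tryAcquire(
                    request.getRequestURI(),
                    rateLimit.capacity(),
                    rateLimit.rate())) {
                throw new TimeOutException("System busy, please try again later");
            }
        }
        return true;
    }
}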

Register the interceptor:

@Configuration
public class WebConfigurer implements WebMvcConfigurer {
    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(new RateLimiterIntercept())
                .addPathPatterns("/**");
    }
}

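6. Comparison of the four approaches

Dimension	Nginx	Tomcat	Guava	Redis + Lua
Granularity	IP / server	Container threads	Endpoint	Endpoint / user / global
Layer	Edge	Container	Single application	Distributed
Performance	Very high	High	High	Medium (network I/O)
Cross-node	Yes (for all backends behind one Nginx instance)	No	No	Yes
Implementation cost	Configuration	Configuration	Dependency + AOP	Lua + utility class
Recommended scenario	Public entry point	Single application	Single application	Distributed cluster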

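7. Common interview questions

Q: What is the difference between a token bucket and a leaky bucket? A:

Token bucket: tokens are produced at a constant rate and each request consumes one. Bursts are allowed (when the bucket is full, several requests can be served at once)
Leaky bucket: requests (the water) may flow into the bucket at any rate but leave it at a constant rate. The output is forcibly smoothed and bursts are not allowed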
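
The contrast is easiest to see when both buckets receive the same burst. Below is a minimal Java sketch under simplifying assumptions: the token bucket refills continuously and starts full, and the leaky bucket is modelled as a shaper that assigns each request a departure time. The class names TokenBucket and LeakyBucket are invented for this illustration, and time is passed in explicitly.

```java
/** Token bucket: admits a burst up to `capacity`, then follows the refill rate. */
final class TokenBucket {
    private final long capacityMilli;  // capacity in thousandths of a token
    private final long ratePerSecond;  // also the milli-tokens added per millisecond
    private long tokensMilli;
    private long lastMs;

    TokenBucket(int capacity, int ratePerSecond) {
        this.capacityMilli = capacity * 1000L;
        this.ratePerSecond = ratePerSecond;
        this.tokensMilli = capacityMilli;  // the bucket starts full
    }

    boolean tryAcquire(long nowMs) {
        long elapsedMs = Math.max(0, nowMs - lastMs);
        tokensMilli = Math.min(capacityMilli, tokensMilli + elapsedMs * ratePerSecond);
        lastMs = nowMs;
        if (tokensMilli >= 1000) {
            tokensMilli -= 1000;
            return true;
        }
        return false;
    }
}

/** Leaky bucket: requests leave one per interval, however fast they arrive. */
final class LeakyBucket {
    private final int capacity;     // maximum number of requests held in the bucket
    private final long intervalMs;  // time between two departures
    private long nextFreeMs;        // earliest time the next request may leave

    LeakyBucket(int capacity, int ratePerSecond) {
        this.capacity = capacity;
        this.intervalMs = 1000L / ratePerSecond;
    }

    /** Returns the departure time in ms, or -1 if the bucket is full. */
    long offer(long nowMs) {
        long departMs = Math.max(nowMs, nextFreeMs);
        long ahead = (departMs - nowMs) / intervalMs;  // requests already in the bucket
        if (ahead >= capacity) {
            return -1;
        }
        nextFreeMs = departMs + intervalMs;
        return departMs;
    }
}
```

With a capacity of 3 and a rate of 10 per second, four requests at t=0 give the token bucket three immediate admissions and one rejection. The leaky bucket also takes three, but releases them at 0 ms, 100 ms and 200 ms.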

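Q: Why use a Lua script for Redis rate limiting? A: Redis executes commands on a single thread and runs a Lua script atomically, so the whole rate-limit decision completes within one round trip (RTT). If Java code made several separate calls instead (HMGET → decision logic → HMSET), another client could interleave between them and corrupt the count.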
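
The interleaving problem can be replayed deterministically in a few lines. In the sketch below, two clients compete for a single remaining token. In the first method each client reads, decides and writes back as separate steps, with the second read happening before the first write. In the second method each client's read-decide-write runs as one indivisible step, which is what the Lua script provides. The class and method names are invented for this illustration, and plain local variables stand in for Redis.

```java
final class CheckThenActRace {

    /** Read, decide and write are separate calls, and the two clients interleave. */
    static int admittedWithSeparateCalls(int initialTokens) {
        int stored = initialTokens;   // value held by the store
        int readA = stored;           // client A reads (HMGET)
        int readB = stored;           // client B reads before A has written back
        int admitted = 0;
        if (readA > 0) {              // A decides and writes (HMSET)
            admitted++;
            stored = readA - 1;
        }
        if (readB > 0) {              // B decides on its stale read and overwrites
            admitted++;
            stored = readB - 1;
        }
        return admitted;
    }

    /** Each client's read-decide-write runs as one step, with nothing in between. */
    static int admittedWithAtomicStep(int initialTokens) {
        int stored = initialTokens;
        int admitted = 0;
        for (int client = 0; client < 2; client++) {
            if (stored > 0) {
                admitted++;
                stored--;
            }
        }
        return admitted;
    }
}
```

With one token left, the interleaved version admits both clients, while the atomic version admits only one.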

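Q: How do you choose between Guava RateLimiter and Redis rate limiting? A:

Single instance / performance sensitive: Guava (avoids Redis network I/O)
Multiple instances / strict global limit: Redis (enforces the limit on the cluster-wide total)

Q: How should Nginx's burst and nodelay be understood? A: burst=10 nodelay allows up to 10 requests above the rate to arrive at once and serves them immediately instead of queueing them. It is roughly equivalent to a token bucket that starts with 10 tokens: once they are used up, further requests are rejected until the next token is generated.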

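8. Summary

Layered limiting: CDN → Nginx → Tomcat → Gateway → single application → distributed Redis
Nginx: the first choice for edge limiting, simple to configure and fast
Tomcat: tuning maxThreads is a basic skill
Guava: annotation + AOP is the most elegant option inside a single application
Redis + Lua: the go-to choice for distributed deployments, providing atomicity and a globally consistent count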