Recently, a project that had been in production for a long time suddenly started logging the following two errors at a high rate:

CommunicationsException, druid version 1.1.16, jdbcUrl : jdbc:mysql://rm-xxxxxxxxxxx.mysql.rds.aliyuncs.com:3306/db_xxxxx_prod?useSSL=false&useAffectedRows=true&useUnicode=true&characterEncoding=UTF-8, testWhileIdle true, idle millis 59435, minIdle 1, poolingCount 1, timeBetweenEvictionRunsMillis 60000, lastValidIdleMillis 59435, driver com.mysql.cj.jdbc.Driver, exceptionSorter com.alibaba.druid.pool.vendor.MySqlExceptionSorter
com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure

The last packet successfully received from the server was 59,436 milliseconds ago. The last packet sent successfully to the server was 59,437 milliseconds ago.

Looking through the logs, the errors started around noon. Tracing back to the first error and examining what was running near that point in time, it turned out to be a scheduled job that had been deployed long ago and rarely used. Its business logic is simple: read the attachment from an email, parse the data, and write it to the database. Because this error caused many other database operations to fail, a large volume of error logs was generated.

After some digging through the logs, I found an enormous SQL statement: a batch UPDATE. How long, exactly? The attachment contained over 5,000 rows of data, and concatenating all of them produced a statement of roughly 1.5 MB... which is pretty embarrassing: it was never split into smaller batches.

update t_xxxxx_xxxx a join (
    select ? as id, ? as product_code, ? as 'status', ? as color_cn,
           ? as color_en, ? as style, ? as source, ? as confirm_time
    UNION
    select ? as id, ? as product_code, ? as 'status', ? as color_cn,
           ? as color_en, ? as style, ? as source, ? as confirm_time
    UNION
    select ? as id, ? as product_code, ? as 'status', ? as color_cn,
           ? as color_en, ? as style, ? as source, ? as confirm_time
    UNION
    select ? as id, ? as product_code, ? as 'status', ? as color_cn,
           ? as color_en, ? as style, ? as source, ? as confirm_time
    ...

In summary, my initial conclusion is that the SQL was so long that transmitting and parsing it kept the connection occupied for too long, the pool reclaimed the connection, and the driver then threw the communications link failure at the lower layer. So when you build SQL by concatenating in a for loop, always cap the batch size: you never know how someone will use that code in the future, or how large the input will eventually be.
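The fix described above can be sketched as splitting the 5,000+ rows into fixed-size chunks and issuing one UPDATE per chunk. This is a minimal, self-contained sketch; the partition helper and batch size of 500 are my own illustrative choices, not from the original code:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchUpdateDemo {

    // Split a large list into fixed-size chunks so each generated
    // UPDATE statement stays small enough to transmit and parse quickly.
    static <T> List<List<T>> partition(List<T> rows, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            batches.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Simulate an attachment with 5,321 parsed rows.
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 5321; i++) rows.add(i);

        // Instead of one 1.5 MB statement, execute one UPDATE per batch.
        List<List<Integer>> batches = partition(rows, 500);
        System.out.println(batches.size());         // 11 batches
        System.out.println(batches.get(10).size()); // final batch holds 321 rows
    }
}
```

Each batch would then be handed to whatever DAO method builds the UPDATE, keeping every statement bounded regardless of input size.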

Of course, you could also consider tuning the pool parameters to work around this, but generally the defaults are fine (they are already reasonable timeout values), so it is better to fix it in your own application code.
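For reference, the pool settings mentioned in the error log above map to Druid configuration properties like the following. This is only a sketch of what tuning would touch; the values shown are illustrative, not recommendations:

```properties
# Druid pool settings (property names match the log output above; values are illustrative)
timeBetweenEvictionRunsMillis=60000   # how often the idle-connection evictor runs
minEvictableIdleTimeMillis=300000     # how long a connection may sit idle before eviction
testWhileIdle=true                    # validate idle connections during evictor runs
validationQuery=SELECT 1              # query used to validate a connection
```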
