亲宝软件园·资讯

展开

排查Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl

brucelwl 人气:0

整个问题排查过程中基本把Hikari连接池源码看完了,也发现了Hikari内部的一些实现细节,更多细节参见:Hikari 数据库连接池内部源码实现细节

问题描述及排查

最近发布项目测试环境后,发现后台经常隔28分钟时间就打印mysql 连接已经被关闭警告。(应用程序每28分钟访问一次数据库)

连接池配置如下:

spring.datasource.username=${name}
spring.datasource.password=${pwd}
spring.datasource.url=jdbc:mysql://${ip}:3306/${dbname}?useUnicode=true&characterEncoding=utf-8&useSSL=false&serverTimezone=Asia/Shanghai

spring.datasource.type=com.zaxxer.hikari.HikariDataSource
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.hikari.minimum-idle=10
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.connection-test-query=SELECT 1 FROM DUAL
spring.datasource.hikari.validation-timeout=10000
spring.datasource.hikari.auto-commit=true

警告信息如下:

2021-03-19 13:25:58.486  WARN 21806 --- [scheduling-1] com.zaxxer.hikari.pool.PoolBase          : 
slave - Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl@2e2ce118 (No operations allowed after connection closed.). 
Possibly consider using a shorter maxLifetime value.

虽然这只是一个警告,并不影响程序执行,但还是很好奇想要排查一下具体原因。

warn信息的描述,程序通过已经关闭的连接来访问数据数据库,这是不允许的操作,并且警告中提醒减小maxLifetime的值来解决这个问题。

看到这个警告之后,笔者首先想到的就是mysql 8小时主动断开问题:如果hikari的maxLifetime的值超过了mysql默认的wait_timeout设置的时间28800s (8小时),实际上mysql已经主动关闭了连接。

但示例中并没有配置hikari的maxLifetime选项,采用的是默认。

查看hikari的源码发现了如下几个默认配置

com.zaxxer.hikari.HikariConfig源码中的类变量

private static final long CONNECTION_TIMEOUT = SECONDS.toMillis(30);  30s
private static final long VALIDATION_TIMEOUT = SECONDS.toMillis(5);   5s
private static final long IDLE_TIMEOUT = MINUTES.toMillis(10);        20分钟
private static final long MAX_LIFETIME = MINUTES.toMillis(30);        30分钟
private static final int DEFAULT_POOL_SIZE = 10;

hikari中默认连接存活的最大时间是30分钟,小于mysql默认的wait_timeout28800s (8小时),按道理不会是服务端主动主动关闭连接导致。

那会不会是mysqlwait_timeout被修改过??? 通过如下语句查询发现并没有被修改。

SHOW VARIABLES LIKE '%timeout%'

当排除这个常规问题后,笔者尝试搜索了有没有其它可能导致这个警告,最终还是有一个小发现:

搜索到的案例是:虽然mysql服务端通过SHOW VARIABLES LIKE '%timeout%'查出来的时间28800s,但是实际上,/etc/my.cnf中配置的超时时间更小,导致mysql服务端主动关闭连接。

但是经过和dba确认,也排除了这个情况。

既然以上两种情况都排除了,那只能深入源码,在测试环境远程断点调试。

先搜索下警告打印的位置,只有一处在isConnectionAlive方法中.

源码如下: com.zaxxer.hikari.pool.PoolBase#isConnectionAlive

boolean isConnectionAlive(final Connection connection)
{
  try {
	 try {
		setNetworkTimeout(connection, validationTimeout);

		final int validationSeconds = (int) Math.max(1000L, validationTimeout) / 1000;

		if (isUseJdbc4Validation) {
		   return connection.isValid(validationSeconds);
		}

		try (Statement statement = connection.createStatement()) {
		   if (isNetworkTimeoutSupported != TRUE) {
			  setQueryTimeout(statement, validationSeconds);
		   }

		   statement.execute(config.getConnectionTestQuery());
		}
	 }
	 finally {
		setNetworkTimeout(connection, networkTimeout);

		if (isIsolateInternalQueries && !isAutoCommit) {
		   connection.rollback();
		}
	 }

	 return true;
  }
  catch (Exception e) {
	 lastConnectionFailure.set(e);
	 logger.warn("{} - Failed to validate connection {} ({}). Possibly consider using a shorter maxLifetime value.",
				 poolName, connection, e.getMessage());
	 return false;
  }
}

该方法用于检测connection的有效性,catch到异常后,只是简单打印文章开头所提的异常警告信息,并返回false,表示连接失效。

然后会继续从连接池中获取有效的连接。

获取连接的代码如下: com.zaxxer.hikari.pool.HikariPool#getConnection(long hardTimeout)

public Connection getConnection(final long hardTimeout) throws SQLException
{
  suspendResumeLock.acquire();
  final long startTime = currentTime();

  try {
	 long timeout = hardTimeout;
	 do {
		PoolEntry poolEntry = connectionBag.borrow(timeout, MILLISECONDS);
		if (poolEntry == null) {
		   break; // We timed out... break and throw exception
		}

		final long now = currentTime();
		if (poolEntry.isMarkedEvicted() || (elapsedMillis(poolEntry.lastAccessed, now) > aliveBypassWindowMs && !isConnectionAlive(poolEntry.connection))) {
		   closeConnection(poolEntry, poolEntry.isMarkedEvicted() ? EVICTED_CONNECTION_MESSAGE : DEAD_CONNECTION_MESSAGE);
		   timeout = hardTimeout - elapsedMillis(startTime);
		}
		else {
		   metricsTracker.recordBorrowStats(poolEntry, startTime);
		   return poolEntry.createProxyConnection(leakTaskFactory.schedule(poolEntry), now);
		}
	 } while (timeout > 0L);

	 metricsTracker.recordBorrowTimeoutStats(startTime);
	 throw createTimeoutException(startTime);
  }
  catch (InterruptedException e) {
	 Thread.currentThread().interrupt();
	 throw new SQLException(poolName + " - Interrupted during connection acquisition", e);
  }
  finally {
	 suspendResumeLock.release();
  }
}

为了打印更多的异常栈信息,笔者修改了isConnectionAlive方法源码,在catch到异常后,打印整个异常栈信息,最终得到如下日志:

2021-03-19 13:25:58.486  WARN 21806 --- [scheduling-1] com.zaxxer.hikari.pool.PoolBase          : slave - Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl@2e2ce118 (No operations allowed after connection closed.). Possibly consider using a shorter maxLifetime value.
2021-03-19 13:25:58.489 ERROR 21806 --- [scheduling-1] com.zaxxer.hikari.pool.PoolBase          : poolBase:

java.sql.SQLNonTransientConnectionException: No operations allowed after connection closed.
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:110) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:89) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:63) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:73) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:73) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.jdbc.ConnectionImpl.setNetworkTimeout(ConnectionImpl.java:2495) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:563) ~[classes!/:na]
    at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:173) ~[classes!/:na]
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186) [HikariCP-3.4.5.jar!/:na]
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162) [HikariCP-3.4.5.jar!/:na]
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:100) [HikariCP-3.4.5.jar!/:na]
    at org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource.getConnection(AbstractRoutingDataSource.java:194) [spring-jdbc-5.3.4.jar!/:5.3.4]
    at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:158) [spring-jdbc-5.3.4.jar!/:5.3.4]
    at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:116) [spring-jdbc-5.3.4.jar!/:5.3.4]
    at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:79) [spring-jdbc-5.3.4.jar!/:5.3.4]
    .........................
Caused by: com.mysql.cj.exceptions.ConnectionIsClosedException: No operations allowed after connection closed.
    at sun.reflect.GeneratedConstructorAccessor32.newInstance(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_91]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_91]
    at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.NativeSession.checkClosed(NativeSession.java:1171) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:573) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.jdbc.ConnectionImpl.setNetworkTimeout(ConnectionImpl.java:2491) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    ... 44 common frames omitted
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure

The last packet successfully received from the server was 1,758,048 milliseconds ago. The last packet sent successfully to the server was 1,758,049 milliseconds ago.
    at sun.reflect.GeneratedConstructorAccessor31.newInstance(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_91]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_91]
    at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.exceptions.ExceptionFactory.createCommunicationsException(ExceptionFactory.java:167) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.protocol.a.NativeProtocol.send(NativeProtocol.java:577) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.protocol.a.NativeProtocol.sendCommand(NativeProtocol.java:632) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.protocol.a.NativeProtocol.sendQueryPacket(NativeProtocol.java:948) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.protocol.a.NativeProtocol.sendQueryString(NativeProtocol.java:894) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.NativeSession.execSQL(NativeSession.java:1073) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:722) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:646) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:169) ~[classes!/:na]
    ... 42 common frames omitted
Caused by: java.net.SocketException: Broken pipe
    at java.net.SocketOutputStream.socketWrite0(Native Method) ~[na:1.8.0_91]
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) ~[na:1.8.0_91]
    at java.net.SocketOutputStream.write(SocketOutputStream.java:153) ~[na:1.8.0_91]
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_91]
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_91]
    at com.mysql.cj.protocol.a.SimplePacketSender.send(SimplePacketSender.java:55) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.protocol.a.TimeTrackingPacketSender.send(TimeTrackingPacketSender.java:50) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    at com.mysql.cj.protocol.a.NativeProtocol.send(NativeProtocol.java:568) ~[mysql-connector-java-8.0.23.jar!/:8.0.23]
    ... 49 common frames omitted

从最后的Caused by: java.net.SocketException: Broken pipe 似乎只能看出在发送数据时socket已经被断开。

于是想到是不是socketconnectionTimesocketTimeout太短,网络较差导致发送sql请求时超时???

又是一通debug之后,深入到mysql-connector-java客户端源码中,最终发现了如下两段重要的代码,用于初始化mysql的连接。

其中成员变量propertySet则保存了全部的连接参数以及数据连接配置参数。

com.mysql.cj.NativeSession#connect源码

com.mysql.cj.protocol.StandardSocketFactory#configureSocket源码

com.mysql.cj.CoreSession#propertySet中保存的连接配置属性如下:

result = {Properties@10050}  size = 167
 "useCompression" -> "false"
 "maxRows" -> "-1"
 "tcpTrafficClass" -> "0"
 "allowNanAndInf" -> "false"
 "databaseTerm" -> "CATALOG"
 "sendFractionalSeconds" -> "true"
 "loadBalanceBlocklistTimeout" -> "0"
 "tcpKeepAlive" -> "true"
 "allowReplicaDownConnections" -> "false"
 "maxQuerySizeToLog" -> "2048"
 "verifyServerCertificate" -> "false"
 "cacheServerConfiguration" -> "false"
 "dontCheckOnDuplicateKeyUpdateInSQL" -> "false"
 "paranoid" -> "false"
 "allowSourceDownConnections" -> "false"
 "xdevapi.ssl-truststore-type" -> "JKS"
 "fallbackToSystemTrustStore" -> "true"
 "enableQueryTimeouts" -> "true"
 "blobSendChunkSize" -> "1048576"
 "useNanosForElapsedTime" -> "false"
 "readOnlyPropagatesToServer" -> "true"
 "profilerEventHandler" -> "com.mysql.cj.log.LoggingProfilerEventHandler"
 "secondsBeforeRetrySource" -> "30"
 "parseInfoCacheFactory" -> "com.mysql.cj.PerConnectionLRUFactory"
 "includeThreadNamesAsStatementComment" -> "false"
 "xdevapi.compression" -> "PREFERRED"
 "socketTimeout" -> "5000"
 "socksProxyPort" -> "1080"
 "requireSSL" -> "false"
 "largeRowSizeThreshold" -> "2048"
 "xdevapi.fallback-to-system-keystore" -> "true"
 "allowMultiQueries" -> "false"
 "rewriteBatchedStatements" -> "false"
 "enablePacketDebug" -> "false"
 "allowPublicKeyRetrieval" -> "false"
 "noDatetimeStringSync" -> "false"
 "includeThreadDumpInDeadlockExceptions" -> "false"
 "enableEscapeProcessing" -> "true"
 "failOverReadOnly" -> "true"
 "readFromSourceWhenNoReplicas" -> "false"
 "clientCertificateKeyStoreType" -> "JKS"
 "overrideSupportsIntegrityEnhancementFacility" -> "false"
 "disconnectOnExpiredPasswords" -> "true"
 "clobberStreamingResults" -> "false"
 "trustCertificateKeyStoreType" -> "JKS"
 "jdbcCompliantTruncation" -> "false"
 "useLocalTransactionState" -> "false"
 "cacheCallableStmts" -> "false"
 "rollbackOnPooledClose" -> "true"
 "cacheResultSetMetadata" -> "false"
 "loadBalanceValidateConnectionOnSwapServer" -> "false"
 "selfDestructOnPingMaxOperations" -> "0"
 "elideSetAutoCommits" -> "false"
 "loadBalanceHostRemovalGracePeriod" -> "15000"
 "serverConfigCacheFactory" -> "com.mysql.cj.util.PerVmServerConfigCacheFactory"
 "cacheDefaultTimeZone" -> "true"
 "allowLoadLocalInfile" -> "false"
 "loadBalanceAutoCommitStatementThreshold" -> "0"
 "getProceduresReturnsFunctions" -> "true"
 "forceConnectionTimeZoneToSession" -> "false"
 "useInformationSchema" -> "false"
 "reportMetricsIntervalMillis" -> "30000"
 "alwaysSendSetIsolation" -> "true"
 "tinyInt1isBit" -> "true"
 "xdevapi.ssl-keystore-type" -> "JKS"
 "xdevapi.auth" -> "PLAIN"
 "processEscapeCodesForPrepStmts" -> "true"
 "emptyStringsConvertToZero" -> "true"
 "sslMode" -> "DISABLED"
 "xdevapi.connect-timeout" -> "10000"
 "zeroDateTimeBehavior" -> "EXCEPTION"
 "sendFractionalSecondsForTime" -> "true"
 "maxReconnects" -> "3"
 "autoDeserialize" -> "false"
 "callableStmtCacheSize" -> "100"
 "prepStmtCacheSqlLimit" -> "256"
 "resultSetSizeThreshold" -> "100"
 "useSSL" -> "false"
 "metadataCacheSize" -> "50"
 "useHostsInPrivileges" -> "true"
 "tcpRcvBuf" -> "0"
 "dontTrackOpenResources" -> "false"
 "autoGenerateTestcaseScript" -> "false"
 "characterEncoding" -> "utf-8"
 "xdevapi.fallback-to-system-truststore" -> "true"
 "nullDatabaseMeansCurrent" -> "false"
 "useLocalSessionState" -> "false"
 "detectCustomCollations" -> "false"
 "useOldAliasMetadataBehavior" -> "false"
 "autoReconnect" -> "false"
 "autoReconnectForPools" -> "false"
 "traceProtocol" -> "false"
 "maxAllowedPacket" -> "16777216"
 "populateInsertRowWithDefaultValues" -> "false"
 "useUnbufferedInput" -> "true"
 "ignoreNonTxTables" -> "false"
 "useAffectedRows" -> "false"
 "ha.loadBalanceStrategy" -> "random"
 "loadBalancePingTimeout" -> "0"
 "compensateOnDuplicateKeyUpdateCounts" -> "false"

最后发现在验证connection有效性时,connectionTimesocketTimeout设置的就是数据源中配置的connection-timeoutvalidation-timeout,并通过ping命令检测发现,网络并不是很慢,所以网络请求超时这个可能也排除。

调试过程中发现mysql客户端和服务端之间Socket通信,心跳是通过TCP协议中默认的tcpKeepAlive参数来维护心跳的。

通过搜索得知,linux中默认是2小时。

会不会服务器tcp_keepalive_time时间被修改过,但因为网络缓慢,心跳未接收到,导致服务器关闭了连接???

可以通过linux的如下命令查看

cat /proc/sys/net/ipv4/tcp_keepalive_time

在笔者的应用服务器上发现这个时间确实被改成了较短的180s,那么mysql所在的机器,很可能也是被改过。

于是找到dba查看该项的配置。同样是180s,但这个时间也不至于让心跳超时啊!

最终原因: 好在dba询问了一下为什么要查这个参数,然后笔者将警告信息以及排查的过程描述了一下,结果dba那边竟然告诉我有定时任务, 删除4分钟内没有请求的连接,而我的程序访问数据库时间间隔是28分钟一次,所以早就在mysql服务端关闭了(捂脸哭的表情)

那如何让这个告警不打印呢?

既然数据库服务端会在4分钟后删除没有请求的连接,我们可以减小Hikari中maxLifetime的值小于该值,这样达到maxLifetime时Hikari连接池会通过软驱逐的方式来主动关闭该连接,下次就不会获取到因超时而关闭的连接,也就不会出现警告了。

彩蛋:Hikari中max-lifetime参数作用及实现

该参数就是用于控制连接池中一个connection的最大存活时间。

Hikari内部有一个ScheduledExecutorService对象houseKeepingExecutorService,每创建一个连接,就会被封装成一个PoolEntry对象,然后放在定时任务中,定时时间就是设置的max-lifetime。只要到达这个时间就会采取软驱逐的方式从池中移除。

除此之外,houseKeepingExecutorService还用于每隔30s来检查一次池中的空闲连接、最大连接情况,并维护池中连接数的平衡。

何谓软驱逐

源码见: com.zaxxer.hikari.pool.HikariPool#createPoolEntry

private PoolEntry createPoolEntry()
{
  try {
	 final PoolEntry poolEntry = newPoolEntry();

	 final long maxLifetime = config.getMaxLifetime();
	 if (maxLifetime > 0) {
		// variance up to 2.5% of the maxlifetime
		final long variance = maxLifetime > 10_000 ? ThreadLocalRandom.current().nextLong( maxLifetime / 40 ) : 0;
		final long lifetime = maxLifetime - variance;
		poolEntry.setFutureEol(houseKeepingExecutorService.schedule(
		   () -> {
			  if (softEvictConnection(poolEntry, "(connection has passed maxLifetime)", false /* not owner */)) {
				 addBagItem(connectionBag.getWaitingThreadCount());
			  }
		   },
		   lifetime, MILLISECONDS));
	 }

	 return poolEntry;
  }
  catch (ConnectionSetupException e) {
	 if (poolState == POOL_NORMAL) { // we check POOL_NORMAL to avoid a flood of messages if shutdown() is running concurrently
		logger.error("{} - Error thrown while acquiring connection from data source", poolName, e.getCause());
		lastConnectionFailure.set(e);
	 }
  }
  catch (Exception e) {
	 if (poolState == POOL_NORMAL) { // we check POOL_NORMAL to avoid a flood of messages if shutdown() is running concurrently
		logger.debug("{} - Cannot acquire connection from data source", poolName, e);
	 }
  }

  return null;
}

总结

整个问题排查过程中基本把Hikari连接池源码看完了,也发现了Hikari内部的一些实现细节,更多细节参见:Hikari 数据库连接池内部源码实现细节

如果Hikari连接池遇到Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl问题注意从下面三个方向排查。

以上为个人经验,希望能给大家一个参考,也希望大家多多支持。

加载全部内容

相关教程
猜你喜欢
用户评论