A ZooKeeper Migration Failure

Published: 2024/4/13

A while ago a colleague migrated our ZooKeeper cluster (the one used by Kafka and Flume). Afterwards, every Flume producer and consumer reported the same errors:

07 Jan 2014 01:19:32,571 INFO  [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.startConnect:1058)  - Opening socket connection to server xxx:2181
07 Jan 2014 01:19:32,572 INFO  [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.primeConnection:947)  - Socket connection established to xxx:2181, initiating session
07 Jan 2014 01:19:32,573 INFO  [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.run:1183)  - Unable to read additional data from server sessionid 0x142f42b91871911, likely server has closed socket, closing socket connection and attempting reconnect
07 Jan 2014 01:19:32,845 INFO  [conf-file-poller-0-SendThread(xxx:2181)] (org.apache.zookeeper.ClientCnxn$SendThread.startConnect:1058)  - Opening socket connection to server xxx:2181

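The clients were stuck in a loop: connect, fail, retry. My colleague said network connectivity had been verified long before, so I checked the server-side log and found: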

2014-01-06 23:59:59,987 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /xxx:45282
2014-01-06 23:59:59,987 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@793] - Connection request from old client xxx:45282; will be dropped if server is in r-o mode
2014-01-06 23:59:59,987 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@812] - Refusing session request for client xxx:45282 as it has seen zxid 0x60fd15564 our last zxid is 0x10000000f client must try another server
2014-01-06 23:59:59,987 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket connection for client xxx:45282 (no session established for client)
2014-01-06 23:59:59,989 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from xxx:45285

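So Flume was still holding the zxid it had last seen on the old cluster (0x60fd15564), while the new cluster had started over and its last zxid was only 0x10000000f. The server therefore refuses the session and throws an exception. The check in the ZooKeeper server code is: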

if (connReq.getLastZxidSeen() > zkDb.dataTree.lastProcessedZxid) {
    String msg = "Refusing session request for client "
        + cnxn.getRemoteSocketAddress()
        + " as it has seen zxid 0x"
        + Long.toHexString(connReq.getLastZxidSeen())
        + " our last zxid is 0x"
        + Long.toHexString(getZKDatabase().getDataTreeLastProcessedZxid())
        + " client must try another server";
    LOG.info(msg);
    throw new CloseRequestException(msg);
}
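To make the comparison concrete, here is a minimal standalone sketch that applies the same `>` test to the two zxid values from the server log. It also splits each zxid into its epoch (high 32 bits) and counter (low 32 bits). The class `ZxidCheck` and its method names are hypothetical, written only for illustration; they are not part of ZooKeeper.

```java
// Toy model of the server-side zxid check. Not ZooKeeper code:
// ZxidCheck and its methods are hypothetical names for illustration.
final class ZxidCheck {
    private ZxidCheck() {}

    /** High 32 bits of a zxid: the leader epoch. */
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    /** Low 32 bits of a zxid: the transaction counter within the epoch. */
    static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    /** Same comparison as in the server excerpt: refuse a client that is ahead of us. */
    static boolean shouldRefuse(long clientLastZxidSeen, long serverLastProcessedZxid) {
        return clientLastZxidSeen > serverLastProcessedZxid;
    }

    public static void main(String[] args) {
        long client = 0x60fd15564L; // zxid the Flume client saw on the old cluster
        long server = 0x10000000fL; // last zxid of the freshly started cluster

        System.out.printf("client: epoch=%d counter=0x%x%n", epoch(client), counter(client));
        System.out.printf("server: epoch=%d counter=0x%x%n", epoch(server), counter(server));
        System.out.println("refuse session: " + shouldRefuse(client, server));
    }
}
```

Read this way, the client has seen epoch 6 while the new cluster is still in epoch 1, so under this check the new cluster will keep refusing the client no matter how often it reconnects.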

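I then asked my colleague how the migration had been done. He had started a brand-new cluster, shut down the old one, and run an haproxy on one of the old cluster's IP:2181 addresses to forward traffic to the new cluster, assuming this would make the migration transparent. In fact it triggered ZooKeeper bug ZOOKEEPER-832, which makes clients retry the connection endlessly. Only restarting Flume resolves it.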

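The correct way to migrate is to join the new nodes to the old cluster first, then update the Flume configuration, wait a while for Flume to reconfigure itself automatically, and only then shut down the old nodes. Because the new nodes sync their state from the existing ensemble, the cluster's zxid never goes backwards and the problem is not triggered.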
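The difference between the two migration approaches can be illustrated with a toy model. This is only a sketch, assuming that a node joining an existing ensemble ends up with the leader's last processed zxid after syncing. `MigrationModel`, `Node`, and `joinEnsemble` are hypothetical names, not ZooKeeper APIs, and the zxid values are taken from the log above purely as sample numbers.

```java
// Toy comparison of "fresh cluster behind a proxy" vs "join the old ensemble".
// All names here are hypothetical; nothing in this sketch talks to a real ZooKeeper.
final class MigrationModel {
    private MigrationModel() {}

    /** Minimal stand-in for a server: it only tracks its last processed zxid. */
    static final class Node {
        final long lastProcessedZxid;

        Node(long lastProcessedZxid) {
            this.lastProcessedZxid = lastProcessedZxid;
        }

        /** Assumption: a node that joins an ensemble syncs up to the leader's zxid. */
        static Node joinEnsemble(Node leader) {
            return new Node(leader.lastProcessedZxid);
        }

        /** Inverse of the server's refusal check: accept clients that are not ahead of us. */
        boolean accepts(long clientLastZxidSeen) {
            return clientLastZxidSeen <= lastProcessedZxid;
        }
    }

    public static void main(String[] args) {
        long clientLastZxidSeen = 0x60fd15564L;          // what Flume saw on the old cluster
        Node oldLeader = new Node(0x60fd15564L);         // old ensemble, sample value
        Node freshNode = new Node(0x10000000fL);         // brand-new cluster behind haproxy
        Node joinedNode = Node.joinEnsemble(oldLeader);  // new node added to the old ensemble

        System.out.println("fresh cluster accepts client:  " + freshNode.accepts(clientLastZxidSeen));
        System.out.println("joined node accepts client:    " + joinedNode.accepts(clientLastZxidSeen));
    }
}
```

In this model only the joined node can accept the existing client session, which matches the advice to grow the old cluster before retiring it.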



This article is reposted from MIKE老毕's 51CTO blog. Original link: http://blog.51cto.com/boylook/1365364. To repost it, please contact the original author.

