A workaround for the "socket closed" problem when WebClient is used from multiple threads through a proxy [htmlunit]

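WebClient acts as a built-in browser that can be used to fetch pages. Sometimes it has to be configured to go through a proxy:
WebClient webClient = new WebClient(BrowserVersion.x);
webClient.setProxyConfig(pc);  // pc is a ProxyConfig
In a single thread, a webClient created this way causes no trouble: connections from the client to the proxy server are opened and closed in an orderly fashion.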

Now consider the case where several threads access the same WebClient concurrently. The following exception may then be reported:

 

[Thread-7] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute  - Closing connection [HttpRoute[{}->http://192.168.5.29:3128->http://58.223.139.151:8080]][null]
........
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute  - Total connections kept alive: 0
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute  - Total issued connections: 0
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute  - Total allocated connection: 0 out of 20
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute  - No free connections [HttpRoute[{}->http://192.168.5.29:3128->http://58.223.139.151:8080]][null]
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute  - Available capacity: 2 out of 2 [HttpRoute[{}->http://192.168.5.29:3128->http://58.223.139.151:8080]][null]
2012-04-19 10:31:32,926 [Thread-8] DEBUG org.apache.http.impl.conn.tsccm.ConnPoolByRoute  - Creating new connection [HttpRoute[{}->http://192.168.5.29:3128->http://58.223.139.151:8080]]
2012-04-19 10:31:32,926 [Thread-6] DEBUG org.apache.http.impl.client.DefaultHttpClient  - socket closed
java.net.SocketException: socket closed
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:130)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:127)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:233)
at org.apache.http.impl.conn.LoggingSessionInputBuffer.readLine(LoggingSessionInputBuffer.java:100)
at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:98)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:210)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:271)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:227)
at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:209)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:292)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:126)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:483)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:641)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:597)
at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:134)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1406)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1460)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1325)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:304)
at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:370)

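The log shows that Thread-6 hit a socket closed exception while using the WebClient. The lines before the exception show that the connection for the route (http://192.168.5.29:3128->http://58.223.139.151:8080) was closed by Thread-7, and that Thread-8 then created a new connection for the same route. Presumably that connection was closed again at some later point, which caused Thread-6 to fail with socket closed.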
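The symptom itself can be reproduced with plain JDK sockets. The sketch below is only an illustration of the general mechanism (one thread closes a socket that another thread is still reading from); it does not use HtmlUnit or HttpClient and makes no claim about what their connection pool does internally. The class name SharedSocketCloseDemo and its method are made up for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical demo class, not part of HtmlUnit or HttpClient.
public class SharedSocketCloseDemo {

    // Returns true if the reading thread received a SocketException
    // after a different thread closed the socket it was reading from.
    public static boolean readerSeesSocketException() {
        final boolean[] sawSocketException = {false};
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            final Socket shared = new Socket(loopback, server.getLocalPort());
            // Keep the accepted side open so the reader does not simply see end-of-stream.
            try (Socket accepted = server.accept()) {
                Thread reader = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        try {
                            InputStream in = shared.getInputStream();
                            in.read(); // blocks, because the peer never sends anything
                        } catch (SocketException e) {
                            sawSocketException[0] = true;
                        } catch (IOException e) {
                            // some other I/O failure; the flag stays false
                        }
                    }
                });
                reader.start();
                Thread.sleep(200); // give the reader time to block in read()
                shared.close();    // a second thread (this one) closes the shared socket
                reader.join();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return sawSocketException[0];
    }

    public static void main(String[] args) {
        System.out.println("reader saw SocketException: " + readerSeesSocketException());
    }
}
```

The reader fails even though it did nothing wrong itself, which matches the shape of the stack trace above: the thread that reports the exception is not the one that closed the connection.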
Using a ThreadLocal to give each thread its own independent WebClient object avoids the problem:

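    // Each thread keeps its own WebClient, so threads never share one browser and cannot interfere with each other
    private ThreadLocal<WebClient> client = new ThreadLocal<WebClient>() {
        // Called at most once per thread, by that thread, so no synchronization is needed
        @Override
        protected WebClient initialValue() {
            WebClient webClient = new WebClient(version);

            // Configure the WebClient here (proxy and other settings)
            webClient.set...;

            return webClient;
        }
    };

    // Replaces the current thread's WebClient
    public void setWebClient(WebClient wc) {
        client.set(wc);
    }

    // Returns the current thread's WebClient, creating it on first use
    public WebClient getWebClient() {
        return client.get();
    }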
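To check that the ThreadLocal pattern really hands out one instance per thread, here is a minimal runnable sketch. It uses a stand-in class, FakeClient, in place of WebClient so that it runs without HtmlUnit on the classpath; FakeClient, PerThreadClientDemo and distinctInstances are names invented for this example.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical demo class; FakeClient stands in for WebClient.
public class PerThreadClientDemo {

    static class FakeClient {
    }

    // Same pattern as above: initialValue() runs once for each thread that calls get().
    private static final ThreadLocal<FakeClient> CLIENT = new ThreadLocal<FakeClient>() {
        @Override
        protected FakeClient initialValue() {
            return new FakeClient();
        }
    };

    // Starts threadCount threads and returns how many distinct FakeClient instances they saw.
    public static int distinctInstances(int threadCount) {
        // Identity-based set: instances are compared by reference, not by equals().
        final Set<FakeClient> seen = Collections.synchronizedSet(
                Collections.newSetFromMap(new IdentityHashMap<FakeClient, Boolean>()));
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(new Runnable() {
                @Override
                public void run() {
                    FakeClient first = CLIENT.get();
                    FakeClient second = CLIENT.get();
                    if (first != second) {
                        throw new IllegalStateException("same thread got two different instances");
                    }
                    seen.add(first);
                }
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct clients: " + distinctInstances(4)); // prints "distinct clients: 4"
    }
}
```

One thing to keep in mind with this design: when the threads come from a long-lived pool, each pooled thread keeps its instance for as long as the thread lives, so the number of live clients is bounded by the pool size rather than by the number of tasks.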
