HTTP Basics and Using cURL in PHP
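To lay the groundwork for the tutorials that follow, this article shows how to send HTTP requests from PHP to simulate a login, and covers a little of how HTTP requests work along the way. Simulating a login just means imitating what a browser does when it logs in. A "request" is nothing more than you sending some text to a website and the website sending some text back, usually over HTTP or HTTPS. Normally the browser does this work for us: it packages the data, sends it to the target site, receives the reply, and finally renders it as a web page. When simulating a login we have to do all of that ourselves, except that there is nothing to render at the end; we only need to extract the data we want.
There are three ways to simulate a login in PHP. The first is to call file_get_contents($url) directly; it is simple to use, so I won't go into it. The second is to use a socket: write out the request text yourself according to the protocol and send it; I haven't looked into this much, so I won't cover it either. The last is the cURL extension that ships with PHP. It lets you set request headers, send a request body and so on as needed, and it is convenient. As for the format of an HTTP message, that is basic HTTP material and I won't go into detail. The following uses cURL to send a request to Baidu: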
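$curl = curl_init(); // initialize a cURL handle
curl_setopt($curl, CURLOPT_URL, 'http://www.baidu.com'); // set the URL
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 5); // 5-second connection timeout
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1); // return the response as a string instead of printing it
// Fake the client; best to set this, since some sites block requests based on the user agent.
// Pass only the value: cURL adds the "User-Agent: " header name itself.
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
$response = curl_exec($curl); // run the request; the response goes into $response (false on failure)
$state = curl_getinfo($curl, CURLINFO_HTTP_CODE); // get the response status code
curl_close($curl); // release the cURL handle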
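At this point one request is complete. $response holds the result, that is, the HTML source of the Baidu page as a string. The two request methods you will use most in HTTP are GET and POST (see the HTTP specification for the details). The request to Baidu above used GET. POST differs in that the parameters are sent as the message body, in the form p1=v1&p2=v2&p3=v3…, where p is a parameter name and v is its value. If this is unclear, read up on HTTP before continuing. With the parameters in a string variable $param, that becomes: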
curl_setopt($curl, CURLOPT_POSTFIELDS, $param);
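To make the difference between GET and POST concrete, here is a small sketch that builds the raw text of each kind of request. The function name buildRawRequest, the host example.com and the path /login are made up for illustration, and real clients send more headers than this; cURL assembles the actual message for you.

```php
<?php
// Hypothetical helper: builds the raw text of a minimal HTTP/1.1 request.
// Illustration only; it does not send anything.
function buildRawRequest($method, $host, $path, $param = '') {
    $crlf = "\r\n";
    if ($method === 'GET') {
        // GET: parameters travel in the URL, and there is no body.
        $target = $param === '' ? $path : $path . '?' . $param;
        return "GET $target HTTP/1.1" . $crlf
             . "Host: $host" . $crlf
             . $crlf;
    }
    // POST: parameters travel in the message body, after the blank line.
    return "POST $path HTTP/1.1" . $crlf
         . "Host: $host" . $crlf
         . "Content-Type: application/x-www-form-urlencoded" . $crlf
         . "Content-Length: " . strlen($param) . $crlf
         . $crlf
         . $param;
}

$param = 'p1=v1&p2=v2';
echo buildRawRequest('GET', 'example.com', '/login', $param);
echo buildRawRequest('POST', 'example.com', '/login', $param), "\n";
```

In the GET case the parameter string ends up in the request line; in the POST case it is the last thing in the message, which is the part CURLOPT_POSTFIELDS fills in.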
To set request headers:
curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
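Here $header is an array in which each element is one complete header line of the form 'Name: value'. For example, to send the CLIENT-IP and X-FORWARDED-FOR headers, use $header = array('CLIENT-IP: value', 'X-FORWARDED-FOR: value').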
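CURLOPT_HTTPHEADER takes a plain list of 'Name: value' strings. If you find it more natural to keep headers as a name => value map, a small helper can do the conversion. toHeaderLines is a hypothetical name of my own, not part of cURL, and the IP addresses are placeholders.

```php
<?php
// Hypothetical helper: converts array('Name' => 'value') into
// array('Name: value'), the list format CURLOPT_HTTPHEADER takes.
function toHeaderLines(array $headers) {
    $lines = array();
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

$header = toHeaderLines(array(
    'CLIENT-IP'       => '1.2.3.4', // placeholder value
    'X-FORWARDED-FOR' => '1.2.3.4', // placeholder value
));
print_r($header);
// The result can then be passed on:
// curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
```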
cURL has many more CURLOPT_* constants for use with curl_setopt. I won't list them here; look them up yourself.
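As a starting point, here are a few options that tend to come up often, set in one call with curl_setopt_array(). Which ones you need depends on the site; the URL and the specific values below are placeholders, and the sketch assumes the cURL extension is loaded.

```php
<?php
// A few commonly used options, set in one call with curl_setopt_array().
// Nothing is sent here: a request only happens when curl_exec() is called.
$curl = curl_init();
$ok = curl_setopt_array($curl, array(
    CURLOPT_URL            => 'http://www.example.com/', // placeholder URL
    CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
    CURLOPT_FOLLOWLOCATION => true, // follow 3xx redirects
    CURLOPT_MAXREDIRS      => 5,    // but give up after 5 redirects
    CURLOPT_TIMEOUT        => 10,   // limit for the whole transfer, in seconds
    CURLOPT_ENCODING       => '',   // accept any compression cURL supports
));
var_dump($ok); // true if every option was accepted
```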
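Now that we know how to use cURL, can we wrap it in a utility class for simulated logins?
I call this class RequestClient. A request generally involves three things: the URL, the request method, and the request parameters. Request headers are optional, so they go in as well. From the response we keep the response data and the status code. Putting that together, the class has these member variables:
private $response = null;
private $url;
private $header = null;
private $parameter = null;
private $method = 'GET'; // GET is the default request method
private $state = null;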
The URL must be given at construction time, and it can also be changed through a setter:
public function __construct($url) {
    $this->url = $url;
}

public function setUrl($url) {
    $this->url = $url;
}
The setter for the headers ($header uses the format described above):
public function setHeader($header) {
    $this->header = $header;
}
And the getters:
public function getUrl() { return $this->url; }
public function getParameter() { return $this->parameter; }
public function getHeader() { return $this->header; }
public function getMethod() { return $this->method; }
public function getState() { return $this->state; }
public function getResponse() { return $this->response; }
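Next, setting the parameters. There are two ways: pass an array, which is converted into a parameter string, or pass the string directly. The array has the form array('p1' => 'value1', 'p2' => 'value2', …), and $encode chooses whether the values are URL-encoded (they are by default).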
public function setParameter($parameter = null, $encode = true) {
    if (is_array($parameter)) {
        $temp = '';
        if ($encode) {
            foreach ($parameter as $key => $value) {
                $temp .= "$key=" . urlencode($value) . "&";
            }
        } else {
            foreach ($parameter as $key => $value) {
                $temp .= "$key=$value&";
            }
        }
        $this->parameter = substr($temp, 0, -1); // drop the trailing "&"
    } elseif (is_string($parameter)) {
        $this->parameter = $parameter;
    }
}
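PHP's built-in http_build_query() does much the same job as the array branch above, and it also encodes the keys. A short comparison follows; the parameter names and values are made up.

```php
<?php
$parameter = array(
    'user' => 'tom',
    'note' => 'a b&c', // contains a space and an ampersand
);

// Same approach as the loop in setParameter(): encode each value, join with "&".
$temp = '';
foreach ($parameter as $key => $value) {
    $temp .= "$key=" . urlencode($value) . "&";
}
$manual = substr($temp, 0, -1);

// Built-in equivalent. The separator is passed explicitly so the result does
// not depend on the arg_separator.output ini setting.
$builtin = http_build_query($parameter, '', '&');

echo $manual, "\n";
echo $builtin, "\n";
```

Both lines print user=tom&note=a+b%26c. Without the encoding, the "&" inside the value would be read by the server as the start of another parameter.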
Below are the get and post methods, which send GET and POST requests. The response is stored in $response and the status code in $state.
public function get($timeout = 5) {
    $this->method = 'GET';
    $url = $this->url;
    if ($this->parameter != null) { // for GET, the parameters are appended to the URL
        $url .= '?' . $this->parameter; // local copy, so repeated calls don't keep appending
    }
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
    if ($this->header != null) { // only set headers when there are any
        curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
    }
    // pass only the value; cURL adds the "User-Agent: " name itself
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
    $this->response = curl_exec($curl);
    $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    return $this->response;
}

public function post($timeout = 5) {
    $this->method = 'POST';
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $this->url);
    curl_setopt($curl, CURLOPT_HEADER, 1); // include the response headers in $response
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($curl, CURLOPT_POSTFIELDS, $this->parameter); // sent as the request body
    if ($this->header != null) {
        curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
    }
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
    $this->response = curl_exec($curl);
    $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);
    return $this->response;
}
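Both methods store whatever curl_exec() returns, and that is false when the request fails at the transport level (timeout, connection refused, unknown host). Here is a hedged sketch of how such a failure could be told apart from an HTTP error status, using curl_errno() and curl_error(). The function name fetchOrError and the shape of the returned array are my own choices, not part of the class above, and the example assumes nothing is listening on port 1 of localhost.

```php
<?php
// Hypothetical helper: performs a GET and reports transport errors
// separately from the HTTP status code.
function fetchOrError($url, $timeout = 5) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($curl, CURLOPT_TIMEOUT, $timeout * 2); // cap the whole transfer too
    $body = curl_exec($curl);
    $result = array(
        'body'  => $body === false ? null : $body,
        'state' => curl_getinfo($curl, CURLINFO_HTTP_CODE), // 0 if no response arrived
        'error' => $body === false ? curl_error($curl) : null,
        'errno' => curl_errno($curl),                       // 0 means no transport error
    );
    // The handle is released when $curl goes out of scope at return.
    return $result;
}

// Assumption: port 1 on localhost is closed, so this should fail quickly.
$result = fetchOrError('http://127.0.0.1:1/');
if ($result['error'] !== null) {
    echo "request failed: ", $result['error'], "\n";
} else {
    echo "status: ", $result['state'], "\n";
}
```

A status such as 404 or 500 still counts as a successful transfer here: 'error' stays null and the code shows up in 'state'.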
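With that, a class for HTTP requests is basically done. Depending on what you actually need, you can add further features, for example getting the page title (the content of <title>):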
public function getTitle() {
    $source = (string) $this->response;
    $start = stripos($source, '<title');
    if ($start === false) { // no <title> tag in the response
        return null;
    }
    $source = substr($source, $start);
    $start = stripos($source, '>') + 1;   // first character after "<title ...>"
    $end = stripos($source, '<', $start); // start of "</title>"
    return substr($source, $start, $end - $start);
}
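The same extraction can also be written with a regular expression. This is a sketch of an alternative, not a replacement for the method above; extractTitle is a hypothetical standalone function and the HTML strings are made up.

```php
<?php
// Hypothetical standalone variant of getTitle() using a regular expression.
// Returns null when the document has no <title> element.
function extractTitle($html) {
    // i = case-insensitive, s = let "." match newlines inside the title
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return trim($m[1]);
    }
    return null;
}

$html = "<html><head><TITLE>\n  Example Page </TITLE></head><body></body></html>";
var_dump(extractTitle($html));
var_dump(extractTitle('<p>no title here</p>'));
```

Neither version is a real HTML parser. For pages with unusual markup, something like DOMDocument is the safer tool.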
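Getting the cookies as a string. (cURL offers a quick and convenient way to handle cookies: set CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE with curl_setopt. The cookies are written to the file you specify, and later requests read them back from that file. Out of personal habit, though, I still prefer to extract the cookies as a string and put them in the Cookie header in $header, which I find more flexible. The function below extracts the cookies and returns them as a string in the form [cookie1=value1; cookie2=value2; …]):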
public function getCookie() {
    $content = (string) $this->response; // must contain the response headers (CURLOPT_HEADER, as set in post())
    $start = 0;
    $rt = '';
    // keep searching for "Set-Cookie: "; !== is needed so a match at position 0 still counts
    while (($start = stripos($content, 'Set-Cookie: ', $start)) !== false) {
        $start += 12; // skip the 12 characters of "Set-Cookie: "
        $len = strcspn($content, ";\r\n", $start); // stop at ";" or at the end of the header line
        $rt .= substr($content, $start, $len) . '; ';
    }
    return substr($rt, 0, -2); // drop the trailing semicolon and space
}
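To try the idea without a network connection, here is a hedged standalone sketch that runs against a hand-written response. It differs from the method above in that it first cuts the response at the blank line, so text in the body that happens to contain "Set-Cookie:" is ignored. extractCookies is a hypothetical name, the response text is invented, and the sketch assumes a single header block (no redirects or "100 Continue" in front).

```php
<?php
// Hypothetical standalone version of the cookie extraction.
function extractCookies($rawResponse) {
    // Headers end at the first blank line; only look at that part.
    $parts = explode("\r\n\r\n", $rawResponse, 2);
    $cookies = array();
    foreach (explode("\r\n", $parts[0]) as $line) {
        if (stripos($line, 'Set-Cookie:') === 0) {
            $value = trim(substr($line, strlen('Set-Cookie:')));
            // Keep only "name=value"; drop attributes such as Path or Expires.
            $pair = explode(';', $value, 2);
            $cookies[] = trim($pair[0]);
        }
    }
    return implode('; ', $cookies);
}

$raw = "HTTP/1.1 200 OK\r\n"
     . "Set-Cookie: sid=abc123; Path=/; HttpOnly\r\n"
     . "Set-Cookie: lang=en\r\n"
     . "Content-Type: text/html\r\n"
     . "\r\n"
     . "<html>Set-Cookie: fake=1;</html>";

echo extractCookies($raw), "\n"; // prints "sid=abc123; lang=en"
```

The returned string can be sent back on the next request as a header line, for example 'Cookie: ' . extractCookies($raw) in the $header array.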
Usage looks like this:
$client = new RequestClient('the URL goes here');
$client->setHeader($header); // the header lines
$client->setParameter($parameter); // the parameters
$client->get(); // or $client->post();
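That completes the class. One last point: the design and the code here come from my own experience, so there are bound to be some mistakes or gaps. In practice, adapt it to your own needs and add features so that it fits your program better.
The complete code: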
<?php
class RequestClient {
    private $response = null;
    private $url;
    private $header = null;    // type: array of "Name: value" strings
    private $parameter = null; // type: string
    private $proxy = null;     // proxy
    private $method = 'GET';   // default GET method
    private $state = null;

    // a static function to create a new object from url, parameters and header
    public static function newClient($url, $parameter = null, $header = null) {
        $client = new RequestClient($url);
        $client->setParameter($parameter);
        $client->setHeader($header);
        return $client;
    }

    // constructor, with url as its only parameter
    public function __construct($url) {
        $this->url = $url;
    }

    public function __destruct() {
        $this->clear();
    }

    // setter
    public function setUrl($url) {
        $this->url = $url;
    }

    public function setHeader($header) {
        $this->header = $header;
    }

    public function setProxy($proxy) {
        $this->proxy = $proxy;
    }

    // collect the cookies from the response headers as "c1=v1; c2=v2"
    // (needs the headers in $response, which only post() requests via CURLOPT_HEADER)
    public function getCookie() {
        $content = (string) $this->response;
        $start = 0;
        $rt = '';
        // !== is needed so a match at position 0 still counts
        while (($start = stripos($content, 'Set-Cookie: ', $start)) !== false) {
            $start += 12;
            $len = strcspn($content, ";\r\n", $start); // stop at ";" or end of line
            $rt .= substr($content, $start, $len) . '; ';
        }
        return substr($rt, 0, -2);
    }

    public function setParameter($parameter = null, $encode = true) {
        if (is_array($parameter)) { // change to 'string' if the type is 'array'
            $temp = '';
            if ($encode) {
                foreach ($parameter as $key => $value) {
                    $temp .= "$key=" . urlencode($value) . "&"; // change to string
                }
            } else {
                foreach ($parameter as $key => $value) {
                    $temp .= "$key=$value&";
                }
            }
            $this->parameter = substr($temp, 0, -1);
        } elseif (is_string($parameter)) {
            $this->parameter = $parameter;
        }
    }

    // request in method 'GET', set the response content to $this->response and return it
    public function get($timeout = 5) {
        $this->method = 'GET';
        $url = $this->handleParameter();
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $url);
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
        if ($this->header != null) {
            curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
        }
        if ($this->proxy != null) {
            curl_setopt($curl, CURLOPT_PROXY, $this->proxy);
        }
        // pass only the value; cURL adds the "User-Agent: " name itself
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
        $this->response = curl_exec($curl);
        $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);
        return $this->response;
    }

    // request in method 'POST', set the response content to $this->response and return it
    public function post($timeout = 5) {
        $this->method = 'POST';
        $curl = curl_init();
        curl_setopt($curl, CURLOPT_URL, $this->url);
        curl_setopt($curl, CURLOPT_HEADER, 1); // include the response headers in $response
        curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($curl, CURLOPT_POSTFIELDS, $this->parameter);
        if ($this->header != null) {
            curl_setopt($curl, CURLOPT_HTTPHEADER, $this->header);
        }
        if ($this->proxy != null) {
            curl_setopt($curl, CURLOPT_PROXY, $this->proxy);
        }
        curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)');
        $this->response = curl_exec($curl);
        $this->state = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);
        return $this->response;
    }

    // get the title, or null if the response has no <title> tag
    public function getTitle() {
        $source = (string) $this->response;
        $start = stripos($source, '<title');
        if ($start === false) {
            return null;
        }
        $source = substr($source, $start);
        $start = stripos($source, '>') + 1;
        $end = stripos($source, '<', $start);
        return substr($source, $start, $end - $start);
    }

    // reset state of the object, only url remains
    public function clear() {
        $this->parameter = null;
        $this->header = null;
        $this->response = null;
        $this->state = null;
        $this->proxy = null;
        $this->method = 'GET';
    }

    // getter
    public function getUrl() { return $this->url; }
    public function getParameter() { return $this->parameter; }
    public function getHeader() { return $this->header; }
    public function getProxy() { return $this->proxy; }
    public function getMethod() { return $this->method; }
    public function getState() { return $this->state; }
    public function getResponse() { return $this->response; }

    // private function, returns the url with the parameters appended if the method is 'GET';
    // $this->url itself is left unchanged so repeated calls don't keep appending
    private function handleParameter() {
        if ($this->parameter != null && $this->method == 'GET') {
            return $this->url . '?' . $this->parameter;
        }
        return $this->url;
    }
}
?>