【网页抓取】判断URL是否有效并可提供下载
问题的产生:
今天在提供API接口给客户的时候,客户提出了一个要求,有一个接口返回的语音文件的URL地址需要做有效性验证,这里所指的有效是指请求这个URL后能直接下载语音文件,反之则视为无效。
先来看看两个请求语音文件的URL地址:
有效的:http://xxx.xxx.xxx.xxx:60000/GetRecord.ashx?cid=1586887989,访问后页面输出,如下图所示:
无效的:http://xxx.xxx.xxx.xxx:60000/GetRecord.ashx?cid=1768218725
显而易见,由于cid参数传入的不同,造成有些请求不会得到对应的文件输出,那么该如何快速判断该URL是有效的并可提供下载呢?
解决方案思路:
1、采用Syste.Net.HttpWebRequest类进行http的请求,并获取请求后的返回类型,不需要通过流去读取服务器端的文件内容到内存中,只需要获取服务器端的请求后响应的文件类型即可,如果返回的是wav文件,ContentType="audio/wav",若是html页面,ContentType="text/html"。
2、对照MIME内容类型表,对响应的文件类型进行匹配。
3、以下列出常用的MIME(Multipurpose Internet Mail Extensions)内容类型表
扩展名 | 类型/子类型 |
---|---|
application/octet-stream | |
323 | text/h323 |
acx | application/internet-property-stream |
ai | application/postscript |
aif | audio/x-aiff |
aifc | audio/x-aiff |
aiff | audio/x-aiff |
asf | video/x-ms-asf |
asr | video/x-ms-asf |
asx | video/x-ms-asf |
au | audio/basic |
avi | video/x-msvideo |
axs | application/olescript |
bas | text/plain |
bcpio | application/x-bcpio |
bin | application/octet-stream |
bmp | image/bmp |
c | text/plain |
cat | application/vnd.ms-pkiseccat |
cdf | application/x-cdf |
cer | application/x-x509-ca-cert |
class | application/octet-stream |
clp | application/x-msclip |
cmx | image/x-cmx |
cod | image/cis-cod |
cpio | application/x-cpio |
crd | application/x-mscardfile |
crl | application/pkix-crl |
crt | application/x-x509-ca-cert |
csh | application/x-csh |
css | text/css |
dcr | application/x-director |
der | application/x-x509-ca-cert |
dir | application/x-director |
dll | application/x-msdownload |
dms | application/octet-stream |
doc | application/msword |
dot | application/msword |
dvi | application/x-dvi |
dxr | application/x-director |
eps | application/postscript |
etx | text/x-setext |
evy | application/envoy |
exe | application/octet-stream |
fif | application/fractals |
flr | x-world/x-vrml |
gif | image/gif |
gtar | application/x-gtar |
gz | application/x-gzip |
h | text/plain |
hdf | application/x-hdf |
hlp | application/winhlp |
hqx | application/mac-binhex40 |
hta | application/hta |
htc | text/x-component |
htm | text/html |
html | text/html |
htt | text/webviewhtml |
ico | image/x-icon |
ief | image/ief |
iii | application/x-iphone |
ins | application/x-internet-signup |
isp | application/x-internet-signup |
jfif | image/pipeg |
jpe | image/jpeg |
jpeg | image/jpeg |
jpg | image/jpeg |
js | application/x-javascript |
latex | application/x-latex |
lha | application/octet-stream |
lsf | video/x-la-asf |
lsx | video/x-la-asf |
lzh | application/octet-stream |
m13 | application/x-msmediaview |
m14 | application/x-msmediaview |
m3u | audio/x-mpegurl |
man | application/x-troff-man |
mdb | application/x-msaccess |
me | application/x-troff-me |
mht | message/rfc822 |
mhtml | message/rfc822 |
mid | audio/mid |
mny | application/x-msmoney |
mov | video/quicktime |
movie | video/x-sgi-movie |
mp2 | video/mpeg |
mp3 | audio/mpeg |
mpa | video/mpeg |
mpe | video/mpeg |
mpeg | video/mpeg |
mpg | video/mpeg |
mpp | application/vnd.ms-project |
mpv2 | video/mpeg |
ms | application/x-troff-ms |
mvb | application/x-msmediaview |
nws | message/rfc822 |
oda | application/oda |
p10 | application/pkcs10 |
p12 | application/x-pkcs12 |
p7b | application/x-pkcs7-certificates |
p7c | application/x-pkcs7-mime |
p7m | application/x-pkcs7-mime |
p7r | application/x-pkcs7-certreqresp |
p7s | application/x-pkcs7-signature |
pbm | image/x-portable-bitmap |
application/pdf | |
pfx | application/x-pkcs12 |
pgm | image/x-portable-graymap |
pko | application/ynd.ms-pkipko |
pma | application/x-perfmon |
pmc | application/x-perfmon |
pml | application/x-perfmon |
pmr | application/x-perfmon |
pmw | application/x-perfmon |
pnm | image/x-portable-anymap |
pot, | application/vnd.ms-powerpoint |
ppm | image/x-portable-pixmap |
pps | application/vnd.ms-powerpoint |
ppt | application/vnd.ms-powerpoint |
prf | application/pics-rules |
ps | application/postscript |
pub | application/x-mspublisher |
qt | video/quicktime |
ra | audio/x-pn-realaudio |
ram | audio/x-pn-realaudio |
ras | image/x-cmu-raster |
rgb | image/x-rgb |
rmi | audio/mid |
roff | application/x-troff |
rtf | application/rtf |
rtx | text/richtext |
scd | application/x-msschedule |
sct | text/scriptlet |
setpay | application/set-payment-initiation |
setreg | application/set-registration-initiation |
sh | application/x-sh |
shar | application/x-shar |
sit | application/x-stuffit |
snd | audio/basic |
spc | application/x-pkcs7-certificates |
spl | application/futuresplash |
src | application/x-wais-source |
sst | application/vnd.ms-pkicertstore |
stl | application/vnd.ms-pkistl |
stm | text/html |
svg | image/svg+xml |
sv4cpio | application/x-sv4cpio |
sv4crc | application/x-sv4crc |
swf | application/x-shockwave-flash |
t | application/x-troff |
tar | application/x-tar |
tcl | application/x-tcl |
tex | application/x-tex |
texi | application/x-texinfo |
texinfo | application/x-texinfo |
tgz | application/x-compressed |
tif | image/tiff |
tiff | image/tiff |
tr | application/x-troff |
trm | application/x-msterminal |
tsv | text/tab-separated-values |
txt | text/plain |
uls | text/iuls |
ustar | application/x-ustar |
vcf | text/x-vcard |
vrml | x-world/x-vrml |
wav | audio/x-wav |
wcm | application/vnd.ms-works |
wdb | application/vnd.ms-works |
wks | application/vnd.ms-works |
wmf | application/x-msmetafile |
wps | application/vnd.ms-works |
wri | application/x-mswrite |
wrl | x-world/x-vrml |
wrz | x-world/x-vrml |
xaf | x-world/x-vrml |
xbm | image/x-xbitmap |
xla | application/vnd.ms-excel |
xlc | application/vnd.ms-excel |
xlm | application/vnd.ms-excel |
xls | application/vnd.ms-excel |
xlt | application/vnd.ms-excel |
xlw | application/vnd.ms-excel |
xof | x-world/x-vrml |
xpm | image/x-xpixmap |
xwd | image/x-xwindowdump |
z | application/x-compress |
zip | application/zip |
郑重声明:本站内容如果来自互联网及其他传播媒体,其版权均属原媒体及文章作者所有。转载目的在于传递更多信息及用于网络分享,并不代表本站赞同其观点和对其真实性负责,也不构成任何其他建议。