Studying the common-upload source code to understand how uploading works

Date: June 9, 2015

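An earlier post showed how to upload a file with a single block of JSP code. After trying it a few more times, I found that this approach has a problem.
For example, I uploaded an image file through the JSP. The uploaded copy was damaged: the pixels in the lower part of the image were missing.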

I then tried the same upload with common-upload 1.2. It worked, and the file was unchanged. So why the difference?

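To solve this, some background first. For an ordinary POST, the browser joins the name and value of each input control into a single string and sends it to the server as the request body. For a file upload, where the form uses method="post" enctype="multipart/form-data", the request headers carry information like this:
Content-Length: 5137
Content-Type: multipart/form-data; boundary=---------------------------203012335616103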

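The boundary is the separator used in the request body: the body wraps the actual file content between occurrences of this value. In the body, each separator line is the boundary value prefixed with two extra hyphens (--), and the last one also ends with --.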
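To make the header format concrete, here is a minimal sketch of pulling the boundary value out of a Content-Type header. The class and method names (BoundaryDemo, extractBoundary, delimiterBytes) are made up for illustration and are not part of common-upload. The parsing is deliberately simplified: it assumes the parameters are separated by semicolons.

```java
import java.nio.charset.StandardCharsets;

class BoundaryDemo {
    /** Returns the boundary parameter of a Content-Type value, or null if there is none. */
    static String extractBoundary(String contentType) {
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Parameter names are case-insensitive, so compare ignoring case.
            if (p.regionMatches(true, 0, "boundary=", 0, 9)) {
                String value = p.substring(9).trim();
                // The value may be wrapped in double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    /** In the body, a separator line is "--" followed by the boundary value. */
    static byte[] delimiterBytes(String boundary) {
        return ("--" + boundary).getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String header = "multipart/form-data; boundary=----demo12345";
        String boundary = extractBoundary(header);
        System.out.println("boundary:  " + boundary);
        System.out.println("separator: "
                + new String(delimiterBytes(boundary), StandardCharsets.ISO_8859_1));
    }
}
```

Returning the separator as bytes rather than as a String matters later: the body contains binary data, so it is safest to compare bytes with bytes.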
The request body has the following format:

-----------------------------221842651231148
Content-Disposition: form-data; name="upfile"; filename="srcfile (2).jpg"
Content-Type: image/jpeg

<!-- the uploaded file content (byte array) goes here -->
-----------------------------221842651231148--

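As you can see, the section between the separators holds more than the file bytes: some basic information about the file is placed before the content.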
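The layout described above can be sketched by assembling a one-part body by hand. This is only an illustration of the wire format, not code from the JSP or from common-upload. The names MultipartBodyDemo and buildBody are hypothetical, and the header text is assumed to be ASCII.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class MultipartBodyDemo {
    static final String CRLF = "\r\n";

    /** Builds a multipart/form-data body that contains a single file part. */
    static byte[] buildBody(String boundary, String fieldName, String fileName,
                            String contentType, byte[] fileBytes) {
        String head = "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"" + CRLF
                + "Content-Type: " + contentType + CRLF
                + CRLF; // an empty line ends the part headers
        // The closing separator has "--" on both sides of the boundary value.
        String tail = CRLF + "--" + boundary + "--" + CRLF;

        byte[] h = head.getBytes(StandardCharsets.ISO_8859_1);
        byte[] t = tail.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(h, 0, h.length);
        out.write(fileBytes, 0, fileBytes.length); // raw bytes, never converted to text
        out.write(t, 0, t.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] fakeJpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] body = buildBody("----demo12345", "upfile", "srcfile.jpg",
                "image/jpeg", fakeJpeg);
        System.out.println("body length: " + body.length);
    }
}
```

Note that the file content is followed by a line break and then the separator, so a parser has to strip that line break as well as the separator itself.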

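Now let's analyze why the JSP failed to upload the file correctly. First, the JSP upload code:
1. Read all contentLength bytes of the request into a byte array
in = new DataInputStream(request.getInputStream());
int dataLength = request.getContentLength();
while (totalBytesRead < dataLength) {
    // read() may return fewer bytes than requested, so ask only for what is still missing
    byteRead = in.read(dataBytes, totalBytesRead, dataLength - totalBytesRead);
    if (byteRead == -1) {
        break; // the stream ended before dataLength bytes arrived
    }
    totalBytesRead += byteRead;
}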
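2. Find the start and end positions of the file content
int pos;
pos = file.indexOf("filename=\"");
// skip three line breaks: the rest of the Content-Disposition line,
// the Content-Type line, and the empty line before the content
pos = file.indexOf("\n", pos) + 1;
pos = file.indexOf("\n", pos) + 1;
pos = file.indexOf("\n", pos) + 1;
// step back 4 characters from the boundary value (presumably the "\r\n--" in front of it)
int boundaryLocation = file.indexOf(boundary, pos) - 4;
// character positions in the String are mapped back to byte offsets through getBytes()
int startPos = ((file.substring(0, pos)).getBytes()).length;
int endPos = ((file.substring(0, boundaryLocation)).getBytes()).length;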
3. Write the bytes between the start and end positions to a file on disk.
FileOutputStream fos = new FileOutputStream(fileName);
fos.write(dataBytes, startPos, (endPos - startPos));
fos.close();
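The excerpt does not show how the file variable is created. Assuming it is a String built from dataBytes (for example with new String(dataBytes), which uses the platform default charset), the small experiment below shows why step 2 can go wrong for binary content: decoding arbitrary bytes as text and encoding them again does not always give back the same number of bytes, so offsets computed with getBytes().length may not line up with the original array. StringRoundTripDemo and roundTripLength are hypothetical names used only for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StringRoundTripDemo {
    /** Decodes the bytes as text and encodes the text again, returning the new length. */
    static int roundTripLength(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset).length;
    }

    public static void main(String[] args) {
        // A JPEG file starts with bytes like these; 0xFF is never valid in UTF-8.
        byte[] raw = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 0x00, 0x10};

        System.out.println("original length:    " + raw.length);
        // Invalid sequences are replaced during decoding, so the length changes.
        System.out.println("UTF-8 round trip:   "
                + roundTripLength(raw, StandardCharsets.UTF_8));
        // ISO-8859-1 maps every byte to exactly one char, so the length is preserved.
        System.out.println("Latin-1 round trip: "
                + roundTripLength(raw, StandardCharsets.ISO_8859_1));
    }
}
```

Whether the JSP actually hits this depends on the server's default charset, which the original post does not state, so treat this as a plausible explanation rather than a confirmed diagnosis.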

So how does common-upload do it? There is a lot of code, so only the key logic is analyzed here:
1.    Parse the request and extract the separator from the request header
// Convert the request object into a FileItemIterator
org.apache.commons.fileupload.servlet.ServletFileUpload[line:146]
FileItemIterator fii = new ServletFileUpload().getItemIterator(request);
// Read the headers
org.apache.commons.fileupload.FileUploadBase[line:976]
FileItemHeaders headers = getParsedHeaders(multi.readHeaders());
// Extract the separator from the Content-Type header
org.apache.commons.fileupload.FileUploadBase[line:397]
protected byte[] getBoundary(String contentType) { ... }

2.    Find the valid data between the separators in the request body.
// The separator is located as follows. The bytes in buffer must match the bytes in boundary exactly to count as a separator:
org.apache.commons.fileupload.MultipartStream[line:708]
protected int findSeparator() {
    int first;
    int match = 0;
    int maxpos = tail - boundaryLength;
    for (first = head;
            (first <= maxpos) && (match != boundaryLength);
            first++) {
        first = findByte(boundary[0], first);
        if (first == -1 || (first > maxpos)) {
            return -1;
        }
        for (match = 1; match < boundaryLength; match++) {
            if (buffer[first + match] != boundary[match]) {
                break;
            }
        }
    }
    if (match == boundaryLength) {
        return first - 1;
    }
    return -1;
}
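The same idea, comparing raw bytes against the separator bytes, can be tried out in a standalone sketch. This is not the library's implementation: ByteSearchDemo, indexOf and firstPartContent are hypothetical names, the whole body is assumed to be in memory already, and only the first part is extracted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteSearchDemo {
    /** Index of the first occurrence of pattern in data at or after from, or -1 if absent. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer; // mismatch: try the next start position
                }
            }
            return i; // every byte of the pattern matched
        }
        return -1;
    }

    /** Content bytes of the first part, or null if the body is not laid out as expected. */
    static byte[] firstPartContent(byte[] body, String boundary) {
        byte[] headerEnd = "\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] separator = ("\r\n--" + boundary).getBytes(StandardCharsets.ISO_8859_1);
        int headerEndPos = indexOf(body, headerEnd, 0);
        if (headerEndPos < 0) {
            return null;
        }
        int start = headerEndPos + headerEnd.length; // first byte of the content
        int end = indexOf(body, separator, start);   // content stops before CRLF + "--" + boundary
        if (end < 0) {
            return null;
        }
        return Arrays.copyOfRange(body, start, end);
    }

    public static void main(String[] args) {
        byte[] head = "--B1\r\nContent-Type: image/jpeg\r\n\r\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] content = {(byte) 0xFF, (byte) 0xD8, 0x0D, 0x0A, 0x00};
        byte[] tail = "\r\n--B1--\r\n".getBytes(StandardCharsets.ISO_8859_1);

        byte[] body = new byte[head.length + content.length + tail.length];
        System.arraycopy(head, 0, body, 0, head.length);
        System.arraycopy(content, 0, body, head.length, content.length);
        System.arraycopy(tail, 0, body, head.length + content.length, tail.length);

        byte[] extracted = firstPartContent(body, "B1");
        System.out.println("content recovered intact: " + Arrays.equals(content, extracted));
    }
}
```

Because nothing is ever decoded into a String, the start and end offsets are byte offsets from the outset, and no conversion step can shift them.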
3.    Collect each piece of valid data between the separators into a byte array, which finally becomes the file. The following class and method do the collecting:
org.apache.commons.fileupload.MultipartStream[line:784]
public class ItemInputStream extends InputStream implements Closeable {
    ...
    public int read(byte[] b, int off, int len) throws IOException {
        if (closed) {
            throw new FileItemStream.ItemSkippedException();
        }
        if (len == 0) {
            return 0;
        }
        int res = available();
        if (res == 0) {
            res = makeAvailable();
            if (res == 0) {
                return -1;
            }
        }
        res = Math.min(res, len);
        System.arraycopy(buffer, head, b, off, res);
        head += res;
        total += res;
        return res;
    }
    ...
}
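In the end, total is the number of bytes in the file, and the data can then be written out with a FileOutputStream. I have not yet fully understood how the available function works or how InputStream is overridden here, so I still need to study how streams work underneath.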
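As a starting point for that, here is a much simplified sketch of overriding InputStream over a slice of a byte buffer. It mirrors the head, tail and total bookkeeping of the read method above, but it is not the library's ItemInputStream: there is no refilling of the buffer and no separator detection. SliceInputStream and getBytesRead are hypothetical names.

```java
import java.io.InputStream;

/** Hypothetical, simplified stream that serves the bytes buffer[head..tail). */
class SliceInputStream extends InputStream {
    private final byte[] buffer;
    private int head;       // index of the next byte to hand out
    private final int tail; // one past the last readable byte
    private long total;     // number of bytes handed out so far

    SliceInputStream(byte[] buffer, int head, int tail) {
        this.buffer = buffer;
        this.head = head;
        this.tail = tail;
    }

    /** Number of bytes that can be read right now without blocking. */
    @Override
    public int available() {
        return tail - head;
    }

    /** Single-byte read; this is the only method InputStream requires subclasses to implement. */
    @Override
    public int read() {
        if (head >= tail) {
            return -1; // end of stream
        }
        total++;
        return buffer[head++] & 0xFF; // return the byte as an unsigned value 0..255
    }

    /** Bulk read, overridden so data is copied in chunks instead of byte by byte. */
    @Override
    public int read(byte[] b, int off, int len) {
        if (len == 0) {
            return 0;
        }
        int res = available();
        if (res == 0) {
            return -1; // nothing left; a real implementation could try to refill here
        }
        res = Math.min(res, len);
        System.arraycopy(buffer, head, b, off, res);
        head += res;
        total += res;
        return res;
    }

    long getBytesRead() {
        return total;
    }

    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40, 50, 60};
        SliceInputStream in = new SliceInputStream(data, 1, 5); // serves 20, 30, 40, 50
        byte[] chunk = new byte[3];
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            System.out.println("read " + n + " byte(s)");
        }
        System.out.println("total: " + in.getBytesRead());
    }
}
```

A caller only sees the usual InputStream contract: each read returns how many bytes were copied, and -1 signals the end. That is why the consumer of such a stream never needs to know where the separators are.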
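From the above, the JSP upload code fails mainly because it does not identify the separator correctly: it searches for the boundary in a String and converts the positions back to byte offsets, whereas common-upload compares the raw bytes in its buffer against the boundary bytes.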