[Open-Sourcing a Small Tool] Push Web Page Content to Kindle with One Click

Date: June 9, 2015

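Things have been a bit quieter at work lately, so this week I spent my evenings writing a small tool. What it does is simple, but it still took some tinkering.

Tool name: Simple Send to Kindle

GitHub: https://github.com/zhanjindong/SimpleSendToKindle

What it does: a simple Windows tool that pushes web page content to a Kindle.

I wrote this tool to scratch my own itch. Ever since I bought a Kindle Paperwhite 2, it has been the electronic device I use most. I suspect many Kindle owners share this need: during the day you come across good articles online but have no time to read them, so you want to push them to your Kindle and read them in bed before going to sleep. I used to use a tool called KindleMii, but images were often missing from the pushed content. The Chrome Web Store has an extension called Send to Kindle, but after installing it I could not get it to work for some reason. So I decided to write one myself and named it Simple Send to Kindle.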

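How It Works

The principle is simple: a Chrome extension sends the page URL to a local program written in Java, which downloads the page content, converts it to Kindle's mobi format, and then emails it to the device through its Kindle email address.

The core of the tool is kindlegen, a program Amazon provides for generating mobi files; you can also use kindlegen offline yourself to turn web content into Kindle formats. The other core piece is Native Messaging between the Chrome extension and the local program, which cost me quite a lot of time; I describe it briefly later on.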
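To make the kindlegen step concrete, here is a minimal sketch of how a Java program might call it through ProcessBuilder. It is not the tool's actual code: the class and method names are hypothetical, and it assumes kindlegen is on the PATH. kindlegen's `-o` option takes an output file name (no directory), and the .mobi file is written next to the input file.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: runs kindlegen on a saved HTML page to produce a .mobi file.
final class KindleGenRunner {

    // Builds the command line: kindlegen <input.html> -o <output.mobi>
    static List<String> buildCommand(String kindlegen, File html, String mobiName) {
        List<String> cmd = new ArrayList<>();
        cmd.add(kindlegen);
        cmd.add(html.getAbsolutePath());
        cmd.add("-o");
        cmd.add(mobiName); // file name only; kindlegen writes it next to the input file
        return cmd;
    }

    // Runs kindlegen in the page's directory and returns its exit code.
    static int run(String kindlegen, File html, String mobiName) throws IOException, InterruptedException {
        File dir = html.getAbsoluteFile().getParentFile();
        ProcessBuilder pb = new ProcessBuilder(buildCommand(kindlegen, html, mobiName));
        pb.directory(dir);
        pb.redirectErrorStream(true);                       // merge stderr into stdout
        pb.redirectOutput(new File(dir, "kindlegen.log"));  // keep kindlegen's log off our own stdout
        return pb.start().waitFor();
    }
}
```

kindlegen's console output goes to a log file so that it does not end up in the Java program's own standard output, which in this tool is relayed back to the extension as the reply text.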

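How to Use

1. Package the project with mvn assembly. The packaged directory looks like this:

2. The tool can live anywhere; once it is in place, run the setup.bat script.

3. Install the Chrome extension. Enter chrome://extensions in Chrome to open the extension manager, click "Load unpacked extension", and select the Chrome directory under ext to load the extension in developer mode. Every extension has a unique ID, which you will need for the configuration later.

Once the extension loads, its logo appears to the right of the address bar:

4. The tool is now installed; only a little configuration remains: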

1) Open SimpleSendToKindle.json and replace the extension ID in the allowed_origins entry with the ID of the Chrome extension you just loaded.

2) sstk.properties holds the tool's general settings:

# Timeout for the whole service
sstk.service.timeout = 120000
# Download timeout for page content or images
sstk.download.timeout = 15000
# Whether to delete the temporary directory
sstk.download.deleteTmpDir = false

mail.smtp.starttls.enable=true
mail.smtp.socketFactory.port=25
mail.smtp.host=smtp.126.com
mail.host=smtp.126.com
mail.smtp.auth=true
mail.transport.protocol=smtp
mail.userName=XXX
mail.password=XXX
mail.from=XXX@126.com
mail.to=XXX

#debug
sstk.debug.sendMail = false

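The main thing to configure is the email section: mail.to is your Kindle email address, and mail.from is the address the tool sends from. I use 126 here, but other providers that support SMTP work too. As Kindle owners know, for your Kindle to accept emailed content, the sending address must be added to Amazon's Approved Personal Document E-mail List.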
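For reference, reading these settings needs nothing beyond java.util.Properties. The sketch below is hypothetical (SstkConfig is not a class from the project): it falls back to the default values shown above and collects the mail.* entries into a separate Properties object, which is the form JavaMail's Session.getInstance accepts.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical reader for the sstk.properties settings.
final class SstkConfig {
    final long serviceTimeout;
    final long downloadTimeout;
    final boolean deleteTmpDir;
    final boolean debugSendMail;
    final Properties mail = new Properties(); // only the mail.* keys

    SstkConfig(Reader source) throws IOException {
        Properties p = new Properties();
        p.load(source); // skips whitespace around '=', but keeps trailing spaces, hence trim()
        serviceTimeout = Long.parseLong(p.getProperty("sstk.service.timeout", "120000").trim());
        downloadTimeout = Long.parseLong(p.getProperty("sstk.download.timeout", "15000").trim());
        deleteTmpDir = Boolean.parseBoolean(p.getProperty("sstk.download.deleteTmpDir", "false").trim());
        debugSendMail = Boolean.parseBoolean(p.getProperty("sstk.debug.sendMail", "false").trim());
        for (String key : p.stringPropertyNames()) {
            if (key.startsWith("mail.")) {
                mail.setProperty(key, p.getProperty(key).trim());
            }
        }
    }

    // Convenience for in-memory text; a real tool would open sstk.properties instead.
    static SstkConfig parse(String text) {
        try {
            return new SstkConfig(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```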

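Once everything is configured, open the page you want to send and click the icon. That's it.

Wait a moment, then check your Kindle. The result looks like this:

Problems I Ran Into

Simple as the tool is, I ran into a few problems on the way from idea to working tool. I share them here for anyone interested.

Implementation Approach

Once I had the idea, the first question was how to implement it. At first I wanted to write it in JavaScript, so that installing a single Chrome extension would be enough; that would certainly have been Simple. In the end I dropped the idea: first, I hardly know JS; second, the whole point was to meet my own need, so the fastest route with the most familiar technology was the way to go. I therefore settled on a Chrome extension talking to a Java program. Along the way I found some very useful tools, which I will recommend at the end.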

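Chrome Extension Development

I have always used Chrome, so the natural choice was a Chrome plugin (Chrome calls them extensions). The first question, then, was how to develop one. It turns out to be easy: the official getting-started example is very simple and easy to follow: http://chrome.liuyixi.com/getstarted.html. I also recommend an article on cnblogs: Chrome插件（Extensions）开发攻略 (a guide to developing Chrome extensions).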

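Communication Between the Chrome Extension and the Native Program

The official term is Native Messaging. I won't go into the technical details here (search online if you are interested); this is only a brief introduction. On Windows, Chrome finds your native app through the registry key HKEY_CURRENT_USER\Software\Google\Chrome\NativeMessagingHosts\ and a .json manifest file. The setup.bat mentioned above writes the registry entry, and SimpleSendToKindle.json is the manifest: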

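@echo off
rem %~dp0 is the folder containing this script (with a trailing backslash); quote it in case it contains spaces
reg add HKEY_CURRENT_USER\Software\Google\Chrome\NativeMessagingHosts\so.zjd.sstk /ve /t REG_SZ /d "%~dp0SimpleSendToKindle.json" /f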

setup.bat registers the "program" so.zjd.sstk under the registry key Chrome looks at, and through it Chrome finds the manifest that describes the application:

{
    "name":"so.zjd.sstk",
    "description":"Simple Send to Kindle(by zjd.so)",
    "path":"startup.exe",
    "type":"stdio",
    "allowed_origins":[
        "chrome-extension://jnihbngmnjbmchfhcdfabofamnfcljaf/"
    ]
}

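path is the path of the native program. Besides checking the program's permissions, note that the manifest is JSON, so any backslash path separators in path must be escaped as a double backslash (\\). On Windows the path may also be relative to the directory containing the manifest, as startup.exe is here.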
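If the manifest is generated by code instead of edited by hand, the escaping can be handled there. The following hypothetical generator (not part of the tool) escapes the path for JSON and also checks the extension ID, which Chrome forms from 32 letters in the range a to p, so a mistyped ID in allowed_origins is caught early.

```java
import java.util.regex.Pattern;

// Hypothetical generator for a native messaging host manifest like the one above.
final class HostManifest {
    // Chrome extension IDs are 32 lowercase letters from a to p
    private static final Pattern EXTENSION_ID = Pattern.compile("[a-p]{32}");

    // Quotes and escapes a string as a JSON string literal
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break; // C:\tools becomes C:\\tools
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    static String build(String name, String description, String path, String extensionId) {
        if (!EXTENSION_ID.matcher(extensionId).matches()) {
            throw new IllegalArgumentException("Not a Chrome extension ID: " + extensionId);
        }
        String origin = "chrome-extension://" + extensionId + "/"; // same form as in the manifest above
        return "{\n"
                + "  \"name\": " + jsonString(name) + ",\n"
                + "  \"description\": " + jsonString(description) + ",\n"
                + "  \"path\": " + jsonString(path) + ",\n"
                + "  \"type\": \"stdio\",\n"
                + "  \"allowed_origins\": [" + jsonString(origin) + "]\n"
                + "}\n";
    }
}
```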

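Chrome communicates with the native program over the system's standard input and output. The protocol is as follows:

Chrome starts each native messaging host in a separate process and communicates with it using standard input (stdin) and standard output (stdout). The same format is used to send messages in both directions: each message is serialized using JSON, UTF-8 encoded, and preceded by a 32-bit message length in native byte order.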
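The framing is easy to express in Java as well. This is a minimal sketch with hypothetical names, exercised on in-memory streams; it only illustrates the format and does not explain or fix the stdin problem described below. Native byte order is little-endian on x86/x64 Windows, and ByteOrder.nativeOrder() picks it up automatically.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Sketch of native messaging framing: 4-byte length (native byte order) + UTF-8 JSON body.
final class NativeMessaging {

    static String readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] header = new byte[4];
        data.readFully(header); // blocks until 4 bytes arrive; EOFException if the pipe closes
        int length = ByteBuffer.wrap(header).order(ByteOrder.nativeOrder()).getInt();
        byte[] body = new byte[length];
        data.readFully(body);
        return new String(body, StandardCharsets.UTF_8);
    }

    static void writeMessage(OutputStream out, String json) throws IOException {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        // The prefix is the UTF-8 byte count, not json.length() (the char count)
        byte[] header = ByteBuffer.allocate(4).order(ByteOrder.nativeOrder()).putInt(body.length).array();
        out.write(header);
        out.write(body);
        out.flush();
    }

    // Frames a message into a byte array (in-memory, for illustration)
    static byte[] frame(String json) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            writeMessage(buf, json);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory stream
        }
        return buf.toByteArray();
    }

    static String roundTrip(String json) {
        try {
            return readMessage(new ByteArrayInputStream(frame(json)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real host, readMessage would be called on System.in and writeMessage on System.out, and nothing else may be written to System.out, since any stray bytes would corrupt the framing.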

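The protocol really is simple, yet it cost me a lot of time: no matter what I tried, my Java program could not read what Chrome wrote to its standard input, and it always failed with the following error:

At first I suspected my own code. After searching for ages I found people blaming the JDK, but reinstalling it did not help. Later I noticed that Chrome actually passes the program two arguments: a Windows handle (that of the calling Chrome window) and the origin of the calling extension, which contains its ID: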

arg 0:--parent-window=3349886
arg 1:chrome-extension://oojaanpmaapemaihjbebgojmblljbhhh/

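So I wondered whether Java could read data directly from that Windows handle, since Java does provide a FileDescriptor class, but after a lot of fiddling I found that plain Java does not support this. With no other option, I came up with a rather ugly workaround: using C# as a relay, which is why there is an extra startup.exe. The C# code went smoothly, which left me rather disenchanted with Java.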

using System;
using System.Diagnostics;
using System.IO;
using System.Text;

namespace Startup
{
    class Program
    {
        static readonly string BaseDir = AppDomain.CurrentDomain.BaseDirectory;

        static void Main(string[] args)
        {
            try
            {
                // Creates log\ next to startup.exe; does nothing if it already exists
                Directory.CreateDirectory(Path.Combine(BaseDir, "log"));

                // Chrome passes the caller's origin as an argument; without arguments we were not started by Chrome
                if (args.Length == 0)
                {
                    Log2File("Missing parameter.");
                    WriteStandardStreamOut("{\"text\":\"Missing parameter.\"}"); // replies must be JSON
                    return;
                }

                string url = ReadStandardStreamIn();
                Log2File("Running SimpleSendToKindle.jar with url:" + url);
                string ret = RunJar(url);
                Log2File("Completed with return msg:" + ret);
                // The jar's output may contain quotes or line breaks, so escape it for JSON
                WriteStandardStreamOut("{\"text\":\"" + JsonEscape(ret.Trim()) + "\"}");
            }
            catch (Exception ex)
            {
                Log2File("Error:" + ex);
                WriteStandardStreamOut("{\"text\":\"Error." + JsonEscape(ex.Message) + "\"}");
            }
        }

        static string RunJar(string url)
        {
            ProcessStartInfo startInfo = new ProcessStartInfo
            {
                WorkingDirectory = BaseDir,
                UseShellExecute = false, // must be false to redirect the process's I/O streams
                CreateNoWindow = true,
                RedirectStandardOutput = true,
                StandardOutputEncoding = Encoding.UTF8, // matches -Dfile.encoding=utf-8 below
                FileName = "java.exe",
                // Quote the URL so that spaces and similar characters survive command-line parsing
                Arguments = "-Dfile.encoding=utf-8 -jar SimpleSendToKindle.jar \"" + url + "\"",
            };
            // Process.Start already launches java.exe, so no extra process.Start() call is needed
            using (Process process = Process.Start(startInfo))
            {
                string output = process.StandardOutput.ReadToEnd();
                process.WaitForExit();
                return output;
            }
        }

        static void Log2File(string s)
        {
            File.AppendAllText(Path.Combine(BaseDir, "log", "startup.log"), s + Environment.NewLine, Encoding.UTF8);
        }

        static string ReadStandardStreamIn()
        {
            using (Stream stdin = Console.OpenStandardInput())
            {
                // 32-bit length prefix in native byte order (little-endian on x86/x64 Windows)
                int length = BitConverter.ToInt32(ReadFully(stdin, 4), 0);
                string json = Encoding.UTF8.GetString(ReadFully(stdin, length));
                // popup.js posts encodeURI(url), which arrives as a JSON string literal: "http://..."
                string encoded = json.Trim().Trim('"');
                return Uri.UnescapeDataString(encoded); // undoes encodeURI, without Microsoft.JScript
            }
        }

        // Stream.Read may return fewer bytes than requested, so keep reading until count bytes arrive
        static byte[] ReadFully(Stream stream, int count)
        {
            byte[] buffer = new byte[count];
            int offset = 0;
            while (offset < count)
            {
                int read = stream.Read(buffer, offset, count - offset);
                if (read == 0)
                {
                    throw new EndOfStreamException("stdin closed before the whole message arrived");
                }
                offset += read;
            }
            return buffer;
        }

        static void WriteStandardStreamOut(string msg)
        {
            byte[] msgBytes = Encoding.UTF8.GetBytes(msg);
            // The prefix is the UTF-8 byte count; msg.Length counts characters and is wrong for non-ASCII text
            byte[] lenBytes = BitConverter.GetBytes(msgBytes.Length);
            using (Stream stdout = Console.OpenStandardOutput())
            {
                stdout.Write(lenBytes, 0, lenBytes.Length);
                stdout.Write(msgBytes, 0, msgBytes.Length);
                stdout.Flush();
            }
        }

        // Escapes quotes, backslashes and control characters for use inside a JSON string
        static string JsonEscape(string s)
        {
            StringBuilder sb = new StringBuilder();
            foreach (char c in s)
            {
                switch (c)
                {
                    case '"': sb.Append("\\\""); break;
                    case '\\': sb.Append("\\\\"); break;
                    case '\n': sb.Append("\\n"); break;
                    case '\r': sb.Append("\\r"); break;
                    case '\t': sb.Append("\\t"); break;
                    default:
                        if (c < 0x20) sb.AppendFormat("\\u{0:x4}", (int)c);
                        else sb.Append(c);
                        break;
                }
            }
            return sb.ToString();
        }
    }
}

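The relay URL-decodes the incoming message before handing the URL to the jar. If that step were done in Java instead, java.net.URLDecoder would not be an exact counterpart of encodeURI, because it decodes form data and turns '+' into a space. A minimal percent-decoder that leaves '+' alone could look like this; PercentDecoder is a hypothetical helper, and it is exact only for strings produced by encodeURI, which never escapes reserved characters such as / or ?.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reverses JavaScript's encodeURI.
final class PercentDecoder {

    // Decodes %XX escapes as UTF-8 bytes. Unlike java.net.URLDecoder, '+' stays '+'.
    static String decode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    bytes.write((hi << 4) | lo); // one raw byte of a UTF-8 sequence
                    i += 3;
                    continue;
                }
            }
            // Anything else (including a malformed '%') is copied through as UTF-8
            int cp = s.codePointAt(i);
            byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            bytes.write(raw, 0, raw.length);
            i += Character.charCount(cp);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```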

Getting the Current Page URL in the Chrome Extension

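The example on cnblogs reads document.URL in content_script.js, but I found a problem with that: the page had to be reloaded every time, otherwise the value seemed to stay stuck at one global value. Asking the tabs API for the selected tab from the popup script works better. chrome.tabs.getSelected does this but is deprecated, so the code uses its replacement, chrome.tabs.query:

chrome.tabs.query({ active: true, currentWindow: true }, function (tabs) {
    var tab = tabs[0];
    // Host name registered by setup.bat and declared in SimpleSendToKindle.json
    var nativeHostName = "so.zjd.sstk";
    var port = chrome.runtime.connectNative(nativeHostName);

    // The host replies with {"text": "..."}; show it in the popup ($ assumes jQuery is included)
    port.onMessage.addListener(function (msg) {
        $("#message").text(msg.text);
    });

    port.onDisconnect.addListener(function () {
        // lastError says why, e.g. the host was not found or the origin is not allowed
        if (chrome.runtime.lastError) {
            console.log("Native host disconnected: " + chrome.runtime.lastError.message);
        }
        port = null;
    });

    // Send the page URL; the native side URL-decodes it again
    port.postMessage(encodeURI(tab.url));
});

popup.js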

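Handling Images

In fact, you can save a page as HTML (right-click, Save As) and then generate a mobi file with kindlegen, or send the HTML file through Amazon's Kindle email service, which converts it to mobi automatically. The reason I wrote this tool anyway is that neither kindlegen nor the Kindle email service downloads the images in a page by itself: kindlegen needs the addresses of images and other resources converted to relative paths, with the resources gathered in a single folder.

So the processing is simple: parse the img elements in the page, download the images yourself, and replace each src with a relative path. The one thing to watch out for is the different ways pages reference images: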

./images/mem/figure9.png
images/mem/figure9.png
/images/mem/figure9.png
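java.net.URI already implements the standard rules for resolving relative references, so all three forms (as well as absolute http or https URLs) can be handled with URI.resolve. A minimal sketch with a hypothetical ResourceUrls helper, offered as an alternative to hand-written string slicing; note that URI.create throws IllegalArgumentException for URLs containing illegal characters such as unescaped spaces.

```java
import java.net.URI;

// Hypothetical helper: resolves an img src against the page URL.
final class ResourceUrls {

    static String resolve(String pageUrl, String src) {
        URI base = URI.create(pageUrl);
        if (base.getPath() == null || base.getPath().isEmpty()) {
            // URI.resolve mishandles a base with an empty path ("http://host"), so use "/" instead
            base = base.resolve("/");
        }
        // Handles ./x, x, /x, //host/x and absolute URLs, and normalizes "." and ".." segments
        return base.resolve(src.trim()).toString();
    }
}
```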

The code is roughly as follows:

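import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;

// Fragment of the page-processing class. page (PageEntry), downloaders (an ExecutorService),
// futureTasks (a List<FutureTask<Boolean>>) and LOGGER are fields of that class; PageEntry,
// ResourceEntry, RegexUtils, HttpHelper, GlobalConfig and getFileName come from the project.
private static final char[] IMG_START_TAG = new char[] { '<', 'i', 'm', 'g' };
// Number of characters of "<img" matched so far
private int imgIndex = 0;

protected void processResources(PageEntry page) {
    StringBuilder processed = new StringBuilder();
    StringBuilder element = new StringBuilder();
    StringBuilder content = page.getContent();
    for (int i = 0; i < content.length(); i++) {
        char c = content.charAt(i);
        if (imgIndex < 4 && c == IMG_START_TAG[imgIndex]) {
            imgIndex++;
        } else {
            // Restart the match; the current character may itself begin a new "<img"
            imgIndex = (c == '<') ? 1 : 0;
        }
        if (imgIndex == 4) { // found "<img"
            // "<im" has already been copied to the output; take it back
            processed.delete(processed.length() - 3, processed.length());
            element.append("<img");
            imgIndex = 0;
            // Copy up to the closing '>'. Many pages write <img ...> rather than <img .../>,
            // so waiting for "/>" could swallow the markup that follows.
            while (i < content.length() - 1) {
                c = content.charAt(++i);
                element.append(c);
                if (c == '>') {
                    break;
                }
            }
            processed.append(downloadResource(element.toString(), 0));
            element.setLength(0);
        } else {
            processed.append(c);
        }
    }
    page.setContent(processed);
}

// type 0 rewrites src (images), type 1 rewrites href; anything else is returned unchanged
private String downloadResource(String element, int type) {
    String pattern;
    if (type == 0) {
        pattern = "(?<=src=\").*?(?=\")";
    } else if (type == 1) {
        pattern = "(?<=href=\").*?(?=\")";
    } else {
        return element;
    }

    List<String> matches = RegexUtils.findAll(pattern, element, false);
    if (matches.isEmpty()) {
        return element;
    }
    String url = processRelativeUrl(matches.get(0));
    final String fileName = getFileName(url);
    // Point the element at the local copy: <RESOURCE_DIR_NAME>/<fileName>
    final String result = RegexUtils.replaceAll(pattern, element, GlobalConfig.RESOURCE_DIR_NAME + "/" + fileName,
            false);
    final ResourceEntry res = new ResourceEntry(fileName, url, page.getResourceDir() + fileName);
    // Download in the background; the task is also kept in futureTasks
    FutureTask<Boolean> task = new FutureTask<>(new Callable<Boolean>() {
        @Override
        public Boolean call() throws Exception {
            try (OutputStream os = new FileOutputStream(res.getSavePath())) {
                HttpHelper.download(res.getDownloadUrl(), GlobalConfig.DOWNLOAD_TIMEOUT, os);
                LOGGER.debug("downloaded resource:" + res.toString());
            } catch (Exception e) {
                // A failed download is logged but does not abort the page
                LOGGER.error("download resource error:" + res.getDownloadUrl(), e);
            }
            return true;
        }
    });
    downloaders.submit(task);
    futureTasks.add(task);
    return result;
}

// Resolves the forms of image URL against the page URL:
// ./images/mem/figure9.png
// images/mem/figure9.png
// /images/mem/figure9.png
private String processRelativeUrl(String url) {
    if (url.startsWith("http://") || url.startsWith("https://")) {
        return url;
    }
    String pageUrl = page.getUrl();
    int hostStart = pageUrl.indexOf("://") + 3; // works for both http and https
    if (url.startsWith("//")) { // protocol-relative, e.g. //cdn.example.com/a.png
        return pageUrl.substring(0, hostStart - 2) + url;
    }
    if (url.startsWith("/")) {
        // Root-relative: keep only "scheme://host/"
        int index = pageUrl.indexOf('/', hostStart);
        pageUrl = (index == -1) ? pageUrl + "/" : pageUrl.substring(0, index + 1);
        url = url.substring(1);
    } else {
        // Document-relative: keep the page URL up to and including its last '/'
        pageUrl = pageUrl.substring(0, pageUrl.lastIndexOf('/') + 1);
        if (url.startsWith("./")) {
            url = url.substring(2); // only strip "./", not the first directory name
        }
    }
    return pageUrl + url;
}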
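The scanner above only recognizes a lowercase <img and double-quoted src values. As an alternative, a single regular expression can rewrite src attributes in either quote style and any letter case. This is a sketch, not the tool's method: ImgSrcRewriter and its numbered file names (img1.png, img2.jpg, ...) are a hypothetical naming scheme, the recorded URLs would still need resolving against the page URL and downloading, and a regex is no substitute for an HTML parser (a > inside an attribute value before src, for example, defeats it).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical alternative: rewrite every <img ... src="..."> in one pass.
final class ImgSrcRewriter {
    // "<img", then attributes, then whitespace + src= and a quoted value; \s before src skips data-src
    private static final Pattern IMG_SRC = Pattern.compile(
            "(<img\\b[^>]*?\\ssrc\\s*=\\s*)([\"'])(.*?)\\2",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    final List<String> originalUrls = new ArrayList<>();

    // Replaces each src with dir + "/img<n><ext>" and records the original URL in order.
    String rewrite(String html, String dir) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            originalUrls.add(m.group(3));
            String local = dir + "/img" + originalUrls.size() + extension(m.group(3));
            String replacement = m.group(1) + m.group(2) + local + m.group(2);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Keeps a short extension such as ".png" from the URL path, ignoring ?query and #fragment
    private static String extension(String url) {
        String path = url.replaceAll("[?#].*$", "");
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        return (dot > slash && path.length() - dot <= 5) ? path.substring(dot) : "";
    }
}
```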