Using Lucene's CharFilter to strip script blocks and HTML tags from text
1. Prepare the data. Here I read a record from a database whose content contains HTML tags and script blocks.
Code:
    // `lists` and `content` are fields of the test class (declarations not shown in the original).
    // SQLService and BaseDao are the author's own helper classes.
    @Before
    public void init() {
        SQLService sqlService = new SQLService();
        sqlService.regist(null);
        BaseDao bd = new BaseDao();
        String sql = "select * from t where title like '% 每天读一遍,舌头更无敌%'";
        lists = bd.getList(sql);
        System.out.println(lists.size());
        content = lists.get(0).get("content").toString();
        // System.out.println(content);
    }
2. Use the character filters HTMLStripCharFilter and MappingCharFilter. Both extend Reader (through Lucene's CharFilter base class), so they can be consumed exactly like any other Reader.
Code:
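    import java.io.IOException;
    import java.io.StringReader;
    import org.apache.lucene.analysis.CharFilter;
    import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
    import org.apache.lucene.analysis.charfilter.MappingCharFilter;
    import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
    import org.junit.Test;

    @Test
    public void test2() throws IOException {
        StringBuilder sb = new StringBuilder();
        // HTML filter: strips tags and script blocks
        HTMLStripCharFilter htmlscript = new HTMLStripCharFilter(new StringReader(content));
        // Add a mapping filter, mainly to remove line-break characters
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add("\r", ""); // carriage return
        builder.add("\t", ""); // horizontal tab
        builder.add("\n", ""); // line feed
        // try-with-resources closes the filter chain even if reading fails
        try (CharFilter cs = new MappingCharFilter(builder.build(), htmlscript)) {
            char[] buffer = new char[10240];
            int count;
            while ((count = cs.read(buffer)) != -1) {
                sb.append(buffer, 0, count);
            }
        }
        System.out.println(sb.toString());
        // String keywords = HanLP.extractKeyword(sb.toString(), 20).toString();
        // System.out.println(keywords);
    }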
Result (the sample record is a bilingual list of tongue twisters):
亲爱的小伙伴们,累了,就放松一下吧!1. Can you can a can as a canner can can a can?你能够像罐头工人一样装罐头吗?
2. I wish to wish the wish you wish to wish, but if you wish the wish the witch wishes, I won't wish the wish you wish to
wish. 我希望梦想着你梦想中的梦想,但是如果你梦想着女巫的梦想,我就不想梦想着你梦想中的梦想。3. I scream, you scream, we all scream
for ice-cream! 我叫喊,你叫喊,我们都喊着要冰淇淋!4. How many cookies could a good cook cook if a good cook could cook cookies?
A good cook could cook as much cookies as a good cook who could cook cookies. 如果一个好的厨师能做小甜饼,那么他能做多少小甜饼呢?
一个好的厨师能做出和其它好厨师一样多的小甜饼。5. The driver was drunk and drove the doctor's car directly into the deep ditch.
这个司机喝醉了,他把医生的车开进了一个大深沟里。6. Whether the weather be fine or whether the weather be not.Whether the weather
be cold or whether the weather be hot.We'll weather the weather whether we like it or not.无论是晴天或是阴天。无论是冷或是暖,
不管喜欢与否,我们都要经受风霜雨露。7. Peter Piper picked a peck of pickled peppers. A peck of pickled peppers Peter Piper
picked. If Peter Piper picked a peck of pickled peppers, Where's the peck of pickled peppers Peter Piper picked?
彼德派柏捏起一撮泡菜。 彼德派柏捏起的是一撮泡菜。 那么彼德派捏起的泡菜在哪儿?8. I thought a thought. But the thought I thought
wasn't the thought I thought I thought. If the thought I thought I thought had been the thought I thought, I wouldn't
have thought so much. 我有一种想法,但是我的这种想法不是我曾经想到的那种想法。如果这种想法是我曾经想到的想法,我就不会想那么多了。
9. Amid the mists and coldest frosts, With barest wrists and stoutest boasts, He thrusts his fists against the posts,
And still insists he sees the ghosts. 雾蒙蒙,冰霜冻, 手腕儿空空,话儿涌, 只见他猛所拳头往柱子上砸, 直说自己把鬼碰。
10. Badmin was able to beat Bill at billiards, but Bill always beat Badmin badly at badminton.
巴德明在台球上能够打败比尔,但是打羽毛球比尔常常大败巴德明。11. Betty beat a bit of butter to make a better butter.
贝蒂敲打一小块黄油要做一块更好的奶油面。12. Rita repeated what Reardon recited when Reardon read the remarks.
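
Step 2 depends on the character filters being ordinary Readers that wrap another Reader. To make that idea concrete without a Lucene dependency, here is a minimal sketch using only the JDK's FilterReader. The class name CharDroppingReader and the helper readAll are hypothetical, made up for this illustration; this is not how Lucene implements MappingCharFilter (which also tracks offset corrections), only the same wrapping pattern applied to the "\r", "\t", "\n" removal shown above.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical illustration class; not part of Lucene. */
class CharDroppingReader extends FilterReader {
    private final String dropped; // characters to remove from the stream

    CharDroppingReader(Reader in, String dropped) {
        super(in);
        this.dropped = dropped;
    }

    @Override
    public int read() throws IOException {
        int c;
        // Keep reading from the wrapped Reader until a character we keep (or end of stream).
        do {
            c = in.read();
        } while (c != -1 && dropped.indexOf(c) >= 0);
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n; // -1 signals end of stream
    }

    /** Drains any Reader into a String, using the same buffer loop as the test method above. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        int count;
        try (Reader r = reader) {
            while ((count = r.read(buffer)) != -1) {
                sb.append(buffer, 0, count);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader filtered = new CharDroppingReader(
                new StringReader("line one\r\n\tline two\n"), "\r\t\n");
        System.out.println(readAll(filtered)); // prints "line oneline two"
    }
}
```

Because the wrapper is itself a Reader, it can be nested inside another wrapper, which is the same way the MappingCharFilter in step 2 wraps the HTMLStripCharFilter.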