jsoup 1.8.1 released with major performance improvements

Author: 小梦 | Source: web | Date: 2024-07-31


jsoup 1.8.1 has been released.

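jsoup 1.8.1 significantly improves the performance of text and tree serialization, adds a choice between HTML and XML output, and includes a large number of feature improvements and bug fixes. This release is now available for download.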

The changes are as follows:

Improvements

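Output can now be serialized as either HTML or XML; HTML is the default.
Improved the performance of Element.text().
Improved the performance of Element.html().
Reduced file read time and improved the file parser, for a speedup of roughly 10%.
Added Element.cssSelector().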
Tightened the scope of what characters are escaped in attributes and textnodes, to align with the spec.
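If pretty-print is disabled, Element.html() no longer trims outer whitespace.
The HTML Cleaner now allows span tags in the basic whitelist, and span and div tags in the relaxed whitelist.
Loosened doctype validation so that a doctype does not have to specify a name.
CSS selectors now support quoted attribute values.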
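
Several of the improvements above can be tried together in a few lines. The following is a minimal sketch, not taken from the release notes: the class name ReleaseNotesSketch and the sample markup are made up for illustration, and it assumes jsoup 1.8.1 or later is on the classpath. It switches the output syntax between HTML and XML, selects a link with a quoted attribute value, and checks that the selector returned by Element.cssSelector() finds the same element again.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ReleaseNotesSketch {
    // Hypothetical sample markup, used only for illustration.
    static final String SAMPLE =
        "<div id=main><p class=intro>One<br>Two</p><a href=\"/docs\">Docs</a></div>";

    // Serialize the sample body with the chosen output syntax (html or xml).
    // Pretty-printing is disabled so the output stays on one line.
    static String render(Document.OutputSettings.Syntax syntax) {
        Document doc = Jsoup.parse(SAMPLE);
        doc.outputSettings().syntax(syntax).prettyPrint(false);
        return doc.body().html();
    }

    // Select the link with a quoted attribute value, ask jsoup for a selector
    // describing it, and check that the selector finds the same element again.
    static boolean selectorRoundTrips() {
        Document doc = Jsoup.parse(SAMPLE);
        Element link = doc.select("a[href=\"/docs\"]").first();
        String selector = link.cssSelector();
        return doc.select(selector).first() == link;
    }

    public static void main(String[] args) {
        System.out.println(render(Document.OutputSettings.Syntax.html));
        System.out.println(render(Document.OutputSettings.Syntax.xml));
        System.out.println(selectorRoundTrips());
    }
}
```

The visible difference between the two syntaxes in this sample is how the empty br element is written: the XML form closes it explicitly, while the HTML form does not.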

Bug fixes

Fixed an issue where <svg><img/></svg> was parsed as <svg><image/></svg>
Fixed an issue where a UTF-8 BOM character was not detected if the HTTP response did not specify a charset, and the HTML body did, leading to the head contents incorrectly being parsed into the body. Changed the behavior so that when the UTF-8 BOM is detected, it will take precedence for determining the charset to decode with.
Fixed an issue in parsing a base URI when loading a URL containing a http-equiv element.
Fixed an issue for Java 1.5 / Android 2.2 compatibility, and verify it doesn't regress.
Fixed an issue that would throw an NPE when trying to set invalid HTML into a title element.
Fixed support for nth-of-type selectors with unknown tags.
Added support for application/*+xml mimetypes.
Fixed support for allowing script tags in cleaner whitelists.
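
The nth-of-type fix above concerns tags that jsoup does not know about. A minimal sketch of such a selector, assuming hypothetical feed, item and note tag names and parsing the input with jsoup's XML parser, might look like this:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class UnknownTagSelectorSketch {
    // Hypothetical XML-like input; none of these tag names are known HTML tags.
    static final String XML =
        "<feed><item>a</item><note>x</note><item>b</item></feed>";

    // Return the text of the second <item> among its siblings,
    // skipping the <note> element that sits between the two items.
    static String secondItem() {
        Document doc = Jsoup.parse(XML, "", Parser.xmlParser());
        return doc.select("item:nth-of-type(2)").text();
    }

    public static void main(String[] args) {
        System.out.println(secondItem());
    }
}
```

Because nth-of-type counts only siblings with the same tag name, the note element does not affect which item is chosen.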

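jsoup is an HTML parser for Java that can directly parse a URL or HTML text. It provides a very convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.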
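
As a rough illustration of that API, here is a minimal sketch that parses an HTML string, reads the title, extracts link targets with a CSS selector, and modifies elements in a jQuery-like way. The class name JsoupBasicsSketch, the helper method names, and the sample markup are assumptions made for this example. Parsing a URL would go through Jsoup.connect(url).get() instead, which needs network access and is left out here.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupBasicsSketch {
    // Hypothetical sample page, used only for illustration.
    static final String HTML =
        "<html><head><title>Demo</title></head><body><ul>"
        + "<li><a href='/a'>First</a></li>"
        + "<li><a href='/b'>Second</a></li>"
        + "</ul></body></html>";

    // DOM-style access: read the document title.
    static String titleOf(String html) {
        return Jsoup.parse(html).title();
    }

    // CSS-style access: collect the href of every link inside a list item.
    static List<String> linkTargets(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element a : doc.select("ul > li > a[href]")) {
            hrefs.add(a.attr("href"));
        }
        return hrefs;
    }

    // jQuery-like manipulation: add a class to all links at once,
    // then read the class name back from the first link.
    static String markLinks(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("a").addClass("seen");
        return doc.select("a").first().className();
    }

    public static void main(String[] args) {
        System.out.println(titleOf(HTML));
        System.out.println(linkTargets(HTML));
        System.out.println(markLinks(HTML));
    }
}
```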

jsoup's main features are as follows:

jsoup is released under the MIT license and can be used in commercial projects without concern.
