jsoup 1.8.2 released, an HTML parser
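jsoup 1.8.2 has been released. This version improves performance on Android for HTML parsing, HTML serialization and selector queries. It also adds file upload support and W3C DOM interoperability, along with other improvements and bug fixes.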
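What's new
Improvements
Improved HTML parsing performance on Android
Improved HTML serialization performance on Android
Faster character set encoding on Android
Improved selector performance on Android
Added support for file uploads
Added a meta-charset element to documents when setting the character set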
Added ability to disable TLS (SSL) certificate validation
Added ability to further tweak the canned Cleaner Whitelists by removing existing settings.
Added option in Cleaner Whitelist to allow linking to in-page anchors (#)
HTML5 documents now use a lowercase doctype tag.
Added support for 201 Created responses with a redirect, and for other status codes.
Added support for HTTP method verbs PUT, DELETE, and PATCH.
Added support for overriding the default POST character set of UTF-8 in Connection.
W3C DOM support: added ability to convert from a jsoup document to a W3C document
In the HtmlToPlainText example program, added the ability to filter using a CSS selector
Improved the equals() and hashCode() methods in Node
Improved performance in Selector when searching multiple roots.
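To illustrate what the POST character set override in Connection affects, here is a minimal sketch using only the JDK, assuming the POST body is a standard application/x-www-form-urlencoded form. The class FormBodySketch and its encodeForm method are hypothetical names for illustration, not jsoup's API: the same field values produce different bytes on the wire depending on which charset is used to percent-encode them.

```java
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBodySketch {
    // UTF-8 is the default POST charset, per the changelog entry above.
    static final String DEFAULT_CHARSET = "UTF-8";

    // Hypothetical helper, not part of jsoup: joins fields as key=value
    // pairs separated by '&', percent-encoding each side in the requested
    // charset (or UTF-8 when none is given).
    static String encodeForm(Map<String, String> fields, String charset) {
        String cs = (charset == null) ? DEFAULT_CHARSET : charset;
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), cs))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), cs));
            }
        } catch (UnsupportedEncodingException ex) {
            // Unknown charset names surface as an unchecked error here.
            throw new UncheckedIOException(ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("q", "caf\u00e9");
        fields.put("tag", "a b");
        System.out.println(encodeForm(fields, null));         // q=caf%C3%A9&tag=a+b
        System.out.println(encodeForm(fields, "ISO-8859-1")); // q=caf%E9&tag=a+b
    }
}
```

A server expecting Latin-1 would misread the UTF-8 form of "café" (two bytes, %C3%A9) as two separate characters, which is why being able to change the default matters.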
Bug fixes
Fixed validation of cookie names in HttpConnection cookie methods.
Fixed an issue where option tags would be missed when preparing a form for submission if missing a selected attribute.
Fixed an issue where submitting a form would incorrectly include radio and checkbox values without the checked attribute.
Fixed an issue where Element.classNames() would return a set containing an empty class and possibly extraneous whitespace.
Fixed an issue where attributes selected by value were not correctly space normalized.
In head+noscript elements, treat content as character data instead of jumping out of head parsing.
Fixed a performance issue when parsing HTML containing elements with many children that need re-parenting.
Fixed an issue where a server returning an unsupported character set in its response would cause a runtime UnsupportedCharsetException instead of falling back to the default UTF-8 charset.
Fixed an issue where Jsoup.Connection would throw an IOException when reading a page with a zero Content-Length.
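The UTF-8 fallback for unsupported response charsets can be sketched with the JDK's java.nio.charset API. This is a minimal illustration, not jsoup's internal code; the class CharsetFallbackSketch and its resolveCharset method are hypothetical names. It assumes the declared name has already been extracted from the Content-Type header, and it treats missing, malformed and unsupported names alike by returning UTF-8 instead of letting an exception escape.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackSketch {
    // Hypothetical helper, not jsoup's internal code: maps the charset
    // name a server declared to a Charset, falling back to UTF-8 rather
    // than letting an UnsupportedCharsetException reach the caller.
    static Charset resolveCharset(String declared) {
        if (declared == null || declared.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        // Strip quotes some servers put around the name, e.g. "utf-8".
        String name = declared.trim().replace("\"", "");
        try {
            if (Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalCharsetNameException e) {
            // Malformed names are treated like unsupported ones.
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("ISO-8859-1"));        // ISO-8859-1
        System.out.println(resolveCharset("x-no-such-charset")); // UTF-8
        System.out.println(resolveCharset(null));                // UTF-8
    }
}
```

Checking Charset.isSupported first avoids relying on exceptions for the common "unknown name" case; only syntactically invalid names reach the catch block.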
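OSChina uses jsoup to parse HTML.
jsoup is a Java HTML parser that can parse HTML directly from a URL or from HTML text. It provides a very convenient API for extracting and manipulating data using DOM methods, CSS selectors, and jQuery-like operations.
jsoup's main features:
Parse HTML from a URL, a file, or a string;
Find and extract data using DOM traversal or CSS selectors;
Manipulate HTML elements, attributes, and text;
jsoup is released under the MIT license and can be used freely in commercial projects.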