Android: Parsing HTML with jsoup



Background: when I made requests to a server, the response came back as raw HTML source, and I needed to extract the useful information from it.

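jsoup is mostly used in Java development. It lets you query a document with DOM methods and jQuery-like selectors, which makes it a practical way to scrape information from other websites.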

Here is how to use jsoup to parse HTML on Android.

1. Download the jsoup package from http://jsoup.org/download

jsoup website: http://jsoup.org/

I downloaded jsoup-1.7.2.jar (the core library).

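2. Put the downloaded jar into the libs directory of your Android project

With reasonably recent versions of ADT, dropping the jar into libs and refreshing the project is enough.

With older versions of ADT you may need to add the jar to the build path manually (look up the steps online).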

3. Use it in your code

There is a Chinese-language developer guide at

http://www.open-open.com/jsoup/ which is worth reading.

A few lines of code to start with:

String html = "<html><head><title>First parse</title></head>"
  + "<body><p>Parsed HTML into a doc.</p></body></html>";
Document doc = Jsoup.parse(html);
This example is simple: it turns the HTML string directly into a Document instance, which you can then dig into further using Elements and Element.
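As a minimal sketch of that "dig further" step, the snippet below pulls the title and the first paragraph out of the same HTML string. The class name FirstParse and its two method names are made up for illustration; the jsoup calls themselves (Jsoup.parse, Document.title, select, first, text) are part of the library.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class, for illustration only
final class FirstParse {
    /** Returns the text of the <title> element, or "" if there is none. */
    static String title(String html) {
        Document doc = Jsoup.parse(html);
        return doc.title();
    }

    /** Returns the text of the first <p> element, or null if there is none. */
    static String firstParagraph(String html) {
        Document doc = Jsoup.parse(html);
        Elements paragraphs = doc.select("p"); // every element matching the selector
        Element first = paragraphs.first();    // null when nothing matched
        return first == null ? null : first.text();
    }
}
```

Elements is a list of matches, while Element is a single node, so checking first() for null before calling text() avoids a NullPointerException when the page does not contain the tag you expect.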

Here is another piece of code from my own project, for reference only:

The HTML to be parsed (only a fragment of the body)

<div >
        <a href="javascript:void(0);" onclick="window.parent.RightShow('/china/ask?action=Chat-toSay-12997025-');            window.parent.checkinform('/china/ask?action=Chat-toSay-12997025-justRemove-1-',0,0);">
                <div >
                        <img  src="/image/1/39.png" width="30px" height="30px">
                </div>
                <div >
                        <div >
                        <font color="#800000" size="3px"> HaiFei-PC</font>
                        </div>
                        <div >
                        3天前
                        </div>
                </div>
                <div > 我爱你</div>
        </a>
        </div>
        <div >
        <a href="javascript:void(0);" onclick="window.parent.RightShow('/china/ask?action=Chat-toSay-12996969-');            window.parent.checkinform('/china/ask?action=Chat-toSay-12996969-justRemove-1-',0,0);">
                <div >
                        <img  src="/upload/userface/1/2/9/9/6/9/6/9/2.png" width="30px" height="30px">
                </div>
                <div >
                        <div >
                        <font color="#800000" size="3px"> ethen</font>
                        </div>
                        <div >
                        3天前
                        </div>
                </div>
                <div > [上传语音]</div>
        </a>
        </div>
        <div >
        <a href="javascript:void(0);" onclick="window.parent.RightShow('/china/ask?action=Chat-toSay-12996951-');            window.parent.checkinform('/china/ask?action=Chat-toSay-12996951-justRemove-1-',0,0);">
                <div >
                        <img  src="/image/2/7.png" width="30px" height="30px">
                </div>
                <div >
                        <div >
                        <font color="#800000" size="3px"> cooler</font>
                        </div>
                        <div >
                        4天前
                        </div>
                </div>
                <div > hello</div>
        </a>
        </div>

A method from MainActivity.java

// Imports needed at the top of MainActivity.java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

private void parseHtmlUseJsoup(String html) {
    chatListModelList.clear();

    Document doc = Jsoup.parse(html);
    Element chatcontentElement = doc.getElementById("chatcontent");
    if (chatcontentElement == null) {
        return; // container not found, nothing to parse
    }
    Elements chatElements = chatcontentElement.getElementsByTag("a");

    for (Element chatElement : chatElements) {
        ChatListModel chatListModel = new ChatListModel();

        // Parse the token ID out of the onclick handler, e.g.
        // window.parent.RightShow('/china/ask?action=Chat-toSay-12997025-');
        // Read it from the current element: Elements.attr() would return the
        // first matching element's value on every iteration.
        String tokenId = chatElement.attr("onclick");
        final String tokenIdPrefix = "/china/ask?action=Chat-toSay-";
        tokenId = tokenId.substring(tokenId.indexOf(tokenIdPrefix) + tokenIdPrefix.length(), tokenId.indexOf("-');"));
        chatListModel.setTokenId(tokenId);

        // Parse the image URL (src is relative, so prepend the base URL)
        Elements imgElements = chatElement.getElementsByTag("img");
        String imageUrl = imgElements.attr("src");
        imageUrl = NavigationUrl.getBaseUrl() + imageUrl;
        chatListModel.setImageUrl(imageUrl);

        // Parse name, time, and content
        String chatElementText = chatElement.text(); // e.g. "ethen 3天前 [上传语音]"
        // A limit of 3 keeps any spaces inside the message content intact
        String[] str = chatElementText.split(" ", 3);
        if (str.length < 3) {
            continue; // skip rows that do not have all three parts
        }
        chatListModel.setFriendName(str[0]);
        chatListModel.setTime(str[1]);
        chatListModel.setContent(str[2]);
        chatListModelList.add(chatListModel);
    }
}
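The substring arithmetic on the onclick value and the split on spaces in parseHtmlUseJsoup both depend on the exact shape of the markup. A slightly more defensive version of those two steps could look like the sketch below. It uses only the Java standard library; the class name ChatRowText and its method names are hypothetical, and the pattern assumes token IDs are purely numeric, as they are in the sample HTML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only
final class ChatRowText {
    // Assumption: the token ID is the run of digits between "Chat-toSay-" and the next "-"
    private static final Pattern TOKEN_ID = Pattern.compile("Chat-toSay-(\\d+)-");

    /** Returns the token ID found in an onclick value, or null if there is none. */
    static String extractTokenId(String onclick) {
        Matcher m = TOKEN_ID.matcher(onclick);
        return m.find() ? m.group(1) : null;
    }

    /**
     * Splits "name time content" into exactly three parts.
     * Returns null when fewer than three parts are present.
     */
    static String[] splitRow(String text) {
        // Limit of 3: everything after the second gap stays in the content part
        String[] parts = text.trim().split("\\s+", 3);
        return parts.length == 3 ? parts : null;
    }
}
```

Returning null instead of throwing lets the caller skip a malformed row rather than crash on a StringIndexOutOfBoundsException or ArrayIndexOutOfBoundsException. Note that a friend name containing a space would still be split incorrectly; selecting the individual child elements is more robust when the markup gives them distinguishing attributes.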

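The code above is for reference only and may not match the problem you are actually facing. The point is simply that Android can use jsoup to parse HTML.

References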

http://jsoup.org/

http://www.open-open.com/jsoup/