Return All The HtmlPage's HTML
I want the entire HTML for a given HtmlPage object. What property should I use?
Solution 1:
In HtmlUnit, HtmlPage implements the Page interface, so you can call Page#getWebResponse() to get the raw web response the HtmlPage was built from. From there it's easy: WebResponse#getContentAsString() returns the response body as a string. Here's a method that does what you want:
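    import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException; // HtmlUnit 2.x; 3.x uses org.htmlunit
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;
    import java.io.IOException;

    // Fetches url and returns the response body exactly as the server sent it.
    public String getRawPageText(WebClient client, String url)
            throws FailingHttpStatusCodeException, IOException { // IOException also covers MalformedURLException
        HtmlPage page = client.getPage(url); // ClassCastException if the response isn't HTML
        return page.getWebResponse().getContentAsString();
    }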
Or, using an HtmlPage object that you've already fetched:
    // Returns the raw HTML of a page you already hold; no new request is made.
    public String getRawPageText(HtmlPage page) {
        return page.getWebResponse().getContentAsString();
    }
Solution 2:
The quickest way to do this is HtmlPage.asXml(). It serializes the page's current DOM, including any changes JavaScript has made, so the result may not exactly match what you would see with "View Source" in a normal browser. Still, I've found it very helpful for developing and debugging HtmlUnit code.
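To see how the two solutions differ in practice, here is a minimal sketch, assuming HtmlUnit 2.x is on the classpath (3.x moved these classes to the org.htmlunit package). The class name RawVsDom, the sample page and the placeholder URL are hypothetical; MockWebConnection serves the page, so no network access is needed. An inline script fills an empty div after load, and it builds its text by concatenation so the finished phrase never appears literally in the source.

```java
import com.gargoylesoftware.htmlunit.MockWebConnection; // HtmlUnit 2.x package names (assumption)
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class RawVsDom {
    // Hypothetical sample page: the script writes "added by script" into the empty div.
    static final String HTML = "<html><head><title>demo</title></head><body>"
            + "<div id='d'></div>"
            + "<script>document.getElementById('d').textContent = 'added ' + 'by script';</script>"
            + "</body></html>";

    /** Returns {raw response body, serialized DOM} for HTML served by a mock connection. */
    static String[] fetchBoth() throws Exception {
        MockWebConnection conn = new MockWebConnection();
        conn.setDefaultResponse(HTML); // every request receives HTML as text/html
        try (WebClient client = new WebClient()) {
            client.setWebConnection(conn);
            HtmlPage page = client.getPage("http://example.com/"); // placeholder URL, never contacted
            return new String[] {
                page.getWebResponse().getContentAsString(), // Solution 1: body as served
                page.asXml()                                // Solution 2: DOM after scripts ran
            };
        }
    }

    public static void main(String[] args) throws Exception {
        String[] both = fetchBoth();
        System.out.println("raw body contains script output: " + both[0].contains("added by script"));
        System.out.println("asXml() contains script output:  " + both[1].contains("added by script"));
    }
}
```

If the script runs as expected, only the asXml() string contains the assembled phrase, which is exactly the "View Source" mismatch described above. Use getContentAsString() when you need the bytes the server sent, and asXml() when you want to inspect the page as HtmlUnit currently sees it.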