How To Extract Innermost Table From Html File With The Help Of The Html Agility Pack?
I am parsing tabular information from an HTML file with the Html Agility Pack, and it works. But when the table that I want to extract is inner mo
Solution 1:
Load the document as a HtmlDocument. Then use an XPath query to find a table that contains no other tables and which has a td in the first row containing "Name".
The XPath implementation is the standard .NET one from System.Xml.XPath, so any documentation about using XPath with XmlDocument will be applicable.
HtmlDocument doc = new HtmlDocument();
doc.Load("file.html");
HtmlNode el = doc.DocumentNode.SelectSingleNode("//table[not(descendant::table) and tr[1]/td['NAME' = normalize-space()]]");
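If the position of the "Name" column were fixed, you could use something like 'Name' = normalize-space(tr[1]/td[2]). XPath string comparison is case-sensitive, so the literal must match the header text exactly.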
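To see the innermost-table query at work without a file on disk, here is a minimal sketch that runs it against an inline HTML string. The helper name FindInnermostTable, the sample markup, and the id attributes are assumptions made for illustration. The code is written with top-level statements (C# 9 or later) and assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper (not part of Html Agility Pack): returns the first table that
// contains no nested table and whose first row has a cell equal to headerText.
// Assumes headerText contains no single quote, because it is pasted into an XPath literal.
static HtmlNode FindInnermostTable(string html, string headerText)
{
    var doc = new HtmlDocument();
    doc.LoadHtml(html);
    string xpath = "//table[not(descendant::table) and tr[1]/td['"
                   + headerText + "' = normalize-space()]]";
    return doc.DocumentNode.SelectSingleNode(xpath); // null when nothing matches
}

// Illustrative markup: an outer layout table wrapping the data table we want.
string sampleHtml = @"
<table id='outer'>
  <tr><td>Report</td></tr>
  <tr><td>
    <table id='inner'>
      <tr><td>ID</td><td> NAME </td></tr>
      <tr><td>1</td><td>Alice</td></tr>
    </table>
  </td></tr>
</table>";

HtmlNode table = FindInnermostTable(sampleHtml, "NAME");
Console.WriteLine(table == null ? "no match" : table.GetAttributeValue("id", "(no id)"));
```

The outer table is rejected by not(descendant::table), and normalize-space() trims the padding around the header cell before comparing. Note that tr[1] only matches rows that are direct children of the table element, so markup with an explicit tbody would need the path adjusted.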
To find a table based on several column names, without the innermost-table condition:
HtmlNode el = doc.DocumentNode.SelectSingleNode("//table[tr[1]/td['NAME' = normalize-space()] and tr[1]/td['ADDRESS' = normalize-space()]]");
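When the list of required column names varies, the predicate can be assembled rather than typed by hand. The sketch below is one possible way to do that. BuildHeaderQuery and the sample markup are hypothetical names and data introduced for illustration, and the code again assumes top-level statements (C# 9 or later) with the HtmlAgilityPack package referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: joins one first-row condition per header with "and".
// Assumes no header contains a single quote, because each is pasted into an XPath literal.
static string BuildHeaderQuery(params string[] headers)
{
    var conditions = headers.Select(h => "tr[1]/td['" + h + "' = normalize-space()]");
    return "//table[" + string.Join(" and ", conditions) + "]";
}

// Illustrative markup: only the second table has both required headers.
string twoTablesHtml = @"
<table id='names'>
  <tr><td>NAME</td></tr>
  <tr><td>Alice</td></tr>
</table>
<table id='contacts'>
  <tr><td>NAME</td><td>ADDRESS</td></tr>
  <tr><td>Bob</td><td>1 Main St</td></tr>
</table>";

var contactsDoc = new HtmlDocument();
contactsDoc.LoadHtml(twoTablesHtml);

HtmlNode match = contactsDoc.DocumentNode.SelectSingleNode(BuildHeaderQuery("NAME", "ADDRESS"));
Console.WriteLine(match == null ? "no match" : match.GetAttributeValue("id", "(no id)"));
```

Because every condition tests tr[1] independently, the order of the columns in the header row does not matter, only that each name appears somewhere in it.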
Solution 2:
var table = doc.DocumentNode.SelectSingleNode("//table [not(descendant::table) and tr[1]/td[normalize-space()='ADDRESS'] ]");