How To Use HtmlAgilityPack To Extract HTML Data

I am learning to write a web crawler and have found some great examples to get me started, but since I am new to this, I have a few questions regarding the coding method. The search re

Solution 1:

This little snippet should get you started:

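using System.Net;        // WebClient
using HtmlAgilityPack;   // HtmlDocument, HtmlNodeCollection

// Download the page and load its HTML into an HtmlDocument.
HtmlDocument doc = new HtmlDocument();
WebClient client = new WebClient();
string html = client.DownloadString("http://www.nysed.gov/coms/op001/opsc2a?profcd=67&plicno=001475&namechk=WIL");
doc.LoadHtml(html);

// Select every <div> element; SelectNodes returns null if nothing matches.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div");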

The WebClient class downloads the page's HTML as a string, which you then load into an HtmlDocument object. From there, you use XPath to query the DOM tree for the nodes you want. In the example above, "nodes" contains every div element in the document.
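Once you have the node collection, you will usually want to loop over it and read each node's text. Here is a minimal sketch of that step. It parses an inline HTML string instead of downloading a page so that it runs without network access. The class name DivTextExtractor, the method name ExtractDivTexts, and the sample markup are all made up for illustration. The one HtmlAgilityPack detail worth knowing is that SelectNodes returns null, rather than an empty collection, when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class DivTextExtractor
{
    // Returns the trimmed inner text of every <div> in the given HTML.
    public static List<string> ExtractDivTexts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var texts = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div");
        if (nodes == null)
        {
            return texts;
        }

        foreach (HtmlNode node in nodes)
        {
            texts.Add(node.InnerText.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        // Made-up sample markup standing in for a downloaded page.
        string html = "<html><body><div>First</div><div>Second</div><p>No div here</p></body></html>";
        foreach (string text in ExtractDivTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

With the sample markup above, this prints "First" and "Second" on separate lines. To use it on a real page, pass the string returned by DownloadString to ExtractDivTexts.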

Here's a quick reference about the XPath syntax: http://msdn.microsoft.com/en-us/library/ms256086(v=vs.110).aspx
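Selecting every div is rarely what you want in practice, so here is a rough sketch of a narrower XPath query that uses attribute predicates. It assumes a page where the interesting links sit inside div elements with class "result". That class name, the sample markup, and the names XPathLinkExample and ExtractResultLinks are invented for the example and are not taken from the page in the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class XPathLinkExample
{
    // Returns the href of every link found inside <div class="result">.
    public static List<string> ExtractResultLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // [@class='result'] keeps only divs whose class attribute is exactly "result";
        // [@href] keeps only anchors that actually carry an href attribute.
        HtmlNodeCollection anchors =
            doc.DocumentNode.SelectNodes("//div[@class='result']//a[@href]");
        if (anchors == null)
        {
            return hrefs; // no matches
        }

        foreach (HtmlNode anchor in anchors)
        {
            // The second argument is the fallback value if the attribute is missing.
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }

    public static void Main()
    {
        // Made-up sample markup for illustration.
        string html =
            "<div class='result'><a href='/a'>A</a></div>" +
            "<div class='other'><a href='/skip'>Skip</a></div>" +
            "<div class='result'><a href='/b'>B</a><a>No href</a></div>";

        foreach (string href in ExtractResultLinks(html))
        {
            Console.WriteLine(href);
        }
    }
}
```

With the sample markup, this prints "/a" and "/b". The link in the "other" div and the anchor without an href are both filtered out by the XPath expression itself, so no extra checks are needed in C#.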
