C# .NET Tutorial: Read HTML Source

James Cloud

Introduction

If you are making an HTML editor, or if you need to retrieve the HTML source code of a web page and store it in a string in your project, follow the steps given in this tutorial.

Step One - Add a Web Browser Control

The Web Browser control included in the .NET Framework (Windows Forms) is well suited to this task. Add the web browser control to your form by dragging it from the controls toolbox. A newly placed control does nothing on its own. It must be directed to the desired web page with the control's 'Navigate()' method, like so:

// Navigate to a Web Page
webBrowser1.Navigate("http://classics.mit.edu/Aristotle/categories.html");

When executed, this code makes the web browser control navigate to the URL passed as the argument. The call returns immediately, and the page continues loading in the background.

Step Two - Retrieving HTML Code

The web browser control stores the web page's source code as a string in its 'DocumentText' property. With the web browser control in place and navigated to the desired page, it is now time to read and store the page's HTML code. To accomplish this, create a string and assign the value of the property to it. Here is an example:

// Create a Holder for the Source Code and Store the Code inside it
String HTML = webBrowser1.DocumentText;

This code only works if it runs after the web page has fully loaded. It is therefore recommended to read information from the web browser control inside the handler for its 'DocumentCompleted' event.
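
The recommendation above could be sketched roughly as follows. This is only an illustration, assuming a Windows Forms project: the form name 'SourceForm' and the field 'html' are made up for the example, and the control is created in code instead of being dragged from the toolbox so that the listing is self-contained. The check against the control's own URL is a common heuristic for pages with frames, not something the original tutorial prescribes.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form name, chosen for this sketch only.
public class SourceForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser();

    // Holds the page source once loading has finished (illustrative field name).
    private string html;

    public SourceForm()
    {
        webBrowser1.Dock = DockStyle.Fill;
        Controls.Add(webBrowser1);

        // Subscribe before navigating so the event cannot be missed.
        webBrowser1.DocumentCompleted += WebBrowser1_DocumentCompleted;
        webBrowser1.Navigate("http://classics.mit.edu/Aristotle/categories.html");
    }

    private void WebBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Assumption: pages with frames may raise this event more than once,
        // so ignore events whose URL is not the control's top-level URL.
        if (e.Url != webBrowser1.Url)
        {
            return;
        }

        // The document is loaded, so DocumentText now holds the page source.
        html = webBrowser1.DocumentText;

        // Show the size of the source in the title bar as a simple confirmation.
        Text = "Loaded " + html.Length + " characters";
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new SourceForm());
    }
}
```

If the control was placed with the designer instead, the same handler can be attached by double-clicking the 'DocumentCompleted' event in the Properties window.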

Conclusion

You can save the string containing the HTML code to an HTML file using this code:

// Open (or create) the target file; 'using' closes the writer even if Write throws
using (System.IO.StreamWriter writer = new System.IO.StreamWriter("C:\\source.html"))
{
    // Write the HTML source held in the string 'HTML' to the file
    writer.Write(HTML);
}

The above code creates a new source.html file and writes the string 'HTML' into it, saving the HTML source to a file. You can replace 'C:\\source.html' with the path you want to save the HTML file to.
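
As a shorter alternative, the same save step might be written with 'System.IO.File.WriteAllText', which opens, writes and closes the file in one call. The sketch below is only an illustration: the class name 'HtmlSaver', the method 'Save' and the sample HTML string are made up for the example, and the file is written to the temporary folder rather than to C:\ so that it can run without special permissions.

```csharp
using System;
using System.IO;

// Hypothetical helper class, named for this sketch only.
public static class HtmlSaver
{
    // Writes the HTML string to the given path, replacing any existing file.
    public static void Save(string html, string path)
    {
        File.WriteAllText(path, html);
    }

    public static void Main()
    {
        // Stand-in for the value read from webBrowser1.DocumentText.
        string html = "<html><body>Example</body></html>";

        // Assumed location: a file named source.html in the user's temp folder.
        string path = Path.Combine(Path.GetTempPath(), "source.html");

        Save(html, path);
        Console.WriteLine("Saved " + html.Length + " characters to " + path);
    }
}
```

In a real project, 'Save' would be called from the 'DocumentCompleted' handler with the control's 'DocumentText' value and the path of your choice.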

Published by James Cloud
