Sunday, September 12, 2010

Scraping web pages automatically with C#

I frequently find myself writing small applications that scrape a web page to track a package, keep tabs on a product price, etc. I finally broke down and wrote a generic .NET 4 C# application that does this for me via an XML "task" file. I run the console application each day with the Windows Task Scheduler.

Your task file can include as many tasks (page scrapes) as you want, and the application will notify you by email when a task's comparison succeeds.

It works like this: you provide a web page URL to visit, a regular expression that finds a pattern on the page, and a value to compare the matched text against. You can be notified when the match differs from that value, when it is the same, or when it changes. The console application also has a number of nice little features, all configured via the task XML file.
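To make the URL / regular expression / comparison idea concrete, here is a minimal sketch of that check. It is not the source of the downloadable tool: the names ScrapeSketch, Extract, ShouldNotify, Check and NotifyWhen are made up for illustration, and it assumes the value you care about is either the whole match or the first capture group.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical names throughout; this only illustrates the idea described above.
public enum NotifyWhen { Same, Different }

public static class ScrapeSketch
{
    // Returns the first regex match in the page text (capture group 1 if the
    // pattern defines one), or null when nothing matches.
    public static string Extract(string html, string pattern)
    {
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        if (!m.Success) return null;
        return m.Groups.Count > 1 ? m.Groups[1].Value : m.Value;
    }

    // Decides whether a notification is due for the extracted value.
    public static bool ShouldNotify(string found, string expected, NotifyWhen when)
    {
        if (found == null) return false; // pattern not found: nothing to compare
        bool same = string.Equals(found, expected, StringComparison.Ordinal);
        return when == NotifyWhen.Same ? same : !same;
    }

    // Downloads the page and runs the check against it.
    public static bool Check(string url, string pattern, string expected, NotifyWhen when)
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString(url);
            return ShouldNotify(Extract(html, pattern), expected, when);
        }
    }
}
```

In this sketch, the "when it changes" case would be handled by saving the value found on the previous run and passing it as expected with NotifyWhen.Different.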

Remember, this requires .NET 4.

See the default "HowToUse.htm" and the "tasks.xml" file for help on getting started. The example task file notifies you if your external IP address changes.

Let me know if you find any bugs or have a feature wish!

Download here (just three files and no install to perform)
