Wednesday, October 8, 2008

Fast file compare using .NET HashAlgorithm.ComputeHash

I recently had a huge set of files in dozens of directories that I knew were somehow different from a backup. I needed to know which files were different (I just needed the file names). Normally I would use software like Beyond Compare, but that was not an option here: the two sets of files were in two different remote locations, and copying 5 GB worth of data from one location to the other was not practical.

The solution I came up with, which worked out great, was to modify a set of code from my C# Recipe book that computed a hash of two files and compared the hash values. The new code accepts a file pattern, recursively finds all matching files (starting from the current directory), then writes out each file's hash value and file name.

I executed the program on both systems, redirecting the output to a text file, then just had to compare the two text files. Note: Be careful that the FileHash program itself and any redirected output files are not included in the file pattern that you are searching on.

Example:

..\FileHash.exe *.* > ..\hashvalues.systemA.txt


using System;
using System.IO;
using System.Security.Cryptography;

namespace FileHash
{
    class Program
    {
        static void Main(string[] args)
        {
            new Program().Run(args);
        }

        private void Run(string[] args)
        {
            // Recursively find all files matching the pattern in args[0].
            string[] files = Directory.GetFiles(@".\", args[0], SearchOption.AllDirectories);

            foreach (string file in files)
            {
                // Create the hashing object.
                using (HashAlgorithm hashAlg = HashAlgorithm.Create())
                using (FileStream fs = new FileStream(file, FileMode.Open, FileAccess.Read))
                {
                    // Calculate the hash for the file.
                    byte[] hashBytes = hashAlg.ComputeHash(fs);

                    // Write out the hash value and file name.
                    Console.WriteLine("{0} {1}", BitConverter.ToString(hashBytes), file);
                }
            }
        }
    }
}
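After running the program on both systems (as in the example above), the two redirected listings can be diffed with any file-compare tool. If one isn't handy, a line-level set difference is enough; here is a minimal sketch, assuming the two listing files are named as in the example:

```csharp
using System;
using System.IO;
using System.Linq;

class CompareHashes
{
    static void Main()
    {
        // Each line is "<hash> <filename>"; the file names here are
        // assumptions based on the redirection example above.
        string[] a = File.ReadAllLines("hashvalues.systemA.txt");
        string[] b = File.ReadAllLines("hashvalues.systemB.txt");

        // A line present in one listing but not the other means the file
        // differs, or exists on only one system.
        foreach (string line in a.Except(b).Union(b.Except(a)))
        {
            Console.WriteLine(line);
        }
    }
}
```

Note this only works as-is if both systems walk the same relative directory structure, since the file paths are part of each line.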

Monday, September 8, 2008

Zip XML in memory for Web Service transport (SharpZipLib)

Here is a full test program that demonstrates how to use SharpZipLib to zip an XElement into a byte array. This allows you to transfer large XML items over web services and then unzip them on the web service side. I included two unzip methods: one back to an XElement and one to an XML file. IIS 6 does allow compression as well, but I needed the functionality below because a PC client application was required to send a large set of XML to a host web service (rather than the host sending the client XML).

using System;
using System.IO;
using System.Xml.Linq;
using ICSharpCode.SharpZipLib.Zip;

namespace ConsoleTest
{
    class Program
    {
        static void Main(string[] args)
        {
            new Program().Run(args);
        }

        private void Run(string[] args)
        {
            // create some xml
            XElement xml = XElement.Parse("<xml><element>whatever</element></xml>");

            // zip xml
            string startXml = xml.ToString();
            byte[] bytes = ZipContent(xml, "TestXML");

            // unzip xml
            xml = UnzipContent(bytes);
            string endXml = xml.ToString();

            // sanity check
            System.Diagnostics.Debug.Assert(startXml == endXml);
        }

        /// <summary>
        /// Convert XML to zipped byte array.
        /// </summary>
        /// <param name="xml">XML to zip.</param>
        /// <param name="entryName">The zip entry name.</param>
        /// <returns>A byte array that contains the xml zipped.</returns>
        private byte[] ZipContent(XElement xml, string entryName)
        {
            // remove whitespace from xml and convert to byte array
            byte[] normalBytes;
            using (StringWriter writer = new StringWriter())
            {
                xml.Save(writer, SaveOptions.DisableFormatting);
                normalBytes = System.Text.Encoding.ASCII.GetBytes(writer.ToString());
            }

            // zip into new, zipped, byte array
            using (Stream memOutput = new MemoryStream())
            using (ZipOutputStream zipOutput = new ZipOutputStream(memOutput))
            {
                zipOutput.SetLevel(9);

                ZipEntry entry = new ZipEntry(entryName);
                entry.DateTime = DateTime.Now;
                zipOutput.PutNextEntry(entry);

                zipOutput.Write(normalBytes, 0, normalBytes.Length);
                zipOutput.Finish();

                byte[] newBytes = new byte[memOutput.Length];
                memOutput.Seek(0, SeekOrigin.Begin);
                memOutput.Read(newBytes, 0, newBytes.Length);

                zipOutput.Close();

                return newBytes;
            }
        }

        /// <summary>
        /// Return zipped bytes as unzipped XML.
        /// </summary>
        /// <param name="bytes">Zipped content.</param>
        /// <returns>Unzipped XML.</returns>
        private XElement UnzipContent(byte[] bytes)
        {
            // unzip bytes into unzipped byte array
            using (Stream memInput = new MemoryStream(bytes))
            using (ZipInputStream input = new ZipInputStream(memInput))
            {
                ZipEntry entry = input.GetNextEntry();

                byte[] newBytes = new byte[entry.Size];
                int count = input.Read(newBytes, 0, newBytes.Length);
                if (count != entry.Size)
                {
                    throw new Exception("Invalid read: " + count);
                }

                // convert bytes to string, then to xml
                string xmlString = System.Text.Encoding.ASCII.GetString(newBytes);
                return XElement.Parse(xmlString);
            }
        }

        /// <summary>
        /// Save zipped bytes as unzipped file.
        /// </summary>
        /// <param name="bytes">Zipped content.</param>
        /// <param name="path">File path to save unzipped XML.</param>
        private void UnzipContent(byte[] bytes, string path)
        {
            // unzip bytes into a file, one buffer at a time
            using (Stream memInput = new MemoryStream(bytes))
            using (ZipInputStream zipInput = new ZipInputStream(memInput))
            using (BinaryWriter writer = new BinaryWriter(File.Create(path)))
            {
                ZipEntry entry = zipInput.GetNextEntry();

                int count;
                byte[] buffer = new byte[1024 * 10];
                while ((count = zipInput.Read(buffer, 0, buffer.Length)) > 0)
                {
                    writer.Write(buffer, 0, count);
                }
            }
        }
    }
}

Wednesday, September 3, 2008

Use PowerShell to capture database schema

Here is a PowerShell script to capture a database schema. Output is written to a directory/datetime file. Multiple databases/servers can be specified via the XML input file.

To run: ./CaptureSchema.ps1 databases.xml

Here is the PowerShell code:

param ([string]$xmlConfig = $(throw 'argument 1 must be XML configuration file path'))

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.SMO") | out-null
[System.Reflection.Assembly]::LoadWithPartialName("System.Data") | out-null
[System.Reflection.Assembly]::LoadWithPartialName("System.Core") | out-null
[System.Reflection.Assembly]::LoadWithPartialName("System.Linq") | out-null
[System.Reflection.Assembly]::LoadWithPartialName("System.Xml.Linq") | out-null

function ScriptDatabase([string]$serverName, [string]$dbName)
{
    $fileName = [String]::Format("{0} {1}.sql", [DateTime]::Now.ToString("yyyyMMdd_HHmmss"), $dbName)
    [Console]::Write("Server: $serverName, Database: $dbName, Output: `"$fileName`" . . . ")

    $srv = new-object "Microsoft.SqlServer.Management.SMO.Server" $serverName
    $scr = New-Object "Microsoft.SqlServer.Management.Smo.Scripter"

    $db = $srv.Databases[$dbName]

    $scr.Server = $srv
    $options = New-Object "Microsoft.SqlServer.Management.SMO.ScriptingOptions"

    $options.ClusteredIndexes = $true
    $options.Default = $true
    $options.DriAll = $true
    $options.Indexes = $true
    $options.IncludeHeaders = $true
    $options.Triggers = $true
    $options.AppendToFile = $false
    $options.FileName = "$pwd\$fileName"
    $options.ToFileOnly = $true

    # output all db tables
    $scr.Options = $options
    $tables = $db.Tables
    if ($tables -ne $null)
    {
        $scr.Script($tables)
    }

    # output all sprocs (append to the same file from here on)
    $options.AppendToFile = $true
    $sprocs = $db.StoredProcedures | where {$_.IsSystemObject -eq $false}
    if ($sprocs -ne $null)
    {
        $scr.Script($sprocs)
    }

    # output all db views
    $views = $db.Views | where {$_.IsSystemObject -eq $false}
    if ($views -ne $null)
    {
        $scr.Script($views)
    }

    "done."
}

function SaveSchema($xmlDb)
{
    # make the output folder if it does not exist yet
    $dbName = $xmlDb.Element("Name").Value
    $dirName = ".\$dbName"
    if ((Test-Path -path $dirName) -eq $False)
    {
        "Creating directory $dirName..."
        ni -type directory $dirName | out-null
    }

    # save the schema
    $serverName = $xmlDb.Element("Server").Value

    $prevDir = $pwd
    set-location $dirName
    ScriptDatabase $serverName $dbName
    set-location $prevDir
}

#
# main
#

$xml = [System.Xml.Linq.XElement]::Load((Resolve-Path "$xmlConfig"))

foreach ($db in $xml.Elements("Database"))
{
    if ($db.Attribute("Enabled").Value -eq $true)
    {
        SaveSchema $db
    }
}

exit

Here is the XML input file:
<Databases>
    <Database Enabled="true">
        <Name>DatabaseName</Name>
        <Server>ServerName</Server>
    </Database>
    <!-- repeat the Database element if more
         than one database schema to capture -->
</Databases>

Monday, August 18, 2008

Validate a file path using C#

I finally found a good algorithm to validate a file path (had to fix two bugs in it though). I use this static method in Windows Forms when the user is allowed to enter a path. I add a text changed event on the textbox and call this method to enable or disable an OK button.

public static bool ValidateFilepath(string path)
{
    if (path.Trim() == string.Empty)
    {
        return false;
    }

    string pathname;
    string filename;
    try
    {
        pathname = Path.GetPathRoot(path);
        filename = Path.GetFileName(path);
    }
    catch (ArgumentException)
    {
        // GetPathRoot() and GetFileName() above will throw exceptions
        // if pathname/filename could not be parsed.
        return false;
    }

    // Make sure the filename part was actually specified
    if (filename.Trim() == string.Empty)
    {
        return false;
    }

    // Not sure if additional checking below is needed, but no harm done
    if (pathname.IndexOfAny(Path.GetInvalidPathChars()) >= 0)
    {
        return false;
    }

    if (filename.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0)
    {
        return false;
    }

    return true;
}
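For a quick sanity check outside of a form, here is a small console sketch of the method in action (the sample inputs are just made up; the method body is repeated so the sketch compiles on its own):

```csharp
using System;
using System.IO;

class ValidateDemo
{
    // Same method as above, repeated here so the sketch is self-contained.
    public static bool ValidateFilepath(string path)
    {
        if (path.Trim() == string.Empty) return false;

        string pathname;
        string filename;
        try
        {
            pathname = Path.GetPathRoot(path);
            filename = Path.GetFileName(path);
        }
        catch (ArgumentException)
        {
            return false;
        }

        if (filename.Trim() == string.Empty) return false;
        if (pathname.IndexOfAny(Path.GetInvalidPathChars()) >= 0) return false;
        if (filename.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0) return false;
        return true;
    }

    static void Main()
    {
        Console.WriteLine(ValidateFilepath("file.txt"));   // True: a bare file name is fine
        Console.WriteLine(ValidateFilepath(""));           // False: empty input is rejected
        Console.WriteLine(ValidateFilepath("   "));        // False: whitespace-only is rejected
    }
}
```

In the actual form, the TextChanged handler would simply set the OK button's Enabled property to the return value of this method.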

Plug-in architecture (dynamically loading DLLs) using LINQ

For implementing a plug-in architecture using the strategy pattern, this is my preferred way of loading some interfaces (DLLs) at runtime. Make sure you look at the second code example showing how to do the same thing in LINQ.

public List<T> LoadDLL<T>(string path, string pattern)
{
    List<T> plugins = new List<T>();
    foreach (string s in Directory.GetFiles(Path.GetFullPath(path), pattern))
    {
        foreach (Type t in Assembly.LoadFile(s).GetTypes())
        {
            if (!t.IsAbstract && typeof(T).IsAssignableFrom(t))
            {
                plugins.Add((T)Activator.CreateInstance(t));
            }
        }
    }

    return plugins;
}

Now using LINQ...
public List<T> LoadDLL<T>(string path, string pattern)
{
    return Directory.GetFiles(Path.GetFullPath(path), pattern)
        .SelectMany(f => Assembly.LoadFile(f).GetTypes()
            .Where(t => !t.IsAbstract && typeof(T).IsAssignableFrom(t))
            .Select(t => (T)Activator.CreateInstance(t)))
        .ToList();
}

If you want to load an assembly and all the dependent DLLs, you can use the same LINQ query, but use LoadFrom rather than LoadFile.

public List<T> LoadDLL<T>(string path, string pattern)
{
    return Directory.GetFiles(Path.GetFullPath(path), pattern)
        .SelectMany(f => Assembly.LoadFrom(f).GetTypes()
            .Where(t => !t.IsAbstract && typeof(T).IsAssignableFrom(t))
            .Select(t => (T)Activator.CreateInstance(t)))
        .ToList();
}

This can then be called using:
List<Foo> foos = LoadDLL<Foo>(@".\", "*.dll");

Thursday, August 14, 2008

Replacing delegates with lambda expressions

If you understand lambda expressions, you realize that they are just another step in the evolution of delegates. Now that Visual Studio IntelliSense and the .NET compiler can infer from a delegate declaration what parameters a delegate requires, and their types, we no longer have to use the "delegate" keyword or the parameter types; we just need to specify some parameter names.

For example, an older style event delegate would be done like:

x.Completed += delegate(object sender, EventArgs e) { ... };

Now all we need to code is:

x.Completed += (sender, e) => { ... };

Again, the development environment already knows that the Completed event needs two parameters, an object and an EventArgs; there is no need for us to supply those pieces of information.
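Here is a complete, compilable illustration of both styles side by side (the Worker class and its Completed event are made up for the example):

```csharp
using System;

class Worker
{
    public event EventHandler Completed;

    public void DoWork()
    {
        // ... real work would go here ...
        if (Completed != null)
        {
            Completed(this, EventArgs.Empty);
        }
    }
}

class Program
{
    static void Main()
    {
        Worker x = new Worker();

        // Old style: the delegate keyword and explicit parameter types.
        x.Completed += delegate(object sender, EventArgs e)
        {
            Console.WriteLine("done (anonymous delegate)");
        };

        // New style: a lambda; the compiler infers the parameter types.
        x.Completed += (sender, e) => Console.WriteLine("done (lambda)");

        x.DoWork();
    }
}
```

Both handlers run when DoWork raises the event; the only difference is how much the compiler infers for you.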

Saturday, August 9, 2008

PowerShell and CSV files

If you need to process a CSV file, you can use PowerShell's import-csv command. If headers exist in the first line of the CSV, they will be used as property names on the resulting import-csv output. For example, if your CSV looks like:

Last,First,Middle
Jones,Fred,S
Smith,Sally,M
Johnson,Bob,L

Then you can output the full names like:

import-csv employees.csv |% `
{[string]::format("{0} {1} {2}",$_.first, $_.middle,$_.last)}

Or if you only want last names that start with a "J", you can:

import-csv employees.csv | `
where {$_.last.startswith("J")} |% `
{[string]::format("{0} {1} {2}",$_.first, $_.middle,$_.last)}

Pretty cool, eh?

PowerShell v2 will have the ability to change what the delimiting character is.

Thursday, August 7, 2008

Format C# code for Blogger posts

This is the tool I like for formatting C# source code in my blog.

http://formatmysourcecode.blogspot.com/

Using ListView and LINQ to display multi-level relationships



This is cool! Assume we have a database with a three-level relationship: a Product, Install, and Document table. Products have installations, and installations have related documents. This means our Document table has an InstallId and our Install table has a ProductId. This is all standard relationship stuff, so hopefully you are following this.

Now assume we want to use LINQ and the ListView web control to display hierarchical data and we want full control over the HTML generated (that's why we use the ListView control). The display will look like:

Product P1
  Installation I1
    Document D1
    Document D2
  Installation I2
    Document D3
Product P2
...

First, use a standard LINQ-to-SQL class in Visual Studio to create your data context object. Next, create a three-level set of ListView objects. Here's the cool part...are you ready for this?

Binding your data: Each ListView needs to be bound to a LINQ IQueryable data source. The outermost ListView, Product, can just be bound to the Products collection (remember the generated data context pluralizes the table name) like:

MultiLevelDataContext db = new MultiLevelDataContext();
IEnumerable<Product> products = db.Products;
lvProduct.DataSource = products;
lvProduct.DataBind();



If you do this in code-behind, that's all you are going to do there. The other two bindings are done in the ListView declaration itself.

The Install and Document nested ListView components just need to have their DataSource property set to the property of its outer ListView object. Remember that when the LINQ-to-SQL code was generated, it automatically added properties for relationship data. For example, the Product class has a property called Installs. The Install class has a property called Documents. These properties basically end up being IQueryable data sources. We simply use a standard Eval() binding statement in the DataSource to connect things up. This makes it incredibly easy to bind the related data into a web control like a three-level ListView structure.

Below is what it all ends up looking like. Two things to mention: first, most of the aspx is table formatting; second, notice the DataSource statements in the two nested ListView controls.

Code Behind (cs)

protected void Page_Load(object sender, EventArgs e)
{
    if (!IsPostBack)
    {
        MultiLevelDataContext db = new MultiLevelDataContext();
        IEnumerable<Product> products = db.Products;
        lvProduct.DataSource = products;
        lvProduct.DataBind();
    }
}



Web Page (aspx)

<asp:ListView ID="lvProduct" runat="server">
    <LayoutTemplate>
        <table cellpadding="3" cellspacing="0" border="1" style="width: 100%; background-color: Silver;">
            <tr runat="server" id="itemPlaceholder" />
        </table>
    </LayoutTemplate>
    <ItemTemplate>
        <tr>
            <td>Product:
                <%# Eval("Name") %>
            </td>
        </tr>
        <asp:ListView ID="lvInstall" runat="server" DataSource='<%# Eval("Installs") %>'>
            <LayoutTemplate>
                <tr>
                    <td>
                        <table cellpadding="3" cellspacing="0" border="1" style="width: 100%; background-color: Aqua;">
                            <tr runat="server" id="itemPlaceholder" />
                        </table>
                    </td>
                </tr>
            </LayoutTemplate>
            <ItemTemplate>
                <tr>
                    <td>Version:
                        <%# Eval("Version") %>
                    </td>
                    <td>Release Date:
                        <%# Eval("ReleaseDate") %>
                    </td>
                </tr>
                <asp:ListView ID="lvDocuments" runat="server" DataSource='<%# Eval("Documents") %>'>
                    <LayoutTemplate>
                        <tr>
                            <td colspan="2">
                                <table cellpadding="3" cellspacing="0" border="1" style="width: 100%; background-color: Lime;">
                                    <tr runat="server" id="itemPlaceholder" />
                                </table>
                            </td>
                        </tr>
                    </LayoutTemplate>
                    <ItemTemplate>
                        <tr valign="top">
                            <td>
                                <%# Eval("Name") %>
                            </td>
                            <td>
                                <%# Eval("Description") %>
                            </td>
                        </tr>
                    </ItemTemplate>
                </asp:ListView>
            </ItemTemplate>
        </asp:ListView>
    </ItemTemplate>
</asp:ListView>

Monday, August 4, 2008

Synchronizing LINQ-to-SQL with database schema

The initial release of the LINQ-to-SQL designer support in VS 2008 doesn't have a good way of keeping changes in the database schema synchronized with the dbml (designer) data. I've really only found two products so far that claim to do this for you.

Huagati DBML Tools
http://www.huagati.com/dbmltools/

Database Restyle by Perpetuum Software
http://www.perpetuumsoft.com/Product.aspx?lang=en&pid=55

I have not personally tried either of these yet.

Can't RDP? How to enable / disable virtual machine firewall for Azure VM

Oh no!  I accidentally blocked the RDP port on an Azure virtual machine which resulted in not being able to log into the VM anymore.  I did ...