Wednesday, June 13, 2012

Background tasks (continuous code) in the Cloud: Cloud Foundry beats AWS and Azure

Introduction

I have a collection of background tasks that I need to run continuously, each on its own timer interval. They are a critical part of my web services and provide the data gathering and transformations that make the web service valuable. How should I package the code and where should I deploy it? Only in the cloud.

 

As the cloud space is moving faster than I can write this, any of this could be outdated by the time you read it.

Consider Time, Money, and Ease of Deployment

In struggling with where to deploy the code, I considered the cloud cost, the time to learn and build the solution, as well as any cloud gotchas. I’m now on my sixth cloud provider trying to determine if they are the best at background tasks. Why? Because background tasks are where the heavy lifting happens. I want to spend my time getting that heavy lifting correct and not fighting with the cloud environment.

Background Task Defined

Just to be clear, I consider a background task any code that runs continuously. Whether it has a UX or responds to HTTP(S) is a detail at this point. As long as I have the continuous part, I can work around the other caveats.

Timed Events (.Net Timer, Scheduled Task, or Cron) are critical

I’ve looked at the .Net Timer class inside code, and at Scheduled Tasks (Windows) and cron (Linux) outside it. I need the timer, so it is critical; how I get it is less important. However, the farther toward IT-ish settings I go from my code, the higher the chance I will forget to update the live code or verify the timing device. So the timer does need to be contained in code, but I’ll be language and platform agnostic to get this.
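
As a minimal sketch of what "timer contained in code" could look like, assuming the .Net System.Threading.Timer class is an acceptable timing device: the BackgroundTaskRunner class and its Add method below are hypothetical names made up for illustration, not part of any framework or cloud SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: runs each registered task on its own interval.
public sealed class BackgroundTaskRunner : IDisposable
{
    private readonly List<Timer> timers = new List<Timer>();

    // Starts the task right away, then repeats it every interval.
    // Note: if a task runs longer than its interval, callbacks can overlap.
    public void Add(Action task, TimeSpan interval)
    {
        if (task == null) throw new ArgumentNullException("task");

        Timer timer = new Timer(
            state =>
            {
                try
                {
                    task();
                }
                catch (Exception ex)
                {
                    // Log and keep going so one failed run does not stop the schedule.
                    Console.Error.WriteLine(ex);
                }
            },
            null,
            TimeSpan.Zero,
            interval);

        timers.Add(timer);
    }

    // Stops all timers.
    public void Dispose()
    {
        foreach (Timer timer in timers)
        {
            timer.Dispose();
        }
        timers.Clear();
    }
}
```

With something like this, one task could be registered with TimeSpan.FromMinutes(5) and another with TimeSpan.FromHours(1), and the schedule travels with the code instead of living in cron or Scheduled Tasks.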

Basic Cloud Companies don’t care about Background Tasks

Basic cloud companies are still writing their analysis and deployment tool code. They only care enough about background tasks to point you to a framework that might, kind of, if you look the right way, consider background tasks. Good luck there.

The Big Guys know Background Tasks are important

Amazon and Azure both have some strategy for Background Tasks.

 

AWS is more IT-ish in that you have to grab an Amazon Machine Image (AMI), then dink with the system control for timers (cron or Scheduled Tasks), then deal with a daemon or service (yours or someone else’s), then deploy to the AMI. I just want to Git Push and skip the IT headache so thanks but no.

 

Azure (gosh love ‘em) knows we love to write code and has provided the background task concept as a worker role. Awesome! Love it! The only caveat is that the Worker Role is a very specific project framework. You have to adopt and support the framework in order to deploy your background task. That is so close to ideal. But is there anything better?

Cloud Foundry is doing Background Tasks right

Cloud Foundry has a novel approach to background tasks. You write the code and they treat it as a task. Period. No framework, no IT settings. Just code. That is doing background tasks right. Granted I haven’t deployed yet but I’ll give Cloud Foundry the benefit of the doubt.

Did I leave someone out?

I probably left someone’s favorite company off this list. Cloud providers pop up so fast, it is hard to keep up. Sorry about that. If you know of a Cloud company that does background tasks just the way you like, leave a comment below so I can investigate.

Tuesday, June 5, 2012

Steps for Consuming XML data in .Net

Introduction

While consuming third-party RSS feeds, I found I had to relearn how to deal with XML data. This post is meant to prepare any developer who needs to consume XML that they do not control. While I used RSS feeds, the steps apply to any XML. I wanted to change the metadata and data of the XML file into a model of data that I could control with .Net classes and conventional data storage.

 

This post is organized to take you from an xml file to .Net classes able to consume, serialize, and test the xml.

Generating an XSD file from an XML file using Xsd.exe

The first step is to make sure you have the XML Schema Definition tool, xsd.exe, installed. It is part of the .Net Framework tools. Make sure the executable location is part of the system path, user path, or command prompt path. On my computer the path is “C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin” and I added it to the system path so it is available at any command prompt, regardless of the user.

 

Separate out the xml file(s) into a separate directory. At a command prompt, generate the schema file(s) from the xml file with “xsd file.xml” where file.xml is the name of your xml file. The new file just created is the schema definition file for the xml. Open it up and make sure the schema makes sense.
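
If you want to preview a schema from code before running the tool, the framework's XmlSchemaInference class can infer one from an xml string. This is only a rough programmatic counterpart and an assumption on my part about what is useful here; its output is not guaranteed to match what xsd.exe writes. SchemaPreview and InferSchemaText are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaPreview
{
    // Infers a schema from the xml text and returns the inferred schema(s) as text.
    public static string InferSchemaText(string xml)
    {
        if (xml == null) throw new ArgumentNullException("xml");

        XmlSchemaInference inference = new XmlSchemaInference();
        XmlSchemaSet schemaSet;

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            schemaSet = inference.InferSchema(reader);
        }

        StringWriter output = new StringWriter();

        // The set can hold more than one schema, typically one per namespace in the xml.
        foreach (XmlSchema schema in schemaSet.Schemas())
        {
            schema.Write(output);
            output.WriteLine();
        }

        return output.ToString();
    }
}
```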

 

One gotcha you can spot in the schema is un-encoded content. An RSS feed content section may include HTML markup, so make sure the HTML is encoded. For example, make sure a <br> appears as &lt;br&gt;. If the br is un-encoded, appearing as <br>, the xsd schema definition will create a new section of the definition to deal with it, when you don’t want it separated out in either the schema or the resulting .Net classes.
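
The difference is easy to see from code. The sketch below, using the hypothetical names EncodedContentCheck and ChildElementNames, parses one element with XElement and lists the child elements found inside it. Encoded markup stays text; un-encoded markup shows up as a child element, which is what the schema tool then tries to model. The un-encoded sample uses <br /> because a bare <br> with no closing tag is not well-formed XML and would not parse at all.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class EncodedContentCheck
{
    // Returns the names of any child elements inside the given element.
    // An empty result means the content is plain (encoded) text.
    public static string[] ChildElementNames(string xml)
    {
        XElement element = XElement.Parse(xml);
        return element.Elements()
                      .Select(child => child.Name.LocalName)
                      .ToArray();
    }
}
```

Calling ChildElementNames on “<description>Line one&lt;br&gt;Line two</description>” finds no child elements, while the same call on “<description>Line one<br />Line two</description>” finds a “br” element.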

 

If there is more than one xsd file, you will need to know which is the primary for the next part of the process. The primary xsd file is the one that has the data definitions. 

Generating the .Net classes from the XSD schema file(s)

Now that the schema files are just as you want them, generate the .Net classes containing the associated models with “xsd file.xsd file2.xsd /classes” at the command prompt. The example assumes you have several schema definition files. Each definition file must be listed to create the classes correctly. You may have several schema definition files if your xml references more than one namespace.

Minor clean-up of the auto-generated .Net classes

The single .cs file will contain all the classes required to deserialize the xml into models. If you need to change class names, start with the parent class only: change its name but add the original, schema-determined name in the XmlRootAttribute.

 

For example, the generated .cs file may produce a parent/top class name that doesn’t correspond with your current naming practices. For an rss file, it would be “rss.” The following is the top of the auto-generated file.

 

[image: top of the auto-generated .cs file, showing the “rss” class]

 

If you want to change the class name from “rss” and still parse the rss, you need to change the class name to your new name (“RssXmlModel” below) and modify the XmlRootAttribute to include the “rss” name.

 

 

[image: the class renamed to “RssXmlModel”, with “rss” in the XmlRootAttribute]

Add a namespace to the classes.
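
In text form, the renamed class might look roughly like the sketch below. This is a hand-trimmed illustration, not the actual generated file: real xsd.exe output carries more attributes and members, the MyCompany.Feeds namespace is hypothetical, and the RssChannel class and property names are assumptions chosen to keep the example short.

```csharp
using System;
using System.Xml.Serialization;

namespace MyCompany.Feeds // hypothetical namespace
{
    // Renamed from the generated "rss" class. The XmlRoot attribute keeps the
    // mapping to the <rss> element so the xml still deserializes.
    [Serializable]
    [XmlType(AnonymousType = true)]
    [XmlRoot("rss", Namespace = "", IsNullable = false)]
    public partial class RssXmlModel
    {
        [XmlAttribute("version")]
        public string Version { get; set; }

        [XmlElement("channel")]
        public RssChannel Channel { get; set; }
    }

    // Trimmed stand-in for the generated channel class.
    public partial class RssChannel
    {
        [XmlElement("title")]
        public string Title { get; set; }
    }
}
```

An XmlSerializer created with typeof(RssXmlModel) should then accept a document whose root element is still named “rss”.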

 

You may be inclined to clean up the auto-generated classes, changing all sorts of names, definitions, etc. If you do not control the xml, but just consume it, you may have to do this cleanup again when the producer/owner of the content changes their xml. You should either change only the top xml node’s definition (class name, XmlRootAttribute), or create an entirely new model with a process to convert between the auto-generated model and your final model.
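
The second option, a separate model plus a conversion step, could be sketched like this. Every name here is hypothetical: GeneratedChannel and GeneratedItem stand in for whatever xsd.exe produced for your feed, and FeedEntry is the hand-written model the rest of the application would use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-ins for the auto-generated classes (trimmed for illustration).
public class GeneratedItem
{
    public string title;
    public string link;
}

public class GeneratedChannel
{
    public string title;
    public GeneratedItem[] item;
}

// Hand-written model that the rest of the application depends on.
public class FeedEntry
{
    public string Title { get; set; }
    public Uri Link { get; set; }
}

public static class FeedMapper
{
    // Converts the generated model into the final model. If the feed's xml
    // changes, only the generated classes and this method need to change.
    public static List<FeedEntry> ToEntries(GeneratedChannel channel)
    {
        if (channel == null || channel.item == null)
        {
            return new List<FeedEntry>();
        }

        return channel.item.Select(i => ToEntry(i)).ToList();
    }

    private static FeedEntry ToEntry(GeneratedItem item)
    {
        Uri link;
        return new FeedEntry
        {
            Title = item.title,
            // Leave Link null when the feed supplies a missing or relative url.
            Link = Uri.TryCreate(item.link, UriKind.Absolute, out link) ? link : null
        };
    }
}
```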

 

Notice the auto-generated file doesn’t include the tool’s name, xsd.exe. You may want to add that for the next developer that has to deal with this file in your project.

A Generic method to Request XML using HttpWebResponse

The following method allows you to request the xml, put the response content into a string, and deserialize into the model. It creates and configures its own HttpWebRequest from the Uri passed to GetXmlRequest(), and it relies on the Serializer.XmlDeserialize() method shown in the next section. Feel free to DRY up the code to suit your purposes.

 

   1:  public static T GetXmlRequest<T>(Uri uri)
   2:  {
   3:      if (uri == null)
   4:      {
   5:          throw new ArgumentNullException("uri");
   6:      }
   7:   
   8:      TimeSpan timeSpan = new TimeSpan(0, 2, 0);
   9:   
  10:      HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(uri);
  11:      request.Timeout = (int)timeSpan.TotalMilliseconds;
  12:      request.ReadWriteTimeout = (int)timeSpan.TotalMilliseconds * 100;
  13:      request.Method = "GET";
  14:      request.Accept = "text/xml";
  15:   
  16:      try
  17:      {
  18:          using (HttpWebResponse httpWebResponse = (HttpWebResponse)request.GetResponse())
  19:          {
  20:              using (StreamReader streamReader = new StreamReader(httpWebResponse.GetResponseStream()))
  21:              {
  22:                  // leave this in, to look at string in debugger
  23:                  string xml = streamReader.ReadToEnd();
  24:   
  25:                  if (string.IsNullOrEmpty(xml))
  26:                  {
  27:                      return default(T);
  28:                  }
  29:   
  30:                  T temp = Serializer.XmlDeserialize<T>(xml, Encoding.GetEncoding(httpWebResponse.CharacterSet));
  31:   
  32:                  // DFB: Object couldn't be deserialized
  33:                  if (EqualityComparer<T>.Default.Equals(temp, default(T)))
  34:                  {
  35:                      Debug.WriteLine("default T");
  36:                  }
  37:   
  38:                  return temp;
  39:              }
  40:          }
  41:      }
  42:      catch (WebException webException)
  43:      {
  44:          if (webException.Response != null)
  45:          {
  46:              using (Stream responseStream = webException.Response.GetResponseStream())
  47:              {
  48:                  if (responseStream != null)
  49:                  {
  50:                      using (StreamReader reader = new StreamReader(responseStream))
  51:                      {
  52:                          Trace.TraceError(reader.ReadToEnd());
  53:                      }
  54:                  }
  55:              }
  56:          }
  57:   
  58:          throw;
  59:      }
  60:  }

A Generic method to Deserialize into the auto-generated Classes

Once you have the xml (line 23 above), you can deserialize into the auto-generated classes (line 30).

 

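    using System;
    using System.IO;
    using System.Text;
    using System.Xml.Serialization;

    public static T XmlDeserialize<T>(string xml, Encoding encoding)
    {
        if (xml == null) throw new ArgumentNullException("xml");
        if (encoding == null) throw new ArgumentNullException("encoding");

        // typeof(T) gives the serializer its target type without creating a throwaway instance.
        XmlSerializer serializer = new XmlSerializer(typeof(T));

        // Turn the string back into bytes using the encoding reported by the response.
        using (MemoryStream memoryStream = new MemoryStream(encoding.GetBytes(xml)))
        {
            // Deserialize throws InvalidOperationException if the xml does not match T.
            return (T)serializer.Deserialize(memoryStream);
        }
    }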
 

A Unit Test library to View the Rss xml in the Generated Models

The project containing this code is available on GitHub. Download, build and run the test named “StringToObject” found in the UnitTest1.cs file of the XmlTestProject. Set a breakpoint on Assert.IsNotNull(newRssObject) and add the newRssObject to the Watch window.

 

You can see the data in the classes via the Watch window below.

 

[image: Watch window showing the data in newRssObject]

 

The test reads an xml file using auto-generated classes.

 

Summary

This example shows how to take a raw xml string and convert it into C# .Net classes you can use to deserialize the xml. Now that the data is in a model, you can put the data in any traditional data store. My most common next steps are adding LINQ to run some interesting queries and serializing back to a file on disk.
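
As a rough sketch of those next steps, assuming a small hand-written model rather than the real generated classes: FeedItem, FeedItemList, and FeedStorage are hypothetical names, and the category filter is just one example of a query.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical model used only for this illustration.
public class FeedItem
{
    public string Title { get; set; }
    public string Category { get; set; }
}

[XmlRoot("items")]
public class FeedItemList
{
    [XmlElement("item")]
    public FeedItem[] Items { get; set; }
}

public static class FeedStorage
{
    // LINQ query: keep items in one category, ordered by title.
    public static FeedItemList FilterByCategory(FeedItemList source, string category)
    {
        FeedItem[] matches = source.Items
            .Where(item => string.Equals(item.Category, category, StringComparison.OrdinalIgnoreCase))
            .OrderBy(item => item.Title)
            .ToArray();

        return new FeedItemList { Items = matches };
    }

    // Serializes the model back to an xml file on disk.
    public static void Save(FeedItemList list, string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(FeedItemList));
        using (StreamWriter writer = new StreamWriter(path))
        {
            serializer.Serialize(writer, list);
        }
    }

    // Reads the file back into the model.
    public static FeedItemList Load(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(FeedItemList));
        using (StreamReader reader = new StreamReader(path))
        {
            return (FeedItemList)serializer.Deserialize(reader);
        }
    }
}
```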