While consuming third-party RSS feeds, I found I had to relearn how to deal with XML data. This post is meant to prepare any developer who needs to consume XML they do not control. I used RSS feeds, but the approach applies to any XML. I wanted to convert the metadata and data of the XML file into a data model that I could control with .Net classes and conventional data storage.
This post is organized to take you from an xml file to .Net classes able to consume, serialize, and test the xml.
Generating an XSD file from an XML file using Xsd.exe
The first step is to make sure you have the XML Schema Definition tool, xsd.exe, installed. It is part of the .Net Framework tools. Make sure the executable location is part of the system path, user path, or command prompt path. On my computer the path is “C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin” and I added it to the system path so it is available at any command prompt, regardless of the user.
Move the xml file(s) into a separate directory. At a command prompt, generate the schema file(s) from the xml file with “xsd file.xml” where file.xml is the name of your xml file. The newly created .xsd file(s) contain the schema definition for the xml. Open the schema and make sure it makes sense.
One gotcha you can spot in the schema is un-encoded content. An RSS feed content section may include html markup. Make sure the HTML is encoded. For example, make sure a <br> appears as &lt;br&gt;. If the br is un-encoded, appearing as <br>, xsd.exe will create a new section of the schema definition to deal with it, when you don’t want it separated out in either the schema or the resulting .Net classes.
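A quick way to see the difference is to parse both forms and count the child elements. The following is a minimal sketch, not part of the original project; the EncodingCheck class and the sample strings are made up for illustration, and the un-encoded sample uses a self-closing <br /> so that it is still well-formed XML.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class EncodingCheck
{
    // Hypothetical helper: counts the child elements directly under the root
    // of an XML fragment.
    public static int CountChildElements(string xml)
    {
        return XElement.Parse(xml).Elements().Count();
    }

    public static void Main()
    {
        // Encoded markup stays inside the text of <description>.
        string encoded = "<description>First line&lt;br&gt;Second line</description>";

        // Un-encoded markup becomes a child element of <description>.
        string unencoded = "<description>First line<br />Second line</description>";

        Console.WriteLine(CountChildElements(encoded));   // 0
        Console.WriteLine(CountChildElements(unencoded)); // 1
    }
}
```

A child element like the second case is what ends up as an extra section in the inferred schema.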
If there is more than one xsd file, you will need to know which is the primary for the next part of the process. The primary xsd file is the one that has the data definitions.
Generating the .Net classes from the XSD schema file(s)
Now that the schema files are just as you want them, you generate the .Net classes containing the associated models with “xsd file.xsd file2.xsd /classes” at the command prompt. The example assumes you have several schema definition files. Each definition file must be listed to create the classes correctly. You may have several schema definition files if your xml references more than one namespace.
Minor clean-up of the auto-generated .Net classes
The single .cs file will contain all the classes required to deserialize the xml into models. If you need to change class names, start with the parent class only: change its name, but keep the original, schema-determined name in the XmlRootAttribute.
For example, the generated .cs file may produce a parent/top class name that doesn’t correspond with your current naming practices. For an rss file, it would be “rss.” The following is the top of the auto-generated file.
If you want to change the class name from “rss” and still parse the rss, you need to change the class name to your new name (“RssXmlModel” below) and modify the XmlRootAttribute to include the “rss” name.
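As a rough sketch of that rename (not the actual generated file, which carries more attributes and members), the class below uses the new name while XmlRootAttribute keeps the original “rss” element name. The FeedModels namespace, the Version property, and the RssDemo helper are assumptions added for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

namespace FeedModels // hypothetical namespace
{
    // Renamed from "rss"; the attribute still maps the class to the <rss> root element.
    [XmlRoot("rss", Namespace = "", IsNullable = false)]
    public partial class RssXmlModel
    {
        // Illustrative member only; the generated class will have its own members.
        [XmlAttribute("version")]
        public string Version { get; set; }
    }

    public static class RssDemo
    {
        // Deserializes an <rss> document into the renamed class and returns its version.
        public static string ReadVersion(string xml)
        {
            XmlSerializer serializer = new XmlSerializer(typeof(RssXmlModel));
            using (StringReader reader = new StringReader(xml))
            {
                RssXmlModel model = (RssXmlModel)serializer.Deserialize(reader);
                return model.Version;
            }
        }

        public static void Main()
        {
            Console.WriteLine(ReadVersion("<rss version=\"2.0\"></rss>")); // 2.0
        }
    }
}
```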
Add a namespace to the classes.
You may be inclined to clean up the auto-generated classes, changing all sorts of names, definitions, etc. If you do not control the xml, but just consume it, you may have to do this cleanup again when the producer/owner of the content changes their xml. You should either change only the top xml node’s definition (class name, XmlRootAttribute), or create an entirely new model with a process to convert between the auto-generated model and your final model.
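The second option could look something like the sketch below. Everything here is hypothetical: the rss, rssChannel, and rssChannelItem classes are simplified stand-ins for xsd.exe output, and Feed, FeedEntry, and FeedMapper are made-up names for a hand-written model and its converter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the auto-generated classes (assumed shape).
public partial class rss { public rssChannel channel { get; set; } }
public partial class rssChannel
{
    public string title { get; set; }
    public rssChannelItem[] item { get; set; }
}
public partial class rssChannelItem
{
    public string title { get; set; }
    public string link { get; set; }
}

// Hand-written model that the rest of the application depends on.
public class Feed
{
    public string Title { get; set; }
    public List<FeedEntry> Entries { get; set; }
}
public class FeedEntry
{
    public string Title { get; set; }
    public Uri Link { get; set; }
}

public static class FeedMapper
{
    // Converts the generated model into the hand-written one, tolerating missing parts.
    public static Feed ToFeed(rss source)
    {
        if (source == null || source.channel == null)
        {
            return new Feed { Entries = new List<FeedEntry>() };
        }

        rssChannelItem[] items = source.channel.item ?? new rssChannelItem[0];
        return new Feed
        {
            Title = source.channel.title,
            Entries = items.Select(i => ToEntry(i)).ToList()
        };
    }

    private static FeedEntry ToEntry(rssChannelItem item)
    {
        Uri link;
        Uri.TryCreate(item.link, UriKind.Absolute, out link); // leaves link null if invalid
        return new FeedEntry { Title = item.title, Link = link };
    }
}
```

With a converter like this, regenerating the classes after the producer changes their xml only requires updating the mapper, not every consumer of the model.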
Notice the auto-generated file doesn’t include the tool’s name, xsd.exe. You may want to add that for the next developer that has to deal with this file in your project.
A Generic method to Request XML using HttpWebResponse
The following method allows you to request the xml, read the response content into a string, and deserialize it into the model. It creates and configures the HttpWebRequest from the Uri passed to GetXmlRequest(). Feel free to DRY up the code to suit your purposes.
1: public static T GetXmlRequest<T>(Uri uri)
2: {
3:     if (uri == null)
4:     {
5:         throw new ArgumentNullException("uri");
6:     }
8:     TimeSpan timeSpan = new TimeSpan(0, 2, 0);
10:     HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(uri);
11:     request.Timeout = (int)timeSpan.TotalMilliseconds;
12:     request.ReadWriteTimeout = (int)timeSpan.TotalMilliseconds * 100;
13:     request.Method = "GET";
14:     request.ContentType = "text/xml";
16:     try
17:     {
18:         using (HttpWebResponse httpWebResponse = (HttpWebResponse)request.GetResponse())
20:         using (StreamReader streamReader = new StreamReader(httpWebResponse.GetResponseStream()))
21:         {
22:             // leave this in, to look at string in debugger
23:             string xml = streamReader.ReadToEnd();
25:             if (string.IsNullOrEmpty(xml))
26:             {
27:                 return default(T);
28:             }
30:             T temp = Serializer.XmlDeserialize<T>(xml, Encoding.GetEncoding(httpWebResponse.CharacterSet));
32:             // DFB: Object couldn't be deserialized
33:             if (EqualityComparer<T>.Default.Equals(temp, default(T)))
34:             {
35:                 Debug.WriteLine("default T");
36:             }
38:             return temp;
39:         }
41:     }
42:     catch (WebException webException)
43:     {
44:         if (webException.Response != null)
45:         {
46:             using (Stream responseStream = webException.Response.GetResponseStream())
48:                 if (responseStream != null)
49:                 {
50:                     using (StreamReader reader = new StreamReader(responseStream))
51:                     {
52:                         // The listing is cut off here in the source; logging the error body is an assumption.
53:                         Debug.WriteLine(reader.ReadToEnd());
54:                     }
55:                 }
56:         }
57:         throw; // assumption: rethrow so the caller sees the failure
58:     }
59: }
A Generic method to Deserialize into the auto-generated Classes
Once you have the xml (line 23 above), you can deserialize into the auto-generated classes (line 30).
1: public static T XmlDeserialize<T>(string xml, Encoding encoding)
2: {
5:     T obj = Activator.CreateInstance<T>();
7:     XmlSerializer serializer = new XmlSerializer(obj.GetType());
8:     // Turn the string back into bytes using the encoding the response reported
9:     using (MemoryStream memoryStream = new MemoryStream(encoding.GetBytes(xml)))
10:     {
11:         T temp = (T)serializer.Deserialize(memoryStream);
12:         return temp;
13:     }
14: }
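To try the deserializer without a web request, something like the following sketch should work. The Note class and the sample xml are invented for illustration, and the Serializer class here is a trimmed copy of the method above (it uses typeof(T) instead of creating an instance) so that the example is self-contained.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Serialization;

// Hypothetical model standing in for an auto-generated class.
[XmlRoot("note")]
public class Note
{
    [XmlElement("body")]
    public string Body { get; set; }
}

public static class Serializer
{
    // Trimmed copy of the generic deserializer for this example.
    public static T XmlDeserialize<T>(string xml, Encoding encoding)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        using (MemoryStream memoryStream = new MemoryStream(encoding.GetBytes(xml)))
        {
            return (T)serializer.Deserialize(memoryStream);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // The encoding passed in matches the one named in the xml declaration.
        string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><note><body>hello</body></note>";
        Note note = Serializer.XmlDeserialize<Note>(xml, Encoding.UTF8);
        Console.WriteLine(note.Body); // hello
    }
}
```

Passing the encoding explicitly keeps the bytes in the MemoryStream consistent with the encoding the xml declaration names, which is presumably why the request method hands over the response's character set.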
A Unit Test library to View the Rss xml in the Generated Models
The project containing this code is available on GitHub. Download, build and run the test named “StringToObject” found in the UnitTest1.cs file of the XmlTestProject. Set a breakpoint on Assert.IsNotNull(newRssObject) and add the newRssObject to the Watch window.
You can see the data in the classes via the Watch window below.
The test reads an xml file using auto-generated classes.
This example shows how to take a raw xml string and deserialize it into C# .Net classes generated from its schema. Now that the data is in a model, you can put the data in any traditional data store. My most common next steps are adding LINQ to run some interesting queries and serializing back to a file on disk.
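Those two next steps might look roughly like this. It is only a sketch: RssXmlModel, Channel, and Item are cut-down, hand-written stand-ins for the generated classes, and FeedStore with its method names is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

[XmlRoot("rss")]
public class RssXmlModel
{
    [XmlElement("channel")]
    public Channel Channel { get; set; }
}

public class Channel
{
    public Channel() { Items = new List<Item>(); }

    [XmlElement("item")]
    public List<Item> Items { get; set; }
}

public class Item
{
    [XmlElement("title")]
    public string Title { get; set; }
}

public static class FeedStore
{
    // LINQ query: titles that contain the search term, ignoring case.
    public static List<string> TitlesContaining(RssXmlModel feed, string term)
    {
        return feed.Channel.Items
            .Where(i => i.Title != null &&
                        i.Title.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0)
            .Select(i => i.Title)
            .ToList();
    }

    // Serializes the model back to an xml file on disk.
    public static void Save(RssXmlModel feed, string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(RssXmlModel));
        using (StreamWriter writer = new StreamWriter(path))
        {
            serializer.Serialize(writer, feed);
        }
    }

    // Reads the file back into the model.
    public static RssXmlModel Load(string path)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(RssXmlModel));
        using (StreamReader reader = new StreamReader(path))
        {
            return (RssXmlModel)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        RssXmlModel feed = new RssXmlModel { Channel = new Channel() };
        feed.Channel.Items.Add(new Item { Title = "XML schemas" });
        feed.Channel.Items.Add(new Item { Title = "LINQ basics" });

        Console.WriteLine(TitlesContaining(feed, "xml").Count); // 1

        string path = Path.Combine(Path.GetTempPath(), "feed-copy.xml");
        Save(feed, path);
        Console.WriteLine(Load(path).Channel.Items.Count); // 2
    }
}
```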