Sunday, January 26, 2014

Parsing Word Document XML via MsXml in VBA

An old friend pinged me: while he was having some success loading Word documents saved as XML into his VBA routines in Word, he was having massive problems getting XPath to work. I resolved the basic issue and then asked him to give me his hardest problem so I could show some example code. The last time I did this was a few years ago; MSXML was (and is) a bit lame compared to later XML implementations, so his troubles were very reasonable to have.

 

Declaring namespaces

You can’t use XPath on a document with namespaces without declaring those namespaces. The process is pretty simple: just convert the namespaces declared at the top of the XML document into a VB string. There happen to be many of them in a Word XML document.

 

Dim domPrefix As String
domPrefix = "xmlns:wpc='http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas' "
domPrefix = domPrefix & "xmlns:mc='http://schemas.openxmlformats.org/markup-compatibility/2006' "
domPrefix = domPrefix & "xmlns:o='urn:schemas-microsoft-com:office:office' "
domPrefix = domPrefix & "xmlns:r='http://schemas.openxmlformats.org/officeDocument/2006/relationships' "
domPrefix = domPrefix & "xmlns:m='http://schemas.openxmlformats.org/officeDocument/2006/math' "
domPrefix = domPrefix & "xmlns:v='urn:schemas-microsoft-com:vml' "
domPrefix = domPrefix & "xmlns:wp14='http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing' "
domPrefix = domPrefix & "xmlns:wp='http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing' "
domPrefix = domPrefix & "xmlns:w10='urn:schemas-microsoft-com:office:word' "
domPrefix = domPrefix & "xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' "
domPrefix = domPrefix & "xmlns:w14='http://schemas.microsoft.com/office/word/2010/wordml' "
domPrefix = domPrefix & "xmlns:w15='http://schemas.microsoft.com/office/word/2012/wordml' "
domPrefix = domPrefix & "xmlns:wpg='http://schemas.microsoft.com/office/word/2010/wordprocessingGroup' "
domPrefix = domPrefix & "xmlns:wpi='http://schemas.microsoft.com/office/word/2010/wordprocessingInk' "
domPrefix = domPrefix & "xmlns:wne='http://schemas.microsoft.com/office/word/2006/wordml' "
domPrefix = domPrefix & "xmlns:wps='http://schemas.microsoft.com/office/word/2010/wordprocessingShape'"
The next item was just loading the XML file; this was his existing code and it worked fine (I removed a redundant extra call to Load):
    Set xmlDoc = New MSXML2.DOMDocument60
    xmlDoc.validateOnParse = False
    xmlDoc.async = False
    If Not xmlDoc.Load(sFile) Then  'sFile is the path to the XML file
        sErr = xmlDoc.parseError.ErrorCode & "  " & xmlDoc.parseError.reason
        GoTo errXML
    End If
The next item was setting two properties on the xmlDoc (unlike later XML implementations, no XmlNamespaceManager is needed). The key one uses the domPrefix string from above.
    xmlDoc.setProperty "SelectionNamespaces", domPrefix
    xmlDoc.setProperty "SelectionLanguage", "XPath"

So if he wanted to find all of the tables, it was as simple as:

Dim foundNodes As MSXML2.IXMLDOMNodeList  
Set foundNodes = xmlDoc.DocumentElement.SelectNodes("//w:tbl")
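Since the selection namespaces apply to relative queries as well, each found node can be queried further. For example (a hypothetical extension, not part of his code), counting the rows in each table found above:

Dim tblNode As MSXML2.IXMLDOMNode
Dim rowNodes As MSXML2.IXMLDOMNodeList
For Each tblNode In foundNodes
    ' ".//w:tr" searches relative to the current table node
    Set rowNodes = tblNode.SelectNodes(".//w:tr")
    Debug.Print "Table with " & rowNodes.Length & " rows"
Next tblNode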

His tough problem

He wanted to find all of the images in the document and obtain their dimensions and associated file names, plus some related information. The code was pretty clean to write with XPath, as shown below:

    ' Find the actual drawing nodes
    Dim node As MSXML2.IXMLDOMNode
    Dim node2 As MSXML2.IXMLDOMNode
    Dim node3 As MSXML2.IXMLDOMNode
    Dim iNode2 As Integer
    Set foundNodes = xmlDoc.DocumentElement.SelectNodes("//w:drawing")
    Debug.Print foundNodes.Length & " Drawings found"
    iNode2 = 1
    For Each node In foundNodes
        Debug.Print "Image #" & iNode2
        'We search relative to this node
        Set node2 = node.SelectSingleNode(".//wp:extent")
        Debug.Print node2.Attributes(0).BaseName & "=" & node2.Attributes(0).NodeValue
        Debug.Print node2.Attributes(1).BaseName & "=" & node2.Attributes(1).NodeValue
        Set node2 = node.SelectSingleNode(".//wp:docPr")
        Debug.Print node2.Attributes(2).BaseName & "=" & node2.Attributes(2).NodeValue
        'Directly finding the parent w:p node does not appear to be supported, so...
        Set node3 = node.ParentNode
        Do While node3.BaseName <> "p"
            Set node3 = node3.ParentNode
        Loop
        Set node2 = node3.SelectSingleNode(".//w:pStyle")
        Debug.Print node2.Attributes(0).BaseName & "=" & node2.Attributes(0).NodeValue
        iNode2 = iNode2 + 1
    Next node
There were a few hiccups, such as finding that the ancestor:: and descendant:: axes did not appear to work, but there were easy workarounds for that.
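The ParentNode walk used above can be wrapped into a small helper; this is just a sketch of that workaround (the function name is mine, not from his code):

Function FindAncestor(startNode As MSXML2.IXMLDOMNode, localName As String) As MSXML2.IXMLDOMNode
    'Walk up the tree until we reach the named ancestor (substitute for the ancestor:: axis)
    Dim cur As MSXML2.IXMLDOMNode
    Set cur = startNode.ParentNode
    Do While Not cur Is Nothing
        If cur.BaseName = localName Then Exit Do
        Set cur = cur.ParentNode
    Loop
    Set FindAncestor = cur
End Function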
The output is shown below:
[screenshot: Immediate window output]
Friend helped, some memories refreshed, and a blog post that will hopefully help someone else!

Saturday, January 18, 2014

Merging two AspNet Profiles – more complex than one would think!

During a recent code revision, a developer accidentally removed the applicationName attribute from the SqlProfileProvider element in Web.config. The consequence became apparent when support started to report that customers’ saved report definitions (which were stored there) had disappeared.

   <add name="ErrorProfileProvider" connectionStringName="LocalSQLServer" type="System.Web.Profile.SqlProfileProvider" />
The problem was identified, but there was a second problem…  We had a significant number of new customers who had also saved report definitions under this error profile provider. A simple switch back by restoring the applicationName would mean that those report definitions would disappear – which customer support did not want to see happen to new customers, a rather bad experience.
My initial take was “Oh, I have used multiple membership providers in one web application in the past, so we just need to use the same pattern…” 
  • Oops that pattern did not work, Grrr.
  • Searching stackoverflow and other sites, did not find any “canned solutions”.
  • We looked at trying to merge the data in SQL Server, it could be done – but the TSQL code was complicated and would require major setup in QA to test properly. Not a good scenario for an agile solution.
What I ended up doing was my original pattern, but I had to drop to lower-level coding to do it (i.e., using inherited class features). 
First, I indeed put multiple profiles in web.config as was my original plan:
<profile inherits="AdminConsole.CustomProfile" defaultProvider="GoodProfileProvider" enabled="true">
  <providers>
    <clear />
    <add name="GoodProfileProvider" applicationName="/Admin Console" connectionStringName="LocalSQLServer" type="System.Web.Profile.SqlProfileProvider" />
    <add name="ErrorProfileProvider" applicationName="/" connectionStringName="LocalSQLServer" type="System.Web.Profile.SqlProfileProvider" />
  </providers>
</profile>
The next step was to write a method (ProfileMerge) that would be called when users logged in. I added it as a static method in my profile class. The code is below; the basic steps were:
  1. Check whether the customer has two profiles. If there is only the correct one, we are done.
    1. Otherwise, merge one into the other “in bulk”.
  2. Next, check when the last update happened. The old provider had been offline, so:
    1. If the good profile is older than the error one, we need to update.
    2. If it is newer, we have already updated (no need to create an “updated” field to track this).
  3. We used the inherited provider classes to call GetPropertyValues so we could process the information as collections.
  4. We walked the error data (the recently created saved report definitions) to find any recent definitions that needed to be merged into the old profile data. When we found one, we called a merging routine (specific to how the reports were being saved).
  5. Once all of this was done, we set the results into the good profile and returned. The profile was already being saved in the calling code, so there was no need for an explicit save.
   public class CustomProfile : ProfileBase
    {
        public static void ProfileMerge(string userName, bool isForced)
        {
            try
            {
                if (String.IsNullOrWhiteSpace(userName))
                    return;
                int total = 0;
                var profile = CustomProfile.Create(userName);
                if (profile==null || profile.IsAnonymous) return;
                var errorPM = (SqlProfileProvider)System.Web.Profile.ProfileManager.Providers["ErrorProfileProvider"];
                if (errorPM == null) return;
                var errorDate = DateTime.MaxValue;
                foreach (ProfileInfo item in errorPM.FindProfilesByUserName(ProfileAuthenticationOption.Authenticated, userName, 0, 1, out total))
                {
                    errorDate = item.LastUpdatedDate;
                }
                var goodPM = (SqlProfileProvider)System.Web.Profile.ProfileManager.Providers["GoodProfileProvider"];
                if (goodPM == null)
                {
                    //Should never occur
                    AdminConsole.Diagnostics.SplunkLogger.Fatal("Profile not found for " + userName);
                    return;
                }
                // If date is in the future, there is no data to merge
                if (errorDate > DateTime.Now)
                    return;
                // Already done the update BUT allow manual repeat if needed.
                if (profile.LastUpdatedDate > errorDate && ! isForced) return;
                var errorData = errorPM.GetPropertyValues(profile.Context, CustomProfile.Properties);
                var goodData = goodPM.GetPropertyValues(profile.Context, CustomProfile.Properties);
                foreach (SettingsPropertyValue item in errorData)
                {
                    if(item.PropertyValue !=null && ! String.IsNullOrWhiteSpace( item.PropertyValue.ToString()))
                    switch (item.Name)
                    {
                        case "UserSavedFilters_Consumer":
                        case "UserSavedFilters_TransactionUsage":
                        case "UserSavedFilters":
                            goodData[item.Name].PropertyValue= MergeStrings(item.Name,goodData[item.Name].PropertyValue.ToString(),item.PropertyValue.ToString());
                            break;
                    }
                }
                goodPM.SetPropertyValues(profile.Context, goodData);
            }
            catch (Exception ex)
            {
                AdminConsole.Diagnostics.SplunkLogger.Error(ex);
            }
        }
        private static string MergeStrings(string name,string oldData, string newData)
        {
            var logEntry = new Dictionary<string, string>();
            logEntry.Add("AP_name", name);
            logEntry.Add("AP_old", oldData);
            logEntry.Add("AP_current", newData);
            char[] splitter = { '$' };
            if (String.IsNullOrWhiteSpace(oldData)) return newData;
            if (String.IsNullOrWhiteSpace(newData)) return oldData;
            var oldCollection = oldData.Split(splitter, StringSplitOptions.RemoveEmptyEntries);
            var newCollection = newData.Split(splitter, StringSplitOptions.RemoveEmptyEntries);
            var merge = new HashSet<String>();
            foreach (string item in oldCollection)
            {
                if (!merge.Contains(item))
                    merge.Add(item);
            }
            foreach (string item in newCollection)
            {
                if (!merge.Contains(item))
                    merge.Add(item);
            }
            var result = new StringBuilder();
            foreach (string item in merge)
            {
                result.Append(item);
                result.Append("$");
            }
            logEntry.Add("AP_result", result.ToString());
            AdminConsole.Diagnostics.SplunkLogger.Debug(logEntry);
            return result.ToString();
        }
    }
You may have noted that I included the ability to ignore dates and force a merge. This was done to provide a backdoor mechanism for support in case the unexpected happened: they could enter the userid on a dark page (visible only to them) and force the update to happen again (which would log the results of the merge so we could find out why our code (and QA) had failed).
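For completeness, here is a sketch of a call site. The LoggedIn handler name and login control are assumptions for illustration; only ProfileMerge comes from the code above.

    // Run the merge right after a successful login. Passing false takes the
    // normal date-checked path; the dark page would pass true to force a re-merge.
    protected void LoginControl_LoggedIn(object sender, EventArgs e)
    {
        CustomProfile.ProfileMerge(LoginControl.UserName, false);
    }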
The good news is that the above has rolled into production, and our QA person reports that customer service is happy because everything was returned as expected. We did not need to use our dark page.