Using Google to translate resource files – code example

Often there is a need to create resource files in foreign languages for code-testing purposes. The languages that I typically like to test are:

  • Hebrew
  • Arabic
  • Simplified Chinese
  • Spanish
  • French
  • German – text is often 2x longer than English (what you get with precision in expression)
  • Hindi

If the code/CSS works for all of the above, then you are likely safe for other languages. There are two pieces of code and one manual process (cut and paste, since Google usually makes it hard to automate capturing the translation).

Converting the Resx to an HTML page

We put the Resx up almost as is: just ditch the comments and place the items inside <html> and <body> tags.

// Requires: using System.IO; using System.Xml;
private void WriteHtml(FileInfo infile, FileInfo outHtml)
{
    XmlDocument sourceResx = new XmlDocument();
    sourceResx.Load(infile.FullName);
    XmlDocument xHtml = new XmlDocument();
    xHtml.LoadXml("<html><body/></html>");
    XmlNode body = xHtml.SelectSingleNode("//body");
    // Remove the <comment> elements; walk backwards because nodes are deleted as we go.
    XmlNodeList list = sourceResx.SelectNodes("//data[@name]/comment");
    for (int i = list.Count - 1; i >= 0; i--)
        list[i].ParentNode.RemoveChild(list[i]);
    // Copy every <data name="..."> element into the HTML body.
    list = sourceResx.SelectNodes("//data[@name]");
    foreach (XmlNode node in list)
    {
        body.AppendChild(xHtml.ImportNode(node, true));
    }
    xHtml.Save(outHtml.FullName);
}
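
To see what the page looks like without touching the file system, here is a minimal sketch of the same transformation working on strings. The class name ResxToHtmlDemo, the method ToHtml, and the sample "Greeting" entry are made up for illustration; they are not part of the code above.

```csharp
using System;
using System.Xml;

static class ResxToHtmlDemo
{
    // Same idea as WriteHtml, but string-in/string-out so the result is easy to inspect.
    public static string ToHtml(string resxXml)
    {
        var resx = new XmlDocument();
        resx.LoadXml(resxXml);

        var html = new XmlDocument();
        html.LoadXml("<html><body/></html>");
        XmlNode body = html.SelectSingleNode("//body");

        foreach (XmlNode data in resx.SelectNodes("//data[@name]"))
        {
            // Import a deep copy, then drop its <comment> child if it has one.
            XmlNode copy = html.ImportNode(data, true);
            XmlNode comment = copy.SelectSingleNode("comment");
            if (comment != null)
                copy.RemoveChild(comment);
            body.AppendChild(copy);
        }
        return html.OuterXml;
    }

    public static void Main()
    {
        // Hypothetical one-entry Resx fragment.
        string sample =
            "<root>" +
            "<data name=\"Greeting\" xml:space=\"preserve\">" +
            "<value>Hello</value><comment>Shown on login</comment>" +
            "</data>" +
            "</root>";
        Console.WriteLine(ToHtml(sample));
    }
}
```

The printed page keeps the <data> and <value> elements, which is what lets the second piece of code find the translated values by name later on.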

Getting the Translations

Next, we copy this to a website. I copied an example to one of my sites, http://reddwarfdogs.com/ContentPage.html, if you want to see what the output looks like.

Next, go to http://translate.google.com, enter the URL, and pick the desired translation. Once the translation is presented, I usually view source and then copy and paste it into a file with the CultureInfo name as the file name and .htm as the extension (the next code sample assumes this). So we would have files like

  • he-IL.htm
  • ar.htm
  • es.htm
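
Since the file name ends up as the culture part of the generated Resx name, it can be worth checking it before going further. The following is only a sketch: CultureNameCheck and IsKnownCulture are hypothetical names, and it assumes .NET 4 or later, where CultureInfo.GetCultureInfo throws CultureNotFoundException for names it does not recognize. Which names are recognized depends on the runtime and operating system.

```csharp
using System;
using System.Globalization;
using System.IO;

static class CultureNameCheck
{
    // Returns true when the file name (minus its extension) is a culture name
    // that this runtime recognizes.
    public static bool IsKnownCulture(string fileName)
    {
        string name = Path.GetFileNameWithoutExtension(fileName);
        if (string.IsNullOrEmpty(name))
            return false; // an empty name would silently map to the invariant culture

        try
        {
            CultureInfo.GetCultureInfo(name);
            return true;
        }
        catch (CultureNotFoundException)
        {
            return false;
        }
    }

    public static void Main()
    {
        foreach (string file in new[] { "he-IL.htm", "ar.htm", "es.htm" })
            Console.WriteLine("{0}: {1}", file, IsKnownCulture(file));
    }
}
```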

Creating the translated Resx files from the .htm files

We can now return to the world of code processing.

  • We use the HtmlAgilityPack library (from CodePlex) to fix the HTML into valid XML so that processing is a lot easier, but before we do that we:
    • Add a meta tag to identify the file as UTF-8 (if you forget to do this, you may get a lot of ???????? appearing instead).
  • Once we have valid XML, we eliminate the original phrase that Google puts in the HTML.
  • We then load a copy of the original Resx file and walk it, replacing each <value> with the one from the translation.
  • Finally, we save to an appropriately named file.

The code:

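// Requires: using System; using System.IO; using System.Xml; plus a reference to HtmlAgilityPack.
void CreateTranslatedResx(FileInfo sourceFile)
{
    // Strip only the file extension; a "." elsewhere in the path must not truncate the name.
    string baseName = Path.Combine(sourceFile.DirectoryName,
        Path.GetFileNameWithoutExtension(sourceFile.Name));
    DirectoryInfo source = new DirectoryInfo(Environment.CurrentDirectory);
    foreach (FileInfo fi in source.GetFiles("*.htm"))
    {
        // On .NET Framework the "*.htm" pattern can also match ".html", so check explicitly.
        if (fi.Extension != ".htm")
            continue;

        // Add the encoding declaration if it is missing (needed for non-Roman scripts).
        string txt = File.ReadAllText(fi.FullName);
        if (txt.IndexOf("utf-8", StringComparison.OrdinalIgnoreCase) < 0)
        {
            txt = txt.Replace("<html>", "<html><meta http-equiv='Content-Type' content='text/html; charset=utf-8'>");
            File.WriteAllText(fi.FullName, txt);
        }

        // Let HtmlAgilityPack turn the HTML into well-formed XML.
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(txt);
        doc.OptionOutputAsXml = true;
        doc.Save("temp.xml");
        XmlDocument htmDom = new XmlDocument();
        htmDom.LoadXml(File.ReadAllText("temp.xml"));

        // Remove the original phrases that Google embeds next to each translation.
        XmlNodeList list = htmDom.SelectNodes("//span[@class='google-src-text']");
        for (int i = list.Count - 1; i >= 0; i--)
            list[i].ParentNode.RemoveChild(list[i]);

        // Start from a fresh copy of the original Resx for every culture,
        // so values from one translation never leak into the next.
        XmlDocument dom = new XmlDocument();
        dom.Load(sourceFile.FullName);
        foreach (XmlNode node in htmDom.SelectNodes("//data[@name]"))
        {
            XmlNode oldNode = dom.SelectSingleNode(
                string.Format("//data[@name='{0}']", node.Attributes["name"].Value));
            XmlNode newValue = node.SelectSingleNode("value");
            if (oldNode == null || newValue == null || oldNode.SelectSingleNode("value") == null)
                continue; // skip entries that did not survive the round trip
            // Stray "?" characters are stripped; note that this also removes genuine question marks.
            oldNode.SelectSingleNode("value").InnerText = newValue.InnerText.Replace("?", string.Empty);
        }

        // The culture is the file name without its extension, e.g. "he-IL".
        string culture = Path.GetFileNameWithoutExtension(fi.Name);
        dom.Save(String.Format("{0}.{1}.Resx", baseName, culture));
    }
}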
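
As an aside on the ???????? symptom: one way such a run of question marks can arise is when UTF-8 bytes are decoded with an encoding that cannot represent them. The sketch below is an illustration of that mechanism only, not a diagnosis of what happens inside the code above; EncodingDemo and DecodeUtf8BytesAsAscii are hypothetical names, and the Hebrew sample word is written with escapes so the source file's own encoding does not matter.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Encodes the text as UTF-8, then (wrongly) decodes the bytes as ASCII.
    // The ASCII decoder replaces every byte above 0x7F with '?'.
    public static string DecodeUtf8BytesAsAscii(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.ASCII.GetString(utf8);
    }

    public static void Main()
    {
        // Four Hebrew letters, each two bytes long in UTF-8.
        string shalom = "\u05E9\u05DC\u05D5\u05DD";
        Console.WriteLine(DecodeUtf8BytesAsAscii(shalom)); // prints "????????"
    }
}
```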

Conclusion

That’s it! The main things that can go wrong are:

  • Not saving with the correct CultureInfo name (What is the code for Welsh and Yiddish?)
  • Not saving the HTML from Google as UTF-8

Again, this is done only for testing purposes. Read Google's terms of use if the files are to be shipped with the application or exposed on the real internet.
