Saturday, July 23, 2011

Tech Qu: Some SQL Server Questions

The Idiot ones

I have gathered a few from some interview sites, and the ones below are so trivial that it’s shocking that they were asked!

 

WriteTSSL to find all the duplicate email address in a table which contains only one column "email"

Code Snippet
  1. select [email]
  2. from [sometable]
  3. where [email] is not null OR len([email])=0
  4. group by [email]
  5. having count([email]) > 1

 

Many answers failed to exclude no email (could be recorded as null or an empty string), or return the count with it (which was not asked for).

 

Given a table: CustomerOrders: (Customer ID | Order ID | Order Date)

1. Write a SQL to find all customers who has placed an order today.

2. Write a SQL to find all customers who has placed an order today AND yesterday.

Again, a trivial one

Code Snippet
  1. Select CustomerId from CustomerOrders
  2. Where OrderDate >= Cast(GetDate as Date)
  3. Select CustomerId from CustomerOrders
  4. Where OrderDate >= Cast(DateAdd(dd,-1, GetDate()) as Date)

 

“Write an SQL query to select the nth row from a table. I asked the interviewer that, can I assume say the nth record is based on a primary/unique key column or any column, he said no, the question is just select the nth row.”

The interviewer is ignorant (that is the polite way to say it!). Unless we are talking the nth row according to a clustered index on the table, it’s as meaningful as asking for the nth byte in memory of the OS. The order of rows in a table without a clustered index is volatile. Two queries 1 second apart of the same code may give different results. Never the less, a simple implementation is just:

Code Snippet
  1. SELECT TOP 21 * FROM SalesLt.Product
  2. EXCEPT
  3. SELECT TOP 20 * FROM SalesLt.Product

However, there is a possibility of more than 1 row being returned…

Retrieve second highest salary and the corresponding Employee Id from a Employee_Salary table.

Many ways of doing this one -- with many of them failing on boundary conditions. The following simple solution is suggested.

Code Snippet
  1. SELECT TOP 2 Employee_ID,Salary FROM Employee order by salary desc
  2. EXCEPT
  3. SELECT TOP 1 Employee_ID,Salary FROM Employee order by salary desc

 

A boundary condition that could happen is two employees with the highest salary,  many solutions that I have seen will actually give the 3rd highest salary in this case! Oooopps. The above code will not. Smile.

A table has only one column. Retrieve all the data which are different from their previous data.

Another idiot question – likely a C# developer trying to cast a C# question (which it bears a striking resemblance to) into a database context and literally blowing it. In RDBMS theory, there is NO NATURAL ROW ORDER (something that some C++ and C# and Java types just do not comprehend!).

 

Well, here’s a solution:  Put it into a working table adding an identity column and then just compare to prior record. The very first row is always included (it is different than no row!) which is a boundary condition that may be missed. Again, two executions of this query may return different results.

 

Code Snippet
  1. Declare @workingspace table(id int Identity(1,1), data varchar(max))
  2. INSERT INTO @workingspace (data)
  3.     SELECT LastName from SalesLt.Customer
  4. Select top 1 data from @Workingspace
  5. UNION
  6. Select A.data from @workingspace A
  7. JOIN @workingspace B ON A.ID=B.Id-1

 

Any questions that deal with difference in versions of SQL Server are IMHO, bad questions. You are not evaluating their ability/skills to do the job, rather a quasi-rote factor.

 

FYI: Some questioners excludes the use of TOP (not standard sql), this makes the question unrealistic and thus inappropriate. If TOP is excluded then use the above pattern to number the rows and run with it.

 

Some better Basic questions

  • Give some code that uses cursors and ask for it to be rewritten without using cursors
  • Explain the difference between using:
    • Create Table #temp
    • Declare @Temp Table
  • Give some XML and ask them to perform an upsert of a SQL Table from it.
  • Ask them to give an example of CROSS APPLY. Or given them a question where cross apply is the easiest solution.

For some good questions (31 days of them!)

See http://blog.sqlauthority.com/2011/07/01/sql-server-interview-questions-and-answers-frequently-asked-questions-complete-downloadable-list-day-0-of-31/

Tech Qu: Removing duplicates from an integer array

Write two algorithms to remove duplicates from an integer array.

 

There are many solutions including sorting the array and then finding duplicates. My preference would be the following ones

Code Snippet
  1. public static int[] NoDuplicate1(int[] arr)
  2. {
  3.   var map = new HashSet<int>();
  4.   foreach (var i in arr)
  5.     map.Add(i);
  6.   return map.ToArray();
  7. }
  8. public static int[] NoDuplicate2(int[] arr)
  9. {
  10.   var map = new Queue<int>();
  11.   foreach (var i in arr)
  12.     if (!map.Contains(i))
  13.       map.Enqueue(i);
  14.   return map.ToArray();
  15. }

 

And testing:

Code Snippet
  1. int[] a = { 6,5,4,3,1,2,1, 2, 3, 4, 5, 6, 5, 6 };      
  2. var test = Questions.NoDuplicate1(a);
  3. foreach (var i in test)
  4. {
  5.   Console.WriteLine(i);
  6. }
  7. test = Questions.NoDuplicate2(a);
  8. foreach (var i in test)
  9. {
  10.   Console.WriteLine(i);
  11. }

 

Both approaches had the sequence maintained of the first integer found in the array.

image

Thursday, July 21, 2011

Tech Qu: Intersect two arrays

Write the code to find the intersection of two arrays, a,b.

 

The code can be very simple (or horribly complex) from the right perspective:

Code Snippet
  1. public static int[] IntersectionArray(int[] arr, int[] arr2)
  2. {
  3.   var result = new HashSet<int>();
  4.   var map = new HashSet<int>();
  5.   foreach (var i in arr)
  6.   //  if (!map.Contains(i))
  7.       map.Add(i);
  8.   foreach (var i in arr2)
  9.     if (map.Contains(i) )// && !result.Contains(i))
  10.       result.Add(i);
  11.   return result.ToArray();
  12. }

 

I have included some unneeded condition tests as comments above (HashSet does not retain duplicates and it is assumed that an explicit test would be more expensive then letting the hashset do it itself).

 

Unit testing is trivial

Code Snippet
  1. int[] a = { 1, 2, 3, 4, 5, 6, 5, 6 };
  2. int[] b = { 3, 4, 3, 4, 3, 4, 3, 4, };
  3. var test = Questions.IntersectionArray(a, b);
  4. foreach (var i in test)
  5. {
  6.   Console.WriteLine(i);
  7. }

The results in just {3,4}

Random number generator from binary input

This question is likely if you are dealing with quantum randomizers, etc. It’s irrelevant if there is an appropriate function in the language.

You are given a function that generates 0 and 1 with equal probability. Write a function that uses the above function to generate any n(1<=n<=1000) so that probability of producing any number between 1 to 1000 is equal and is 1/1000.

The solution is simple and fast using bit shifting (which gives performance, *2 could be used but that will raise flags with many interviewers)

Code Snippet
  1. static Random _Ran = new Random();
  2. public static int ZeroOne()
  3. {
  4.   return _Ran.Next(2);
  5. }
  6. /// <summary>
  7. /// We bitshift for optimal performance
  8. /// </summary>
  9. /// <returns></returns>
  10. public static int Random1to1000()
  11. {
  12.   var num = 0;
  13.   var outofrange = true;
  14.   while (outofrange)
  15.   {
  16.     num = ZeroOne() << 1;
  17.     for (int i = 0; i < 8; i++)
  18.     {
  19.       num = (num | ZeroOne()) << 1;
  20.     }
  21.     num = num | ZeroOne();
  22.     outofrange = num > 1000 || num < 1;
  23.   }
  24.   return num;
  25. }

Often the ZeroOne would be put internal to the Random1to1000().  If this is done, then to ask for it to be moved to an external function ZeroOne() will usually reveal whether there is understanding of Random issues (i.e. initializing Random inside of ZeroOne will not only result in poorer performance, but will also loose randomness!).

 

The testing is simple:

Test Code
  1. var cnt = new int[1002];
  2. for (var i = 1; i < 10000; i++)
  3. {
  4.   var num = Questions.Random1to1000();
  5.   cnt[num]++;
  6. }
  7. for (var i = 0; i < cnt.Length; i++)
  8. {
  9.   Console.WriteLine(string.Format("{0}: {1}", i, cnt[i]));
  10. }
  11. Console.WriteLine(string.Format("{0}: {1}", 0, cnt[0]));
  12. Console.WriteLine(string.Format("{0}: {1}", 1, cnt[1]));
  13. Console.WriteLine(string.Format("{0}: {1}", 1000, cnt[1000]));
  14. Console.WriteLine(string.Format("{0}: {1}", 1001, cnt[1001]));

 

The results allows spot testing:

image

Often people will assume that zero(0) will not occur from the code (to see a lot of code that allows a zero to be returned, see this).

The bizarreness of coding questions–it is time to cut the garbage?

There have been a trend in the industry to evolve to pro-forma interviews running off stock questions that are actually rote based and often irrelevant to the position. 

 

For an example that may make sense to you:

Asking for any job that require writing emails, the following questions dealing with literacy:

  • What is a Oxford Comma?
  • What is a split infinitive and give me an example of a valid exception to the rule.
  • What is the difference between obtuse and abstruse?
  • What is the collective noun for a group of crows? (murder); quail? (covey); magpies? (tidings), etc

If someone fails these questions, people could assert they are low literacy, and thus not qualified. In reality a person that aces these questions may write incomprehensible emails, while someone that funks these questions may write elegant clear emails.

 

In terms of coding questions, for a C# position you may get asked about single linked lists and how to do an AddBefore or other silliness. System.Collections.Generic.LinkedList is the standard linked list implementation and it is in that class, so no rational developer would ever code it -- it's a test for irrelevant information (rote learning from computer science courses) that would rarely (if ever) occur doing the job.

 

image

 

Testing for irrelevant information or skills not required to actually perform the job, borders on discrimination in some cases. If the question is one that a recent graduate from a university can easy answer, but a graduate from a different generation may not (for example, object orientated programming did not exist when I went through University), the issue of age discrimination may arise.

 

The root problem is that many interviewers do not know how to do test/interview construction. When I did my education courses (with a bunch of fellow math nerds), we were ruthless in identifying what each question on a test actually tested. Many questions were actually rote, consider this:

  • Sum 5+7+9+…. + 2343

This is a rote question for 98% of the students that requires memorizing this formula.

 S_n=\frac{n}{2}( a_1 + a_n).

 

The best interview questions that I have seen are when the interviewer asks problems that they are trying to solve themselves. There is no answer to compare to. It does exhibit the person ability to dissect a problem and do analysis.

 

Asking puzzle, math and computer science problems often detects nothing more than whether they have been exposed to this problem (for example, on http://www.careercup.com or http://www.glassdoor.com/ ) before and remember the solution or solution pattern (like the sum of an arithmetic series above).

 

To quote one of my old profs (Head of the  Bank of Israel IT Department), “You can teach a physicist or a mathematician to code, you rarely can teach a coder to think!”.

 

As a consultant, I have never been asked to answer such questions in dozens of gigs (including to Microsoft). On the rare occasions that I have considered a FTE position and been interviewed, I have been hit with such questions about 30% of the time. It’s interesting that my reaction to such questions is often to be put off the position because it telegraphs that bureaucrazy and politics may be problematic in the firm.

 

So why are you interviewing rote skills and not thinking skills?

Using Elmah with Azure Table Storage

Article Summary
This article will explain how to extend Elmah to log to and view errors from Windows Azure Table Storage. 

Introduction
Elmah is an exception handling and logging tool that plugs into ASP.NET and ASP.NET MVC applications. When an error is thrown, Elmah grabs all the information including the stack trace, server variables, and query string. Then this information is entered into the data container of your choice. When you want to review these exceptions, the tool provides a web interface to display the errors.

clip_image002

Your Windows Azure Usage
This article assumes you already have a Windows Azure account and know how to manage data in Azure Table Storage. I use either the Visual Studio Server Explorer or the Azure Storage Explorer (codeplex) to look at the tables. A longer list of Storage Viewer applications is referenced at the bottom of this article.

clip_image004
Visual Studio Server Explorer

clip_image006
Azure Storage Explorer

Steps to Connect Elmah to Windows Azure Table Storage
The Elmah download provides extensions that allow you to store your errors in various containers, including SQL Server and XML files. However, it doesn’t currently allow you to store your errors in Windows Azure Table Storage. Conveniently you can extend the Elmah framework and program your own storage containers. This article post will show you how to extend Elmah for Windows Azure Table Storage.

In order to get the Elmah code base to work with Windows Azure Table Storage, several steps have to be completed:
  1. Download the Elmah framework to get the Elmah.dll file.
  2. Create a class library project implementing Elmah for Windows Azure Table Storage. We will show you how in the next section.
  3. Reference the newly created class library into an existing ASP.NET or ASP.NET MVC application along with the Elmah framework class libraries.
  4. Configure web.config to use the Elmah library.
  5. Verify/Test class library works by throwing error then view it in the Elmah Web tool.
This article will focus on the last three. After you download Elmah, move to the next step.

Create a New Class Library Project
In your solution, add a new class library project. This project will produce the library that your web application will reference. When the new library is finally built, you will reference it in your web.config.

Make sure to add Elmah.dll and Microsoft.WindowsAzure.StorageClient.dll as references to this new class library project.

We have to create two classes to make Elmah work. The first is the entity (model) which is translated into columns in the Windows Azure Table. The second class is the Error Log extension which Elmah instantiates . The Error log extension needs three methods at the very least: Log() (add error to table), GetError() (select error from table), GetErrors() (select all errors of pagesize at pageindex). I used LINQ as the expression engine in this sample application to access Windows Azure Table Storage

My project manages the Azure Table connection in the constructor.

Create an entity for the Elmah data model
ElmahEntity.cs

namespace ElmahAzureTableStorage
{
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using Microsoft.WindowsAzure.StorageClient;

    /// <summary>
    /// Descriptions of the Entity Written To Windows Azure Table Storage
    /// </summary>
    public class ErrorEntity : TableServiceEntity
    {
        [System.Obsolete("Provided For Serialization From Windows Azure Do No Call Directly")]
        public ErrorEntity()
        {
        }

        /// <summary>
        /// Initialize a new instance of the ErrorEntity class. 
        /// </summary>
        /// <param name="timeUtc"></param>
        /// <param name="Id"></param>
        public ErrorEntity(DateTime timeUtc, Guid Id)
            : base(ErrorEntity.GetParitionKey(timeUtc), ErrorEntity.GetRowKey(Id))
        {
            this.TimeUtc = timeUtc;
            this.Id = Id;
        }

        /// <summary>
        /// Given a DateTime Return a Parition Key
        /// </summary>
        /// <param name="time"></param>
        /// <returns></returns>
        public static string GetParitionKey(DateTime time)
        {
            return time.ToString("yyyyMMddHH");
        }

        /// <summary>
        /// Given a Error Identifier Return A Parition Key
        /// </summary>
        /// <param name="id"></param>
        /// <returns></returns>
        public static string GetRowKey(Guid id)
        {
            return id.ToString().Replace("-", "").ToLower();
        }

        /// <summary>
        /// Unique Error Identifier
        /// </summary>
        public Guid Id { get; set; }

        /// <summary>
        /// Get or set
        /// </summary>
        public string HostName { get; set; }

        /// <summary>
        /// Get or set
        /// </summary>
        public string Type { get; set; }

        /// <summary>
        /// Get or set
        /// </summary>
        public string Source { get; set; }

        /// <summary>
        /// Get or set
        /// </summary>
        public string Message { get; set; }

        /// <summary>
        /// Get or set
        /// </summary>
        public string User { get; set; }

        /// <summary>
        /// Get or set
        /// </summary>
        public DateTime TimeUtc { get; set; }

        /// <summary>
        /// Get or set
        /// </summary>
        public int StatusCode { get; set; }

        /// <summary>
        /// Get or set
        /// </summary>
        public string ErrorXml { get; set; }
    }
}




Create an Elmah Error Log Extension
This is the code that will eventually be called when an error is thrown in your app.

The private read-only string “tablename” contains the Azure table name (“Elmah”) and the constructor reads the web.config of the web site project (not the class library project) for the Azure Table Storage connection string.

The web page to display the captured errors returns in reverse chronology on purpose. There is a performance hit for returning in reverse chronology order because we need to fetch all the rows from Windows Azure Table Storage in order to sort them but for my purposes, I want to see the newest errors first.

ElmahProvider.cs
namespace ElmahAzureTableStorage
{
    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Collections.Specialized;
    using System.Data.Services.Client;
    using System.Text;
    using System.Linq;
    using Microsoft.WindowsAzure.StorageClient;
    using Elmah;
    using Microsoft.WindowsAzure;
    using System.Text.RegularExpressions;
    using System.Xml;

    public class WindowsAzureErrorLogs : ErrorLog
    {
        /// <summary>
        /// Table Name To Use In Windows Azure Storage
        /// </summary>
        private readonly string tableName = "Elmah";

        /// <summary>
        /// Cloud Table Client To Use When Accessing Windows Azure Storage
        /// </summary>
        private readonly CloudTableClient cloudTableClient;

        /// <summary>
        /// Initialize a new instance of the WindowsAzureErrorLogs class.
        /// </summary>
        /// <param name="config"></param>
        public WindowsAzureErrorLogs(IDictionary config)
        {
            if (!(config["connectionString"] is string))
            {
                throw new Elmah.ApplicationException("Connection string is missing for the Windows Azure error log.");
            }

            if (string.IsNullOrWhiteSpace((string)config["connectionString"]))
            {
                throw new Elmah.ApplicationException("Connection string is missing for the Windows Azure error log.");
            }

            CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse((string)config["connectionString"]);
            this.cloudTableClient = cloudStorageAccount.CreateCloudTableClient();

            this.cloudTableClient.CreateTableIfNotExist(this.tableName);
        }


        /// <summary>
        /// 
        /// </summary>
        /// <param name="error"></param>
        /// <returns></returns>
        public override string Log(Error error)
        {
            ErrorEntity entity = new ErrorEntity(error.Time, Guid.NewGuid())
            {
                HostName = error.HostName,
                Type = error.Type,
                ErrorXml = ErrorXml.EncodeString(error),
                Message = error.Message,
                StatusCode = error.StatusCode,
                User = error.User,
                Source = error.Source
            };

            TableServiceContext tableServiceContext = this.cloudTableClient.GetDataServiceContext();
            tableServiceContext.AddObject(this.tableName, entity);
            tableServiceContext.SaveChanges();

            return entity.Id.ToString();
        }

        /// <summary>
        /// Get a Error From Windows Azure Storage
        /// </summary>
        /// <param name="id">Error Identifier (Guid)</param>
        /// <returns>Error Fetched (or Null If Not Found)</returns>
        public override ErrorLogEntry GetError(string id)
        {
            TableServiceContext tableServiceContext = this.cloudTableClient.GetDataServiceContext();

            var query = from entity in tableServiceContext.CreateQuery<ErrorEntity>(this.tableName).AsTableServiceQuery()
                        where ErrorEntity.GetRowKey(Guid.Parse(id)) == entity.RowKey
                        select entity;

            ErrorEntity errorEntity = query.FirstOrDefault();
            if (errorEntity == null)
            {
                return null;
            }

            return new ErrorLogEntry(this, id, ErrorXml.DecodeString(errorEntity.ErrorXml));
        }

        /// <summary>
        /// Get A Page Of Errors From Windows Azure Storage
        /// </summary>
        /// <param name="pageIndex">Page Index</param>
        /// <param name="pageSize">Size Of Page To Return</param>
        /// <param name="errorEntryList">List of Errors Returned</param>
        /// <returns>Total Count of Errors</returns>
        public override int GetErrors(int pageIndex, int pageSize, System.Collections.IList errorEntryList)
        {
            if (pageIndex < 0)
                throw new ArgumentOutOfRangeException("pageIndex", pageIndex, null);

            if (pageSize < 0)
                throw new ArgumentOutOfRangeException("pageSize", pageSize, null);

            TableServiceContext tableServiceContext = this.cloudTableClient.GetDataServiceContext();

            // WWB: Server Side Call To Get All Data
            ErrorEntity[] serverSideQuery = tableServiceContext.CreateQuery<ErrorEntity>(this.tableName).AsTableServiceQuery().Execute().ToArray();

            // WWB: Sorted in Reverse Order So Oldest are First
            var sorted = serverSideQuery.OrderByDescending(entity => entity.TimeUtc);

            // WWB: Trim To Just a Page From The End
            ErrorEntity[] page = sorted.Skip(pageIndex * pageSize).Take(pageSize).ToArray();

            // WWB: Convert To ErrorLogEntry classes From Windows Azure Table Entities
            IEnumerable<ErrorLogEntry> errorLogEntries = page.Select(errorEntity => new ErrorLogEntry(this, errorEntity.Id.ToString(), ErrorXml.DecodeString(errorEntity.ErrorXml)));

            // WWB: Stuff them into the class we were passed
            foreach (var errorLogEntry in errorLogEntries)
            {
                errorEntryList.Add(errorLogEntry);
            };

            return serverSideQuery.Length;
        }
    }
}


Changes to Web.Config
Make sure to alter the <errorlog> to contain the correct endpoints, account name, and account key.

Notice that the only reference to this new Elmah Azure table storage library is in <elmah><errorlog>. The rest of the references to elmah are to the elmah library itself.

Elmah captures the errors then hooks up to the new elmah Azure table storage library in order to insert the error into the Azure table. This is true for either inserting into the table or reading information out for the error log web page.


<!-- Begin: Add for Elmah (child of <configuration>-->
  <configSections>
    <sectionGroup name="elmah">
      <section name="security" requirePermission="false" type="Elmah.SecuritySectionHandler, Elmah" />
      <section name="errorLog" requirePermission="false" type="Elmah.ErrorLogSectionHandler, Elmah" />
      <section name="errorMail" requirePermission="false" type="Elmah.ErrorMailSectionHandler, Elmah" />
      <section name="errorFilter" requirePermission="false" type="Elmah.ErrorFilterSectionHandler, Elmah" />
    </sectionGroup>
  </configSections>
  <elmah>
    <security allowRemoteAccess="yes" />
    <errorLog type="ElmahAzureTableStorage.WindowsAzureErrorLogs, ElmahAzureTableStorage" 
              connectionString="DefaultEndpointsProtocol=https;AccountName=YOURACCOUNTNAME;AccountKey=YOURACCOUNTKEY" />
  </elmah>
  <!-- Begin: End for Elmah –>
<!-- Begin: Add for Elmah (child of <system.web>) -->
    <httpHandlers>
      <add verb="POST,GET,HEAD" path="elmah.axd" type="Elmah.ErrorLogPageFactory, Elmah" />
    </httpHandlers>
    <httpModules>
      <add name="ErrorLog" type="Elmah.ErrorLogModule, Elmah"/>
    </httpModules>
    <!-- Begin: End for Elmah -->
<!-- Begin: Add for Elmah (child of <system.webServer>)-->
    <handlers>
      <add name="Elmah" verb="POST,GET,HEAD" path="elmah.axd" type="Elmah.ErrorLogPageFactory, Elmah" />
    </handlers>
    <modules runAllManagedModulesForAllRequests="true" >
      <add name="ErrorLog" type="Elmah.ErrorLogModule, Elmah" />
    </modules>
    <!-- Begin: End for Elmah -->

Build and Test

At this point, the Elmah Windows Azure Table Storage extension should be hooked up and the brand new web site should build and display.
clip_image008

The easiest test is a 404/File Not Found. At the URL bar, enter a short, fake web page such as http://localhost/ERRORPAGE. Since there is no page or controller for this, it should return an error page. Since no other extensions or handlers have been added to the web site, the generic error displays:

clip_image010

The point of displaying this is to show that Elmah doesn’t fix your display after an unhandled exception. You must do that yourself. Start with this article for gracefully responding to unhandled exceptions.

In order to see what Elmah and the Azure Table Storage Elmah extension did, Check the log to see what happened by going to your web site [http://localhost/elmah.axd].

clip_image012


What did Elmah do?
Elmah intercepted the unhandled exception, regardless of type or where it was thrown, and captured the relevant information for you, the developer, to review. Elmah captured the error but didn’t handle it at all – you must do that.

There are plenty of articles out there to make Elmah your own: MVC-ish, capture client-side errors, etc.

Additional Reading Material
· Elmah Codeplex Project & Wiki Page
· Windows Azure Storage Team Blog
· How to get most out of Windows Azure Tables
· Windows Azure Storage Explorers
· Using HTTP Modules and Handlers to Create Pluggable ASP.NET Components


The Downloadable Sample Project
The sample app at the bottom of the article was built using Visual Studio 2010 Professional and Elmah v 1.2.13605.0. Make sure you unblock the following dlls in the ElmahToAzureTableStorage project directory: Elmah.dll & Microsoft.WindowsAzure.StorageClient.dll.

Sample App
ElmahToAzureTableStorage.zip

Sample App on MSDN
This sample app is also hosted on MSDN: http://code.msdn.microsoft.com/windowsazure/Elmah-Exception-Handling-c6077265

Wednesday, July 20, 2011

Tech Qu: Finding the minimum of the maximum difference.

Given 3 arrays A,B,C find three indices i,j,k such that the value of I = max((a-b),(b-c),(c-a)) is minimized.

i.e.

  • a = A[i]
  • b = B[j]
  • c = C[k]

This is one of those problems that you can accidentally head down a rabbit hole. Remember the KISS principle. Instead of trying to do an elegant solution, write a direct solution and then attempt optimize it if need be.

 

Code Snippet
  1. public static string MinOf3Arrays(int[] a, int[] b, int[] c)
  2.     {
  3.       var results = "no result";
  4.       var min = int.MaxValue;
  5.       foreach (var ai in a)
  6.         foreach (var bi in b)
  7.           foreach (var ci in c)
  8.           {
  9.             var max = Math.Max(Math.Max(ai - bi, bi - ci), ci - ai);
  10.             if (max < min)
  11.             {
  12.               min = max;
  13.               results = string.Format("{3} Min from: A@i={0}, B@j={1}, C@k={2}", ai, bi, ci, min);              
  14.             }
  15.           }
  16.       return results;
  17.     }

 

And a simple test:

Code Snippet
  1. int[] a = { 8, 19, 31, 38, 65, 85, 110, 150 };
  2. int[] b = { 24, 51, 90, 130, 140, 180, 190, 220, 250, 260 };
  3. int[] c = { 24, 51, 90, 130, 140, 180, 190, 220, 250, 260 };      
  4. Console.WriteLine(Questions.MinOf3Arrays(a, b, c));

The answer came out as:
5 Min from: A@i=19, B@j=24, C@k=24

Seeing the answer suggests that the maximum minimum difference between elements of two arrays is what you need to search for. Deducing this algorithm from the statement would be very difficult for most people, but once the first solution is done and tested, it can be seen easily. This would reduce the search time:

  • from a.Length * b.Length * c.Length 
  • to a.Length* b.Length + a.Length*c.Length+ b.Length*c.length

If a,b,c are known to be sorted, then we can further improve performance to

  • to (a.Length+ b.Length) + (a.Length+c.Length)+ (b.Length+c.length)

With a little more complex coding, you can half this again to:

  • a.Length+ b.Length + c.Length

Tech Qu: Merge two sorted arrays

Sorted Array Merge Problem: Design an algorithm to merge two sorted positive integer arrays A and B. Size of array A is n and size of array B is m. Array A has m empty blocks(-1) at random places. Merge array B into A such that array A is sorted and does not have empty blocks.

 

The approach is simple and reminds me of my old tape handling days. Would be fun if someone describes the problem as two data tapes and see where the reader goes.

 

A good start would be to define test cases (Boundary cases)

  • all of B occurs before any of A.
  • all of B occurs after all of A

A simple routine would be to compact A so all of the empty blocks are at the front and then just walk both arrays, picking the smallest number from each array to fill in the resulting array.

 

Code Snippet
  1. public static int?[] MergeSorted(int?[] a, int?[] b)
  2. {
  3.   var ai = a.Length - 1;
  4.   var bi = 0;
  5.   var mi=0;
  6.   // Move all nulls to end
  7.   for(int i=a.Length-1; i > -1;i--)
  8.     if(a[i] != null)
  9.     {
  10.       a[ai] = a[i];
  11.       if(ai!=i)
  12.         a[i] = null;
  13.       ai--;          
  14.     }
  15.     //point to first non-empty one
  16.   ai++;
  17.   for(var i=0; i < a.Length; i++)
  18.   {
  19.     if (ai < a.Length && bi < b.Length)
  20.     {
  21.       if (a[ai] < b[bi])
  22.       {
  23.         a[i] = a[ai++];
  24.       }
  25.       else
  26.       {
  27.         a[i] = b[bi++];
  28.       }
  29.     }
  30.     else
  31.     if (bi < b.Length)
  32.     {          
  33.         a[i] = b[bi++];
  34.     }
  35.   }
  36.   return a;
  37. }

 

Unit testing is simple:

Code Snippet
  1. int?[] a = {null, null, 1, 3,null,null};
  2.       int?[] b = { 3,4,5,6};
  3.       var result = Questions.MergeSorted(a, b);
  4.       foreach(var i in result)
  5.       {
  6.         Console.WriteLine(i);
  7.       }

 

Complexity is O(n) and no extra space is needed.

Tech Qu: Write a function to do RLE (Run Length Encoding)

  • Given a string, compress using run-length encoding. Example:
    Input: aaabbac
    Output: a3b2ac

First Pass – simplistic solution

Code Snippet
  1. /// <summary>
  2.  /// Assuming no digits in string.
  3.  /// </summary>
  4.  /// <param name="s"></param>
  5.  /// <returns></returns>
  6.  public static string RLEEncode1(string s)
  7.  {
  8.    var sb = new StringBuilder();
  9.    int count = 1;
  10.    char current = s[0];
  11.    for (int i = 1; i < s.Length; i++)
  12.    {
  13.      if (current == s[i])
  14.      {
  15.        count++;
  16.      }
  17.      else
  18.      {
  19.        if (count > 1)
  20.          sb.AppendFormat("{1}{0}", count, current);
  21.        else
  22.          sb.Append(current);
  23.        count = 1;
  24.        current = s[i];
  25.      }
  26.    }
  27.    if (count > 1)
  28.      sb.AppendFormat("{1}{0}", count, current);
  29.    else
  30.      sb.Append(current);  
  31.    return sb.ToString();
  32.  }

A better solution is to assume that there are no characters > 127 in the string and using the range 128-255 as counter indicators. This solution is shown below.

 

Code Snippet
  1. public static string RLEEncode2(string s)
  2.     {
  3.       var sb = new StringBuilder();
  4.       int count = 1;
  5.       char current = s[0];
  6.       for (int i = 1; i < s.Length; i++)
  7.       {
  8.         if (current == s[i])
  9.         {
  10.           count++;
  11.           if(count > 127)
  12.           {
  13.               
  14.               sb.Append(current);
  15.               sb.Append((char)(count + 127));
  16.               count = 1;
  17.           }          
  18.         }
  19.         else
  20.         {
  21.           if (count > 1)
  22.           {
  23.             
  24.             sb.Append(current);
  25.             sb.Append((char)(count + 127));
  26.           }
  27.           else
  28.             sb.Append(current);
  29.           count = 1;
  30.           current = s[i];
  31.         }
  32.       }
  33.       if (count > 1)
  34.       {
  35.         {
  36.           
  37.           sb.Append(current);
  38.           sb.Append((char)(count+127));
  39.         }          
  40.       }
  41.       else
  42.         sb.Append(current);  
  43.       return sb.ToString();
  44.     }

Inspecting the string, we see the result:

[0]    97 'a'    char
[1]    130 '‚'    char
[2]    98 'b'    char
[3]    129 ''    char
[4]    97 'a'    char
[5]    99 'c'    char

Thursday, July 14, 2011

Tech Qu: Find the Median/Percentile value of a table

Everyone knows how to find the average of a column, asking to find a median (technically the 50percentile) in TSQL  can be an exercise for some. A simple solution is shown below.

 

Code Snippet
  1. DECLARE @Percentile int=50
  2. DECLARE @PercentileTop int
  3. DECLARE @PercentileValue money
  4. SELECT @PercentileTop =count(1)
  5.   FROM [SalesLT].[Product]
  6.   WHERE LISTPRICE IS NOT NULL
  7. SET @PercentileTop =(@PercentileTop * @Percentile)/100
  8.   SELECT TOP(@PercentileTop)@PercentileValue=[ListPrice]      
  9.   FROM [SalesLT].[Product]
  10.   WHERE LISTPRICE IS NOT NULL
  11.   ORDER BY LISTPRICE
  12. SELECT @PercentileValue

 

The next question would be to encapsulate this for reuse – I will leave that for people to comment on. Let us see how creative our readers can be!

Monday, July 11, 2011

Expanding your language patterns: R

R (“GNU S”) is  a sweet language and environment for statistical computing and graphics. R is similar to the award-winning S system, which was developed at Bell Laboratories by John Chambers et al. It is also very similar to my first taught language APL/360 except it does not require special characters.

 

You can install it from http://www.r-project.org/, there is a Windows version which comes with it’s own UI.

 

As simple example.  You want to calculate   1/1 +1/2+ …. 1/1000.  A verbose solution may be:

 

> a1 <- 1:1000
> sum(1/a1)
[1] 7.485471

 

A one-liner (which APL is infamous for) would be:

> sum(1/1:1000)
[1] 7.485471

 

You can also do some really interesting stuff, like finding the sum of all the square roots from –1 to -45

 

a1 <- -1:-45
> a2 <- a1+ 0i
> sum(sqrt(a2))
[1] 0+204.3985i

 

Yes, complex number support is native!

 

Often in programming you end up with a hyped-focused on a particular language pattern such as compiled object orientated in-fix languages. Languages such as R or F# or Prolog may be incomprehensible. If you become familiar with different ways of doing things, you may find that your solutions become more elegant. Often you will look at a problem and see a simple solution in another language pattern; this may result in an atypical elegant solution. Thinking outside of the silo.

 

Download it and walk through some tutorials.  You may find R’s way of doing things very refreshing!

Saturday, July 9, 2011

Tech Qu: Find the numbers that occurs an odd number of times in a huge array of integers.

This is a more complex version of a reported Amazon question, where there is only a single one. There are two questions based on constraints:

  • Minimize the time to get the array of numbers.
  • Minimize the memory requirements.

Minimize the time to get the array of numbers: Step 1

The first step of the solution is to use two bit maps :

  1. Map A is Int.MaxValue +1 in size
  2. Map B is Int.MinValue in size

Both are initialized to zero (the default Smile ). Walk the array.

  • If not negative, toggle Map A
  • If negative, take the negative value (you should raise the issue that the language should be tested for the fastest way to do this, i.e. Math.Abs(), using binary operations, x=-x, etc).
  • NOTE: We could use 1 bitmap, but this means that we will need to do an arithmetic operation for 50% of the array. In general bit operations perform better, hence the design should be bit-operation centric.

Once the array has been walked, then walk the bitmaps for non-zero values.  This gets a solution but it is not the fastest way. Magnitude is O(n)

Minimize the time to get the array of numbers: Step 2

Step two is to split the array and hand each array to a different thread (good time to show off your multiple threading expertise). Each thread hands back the two bitmaps. The bitmaps are then XOR and then walk the bitmaps for non-zero values.

This is a better solution, but is cannot be faster than Step 1 Time/Number of Cores in the machine.

Magnitude is O(n/cores) i.e. still  O(n)

Minimize the time to get the array of numbers: Step 3

Instead of handing off to different threads, toss portions of the array into a cloud of CPUS (infinitely large potentially!).  Once a portion have been completed it is returned. The bitmaps are then XOR and then walk the bitmaps for non-zero values.  The size of the portions may need to be determine experimentally (i.e. how long does it take to move the array to the cloud and return the bitmap is a significant factor). If the optimal sizing is N items in an array shipped to the cloud, then the time saving for an array of M is approximately N/M less. This actually results in a solution that is O(1).  You can arbitrarily increase the size of the array with the processing time being relatively constant. The time to start the cloud components is O(n), but that is expected to be very minor compared to the time to walk the array which is O(1).

 

To speed operation completion, you may toss the same array segment to two or more cloud CPUs, thus reducing your variation of time and shifting expected time towards the fastest time of any cloud CPU.

 

Minimize the memory requirements

This solution is not fast, but it is compact.

  • Sort the array (best time is of the order O( log2 n * n ) see wikipedia).
  • Walk the resulting order array to determine the numbers that occur an odd number of times.

Bottom Line

The difference between finding a solution that is close to O(1) instead of O( log2 n * n ) or  O(n) is significant. The above solution is cloud-centric, which for an Amazon (or Azure) question is likely a better solution because it shows that the person can think in the cloud!

Friday, July 8, 2011

Test Qu: Division and Remainder

Given two numbers n and m, divide n by m without using the division operator. Return both the integer answer as well as the remainder.

This actually can get messy depending on your understanding of “Remainder”. One solution is shown below.

Code Snippet
  1. public static int DivideWithRemainder(int top, int bottom, out int count)
  2.     {
  3.       count = 0;
  4.       if (bottom == 0)
  5.       {
  6.         throw new DataException("cannot divide by zero");
  7.       }
  8.       bool sign = (top > 0 && bottom < 0)
  9.        || (top < 0 && bottom > 0);
  10.  
  11.       top = Math.Abs(top);
  12.       bottom = Math.Abs(bottom);
  13.  
  14.       while (top > bottom)
  15.       {
  16.         count++;
  17.         top -= bottom;
  18.       }
  19.       if (sign )
  20.       {
  21.         count = -count;
  22.       }
  23.       return top;
  24.     }

 

And for unit test:

  • Avoid the know boundary condition in the 10000 loop
  • Test explicitly for boundary condition afterwards.

 

Code Snippet
  1. var rnd = new Random();
  2. var multiple = 0;
  3. for (var i = 0; i < 10000; i++)
  4. {
  5.   var top = rnd.Next() - int.MaxValue / 2;
  6.   var bottom = rnd.Next() - int.MaxValue / 2;
  7.   if (bottom != 0)
  8.   {
  9.     var answer = Questions.DivideWithRemainder(top, bottom, out multiple);
  10.     // Remainder may not be negative, so we need to adjust modulus
  11.     // See http://en.wikipedia.org/wiki/Division_algorithm
  12.     if (Math.Abs(top % bottom) != answer)
  13.       throw new DataException("Unit test failed on remainder");
  14.     if (multiple != top / bottom)
  15.       throw new DataException("Unit test failed on result");
  16.   }
  17. }
  18. try
  19. {
  20.   var test = Questions.DivideWithRemainder(1, 0, out multiple);
  21.   Console.WriteLine("Failed Divide by zero test");
  22. }
  23. catch (Exception exc)
  24. {
  25.   Console.WriteLine("Passed Divide by zero test");
  26. }