Monday, November 16, 2015

Effortlessly getting data out of a JSON REST response

One of the pains of using REST is getting the data from JSON into something usable. There is a very simple solution: take the JSON, pass it to a magic black box, and get back a DataSet that has foreign keys and other joys. That sounds very nice. You just:

  • Make the JSON REST call, and then
  • Query or filter the data tables for further processing. There is no need to define classes to deserialize the JSON into.
The code is almost embarrassingly simple and is shown below:
using System.Data;
using System.Xml;
using Newtonsoft.Json;

namespace Avalara.AvaTax.JsonUtilities
{
    public static class Utility
    {
        // Converts the JSON to XML (XML needs a single root element, so the JSON is
        // wrapped in one with the given name), then lets DataSet.ReadXml infer the
        // tables, columns and parent/child relations from the XML nesting.
        public static DataSet ConvertJsonToDataSet(string jsonText, string rootElementName)
        {
            XmlDocument xd1 = JsonConvert.DeserializeXmlNode(jsonText, rootElementName);
            var result = new DataSet();
            result.ReadXml(new XmlNodeReader(xd1));
            return result;
        }
    }
}

To put this into fuller context, consider the code below, which does a GET against any JSON REST URL and returns a DataSet:

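// Requires: using System; using System.Data; using System.IO; using System.Net;
// Authorization (the header value to send) and DropUnusedTables (post-processing
// of the DataSet) are defined elsewhere and are not shown in this post.
public static DataSet GetDataSet(string url, string rootName = "myroot")
{
    var webRequest = (HttpWebRequest)WebRequest.Create(url);
    webRequest.Method = "GET";
    // A GET has no body, so ask for JSON with Accept rather than Content-Type.
    webRequest.Accept = "application/json";
    webRequest.UserAgent = "Mozilla/5.0 (Windows NT 5.1; rv:28.0) Gecko/20100101 Firefox/28.0";
    webRequest.Headers.Add("Authorization", Authorization);
    string json;
    // GetResponse() throws a WebException for 4xx/5xx responses, so the status
    // check below only sees other non-200 codes (for example 204).
    using (var webResponse = (HttpWebResponse)webRequest.GetResponse())
    using (var reader = new StreamReader(webResponse.GetResponseStream()))
    {
        if (webResponse.StatusCode != HttpStatusCode.OK) Console.WriteLine("{0}", webResponse.Headers);
        json = reader.ReadToEnd();
    }
    // We must name the root element
    return DropUnusedTables(Utility.ConvertJsonToDataSet(json, rootName));
}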

You no longer need to deserialize into hand-constructed classes to consume the data.
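To make the "query or filter" step concrete, here is a minimal sketch. To avoid needing Newtonsoft.Json, it starts from hand-written XML shaped the way DeserializeXmlNode is expected to lay out a small JSON document under the root "myroot" (array items become repeated elements), and loads it with the same DataSet.ReadXml call that Utility uses. The table names orders and lines and the sample data are hypothetical. DataTable.Select filters rows with a SQL-like expression, and GetChildRows follows the parent/child relation that ReadXml infers from the nesting.

```csharp
using System.Data;
using System.IO;
using System.Linq;

public static class DataSetQueryDemo
{
    // Roughly the XML expected for the hypothetical JSON
    // {"orders":[{"id":"1","lines":[{"sku":"A"},{"sku":"B"}]},{"id":"2","lines":[{"sku":"C"}]}]}
    public const string SampleXml =
        "<myroot>" +
        "<orders><id>1</id><lines><sku>A</sku></lines><lines><sku>B</sku></lines></orders>" +
        "<orders><id>2</id><lines><sku>C</sku></lines></orders>" +
        "</myroot>";

    public static DataSet Load(string xml)
    {
        var ds = new DataSet();
        // ReadXml infers one table per repeating complex element and a
        // parent/child relation for each level of nesting.
        ds.ReadXml(new StringReader(xml));
        return ds;
    }

    // Returns the SKUs of the lines that belong to the order with the given id.
    public static string[] SkusForOrder(DataSet ds, string orderId)
    {
        DataTable orders = ds.Tables["orders"];
        DataRelation orderLines = orders.ChildRelations[0];
        // Inferred columns are strings, so the value is quoted (quotes in orderId are not escaped here).
        DataRow order = orders.Select($"id = '{orderId}'").Single();
        return order.GetChildRows(orderLines)
                    .Select(line => (string)line["sku"])
                    .ToArray();
    }
}
```

No Order or OrderLine classes are declared anywhere; the schema comes entirely from the data.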

An example of the risk of not versioning REST

A while back I was contacted to solve a quasi-nasty issue. An existing REST implementation had been updated to add an extra field to the response. This worked fine for 99% of the consumers, but for one consumer it broke. This consumer had written a package that they sold to others, and their customers were screaming.

The reason it broke was that it was not coded to handle additional fields. Technically, REST is an architectural style without a formal standard. Robustness in handling extra fields and data is what most developers would expect, but it is not a requirement of REST. It is hopeful thinking.

If the REST API had been properly versioned, this issue would not have arisen. But it did arise.
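The difference between the consumer that broke and the 99% that did not can be sketched as a strict reader versus a tolerant reader. This is only an illustration: the story does not say what the consumer's code looked like, the sketch uses System.Text.Json rather than any particular library, and the field names id and total are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class OrderReaders
{
    // The fields this hypothetical consumer was written against.
    private static readonly HashSet<string> KnownFields = new HashSet<string> { "id", "total" };

    // Brittle: rejects the response as soon as the service adds any new field.
    public static (int Id, decimal Total) ReadStrict(string json)
    {
        using var doc = JsonDocument.Parse(json);
        foreach (JsonProperty p in doc.RootElement.EnumerateObject())
        {
            if (!KnownFields.Contains(p.Name))
                throw new InvalidOperationException($"Unexpected field '{p.Name}'");
        }
        return ReadTolerant(json);
    }

    // Tolerant: reads only the fields it needs and ignores anything extra.
    public static (int Id, decimal Total) ReadTolerant(string json)
    {
        using var doc = JsonDocument.Parse(json);
        JsonElement root = doc.RootElement;
        return (root.GetProperty("id").GetInt32(), root.GetProperty("total").GetDecimal());
    }
}
```

Both readers accept the original response; only the tolerant one survives the added field.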

While this consumer could patch their code, getting the patch to all of their customers was problematic, so an immediate fix of some kind was needed. Fortunately, their package allows the REST URL to be specified, which allowed a simple, quick solution: create a "relay website" on Azure that relays the calls from the customers to the real service and removes this extra field from the response. All of the data was in JSON, which reduced the scope of the issue.

The code was actually trivial (it uses Newtonsoft.Json.Linq). As you can see, it is easy to eliminate as many fields as desired by just adding case statements:


using System.Linq;
using System.Web.Http;
using Newtonsoft.Json.Linq;

public class CorgiController : ApiController
{
    // Relays the call to the real service, then strips the fields that broke the consumer.
    [HttpPost]
    public JObject Get()
    {
        var jresponse = RestEcho.EchoPost();
        // Walk the whole object; WalkChildrenAndRemove copies the children before
        // removing, so top-level properties can be removed safely as well.
        WalkChildrenAndRemove(jresponse);
        return jresponse;
    }

    // Responses that need no changes are relayed as-is.
    [HttpPost]
    public JObject Cancel()
    {
        return RestEcho.EchoPost();
    }

    private void WalkChildrenAndRemove(JToken jitem)
    {
        if (jitem is JProperty prop)
        {
            switch (prop.Name)
            {
                // Add one case per field to drop.
                case "Foobar": prop.Remove(); break;
                default:
                    WalkChildrenAndRemove(prop.Value);
                    break;
            }
        }
        else
        {
            // JObject or JArray (JValue has no children): copy the children first,
            // because removing a property while enumerating the live collection throws.
            foreach (JToken item in jitem.Children().ToArray())
                WalkChildrenAndRemove(item);
        }
    }
}
The RestEcho class is also trivial:

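using System;
using System.Collections.Specialized;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Web;
using Newtonsoft.Json.Linq;

public static class RestEcho
{
    // One shared HttpClient; the credentials go on each request message instead of
    // DefaultRequestHeaders, so concurrent requests do not overwrite each other.
    private static readonly HttpClient Client = new HttpClient();

    public static JObject EchoPost()
    {
        HttpContext httpContext = HttpContext.Current;
        // GetServer() (not shown) returns the base URL of the real REST service.
        var url = GetServer() + httpContext.Request.Path;
        string body;
        using (var stream = new StreamReader(httpContext.Request.InputStream))
            body = stream.ReadToEnd();
        var value = JObject.Parse(body);

        // Get the login and password sent (HTTP Basic authentication)
        NameValueCollection headerList = httpContext.Request.Headers;
        var authHeader = headerList.Get("Authorization");
        if (authHeader == null || authHeader.Length < 7 ||
            !authHeader.StartsWith("Basic ", StringComparison.OrdinalIgnoreCase))
        {
            httpContext.Response.StatusCode = 401;
            httpContext.Response.StatusDescription = "Basic Authentication Is Required";
            httpContext.Response.Write("Failed to Authenticate");
            httpContext.Response.End();   // normally aborts the request here
            return null;
        }
        // Remove "Basic " and forward the same credentials to the real service.
        var credentials = authHeader.Substring(6);
        var request = new HttpRequestMessage(HttpMethod.Post, url)
        {
            Content = new StringContent(value.ToString(), Encoding.UTF8, "application/json")
        };
        request.Headers.Authorization = new AuthenticationHeaderValue("Basic", credentials);
        var response = Client.SendAsync(request).Result;

        var jresponse = new JObject();
        try
        {
            jresponse = JObject.Parse(response.Content.ReadAsStringAsync().Result);
        }
        catch (Exception)
        {
            // Empty or non-object response body: relay an empty JSON object.
        }
        return jresponse;
    }
}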
This pattern can also be used to reduce a version X response to version 0, likely with less coding than alternative approaches; after all, if the structure is otherwise the same, you just have to add case statements.
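When the list of fields to drop grows, the case statements can be replaced by a configurable set of names. The sketch below is not the relay's actual code: it swaps Newtonsoft's JToken for System.Text.Json.Nodes (available from .NET 6) and assumes that the fields to remove are identified by name alone, at any depth.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

public static class FieldStripper
{
    // Recursively removes every property whose name is in `blocked`, in objects
    // at any depth, including objects nested inside arrays.
    public static void Strip(JsonNode node, ISet<string> blocked)
    {
        switch (node)
        {
            case JsonObject obj:
                // Snapshot the keys so properties can be removed while iterating.
                foreach (string key in obj.Select(p => p.Key).ToList())
                {
                    if (blocked.Contains(key)) obj.Remove(key);
                    else Strip(obj[key], blocked);
                }
                break;
            case JsonArray arr:
                foreach (JsonNode item in arr) Strip(item, blocked);
                break;
            // JsonValue leaves (and null) have nothing to strip.
        }
    }
}
```

Reducing version X to version 0 then becomes a matter of listing the fields that version 0 did not have.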

The relay was deployed to Azure with running costs well under $10/month.

Happy customer. Happy customer's customers.

Sunday, November 15, 2015

What is an ideal REST implementation?

REST implementations are often tossed up without much thought. REST is an architectural style and lacks any formal standard. A frequent issue is that no consideration is given to versioning. In this post I will attempt to develop a checklist of items that would be in an ideal REST implementation.

Content Negotiation

This means honoring what the requesting agent will accept and its stated order of preference (see the Wikipedia article on content negotiation). For example:

Accept: text/xml; q=1.0, text/html; q=1.0, application/json; q=0.9, text/csv; q=0.85, text/*; q=0.8, image/gif; q=0.6, image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1
Then the REST service should return XML as the first choice, HTML as the second choice (if XML is not supported), then JSON, then CSV, then any other text format. If the request is for data that could be charted, the fall-through would be to return an image of a chart. Got it?

So the checklist should include a list of the formats supported. My own preference is to support XML as the first priority (because it can provide a Schema Definition or WSDL, something that is not usually available for JSON). Structured XML can be quickly pushed through an XmlReader and output as JSON, HTML, or whatever. The conversion routines are code once, use many.
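Choosing a representation from an Accept header can be sketched as follows. This is a simplified illustration, not a complete RFC 7231 implementation: it sorts media ranges by q-value (keeping header order for ties), treats q=0 as "not acceptable", supports type/* and */* wildcards, and ignores the rule that more specific ranges take precedence as well as any parameters other than q. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ContentNegotiation
{
    // Returns the first supported media type allowed by the Accept header,
    // trying media ranges in order of q-value; returns null if none match.
    public static string Choose(string acceptHeader, IReadOnlyList<string> supported)
    {
        var ranges = acceptHeader
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Select((entry, index) =>
            {
                string[] parts = entry.Split(';', StringSplitOptions.TrimEntries);
                double q = 1.0; // default when no q parameter is given
                foreach (string p in parts.Skip(1))
                {
                    if (p.StartsWith("q=", StringComparison.OrdinalIgnoreCase) &&
                        double.TryParse(p.Substring(2), NumberStyles.Float, CultureInfo.InvariantCulture, out double parsed))
                        q = parsed;
                }
                return (Range: parts[0].ToLowerInvariant(), Q: q, Index: index);
            })
            .Where(r => r.Q > 0)            // q=0 means "not acceptable"
            .OrderByDescending(r => r.Q)
            .ThenBy(r => r.Index);          // ties keep the order given in the header

        foreach (var range in ranges)
        {
            string match = supported.FirstOrDefault(s => Matches(range.Range, s.ToLowerInvariant()));
            if (match != null) return match;
        }
        return null; // the service would answer 406 Not Acceptable
    }

    private static bool Matches(string range, string mediaType)
    {
        if (range == "*/*") return true;
        if (range.EndsWith("/*", StringComparison.Ordinal))
            return mediaType.StartsWith(range.Substring(0, range.Length - 1), StringComparison.Ordinal);
        return range == mediaType;
    }
}
```

With the header from the example above, a service that only supports application/json and text/csv would pick application/json, while one that only produces image/png would fall through to image/*.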

Use the appropriate HTTP Requests

The HTTP request methods (GET, POST, PUT, PATCH, DELETE) have specific meanings. Read the RFCs (RFC 7231, plus RFC 5789 for PATCH) or just the Wikipedia page on the Hypertext Transfer Protocol. Make sure the right method is being used; too often the choice is chaotic.
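One practical consequence of the method semantics is retry behavior. The sketch below records the "safe" and "idempotent" properties defined in RFC 7231 section 4.2 (and RFC 5789 for PATCH) and uses them to decide whether a failed request could be retried automatically; the class and the retry policy are illustrative assumptions, not part of any framework.

```csharp
using System;
using System.Collections.Generic;

public static class HttpMethodSemantics
{
    // Safe: no intended change on the server.
    // Idempotent: sending the request N times has the same effect as sending it once.
    private static readonly Dictionary<string, (bool Safe, bool Idempotent)> Table =
        new Dictionary<string, (bool Safe, bool Idempotent)>(StringComparer.OrdinalIgnoreCase)
        {
            ["GET"] = (true, true),
            ["HEAD"] = (true, true),
            ["OPTIONS"] = (true, true),
            ["PUT"] = (false, true),
            ["DELETE"] = (false, true),
            ["POST"] = (false, false),
            ["PATCH"] = (false, false), // not idempotent in general
        };

    public static bool IsSafe(string method) => Table.TryGetValue(method, out var s) && s.Safe;

    public static bool IsIdempotent(string method) => Table.TryGetValue(method, out var s) && s.Idempotent;

    // A client can blindly retry only when a duplicate request cannot apply the change twice.
    public static bool CanRetryAutomatically(string method) => IsIdempotent(method);
}
```

This is why using POST for an update that should have been a PUT is more than a style problem: it takes away the client's ability to retry safely.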

Use SSL always -- even internally!

This should be obvious. Some authors have argued that a request over plain HTTP should not be redirected to HTTPS; instead, an error should be returned. I agree with this approach.

Versioning

There are several approaches to versioning in the literature; the two main ones are making the version part of the URL and placing it in a header. A company should choose a single practice and enforce it across all of its REST APIs. Personally, I prefer the version to be in a header, with its absence defaulting to the original version.
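Header-based versioning with "absence means the original version" can be sketched as below. The header name Api-Version and the list of supported versions are hypothetical; the point is that a missing header maps to version 1, and an unknown version is reported as an error rather than silently mapped to the newest one.

```csharp
using System;
using System.Collections.Generic;

public static class ApiVersioning
{
    // Hypothetical header name; whatever name is chosen should be used company-wide.
    public const string VersionHeader = "Api-Version";
    public const int OriginalVersion = 1;
    public static readonly int[] SupportedVersions = { 1, 2 };

    // Returns the requested version, the original version when the header is absent,
    // or null when the value is malformed or unsupported (the caller returns an error).
    public static int? Resolve(IDictionary<string, string> headers)
    {
        if (!headers.TryGetValue(VersionHeader, out var raw) || string.IsNullOrWhiteSpace(raw))
            return OriginalVersion;
        if (int.TryParse(raw.Trim(), out int version) && Array.IndexOf(SupportedVersions, version) >= 0)
            return version;
        return null;
    }
}
```

HTTP header names are case-insensitive, so the headers dictionary passed in should use a case-insensitive comparer.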

Support Filtering by Content and by Fields

Personally, I like to see SQL-like functionality implemented, i.e. LIKE or regular-expression matching for filtering, in addition to ordering comparisons (< >) and equality (=). There is little sense in returning 100K records for the client to filter down to 2 records: it is slow and the client UI will be unresponsive.
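Server-side filtering of this kind can be sketched over simple string-valued records. The operator names (=, <, >, like, regex) and the record shape are hypothetical choices for illustration; a real service would usually translate the filter into its database query instead of filtering in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class RecordFilter
{
    // Returns the records whose `field` satisfies `op value`.
    public static IEnumerable<IDictionary<string, string>> Apply(
        IEnumerable<IDictionary<string, string>> records, string field, string op, string value)
    {
        Func<string, bool> test;
        switch (op)
        {
            case "=": test = v => v == value; break;
            case "<": test = v => Compare(v, value) < 0; break;
            case ">": test = v => Compare(v, value) > 0; break;
            case "like":
            {
                // SQL-style LIKE: % matches any run of characters; case-insensitive.
                string pattern = "^" + Regex.Escape(value).Replace("%", ".*") + "$";
                test = v => Regex.IsMatch(v, pattern, RegexOptions.IgnoreCase);
                break;
            }
            case "regex": test = v => Regex.IsMatch(v, value); break;
            default: throw new ArgumentException($"Unsupported operator '{op}'");
        }
        return records.Where(r => r.TryGetValue(field, out var fieldValue) && test(fieldValue));
    }

    // Compares numerically when both sides parse as numbers, otherwise ordinally.
    private static int Compare(string a, string b) =>
        decimal.TryParse(a, NumberStyles.Number, CultureInfo.InvariantCulture, out var x) &&
        decimal.TryParse(b, NumberStyles.Number, CultureInfo.InvariantCulture, out var y)
            ? x.CompareTo(y)
            : string.CompareOrdinal(a, b);
}
```

The numeric comparison matters: compared as strings, "9" would sort after "10".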

The same applies to fields: the client should be able to ask for just the fields it needs. A special option such as "rf=true" may be needed for referential integrity, that is, returning the fields specified AND the fields needed to match the records correctly to each other. In some cases rf is not needed because of how the JSON or XML is structured.
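Field selection with an "rf" flag can be sketched as a projection that optionally adds the key fields back in. The key field names id and orderId are hypothetical; in practice they would come from the resource's schema.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FieldProjection
{
    // Fields that tie records to each other (hypothetical names for this sketch).
    private static readonly string[] KeyFields = { "id", "orderId" };

    // Keeps only the requested fields; with rf=true it also keeps the key fields,
    // so the trimmed records can still be matched to one another.
    public static Dictionary<string, string> Project(
        IDictionary<string, string> record, IEnumerable<string> fields, bool rf)
    {
        var wanted = new HashSet<string>(fields);
        if (rf) wanted.UnionWith(KeyFields);
        return record.Where(kv => wanted.Contains(kv.Key))
                     .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

For example, a request for fields=sku&rf=true on an order line would return sku plus id and orderId, but not qty.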

Support Paging and Sorting

A corollary to supporting filtering is supporting paging (with a standard corporate default page size, for example 25 or 100 lines). Support for sorting allows faster manipulation on most clients.
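Paging and sorting can be sketched together, because a page only means something once the order is fixed. The default of 25 comes from the example above; the cap of 100 and the 1-based page numbering are assumptions for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    public const int DefaultPageSize = 25;   // example corporate default
    public const int MaxPageSize = 100;      // hypothetical upper bound

    // Sorts first, then returns one 1-based page. Sorting before paging keeps
    // page N stable from one request to the next.
    public static List<T> Page<T, TKey>(IEnumerable<T> items, Func<T, TKey> sortKey,
                                        int page = 1, int? pageSize = null, bool descending = false)
    {
        int size = Math.Clamp(pageSize ?? DefaultPageSize, 1, MaxPageSize);
        int index = Math.Max(page, 1) - 1;
        var sorted = descending ? items.OrderByDescending(sortKey) : items.OrderBy(sortKey);
        return sorted.Skip(index * size).Take(size).ToList();
    }
}
```

Clamping the page size protects the service from a client asking for all 100K records in one "page".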

Provide a "WSDL" defining required and optional fields and their data contents

For SOAP, a WSDL is standard. REST can support XML, which means that an XML Schema Definition (XSD) can be created and supplied, and it should be. This becomes a clear contract for the REST service (regardless of representation) and can save developers trying to consume it many hours of frustration and documentation reading.
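To show what such a contract buys, here is a minimal sketch that validates an XML payload against an XSD using the standard System.Xml.Schema support. The order schema (a required integer id and an optional note) is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ContractCheck
{
    // A hypothetical contract: an order must have an integer id and may have a note.
    public const string OrderXsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""order"">
    <xs:complexType>
      <xs:sequence>
        <xs:element name=""id"" type=""xs:int""/>
        <xs:element name=""note"" type=""xs:string"" minOccurs=""0""/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Validates the XML against the schema; returns the error messages (empty = valid).
    public static List<string> Validate(string xml, string xsd)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        // Collect errors instead of throwing on the first one.
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);
        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }
}
```

The same XSD can be handed to consumers, who can then check their expectations without reading prose documentation.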

REST should be stateless and Robust

If you want two-phase commits, use SOAP. By robust I mean that a version number should be included with the data so that updates are not applied to data that someone else has modified in the meantime. An alternative to a version number is to include both the old and the new values, restricted to the fields that changed (so changing a single field would likely be done as a PATCH rather than a PUT).
  • No cookies should be used, if practical. If they are used, the justification needs to be written down and reviewed.
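The version-number check described above is optimistic concurrency. A single-threaded, in-memory sketch follows; the store and its method names are hypothetical. In HTTP terms the version is often carried as an ETag and sent back in If-Match, with a mismatch typically reported as 412 Precondition Failed (or 409 Conflict).

```csharp
using System.Collections.Generic;

// A single-threaded, in-memory sketch of optimistic concurrency.
public sealed class VersionedStore
{
    private readonly Dictionary<string, (string Value, int Version)> _items =
        new Dictionary<string, (string Value, int Version)>();

    public void Create(string id, string value) => _items[id] = (value, 1);

    public (string Value, int Version) Get(string id) => _items[id];

    // The client sends back the version it originally read. If the stored version
    // has moved on, someone else changed the record and the update is refused.
    public bool TryUpdate(string id, string newValue, int expectedVersion)
    {
        var current = _items[id];
        if (current.Version != expectedVersion) return false;
        _items[id] = (newValue, current.Version + 1);
        return true;
    }
}
```

Because the server keeps no per-client session, this check works equally well for stateless requests.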

Naming Convention should be consistent across the API and the Corporation

I really do not care if it is PascalCase, camelCase, all lowercase, Hungarian notation or snake_case. What I care about is consistency!!

Use Nouns and Not Verbs

REST and SOAP are different because REST is focused on CRUD operations. CRUD operations, by definition, act on instances, aka nouns. SOAP is focused on methods, i.e. verbs. In REST the HTTP methods are the operations (GET, PUT, POST, etc.); there is a "grammar", so don't become sloppy. Nouns should be plural, not singular (/mice/ instead of /mouse/).

Pretty Print Results should be standard

Unless you are dealing with a bandwidth-constrained situation that is unlikely to change, the difference in transmission size between compact (minified) and pretty-printed output is very small, especially once HTTP compression is applied. Pretty-printed results make the developers consuming them significantly more efficient!

Implement RFC 6585: Rate Limiting by IP Address and/or User

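Keeping the system responsive and meeting the SLA is essential. Rate limiting by user alone is often easy to kludge around; rate limiting by IP address and user together is much more difficult to get around. At Amazon, this was often the norm in the groups that I worked in. RFC 6585 defines the 429 (Too Many Requests) status code for responding to clients that exceed the limit.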
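Limiting by IP address and user together can be sketched with two fixed-window counters per request: one keyed by IP, one keyed by user, and the request is rejected with 429 if either is over the limit. The fixed-window algorithm, the key format and the limits are assumptions for illustration; production limiters often use sliding windows or token buckets and shared storage across servers.

```csharp
using System;
using System.Collections.Generic;

// Single-threaded, in-memory sketch of a fixed-window rate limiter.
public sealed class FixedWindowRateLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly Dictionary<string, (DateTime WindowStart, int Count)> _counters =
        new Dictionary<string, (DateTime WindowStart, int Count)>();

    public FixedWindowRateLimiter(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    // Counts the request against both the IP and the user (both counters are
    // always incremented). Returns 200 to proceed or 429 Too Many Requests.
    public int Check(string ip, string user, DateTime now)
    {
        bool ipOk = Hit("ip:" + ip, now);
        bool userOk = Hit("user:" + user, now);
        return ipOk && userOk ? 200 : 429;
    }

    private bool Hit(string key, DateTime now)
    {
        if (!_counters.TryGetValue(key, out var c) || now - c.WindowStart >= _window)
            c = (now, 0); // start a new window
        c.Count++;
        _counters[key] = c;
        return c.Count <= _limit;
    }
}
```

Switching IP addresses does not reset the user's counter, and sharing an IP across accounts does not reset the IP's counter, which is what makes the combination harder to get around.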

HTTP Status Codes returned must be Correct

I have often seen a 403 returned for a correctly authenticated user accessing a URL resource that they are authorized to use, merely because they are denied access to some of the data. Developers often attempt to project application status codes onto HTTP status codes; that is incorrect.

Stack Traces should never be returned!

Proper error handling is essential. Including a stack trace is often a security risk: you are revealing the contents of what should be a black box!
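One common way to keep stack traces out of responses while still being able to debug is to log the full exception server-side under a correlation id and return only that id to the client. The sketch below is framework-neutral: the handler delegate, the in-memory log list and the JSON error shape are all hypothetical stand-ins.

```csharp
using System;
using System.Collections.Generic;

public static class SafeErrors
{
    // Runs a request handler. On an unexpected exception, the full details
    // (ex.ToString() includes the stack trace) go only to the server log, and
    // the client gets a generic message plus a correlation id.
    public static (int Status, string Body) Handle(Func<string> handler, IList<string> serverLog)
    {
        try
        {
            return (200, handler());
        }
        catch (Exception ex)
        {
            string correlationId = Guid.NewGuid().ToString("N");
            serverLog.Add($"{correlationId}: {ex}");
            return (500, $"{{\"error\":\"An internal error occurred.\",\"correlationId\":\"{correlationId}\"}}");
        }
    }
}
```

A support engineer can find the full trace from the correlation id the client reports, without the trace ever leaving the server.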

Provide API Calls in the response (HATEOAS)

Review Board includes a rich set of links to its other API calls in each response. This makes consuming the API much simpler (and often allows fast automation!). An example is shown below. This approach allows the APIs to be restructured later by just updating the href values (assuming the client correctly consumes this information).

"links": {
    "diffs": {
        "href": "https://rbcommons.com/s/Corgi/api/review-requests/736/diffs/",
        "method": "GET"
    },
    "repository": {
        "href": "https://rbcommons.com/s/Corgi/api/repositories/828/",
        "method": "GET",
        "title": "WelshPembroke"
    },
    "screenshots": {
        "href": "https://rbcommons.com/s/Corgi/api/review-requests/736/screenshots/",
        "method": "GET"
    },
    "self": {
        "href": "https://rbcommons.com/s/Corgi/api/review-requests/736/",
        "method": "GET"
    },
    "update": {
        "href": "https://rbcommons.com/s/Corgi/api/review-requests/736/",
        "method": "PUT"
    },