Monday, January 4, 2016

A simple approach to getting all of the data out of Atlassian Jira

One of my current projects is getting data out of Jira into a DataMart to allow fast (and easy) analysis. A library such as TechTalk.JiraRestClient provides a basic foundation but there is a nasty gotcha. Jira can be heavily customized, often with different projects having dozen of different and unique custom fields. So how can you do one size fits all?

You could go down path of modifying the above code to enumerate all of the custom fields (and then have continuous work keeping them in sync) or try something like what I do below: exploiting that JSON and XML are interchangeable and XML in a SQL Server database can actually be real sweet to use.

Modifying JiraRestClient

The first step requires downloading the code from GitHub and modifying it.
In JiraClient.cs method  EnumerateIssuesByQueryInternal add the following code.
               var issues = data.issues ?? Enumerable.Empty<Issue>();
                var xml=JsonConvert.DeserializeXmlNode( response.Content,"json");
                // Insert all of the XML-JSON into 
                foreach (var issue in issues)
                {
                   var testNode=xml.SelectSingleNode(string.Format("//issues/key[text()='{0}']/..", issue.key));
                    if(testNode !=null)
                    {
                        issue.xml = testNode.OuterXml;
                    };

                }

You will also need to modify the issue class to include a string, "xml".  The result is an issue class containing all of the information from the REST response. 

Moving Issues into a Data Table

Once you have the issue by issue REST JSON response converted to XML, we need to move it to our storage. My destination is SQL server and I will exploit SQL Table variables to make the process simple and use set operations. In short, I move the enumeration of issues into a C# data table so I may pass the data to SQL Server.

                   var upload = new DataTable();
                   // defining columns omitted
                    var data = client.GetIssues(branch);
                    foreach (var issue in data)
                        try
                        {
                            var newRow = upload.NewRow();
                            newRow[Key] = issue.key;
                            newRow[Self] = issue.self;
                            // Other columns extracted
                            newRow[Xml] = issue.xml;
                            upload.Rows.Add(newRow);
                        }
                        catch (Exception exc)
                        {
                            Console.WriteLine(exc);
                        }
                    

The upload code is also clean and simple:

         using (var cmd = new SqlCommand { CommandType = CommandType.StoredProcedure, CommandText ="Jira.Upload1"})
                                using (cmd.Connection = MyDataConnection)
                                {
                                    cmd.Parameters.AddWithValue("@Data", upload);
                                     cmd.ExecuteNonQuery();
                                }

SQL Code

For many C# developers, SQL is an unknown country, so I will go into some detail. First, we need to define the table in SQL, just match the DataTable in C# above (same column names in same sequence is best)

    CREATE TYPE [Jira].[JiraUpload1Type] AS TABLE(
[Key] [varchar](max) NULL,
[Assignee] [varchar](max) NULL,
[Description] [varchar](max) NULL,
[Reporter] [varchar](max) NULL,
[Status] [varchar](max) NULL,
[Summary] [varchar](max) NULL,
[OriginalEstimate] [varchar](max) NULL,
[Labels] [varchar](max) NULL,
[Self] [varchar](max) NULL,
[XmlData] [Xml] Null
        )

Note: that I use (max) always -- which is pretty much how the C# datatable sees each column. Any data conversion to decimals will be done by SQL itself.

Second, we create the stored procedure. We want to update existing records and insert missing records. The code is simple and clean

    CREATE PROC  [Jira].[Upload1] @Data [Jira].[JiraUpload1Type] READONLY
    AS 
    Update Jira.Issue SET
      [Assignee] = D.Assignee
     ,[Description] = D.Description
     ,[Reporter] = D.Reporter
     ,[Status] = D.Status
     ,[Summary] = D.Summary
     ,[OriginalEstimate] = D.OriginalEstimate
     ,[Labels] = D.Labels
     ,[XmlData] = D.XmlData
    From @Data D
    JOIN Jira.Issue S ON D.[Key]=S.[Key]

    INSERT INTO [Jira].[Issue]
           ([Key]
           ,[Assignee]
           ,[Description]
           ,[Reporter]
           ,[Status]
           ,[Summary]
           ,[OriginalEstimate]
           ,[Labels]
  ,[XmlData])
    SELECT D.[Key]
           ,D.[Assignee]
           ,D.[Description]
           ,D.[Reporter]
           ,D.[Status]
           ,D.[Summary]
           ,D.[OriginalEstimate]
           ,D.[Labels]
  ,D.[XmlData]
    From @Data D
    LEFT JOIN Jira.Issue S ON D.[Key]=S.[Key]
    WHERE S.[Key] Is Null

All of the Json is now in XML and can be search by Xpath

Upon executing the above, we see our table is populated as shown below. The far right column is XML.This is the SQL Xml data type and contains the REST JSON converted to XML for each issue.

The next step is often to add computed columns using the SQL XML and a xpath. An example of a generic solution is below.

So what is the advantage?

No matter how many additional fields are added to Jira, you have 100% data capture here. There is no need to touch the Extract Transform Load (ETL) job. You can create (and index) the data in the XML in SQL server, or just hand back the XML to whatever is calling it.  While SQL Server 2016 supports JSON, XML is superior because of the ability to do XPaths into it as well as indices.

In many implementations of JIRA, the number of fields can get unreal.. as shown below

With the same data table, you could create multiple views that contain computed columns showing precisely the data that you are interested in.

Example of Computed column definitions

[ProductionReleaseDate]  AS ([dbo].[GetCustomField]('customfield_10705',[XmlData])),
[EpicName]  AS ([dbo].[GetCustomField]('customfield_10009',[XmlData])),
[Sprint]  AS ([dbo].[GetCustomField]('customfield_10007',[XmlData])),

With this Sql Function doing all of the work:

    CREATE FUNCTION [dbo].[GetCustomField]
    (
    @Name varchar(32),
    @Data Xml
    )
    RETURNS varchar(max)
    AS
    BEGIN
    DECLARE @ResultVar varchar(max)
    SELECT  @ResultVar = c.value('customfieldvalues[1]','varchar(max)') FROM     @Data.nodes('//customfield[@id]') as t(c)
    WHERE c.value('@id','varchar(50)')=@Name
    RETURN @ResultVar
    END


The net result is clean flexible code feeding into a database with very quick ability to extend. 

You want to expose a new field? it's literally a one liner to add it as a column to the SQL Server table or view. Consider creating custom views on top of the table as a clean organized solution.

2 comments: