Fast Named Formats In C#

The .NET Framework supports string formats since the first day. String format patterns look like “My name is {0} and I am {1} years old”. Optionally, the placeholders inside the curly brackets, called format items, can include other format strings – e.g. to format numbers or dates in a special way. Great stuff. But what if I had an object whose properties I want to directly reference in the format string like “My name is {Name} and I am {Age} years old”?

As you can guess, I am not the first one to write about this issue. For example, check out Phil Haacks blog post about this. That post deals a lot with the correct parsing of the format string and the correct interpretation of escaped and unescaped curly brackets. The emphasis in that post lies on correctness, not performance.

We’ll, I like it fast and nasty 🙂 I needed a formatter that performs very fast, but does not need to support every twisted input string you can think of. So I settled with the single regular expression ([^{]|^){(\w+)}([^}]|$)|([^{]|^){(\w+)\:(.+)}([^}]|$). This is probably not the most elaborate way to parse a formatting input string (in fact, regular expressions are usually not powerful enough to handle brackets and bracket nesting), but it worked for all my scenarios (single-line, not too long strings).

Ok, let’s dig into that. The key to performance in this case is pre-compilation. The first call to the formatter will parse the input strings, convert it into the regular string format form, construct Linq Expressions to call the regular String.Format method with the proper arguments, compile these expressions and cache them. Every subsequent call will just execute the precompiled expression. Here’s an example:

Input format string: "My name is {Name} and I am {Age} years old"
Input object: var obj = new { Name = "Santa", Age = 1700 }
Generated code: string.Format("My name is {0} and I am {1} years old", obj.Name, obj.Age)

Without further ado, here’s the code that does this:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Text;
using System.Text.RegularExpressions;

namespace MunirHusseini
{
    public static class NamedFormat
    {
        private static readonly ConcurrentDictionary<string, object> PrecompiledExpressions = new ConcurrentDictionary<string, object>(StringComparer.OrdinalIgnoreCase);

        private static readonly Regex RegexFormatArgs = new Regex(@"([^{]|^){(\w+)}([^}]|$)|([^{]|^){(\w+)\:(.+)}([^}]|$)", RegexOptions.Compiled);

        public static string Format<T>(string pattern, T item)
        {
            // If we already have a compiled expression, just execute it.
            object o;
            if (PrecompiledExpressions.TryGetValue(pattern, out o))
            {
                return ((Func<T, string>)o)(item);
            }

            // Convert named format into regular format and return 
            // a list of the named arguments in order of appearance.
            string replacedPattern;
            var arguments = ParsePattern(pattern, out replacedPattern);
            // We'll be using the String.Format method to actually perform the formating.
            var formatMethod = typeof(string).GetMethod("Format", new[] { typeof(string), typeof(object[]) });

            // Now, construct code with Linq Expressions...

            // The constant that contains the format string:
            var patternExpression = Expression.Constant(replacedPattern, typeof(string));
            // The input object:
            var parameterExpression = Expression.Parameter(typeof(T));
            // An array containing a call to a property getter for each named argument :
            var argumentArrayElements = arguments.Select(argument => Expression.Convert(Expression.PropertyOrField(parameterExpression, argument), typeof(object)));
            var argumentArrayExpressions = Expression.NewArrayInit(typeof(object), argumentArrayElements);
            // The actual call to String.Format:
            var formatCallExpression = Expression.Call(formatMethod, patternExpression, argumentArrayExpressions);
            // The lambda expression we will be compiling:
            var lambdaExpression = Expression.Lambda<Func<T, string>>(formatCallExpression, parameterExpression);
            
            // The lambda expression will look something like this
            // input => string.Format("my format string", new[]{ input.Arg0, input.Arg1, ... });

            // Now we can compile the lambda expression
            var func = lambdaExpression.Compile();

            // Cache the pre-compiled expression 
            PrecompiledExpressions.TryAdd(pattern, func);

            // Execute the compiled expression
            return func(item);
        }

        private static IEnumerable<string> ParsePattern(string pattern, out string replacedPattern)
        {
            // Just replace each named format items with regular format items
            // and put all named format items in a list. Then return the 
            // new format string and the list of the named items.

            var sb = new StringBuilder();
            var lastIndex = 0;
            var arguments = new List<string>();
            var lowerarguments = new List<string>();

            foreach (var @group in from Match m in RegexFormatArgs.Matches(pattern)
                                   select m.Groups[m.Groups[6].Success ? 5 : 2])
            {
                var key = @group.Value;
                var lkey = key.ToLowerInvariant();
                var index = lowerarguments.IndexOf(lkey);
                if (index < 0)
                {
                    index = lowerarguments.Count;
                    lowerarguments.Add(lkey);
                    arguments.Add(key);
                }

                sb.Append(pattern.Substring(lastIndex, @group.Index - lastIndex));
                sb.Append(index);

                lastIndex = @group.Index + @group.Length;
            }

            sb.Append(pattern.Substring(lastIndex));
            replacedPattern = sb.ToString();
            return arguments;
        }
    }
}

Fast, you say?

So how fast is this? This is as fast as a normal string.format method! At least after the first call it is. I created a simple test to measure the time needed:

var input = new
{
    Name = "Santa",
    Age = 1700
};

var sw = Stopwatch.StartNew();

var text = NamedFormat.Format("{Name} is {Age:0.0} years old.", input);

Console.WriteLine(sw.ElapsedMilliseconds);
sw = Stopwatch.StartNew();

for (var i = 0; i < 1000000; i++)
{
    text = NamedFormat.Format("{Name} is {Age:0.0} years old.", input);
}

Console.WriteLine(sw.ElapsedMilliseconds);
Console.WriteLine(text);

When I run this on my Surface Pro (Intel i5-3317U), I get a measurements between 18ms and 25ms for the first call and 800ms to 1200ms for the next 1 million calls together. This equals 0.0008ms to 0.0012ms per call. Now that’s cooking with gas 🙂

7 thoughts on “Fast Named Formats In C#

  1. Thanks, I am working with API and they provide queries in json strings with named fields like {id} etc.. Your extension class will make it nice and easy to plug data right in without any messing about.

    Like

  2. Hey man, it works great just one small issue when there is a field but no mapped property to match it. This line of code throws an exception:

    var argumentArrayElements = arguments.Select(argument => Expression.Convert(Expression.PropertyOrField(parameterExpression, argument), typeof(object)));

    Can you suggest way that instead of exception it replaces them with an empty string instead? This would make this function work as a nice templating system.

    Like

    • Good point, and good job 🙂 I guess you could use the name or hash code of the object’s type as a prefix to the cache key. For example
      var cacheKey = item.GetType().GetHashCode() + pattern;
      if (PrecompiledExpressions.TryGetValue(cacheKey , out o))

      Like

  3. Thanks! The cache is working again 🙂

    I must say it took me a while before I really understood how this code worked. I ran into an issue as the function would not work on dynamic objects due to the difference in property access. I had to do a fair bit of exploring MSDN and SO as this was my first time using Expression class and I had only ever used a Func maybe once before. Took me a fair bit of trial and error but I think I addressed the issue.

    ExpandoObjects implement IDictionary. I thought about making an overload but thought that would be confusing so I created conditional statement that casts the Expando to IDictionary so the property can be accessed through Item property in the same fashion.

    I am not sure what these casts are doing to the performance though as from what I gather it should be avoided when possible.

    I haven’t thoroughly tested it but it handled my templates I threw at it with various different object types OK. I updated the gist, would love to get your opinion if you have the time.

    Like

    • Guerrilla Coder, I think you did a great job with the code.

      I think that the conversion to IDictionary does not add any significant performance impacts. Dispite the name “Convert”, I think that the Convert expression is treated like a conversion operator in C#: it tells the compiler what type to use but does not perform any conversion during runtime. I didn’t find any proof for this on the internet, though. The dictionary lookup itself will probably add some measurable execution time, but that will still be very fast. In fact, I measured the runtime of the code for an ExpandoObject and a regular object with 1 million iterations each. Both measurements yielded almost exact timings (approx. 620ms for 1 million iterations on my machine). Link to the test: https://gist.github.com/mhusseini/bfd816ca566ee0bb88e8.

      Regarding whether or not to add a method overload: IMHO, there are some point that speak for adding an an overload instead of using if-blocks in the code:

      • Using if-blocks, the input object is treated differently whether it is a dictionary or not. This means that the caller of the method must have knowledge of the method’s content to determine the correct objects to specify. Ideally, a method should be a black box to the caller, specified solely via it’s declaration.
      • Using if-blocks, the different execution paths might be interweaved, which may result in hard-to-maintain code or in execution of unused code during runtime.

      But in the end, your API must fit your needs and if that is the case, then stick to whatever you preferr.

      There’s just on thing: when using dictionaries, you should change the calculation of the cache key to reflect the keys that are contained in the dictionary.

      var prefix = item is IDictionary
      ? string.Join("_", ((IDictionary)item).Keys).GetHashCode()
      : item.GetType().GetHashCode();

      var cacheKey = prefix + pattern;

      Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s