Field access versus local variable access performance

Recently, I was at a customer site and they were comparing the performance of a function that was written in unmanaged C with its C# equivalent. As they expected, the C code was performing much faster than the C# code. However, I didn’t expect this. Once managed code is JIT compiled, the code is very similar to that of unmanaged code. The code was being called in a loop in both versios numerous times so the additional time spent by the JIT compiler should be noise and I expect the performance of the C# code to be near identical and even better than the unmanaged C version because the JIT compiler could emit optimizations for the speciifc CPU architecture of the host machine.

I was very interested in this problem and I closely examined both functions. Not just the source code, but I also very closely examined the machine code generated by the unmanaged C compiler as well as the machine code generated by the .NET Framework’s JIT compiler. Not surprising (to me), the machine code was pratically identical. I did see a place where the C compiler emitted an INCrement instruction where th JIT compiler emitted an ADD instruction to add 1. I assume that this is because the JIT compiler knew the host CPU and determined that an ADD instruction would be faster than an INCrement instruction. This would actually make the managed code version faster.

I also noticed a place where the unmanaged version was updating a register directly where the managed version was updating a indirect memory location. This was because the unmanaged source code was updating a local variable while the C# version was turned into an object-oriented equivalent and as updating a field in the class object. This difference alse seemed minor to me and could not possibly accounting for the performance discrepency we were seeing. However, we continued to examine the code and in an effort to get the emitted machine code to be identical between the 2 as possible, we got rid of the field in the object and turned it into a local variable instead. WOW! What a difference this made in performance!

Apparently, the JIT compiler is able to enregister local variables but it cannot enregister fields. This makes sense since a field can actually be manipulated by multiple threads simultaneously or via other means behind a method’s back and therefore, the JIT compiler takes the safe route of not enregisterring the local and always saves and fetches its value from memory. On the other hand, a local variable cannot be modified behind a method’s back and therefor the JIT compiler can put it in a register.

I always knew that memory access is slower than CPU register access but I guess I nenver really knew how big a difference it could be. On the machine we were using to test this, the differenece was about 8 times. That is, the method with the field ran 8 times slower than the equivalent method with the local variable! On my notebook computer, the difference was much less: the local variable version ran in ~25 seconds while the field version ran in ~31 seconds. Still a pretty big difference.

So, the moral of the story, is that you should avoid field access as much as possible in performance-sensitive code. It might even make sense to copy a field into a loca, use the local in the method and then copy the result back from the local into the field just before the method exits. Below is some simple code (that requires Whidbey because I’m using the Stopwatch class) that you can compile and test on your own machine to see the perf difference:

using System;
using System.Diagnostics;

class App {
static void Main() {
const Int64 iter = 5000000000;

Stopwatch sw = Stopwatch.StartNew();
TestLocalAccess(iter);
Console.WriteLine(“time taken:{0} ticks”, sw.Elapsed);

sw = Stopwatch.StartNew();
TestFieldAccess(iter);
Console.WriteLine(“time taken:{0} ticks”, sw.Elapsed);
}

private static Int64 j;

public static void TestLocalAccess(Int64 numIncrement) {
Int64 j=0;
for(Int64 i=0; i<numIncrement; i++) j++;
}

public static void TestFieldAccess(Int64 numIncrement) {

for(Int64 i=0; i<numIncrement; i++) j++;
}
}