Sometimes it’s easy to see why your .NET server application is using so much memory, but other times it makes no sense at all. I was at Microsoft earlier this week and someone who’d taken my debugging class stopped me and asked an excellent question. The scenario they had was their service memory would just grow at a steady rate without ever going down. The team found a fix for the memory leak through savvy internet searching but found it frustrating they could not see the answer through SOS and SOSEX. The person gave me the dumps as they wanted to know how they could have found the issue quicker. The research was pretty interesting so I thought I’d share the results.

The first command you always run after loading SOS is !dumpheap -stat so you can get a picture of the overall memory usage. On the dumps the team gave me, the end of the output looked very similar to the following:

  0000000000386ce0        9         7328      Free
  000007fee1d26ac8      464        18720 System.String
  000007fee1d2afd0       34       183560 System.Object[]
  000007fee1d478a0    11098       355136 System.WeakReference

In the real mini dump those WeakReferences were taking up over 270MB! The weak can kill you in the .NET world.

Whenever I see a WeakReference, I know I'm looking at some form of cache. It's a special class that lets you reference an object while still allowing that object to be garbage collected. So we know someone's caching something, but who is doing the caching?
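To make the WeakReference behavior concrete, here is a minimal sketch. The class and method names (WeakReferenceDemo, IsAliveWhileStronglyHeld) are made up for illustration. It only shows the deterministic half of the story: while a strong reference exists, the weak reference still sees its target, even across a forced collection.

```csharp
using System;

public static class WeakReferenceDemo
{
    // Returns true if the weak reference still sees its target
    // while a strong reference to the same object is held.
    public static bool IsAliveWhileStronglyHeld()
    {
        object payload = new object();          // strong reference
        var weak = new WeakReference(payload);  // does not keep payload alive by itself

        GC.Collect();                           // force a full collection
        bool alive = weak.IsAlive;              // payload is still strongly referenced here

        GC.KeepAlive(payload);                  // keep the strong reference live up to this point
        return alive;
    }
}
```

Once the last strong reference is gone, a later collection is free to reclaim the target and Target starts returning null. The WeakReference wrapper itself is an ordinary object, though, and it stays on the heap for as long as something (such as a list) still references it.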

Running !dumpheap -type System.Weak yields the following output:

  0000000002603f88 000007fee1d478a0       32
  0000000002603fa8 000007fee1d478a0       32
  0000000002603fc8 000007fee1d478a0       32
  total 0 objects
  Statistics:
                MT    Count    TotalSize Class Name
  000007fee1d4f320        1           40 System.Collections.Generic.List`1
                                         [[System.WeakReference, mscorlib]]
  000007fee1d478a0    11098       355136 System.WeakReference
  Total 11099 objects

Yep, that List<WeakReference> is probably the issue. So it's time to see who is holding on to it by running !gcroot on its address.

  0:004> !gcroot 00000000025ad460
  Note: Roots found on stacks may be false positives. Run "!help gcroot" for
  more info.
  Scan Thread 0 OSTHread b3c
  Scan Thread 2 OSTHread 3dc
  DOMAIN(0000000000399A60):HANDLE(Pinned):4c17d8:Root:  00000000125a7048(System.Object[])->
    00000000025ad460(System.Collections.Generic.List`1[[System.WeakReference, mscorlib]])
<p>Life just got miserable. .NET stores static fields in an Object array for each app domain, and that array is pinned in memory, so the pinned handle is the clue that a static field is holding the list. Sadly, with the .NET 4 SOS there is no way to see which object has the List&lt;WeakReference&gt; as a field without manually dumping each object in the heap. Back in the .NET 1.1 days there was a way to pretty easily figure out the holding class, but Microsoft changed the implementation so it no longer works.</p>
<p>Fortunately, there is a way to figure out those static fields. All it takes is a little investment in the Professional edition of the amazing <a href="">.NET Memory Profiler</a>. Always purchase the Professional edition because that's the version with the advanced feature to open mini dumps. Opening large mini dumps can take a long time as .NET Memory Profiler has to build up the reference chains and other data. However, I'm more than happy to let .NET Memory Profiler take its time because doing all of that work manually in SOS would consume months and make me give up technology.</p>
<p>After opening the mini dump of my sample program, which took about 60 seconds, I typed List into the Overview tab to narrow down to the List&lt;WeakReference&gt;.</p>
<p><a href=""><img style="border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;border-top:0px;border-right:0px;padding-top:0px" title="image" border="0" alt="image" src="" width="668" height="202"></a></p>
<p>Double-clicking the List&lt;WeakReference&gt; entry takes you to the Type details tab and shows you exactly who owns that pesky static.</p>
<p><a href=""><img style="border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;border-top:0px;border-right:0px;padding-top:0px" title="image" border="0" alt="image" src="" width="678" height="194"></a></p>
<p>It's a <a href="">TraceSource</a>, so we have the culprit! In fact, looking at the type instance graph shows the whole reference chain.</p>
<p><a href=""><img style="border-bottom:0px;border-left:0px;padding-left:0px;padding-right:0px;border-top:0px;border-right:0px;padding-top:0px" title="image" border="0" alt="image" src="" width="211" height="285"></a></p>
<p>Looking at the TraceSource constructor in Reflector shows exactly where the WeakReference is created.</p>
  public TraceSource(string name, SourceLevels defaultLevel)
  {
      if (name == null)
      {
          throw new ArgumentNullException("name");
      }
      if (name.Length == 0)
      {
          throw new ArgumentException("name");
      }
      this.sourceName = name;
      this.switchLevel = defaultLevel;
      lock (tracesources)
      {
          _pruneCachedTraceSources();
          // Every new instance registers itself in the static list.
          tracesources.Add(new WeakReference(this));
      }
  }

</div> </div> <p>The _pruneCachedTraceSources method is interesting and shows exactly why those WeakReferences are all stuck in Gen 2.</p>

  private static void _pruneCachedTraceSources()
  {
      lock (tracesources)
      {
          if (s_LastCollectionCount != GC.CollectionCount(2))
          {
              List<WeakReference> collection = new List<WeakReference>(tracesources.Count);
              for (int i = 0; i < tracesources.Count; i++)
              {
                  if (((TraceSource)tracesources[i].Target) != null)
                  {
                      collection.Add(tracesources[i]);
                  }
              }
              if (collection.Count < tracesources.Count)
              {
                  tracesources.Clear();
                  tracesources.AddRange(collection);
                  tracesources.TrimExcess();
              }
              s_LastCollectionCount = GC.CollectionCount(2);
          }
      }
  }
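
The pattern in the two listings above can be modeled in a few lines. This is a simplified sketch and not the framework code: TrackedSource, s_instances, and TrackedCount are hypothetical names, and RemoveAll stands in for the copy-and-swap the real method does. What it keeps is the shape: every constructor call registers a WeakReference in a static list, and dead entries are only dropped when the Gen 2 collection count has changed since the last prune.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for TraceSource; not the framework class.
public sealed class TrackedSource
{
    private static readonly List<WeakReference> s_instances = new List<WeakReference>();
    private static int s_lastGen2Count;

    public TrackedSource()
    {
        lock (s_instances)
        {
            Prune();
            // Each new instance adds one more wrapper to the static list.
            s_instances.Add(new WeakReference(this));
        }
    }

    // Number of wrappers currently held by the static list.
    public static int TrackedCount
    {
        get { lock (s_instances) { return s_instances.Count; } }
    }

    // Caller must hold the lock on s_instances.
    private static void Prune()
    {
        int gen2 = GC.CollectionCount(2);
        if (s_lastGen2Count == gen2)
        {
            return; // no Gen 2 collection since the last prune, so nothing is removed
        }

        // Drop wrappers whose targets have already been collected.
        s_instances.RemoveAll(w => w.Target == null);
        s_lastGen2Count = gen2;
    }
}
```

If a caller allocates these objects at a high rate, the list keeps growing between Gen 2 collections, and the wrappers that survive long enough end up promoted along with the list that references them.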

</div> </div> <p>Basically, the cache is only pruned when a new TraceSource is created or a call to Trace.Refresh is made, and even then only if a Gen 2 collection has occurred since the last prune. In the Microsoft code they were mistakenly allocating a new TraceSource and <a href="">TraceSwitch</a>, which also has the WeakReference list, every time a connection came in. This happened because they converted what had been a singleton static object into something they allocated on each call. That meant lots of TraceSource and TraceSwitch allocations, with this magic underneath causing big memory usage. You've probably guessed by now that you should always make your TraceSource and TraceSwitch fields static so you hold only the one instance and avoid this potential memory issue.</p> <p>Note that I'm not saying the implementation of TraceSource or TraceSwitch is wrong, as it gives you the ability to refresh all your tracing in one call instead of making you manage all the individual instances yourself. That's a nice feature of the tracing system in .NET. The implementation just doesn't expect that you'll be allocating hundreds of thousands of them.</p> <p>While I would love SOS to have a better way to track down these static field problems, at least we do have a solution with .NET Memory Profiler.</p>
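
As a rough illustration of the advice about static fields, here is a minimal sketch. The ConnectionHandler class, its members, and the source name "MyService.Connections" are all hypothetical and not taken from the team's code. The only point is that the TraceSource is created once per process instead of once per connection.

```csharp
using System.Diagnostics;

// Hypothetical connection handler; all names are illustrative only.
public sealed class ConnectionHandler
{
    // One shared instance for the lifetime of the process, so the static
    // WeakReference list inside the tracing system gets a single entry.
    private static readonly TraceSource s_trace =
        new TraceSource("MyService.Connections", SourceLevels.Information);

    // Exposed only so callers and tests can see that the instance is shared.
    public static TraceSource Source
    {
        get { return s_trace; }
    }

    public void HandleConnection(int connectionId)
    {
        // The problem pattern would be "new TraceSource(...)" right here,
        // which registers another WeakReference on every connection.
        s_trace.TraceInformation("Connection {0} accepted", connectionId);
    }
}
```

The same idea applies to TraceSwitch: hold it in a static field and reuse it rather than constructing one per call.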