当前位置:网站首页>Detailed explanation of dictionary source code in C #

Detailed explanation of dictionary source code in C #

2022-06-21 13:43:00 Guo Mahua

Dictionary

The last article introduced HashTable Implementation principle of ,Dictionary Very different .

  1. Dictionary Use zipper method to solve hash conflict ,HashTable Use Double Hash.
  2. Dictionary It's a generic type , For value types and reference types key,Hash The comparator is different .
  3. Dictionary Again resize New random may be used Hash The comparator .

Constructors  

Dictionary Two very important arrays are maintained internally , They are the basis for the realization of the zipper method .

private int[]? _buckets;
private Entry[]? _entries;

         private struct Entry
        {
            public uint hashCode;         
            public int next;
            public TKey key;    
            public TValue value; 
        }

initialization  Dictionary You can specify one capacity, But the length of the two arrays is the smallest prime number greater than this value . This is to reduce the probability of hash collision , And HashTable equally , The default is 3.

Zipper method Operating mechanism

The common zipper pictures on the Internet are as follows , In fact, this diagram will bring misunderstanding to friends who have not understood the principle .

First , The following figure shows all the elements on the linked list , It's actually all stored in Entry[]? _entries, In this array .

Observe Entry Structure The implementation of the , It can be found that it uses next Fields to represent hash The next node address subscript on the linked list .

jdk1.8 The previous internal structure

 int[]? _buckets Bucket array , Express Hash The beginning of the list . according to Key It's worth it hashcode To determine which bucket to allocate .

It should be noted that , stay Dictionary In the implementation ,_buckets The values saved in the array are from 1 At the beginning , If the chain header node is located in _entries Subscript to be 0, Then the value saved in this bucket is index+1 = 1.

in addition , When a new node is added , Is added to Hash First place of chain , And modify the subscript address pointed to by the bucket .

  in addition ,Dictionary In the implementation _freeCount、_freeList、StartOfFreeList The three fields are also very interesting .

At first I thought it meant the remaining capacity of the dictionary , It turned out _freeCount Only in Remove Add... After operation , That's what I understand , Originally, it represents the released space capacity , and _freeList Indicates the last free space address .

actually ,  Dictionary In a very clever way , Maintains a free space address chain , and _freeList Is the subscript to the address of the last release .

When adding a new element , Will preferentially select from the released addresses , according to _freeList After the address represented is added ,_freeList   Will automatically return to the last free address . Note that “ the previous ”, Give priority to the recently released address .

See the following source code for specific implementation .

Remove Realization principle

actually , Through to Hashtable and Dictionary Two object source code learning , I found that only the combination Remove and Add Methods to see , To understand the author's design ideas .

And HashTable equally , Let's start with the relatively simple Remove Method , How is it realized :

public bool Remove(TKey key)
        {            
            if (_buckets != null)
            {
                Debug.Assert(_entries != null, "entries should be non-null");
                uint collisionCount = 0;
                uint hashCode = (uint)(_comparer?.GetHashCode(key) ?? key.GetHashCode());

                //  GetBucket(uint hashCode) => buckets[hashCode % (uint)buckets.Length];
                ref int bucket = ref GetBucket(hashCode);
                Entry[]? entries = _entries;
                 
                int last = -1;

                // i Used to represent the element in entries  Position in , guess buckets The minimum valid values stored in the array are 1.
                int i = bucket - 1; 
                
                // i<0 Subscript indicating the current bucket storage bucket by 0, Describe what to remove key non-existent ,return false.
                while (i >= 0)
                {
                    ref Entry entry = ref entries[i];
                    //  If the element is found 
                    if (entry.hashCode == hashCode && (_comparer?.Equals(entry.key, key) ?? EqualityComparer<TKey>.Default.Equals(entry.key, key)))
                    {

                        if (last < 0)
                        {
                            // last<0 Indicates that the element to be removed is the first element in the hash chain ,
                            //  Therefore, you need to update the index value stored in the bucket to   The position of the second element  + 1;
                            //  This should also prove what we mentioned above ,buckets The minimum valid values stored in the array are 1
                            bucket = entry.next + 1; // Value in buckets is 1-based
                        }
                        else
                        {
                            // last>=0 Indicates that the element to be removed is an element at the middle or end of the hash chain ,
                            //  Therefore, there is no need to update the bucket address value , Just add the last element's next The node is set to the... Of the element to be removed next The node can be 
                            entries[last].next = entry.next;
                        }

                        // Why should I set... For elements that have been removed next?? This step is very important , See below 
                        entry.next = StartOfFreeList - _freeList;

                        if (RuntimeHelpers.IsReferenceOrContainsReferences<TKey>())
                        {
                            entry.key = default!;
                        }

                        if (RuntimeHelpers.IsReferenceOrContainsReferences<TValue>())
                        {
                            entry.value = default!;
                        }
                        // The free pointer points to i, Indicates that the address is free and available 
                        _freeList = i;
                        _freeCount++;
                        return true;
                    }
                    
                    //  I didn't go into the one above if, That there is hash Chain but no corresponding key,
                    //  The pointer moves down one bit , Keep looking for , When the end is found ,i Will be equal to the -1
                    last = i;
                    i = entry.next;

                    collisionCount++;
                    if (collisionCount > (uint)entries.Length)
                    {                      ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported();
                    }
                }
            }
            return false;
        }

Add Realization principle

According to the above , Combined with the source code discovery :

Assign the new element to _freeList Address to , then _freeList It points to StartOfFreeList - entries[_freeList].next Why? ?

Here we need to combine remove In terms of operation , There is a line of code that I can't understand :

  entry.next = StartOfFreeList - _freeList;

Of the space key-value Although it has been released , But it's next The pointer remains , And points to StartOfFreeList - _freeList Why? ?

Notice the two expressions I've bold !!

If now _freeList = 1, And then remove The address dropped is 2 The elements of . So address 2 Situated next The pointer points to StartOfFreeList( Constant for the -3 The constant ) - _freeList, That is to say -4. Next we'll _freeList Adjust to the most recently released address , namely 2.

then Add Operation to add a new element , Of course, it is allocated to 2 Address . So at this time

entries[_freeList].next How many ?-4!

StartOfFreeList - entries[_freeList].next, How many ? 1!

after _freeList It points to StartOfFreeList - entries[_freeList].next,  It should be a few ?1!

Equal to say ,Dictionary The released free addresses are strung into a chain ! When allocating new elements , A free address will be taken from the end of the chain , then _freeList Automatically move to the last free address , Become a new chain tail !!!

Think of it here. , I finally got it , Why is the idle pointer called _freeList, Because it is really a chain !

        private bool TryInsert(TKey key, TValue value, InsertionBehavior behavior)
        {
            ...
       
            uint hashCode = (uint)((comparer == null) ? key.GetHashCode() : comparer.GetHashCode(key));

            // Collision frequency 
            uint collisionCount = 0;

            // according to hashCode The barrel in which the calculation is made , Note that ref
            //  GetBucket(uint hashCode) => buckets[hashCode % (uint)buckets.Length];
            ref int bucket = ref GetBucket(hashCode);
            // here  i Why bucket - 1 Well ?
            int i = bucket - 1;
            
            //  here if...else Medium  while  The main purpose is to detect whether there is a ring in the hash chain ,
            //  If so update, Will update the element directly and return 
            //  If you add the same key The elements of , It throws an exception 
            if (comparer == null)
            {
                // Is not specified comparer  Words ,Dictionary Will use the default comparator 
                // There are value types key  and   Reference type Key Two cases , The code is highly similar to the following , Look straight down ↓
                if (typeof(TKey).IsValueType)
                { ... }
                else
                { ... }
            }
            else
            {
                while (true)
                {
                    //  Be careful : Here is when adding elements , The only way out of the loop , Very skillfully will i Into the uint.
                    //  The dictionary is added for the first time after initialization ,bucket  It must be 0,i This is the case -1.
                    //  Because the sign bit is the first bit of binary , A negative number is 1, So turn to uint after , It would be a very large integer .
                    if ((uint)i >= (uint)entries.Length)
                    {
                        break;
                    }

                    //Key Equal to the current element , If yes, an exception will be thrown , If yes, change the value and return
                    if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key))
                    {
                        if (behavior == InsertionBehavior.OverwriteExisting)
                        {
                            entries[i].value = value;
                            return true;
                        }

                        if (behavior == InsertionBehavior.ThrowOnExisting)
                        {
                            ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException(key);
                        }

                        return false;
                    }

                    i = entries[i].next;

                    collisionCount++;
                    if (collisionCount > (uint)entries.Length)
                    {                        ThrowHelper.ThrowInvalidOperationException_ConcurrentOperationsNotSupported();
                    }
                }
            }
            
            //  Be careful : Although the dictionary capacity after initialization is 3, But this time _freeCount The value is 0, and _freeList Then for -1
            //  After my analysis :_freeCount Express the classics remove Number of spaces freed ,
            // _freeList It means remove Free space address .
            //  combination remove You'll find that , The design here is very clever .
            int index;
            if (_freeCount > 0)
            {
                index = _freeList;
                Debug.Assert((StartOfFreeList - entries[_freeList].next) >= -1, "shouldn't overflow because `next` cannot underflow");
                _freeList = StartOfFreeList - entries[_freeList].next;
                _freeCount--;
            }
            else
            {
                int count = _count;
                if (count == entries.Length)
                {
                    Resize();
                    bucket = ref GetBucket(hashCode);
                }
                index = count;
                _count = count + 1;
                entries = _entries;
            }
            
            // bucket  Is used to record the starting address of the hash chain ,
            //  from entry.next = bucket - 1; The operation shows that the hash chain will add a new element to the header ,
            //  So the last of the hash chain next The value pointed to by the pointer must be -1
            //  and bucket = index + 1;  explain bucket  The minimum valid value in is 1
            ref Entry entry = ref entries![index];
            entry.hashCode = hashCode;
            entry.next = bucket - 1; 
            entry.key = key;
            entry.value = value;
            bucket = index + 1; 
            _version++;

            if (!typeof(TKey).IsValueType && collisionCount > HashHelpers.HashCollisionThreshold && comparer is NonRandomizedStringEqualityComparer)
            {
                Resize(entries.Length, true);
            }

            return true;
        }

Capacity expansion

Dictionary Capacity expansion and HashTable The expansion of is similar to , Are based on Current capacity ×2 Take the minimum prime number greater than this value .

however ,Dictionary Whether to force refresh during capacity expansion HashCode The option to ( For reference types only Key Work ), If forceNewHashCodes by True, A new one will be randomly generated IEqualityComparer, Used to generate new HashCode.

Indexer lookup

Indexer lookup and Remove Find something similar to , No more details here .

原网站

版权声明
本文为[Guo Mahua]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/02/202202221433591859.html