Saturday, April 7, 2012

"Least astonishment": the case of HashSet in .NET


A reason to wrap, replace, or adapt the primitives and objects provided by the platform.
This case is about substituting objects provided by the platform:
substituting them can help to avoid surprising behavior (the "principle of least astonishment").
The platform (the .NET Common Language Infrastructure) provides a set of strings declared as HashSet<string>.
This test compares two “identical” sets, but fails:


[Test]
public void EqualsSets()
{
      HashSet<string> firstSet = new HashSet<string>() { "aaa", "bbb" };
      HashSet<string> secondSet = new HashSet<string>() { "bbb", "aaa" };
      // WordSet is the wrapper class defined later in this post.
      Assert.AreEqual(new WordSet(firstSet), new WordSet(secondSet));
      Assert.AreEqual(firstSet, secondSet); // fails
}


However, when the two sets are initialized in the same order, it passes:
[Test]
public void EqualsSets()
{
      HashSet<string> firstSet = new HashSet<string>() { "aaa", "bbb" };
      HashSet<string> secondSet = new HashSet<string>() { "aaa", "bbb" };
      Assert.AreEqual(new WordSet(firstSet), new WordSet(secondSet));
      Assert.AreEqual(firstSet, secondSet); // passes
}


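The outcome of the equality check between two HashSet objects is affected by the order used to initialize them. HashSet<T> does not override Equals (it keeps reference equality), and, assuming the test framework is NUnit, Assert.AreEqual compares two collections element by element in enumeration order.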
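The difference between these kinds of comparison can be sketched in a few lines. This is only an illustration, not code from the original project: the class name SetComparisonSketch and the helper HaveSameWords are hypothetical, while Equals, SetEquals and SequenceEqual are the standard .NET members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SetComparisonSketch
{
    // Hypothetical helper: true when both sets contain exactly the same words,
    // regardless of the order in which they were added.
    public static bool HaveSameWords(HashSet<string> first, HashSet<string> second)
    {
        return first.SetEquals(second);
    }

    public static void Main()
    {
        var firstSet = new HashSet<string> { "aaa", "bbb" };
        var secondSet = new HashSet<string> { "bbb", "aaa" };

        // HashSet<T> does not override Equals, so this is a reference comparison.
        Console.WriteLine(firstSet.Equals(secondSet));        // False

        // SetEquals compares the contents and ignores ordering.
        Console.WriteLine(HaveSameWords(firstSet, secondSet)); // True

        // An element-by-element comparison depends on enumeration order,
        // which HashSet<T> does not guarantee, so no result is asserted here.
        Console.WriteLine(firstSet.SequenceEqual(secondSet));
    }
}
```

In a test, an assertion on firstSet.SetEquals(secondSet) would express the intent ("same members") without depending on enumeration order.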
For a similar reason, unexpected behavior occurs when using the expectations provided by a mocking framework (Rhino Mocks):


      // prepare
      HashSet<string> words = new HashSet<string>() { "oof", "foo", "xxx" };
      var fileWriter = mockRepository.StrictMock<IStreamWriter>();
      Expect.Call(delegate { fileWriter.WriteContent(new HashSet<string>() { "foo", "oof" }); });
      Expect.Call(delegate { fileWriter.WriteContent(new HashSet<string>() { "xxx" }); });
      mockRepository.ReplayAll();

      // act
      MainAnagrammer mainAnagrammer = new MainAnagrammer(words, fileWriter);
      mainAnagrammer.WriteGroupedByAnagrams();

      // assert
      fileWriter.VerifyAllExpectations();



(For the problem, see kata 6, on anagrams.)
The MainAnagrammer class writes the words, grouped by anagrams, to an output stream.
The program does the job correctly, printing “foo” and “oof” as a single group of anagrams, but the test fails for the same reason: the Equals implementation of HashSet does not compare the contents of the two sets.
A “workaround” is possible using a wrapper class that still represents a set of strings and appropriately redefines Equals (and, as a consequence, GetHashCode() as well):
// Requires: using System.Collections.Generic; using System.Linq;
public class WordSet
    {  
        private readonly HashSet<string> _wordList; 
        public HashSet<string> WordList
        {  
            get { return _wordList; }  
        }


        public WordSet(HashSet<string> wordList)
        {
            this._wordList = wordList;
        }


        public override bool Equals(object obj)
        {
            if (obj==null)
                return false;
            if (!(obj is WordSet))
                return false;  
            var wObj = (WordSet) obj;
            if (this._wordList.Count != wObj._wordList.Count)  
                return false;
            return _wordList.All(str => wObj._wordList.Contains(str));  
        }
  
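        public override int GetHashCode()
        {
            // Equal sets always have the same Count, so this is consistent
            // with Equals, although sets of the same size will collide.
            return this._wordList.Count;
        }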


    }
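A possible variant of the wrapper, shown here only as a sketch under stated assumptions, delegates the comparison to the built-in HashSet<T>.SetEquals and computes an order-independent hash code by XOR-ing the hash codes of the words. The names WordSetSketch and WordSetSketchDemo are hypothetical and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the wrapper: equality by content, not by reference.
public sealed class WordSetSketch : IEquatable<WordSetSketch>
{
    private readonly HashSet<string> _words;

    public WordSetSketch(IEnumerable<string> words)
    {
        // Copy the words so that later changes to the caller's collection
        // cannot alter the result of Equals or GetHashCode.
        _words = new HashSet<string>(words);
    }

    public bool Equals(WordSetSketch other)
    {
        // SetEquals ignores the order in which the words were added.
        return other != null && _words.SetEquals(other._words);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as WordSetSketch);
    }

    public override int GetHashCode()
    {
        // XOR is commutative, so the result does not depend on enumeration order.
        int hash = 0;
        foreach (string word in _words)
        {
            hash ^= (word == null) ? 0 : word.GetHashCode();
        }
        return hash;
    }
}

public static class WordSetSketchDemo
{
    public static void Main()
    {
        var first = new WordSetSketch(new[] { "aaa", "bbb" });
        var second = new WordSetSketch(new[] { "bbb", "aaa" });

        Console.WriteLine(first.Equals(second));                        // True
        Console.WriteLine(first.GetHashCode() == second.GetHashCode()); // True
    }
}
```

Compared with returning Count, the XOR-based hash spreads sets of the same size across different buckets, which matters only if the wrapper is used as a dictionary key or stored in another hash-based collection.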


  
So basically this case was for me another warning about the risk of sticking to primitive types or objects provided by the platform (primitive obsession).
