Out of the Box

Agile, Scrum, Tdd, Software Craftsmanship, System Thinking

Friday, March 21, 2014

Venn diagrams and Categorical Propositions in F#

Categorical propositions express specific relation between elements of different sets, in term of quantifier which can be Universal (like All, or None), or Particular (Some); and positive or negative (like "is" or "is not").

Examples: All Mammals are Animals, No Mineral is Alive, Some people are Smart, Some people are Not Smart.

The building blocks are the four basic form of categorical propositions, and are classified as Affirmative or Negative, and Universal or Particular, as follows:



They have representation based on Venn diagrams:



Venn diagrams in this case consist on two circles indicating two sets which are partially overlapped, so they form three regions. The left region indicates what is in S that is not in P, in the middle region what is in common between them, and on the right what is in P that is not in S.
To indicate that a region is empty we have to black-color it, and to indicate that in a region there must be something in it we have to put an asterisk in it. It is not possible to black-color and put an asterisk to a region at the same time because it will lead to a contradiction (like "All S is P" and "Some S is not P" which is impossible).

Symmetric basic form diagrams (E and O) represent proposition that are valid also on the reverse: so Some S is P means also Some P is S, and No S is P means also No S is P.

As long as we avoid placing asterisk and black-color in the same area (which leads to a contradiction) we can have more complex diagrams than the basic forms, which represent more proposition for example as follows:

This diagram means: Some S is not P, No S is P, No P is S, and Some P is not S.

I want to write some F# code solving the problem of translating any Venn diagram like the previous one, involving two sets, in the list of all the categorical propositions that follow from it.

I'll use records to represent either the diagram and either a categorical proposition.
To explain what I will get, this is a possible automated NUnit test related to the "A" proposition:

[<Test>]
    let ``A Venn diagram to proposition``()=
        let allSIsP = {S=BlackFilled; SP=Empty;P=Empty}
        let expectedProposition = [{quantifier1=All;category1=S;appartenence=Is;category2=P}]
        let actual = VennToPropositions allSIsP
        Assert.AreEqual(
expectedProposition,actual)




I need to define:
a type for any diagram involving two categories, which will be a record:

type twoSetsVennDiagram = {S: SectionStatus; SP: SectionStatus; P: SectionStatus}

A "SectionStatus" to set each of its region with the appropriate status:

type SectionStatus = Empty | BlackFilled | Starred

A "Propositions" type (again, as a record):

type twoTermsProposition = {quantifier1: Quantifier; category1: CategoryRef; appartenence: Appartenence; category2: CategoryRef}

Quantifier will be:

type Quantifier = All | Somes | No

("Some" is an F# keyword, so to avoid confusion I chose "Somes")

Appartenence will be:

type Appartenence = Is | IsNot

CategoryRef will simply be a possible name of the set involved:

type CategoryRef = S | M | P 

("S", "M", "P" are typical names used to model categorical propositions and syllogisms  - which I'm not going to write about this time)


The implementation aimed just to pass this test is like:

let VennToPropositions diagram = [{quantifier1=All;category1=S;appartenence=Is;category2=P}]

I expect that I need a sort of lookup table for basic diagrams (with only one non empty region, having BlackFilled or Starred state), and that I can one step a time add an element to such lookup on the base of adding a new test against the case:

       


let VennToPropositions diagram =

    match diagram with 

        | {S=Empty; SP=Empty;P=Empty} -> []

        | {S=BlackFilled; SP=Empty;P=Empty} ->  [{quantifier1=All;category1=S;appartenence=Is;category2=P}]

        | {S=Empty; SP=Empty;P=BlackFilled} ->  [{quantifier1=All;category1=P;appartenence=Is;category2=S}]

        | {S=Empty; SP=BlackFilled;P=Empty} ->  [{quantifier1=No;category1=S;appartenence=Is;category2=P};{quantifier1=No;category1=P;appartenence=Is;category2=S}]

        | {S=Empty; SP=Starred;P=Empty} -> [{quantifier1=Somes;category1=S;appartenence=Is;category2=P};{quantifier1=Somes;category1=P;appartenence=Is;category2=S}]

        | {S=Starred; SP=Empty;P=Empty} -> [{quantifier1=Somes;category1=S;appartenence=IsNot;category2=P}]

        | {S=Empty; SP=Empty;P=Starred} -> [{quantifier1=Somes;category1=P;appartenence=IsNot;category2=S}]






       
 
Basically the cases that are considered in this "lookup" table are the four basic forms, their "reverse", and the completely empty one, and the reverse of the ones that are not symmetrical:







But the match against more complex diagrams is missing, as the following:


I can see that this diagram is the union of the following basic forms:





















The resulting proposition is the union of them.

(Extreme case:  when all the region are filled.
The result may looks weird: "All M is P - No M is P - No P is M  - All P is M", but it is not a contradiction because it applies when M and P are both empty: the "All" quantifier does not commit to existence, but the "Some" does).

The related test to a function that decomposes it could be as follows:

[<Test>]
    let ``complex decomposition``()=
         let SomeSIsNoP = {S=Starred; SP=BlackFilled;P=Starred}
         let actual = basicCategoricalDecomposition SomeSIsNoP
         let expected = [{S=Starred;SP=Empty;P=Empty};{S=Empty;SP=BlackFilled;P=Empty};{S=Empty;SP=Empty;P=Starred}]
         Assert.AreEqual(expected,actual)



So, any diagram that is more complex than the six forms, can still be decomposed in basic forms, then they can be matched returning the basic propositions, and then all the propositions can be joined together.

The basicCategoricalDecomposition function is:


       



let basicCategoricalDecomposition diagram =

        match diagram with

            | {S=s_pattern; SP=Empty;P=Empty}  when s_pattern <> Empty -> [{S=s_pattern; SP=Empty;P=Empty}] // A or O

            | {S=Empty; SP=sp_pattern;P=Empty} when sp_pattern <> Empty -> [{S=Empty; SP=sp_pattern;P=Empty}]    // E or I  

            | {S=Empty; SP=Empty;P=p_pattern} when p_pattern <> Empty -> [{S=Empty; SP=Empty;P=p_pattern}]

            | {S=s_pattern; SP=sp_pattern;P=Empty} -> [{S=s_pattern; SP=Empty; P=Empty};{S=Empty;SP=sp_pattern;P=Empty}]

            | {S=Empty; SP=sp_pattern;P=p_pattern} -> [{S=Empty; SP=sp_pattern; P=Empty};{S=Empty;SP=Empty;P=p_pattern}]

            | {S=s_pattern; SP=Empty;P=p_pattern} -> [{S=s_pattern; SP=Empty; P=Empty};{S=Empty;SP=Empty;P=p_pattern}]

            | {S=s_pattern; SP=sp_pattern;P=p_pattern} -> [{S=s_pattern;SP=Empty;P=Empty};{S=Empty;SP=sp_pattern;P=Empty};{S=Empty;SP=Empty;P=p_pattern}]

            | _ -> []


       
 



So far if the VennToProposition matches a basic diagram it works as a lookup table.
If I pass to it a more complex diagram, then any match fail, and so I add a default matching rule that just calls the above basicCategoricalDecomposition diagram and make a recursive call to get the union of all the propositions.

I rewrite here the complete VennToProposition function:




       


let rec VennToPropositions diagram =

    match diagram with 

        | {S=Empty; SP=Empty;P=Empty} -> []

        | {S=BlackFilled; SP=Empty;P=Empty} ->  [{quantifier1=All;category1=S;appartenence=Is;category2=P}]

        | {S=Empty; SP=Empty;P=BlackFilled} ->  [{quantifier1=All;category1=P;appartenence=Is;category2=S}]

        | {S=Empty; SP=BlackFilled;P=Empty} ->  [{quantifier1=No;category1=S;appartenence=Is;category2=P};{quantifier1=No;category1=P;appartenence=Is;category2=S}]

        | {S=Empty; SP=Starred;P=Empty} -> [{quantifier1=Somes;category1=S;appartenence=Is;category2=P};{quantifier1=Somes;category1=P;appartenence=Is;category2=S}]

        | {S=Starred; SP=Empty;P=Empty} -> [{quantifier1=Somes;category1=S;appartenence=IsNot;category2=P}]

        | {S=Empty; SP=Empty;P=Starred} -> [{quantifier1=Somes;category1=P;appartenence=IsNot;category2=S}]

        | _ -> List.fold (fun acc item -> acc @ (VennToPropositions item) ) [] (basicCategoricalDecomposition diagram)



      
 



This last line is the one which decompose in basic forms and then call recursively the VennToPropositions.

So far, any valid two sets Venn diagram correspond to a set of one or more basic categorical propositions consistent each other.
Using the Fsharp Repl it is possible to test immediately the code, or it is possible to write new tests.
If needed, the sources are available on gitHub.

The page 184 of "Understanding Arguments" provides problems that can be solved using this "VennToPropositions" function (but of course they are not supposed to be solved with an automated tool).

Possible extension can be about evaluating categorical syllogisms, i.e. particular inferences involving two categorical propositions and three sets, but it can be something for another post.










Tuesday, March 11, 2014

kMeans in F#

This post is about the  KMeans , and a simple implementation in F#.

A quick definition of the problem I can give is:
given a set of points, and an integer n, divide the set in n partitions in a way that minimise the "within-cluster sum of squares (WCSS)".

By literature, the algorithm is fast but suboptimal, i.e. it converges to a local minimum.
It takes an initial guess of centroids, then calculate the clusters associated to each centroid guess, and then recalculate a new centroid of that clusters, and so... on until there is no difference between the set of clusters and the ones of the next iteration.

You should run it different times, and with different n, and then take the best result (according to the objective function minimising the "within cluster sum of squares").

For simplicity, I assumed only bidimensional points, but real problems are more complex and so they have more features that need to be represented by more dimensions.
Anyway this is the definition of a point:

type Point = {X: float; Y: float}

I describe here functions useful for the final algorithm.

Measure of a distance between two points:

let quadraticDistance point1 point2 = 
    Math.Pow((point1.X-point2.X),2.0)+ Math.Pow((point1.Y-point2.Y),2.0)


Any point has to be associated to a cluster represented by a centroid, when that centroids is the closest one:

let closestCentroid point centroids  =
    let precondition = match centroids with | [] -> failwith "Centroid list should be not empty" | _ -> true
    List.fold (fun acc item -> if ((quadraticDistance item point) < (quadraticDistance acc point)) then item else acc) (List.head centroids) centroids


(never mind the "precondition" check: it just creates an exception with a description, avoiding having a more obscure error of calling List.head on an empty list)

Given a segment, I need to find its real centroid:

let centroid segment = 
    let checkPrecondition = match segment with |[] -> failwith "precondition error, list of points expected to be not empty" | _ -> true
    let Xs = List.sumBy (fun item -> item.X) segment
    let Ys = List.sumBy (fun  item -> item.Y) segment
    let centroid = {X=(Xs/(float (List.length segment)));Y=(Ys/(float (List.length segment)))}
    centroid


(Again: the "precondition" make sure that if there is an error condition from empty segment we get the specific message, instead of a "division by zero")

Getting centroids for more segments all in once is easy by using the map function:

let centroids segments =
  List.map (fun x -> centroid x) segments



Now I have almost all the elements to build up the algorithm, and so I can try to implement it, and this time I use mutable types in the loop (so that I "reuse" the same variables, instead of virtually creating new ones each iteration).

I keep up to date the centroids relative to the current iteration and the relative cluster, and the centroids relative to the next iteration. If the cluster do not change, then we stop. Looks like list comparison works as long as they are reordered.


let iterateFindingFinalCentroids points initialCentroids =
    let mutable currentCentroids = initialCentroids
    let mutable segments = segmentPointsByClosestCentroid points currentCentroids
    let mutable nextCentroids = centroids segments 
    let mutable nextSegments = segmentPointsByClosestCentroid points nextCentroids
    while ((List.sort nextSegments) <> (List.sort segments)) do
        currentCentroids 
<- nextCentroids
        segments 
<- nextSegments
        nextCentroids 
<- centroids nextSegments
        nextSegments 
<- segmentPointsByClosestCentroid points nextCentroids
    nextCentroids



One missing part here is a function that separate all the points in different clusters on the base of their proximity to the given centroids.

I found convenient doing it in two steps. Tagging each point with its closest centroid, and then aggregating the points tagged with the same centroid.

A tagged point type can be useful, and perhaps better than a simple pair, because is more expressive:

type TaggedPoints = {Point: Point; AssignedCentroid: Point}


let tagPointsWithClosestCentroid centroids segment =
    List.fold (fun acc item -> {Point=item; AssignedCentroid = (closestCentroid item centroids)}::acc) [] segment


To put the points in different clusters:

let segmentPointsByClosestCentroid points centroids =
    let taggedPoints = tagPointsWithClosestCentroid centroids points
    List.fold (fun acc item -> (List.filter (fun p -> p.AssignedCentroid = item) taggedPoints |> List.map (fun taggedPoint -> taggedPoint.Point))::acc) [] centroids
   

(A next step could be making the tagPointsWithClosestCentroid internal to this last one).

For description of functions used here, see the msdn (for example this is the entry about how to use List.fold: http://msdn.microsoft.com/en-us/library/ee353894.aspx).

I played a little bit with artificial data: I generated artificially four clusters given by bidimensional Gaussian, so expecting that the centroid found will be the central/average (or just 'μ') of that distributions, and actually this is what happened (except when I included a subdle "bug" in the algorithm).

For that experiment I wrote some code in unit tests.

The Normal distributions have the 'μ'  (their centers) equals to (1,1), (1,8), (8,8), (8,1) respectively and variances the (σ^2)'s all equals to 2.

The algorithm actually converge to values close to the 'μ' of the random normal distributions used to generate the points.

I'll show some code for generating cluster from Normal distributions. This array contains the four sources of random data:

 let rnd = System.Random()
            let generators = [|
              {XGauss=new Normal(1.0,2.0);YGauss= new Normal(1.0,2.0)};   
              {XGauss=new Normal(
1.0,2.0);YGauss= new Normal(8.0,2.0)};
              {XGauss=new Normal(8.0,2.0);YGauss= new Normal(8.0,2.0)};
              {XGauss=new Normal(8.0,2.0);YGauss= new Normal(1.0,2.0)}
            |]



To experiment if the algorithm is affected by choosing different initial points the strategy is to take four generators:

            let sampleFirsts = [
                {X=generators.[0].XGauss.Sample();Y=generators.[0].YGauss.Sample()};
                {X=generators.[1].XGauss.Sample();Y=generators.[1].YGauss.Sample()};
                {X=generators.[2].XGauss.Sample();Y=generators.[2].YGauss.Sample()};
                {X=generators.[3].XGauss.Sample();Y=generators.[3].YGauss.Sample()}
            ]
              

I can change the configuration of the sources simply changing their index. For example in the following case the initial centroids are taken all from the first source:

            let sampleFirsts = [
                {X=generators.[0].XGauss.Sample();Y=generators.[0].YGauss.Sample()};
                {X=generators.[0].XGauss.Sample();Y=generators.[0].YGauss.Sample()};
                {X=generators.[0].XGauss.Sample();Y=generators.[0].YGauss.Sample()};
                {X=generators.[0].XGauss.Sample();Y=generators.[0].YGauss.Sample()}


To sample on thousand of points from the different sources (and the first four according to the idea of having initial centroids, so using the "SampleFirsts"):

            let sample = sampleFirsts @ (List.map (fun x -> {X=x.XGauss.Sample();Y=x.YGauss.Sample()}) (List.map (fun x -> generators.[x % 4 ]) [0..995]))


            let initialCentroids = [sample.[0]; sample.[1]; sample.[2];sample.[3]]

            let finalCentroids = iterateFindingFinalCentroids sample initialCentroids



Actually the code returns the centroids (and not the segments), and neither the "variability" (dispersion) of the segments, but they can be calculated later:

            let segment0 = List.fold (fun acc item -> if closestCentroid item finalCentroids = finalCentroids.[0then item::acc else acc) [] sample
            let segment1 = List.fold (fun acc item -> if closestCentroid item finalCentroids = finalCentroids.[1then item::acc else acc) [] sample
            let segment2 = List.fold (fun acc item -> if closestCentroid item finalCentroids = finalCentroids.[2then item::acc else acc) [] sample
            let segment3 = List.fold (fun acc item -> if closestCentroid item finalCentroids = finalCentroids.[3then item::acc else acc) [] sample

            let quadraticDist0 = sumOfQuadraticDistances finalCentroids.[0] segment0
            let quadraticDist1 = sumOfQuadraticDistances finalCentroids.[1] segment1
            let quadraticDist2 = sumOfQuadraticDistances finalCentroids.[2] segment2
            let quadraticDist3 = sumOfQuadraticDistances finalCentroids.[3] segment3




The result is that the algorithm is not affected by the different combinations of sources for the initial centroids (that correspond to the first four points of the sample), still converging to the four expected points, i.e. the center of the generators of the Normal (Guassian) bidimensional distributions.


Sunday, February 16, 2014

Animal Quiz problem in F#

One of my favourite "toy" coding problem is the Animalquiz. I made an implementation in F#, a language that I am interested to now.

My version is implemented as an old fashion console application, developed with unit tests (NUnit). Mock objects (with RhinoMock framework) are used to test from the input/output interaction point of view. Such kind of tests tells something like "expect that when some request for the input will return some specific value that we prepare in advance, and then if the computation is correct, we should see some specific value in output. 

In this post I will remind what the problem is, I'll give some snips of code with links to the sources, and other references from the msdn knowledge base.
I'll not cover advanced functional programming topics like monads (and actually I don't know if they could help to improve the design in this case; I have to investigate).

The project sources can be viewed and downloaded from bitbucket (see "download" link down-right). The downloaded zip archive contains the source,  a solution file produced by Xamarin, probably compatible with Visual Studio as well, and a .vim file containing some macro that I found useful to edit build and test the project from vim as well. External libraries like nunit and RhinoMock must be downloaded separately, and the references in the solution file may need to be fixed by your Ide (xamarin or Visual Studio), or even by editing the solution file (which is xml). I tried also xbuild to produce the binaries (instead of the Xamarin Ide) with mono as target.  Probably msbuild with .net clr will work as well.

I developed and tested it only under Mono 3.2.6 and F# 3.0 (for Mono) on a Mac laptop.
I used Xamarin, and sometimes also Vim.

The problem:
the user is invited to think about an animal, and the program will try to guess that animal, making some questions (like "is it a mammal?", "is it a fish?" etc...) to reduce the range of plausible guesses.
Questions and names of animals are stored in a  binary tree. Leaf nodes contains the name of the animals, and non leaf nodes contain the questions.
Example: if a non leaf node contains the question "is it big?", then big animals must be found only under the yesBranch subtree, and the others animals only under the noBranch subtree (we may have troubles related to the fuzziness of term like big or small, but who cares, from now:  we can just have some redundancy in knowledge because of this issue).

Example: only one leaf node is stored in the knowledge tree. In this case the engine will just guess you are thinking about an elephant, without making any question:



In the following example there is a two leaf nodes knowledge tree, separated by the question "is it big?". The engine will start asking if it is big, and then, according to the yes or no answer, it will guess that the thought animal it is Elephant or a Mouse




If the program fails its guess because the user was thinking about some other animal, then the program has the opportunity to increase the knowledge base,  asking her/him to suggest a new question to distinguish the guessed animal from the thought animal and the relative yes/no  answer. This is enough for the system to adapt the knowledge tree including this new leaf node reachable by the new question.

Example:
Suppose that the animal the user is thinking of is "ant": the program will ask if it is big, the user will answer no, and then the program will guess that the thought animal is a mouse.

The user will answer "no" and then so the program asks her/him to suggest as the new question to distinguish a mouse from an ant, and then she/he suggests something like "is it an insect?" and the relative answer: "yes". The program will so adapt the knowledge tree accordingly, that so will become as in the following picture:




Popular web sites like "akinator" probably work in a similar way (but they have more possible answers than only yes/no  - "probably yes", "probably not", ...-   an already huge knowledge base, and they are mede to guess real people - mostly famous - rather than animals name. Anyway the concept is almost the same.)

For a normal user, it may looks surprising how is it possible that a relatively low number of questions can lead to a fair guess, but the fact that few questions are enough to go deep in a huge database can be explained by mathematical properties of binaries tree data structure, which relates the number of nodes with the number of steps using a logarithmic law:
if the tree is "perfectly balanced", 10 steps are needed to guess an animal among 1024 possibilities, and if the animals are doubled (2048), then the steps needed are just one more: 11, and for 4096 leaf nodes the steps needed are 11. If the tree is not perfectly balanced the things are just a little bit worst. Anyway a logarithmic relation should still hold in the average case (as long as the user contribute adding nodes in a way that makes sense)

I defined the KnowledgeTree type as follows:

type KnowledgeTree = AnimalName of string | SubTree of Tree
and Tree = {Question: string; yesBranch: KnowledgeTree; noBranch: KnowledgeTree}


witch means that a Knowledge tree is an animal, or a question and two knowledge trees named yesBranch and noBranch (containing other knowledge trees).

An example of interaction with the program, that leads to the construction of the last tree show in the previous picture, starting from the first one, is as follows:

>think about an animal
[enter]
>is it a elephant?
no
>what animal was?
mouse
>please, write a yes/no question to distinguish a cat from a elephant
is it big?
>what is the answer to the question "is it big?" to distinguish a mouse from a elephant?
no
>ok

>think about an animal
[enter]
>is it big?
no
>is it a mouse?
no
>what animal was?
ant
>please, write a yes/no question to distinguish a ant from a cat
is it an insect?
>what is the answer to the question "is it an insect?" to distinguish a ant from a cat?
yes
>ok

>think about an animal
[enter]
>is it big?
no
>is it a mouse?
no
>is it a ant?
...


I used a pretty standard technique to to test input & output interaction: making the main just call another function (or method) passing to it instances of console based read/write interfaces (interfaces in term of O.O. programming).


Concretely:

This is the main:

[<EntryPoint>]
    let main argv =

        let myOutputStream = new ConsoleOutput() :> OutStream
        let myInputStream = new ConsoleInput() :> InStream

        let tree = AnimalName "elephant"
        runForever myOutputStream myInputStream tree Welcome

(source)

(the term Stream is used for input and output console, even if in functional programming it has also another meaning that I am not talking about here).

The OutStream and InStream interfaces are defined as:

type OutStream =
    abstract Write: string -> unit

type InStream =
    abstract Input: unit -> string



Their "Console" implementation are:

type ConsoleOutput() =
    interface OutStream with
       member this.Write(x) = printf "%s\n" x

type ConsoleInput() =
    interface InStream with
        member this.Input() = Console.ReadLine()


(source)

(The syntax  ":> OutStream" used by the main is needed, and actually makes an explicit upcast to the OutStream interface,. This upcast is required, which looks quite unusual from a Java/C# point of view, but is safe as long as you are upcasting only to an interface that you know is implemented. Here In msdn is discussed the topic).

The following call

 runForever myOutputStream myInputStream tree Welcome

consists in an infinite loop on standard input and output starting from the Welcome state, that I will show later.

There is an almost identical function
 runUntilIndex myOutputStream myInputStream tree Welcome index


that does the same work, with the same, just two, lines of code (yes duplicated).

As you may guess, this one is used by the tests, that cannot run an infinite loop. Passing the initial state explicitly is useful for the tests to be more granular (example: a test assume we are in the state Welcome  and another one assume we are in another state.
Actually runForever should not accept the state because it has to start always with Welcome. The parameter is there only to show the similarity with "runUntilIndex".

I have chosen to define states by the following type:

type state = | Welcome | InviteToThinkAboutAnAnimal |  GuessingFromCurrentNode |AskWhatAnimalWas | ExpectingDiscriminatingQuestion | AnsweringDiscriminatingQuestion


(source)

When the program starts, the state is Welcome, and after that it goes to InviteToThinkAboutAnAnimal, and so on... the current state, the type of node we are in, and the user message determine the next state and the answer given by the program. This is regulated by "pattern matching".

The static analysis of a pattern matching, reveals incomplete pattern matching at edit or compile time, which means that if we try to mach one of such states, then we get a warning if some of the possible states is missing from the match cases (because either it means you don't need that state, and so you should remove it from the type, or either you need that type, but then it means that you should match it and if you don't, it is probably a bug).

Testing:

as said, some tests that make expectation about what should be written to the console if the user type some message, benefit from stubbing/mocking the input and the output, and by the possibility of defining a state (which will be encapsulated in a "playStructure" exchanged at each "user machine interaction"). Such test can then "tell", by tests, stories like: "When I am in this state, and I write this in input, then I expect that the program will write that in output".

(I have been a little bit creative in using an alternative to the classic given when then/arrange act assert structure of tests, particularly substituting the then - or assert - with a "verify expectations", because there are no assertion, but a verification that the expected call of the mock actually happened - and, in the case of the input stream, returning the specified values)

Some examples:

    [<Test>]
         let ``in the initial welcome state, should invite to think about an animal``() =
          // setup
          let mockrepo = new MockRepository()
          let inStream = mockrepo.DynamicMock()
          let outStream = mockrepo.StrictMock()

          // expectations
          Expect.Call(outStream.Write("think about an animal")) |> ignore
          mockrepo.ReplayAll()

          // arrange
          let tree = AnimalName "elephant"
          let initialState = Welcome
          let numLoops = 0

          // act
          runUntilIndex outStream inStream tree initialState numLoops |> ignore

          // verify expectations
          mockrepo.VerifyAll()


(I used DynamicMock instead of StrictMock for the input because in this case we don't need any special input from the user. See this on StackOverflow about Dynamic vs Strict mock - in RhinoMock).


The following test describes when the use has in mind the only one leaf node present on the knowledge tree:


 [<Test>]
    let ``with a one leaf node knowledge tree, engine asks if it is the animal on that node, the user says yes, and so the engine answer with "yeah!"``() =
        // setup 
        let mockrepo = new MockRepository()
        let inStream = mockrepo.StrictMock()

        let outStream = mockrepo.StrictMock()

        // expectations
        Expect.Call(outStream.Write("think about an animal")) |> ignore
        Expect.Call(inStream.Input()).Return("whatever") |> ignore

        Expect.Call(outStream.Write("is it a cat?")) |> ignore     
        Expect.Call(inStream.Input()).Return("yes") |> ignore

        Expect.Call(outStream.Write("yeah!")) |> ignore 
        Expect.Call(inStream.Input()).Return("anything") |> ignore
          
        mockrepo.ReplayAll()

        // arrange
        let tree = AnimalName "cat"
        let initState = Welcome
        let numOfLoops = 3

        // act
        runUntilIndex outStream inStream tree initState numOfLoops |> ignore

        // verify expectations
        mockrepo.VerifyAll()



I have some longer interaction tests I am not showing here. Those test are more fragile, but have the advantage that can explain "longer" interaction stories (i.e. I start with a node, then the structure becomes of two nodes, and then of three nodes and so on...)

About tests closer to the "real domain" I decided to test the change of the state and the message exchanged and some of the value of a record used to pass values:

    [<Test>]
     let ``at the initial state, Welcome, the user receive the message "please, think about an animal", and state becomes 'invitethinkaboutanimal ``()=
        // arrange
        
        let playStart = messageAndState "" Welcome (AnimalName "monkey")

        // act
        let result = consoleInteract playStart

        // assert
        Assert.AreEqual(InviteToThinkAboutAnAnimal,  result.currentState)
        Assert.AreEqual("think about an animal", result.messageFromEngine)


messageAndState will just build a more complex record defined as follow:

type playingStructure = {messageFromEngine: string; 
                                mutable messageFromPlayer: string option; 
                                currentState: state; 
                                animalToBeLearned: string; 
                                rootTree: AnimalKnowledge; 
                                currentNode: AnimalKnowledge; 
                                yesNoList: string list; 
                                newDiscriminatingQuestion : string option}




The last part I am showing is the pattern matching related to the "state change" and message exchange between the player and the engine.

The consoleInteract function matches the states and the user messages, and returns a new "playingStructure" with the new state:

let consoleInteract playStructure =
    let currentAnimal = match playStructure.currentNode with | AnimalName name -> name | _ -> "ERROR!"
    let messageFromPlayer = match playStructure.messageFromPlayer with | Some X -> X | None -> ""

    match playStructure.currentState with 
        | InviteToThinkAboutAnAnimal -> initState playStructure
        | GuessingFromCurrentNode ->   (
            match playStructure.messageFromPlayer with
          | Some "yes" -> (
            match playStructure.currentNode with
                            | AnimalName _ ->     sayYeah playStructure 
                            | SubTree subTree  -> yesSubTreeNavigation playStructure subTree

                            )
          | Some "no" -> (
            match playStructure.currentNode with 
                            | AnimalName _ -> askWhatAnimalWas playStructure
                            | SubTree subTree -> noSubTreeNavigation playStructure subTree
                          )
          | _ -> askToEnterYesOrNot playStructure 
          )
        | Welcome -> welcomeMessage playStructure
        | AskWhatAnimalWas ->  askDiscriminatingQuestion playStructure messageFromPlayer currentAnimal
        | ExpectingDiscriminatingQuestion -> askAnswerToDiscriminatingQuestion playStructure messageFromPlayer currentAnimal
        | AnsweringDiscriminatingQuestion ->  match playStructure.messageFromPlayer with
            | (Some "yes"| Some "no") -> completeLearning playStructure
            | _ -> haveToAnswerOnlyYesOrNot playStructure



The "messagefromplayer" contains a "option type" probably unnecessary.

Let me just show, to conclude, the code that make the program adapt the binary tree with a new question and a new animal:

let rec learn playStructure = 
    let currentAnimal  = 
        match playStructure.currentNode with 
            | AnimalName name -> name 
            | _ -> failwith "ERROR, current node structure should be a leaf"

    let question tree  =
        match tree with
            | AnimalName name -> "is it a "+name+"?"
            | SubTree {Question=question} -> question 

    let noBranch tree = 
         match tree with
            |SubTree X -> X.noBranch
            | _ -> failwith "consider the consistency of the yes no list"

    let yesBranch tree = 
         match tree with
            |SubTree X -> X.yesBranch
            | _ -> failwith "consider the consistency of the yes no list"

    let thisIsTheQuestion = 
        match playStructure.newDiscriminatingQuestion with 
            | Some X -> X 
            | None -> failwith "discriminating question cannot be empty"

    match playStructure.yesNoList with
     | [] -> match playStructure.messageFromPlayer with
          | Some "yes" -> SubTree { Question=thisIsTheQuestion; 
                                    yesBranch = AnimalName playStructure.animalToBeLearned; 
                                    noBranch = AnimalName currentAnimal}
          | Some "no" -> SubTree {  Question=thisIsTheQuestion;
                                    yesBranch = AnimalName currentAnimal;
                                    noBranch = AnimalName playStructure.animalToBeLearned}


          | _ -> failwith "called learn when user interaction is different from yes or no"

     | "yes"::T -> SubTree {Question = question playStructure.rootTree; 
                            yesBranch = learn (substituteYesNoList playStructure T);     
                            noBranch= noBranch playStructure.rootTree
                           }
     | "no"::T -> SubTree {Question = question playStructure.rootTree; 
                           yesBranch = yesBranch playStructure.rootTree; 
                           noBranch = learn  (substituteYesNoList playStructure T)
                           }


As you may see, many "exceptional states" are explicitly raised in case of conditions that should not happen. We have virtually "preconditions" that assume, for example, that the list of the history of answer the user has given are only "yes" and "no".


I made external call to other function that contains some long list of values for the playStructure.
I didn't show everything of the code, and I will probably work more on it.

Conclusion:
Rhino can be used quite fairly with F#. I tried also Foq, but, more than a more fancy syntax, I didn't find it so lovable.

By using F# I learn how recursion, lists, mattern matching, records, tuples, type inference work. I don't think we have to use O.O. and other clr .net related features, that will be useful anyway later to create full web or desktop applications.

I will take a look to mvc for F# later, and will come back to this topic.

Future directions: how to manage the same domain when there are more users that perhaps need to modify the tree at the same time. I think I'll manage a "event sources"-like solution.

Thanks for reading, sorry for typos, if any.