Out of the Box

Agile, Scrum, Tdd, Software Craftsmanship, System Thinking

Saturday, September 24, 2016

Adding simple gtk# gui in a console F# application

In this post I am quickly talking about how I added a simple gtk# interface to a program I've done some times ago as an exercise in F#.
(Animal Quiz kata: http://tonyxzt.blogspot.it/2014/02/animal-quiz-problem-in-f.html)

The first version was command line based, and was developed with extensive use of unit tests and interaction tests (using Rhino framework).

That version is still incorporated in this new version (git: https://github.com/tonyx/fsharpanimalquizgtk) except that it is possible to run in a gui gtk# wideget adding the paremeter "gui" to the command line.

For instance, under mono, I will run it as
> mono fsharpAnimalQuizKata.exe gui

and the following window will appear:

The interaction with the user is still text based, as in the console based version: the user dialogues with the "engine" using the textbox and the label of the textbox, instead of the standard input/standard output.

Now the knowledge tree can be visualised, hidden, expanded by using the three buttons available.

The knowledge base starts from as single node with the animal "elephant" in it.

An example of interaction to make the engine add a new node with a "cat" is:

After this dialogue, the knowledge tree can be expanded and we can see the following:

Now the loop starts again, and in a new dialogue we can feed the engine to make it store a new node "ant", which is small and is an insect, so expanding the knowledge tree we can see the following:

Essentially the logic of using gtk# in F# is the same as using it in C#.
I've found that translating existing C# based tutorials of gtk# in F# requires:
- declaring as mutable any objects that will be reassigned.
- when event handling in C# has the form:

// adding

// adding handler
btn.Clicked += on_btn_clicked;

// declaration of the handler
void on_btn_clicked (object obj, EventArgs args) {
// method body

An equivalent in F# is 

// adding handler
do btn.Clicked.AddHandler(fun i e -> this.on_btn_clicked(i,e))

// declaration of the handler
member this.on_btn_clicked(i,e: EventArgs) =
// method body

Using Monodevelop/Xamarin I can create an empty F# gtk# template project that starts with an empty window. In my experience starting from that, and consulting any C# gtk#  tutorial to add buttons, menus etc... is good enough to play with gtk# gui.

(I'm sure they it should work also in .net and Windows, but I have not tried yet.)

To "wrap" a knowledge tree to a gtk# TreeStore view I used this function

let rec treeToTreeStore tree (storeTreeStore) iter  prefix =
    match tree with 
        | AnimalName name -> 
             do store.AppendValues(iter,[|prefix + name|]) |> ignore 
        | SubTree {Question=questYesBranch=yBranchNoBranch=nBranch } -> 
             let innerIter = store.AppendValues(iter,[|prefix + quest|])
             do treeToTreeStore yBranch store innerIter "YES: " 
             do treeToTreeStore nBranch store innerIter "NO:  "

Just as a reminder. A knowledge tree type is defined as follows:

type KnowledgeTree = AnimalName of string | SubTree of Tree
and Tree = {QuestionstringYesBranchKnowledgeTreeNoBranchKnowledgeTree}

which means that a knowledge tree is a leaf node constituted by a string (the name of the animal) or is a yes/no question and two sub trees (one leading to the animals for whom the answer is "yes" and another to the animals for whom the answer is "no").

I found reasonable to make extensive use of automated testing of the console based version by mocking the standard input and standard output, and avoiding adding tests related to the gui.

Mocking standard output and standard input requires only making those kind of abstractions

type OutStream =
    abstract Writestring -> unit

type InStream =
    abstract Inputunit -> string
that will be implemented in the real application as follows:

type ConsoleOutput() =
    interface OutStream with
       member this.Write(x) = printf "%s\n" x

type ConsoleInput() =
    interface InStream with
        member this.Input() = Console.ReadLine()

whereas mocks can be used to simulate input/output interaction (by simulating that the user write some input values, and adding expectations about values that the engine will write)

    let ``with a one leaf node knowledge tree, engine asks if it is the animal on that node, the user says yes, and so the engine answer with "yeah!"``() =
        // setup 
        let mockrepo = new MockRepository()
        let inStream = mockrepo.StrictMock<InStream>()

        let outStream = mockrepo.StrictMock<OutStream>()

        // setup expectations
        Expect.Call(outStream.Write("think about an animal")) |> ignore
        Expect.Call(inStream.Input()).Return("whatever") |> ignore

        Expect.Call(outStream.Write("is it a cat?")) |> ignore     
        Expect.Call(inStream.Input()).Return("yes") |> ignore

        Expect.Call(outStream.Write("yeah!")) |> ignore 
        Expect.Call(inStream.Input()).Return("anything") |> ignore

        // arrange
        let tree = AnimalName "cat"
        let initState = Welcome
        let numOfLoops = 3

        // act
        runUntilIndex outStream inStream tree initState numOfLoops |> ignore

        // verify expectations

I guess it would be more complicated to make a similar mechanism of mocking the gui as well, with less value, because the gui is just a thin layer upon the engine which is already extensively tested.

Special mention to Visual Studio Code and  Ionide as editor with some autocompletion features that are missing in Monodevelop/Xamarin.

Future work could be related to
-saving/loading trees, and
-editing tree nodes in the tree view.


Friday, March 21, 2014

Venn diagrams and Categorical Propositions in F#

Categorical propositions express specific relations between elements of different sets, in term of quantifier which can be Universal (like All, or None), or Particular (Some); and positive or negative ("is" or "is not").

Examples: All Mammals are Animals, No Mineral is Alive, Some people are Smart, Some people are Not Smart.
In this post I am going to show some code in F# I wrote, representing categorical propositions, and particularly, related to the union of them, with an internal structure that reflects the representation based on Venn diagrams.
This can be used to verify in an automated way the compatibility between different propositions, in the same way we use simple rules by Venn diagrams.
Example: "All S is P" and "Some S is not P" are not compatible. Venn diagrams show in this case that the two diagrams can't be overlapped.
The area of the diagrams can only be blank, can have a '*' or can be blackfilled, but we can't have '*' (meaning: there is something here), and a black area (meaning: there is nothing here), at the same time.

The building blocks are the four basic form of categorical propositions, and are classified as Affirmative or Negative, and Universal or Particular, as follows:

They have representation based on Venn diagrams:

Venn diagrams in this case consist on two circles indicating two sets which are partially overlapped, so they form three regions. The left region indicates what is in S that is not in P, in the middle region what is in common between them, and on the right what is in P that is not in S.
To indicate that a region is empty we have to black-color it, and to indicate that in a region there must be something in it we have to put an asterisk in it. It is not possible to black-color and put an asterisk to a region at the same time because it will lead to a contradiction (like "All S is P" and "Some S is not P" which is impossible).

Symmetric basic form diagrams (E and O) represent proposition that are valid also on the reverse: so Some S is P means also Some P is S, and No S is P means also No S is P.

As long as we avoid placing asterisk and black-color in the same area (which leads to a contradiction) we can have more complex and valid (consistent) diagrams than the basic forms, which represent more proposition for example as follows:

This diagram means: Some S is not P, No S is P, No P is S, and Some P is not S.

I want to write some F# code solving the problem of translating any Venn diagram like the previous one, involving two sets, in the list of all the categorical propositions that follow from it.

I'll use records to represent either the diagram and either a categorical proposition.
To explain what I will get, this is a possible automated NUnit test related to the "A" proposition:

    let ``A Venn diagram to proposition``()=
        let allSIsP = {S=BlackFilled; SP=Empty;P=Empty}
        let expectedProposition = [{quantifier1=All;category1=S;appartenence=Is;category2=P}]
        let actual = VennToPropositions allSIsP

I need to define:
a type for any diagram involving two categories, which will be a record:

type twoSetsVennDiagram = {S: SectionStatus; SP: SectionStatus; P: SectionStatus}

A "SectionStatus" to set each of its region with the appropriate status:

type SectionStatus = Empty | BlackFilled | Starred

A "Propositions" type (again, as a record):

type twoTermsProposition = {quantifier1: Quantifier; category1: CategoryRef; appartenence: Appartenence; category2: CategoryRef}

Quantifier will be:

type Quantifier = All | Somes | No

("Some" is an F# keyword, so to avoid confusion I chose "Somes")

Appartenence (i.e. "belonging") will be:

type Appartenence = Is | IsNot

CategoryRef will simply be a possible name of the set involved:

type CategoryRef = S | M | P 

("S", "M", "P" are typical names used to model categorical propositions and syllogisms  - which I'm not going to write about this time, so they are just "typical names" given to sets involved in categorical forms. For instance in the classical syllogism "Socrates is human, all humans are mortal, Socrates is Mortal, S is Socrates (or "everything that is Socrate"), M is "everything that is human" and P is "everything that is mortal").

The implementation aimed just to pass this test is like:

let VennToPropositions diagram = [{quantifier1=All;category1=S;appartenence=Is;category2=P}]

I expect that I need a sort of lookup table for basic diagrams (with only one non empty region, having BlackFilled or Starred state), and that I can one step a time add an element to such lookup on the base of adding a new test against the case:


let VennToPropositions diagram =

    match diagram with 

        | {S=Empty; SP=Empty;P=Empty} -> []

        | {S=BlackFilled; SP=Empty;P=Empty} ->  [{quantifier1=All;category1=S;appartenence=Is;category2=P}]

        | {S=Empty; SP=Empty;P=BlackFilled} ->  [{quantifier1=All;category1=P;appartenence=Is;category2=S}]

        | {S=Empty; SP=BlackFilled;P=Empty} ->  [{quantifier1=No;category1=S;appartenence=Is;category2=P};{quantifier1=No;category1=P;appartenence=Is;category2=S}]

        | {S=Empty; SP=Starred;P=Empty} -> [{quantifier1=Somes;category1=S;appartenence=Is;category2=P};{quantifier1=Somes;category1=P;appartenence=Is;category2=S}]

        | {S=Starred; SP=Empty;P=Empty} -> [{quantifier1=Somes;category1=S;appartenence=IsNot;category2=P}]

        | {S=Empty; SP=Empty;P=Starred} -> [{quantifier1=Somes;category1=P;appartenence=IsNot;category2=S}]

Basically the cases that are considered in this "lookup" table are the four basic forms, their "reverse", and the completely empty one, and the reverse of the ones that are not symmetrical:

But the match against more complex diagrams is missing, as the following:

I can see that this diagram is the union of the following basic forms:

The resulting proposition is the union of them.

(Extreme case:  when all the region are filled.
The result may looks weird: "All M is P - No M is P - No P is M  - All P is M", but it can be sound anyway, because it applies when M and P are both empty: the "All" quantifier does not commit to existence, but the "Some" does).

The related test to a function that decomposes it could be as follows:

    let ``complex decomposition``()=
         let SomeSIsNoP = {S=Starred; SP=BlackFilled;P=Starred}
         let actual = basicCategoricalDecomposition SomeSIsNoP
         let expected = [{S=Starred;SP=Empty;P=Empty};{S=Empty;SP=BlackFilled;P=Empty};{S=Empty;SP=Empty;P=Starred}]

So, any diagram that is more complex than those six forms, can still be decomposed in basic forms, then they can be matched returning the basic propositions, and then all the propositions can be joined together.

The basicCategoricalDecomposition function is:


let basicCategoricalDecomposition diagram =

        match diagram with

            | {S=s_pattern; SP=Empty;P=Empty}  when s_pattern <> Empty -> [{S=s_pattern; SP=Empty;P=Empty}] // A or O

            | {S=Empty; SP=sp_pattern;P=Empty} when sp_pattern <> Empty -> [{S=Empty; SP=sp_pattern;P=Empty}]    // E or I  

            | {S=Empty; SP=Empty;P=p_pattern} when p_pattern <> Empty -> [{S=Empty; SP=Empty;P=p_pattern}]

            | {S=s_pattern; SP=sp_pattern;P=Empty} -> [{S=s_pattern; SP=Empty; P=Empty};{S=Empty;SP=sp_pattern;P=Empty}]

            | {S=Empty; SP=sp_pattern;P=p_pattern} -> [{S=Empty; SP=sp_pattern; P=Empty};{S=Empty;SP=Empty;P=p_pattern}]

            | {S=s_pattern; SP=Empty;P=p_pattern} -> [{S=s_pattern; SP=Empty; P=Empty};{S=Empty;SP=Empty;P=p_pattern}]

            | {S=s_pattern; SP=sp_pattern;P=p_pattern} -> [{S=s_pattern;SP=Empty;P=Empty};{S=Empty;SP=sp_pattern;P=Empty};{S=Empty;SP=Empty;P=p_pattern}]

            | _ -> []


So far if the VennToProposition matches a basic diagram it works as a lookup table.
If I pass is is a more complex diagram, then any match fails, and so I add a default matching rule that just calls the above basicCategoricalDecomposition diagram by a recursive call to get the union of all the propositions.

I rewrite here the complete VennToProposition function:


let rec VennToPropositions diagram =

    match diagram with 

        | {S=Empty; SP=Empty;P=Empty} -> []

        | {S=BlackFilled; SP=Empty;P=Empty} ->  [{quantifier1=All;category1=S;appartenence=Is;category2=P}]

        | {S=Empty; SP=Empty;P=BlackFilled} ->  [{quantifier1=All;category1=P;appartenence=Is;category2=S}]

        | {S=Empty; SP=BlackFilled;P=Empty} ->  [{quantifier1=No;category1=S;appartenence=Is;category2=P};{quantifier1=No;category1=P;appartenence=Is;category2=S}]

        | {S=Empty; SP=Starred;P=Empty} -> [{quantifier1=Somes;category1=S;appartenence=Is;category2=P};{quantifier1=Somes;category1=P;appartenence=Is;category2=S}]

        | {S=Starred; SP=Empty;P=Empty} -> [{quantifier1=Somes;category1=S;appartenence=IsNot;category2=P}]

        | {S=Empty; SP=Empty;P=Starred} -> [{quantifier1=Somes;category1=P;appartenence=IsNot;category2=S}]

        | _ -> List.fold (fun acc item -> acc @ (VennToPropositions item) ) [] (basicCategoricalDecomposition diagram)


This last line is the one which decomposes in basic forms and then call recursively the VennToPropositions.

So far, any valid two sets Venn diagram corresponds to a set of one or more basic categorical propositions consistent each other.
Using the Fsharp Repl it is possible to test immediately the code, or it is possible to write new tests.
If needed, the sources are available on gitHub.

The page 184 of "Understanding Arguments" provides problems that can be solved using this "VennToPropositions" function (but of course they are not supposed to be solved with an automated tool).

Possible extension can be about evaluating categorical syllogisms, i.e. particular inferences involving two categorical propositions and three sets, but it can be something for another post.

Tuesday, March 11, 2014

kMeans in F#

This post is about the  KMeans , and a simple implementation in F#.

A quick definition of the problem I can give is:
given a set of points, and an integer n, divide the set in n partitions in a way that minimise the "within-cluster sum of squares (WCSS)".

By literature, the algorithm is fast but suboptimal, i.e. it converges to a local minimum.
It takes an initial guess of centroids, then calculate the clusters associated to each centroid guess, and then recalculate a new centroid of that clusters, and so... on until there is no difference between the set of clusters and the ones of the next iteration.

You should run it different times, and with different n, and then take the best result (according to the objective function minimising the "within cluster sum of squares").

For simplicity, I assumed only bidimensional points, but real problems are more complex and so they have more features that need to be represented by more dimensions.
Anyway this is the definition of a point:

type Point = {X: float; Y: float}

I describe here functions useful for the final algorithm.

Measure of a distance between two points:

let quadraticDistance point1 point2 = 
    Math.Pow((point1.X-point2.X),2.0)+ Math.Pow((point1.Y-point2.Y),2.0)

Any point has to be associated to a cluster represented by a centroid, when that centroids is the closest one:

let closestCentroid point centroids  =
    let precondition = match centroids with | [] -> failwith "Centroid list should be not empty" | _ -> true
    List.fold (fun acc item -> if ((quadraticDistance item point) < (quadraticDistance acc point)) then item else acc) (List.head centroids) centroids

(never mind the "precondition" check: it just creates an exception with a description, avoiding having a more obscure error of calling List.head on an empty list)

Given a segment, I need to find its real centroid:

let centroid segment = 
    let checkPrecondition = match segment with |[] -> failwith "precondition error, list of points expected to be not empty" | _ -> true
    let Xs = List.sumBy (fun item -> item.X) segment
    let Ys = List.sumBy (fun  item -> item.Y) segment
    let centroid = {X=(Xs/(float (List.length segment)));Y=(Ys/(float (List.length segment)))}

(Again: the "precondition" make sure that if there is an error condition from empty segment we get the specific message, instead of a "division by zero")

Getting centroids for more segments all in once is easy by using the map function:

let centroids segments =
  List.map (fun x -> centroid x) segments

Now I have almost all the elements to build up the algorithm, and so I can try to implement it, and this time I use mutable types in the loop (so that I "reuse" the same variables, instead of virtually creating new ones each iteration).

I keep up to date the centroids relative to the current iteration and the relative cluster, and the centroids relative to the next iteration. If the cluster do not change, then we stop. Looks like list comparison works as long as they are reordered.

let iterateFindingFinalCentroids points initialCentroids =
    let mutable currentCentroids = initialCentroids
    let mutable segments = segmentPointsByClosestCentroid points currentCentroids
    let mutable nextCentroids = centroids segments 
    let mutable nextSegments = segmentPointsByClosestCentroid points nextCentroids
    while ((List.sort nextSegments) <> (List.sort segments)) do
<- nextCentroids
<- nextSegments
<- centroids nextSegments
<- segmentPointsByClosestCentroid points nextCentroids

One missing part here is a function that separate all the points in different clusters on the base of their proximity to the given centroids.

I found convenient doing it in two steps. Tagging each point with its closest centroid, and then aggregating the points tagged with the same centroid.

A tagged point type can be useful, and perhaps better than a simple pair, because is more expressive:

type TaggedPoints = {Point: Point; AssignedCentroid: Point}

let tagPointsWithClosestCentroid centroids segment =
    List.fold (fun acc item -> {Point=item; AssignedCentroid = (closestCentroid item centroids)}::acc) [] segment

To put the points in different clusters:

let segmentPointsByClosestCentroid points centroids =
    let taggedPoints = tagPointsWithClosestCentroid centroids points
    List.fold (fun acc item -> (List.filter (fun p -> p.AssignedCentroid = item) taggedPoints |> List.map (fun taggedPoint -> taggedPoint.Point))::acc) [] centroids

(A next step could be making the tagPointsWithClosestCentroid internal to this last one).

For description of functions used here, see the msdn (for example this is the entry about how to use List.fold: http://msdn.microsoft.com/en-us/library/ee353894.aspx).

I played a little bit with artificial data: I generated artificially four clusters given by bidimensional Gaussian, so expecting that the centroid found will be the central/average (or just 'μ') of that distributions, and actually this is what happened (except when I included a subdle "bug" in the algorithm).

For that experiment I wrote some code in unit tests.

The Normal distributions have the 'μ'  (their centers) equals to (1,1), (1,8), (8,8), (8,1) respectively and variances the (σ^2)'s all equals to 2.

The algorithm actually converge to values close to the 'μ' of the random normal distributions used to generate the points.

I'll show some code for generating cluster from Normal distributions. This array contains the four sources of random data:

 let rnd = System.Random()
            let generators = [|
              {XGauss=new Normal(1.0,2.0);YGauss= new Normal(1.0,2.0)};   
              {XGauss=new Normal(
1.0,2.0);YGauss= new Normal(8.0,2.0)};
              {XGauss=new Normal(8.0,2.0);YGauss= new Normal(8.0,2.0)};
              {XGauss=new Normal(8.0,2.0);YGauss= new Normal(1.0,2.0)}

To experiment if the algorithm is affected by choosing different initial points the strategy is to take four generators:

            let sampleFirsts = [

I can change the configuration of the sources simply changing their index. For example in the following case the initial centroids are taken all from the first source:

            let sampleFirsts = [

To sample on thousand of points from the different sources (and the first four according to the idea of having initial centroids, so using the "SampleFirsts"):

            let sample = sampleFirsts @ (List.map (fun x -> {X=x.XGauss.Sample();Y=x.YGauss.Sample()}) (List.map (fun x -> generators.[x % 4 ]) [0..995]))

            let initialCentroids = [sample.[0]; sample.[1]; sample.[2];sample.[3]]

            let finalCentroids = iterateFindingFinalCentroids sample initialCentroids

Actually the code returns the centroids (and not the segments), and neither the "variability" (dispersion) of the segments, but they can be calculated later:

            let segment0 = List.fold (fun acc item -> if closestCentroid item finalCentroids = finalCentroids.[0then item::acc else acc) [] sample
            let segment1 = List.fold (fun acc item -> if closestCentroid item finalCentroids = finalCentroids.[1then item::acc else acc) [] sample
            let segment2 = List.fold (fun acc item -> if closestCentroid item finalCentroids = finalCentroids.[2then item::acc else acc) [] sample
            let segment3 = List.fold (fun acc item -> if closestCentroid item finalCentroids = finalCentroids.[3then item::acc else acc) [] sample

            let quadraticDist0 = sumOfQuadraticDistances finalCentroids.[0] segment0
            let quadraticDist1 = sumOfQuadraticDistances finalCentroids.[1] segment1
            let quadraticDist2 = sumOfQuadraticDistances finalCentroids.[2] segment2
            let quadraticDist3 = sumOfQuadraticDistances finalCentroids.[3] segment3

The result is that the algorithm is not affected by the different combinations of sources for the initial centroids (that correspond to the first four points of the sample), still converging to the four expected points, i.e. the center of the generators of the Normal (Guassian) bidimensional distributions.

Sunday, February 16, 2014

Animal Quiz problem in F#

One of my favourite "toy" coding problem is the Animalquiz. I made an implementation in F#, a language that I am interested to now.

My version is implemented as an old fashion console application, developed with unit tests (NUnit). Mock objects (with RhinoMock framework) are used to test from the input/output interaction point of view. Such kind of tests tells something like "when some request for the input will return some specific value that we prepare in advance, and then if the computation is correct, we should see some specific value in output".

In this post I will remind what the problem is, I'll give some snips of code with links to the sources, and other references from the msdn knowledge base.
I'll not cover advanced functional programming topics like monads (and actually I don't know if they could help to improve the design in this case; I have to investigate).

The project sources can be viewed and downloaded from   bitbucket   github.

(The downloaded zip archive contains the source,  a solution file produced by Xamarin, probably compatible with Visual Studio as well, and a .vim file containing some macro that I found useful to edit build and test the project from vim as well. External libraries like nunit and RhinoMock must be downloaded separately, and the references in the solution file may need to be fixed by your Ide (xamarin or Visual Studio), or even by editing the solution file (which is xml). I tried also xbuild to produce the binaries (instead of the Xamarin Ide) with mono as target.  Probably msbuild with .net clr will work as well.)

Summarizing: I developed and tested it only under Mono 3.2.6 and F# 3.0 (for Mono) on a Mac laptop, using Xamarin, and sometimes also Vim.

The problem:
the user is invited to think about an animal, and the program will try to guess that animal, making some questions (like "is it a mammal?", "is it a fish?" etc...) to reduce the range of plausible guesses.
Questions and names of animals are stored in a binary tree. Leaf nodes contains the name of the animals, and non leaf nodes contain the questions.
Example: if a non leaf node contains the question "is it big?", then big animals must be found only under the yesBranch subtree, and the others animals only under the noBranch subtree (we may have troubles related to the fuzziness of term like big or small, but who cares, from now:  we can just have some redundancy in knowledge because of this issue).

Example: only one leaf node is stored in the knowledge tree. In this case the engine will just guess you are thinking about an elephant, without making any question:

In the following example there is a two leaf nodes knowledge tree, separated by the question "is it big?". The engine will start asking if it is big, and then, according to the yes or no answer, it will guess that the thought animal it is Elephant or a Mouse

If the program fails its guess because the user was thinking about some other animal, then the program has the opportunity to increase the knowledge base,  asking her/him to suggest a new question to distinguish the guessed animal from the thought animal and the relative yes/no  answer. This is enough for the system to adapt the knowledge tree including this new leaf node reachable by the new question.

Suppose that the animal the user is thinking of is "ant": the program will ask if it is big, the user will answer no, and then the program will guess that the thought animal is a mouse.

The user will answer "no" and then so the program asks her/him to suggest as the new question to distinguish a mouse from an ant, and then she/he suggests something like "is it an insect?" and the relative answer: "yes". The program will so adapt the knowledge tree accordingly, that so will become as in the following picture:

Popular web sites like "akinator" probably work in a similar way (but they have more possible answers than only yes/no  - "probably yes", "probably not", ...-   an already huge knowledge base, and they are mede to guess real people - mostly famous - rather than animals name. Anyway the concept is almost the same.)

For a normal user, it may looks surprising how is it possible that a relatively low number of questions can lead to a fair guess, but the fact that few questions are enough to go deep in a huge database can be explained by mathematical properties of binaries tree data structure, which relates the number of nodes with the number of steps using a logarithmic law:
if the tree is "perfectly balanced", 10 steps are needed to guess an animal among 1024 possibilities, and if the animals are doubled (2048), then the steps needed are just one more: 11, and for 4096 leaf nodes the steps needed are 11. If the tree is not perfectly balanced the things are just a little bit worst. Anyway a logarithmic relation should still hold in the average case (as long as the user contribute adding nodes in a way that makes sense)

I defined the KnowledgeTree type as follows:

type KnowledgeTree = AnimalName of string | SubTree of Tree
and Tree = {Question: string; yesBranch: KnowledgeTree; noBranch: KnowledgeTree}

witch means that a Knowledge tree is an animal, or a question and two knowledge trees named yesBranch and noBranch (containing other knowledge trees).

An example of interaction with the program, that leads to the construction of the last tree show in the previous picture, starting from the first one, is as follows:

>think about an animal
>is it a elephant?
>what animal was?
>please, write a yes/no question to distinguish a cat from a elephant
is it big?
>what is the answer to the question "is it big?" to distinguish a mouse from a elephant?

>think about an animal
>is it big?
>is it a mouse?
>what animal was?
>please, write a yes/no question to distinguish a ant from a cat
is it an insect?
>what is the answer to the question "is it an insect?" to distinguish a ant from a cat?

>think about an animal
>is it big?
>is it a mouse?
>is it a ant?

I used a pretty standard technique to to test input & output interaction: making the main just call another function (or method) passing to it instances of console based read/write interfaces (interfaces in term of O.O. programming).


This is the main:

    let main argv =

        let myOutputStream = new ConsoleOutput() :> OutStream
        let myInputStream = new ConsoleInput() :> InStream

        let tree = AnimalName "elephant"
        runForever myOutputStream myInputStream tree Welcome


(the term Stream is used for input and output console, even if in functional programming it has also another meaning that I am not talking about here).

The OutStream and InStream interfaces are defined as:

type OutStream =
    abstract Write: string -> unit

type InStream =
    abstract Input: unit -> string

Their "Console" implementation are:

type ConsoleOutput() =
    interface OutStream with
       member this.Write(x) = printf "%s\n" x

type ConsoleInput() =
    interface InStream with
        member this.Input() = Console.ReadLine()


(The syntax  ":> OutStream" used by the main is needed, and actually makes an explicit upcast to the OutStream interface,. This upcast is required, which looks quite unusual from a Java/C# point of view, but is safe as long as you are upcasting only to an interface that you know is implemented. Here In msdn is discussed the topic).

The following call

 runForever myOutputStream myInputStream tree Welcome

consists in an infinite loop on standard input and output starting from the Welcome state, that I will show later.

There is an almost identical function
 runUntilIndex myOutputStream myInputStream tree Welcome index

that does the same work, with the same, just two, lines of code (yes duplicated).

As you may guess, this one is used by the tests, that cannot run an infinite loop. Passing the initial state explicitly is useful for the tests to be more granular (example: a test assume we are in the state Welcome  and another one assume we are in another state.
Actually runForever should not accept the state because it has to start always with Welcome. The parameter is there only to show the similarity with "runUntilIndex".

I have chosen to define states by the following type:

type state = | Welcome | InviteToThinkAboutAnAnimal |  GuessingFromCurrentNode |AskWhatAnimalWas | ExpectingDiscriminatingQuestion | AnsweringDiscriminatingQuestion


When the program starts, the state is Welcome, and after that it goes to InviteToThinkAboutAnAnimal, and so on... the current state, the type of node we are in, and the user message determine the next state and the answer given by the program. This is regulated by "pattern matching".

The static analysis of a pattern matching, reveals incomplete pattern matching at edit or compile time, which means that if we try to mach one of such states, then we get a warning if some of the possible states is missing from the match cases (because either it means you don't need that state, and so you should remove it from the type, or either you need that type, but then it means that you should match it and if you don't, it is probably a bug).


as said, some tests that make expectation about what should be written to the console if the user type some message, benefit from stubbing/mocking the input and the output, and by the possibility of defining a state (which will be encapsulated in a "playStructure" exchanged at each "user machine interaction"). Such test can then "tell", by tests, stories like: "When I am in this state, and I write this in input, then I expect that the program will write that in output".

(I have been a little bit creative in using an alternative to the classic given when then/arrange act assert structure of tests, particularly substituting the then - or assert - with a "verify expectations", because there are no assertion, but a verification that the expected call of the mock actually happened - and, in the case of the input stream, returning the specified values)

Some examples:

         let ``in the initial welcome state, should invite to think about an animal``() =
          // setup
          let mockrepo = new MockRepository()
          let inStream = mockrepo.DynamicMock()
          let outStream = mockrepo.StrictMock()

          // expectations
          Expect.Call(outStream.Write("think about an animal")) |> ignore

          // arrange
          let tree = AnimalName "elephant"
          let initialState = Welcome
          let numLoops = 0

          // act
          runUntilIndex outStream inStream tree initialState numLoops |> ignore

          // verify expectations

(I used DynamicMock instead of StrictMock for the input because in this case we don't need any special input from the user. See this on StackOverflow about Dynamic vs Strict mock - in RhinoMock).

The following test describes when the use has in mind the only one leaf node present on the knowledge tree:

    let ``with a one leaf node knowledge tree, engine asks if it is the animal on that node, the user says yes, and so the engine answer with "yeah!"``() =
        // setup 
        let mockrepo = new MockRepository()
        let inStream = mockrepo.StrictMock()

        let outStream = mockrepo.StrictMock()

        // expectations
        Expect.Call(outStream.Write("think about an animal")) |> ignore
        Expect.Call(inStream.Input()).Return("whatever") |> ignore

        Expect.Call(outStream.Write("is it a cat?")) |> ignore     
        Expect.Call(inStream.Input()).Return("yes") |> ignore

        Expect.Call(outStream.Write("yeah!")) |> ignore 
        Expect.Call(inStream.Input()).Return("anything") |> ignore

        // arrange
        let tree = AnimalName "cat"
        let initState = Welcome
        let numOfLoops = 3

        // act
        runUntilIndex outStream inStream tree initState numOfLoops |> ignore

        // verify expectations

I have some longer interaction tests I am not showing here. Those test are more fragile, but have the advantage that can explain "longer" interaction stories (i.e. I start with a node, then the structure becomes of two nodes, and then of three nodes and so on...)

About tests closer to the "real domain" I decided to test the change of the state and the message exchanged and some of the value of a record used to pass values:

     let ``at the initial state, Welcome, the user receive the message "please, think about an animal", and state becomes 'invitethinkaboutanimal ``()=
        // arrange
        let playStart = messageAndState "" Welcome (AnimalName "monkey")

        // act
        let result = consoleInteract playStart

        // assert
        Assert.AreEqual(InviteToThinkAboutAnAnimal,  result.currentState)
        Assert.AreEqual("think about an animal", result.messageFromEngine)

messageAndState will just build a more complex record defined as follow:

type playingStructure = {messageFromEngine: string; 
                                mutable messageFromPlayer: string option; 
                                currentState: state; 
                                animalToBeLearned: string; 
                                rootTree: AnimalKnowledge; 
                                currentNode: AnimalKnowledge; 
                                yesNoList: string list; 
                                newDiscriminatingQuestion : string option}

The last part I am showing is the pattern matching related to the "state change" and message exchange between the player and the engine.

The consoleInteract function matches the states and the user messages, and returns a new "playingStructure" with the new state:

let consoleInteract playStructure =
    let currentAnimal = match playStructure.currentNode with | AnimalName name -> name | _ -> "ERROR!"
    let messageFromPlayer = match playStructure.messageFromPlayer with | Some X -> X | None -> ""

    match playStructure.currentState with 
        | InviteToThinkAboutAnAnimal -> initState playStructure
        | GuessingFromCurrentNode ->   (
            match playStructure.messageFromPlayer with
          | Some "yes" -> (
            match playStructure.currentNode with
                            | AnimalName _ ->     sayYeah playStructure 
                            | SubTree subTree  -> yesSubTreeNavigation playStructure subTree

          | Some "no" -> (
            match playStructure.currentNode with 
                            | AnimalName _ -> askWhatAnimalWas playStructure
                            | SubTree subTree -> noSubTreeNavigation playStructure subTree
          | _ -> askToEnterYesOrNot playStructure 
        | Welcome -> welcomeMessage playStructure
        | AskWhatAnimalWas ->  askDiscriminatingQuestion playStructure messageFromPlayer currentAnimal
        | ExpectingDiscriminatingQuestion -> askAnswerToDiscriminatingQuestion playStructure messageFromPlayer currentAnimal
        | AnsweringDiscriminatingQuestion ->  match playStructure.messageFromPlayer with
            | (Some "yes"| Some "no") -> completeLearning playStructure
            | _ -> haveToAnswerOnlyYesOrNot playStructure

The "messagefromplayer" contains a "option type" probably unnecessary.

Let me just show, to conclude, the code that make the program adapt the binary tree with a new question and a new animal:

let rec learn playStructure = 
    let currentAnimal  = 
        match playStructure.currentNode with 
            | AnimalName name -> name 
            | _ -> failwith "ERROR, current node structure should be a leaf"

    let question tree  =
        match tree with
            | AnimalName name -> "is it a "+name+"?"
            | SubTree {Question=question} -> question 

    let noBranch tree = 
         match tree with
            |SubTree X -> X.noBranch
            | _ -> failwith "consider the consistency of the yes no list"

    let yesBranch tree = 
         match tree with
            |SubTree X -> X.yesBranch
            | _ -> failwith "consider the consistency of the yes no list"

    let thisIsTheQuestion = 
        match playStructure.newDiscriminatingQuestion with 
            | Some X -> X 
            | None -> failwith "discriminating question cannot be empty"

    match playStructure.yesNoList with
     | [] -> match playStructure.messageFromPlayer with
          | Some "yes" -> SubTree { Question=thisIsTheQuestion; 
                                    yesBranch = AnimalName playStructure.animalToBeLearned; 
                                    noBranch = AnimalName currentAnimal}
          | Some "no" -> SubTree {  Question=thisIsTheQuestion;
                                    yesBranch = AnimalName currentAnimal;
                                    noBranch = AnimalName playStructure.animalToBeLearned}

          | _ -> failwith "called learn when user interaction is different from yes or no"

     | "yes"::T -> SubTree {Question = question playStructure.rootTree; 
                            yesBranch = learn (substituteYesNoList playStructure T);     
                            noBranch= noBranch playStructure.rootTree
     | "no"::T -> SubTree {Question = question playStructure.rootTree; 
                           yesBranch = yesBranch playStructure.rootTree; 
                           noBranch = learn  (substituteYesNoList playStructure T)

UPDATE 8/14/2016: the previous two lines contained a bug that I reproduced by the test interaction test named "``update the knowledge tree starting from elephant (reproducing bug)``" in test file    "https://github.com/tonyx/fsharpanimalquiz/blob/master/fsharpAnimalQuizKata/InteractionTests.fs"

and in the corresponding unit test "``deep descending the tree (reproducing bug)`` in the test file https://github.com/tonyx/fsharpanimalquiz/blob/master/fsharpAnimalQuizKata/UnitTests.fs

here there are the fixed version of those two lines:
     | "yes"::T -> SubTree {Question = question playStructure.rootTree
                            yesBranch = learn (substituteYesNoList  {playStructure with rootTree = yesBranch playStructure.rootTree} T);     
                            noBranchnoBranch playStructure.rootTree
     | "no"::T -> SubTree {Question = question playStructure.rootTree;
                           yesBranch = yesBranch playStructure.rootTree;
                           noBranch = learn  (substituteYesNoList {playStructure with rootTree = noBranch playStructure.rootTree} T)

(source file: https://github.com/tonyx/fsharpanimalquiz/blob/master/fsharpAnimalQuizKata/BrainModule.fs )


As you may see, many "exceptional states" are explicitly raised in case of conditions that should not happen. We have virtually "preconditions" that assume, for example, that the list of the history of answer the user has given are only "yes" and "no".

I made external call to other function that contains some long list of values for the playStructure.
I didn't show everything of the code, and I will probably work more on it.

Rhino can be used quite fairly with F#. I tried also Foq, but, more than a more fancy syntax, I didn't find it so lovable.

By using F# I learn how recursion, lists, mattern matching, records, tuples, type inference work. I don't think we have to use O.O. and other clr .net related features, that will be useful anyway later to create full web or desktop applications.

I will take a look to mvc for F# later, and will come back to this topic.

Future directions: how to manage the same domain when there are more users that perhaps need to modify the tree at the same time. I think I'll manage a "event sources"-like solution.

Thanks for reading, sorry for typos, if any.

(update 8/14/2016: source code is here https://github.com/tonyx/fsharpanimalquiz )