Wednesday, July 24, 2013

Scala Redis client goes non blocking : uses Akka IO

scala-redis is getting a new non blocking version based on a kernel implemented with the new Akka IO. The result is that all APIs are non blocking and return a Future. We are trying to keep the API as close to the blocking version as possible. But bits rot and some of the technical debt need to be repaid. We have cleaned up some of the return types which had unnecessary Option[] wrappers, made some improvements and standardizations on the API type signatures and also working on making the byte array manipulation faster using akka.util.ByteString at the implementation level. We also have plans of using the Akka IO pipeline for abstracting the various stages of handling Redis protocol request and response.

As of today we have quite a bit ready for review by the users. The APIs may change a bit here and there, but the core APIs are up there. There are a few areas which have not yet been implemented like PubSub or clustering. Stay tuned for more updates on this blog .. Here are a few code snippets that demonstrate the usage of the APIs ..

Non blocking get/set

@volatile var callbackExecuted = false

val ks = (1 to 10).map(i => s"client_key_$i")
val kvs = ks.zip(1 to 10)

val sets: Seq[Future[Boolean]] = kvs map {
  case (k, v) => client.set(k, v)
}

val setResult = Future.sequence(sets) map { r: Seq[Boolean] =>
  callbackExecuted = true
  r
}

callbackExecuted should be (false)
setResult.futureValue should contain only (true)
callbackExecuted should be (true)

callbackExecuted = false
val gets: Seq[Future[Option[Long]]] = ks.map { k => client.get[Long](k) }
val getResult = Future.sequence(gets).map { rs =>
  callbackExecuted = true
  rs.flatten.sum
}

callbackExecuted should be (false)
getResult.futureValue should equal (55)
callbackExecuted should be (true)

Composing through sequential combinators

val key = "client_key_seq"
val values = (1 to 100).toList
val pushResult = client.lpush(key, 0, values:_*)
val getResult = client.lrange[Long](key, 0, -1)

val res = for {
  p <- pushResult.mapTo[Long]
  if p > 0
  r <- getResult.mapTo[List[Long]]
} yield (p, r)

val (count, list) = res.futureValue
count should equal (101)
list.reverse should equal (0 to 100)

Error handling using Promise Failure

val key = "client_err"
val v = client.set(key, "value200")
v.futureValue should be (true)

val x = client.lpush(key, 1200)
val thrown = evaluating { x.futureValue } should produce [TestFailedException]
thrown.getCause.getMessage should equal ("ERR Operation against a key holding the wrong kind of value")

Feedbacks welcome, especially on the APIs and their usage. All code are in Github with all tests in the test folder. Jisoo Park (@guersam) has been doing an awesome job contributing a lot to all the goodness that's there in the repo. Thanks a lot for all the help ..

Monday, July 22, 2013

The Realm of Racket is an enjoyable read

There are many ways to write a programming language book. You can start introducing the syntax and semantics of the language in a naturally comprehensible sequence of complexity and usage. Or you can choose to introduce the various features of the language with real world examples using the standard librray that the language offers. IIRC Accelerated C++ by Andrew Koenig and Barbara Moo takes this route. I really loved this approach and enjoyed reading the book.

Of course Matthias Felleisen is known for a third way of teaching a language - the fun way. The Little Schemer and The Seasoned Schemer have introduced a novel way of learning a language. The Realm of Racket follows a similar style of teaching the latest descendant of Lisp, one game at a time. The implementation of every game introduces the idioms and language features with increasing degrees of complexity. There's a nice progression which helps understanding the complex features of the language building upon the already acquired knowledge of the simpler ones in earlier chapters.

The book begins with a history of the Racket programming language and how it evolved as a descendant of Scheme, how it makes programming fun and how it can be used successfully as an introductory language to students aspiring to learn programming. Then it starts with Getting Started with Dr Racket for the impatient programmer and explains the IDE that will serve as your companion for the entire duration of your playing around with the book.

Every chapter introduces some of the new language features and either develops a new game or builds on some improvement of a game developed earlier. This not only demonstrates a real world usage of the syntax and semantics of the language but makes the programmer aware of how the various features interact as a whole to build complex abstractions out of simpler ones. The book also takes every pain to defer the complexity of the features to the right point so that the reader is not burdened upfront. e.g. Lambdas are introduced only when the authors have introduced all basics of programming with functions and recursion. Mutants are introduced only after teaching the virtues of immutablity. For loops and comprehensions appear only when the book has introduced all list processing functions like folds, maps and filters. And then the book goes into great depth explaining why the language has so many variants of the for loop like for/list, for/fold, for*, for/first, for/last etc. In this entire discussion of list processing, for loops etc., I would love to see a more detailed discussion on sequence in the book. Sequence abstracts a large number of data types, but, much like Clojure it introduces a new way of API design - a single sequence to rule them all. API designers would surely like to have more of this sauce as part of their repertoire. Racket's uniform way of handling sequence is definitely a potent model of abstraction as compared to Scheme or other versions of Lisp.

The games developed progress in complexity and we can see the powers of the language being put to great use when the authors introduce lazy evaluation and memoized computations and use them to improve the Dice of Doom. Then the authors introduce distributed game development which is the final frontier that the book covers. It's truly an enjoyable ride through the entire series.

The concluding chapter talks about some of the advanced features like classes, objects and meta-programming. Any Lisp book will be incomplete without a discussion of macros and language development. But I think the authors have done well to defer these features till the end. Considering the fact that this is a book for beginning to learn the language this sounded like a sane measure.

However, as a programmer experienced in other languages and wanting to take a look at Racket, I would have loved to see some coverage on testing. Introducing a bit on testing practices, maybe a unit testing library would have made the book more complete.

The style of writing of this book has an underlying tone of humor and simplicity, which makes it a thoroughly enjoyable read. The use of illustrations and comics take away the monotony of learning the prosaics of a language. And the fact that Racket is a simple enough language makes this combination with pictures very refreshing.

On the whole, as a book introducing a language, The Realm of Racket is a fun read. I enjoyed reading it a lot and recommend it without reservations for your bookshelf.

Monday, June 03, 2013

Endo is the new fluent API

I tweeted this over the weekend ..

a good title for a possible blog post .. endo is the new fluent API ..
— Debasish Ghosh (@debasishg) June 1, 2013

My last two blog posts have been about endomorphisms and how it combines with the other functional structures to help you write expressive and composable code. In A DSL with an Endo - monoids for free, endos play with Writer monad and implement a DSL for a sequence of activities through monoidal composition. And in An exercise in Refactoring - Playing around with Monoids and Endomorphisms, I discuss a refactoring exercise that exploits the monoid of an endo to make composition easier. Endomorphisms help you lift your computation into a data type that gives you an instance of a monoid. And the mappend operation of the monoid is the function composition. Hence once you have the Endo for your type defined, you get a nice declarative syntax for the operations that you want to compose, resulting in a fluent API. Just a quick recap .. endomorphisms are functions that map a type on to itself and offer composition over monoids. Given an endomorphism we can define an implicit monoid instance ..

implicit def endoInstance[A]: Monoid[Endo[A]] = new Monoid[Endo[A]] {
  def append(f1: Endo[A], f2: => Endo[A]) = f1 compose f2
  def zero = Endo.idEndo
}

I am not going into the details of this, which I discussed at length in my earlier posts. In this article I will sum up with yet another use case for making fluent APIs using the monoid instance of an Endo. Consider an example from the domain of securities trading, where a security trade goes through a sequence of transformations in its lifecycle through the trading process .. Here's a typical Trade model (very very trivialified for demonstration) ..

sealed trait Instrument
case class Security(isin: String, name: String) extends Instrument

case class Trade(refNo: String, tradeDate: Date, valueDate: Option[Date] = None, 
  ins: Instrument, principal: BigDecimal, net: Option[BigDecimal] = None, 
  status: TradeStatus = CREATED)

Modeling a typical lifecycle of a trade is complex. But for illustration, let's consider these simple ones which need to executed on a trade in sequence ..

Validate the trade
Assign value date to the trade, which will ideally be the settlement date
Enrich the trade with tax/fees and net trade value
Journalize the trade in books

Each of the functions take a Trade and return a copy of the Trade with some attributes modified. A naive way of doing that will be as follows ..

def validate(t: Trade): Trade = //..

def addValueDate(t: Trade): Trade = //..

def enrich(t: Trade): Trade = //..

def journalize(t: Trade): Trade = //..

and invoke these methods in sequence while modeling the lifecycle. Instead we try to make it more composable and lift the function Trade => Trade within the Endo ..

type TradeLifecycle = Endo[Trade]

and here's the implementation ..

// validate the trade: business logic elided
def validate: TradeLifecycle = 
  ((t: Trade) => t.copy(status = VALIDATED)).endo

// add value date to the trade (for settlement)
def addValueDate: TradeLifecycle = 
  ((t: Trade) => t.copy(valueDate = Some(t.tradeDate), status = VALUE_DATE_ADDED)).endo

// enrich the trade: add taxes and compute net value: business logic elided
def enrich: TradeLifecycle = 
  ((t: Trade) => t.copy(net = Some(t.principal + 100), status = ENRICHED)).endo

// journalize the trade into book: business logic elided
def journalize: TradeLifecycle = 
  ((t: Trade) => t.copy(status = FINALIZED)).endo

Now endo has an instance of Monoid defined by scalaz and the mappend of Endo is function composition .. Hence here's our lifecycle model using the holy monoid of endo ..

def doTrade(t: Trade) =
  (journalize |+| enrich |+| addValueDate |+| validate).apply(t)

It's almost the specification that we listed above in numbered bullets. Note the inside out sequence that's required for the composition to take place in proper order.

Why not plain old composition ?

A valid question. The reason - abstraction. Abstracting the composition within types helps you compose the result with other types, as we saw in my earlier blog posts. In one of them we built larger abstractions using the Writer monad with Endo and in the other we used the mzero of the monoid as a fallback during composition thereby avoiding any special case branch statements.

One size doesn't fit all ..

The endo and its monoid compose beautifully and gives us a domain friendly syntax that expresses the business functionality ina nice succinct way. But it's not a pattern which you can apply everywhere where you need to compose a bunch of domain behaviors. Like every idiom, it has its shortcomings and you need different sets of solutions in your repertoire. For example the above solution doesn't handle any of the domain exceptions - what if the validation fails ? With the above strategy the only way you can handle this situation is to throw exceptions from validate function. But exceptions are side-effects and in functional programming there are more cleaner ways to tame the evil. And for that you need different patterns in practice. More on that in subsequent posts ..

Monday, March 04, 2013

An exercise in Refactoring - Playing around with Monoids and Endomorphisms

A language is powerful when it offers sufficient building blocks for library design and adequate syntactic sugar that helps build expressive syntax on top of the lower level APIs that the library publishes. In this post I will discuss an exercise in refactoring while trying to raise the level of abstraction of a modeling problem.

Consider the following modeling problem that I recently discussed in one of the Scala training sessions. It's simple but offers ample opportunities to explore how we can raise the level of abstraction in designing the solution model. We will start with an imperative solution and then incrementally work on raising the level of abstraction to make the final code functional and more composable.

A Problem of Transformation ..

The problem is to compute the salary of a person through composition of multiple salary components calculated based on some percentage of other components. It's a problem of applying repeated transformations to a pipeline of successive computations - hence it can be generalized as a case study in function composition. But with some constraints as we will see shortly.

Let's say that the salary of a person is computed as per the following algorithm :

basic = the basic component of his salary
allowances = 20% of basic
bonus = 10% of (basic + allowances)
tax = 30% of (basic + allowances + bonus)
surcharge = 10% of (basic + allowances + bonus - tax)

Note that the computation process starts with a basic salary, computes successive components taking the input from the previous computation of the pipeline. But there's a catch though, which makes the problem a bit more interesting from the modleing perspective. Not all components of the salary are mandatory - of course the basic is mandatory. Hence the final components of the salary will be determined by a configuration object which can be like the following ..

// an item = true means the component should be activated in the computation
case class SalaryConfig(
  surcharge: Boolean    = true, 
  tax: Boolean          = true, 
  bonus: Boolean        = true,
  allowance: Boolean    = true 
)

So when we compute the salary we need to take care of this configuration object and activate the relevant components for calculation.

A Function defines a Transformation ..

Let's first translate the above components into separate Scala functions ..

// B = basic + 20%
val plusAllowance = (b: Double) => b * 1.2

// C = B + 10%
val plusBonus = (b: Double) => b * 1.1

// D = C - 30%
val plusTax = (b: Double) => 0.7 * b

// E = D - 10%
val plusSurcharge = (b: Double) => 0.9 * b

Note that every function computes the salary up to the stage which will be fed to the next component computation. So the final salary is really the chained composition of all of these functions in a specific order as determined by the above stated algorithm.

But we need to selectively activate and deactivate the components depending on the SalaryConfig passed. Here's the version that comes straight from the imperative mindset ..

The Imperative Solution ..

// no abstraction, imperative, using var
def computeSalary(sc: SalaryConfig, basic: Double) = {
  var salary = basic
  if (sc.allowance) salary = plusAllowance(salary)
  if (sc.bonus) salary = plusBonus(salary)
  if (sc.tax) salary = plusTax(salary)
  if (sc.surcharge) salary = plusSurcharge(salary)
  salary
}

Straight, imperative, mutating (using var) and finally rejected by our functional mindset.

Thinking in terms of Expressions and Composition ..

Think in terms of expressions (not statements) that compose. We have functions defined above that we could compose together and get the result. But, but .. the config, which we somehow need to incorporate as part of our composable expressions.

So direct composition of functions won't work because we need some conditional support to take care of the config. How else can we have a chain of functions to compose ?

Note that all of the above functions for computing the components are of type (Double => Double). Hmm .. this means they are endomorphisms, which are functions that have the same argument and return type - "endo" means "inside" and "morphism" means "transformation". So an endomorphism maps a type on to itself. Scalaz defines it as ..

sealed trait Endo[A] {
  /** The captured function. */
  def run: A => A
  //..
}

But the interesting part is that there's a monoid instance for Endo and the associative append operation of the monoid for Endo is function composition. That seems mouthful .. so let's dissect what we just said ..

As you all know, a monoid is defined as "a semigroup with an identity", i.e.

trait Monoid[A] {
  def append(m1: A, m2: A): A
  def zero: A
}

and append has to be associative.

Endo forms a monoid where zero is the identity endomorphism and append composes the underlying functions. Isn't that what we need ? Of course we need to figure out how to sneak in those conditionals ..

implicit def endoInstance[A]: Monoid[Endo[A]] = new Monoid[Endo[A]] {
  def append(f1: Endo[A], f2: => Endo[A]) = f1 compose f2
  def zero = Endo.idEndo
}

But we need to append the Endo only if the corresponding bit in SalaryConfig is true. Scala allows extending a class with custom methods and scalaz gives us the following as an extension method on Boolean ..

/**
 * Returns the given argument if this is `true`, otherwise, the zero element 
 * for the type of the given argument.
 */
final def ??[A](a: => A)(implicit z: Monoid[A]): A = b.valueOrZero(self)(a)

That's exactly what we need to have the following implementation of a functional computeSalary that uses monoids on Endomorphisms to compose our functions of computing the salary components ..

// compose using mappend of endomorphism
def computeSalary(sc: SalaryConfig, basic: Double) = {
  val e = 
    sc.surcharge ?? plusSurcharge.endo     |+|
    sc.tax ?? plusTax.endo                 |+|
    sc.bonus ?? plusBonus.endo             |+| 
    sc.allowance ?? plusAllowance.endo 
  e run basic
}

More Generalization - Abstracting over Types ..

We can generalize the solution further and abstract upon the type that represents the collection of component functions. In the above implementation we are picking each function individually and doing an append on the monoid. Instead we can abstract over a type constructor that allows us to fold the append operation over a collection of elements.

Foldable[] is an abstraction which allows its elements to be folded over. Scalaz defines instances of Foldable[] typeclass for List, Vector etc. so we don't care about the underlying type as long as it has an instance of Foldable[]. And Foldable[] has a method foldMap that makes a Monoid out of every element of the Foldable[] using a supplied function and then folds over the structure using the append function of the Monoid.

trait Foldable[F[_]]  { self =>
  def foldMap[A,B](fa: F[A])(f: A => B)(implicit F: Monoid[B]): B
  //..
}

In our example, f: A => B is the endo function and the append is the append of Endo which composes all the functions that form the Foldable[] structure. Here's the version using foldMap ..

def computeSalary(sc: SalaryConfig, basic: Double) = {
  val components = 
    List((sc.surcharge, plusSurcharge), 
         (sc.tax, plusTax), 
         (sc.bonus, plusBonus),
         (sc.allowance, plusAllowance)
    )
  val e = components.foldMap(e => e._1 ?? e._2.endo)
  e run basic
}

This is an exercise which discusses how to apply transformations on values when you need to model endomorphisms. Instead of thinking in terms of generic composition of functions, we exploited the types more, discovered that our tranformations are actually endomorphisms. And then applied the properties of endomorphism to model function composition as monoidal appends. The moment we modeled at a higher level of abstraction (endomorphism rather than native functions), we could use the zero element of the monoid as the composable null object in the sequence of function transformations.

In case you are interested I have the whole working example in my github repo.

Friday, February 15, 2013

A DSL with an Endo - monoids for free

When we design a domain model, one of the issues that we care about is abstraction of implementation from the user level API. Besides making the published contract simple, this also decouples the implementation and allows post facto optimization to be done without any impact on the user level API.

Consider a class like the following ..

// a sample task in a project
case class Task(name: String) 

// a project with a list of tasks & dependencies amongst the
// various tasks
case class Project(name: String, 
                   startDate: java.util.Date, 
                   endDate: Option[java.util.Date] = None, 
                   tasks: List[Task] = List(), 
                   deps: List[(Task, Task)] = List())

We can always use the algebraic data type definition above to add tasks and dependencies to a project. Besides being cumbersome as a user level API, it also is a way to program too close to the implementation. The user is coupled to the fact that we use a List to store tasks, making it difficult to use any alternate implementation in the future. We can offer a Builder like OO interface with fluent APIs, but that also adds to the verbosity of implementation, makes builders mutable and is generally more difficult to compose with other generic functional abstractions.

Ideally we should be having a DSL that lets users create projects and add tasks and dependencies to them.

In this post I will discuss a few functional abstractions that will stay behind from the user APIs, and yet provide the compositional power to wire up the DSL. This is a post inspired by this post which discusses a similar DSL design using Endo and Writers in Haskell.

Let's address the issues one by one. We need to accumulate tasks that belong to the project. So we need an abstraction that helps in this accumulation e.g. concatenation in a list, or in a set or in a Map .. etc. One abstraction that comes to mind is a Monoid that gives us an associative binary operation between two objects of a type that form a monoid.

trait Monoid[T] {
  def append(m1: T, m2: T): T
  def zero: T
}

A List is a monoid with concatenation as the append. But since we don't want to expose the concrete data structure to the client API, we can talk in terms of monoids.

The other data structure that we need is some form of an abstraction that will offer us the writing operation into the monoid. A Writer monad is an example of this. In fact the combination of a Writer and a Monoid is potent enough to have such a DSL in the making. Tony Morris used this combo to implement a logging functionality ..

for {
  a <- k withvaluelog ("starting with " + _)
  b <- (a + 7) withlog "adding 7"
  c <- (b * 3).nolog
  d <- c.toString.reverse.toInt withvaluelog ("switcheroo with " + _)
  e <- (d % 2 == 0) withlog "is even?"
} yield e

We could use this same technique here. But we have a problem - Project is not a monoid and we don't have a definition of zero for a Project that we can use to make it a Monoid. Is there something that would help us get a monoid from Project i.e. allow us to use Project in a monoid ?

Enter Endo .. an endomorphism which is simply a function that takes an argument of type T and returns the same type. In Scala, we can state this as ..

sealed trait Endo[A] {
  // The captured function
  def run: A => A
  //..
}

Scalaz defines Endo[A] and provides a lot of helper functions and syntactic sugars to use endomorphisms. Among its other properties, Endo[A] provides a natural monoid and allows us to use A in a Monoid. In other words, endomorphisms of A form a monoid under composition. In our case we can define an Endo[Project] as a function that takes a Project and returns a Project. We can then use it with a Writer (as above) and implement the accumulation of tasks within a Project.

Exercise: Implement Tony Morris' logger without side-effects using an Endo.

Here's how we would like to accumulate tasks in our DSL ..

for {
  _ <- task("Study Customer Requirements")
  _ <- task("Analyze Use Cases")
  a <- task("Develop code")
} yield a

Let's define a function that adds a Task to a Project ..

// add task to a project
val withTask = (t: Task, p: Project) => p.copy(tasks = t :: p.tasks)

and use this function to define the DSL API task which makes an Endo[Project] and passes it as a Monoid to the Writer monad. In the following snippet, (p: Project) => withTask(t, p) is a mapping from Project => Project, which gets converted to an Endo and then passed to the Writer monad for adding to the task list of the Project.

def task(n: String): Writer[Endo[Project], Task] = {
  val t = Task(n)
  for {
    _ <- tell(((p: Project) => withTask(t, p)).endo)
  } yield t
}

The DSL snippet above is a monad comprehension. Let's add some more syntax to the DSL by defining dependencies of a Project. That's also a mapping from one Project state to another and can be realized using a similar function like withTask ..

// add project dependency
val withDependency = (t: Task, on: Task, p: Project) => 
  p.copy(deps = (t, on) :: p.deps)

.. and define a function dependsOn to our DSL that allows the user to add the explicit dependencies amongst tasks. But this time instead of making it a standalone function we will make it a method of the class Task. This is only for getting some free syntactic sugar in the DSL. Here's the modified Task ADT ..

case class Task(name: String) {
  def dependsOn(on: Task): Writer[Endo[Project], Task] = {
    for {
      _ <- tell(((p: Project) => withDependency(this, on, p)).endo)
    } yield this
  }
}

Finally we define the last API of our DSL that glues together the building of the Project and the addition of tasks and dependencies without directly coupling the user to some of the underlying implementation artifacts.

def project(name: String, startDate: Date)(e: Writer[Endo[Project], Task]) = {
  val p = Project(name, startDate)
  e.run._1(p)
}

And we can finally create a Project along with tasks and dependencies using our DSL ..

project("xenos", now) {
  for {
    a <- task("study customer requirements")
    b <- task("analyze usecases")
    _ <- b dependsOn a
    c <- task("design & code")
    _ <- c dependsOn b
    d <- c dependsOn a
  } yield d
}

In case you are interested I have the whole working example in my github repo.

Friday, February 01, 2013

Modular Abstractions in Scala with Cakes and Path Dependent Types

I have been trying out various options of implementing the Cake pattern in Scala, considered to be one of the many ways of doing dependency injection without using any additional framework. There are other (more functional) ways of doing the same thing, one of which I blogged about before and also talked about in a NY Scala meetup. But I digress ..

Call it DI or not, the Cake pattern is one of the helpful techniques to implement modular abstractions in Scala. You weave your abstract components (aka traits), layering on the dependencies and commit to implementations only at the end of the world. I was trying to come up with an implementation that does not use self type annotations. It's not that I think self type annotations are kludgy or anything but I don't find them used elsewhere much besides the Cake pattern. And of course mutually recursive self annotations are a code smell that makes your system anti-modular.

In the following implementation I use path dependent types, which have become a regular feature in Scala 2.10. Incidentally it was there since long back under the blessings of an experimental feature, but has come out in public only in 2.10. The consequence is that instead of self type annotations or inheritance I will be configuring my dependencies using composition.

Let me start with some basic abstractions of a very simple domain model. The core component that I will build is a service that reports the portfolio of clients as a balance. The example has been simplified for illustration purposes - the actual real life model has a much more complex implementation.

A Portfolio is a collection of Balances. A Balance is a position of an Account in a specific Currency as on a particular Date. Expressing this in simple terms, we have the following traits ..

// currency
sealed trait Currency
case object USD extends Currency
case object EUR extends Currency
case object AUD extends Currency

//account
case class Account(no: String, name: String, openedOn: Date, status: String)

trait BalanceComponent {
  type Balance

  def balance(amount: Double, currency: Currency, asOf: Date): Balance
  def inBaseCurrency(b: Balance): Balance
}

The interesting point to note is that the actual type of Balance has been abstracted in BalanceComponent, since various services may choose to use various representations of a Balance. And this is one of the layers of the Cake that we will mix finally ..

Just a note for the uninitiated, a base currency is typically considered the domestic currency or accounting currency. For accounting purposes, a firm may use the base currency to represent all profits and losses. So we may have some service or component that would like to have the balances reported in base currency.

trait Portfolio {
  val bal: BalanceComponent
  import bal._

  def currentPortfolio(account: Account): List[Balance]
}

Portfolio uses the abstract BalanceComponent and does not commit to any specific implementation. And the Balance in the return type of the method currentPortfolio is actually a path dependent type, made to look nice through the object import syntax.

Now let's have some standalone implementations of the above components .. we are still not there yet to mix the cake ..

// report balance as a TUPLE3 - simple
trait SimpleBalanceComponent extends BalanceComponent {
  type Balance = (Double, Currency, Date)

  override def balance(amount: Double, currency: Currency, asOf: Date) = 
    (amount, currency, asOf)
  override def inBaseCurrency(b: Balance) = 
    ((b._1) * baseCurrencyFactor.get(b._2).get, baseCurrency, b._3)
}

// report balance as an ADT
trait CustomBalanceComponent extends BalanceComponent {
  type Balance = BalanceRep

  // balance representation
  case class BalanceRep(amount: Double, currency: Currency, asOf: Date)

  override def balance(amount: Double, currency: Currency, asOf: Date) = 
    BalanceRep(amount, currency, asOf)
  override def inBaseCurrency(b: Balance) = 
    BalanceRep((b.amount) * baseCurrencyFactor.get(b.currency).get, baseCurrency, b.asOf)
}

And a sample implementation of ClientPortfolio that adds logic without yet commiting to any concrete type for the BalanceComponent.

trait ClientPortfolio extends Portfolio {
  val bal: BalanceComponent
  import bal._

  override def currentPortfolio(account: Account) = {
    //.. actual impl will fetch from database
    List(
      balance(1000, EUR, Calendar.getInstance.getTime),
      balance(1500, AUD, Calendar.getInstance.getTime)
    )
  }
}

Similar to ClientPortfolio, we can have multiple implementations of Portfolio reporting that reports balances in various forms. So our cake has started taking shape. We have the Portfolio component and the BalanceComponent already weaved in without any implementation. Let's add yet another layer to the mix, maybe for fun - a decorator for the Portfolio.

We add Auditing as a component which can decorate *any* Portfolio component and report the balance of an account in base currency. Note that Auditing needs to abstract implementations of BalanceComponent as well as Portfolio since the idea is to decorate any Portfolio component using any of the underlying BalanceComponent implementations.

Many cake implementations use self type annotations (or inheritance) for this. I will be using composition and path dependent types.

trait Auditing extends Portfolio {
  val semantics: Portfolio
  val bal: semantics.bal.type
  import bal._

  override def currentPortfolio(account: Account) = {
    semantics.currentPortfolio(account) map inBaseCurrency
  }
}

Note how the Auditing component uses the same Balance implementation as the underlying decorated Portfolio component, enforced through path dependent types.

And we have reached the end of the world without yet committing to any implementation of our components .. But now let's do that and get a concrete service instantiated ..

object SimpleBalanceComponent extends SimpleBalanceComponent
object CustomBalanceComponent extends CustomBalanceComponent

object ClientPortfolioAuditService1 extends Auditing {
  val semantics = new ClientPortfolio { val bal = SimpleBalanceComponent }
  val bal: semantics.bal.type = semantics.bal
}

object ClientPortfolioAuditService2 extends Auditing {
  val semantics = new ClientPortfolio { val bal = CustomBalanceComponent }
  val bal: semantics.bal.type = semantics.bal
}

Try out in your Repl and see how the two services behave the same way abstracting away all implementations of components from the user ..

scala> ClientPortfolioAuditService1.currentPortfolio(Account("100", "dg", java.util.Calendar.getInstance.getTime, "a"))
res0: List[(Double, com.redis.cake.Currency, java.util.Date)] = List((1300.0,USD,Thu Jan 31 12:58:35 IST 2013), (1800.0,USD,Thu Jan 31 12:58:35 IST 2013))

scala> ClientPortfolioAuditService2.currentPortfolio(Account("100", "dg", java.util.Calendar.getInstance.getTime, "a"))
res1: List[com.redis.cake.ClientPortfolioAuditService2.bal.Balance] = List(BalanceRep(1300.0,USD,Thu Jan 31 12:58:46 IST 2013), BalanceRep(1800.0,USD,Thu Jan 31 12:58:46 IST 2013))

The technique discussed above is inspired from the paper Polymoprhic Embedding of DSLs. I have been using this technique for quite some time and I have discussed a somewhat similar implementation in my book DSLs In Action while discussing internal DSL design in Scala.

And in case you are interested in the full code, I have uploaded it on my Github.

Monday, January 14, 2013

A language and its interpretation - Learning free monads

I have been playing around with free monads of late and finding them more and more useful in implementing separation of concerns between pure data and its interpretation. Monads generally don't compose. But if you restrict monads to a particular form, then you can define a sum type that composes. In the paper Data Types a la carte, Wouter Swierstra describes this form as

data Term f a =
    Pure a
    | Impure (f (Term f a))

These monads consist of either pure values or an impure effect, constructed using f. When f is a functor, Term f is a monad. And in this case, Term f is the free monad being the left adjoint to the forgetful functor f.

I am not going into the details of what makes a monad free. I once asked this question on google+ and Edward Kmett came up with a beautiful explanation. So instead of trying to come up with a half assed version of the same, have a look at Ed's response here .

In short, we can say that a free monad is the freeest object possible that's still a monad.

Composition

Free monads compose and help you build larger abstractions which are pure data and yet manage to retain all properties of a monad. Hmm .. this sounds interesting because now we can not only build abstractions but make them extensible through composition by clients using the fact that it's still a monad.

A free monad is pure data, not yet interpreted, as we will see shortly. You can pass it to a separate interpreter (possibly multiple interpreters) which can do whatever you feel like with the structure. Your free monad remains pure while all impurities can be put inside your interpreters.

And so we interpret ..

In this post I will describe how I implemented an interpreter for a Joy like concatenative language that uses the stack for its computation. Of course it's just a prototype and addresses an extremely simplified subset, but you get the idea. The basic purpose is to explore the power of free monads and come up with something that can potentially be extended to a full blown implementation of the language.

When designing an interpreter there's always the risk of conflating the language along with the concerns of interpreting it. Here we will have the 2 completely decoupled by designing the core language constructs as free monads. Gabriel Gonzalez has written a series of blog posts [1,2,3] on the use of free monads which contain details of their virtues and usage patterns. My post is just an account of my learning experience. In fact after I wrote the post, I discovered a similar exercise done for embedding Forth in Haskell - so I guess I'm not the first one to learn free monads using a language interpreter.

Let's start with some code, which is basically a snippet of Joy like code that will run within our Haskell interpreter ..

p :: Joy ()
p = do push 5
       push 6
       add
       incr
       add2
       square
       cube
       end

This is our wish list and at the end of the post we will see if we can interpret this correctly and reason about some aspects of this code. Let's not bother much about the details of the above snippet. The important point is that if we fire it up in ghci, we will see that it's pure data!

*Joy> p
Free (Push 5 (Free (Push 6 (Free (Add (Free (Push 1 (Free (Add (Free (Push 1 (Free (Add (Free (Push 1 (Free (Add (Free (Dup (Free (Mult (Free (Dup (Free (Dup (Free (Mult (Free (Mult (Free End))))))))))))))))))))))))))))))
*Joy>

We haven't yet executed anything of the above code. It's completely free to be interpreted and possibly in multiple ways. So, we have achieved this isolation - that the data is freed from the interpreter. You want to develop a pretty printer for this data - go for it. You want to apply semantics and give an execution model based on Joy - do it.

Building the pure data (aka Builder)

Let's first define the core operators of the language ..

data JoyOperator cont = Push Int cont
                      | Add      cont 
                      | Mult     cont 
                      | Dup      cont 
                      | End          
                      deriving (Show, Functor)

The interesting piece is the derivation of the Functor, which is required for implementing the forgetful functor adjoint of the free monad. Keeping the technical mumbo jumbo aside, free monads are just a general way of turning functors into monads. So if we have a core operator f as a functor, we can get a free monad Free f out of it. And knowing something is a free monad helps you transform transform an operation over the monad (the monad homomorphism) into an operation over the functor (functor homomorphism). We will see how this helps later ..

The other point to note is that all operators take a continuation argument that points to the next operation in the chain. End is the terminal symbol and we plan to ignore anything that the user enters after an End.

Push takes an Int and pushes into the stack, Add pops the top 2 elements of the stack and pushes the sum, Mult does the same for multiplication and Dup duplicates the top element of the stack. End signifies the end of program.

Next we define the free monad over JoyOperator by using the Free data constructor, defined as part of Control.Monad.Free ..

data Free f a = Pure a | Free (f (Free f a))

-- | The free monad over JoyOperator
type Joy = Free JoyOperator

And then follow it up with some of the definitions of Joy operators as operations over free monads. Note that liftF lifts an operator (which is a Functor) into the context of a free monad. liftF has the following type ..

liftF :: Functor f => f a -> Free f a

As a property, a free moand has a forgetful functor as its left adjoint. The unlifting from the monad to the functor is given by the retract function ..

retract :: Monad f => Free f a -> f a

and needless to say

retract . liftF = id

-- | Push an integer to the stack
push :: Int -> Joy ()
push n = liftF $ Push n ()

-- | Add the top two numbers of the stack and push the sum
add :: Joy ()
add = liftF $ Add ()

.. and this can be done for all operators that we wish to support.

Not only this, we can also combine the above operators and build newer ones. Remember we are working with monads and hence the *do* notation based sequencing comes for free ..

-- | This combinator adds 1 to a number. 
incr :: Joy ()
incr = do {1; add}

-- | This combinator increments twice
add2 :: Joy ()
add2 = do {incr; incr}

-- | This combinator squares a number
square :: Joy ()
square = do {dup; mult}

Now we can have a composite program which sequences through the core operators as well as the ones we derive from them. And that's what we posted as our first example snippet of a target program.

An Interpreter (aka Visitor)

Once we have the pure data part done, let's try and build an interpreter that does the actual execution based on the semantics that we defined on the operators.

-- | Run a joy program. Result is either an Int or an error
runProgram :: Joy n -> Either JoyError Int
runProgram program = joy [] program
  where joy stack (Free (Push v cont))         = joy (v : stack) cont
        joy (a : b : stack) (Free (Add cont))  = joy (a + b : stack) cont
        joy (a : b : stack) (Free (Mult cont)) = joy (a * b : stack) cont
        joy (a : stack) (Free (Dup cont))      = joy (a : a : stack) cont
        joy _ (Free Add {})                    = Left NotEnoughParamsOnStack
        joy _ (Free Dup {})                    = Left NotEnoughParamsOnStack
        joy [] (Free End)                      = Left NotEnoughParamsOnStack
        joy [result] (Free End)                = Right result
        joy _ (Free End)                       = Left NotEmptyOnEnd
        joy _ Pure {}                          = Left NoEnd

runProgram is the interpreter that takes a free monad as its input. Its implementation is quite trivial - it just matches on the recursive structure of the data and pushes the appropriate results on the stack. Now if we run our program p using the above interpreter we get the correct result ..

*Joy> runProgram p
Right 7529536
*Joy>

Equational Reasoning

Being Haskell and being pure, we obviously can prove some bits and pieces of our program as mathematical equations. At the beginning I said that End is the end of the program and anything after End needs to be ignored. What happens if we do the following ..

runProgram $ do {push 5; incr; incr; end; incr; incr}

If you have guessed correctly we get Right 7, which means that all operations after end have been ignored. But can we prove that our program indeed does this ? The following is a proof that end indeed ends our program. Consider some operation m follows end ..

end >> m

-- definition of end
= liftF $ End >> m

-- m >> m' = m >>= \_ -> m'
= liftF $ End >>= \_ -> m

-- definition of liftF
= Free End >>= \_ -> m

-- Free m >>= f = Free (fmap (>>= f) m)
= Free (fmap (>>= \_ -> m) End)

-- fmap f End = End
= Free End

-- liftF f = Free f (f is a functor)
= liftF End

-- definition of End
= end

So this shows that any operation we do after end is never executed.

Improving the bind

Free monads offer improved modularity by allowing us to separate the building of pure data from the (possibly) impure concerns of interpreting it with the external world. But a naive implementation leads to a penalty in runtime efficiency as Janis Voigtlander discusses in his paper Asymptotic Improvement of Computations over Free Monads. And the Haskell's implementation of free monads had this inefficiency where the asymptotic complexity of substitution was quadratic because of left associative bind. Edward Kmett implemented the Janis trick and engineered a solution that gets over this inefficiency. I will discuss this in a future post. And if you would like to play around with the interpreter, the source is there in my github.

Thursday, January 03, 2013

strict : recursion :: non-strict : co-recursion

Consider the very popular algorithm that uses a tail recursive call to implement a map over a List. Here's the implementation in F#

let map converter l =
  let rec loop acc = function
    | [] -> acc
    | hd :: tl -> loop (converter hd :: acc) tl
  List.rev (loop [] l)

Scala is also a statically typed functional programming language, though it uses a completely different trick and mutability to implement map over a sequence. Let's ignore it for the time being and imagine we are being a bonafide functional programmer.

The above code uses a very common idiom of accumulate-and-reverse to implement a tail recursive algorithm. Though Scala stdlib does not use this technique and uses a mutable construct for this implementation, we could have done the same thing in Scala as well ..

Both Scala and F# are languages with strict evaluation semantics as the default. What would a similar tail recursive Haskell implementation look like ?

map' f xs = reverse $ go [] xs
    where
    go accum [] = accum
    go accum (x:xs) = go (f x : accum) xs

Let's now have a look at the actual implementation of map in Haskell prelude ..

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

Whoa! It's not a tail recursion and instead a body recursive implementation. Why is that ? We have been taught that tail recursion is the holy grail since it takes constant stack space and all ..

In a strict language the above implementation of map will be a bad idea since it uses linear stack space. On the other hand the initial implementation is tail recursive. But Haskell has non-strict semantics and hence the last version of map consumes only one element of the input and yields one element of the output. The earlier versions consume the whole input before yielding any output.

In a lazy language what you need is to make your algorithm co-recursive. And this needs co-data. Understanding co-data or co-recursion needs a brief background of co-induction and how it differs from induction.

When we define a list in Haskell as

data [a] = [] | a : [a]

it means that the set "List of a" is the smallest set such that [] is in the List of a, and if [a] is in the List of a and a is in a, then a : [a] is in List of a. This is the inductive definition os a List and we can use recursion to implement various properties of a List. Also once we have a recursive definition, we can use structural induction to prove various properties of the data structure.

If an inductive definition on data gives us the smallest set, a co-inductive definition on co-data gives us the largest set. Let's define a Stream in Haskell ..

data Stream a = Cons a (Stream a)

First to note - unlike the definition of a List, there's no special case for empty Stream. Hence no base case unlike the inductive definition above. And a "Stream of a" is the largest set consisting of a pair of an "a" and a "Stream of a". Which means that Stream is an infinite data structure i.e. the range (or co-domain) of a Stream is infinite. And we implement properties on co-inductive data structures using co-recursive programs. A co-recursive program is defined as one whose range is a type defined recursively as the greatest solution of some equation.

In terms of a little bit of mathematical concepts, if we model types as sets, then an infinite list of integers is the greatest set X for which there is a bijection X ≅ ℤ x X and a program that generates such a list is a co-recursive program. As its dual, the type of a finite list is the least set X for which X ≅ 1 + (ℤ x X), where 1 is a singleton set and + is the disjoint union of sets. Any program which consumes such an input is a recursive program. (Note the dualities in italics).

Strictly speaking, the co-domain does not necessarily have to be infinite. If you have read my earlier post on initial algebra, co-recursion recursively defines functions whose co-domain is a final data type, dual to the way that ordinary recursion recursively defines functions whose domain is an initial data type.

In case of primitive recursion (with the List data type above), you always frame the recursive step to operate on data which gets reduced in size in subsequent steps of recursive calls. It's not so with co-recursion since you may be working with co-data and infinite data structures. Since in Haskell, a List can also be infinite, using the above co-recursive definition of map, we can have

head $ map (1+) [1..]

which invokes map on an infinite list of integers. And here the co-recursive steps of map operate successively on sets of data which are not less than the earlier set. So, it's not tail recursion that makes an efficient implementation in Haskell, you need to make the co-recursive call within the application of a constructor.

As the first post of the new year, that's all friends. Consider this as the notes from someone learning the principles of co-inductive definitions. Who knew co-recursion is so interesting ? It will be more interesting once we get down into the proof techniques for co-recursive programs. But that's for a future post ..