Compiler project: The beginning

This is the first post in a series about one of my craziest projects: writing a Haskell compiler from scratch. This post covers the context of the project and gives a “mission statement”. Let’s get this show on the road!

Introduction

Among my many projects is a Haskell compiler targeting virtual machines and similar environments. One reason for this project is that I like Haskell, and writing a compiler for it seems to be a good way to improve my command of the language. Another reason is that I am curious about compilers, and writing one, even for a language as peculiar as Haskell, seems to be a good way to satisfy that curiosity.

The project is to write a Haskell compiler targeting virtual machines, beginning with .Net’s, though it may later also target the JVM or Parrot. In the future, it might even target more exotic platforms such as WAM, SQL, OpenCL, OpenGL or PostScript. One goal of that choice is portability; another is that this toy compiler should not be totally useless.

Objectives, wistful goals & non-objectives

N.B.: In what follows, a “host” means the target for which we generate code, be it .Net’s VM, the JVM or SQL.

We’ll begin with what the compiler should do when it’s finished:

  • Compile code conforming to either Haskell 98 or Haskell 2010, with an option to choose between the two standards.
  • Compile code using common extensions, be they syntax extensions (e.g. monad comprehensions) or library ones (e.g. Concurrent Haskell).
  • Make calls to and from host functions as easy as possible.
  • Use the host’s standard library as much as feasible to implement Haskell’s.
  • Map Haskell’s concepts to the host’s as far as is reasonable (e.g. Haskell’s packages).

The first two points are par for the course for a compiler. Points 3 to 5 are there because I want the generated code to be as “transparent” as possible from the host’s point of view.

We’ll continue with what the compiler might do if I have the time:

  • Have an option to generate code with a non-strict non-lazy evaluation strategy (e.g. a classic call-by-name or a more exotic call-by-future).
  • Experiment with code generation (e.g. automatic memoization).
  • Make other FFI calls as easy as possible.
  • Use the host’s facilities for debugging, profiling, etc. as much as possible.
  • Be able to self-compile a working version of its Haskell parts.
  • Be able to target every other host from a compiler running on any one of them.

It may take some effort, but it would be awesome to have lazy and non-lazy code interact seamlessly, with both code bases keeping their non-strict semantics and without needing to recompile the called code. The calling code might then need information about the called code’s evaluation strategy.

P.S.: While the last point of the list is reasonable for hosts such as .Net or Parrot and, at a stretch, OpenCL or OpenGL, I don’t think it is for hosts such as SQL or PostScript. So I wouldn’t hold my breath for a PostScript-based compiler targeting SQL, for example.
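
To make the difference between lazy (call-by-need) and non-strict non-lazy (here, call-by-name) evaluation concrete, here is a minimal sketch that models both strategies explicitly in IO. It only illustrates the idea and says nothing about how the compiler will represent thunks; every name in it (Thunk, byName, byNeed, double, countEvaluations) is hypothetical.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef, writeIORef)

-- | A suspended computation: running it yields the value.
--   (Hypothetical model, not the compiler's actual representation.)
type Thunk a = IO a

-- | Call-by-name: the argument is re-run every time it is used.
byName :: IO a -> IO (Thunk a)
byName = return

-- | Call-by-need (lazy): the argument is run on first use, then cached.
byNeed :: IO a -> IO (Thunk a)
byNeed action = do
  cache <- newIORef Nothing
  return $ do
    cached <- readIORef cache
    case cached of
      Just value -> return value
      Nothing -> do
        value <- action
        writeIORef cache (Just value)
        return value

-- | A function that uses its argument twice.
double :: Thunk Int -> IO Int
double x = (+) <$> x <*> x

-- | Apply 'double' to an argument that counts how often it is evaluated.
--   Returns (result, number of evaluations of the argument).
countEvaluations :: (IO Int -> IO (Thunk Int)) -> IO (Int, Int)
countEvaluations strategy = do
  counter <- newIORef (0 :: Int)
  thunk <- strategy (modifyIORef' counter (+ 1) >> return 21)
  result <- double thunk
  evaluations <- readIORef counter
  return (result, evaluations)
```

With these definitions, countEvaluations byName yields (42, 2) and countEvaluations byNeed yields (42, 1): both strategies give the same result, which is what lets them share non-strict semantics, but only call-by-need avoids evaluating the argument twice.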

We’ll conclude with what the compiler won’t even pretend to do:

  • Be a highly optimizing and efficient compiler: it is a pet project of mine, after all.
  • Behave identically from one target to another: I wouldn’t bat an eyelid if the .Net version were a lazy implementation and the JVM one a non-strict non-lazy one.
  • Generate interoperable code from one version to another, at least in the beginning.

Yes, that does mean that performance will not be my primary goal: correctness will be a difficult enough goal, I’m afraid.

What is JSON-RPC?

Welcome to the first blog post of this series devoted to my project of a JSON-RPC client implementation in Haskell.

Definition and description

JSON-RPC is a lightweight RPC protocol that uses JSON as its encoding and is typically carried over HTTP or raw TCP/IP sockets. It defines requests, which expect a response, and notifications, which do not. The second version of the protocol is explicitly transport-agnostic and adds a means of batching calls and notifications. However, due to its simplicity, it defines neither authentication nor a means of querying the server about the functions it implements.
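
As an illustration of what travels over the wire, here is a minimal sketch that builds JSON-RPC 2.0 messages using only the base library. The member names jsonrpc, method, params and id come from the 2.0 specification; the JSON type, the helper names (render, request, notification, batch) and the method names used in the example are assumptions made for illustration, not the API my library will have. A real client would rely on a proper JSON library rather than this toy renderer.

```haskell
import Data.List (intercalate)

-- | A deliberately tiny JSON value type (hypothetical, for illustration only).
data JSON
  = JNumber Int
  | JString String
  | JArray [JSON]
  | JObject [(String, JSON)]

-- | Render a value as compact JSON text. Only quotes and backslashes are
--   escaped in strings, which is enough for this sketch.
render :: JSON -> String
render (JNumber n) = show n
render (JString s) = "\"" ++ concatMap escape s ++ "\""
  where
    escape '"'  = "\\\""
    escape '\\' = "\\\\"
    escape c    = [c]
render (JArray xs) = "[" ++ intercalate "," (map render xs) ++ "]"
render (JObject fields) = "{" ++ intercalate "," (map field fields) ++ "}"
  where
    field (key, value) = render (JString key) ++ ":" ++ render value

-- | A call that expects a response: it carries an "id" member.
request :: Int -> String -> [JSON] -> JSON
request ident method params = JObject
  [ ("jsonrpc", JString "2.0")
  , ("method", JString method)
  , ("params", JArray params)
  , ("id", JNumber ident)
  ]

-- | A notification: same shape as a request, but without an "id" member,
--   so the server sends no response.
notification :: String -> [JSON] -> JSON
notification method params = JObject
  [ ("jsonrpc", JString "2.0")
  , ("method", JString method)
  , ("params", JArray params)
  ]

-- | A batch (second version of the protocol only) is a JSON array of messages.
batch :: [JSON] -> JSON
batch = JArray
```

For instance, render (request 1 "subtract" [JNumber 42, JNumber 23]) produces {"jsonrpc":"2.0","method":"subtract","params":[42,23],"id":1}, and wrapping that request and a notification in batch produces a single JSON array containing both.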

Why will I implement it?

First of all, I’ll implement it because it is a simple textual RPC protocol that runs over a transport protocol already implemented in Haskell. Another reason is that there are many implementations of this protocol, which lets me test my client against an existing server. It also serves as a test of my abilities as a Haskell developer and as a spec reader. Finally, previous experience has taught me that trying to tackle everything at once in my quest to become a better Haskell programmer is a bad idea: I’ll start with networking (without taking care of protocol details) and leave writing the network encoder and handling the low-level details to my next project.

Why Haskell?

Cf. my post on RFC 707.

And afterwards?

The next blog post in this series will be devoted to the API my library will have and to the implementation choices I’ll make. After all, I have to choose between two versions of the protocol.

After this project, I’ll probably get back to my project of an RFC 707 implementation.

What is RFC 707?

Welcome to the first blog post of this series devoted to my project of an RFC 707 implementation in Haskell.

Definition and description

RFC 707, whose formal name is “A High-Level Framework for Network-Based Resource Sharing” and which was published on the 14th of January 1976, is an early Remote Procedure Call system. It describes how a networked procedure call is to be made, the format each message (call and return) must follow and the binary encoding of each message. In essence, it is the ancestor of ONC RPC, CORBA, SOAP… However, it doesn’t describe the transport protocol used, the ports used, any form of authentication… It has the drawback of being underspecified enough that two implementations have little chance of understanding each other, and the advantage of being very lightweight.

Why will I implement it?

First of all, I’ll implement it because it is a simple binary RPC protocol, a stepping stone to my project of an ONC RPC client implementation. Another reason is that, to my knowledge, there is no implementation of this protocol, which gives me more freedom in handling its unspecified parts, such as how to handle floating-point numbers. It also serves as a test of my abilities as a Haskell developer and as a spec reader. Finally, RPC systems often form the basis of distributed file systems such as NFS or 9P.

Why Haskell?

Because I appreciate this language and want to learn it, and such a library seems like a good way to do so: after all, monads (and I/O in particular) are often cited as THE stumbling block in learning and mastering this language.

And afterwards?

The next blog post in this series will be devoted to the API my library will have and to the implementation choices I’ll make. After all, as I said above, this RPC method is woefully underspecified by its RFC.

After this project, I’ll tackle either the ONC RPC client implementation or the AWT curses implementation.

Projects list

Featured

Here is the list of my projects featured on this blog, with their status:

  • JSON-RPC client implementation in Haskell (started: repo is here)
  • RFC 707 implementation in Haskell (on hold)
  • ONC RPC client implementation in Haskell (not started)
  • AWT implementation in curses, using caciocavallo (not started)
  • Haskell compiler in Haskell (not started)
  • file(1) implementation using shared-mime-info as its source of information (started: see here); that project is in Perl.

For the time being, all my projects are on hold or not even started because of my studies… I will, however, talk about my RFC 707 implementation project, given that it has already been started.

N.B.: I may also do some of these projects in other languages (e.g. Perl) if the fancy takes me.