Working towards a more stable Template Haskell

Posted on 2024-05-06 by Teo Camarasu

Template Haskell enables writing programs that generate and manipulate Haskell code. This is a powerful tool that lets us avoid boilerplate and expand the expressivity of the language. Yet, its power comes with a risk. Users of Template Haskell often end up being tightly coupled to the template-haskell support library. This library defines Template Haskell’s version of the Haskell syntax tree and the interface between Template Haskell and GHC. Almost every release of GHC leads to a breaking change to this library to reflect the ever-expanding range of syntax accepted by GHC. These changes often break user code. Over time, libraries that use template-haskell amass a pile of conditionally compiled code to stay compatible with a wide range of template-haskell versions.

This post is about some work I have been doing with Sebastian Graf and other GHC devs to help avoid Template Haskell breaking users’ code in the future. This is a well-known problem (see GHC ticket #24021). There is no simple, single solution. Both the API and its patterns of usage are complex.

So far, our focus has been on the internal relationship between GHC and the template-haskell library. We managed to untie some knots that tightly coupled each version of GHC to its bundled version of template-haskell, and knots that made it difficult to change the interface of template-haskell. With these blockers out of the way, it should be possible to make some changes to give us much stronger stability guarantees.

Before embarking on implementing these exciting improvements, we wanted to share our plans to get feedback and to invite the community to collaborate with us.

I’ll sketch how Template Haskell leads to instability, the changes we are planning to make, the changes we’ve made so far and how you can get involved.

Why Template Haskell leads to broken code

At the very core of the template-haskell library is a set of data types that represent Haskell syntax trees: Exp, Type, Dec, Pat, etc. These types often change, because they reflect the source-syntax of GHC Haskell, in its full glory. That in turn means that any code that directly uses these data types (for construction or pattern matching) will break as new releases of GHC change the source-syntax. We want to loosen this undesirable coupling between the source-syntax and user code.

We can divide usages of Template Haskell by how they relate to these syntax tree types. At a high level, a specific usage either produces/constructs or consumes/deconstructs these types. We can further subdivide these into four categories:

(A) Producing with quotes
(B) Producing with constructors or smart constructors
(C) Consuming through reification
(D) Consuming for syntax analysis

A library that makes use of Template Haskell is likely to have a mixture of these usage patterns. For instance, a common case is a library that produces derived typeclass instances. Such a library might use reification (type C) to look up the definition of a data type, then use smart constructors (type B) to generate the definition of the instances themselves.

Let’s now briefly look at each of these usage classes, and their bearing on user code stability.

Producing syntax using constructors: unstable, but popular (type B)

By far the most popular way to produce Template Haskell syntax trees is by either directly using constructors (VarE), or by using the monadic smart constructors (varE) exported by Language.Haskell.TH.Lib. These both make users vulnerable to breakage when the syntax tree types change.

For instance, in template-haskell-2.18, the ConP constructor, which represents a constructor pattern, was given a new field to represent the possibility of a list of type applications preceding the arguments. This caused the following code in esqueleto to break:

pure $ ConP 'Value [VarP var]

This then had to be patched to give an empty value for the new field (and conditional compilation had to be introduced to support both versions):

pure $ ConP 'Value [] [VarP var]

By using these constructors or smart constructors directly to produce syntax trees, users expose themselves to breakage whenever a new field is added. And as we’ve pointed out, these happen often.
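The conditional compilation mentioned above might look roughly like the following. This is a minimal sketch, not esqueleto’s actual code: the Value newtype and the valuePat helper are hypothetical stand-ins added to keep the example self-contained, and it assumes the MIN_VERSION_template_haskell macro is available (Cabal provides it, as do recent GHCs for exposed packages).

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH

-- Hypothetical stand-in for esqueleto's Value type (assumption, for illustration only).
newtype Value a = Value a

-- Build the pattern `Value var` by applying the ConP constructor directly.
valuePat :: Name -> Pat
#if MIN_VERSION_template_haskell(2,18,0)
-- template-haskell >= 2.18: ConP also carries a list of type applications.
valuePat var = ConP 'Value [] [VarP var]
#else
-- Older versions: ConP takes only the constructor name and its argument patterns.
valuePat var = ConP 'Value [VarP var]
#endif
```

Every further change to ConP would add another branch here, which is how libraries end up carrying a growing amount of version-specific code.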

Producing syntax using Quotes: stable, but unpopular (type A)

Template Haskell quotes produce syntax trees in a very stable manner. Haskell’s concrete syntax develops in a backwards compatible way. Changes to the language will not cause x + y to stop being accepted, and neither will it break [| x + y |], irrespective of whether the representation of infix operations in the Template Haskell syntax tree types changes. By using quotes, we can piggy-back off these existing backwards compatibility guarantees.

Returning to our example from before, we could have expressed the same thing using quote syntax as:

[p| Value $(varP var) |]

This wouldn’t have broken, and is unlikely to ever break with a newer version of template-haskell.
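To make the quote-based version concrete, here is a small self-contained sketch. As before, the Value newtype and the valuePat helper are hypothetical names introduced for illustration; only the quote itself comes from the example above.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH

-- Hypothetical stand-in for esqueleto's Value type (assumption, for illustration only).
newtype Value a = Value a

-- Build the pattern `Value var` with a pattern quote.
-- No field of ConP is mentioned, so no CPP is needed when ConP changes shape.
valuePat :: Name -> Q Pat
valuePat var = [p| Value $(varP var) |]
```

The same source compiles against template-haskell versions on either side of the 2.18 change, because GHC fills in whatever fields the current ConP happens to have.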

Yet, quotes are much less popular than direct uses of constructors. There are definitely some places where quotes (currently) fall short and we are trying to track these here. These don’t completely explain their low uptake though. The example above shows one place where they could’ve been used to make code more stable, but weren’t.

Next time you are writing Template Haskell code, try using quotes. If you run into issues let us know on the GHC bug tracker, whether they are to do with limitations in expressivity, error messages, or lack of documentation.

Consuming syntax through reification (type C)

template-haskell exports a reify family of functions to look up information about identifiers.

For instance:

reifyType :: Name -> Q Type

Here reifyType will return the type of the given identifier. But the notion of type in play is that of source-language type syntax, which is full of syntactic clutter like parentheses, infix operators, list and tuple notation, etc. For reification we don’t want source syntax; we want a nice small representation of the type (type variable, arrow, forall, type constructor application etc). This is simply a mis-design of Template Haskell’s reification interface.

Not only does this get in the way when we want to analyse the results from reification functions, since we need to wade through the details of source-language syntax, but we are also exposed to breaking changes whenever these types change. Returning to the ConP example, if we pattern matched on a ConP constructor, our code would break when the new field was added.
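As an illustration of this kind of coupling, the sketch below reifies a data type and matches on the result positionally. The Colour type and colourNames are hypothetical names for the example. The positional pattern on DataD assumes the six-field shape that template-haskell has had for some time; it would need updating if a field were ever added.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH

-- Hypothetical example type.
data Colour = Red | Green | Blue

-- An empty declaration splice ends the current declaration group,
-- so that Colour is visible to reify in the splice below.
$(pure [])

-- The constructor names of Colour, computed at compile time.
colourNames :: [String]
colourNames = $(do
  info <- reify ''Colour
  case info of
    -- Positional match: breaks if DataD ever gains or loses a field.
    TyConI (DataD _ _ _ _ cons _) ->
      listE [ stringE (nameBase n) | NormalC n _ <- cons ]
    _ -> fail "expected a plain data type")
```

Here the code only cares about the list of constructors, yet it has to spell out every other field of DataD to get at it.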

Consuming syntax for analysis (type D)

Exhaustively pattern matching on Template Haskell syntax trees is the rarest usage pattern. Users might for instance want to find all free variables, or transform the entire syntax tree into a different representation.

Such users are inherently tightly coupled to the exact definition of syntax trees for a given version of GHC. Because this case is both much more difficult and much more rare than the others, we currently aren’t focused on avoiding breaking changes here. Indeed it’s hard to see how that could be possible.

How we can avoid tight coupling

Now we move from the current situation to the ways we can improve it. In this section I will be summarising the Plan to Stabilise Template Haskell document I’ve written up, which in turn is trying to synthesise the various ideas by GHC developers and others into a concrete plan. None of these plans are set in stone, and they could benefit from feedback and experimentation.

The tight coupling we have sketched consists of two components: (i) the definitions of the syntax tree types change regularly with new versions of GHC; (ii) user code depends on the exact definitions of the syntax tree types. When these two elements combine, code breaks. So, our aim is to reduce the prevalence of one (or both) of these as much as possible.

Loosening coupling between GHC and `template-haskell`

Of the usage patterns we’ve examined, only one, type D, cares about seeing the entire, exact syntax trees that correspond to GHC’s current version of the source-language. All of the other use cases can be satisfied with a mere subset of the language. This opens up a possibility that allows us to avoid breaking changes for the vast majority of users, without requiring any modifications to their usage of Template Haskell. The idea is to let a user use a version of template-haskell whose syntax potentially corresponds to the source-language of an older GHC. In so doing, we can decouple releases of GHC and template-haskell, so that template-haskell can make breaking changes on its own schedule, accompanied by migration plans.

For instance, let’s again examine the ConP breaking change. This came out in template-haskell-2.18 bundled with GHC-9.2. But consider if we had released a minor version of template-haskell-2.17 that insulated against this change by exporting a pattern synonym that defaulted the new field always to [], and ignored the field when pattern matching. Then users could have upgraded their GHC without even having to change the bound on their template-haskell dependency and their code would have continued to compile. This strategy wasn’t actually possible to implement at the time due to some blockers that we’ve only recently cleared.
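A compatibility pattern synonym of the kind described above could look something like this. It is only a sketch under stated assumptions: the name ConPCompat is hypothetical (a real shim would presumably be exported under the name ConP by the older package version), and the code assumes it is compiled against template-haskell 2.18 or later, where ConP has three fields.

```haskell
{-# LANGUAGE PatternSynonyms #-}

import Language.Haskell.TH

-- Hypothetical shim presenting the old two-field view of ConP.
pattern ConPCompat :: Name -> [Pat] -> Pat
-- Matching: ignore the list of type applications.
pattern ConPCompat name args <- ConP name _ args
  where
    -- Construction: default the list of type applications to [].
    ConPCompat name args = ConP name [] args
```

Code written against the old shape, such as `ConPCompat 'Value [VarP var]`, would then keep compiling, at the cost of not being able to see or produce type applications through this view.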

This sort of migration strategy would work for type A, B, and C clients, but cause issues for type D clients, who explicitly want the latest syntax tree. We would have to enable some way for them to access this without any compatibility shims, perhaps through a different template-haskell-unstable package.

This example is only one of the possibilities amongst many of how this migration could have been implemented. Another option would be to rename the constructor when adding the new field and keep the name ConP for a compatibility pattern synonym that matches the old definition.

How exactly we implement these should be decided on a case by case basis. The important detail is that by decoupling template-haskell and GHC, we introduce slack into the relationship that makes these sorts of gradual migration plans possible.

Any sudden ecosystem migration is a great burden on maintainers, and can be slow to fully implement. Therefore this strategy is very attractive because it requires no work for downstream maintainers. This complements greater usage of quotes well. Quotes still should probably be used for new Template Haskell code. But by implementing something like this, we avoid putting maintainers into a double bind where they either need to quickly rewrite their code into a different style or suffer from breaking changes. This would give time for the transition to happen more gradually.

These compatibility shims could likely not be maintained indefinitely, so users would benefit from a sliding window of compatibility. They would still need to upgrade their template-haskell package bounds eventually.

This naturally leads to another strategy. We can also expose copies of the syntax tree types that correspond to some well-defined version of the language like Haskell2010. If users move to depending on these interfaces then they can get even stronger stability guarantees. Each version of the template-haskell interface would only have a moving window of support for GHC versions, but we could support these well-defined versions almost indefinitely. The trade-off is that this would require modifications to downstream code, but still much less modification than rewriting one’s code to use quotes throughout.

Loosening coupling between `template-haskell` and user code

We plan to improve the interface of Template Haskell, so that users aren’t forced into tight coupling.

We are hoping to find the pain points and improve the experience of using Template Haskell quotes. Then users could benefit both from this nicer syntax and the stability benefits it would bring. For instance, we are looking at adding (untyped) pattern quotes that would allow one to pattern match on Template Haskell syntax trees using quotes.

We are planning to improve the experience of deconstructing syntax tree values by introducing named record fields. These would allow users to only look at the fields they care about and mean that their pattern matching code wouldn’t break when new fields are added. This would primarily help type D clients, but would also benefit type C clients who occasionally want to pattern match shallowly on Template Haskell syntax trees.
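To show why named fields help, here is a sketch using a made-up record type. ConPat and its field names are entirely hypothetical and do not reflect any planned template-haskell API; the point is only that a record pattern names the fields it needs and ignores the rest.

```haskell
{-# LANGUAGE NamedFieldPuns #-}

import Language.Haskell.TH (Name, Pat (..), Type, mkName)

-- Hypothetical record-style view of a constructor pattern (not a real TH type).
data ConPat = ConPat
  { conPatName     :: Name
  , conPatTypeArgs :: [Type]
  , conPatArgs     :: [Pat]
  }

-- Only conPatArgs is mentioned, so adding another field to ConPat
-- would not break this function.
conPatArity :: ConPat -> Int
conPatArity ConPat{conPatArgs} = length conPatArgs
```

Compare this with a positional match such as `ConP _ _ args`, which has to be edited every time the number of fields changes.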

To improve the experience when using reify and friends, we are planning to transition these functions to return some sort of normalised types rather than the syntax tree types. These would be closer to GHC’s Core or System F than to surface syntax. This is what users want anyway when performing analysis. For instance, libraries like th-abstraction and th-desugar already implement forms of normalisation (to differing degrees), and express an appetite for this. We can draw on these libraries to guide our design.
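As a rough idea of what normalisation means here, the sketch below converts a fragment of the surface Type syntax into a much smaller representation. NType and normalise are hypothetical names invented for this example; they are not the planned interface, nor the representation used by th-abstraction or th-desugar, and many constructors (foralls, kind signatures, promoted types and so on) are deliberately left unhandled.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Language.Haskell.TH

-- Hypothetical small type representation: no parentheses,
-- no infix operators, no special list or tuple syntax.
data NType
  = NVar Name
  | NCon Name
  | NApp NType NType
  | NFun NType NType
  deriving (Eq, Show)

-- Strip surface-syntax clutter; returns Nothing for forms this sketch does not cover.
normalise :: Type -> Maybe NType
normalise ty = case ty of
  ParensT t              -> normalise t
  AppT (AppT ArrowT a) b -> NFun <$> normalise a <*> normalise b
  AppT f x               -> NApp <$> normalise f <*> normalise x
  InfixT a op b          -> (\l r -> NApp (NApp (NCon op) l) r)
                              <$> normalise a <*> normalise b
  ListT                  -> Just (NCon ''[])
  TupleT n               -> Just (NCon (tupleTypeName n))
  VarT n                 -> Just (NVar n)
  ConT n                 -> Just (NCon n)
  _                      -> Nothing
```

Client code that analyses NType has four cases to consider, and none of them needs to change when GHC gains new surface syntax for types.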

What’s happened so far

My efforts so far have been focused on clearing roadblocks that have blocked changes like these in the past. I have merged two MRs into GHC: !12306 and !12479. These have benefited greatly from reviews and collaboration from Sebastian Graf, Matthew Craven and others. Collectively these two change the relationship between GHC and template-haskell, so that it becomes a library just like any other.

The first of these changes the way the bootstrapping of Template Haskell works. Historically, the process required that the interface of the new template-haskell library matched the interface of the template-haskell library that came with the bootstrapping compiler in certain ways. This made it difficult to refactor template-haskell (see #23536). Small changes could lead to difficult-to-debug errors. We resolved this through a form of vendoring, which in essence allows two versions of template-haskell to co-exist at the same time and thus removes the requirement.

The second change moves any identifiers that have a special wired-in status to the ghc-internal package, mirroring the changes to base. This means that template-haskell can become a completely normal package that just happens to be where certain wired-in identifiers are conventionally re-exported from.

With these changes implemented, the ideas I sketched out in the previous section are now much more easily achievable.

How you can get involved

This is a great stage to get involved. I would love to hear feedback on the plan I’ve sketched and, if you have time, on the Plan to Stabilise Template Haskell document.

I would especially like to hear about concrete examples where Template Haskell quotes have been lacking to help guide us to improve this feature. We feel like Template Haskell quotes are vastly underutilised and would love to help remove barriers to people adopting them. We are trying to gather up issues here.

It would also be great to see more people get involved in this process, and maybe help prototype some of these ideas. You don’t have to be a confident GHC developer to take part!

A few of us will be at ZuriHac 2024 where we will be working on improving Template Haskell. Come join us! Sebastian has collated some tickets about improving Template Haskell (not all stability related).

The future is looking bright for Template Haskell. I’m really hoping that very soon we will be able to give strong guarantees to users that upgrading GHC won’t lead to any breakages caused by Template Haskell.

Acknowledgments

Thanks to Adam Gundry, Sebastian Graf, Simon Peyton Jones, and Michael Peyton Jones for reading and commenting on a draft of this post.