Monday, January 15, 2024

Don't Use String Templates for Structured Output


['html', foo]


You're writing a web server that does server-side rendering of HTML for a website. Pages are mostly static, but they differ slightly from session to session.

I know, I'll write a page.template.html that looks like HTML but contains {{special syntax}} or maybe <%special syntax%>. Then my web server will read the file, replace the special syntax sections with markup based on logic of my choosing, and then serve the resulting string as HTML.

You're writing a Kubernetes controller, and you know that the resources managed by the controller will need their own YAML configurations, but the YAML will have to differ based on the configuration of the controller.

I know, I'll write a deployment.yaml file that isn't actually YAML, but instead is a mix of YAML and Go's text template syntax. I'm using Go already, so I can just tmpl.Execute(output, myData), and voilà, myData.Thingy is now part of the YAML where there was once {{.Thingy}}.

You're writing a code generator to automate the more tedious parts of a programmatic interface. Maybe the generated code contains message types for use with an RPC framework in a statically typed language. Maybe the generated code marshals the results of known SQL statements into a library's types. Maybe the generated code encodes a set of types into some serialization format like JSON or Protocol Buffers.

I know, using string templates naively could get out of hand here since we're working with a general-purpose programming language. I'll just write escaping functions to make sure that " is escaped when inside of a double-quoted string, and I'll write validation functions to make sure that the name of a variable doesn't include a comment opening sequence (/*).

I can use string templates to generate everything, so long as I'm careful.


Stop doing this. If your goal is to end up with a string containing language X, then you cannot begin with a string. Instead, you must begin with a representation of language X. Then, perform any desired transformations within the representation of language X, and only at the very end render the representation into a string.
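
To make that concrete, here is a minimal sketch in Python, assuming the target language is HTML and that the standard library's xml.etree.ElementTree is an acceptable stand-in for a proper HTML representation. The function names (build_page, add_logout_link, render_page) and the page contents are made up for illustration. The page exists as a tree the whole time, the session-specific change is made to the tree, and a string appears only in the last step.

```python
import xml.etree.ElementTree as ET


def build_page(user_name):
    """Build the page as a tree of elements, not as a string."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    greeting = ET.SubElement(body, "p", {"class": "greeting"})
    # The name is stored as plain text data; no manual escaping here.
    greeting.text = "Hello, " + user_name
    return html


def add_logout_link(page):
    """A transformation performed on the tree itself."""
    body = page.find("body")
    link = ET.SubElement(body, "a", {"href": "/logout"})
    link.text = "Log out"
    return page


def render_page(page):
    """Only at the very end does the tree become a string."""
    return ET.tostring(page, encoding="unicode", method="html")


# A hostile "name" stays text because the serializer escapes it.
page = add_logout_link(build_page("<script>alert(1)</script>"))
print(render_page(page))
```

Nothing in build_page or add_logout_link knows how HTML is spelled. Escaping is the serializer's job, so there is no place to forget it.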

It's not overkill. The problem of altering structured data is different from the problem of manipulating strings. Don't use the latter just because it's on hand or because the alterations seem trivial. Find a library that allows you to represent the target language in your programming language of choice, or otherwise write your own.

Once your project is a soup of string templates supporting an increasingly complex mini-language of syntax-oblivious interpolation operations, it's too late to replace it with something better. You must start with a representation of the target language.

You don't necessarily need to have a representation that covers the entire language; it need only cover what you will generate. Add bells and whistles later as needed.
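
As a rough illustration of how small "your own" can be, here is a sketch of a home-grown representation in Python that covers only elements, attributes, and text. The nested-list shape [tag, attributes, *children] and the name render_html are assumptions chosen for this example, not an established format. It deliberately ignores void elements, comments, and doctype, and it assumes tag and attribute names come from your own code rather than from user input.

```python
from html import escape


def render_html(node):
    """Render a node: either a text string or [tag, attributes, *children]."""
    if isinstance(node, str):
        # Text is escaped in exactly one place.
        return escape(node)
    tag, attributes, *children = node
    attrs = "".join(
        f' {name}="{escape(value, quote=True)}"'
        for name, value in attributes.items()
    )
    inner = "".join(render_html(child) for child in children)
    return f"<{tag}{attrs}>{inner}</{tag}>"


page = ["html", {},
        ["body", {},
         ["p", {"class": "greeting"}, "Hello, ", "Tom & Jerry"]]]

print(render_html(page))
# prints <html><body><p class="greeting">Hello, Tom &amp; Jerry</p></body></html>
```

That is the whole thing. When you later need void elements or comments, you add a case to render_html instead of auditing every template.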

My Examples