Friday, December 28, 2018

Llama JSON

Playing with a pet project, I noticed a relationship between s-expressions and JSON.

The following JSON is also valid Llama:

{
    "hello": null,
    "isn't": ["it", 2, "convenient?"]
}

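Since Llama treats commas as whitespace and does not distinguish among the grouping characters (, [, and {, the above is the same as: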

("hello" ': 'null "isn't" ': ("it" 2 "convenient?"))

Does that mean we can represent JSON text within Llama without having to escape our quotes?

(Document ((version 1.1) (xmlns "http://mycompany.com"))
  (Widget ((name Fred))
    (_.content
      (json {
          "columns": ["Foo", "Bar", "Baz"],
          "rows": [
              [1, 2, 3],
              [null, null, null]
          ]
      }))))

Ideally that would be the same as:

(Document ((version 1.1) (xmlns "http://mycompany.com"))
  (Widget ((name Fred))
    (_.content
      "{
          \"columns\": [\"Foo\", \"Bar\", \"Baz\"],
          \"rows\": [
              [1, 2, 3],
              [null, null, null]
          ]
      }")))

Values could even be substituted in:

(let ([names ["John" "Paul" "George" "Ringo"]])
  (json {
      "Beatles": names,
      "musicians": names,
      "Brits": names
  }))

yielding:

"{
     \"Beatles\": [\"John\", \"Paul\", \"George\", \"Ringo\"],
     \"musicians\": [\"John\", \"Paul\", \"George\", \"Ringo\"],
     \"Brits\": [\"John\", \"Paul\", \"George\", \"Ringo\"]
}"

Perhaps even computed property names:

(let ([key "Willy Wonka"])
  (json {[key]: "value"}))

It's a happy coincidence that the syntax for computed property names looks like a Llama list. The above expands to:

(json (("Willy Wonka") ': "value"))

It's not hard to imagine that the json form might have a special case for lists in property name position -- just pretend the (singular) contents of the list were there instead.
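That special case could be sketched in Javascript roughly as follows, using the datum representation described later in this post ({list: ...}, {symbol: ...}, {string: ...}). The function name unwrapComputedName is hypothetical, made up for this illustration; it is not taken from the Llama implementation.

```javascript
// Hypothetical helper: if a list appears in property name position,
// pretend its single element was written there instead.
function unwrapComputedName(nameDatum) {
    if (!('list' in nameDatum)) {
        return nameDatum;  // "quoted" or unquoted: names pass through
    }
    if (nameDatum.list.length !== 1) {
        throw new Error('a computed property name must contain exactly one expression');
    }
    return nameDatum.list[0];
}

// {[key]: "value"}  ->  the name position holds the list [key]
const computed = {list: [{symbol: 'key'}]};
const unwrapped = unwrapComputedName(computed);
console.log(JSON.stringify(unwrapped));  // {"symbol":"key"}
```

The symbol key is then left for the evaluator to look up, which is what turns it into "Willy Wonka".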

The one thing that bothers me is Javascript-style unquoted property names:

{
    "quoted": "value",
    unquoted: "value"
}

In Javascript (but notably not in JSON), that's the same as if the second property name were in quotes:

{
    "quoted": "value",
    "unquoted": "value"
}

In particular, in Javascript, even if there's a variable named unquoted, that property name is literally "unquoted", not the value of unquoted; hence the special syntax for computed property names.

The unquoted: case is a challenge for our json feature, because unquoted: is a single Llama token, and it's a valid symbol. It just happens to end with the colon character. I could forbid this case to make things easier, but why not support it? Llama is all about finding the sweet spot between brevity and readability.

And what will json be, exactly, in Llama? Is it a procedure? A macro? A special intrinsic?

I work through these and other questions in the following sections, and then propose a definition for the json form.

It's a Macro

Well, there you have it. It has to be a macro, not a regular procedure. Here's why:

(let ([: "colon lol"])
  (json {"foo": "bar"}))

As perverse as that might seem, it's perfectly valid to bind the symbol named : to some value. If json were a normal procedure, its arguments would be evaluated first, and so the list of arguments going into json would end up being ("foo" "colon lol" "bar"), and we just can't have that. With a macro, though, whatever literally appears as the argument is what is passed in, e.g. the symbol :.
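To make the difference concrete, here is a minimal sketch in Javascript, using the datum representation described later in this post. The names evaluate, callProcedure, and callMacro, and the use of a Map as the environment, are assumptions made for illustration; Llama's actual evaluator may look nothing like this.

```javascript
// Hypothetical, simplified evaluator: a bound symbol evaluates to its
// value; everything else evaluates to itself.
function evaluate(datum, env) {
    if ('symbol' in datum && env.has(datum.symbol)) {
        return env.get(datum.symbol);
    }
    return datum;
}

// A procedure receives its arguments after they have been evaluated...
function callProcedure(proc, args, env) {
    return proc(args.map(arg => evaluate(arg, env)));
}

// ...while a macro receives them exactly as they were written.
function callMacro(macro, args) {
    return macro(args);
}

// (let ([: "colon lol"]) (json {"foo": "bar"}))
const env = new Map([[':', {string: 'colon lol'}]]);
const written = [{string: 'foo'}, {symbol: ':'}, {string: 'bar'}];
const identity = data => data;

const seenByProcedure = callProcedure(identity, written, env);
const seenByMacro = callMacro(identity, written);

console.log(JSON.stringify(seenByProcedure));
// [{"string":"foo"},{"string":"colon lol"},{"string":"bar"}]
console.log(JSON.stringify(seenByMacro));
// [{"string":"foo"},{"symbol":":"},{"string":"bar"}]
```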

We're not out of the woods yet, though.

Dealing with null

null is a special symbol that has to be dealt with. The trouble is that the symbol null might be bound to some value during evaluation. I think, therefore, it's best to force null always to mean null within a json form. However, it's reasonable to accept it as a valid value after evaluation as well, so that this:

(let ([nada null])
  (json {value: nada}))

yields

"{\"value\": null}"

As an aside, note that the null in the let binding above would need to be written 'null (quoted) if null were itself bound to a value. That is, while the expression above is fine, the following:

(let ([null "oops!"]
      [nada null])
  (json {value: nada}))

would yield a different answer; namely,

"{\"value\": \"oops!\"}"

In order to refer to the literal null, it has to be prefixed by the quoting character:

(let ([null "oops!"]
      [nada 'null])
  (json {value: nada}))

so that once again the result is "{\"value\": null}".

So, null will be treated literally without evaluation when appearing in a json form, but the value after evaluation will also be accepted.
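One way to get the "treated literally" half of that rule is to quote every null at macro time, before the evaluator sees it. The sketch below assumes that the reader turns 'x into the list (quote x); that representation, and the name quoteNulls, are my assumptions for illustration rather than facts about the implementation.

```javascript
// Assumption: 'x is represented as the list (quote x).
const isSymbolNamed = (datum, name) =>
    'symbol' in datum && datum.symbol === name;

// Wrap every bare null symbol in (quote ...) so that evaluation cannot
// replace it with whatever the name null happens to be bound to.
function quoteNulls(datum) {
    if (isSymbolNamed(datum, 'null')) {
        return {list: [{symbol: 'quote'}, datum]};
    }
    if ('list' in datum) {
        // Forms that are already quoted are left alone.
        if (datum.list.length > 0 && isSymbolNamed(datum.list[0], 'quote')) {
            return datum;
        }
        return {...datum, list: datum.list.map(item => quoteNulls(item))};
    }
    return datum;
}

// The contents of {value: null other: nada}
const before = {list: [{symbol: 'value:'}, {symbol: 'null'},
                       {symbol: 'other:'}, {symbol: 'nada'}]};
const after = quoteNulls(before);
console.log(JSON.stringify(after.list[1]));
// {"list":[{"symbol":"quote"},{"symbol":"null"}]}
console.log(JSON.stringify(after.list[3]));
// {"symbol":"nada"}
```

The symbol nada is untouched, so it is still evaluated normally, and if it evaluates to null that value is accepted as well.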

Distinguishing Arrays from Objects

Trickier even than encountering the colon (:) symbol are the situations we get into supporting unquoted: property names. Is the following a JSON object, or a JSON array having two elements?

(foo: "bar")

Well, we have to decide. Fortunately, JSON does not have a concept of "symbols" like most s-expressions do, and so we can forbid them outright in the final output of the json form (except for null).

This means that the example above, (foo: "bar") is a JSON object with one property named "foo" having the value "bar".

Or is it? What if foo: were a name bound to some other value?

(let ([foo: "gotcha"])
  (json (foo: "bar")))

Now we might want this to expand to (json ("gotcha" "bar")), and that looks a lot more like a JSON array having two elements, i.e. ["gotcha", "bar"].

What are we going to do? If the symbol foo: is bound to some value, and we encounter foo: within a json form in a context where it could decide the object-ness of a form, did the programmer intend for it to be the unquoted property named "foo", or did they intend for an array element having the value bound to the name foo:?

It's once again tempting to disallow unquoted property names, as in JSON; but then it seems awkward to have the {[computed]: "property names"} borrowed from Javascript without also having the unquoted property names.

One idea that helps is to parse symbols-ending-in-colon as unquoted property names, during macro expansion, before any potential value substitution. This means that

(let ([tricky: "look out!"])
  (json {tricky: "tricks"}))

yields "{\"tricky\": \"tricks\"}" instead of "[\"look out!\", \"tricks\"]". That settles that ambiguity, but we still have a problem if the object or array is empty.
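Here is a rough Javascript sketch of that macro-time step, applied to the elements of a list already known to be an object. The name removeColons and the details are my own for illustration; this is not the removeColonsFromObjects function from the implementation, only a guess at its flavor.

```javascript
// Turn the written elements of an object into a flat
// (name value name value ...) list, before anything is evaluated.
function removeColons(elements) {
    const result = [];
    for (const datum of elements) {
        const isSymbol = 'symbol' in datum;
        // Names sit at even positions of the output, values at odd ones.
        const expectingName = result.length % 2 === 0;
        if (isSymbol && datum.symbol === ':') {
            continue;  // the lone colon in "foo": "bar" or [key]: "value"
        }
        if (expectingName && isSymbol && datum.symbol.endsWith(':')) {
            // tricky: becomes the string "tricky", never a variable lookup
            result.push({string: datum.symbol.slice(0, -1)});
        } else {
            result.push(datum);
        }
    }
    return result;
}

// {tricky: "tricks", "foo": bar}
const writtenElements = [{symbol: 'tricky:'}, {string: 'tricks'},
                         {string: 'foo'}, {symbol: ':'}, {symbol: 'bar'}];
const flattened = removeColons(writtenElements);
console.log(JSON.stringify(flattened));
// [{"string":"tricky"},{"string":"tricks"},{"string":"foo"},{"symbol":"bar"}]
```

Because the conversion happens before evaluation, a binding for tricky: never gets a chance to interfere.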

Empty Objects and Arrays

What does (json ()) yield? Is it "[]" or is it "{}"? Remember that the different types of grouping characters are indistinguishable in Llama.

This presents a serious problem -- it reveals that in order to truly represent JSON unambiguously in Llama, we'd need the help of the reader (the parser). The reader knows, after all, which of (, {, or [ it encountered, because it must match it up with the corresponding ), }, or ].

I'm tempted to add this information to the output of the parser. Right now, a datum is represented in the implementation as a Javascript object whose sole property name tells you the type and whose value at that property is the datum's value, e.g.

const listOfNumbers = {list: [{number: "1"}, {number: "2"}, {number: "3"}]},
      aNumber       = {number: "13"},
      aString       = {string: "hello"};

The datum listOfNumbers could have been parsed from any of (1 2 3), [1 2 3], or {1 2 3}, but the parser has jettisoned the distinction.

What if a list datum had an additional property, "suffix"?

const listOfNumbers = {list: [{number: "1"}, {number: "2"}, {number: "3"}],
                       suffix: "]"};

This way, we would know that it was [1 2 3] to begin with.

Doing this would solve the "empty object or array?" problem, at the cost of requiring that the json form be implemented as an intrinsic macro -- macros and procedures written in Llama would not have access to this extra information found in the implementation. Instead, the macro would have to be written in Javascript.

I see no way around it. The parser has to be modified to preserve the distinction among the various flavors of lists. Doing this will require some subtle changes "downstream," as well, since we have to make sure that we don't accidentally consider a list's "suffix" as part of its value. That is, I still want (1 2 3), [1 2 3], and {1 2 3} to be considered equal, except in contexts where the distinction is explicitly relevant, like in the json macro.
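As a minimal sketch of what "equal, except where the distinction is explicitly relevant" could mean, consider the following Javascript. The names datumEquals and jsonKind are hypothetical, and treating every suffix other than "}" as an array is an assumption of this sketch, not a statement about the actual commit.

```javascript
const fromParens   = {list: [{number: '1'}, {number: '2'}], suffix: ')'};
const fromBrackets = {list: [{number: '1'}, {number: '2'}], suffix: ']'};
const fromBraces   = {list: [], suffix: '}'};

// Structural equality that deliberately ignores a list's suffix.
// Scalars are compared by type tag and by their text.
function datumEquals(a, b) {
    const aIsList = 'list' in a;
    const bIsList = 'list' in b;
    if (aIsList || bIsList) {
        return aIsList && bIsList &&
               a.list.length === b.list.length &&
               a.list.every((item, i) => datumEquals(item, b.list[i]));
    }
    const [typeA] = Object.keys(a);
    const [typeB] = Object.keys(b);
    return typeA === typeB && a[typeA] === b[typeB];
}

// Only the json macro looks at the suffix.
function jsonKind(listDatum) {
    return listDatum.suffix === '}' ? 'object' : 'array';
}

console.log(datumEquals(fromParens, fromBrackets));  // true
console.log(jsonKind(fromBraces));                   // object
console.log(jsonKind({list: [], suffix: ']'}));      // array
```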

That work was done in this commit. In doing so, I accidentally introduced a bug, which I fixed in the following commit.

Numbers and JSON.stringify

There's one more sticky point, before we get into the implementation. When first thinking about the implementation, I thought it would be convenient to have the json form produce a Javascript value suitable for serialization by JSON.stringify. Then all I would have to do is "unpack" the AST nodes into a form that JSON.stringify understands, and it would do the serialization for me.

This would work fine, except that the only way to get JSON.stringify to print a number is to give it a Javascript Number. Javascript numbers, though, are always stored in IEEE double precision floating point format.

So what? Double precision floating point is good enough for everybody, right?

No! We must support arbitrary numbers, as defined in the Llama grammar! (Or, for that matter, the JSON grammar.)

In order to do this, the textual content of Llama numbers has to pass through the JSON serializer untouched, and since JSON.stringify does not support this (even with its replacer argument!), we have to do our own JSON serialization.

Fortunately, JSON is simple, and also we can still use JSON.stringify for Strings, null, Dates, and any other non-numeric scalars.
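A small Javascript sketch shows both the problem and the way around it. The function serializeNumber is hypothetical; checking the text against the JSON number grammar is an extra precaution I'm assuming here, since Llama's number grammar may accept things that JSON's does not.

```javascript
// The problem: a round trip through a Javascript Number loses digits.
const text = '12345678901234567890';
console.log(JSON.stringify(Number(text)));  // prints 12345678901234567000

// The JSON number grammar: optional minus sign, integer part without
// leading zeros, optional fraction, optional exponent.
const jsonNumber = /^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$/;

// Emit the number's text verbatim instead of converting it.
function serializeNumber(datum) {
    if (!jsonNumber.test(datum.number)) {
        throw new Error(`not a valid JSON number: ${datum.number}`);
    }
    return datum.number;
}

console.log(serializeNumber({number: text}));  // prints 12345678901234567890
```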

Helper Procedure

Before we get into writing the json macro itself, recognize that converting an evaluated list of data (datums) into a string of JSON can be done by a procedure, once the colon business has been taken care of. The job of the json macro, then, is to take care of the colon business and produce an invocation of this procedure.

The input to the helper procedure will be a Llama datum that has already been pre-processed by the macro: colons have been removed, unquoted property names have been converted to strings, and null has been quoted so that it survives evaluation.

So, the job of the helper procedure is to convert, for example, the following Llama (note the lack of colons):

[1, {"foo" "hi" "bar" null}]

whose AST is the following Javascript object:

{
    suffix: ']',
    list: [
        {number: '1'},
        {
            suffix: '}',
            list: [
                {string: 'foo'},
                {string: 'hi'},
                {string: 'bar'},
                {symbol: 'null'}
            ]
        }
    ]
}

into the following Javascript object:

[
    {[numberProperty]: '1'},
    {
        foo: 'hi',
        'bar': null
    }
]

where numberProperty is a special string recognized by the JSON serializer to mean the contained string is to be serialized as a number rather than as a (quoted) string. You can see what I mean in the code.

The helper procedure is called jsonify in the implementation. The only hairy part was walking through a list two elements at a time (Javascript's "splat" (...) operator and recursion helped here).
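For readers who don't want to dig through the code, here is a condensed sketch of the idea in Javascript. It is written from the description above rather than copied from the implementation, so the value of numberProperty, the helper pairsToObject, and the serialize function are all assumptions of this sketch.

```javascript
const numberProperty = '__number__';  // hypothetical marker value

// Convert a pre-processed, evaluated datum into a plain Javascript value.
function jsonify(datum) {
    if ('number' in datum) return {[numberProperty]: datum.number};
    if ('string' in datum) return datum.string;
    if ('symbol' in datum) {
        if (datum.symbol === 'null') return null;
        throw new Error(`symbol not allowed in JSON: ${datum.symbol}`);
    }
    if (datum.suffix === '}') return pairsToObject(datum.list);
    return datum.list.map(item => jsonify(item));
}

// Walk a flat (name value name value ...) list two elements at a time.
function pairsToObject([key, value, ...rest], result = {}) {
    if (key === undefined) return result;
    if (!('string' in key) || value === undefined) {
        throw new Error('expected a string property name followed by a value');
    }
    result[key.string] = jsonify(value);
    return pairsToObject(rest, result);
}

// Custom serializer: number text is emitted verbatim, other scalars are
// delegated to JSON.stringify. Note that plain objects reorder
// integer-like keys; this sketch does not try to prevent that.
function serialize(value) {
    if (Array.isArray(value)) {
        return '[' + value.map(item => serialize(item)).join(', ') + ']';
    }
    if (value !== null && typeof value === 'object') {
        if (numberProperty in value) return value[numberProperty];
        const members = Object.entries(value)
            .map(([name, member]) => JSON.stringify(name) + ': ' + serialize(member));
        return '{' + members.join(', ') + '}';
    }
    return JSON.stringify(value);
}

// The AST of [1, {"foo" "hi" "bar" null}]
const ast = {
    suffix: ']',
    list: [
        {number: '1'},
        {suffix: '}', list: [{string: 'foo'}, {string: 'hi'},
                             {string: 'bar'}, {symbol: 'null'}]}
    ]
};
const output = serialize(jsonify(ast));
console.log(output);  // prints [1, {"foo": "hi", "bar": null}]
```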

The json Macro

All that remains to write is the json macro itself, which will prepare its argument for the helper procedure and then expand into an invocation of the helper procedure with the modified argument, i.e.

(json argument)

becomes

((lambda ...) modified-argument)

so that the evaluator will then evaluate modified-argument before applying the helper procedure to it.

The macro-time massaging of the argument happens in the removeColonsFromObjects function, which also does the null quoting (I need to change the name to indicate that...).

After the massaging, the macro expands to an invocation expression, and finally the helper procedure does its work before calling the custom JSON serializer.

With that, we're done! JSON embedded within a lisp, using macros.

Example

Input (Llama):

(pml ((xmlns http://www.proprietary.com/ui)
      (xmlns:pml http://www.proprietary.com/markup))
  (Table ((pml:name tickets))
    (_.dataSource
      (pml:json (json
        (let ([(row ticket status owner desc)
               {ticket: ticket, status: status, owner: owner, desc: desc}])
        {
            columnTitles: ["Ticket", "Status", "Owner", "Description"],
            rows: [
                (row 11333 "open"   "Bob"   "The darn thing doesn't work")
                (row 11334 "closed" "Bob"   "Could you do this for me?")
                (row 11332 "open"   "Alice" "URGENT: label is wrong color")
            ]
        }))))))

Output (XML, after additional formatting):

<pml xmlns="http://www.proprietary.com/ui"
     xmlns:pml="http://www.proprietary.com/markup">
  <Table pml:name="tickets">
    <_.dataSource>
      <pml:json>{
          "columnTitles": ["Ticket", "Status", "Owner", "Description"],
          "rows": [
              {
                  "ticket": 11333,
                  "status": "open",
                  "owner": "Bob",
                  "desc": "The darn thing doesn't work"
              },
              {
                  "ticket": 11334,
                  "status": "closed",
                  "owner": "Bob",
                  "desc": "Could you do this for me?"
              },
              {
                  "ticket": 11332,
                  "status": "open",
                  "owner": "Alice",
                  "desc": "URGENT: label is wrong color"
              }
          ]
      }</pml:json>
    </_.dataSource>
  </Table>
</pml>

You can try it out by cloning Llama onto your computer and opening the playground in a web browser.