================
Expression Types
================
:JEP: 8
:Author: James Saryerwinnie
:Status: accepted
:Created: 02-Mar-2013
Abstract
========
This JEP proposes grammar modifications to JMESPath to allow for
expression references within functions. This makes functions such as
``sort_by``, ``max_by``, and ``min_by`` possible. These functions take
an argument that resolves to an expression type, which enables
functionality such as sorting an array based on an expression that
is evaluated against every array element.
Motivation
==========
A useful feature that is common in other expression languages is the
ability to sort an array of JSON objects based on a particular key. For
example, given a JSON object::

    {
      "people": [
        {"age": 20, "age_str": "20", "bool": true, "name": "a", "extra": "foo"},
        {"age": 40, "age_str": "40", "bool": false, "name": "b", "extra": "bar"},
        {"age": 30, "age_str": "30", "bool": true, "name": "c"},
        {"age": 50, "age_str": "50", "bool": false, "name": "d"},
        {"age": 10, "age_str": "10", "bool": true, "name": 3}
      ]
    }

It is not currently possible to sort the ``people`` array by the ``age`` key.
Also, ``sort`` is not defined for the ``object`` type, so the ``people`` array
cannot be sorted at all. In order to sort the ``people`` array,
we need to know what key to use when sorting the array.
This concept of sorting based on a key can be generalized. Instead of
requiring a key name, an expression can be provided that each element
would be evaluated against. In the simplest case, this expression would just
be an ``identifier``, but more complex expressions could be used such as
``foo.bar.baz``.
A simple way to accomplish this might be to create a function like this::
sort_by(array arg1, expression)
# Called like:
sort_by(people, age)
sort_by(people, to_number(age_str))
However, there's a problem with the ``sort_by`` function as defined above.
If we follow the function argument resolution process we get::

    sort_by(people, age)

    # 1. resolve people
    arg1 = search(people, <input data>) -> [{"age": ...}, {...}]

    # 2. resolve age
    arg2 = search(age, <input data>) -> null

    sort_by([{"age": ...}, {...}], null)

The second argument is evaluated against the current node and the expression
``age`` will resolve to ``null`` because the input data has no ``age`` key.
There needs to be some way to specify that an expression should evaluate to
an expression type::

    arg = search(<some expression>, <input data>) -> <expression>

Then the function definition of ``sort_by`` would be::
sort_by(array arg1, expression arg2)
Specification
=============
The following grammar rules will be updated to::
function-arg = expression /
current-node /
"&" expression
Evaluating an expression reference should return an object of type
"expression". The list of data types supported by a function will now be:
* number (integers and double-precision floating-point format in JSON)
* string
* boolean (``true`` or ``false``)
* array (an ordered sequence of values)
* object (an unordered collection of key-value pairs)
* null
* expression (denoted by ``&expression``)
Function signatures can now be specified using this new ``expression`` type.
Additionally, a function signature can specify the return type of the
expression. Similar to how arrays can specify their element type using the
``array[type]`` syntax, expressions can specify their resolved type using the
``expression->type`` syntax.
Note that any valid expression is allowed after ``&``, so the following
expressions are valid::
sort_by(people, &foo.bar.baz)
sort_by(people, &foo.bar[0].baz)
sort_by(people, &to_number(foo[0].bar))
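
As a rough illustration of these rules, the following Python sketch evaluates
a tiny hand-built AST. The tuple-based node format, the ``evaluate`` function,
and the ``FUNCTIONS`` table are assumptions made for this example and are not
part of JMESPath or of any particular implementation. It shows that every
function argument can be resolved in the same way, with an ``&`` node simply
resolving to a value of type expression:

```python
def evaluate(node, data):
    """Evaluate a hypothetical tuple-based AST node against JSON data."""
    kind = node[0]
    if kind == "field":
        # identifier: look up a key, yielding null (None) when it is absent
        return data.get(node[1]) if isinstance(data, dict) else None
    if kind == "expref":
        # "&" expression: do not evaluate the child now; return a value of
        # type "expression" that a function can apply to each element later
        child = node[1]
        return lambda element: evaluate(child, element)
    if kind == "call":
        _, name, arg_nodes = node
        # every argument is resolved the same way, with no per-function rules
        args = [evaluate(arg, data) for arg in arg_nodes]
        return FUNCTIONS[name](*args)
    raise ValueError(f"unknown node type: {kind!r}")


# Simplified stand-in for sort_by, without any type checking.
FUNCTIONS = {"sort_by": lambda elements, expr: sorted(elements, key=expr)}

# Hand-built AST for: sort_by(people, &age)
ast = ("call", "sort_by", [("field", "people"), ("expref", ("field", "age"))])
data = {"people": [{"age": 30}, {"age": 10}, {"age": 20}]}
result = evaluate(ast, data)
ages = [person["age"] for person in result]
```

Without the ``expref`` node, the second argument would be evaluated against
the input data and resolve to ``null``, which is the problem described in the
Motivation section.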
Additional Functions
--------------------
The following functions will be added:
sort_by
~~~~~~~
::
sort_by(array elements, expression->number|expression->string expr)
Sort an array using an expression ``expr`` as the sort key.
Below are several examples using the ``people`` array (defined above) as the
given input. ``sort_by`` follows the same sorting logic as the ``sort``
function.
.. list-table:: Examples
:header-rows: 1
* - Expression
- Result
* - ``sort_by(people, &age)[].age``
- [10, 20, 30, 40, 50]
* - ``sort_by(people, &age)[0]``
- {"age": 10, "age_str": "10", "bool": true, "name": 3}
* - ``sort_by(people, &to_number(age_str))[0]``
- {"age": 10, "age_str": "10", "bool": true, "name": 3}
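
The behavior above can be sketched in Python as follows. This is only an
illustration: a plain Python callable stands in for the ``&expr`` argument,
``TypeError`` stands in for a JMESPath type error, and the rule that all keys
must be numbers or all must be strings is an assumption of this sketch.

```python
def is_number(value):
    # bool is a subclass of int in Python, but JSON booleans are not numbers
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def sort_by(elements, expr):
    """Sort ``elements`` by the key that ``expr`` produces for each element."""
    keys = [expr(element) for element in elements]
    all_numbers = all(is_number(key) for key in keys)
    all_strings = all(isinstance(key, str) for key in keys)
    if not (all_numbers or all_strings):
        # assumption: mixed or unsupported key types are rejected
        raise TypeError("invalid-type: keys must be all numbers or all strings")
    # sorted() is stable, so elements with equal keys keep their input order
    order = sorted(range(len(elements)), key=lambda index: keys[index])
    return [elements[index] for index in order]


people = [
    {"age": 20, "age_str": "20", "bool": True, "name": "a", "extra": "foo"},
    {"age": 40, "age_str": "40", "bool": False, "name": "b", "extra": "bar"},
    {"age": 30, "age_str": "30", "bool": True, "name": "c"},
    {"age": 50, "age_str": "50", "bool": False, "name": "d"},
    {"age": 10, "age_str": "10", "bool": True, "name": 3},
]

# sort_by(people, &age)
by_age = sort_by(people, lambda person: person["age"])
# sort_by(people, &to_number(age_str)), with float() standing in for to_number
by_age_str = sort_by(people, lambda person: float(person["age_str"]))
ages = [person["age"] for person in by_age]
```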
max_by
~~~~~~
::
max_by(array elements, expression->number expr)
Return the maximum element in an array using the expression ``expr`` as the
comparison key. The entire maximum element is returned.
Below are several examples using the ``people`` array (defined above) as the
given input.
.. list-table:: Examples
:header-rows: 1
* - Expression
- Result
* - ``max_by(people, &age)``
- {"age": 50, "age_str": "50", "bool": false, "name": "d"}
* - ``max_by(people, &age).age``
- 50
* - ``max_by(people, &to_number(age_str))``
- {"age": 50, "age_str": "50", "bool": false, "name": "d"}
* - ``max_by(people, &age_str)``
- <error: invalid-type>
* - ``max_by(people, age)``
- <error: invalid-type>
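
A minimal Python sketch of ``max_by`` follows, again with a plain callable
standing in for the ``&expr`` argument and ``TypeError`` standing in for a
type error. Returning ``None`` for an empty array is an assumption of this
sketch, not something this JEP specifies.

```python
def max_by(elements, expr):
    """Return the element whose key, as computed by ``expr``, is largest."""
    if not callable(expr):
        # e.g. max_by(people, age): ``age`` resolved to null, not an expression
        raise TypeError("invalid-type: second argument must be an expression")
    best = None
    best_key = None
    for element in elements:
        key = expr(element)
        if isinstance(key, bool) or not isinstance(key, (int, float)):
            # the signature requires expression->number
            raise TypeError("invalid-type: expression must resolve to a number")
        if best_key is None or key > best_key:
            best, best_key = element, key
    return best  # None for an empty array (assumption of this sketch)


people = [
    {"age": 20, "age_str": "20", "name": "a"},
    {"age": 40, "age_str": "40", "name": "b"},
    {"age": 30, "age_str": "30", "name": "c"},
    {"age": 50, "age_str": "50", "name": "d"},
    {"age": 10, "age_str": "10", "name": 3},
]

# max_by(people, &age): the entire element is returned, not just the key
oldest = max_by(people, lambda person: person["age"])
```

Passing a string-valued expression such as ``&age_str``, or a non-expression
argument such as ``age``, raises the error in this sketch.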
min_by
~~~~~~
::
min_by(array elements, expression->number expr)
Return the minimum element in an array using the expression ``expr`` as the
comparison key. The entire minimum element is returned.
Below are several examples using the ``people`` array (defined above) as the
given input.
.. list-table:: Examples
:header-rows: 1
* - Expression
- Result
* - ``min_by(people, &age)``
- {"age": 10, "age_str": "10", "bool": true, "name": 3}
* - ``min_by(people, &age).age``
- 10
* - ``min_by(people, &to_number(age_str))``
- {"age": 10, "age_str": "10", "bool": true, "name": 3}
* - ``min_by(people, &age_str)``
- <error: invalid-type>
* - ``min_by(people, age)``
- <error: invalid-type>
Alternatives
------------
A number of alternative proposals were considered. Several of them are
outlined below.
Logic in Argument Resolver
~~~~~~~~~~~~~~~~~~~~~~~~~~
The first proposed choice (which was originally in JEP-3 but later removed) was
to not have any syntactic construct for specifying expression references, and
to allow the function signature to dictate whether or not an argument was
resolved. The signature for ``sort_by`` would be::
sort_by(array arg1, any arg2)
arg1 -> resolved
arg2 -> not resolved
Then the argument resolver would introspect the argument specification of a
function to determine what to do. Roughly speaking, the pseudocode would be::

    call-function(current-data)
      arglist = []
      for each argspec in functions-argspec:
        if argspec.should_resolve:
          arglist <- resolve(argument, current-data)
        else:
          arglist <- argument
      type-check(arglist)
      return invoke-function(arglist)

However, there are several reasons not to do this:
* This imposes a specific implementation. This implementation would be
challenging in a bytecode VM, as the CALL bytecode will typically
resolve arguments onto the stack and allow the function to then
pop arguments off the stack and perform its own arity validation.
* This deviates from the "standard" model of how functions are
traditionally implemented.
Specifying Expressions as Strings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Another proposed alternative was to allow the expression to be
a string type and to give functions the capability to parse/eval
expressions. The ``sort_by`` function would look like this::
sort_by(people, `age`)
sort_by(people, `foo.bar.baz`)
The main reasons this proposal was not chosen are:
* This complicates the implementations. For implementations that walk the AST
inline, this means AST nodes need access to the parser. For external tree
visitors, the visitor needs access to the parser.
* This moves what *could* be a compile-time error into a run-time error. The
evaluation of the expression string happens when the function is invoked.
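
The second point can be illustrated with a small Python sketch. The toy
``compile_path`` parser (which only accepts dotted identifiers) and the two
helper functions are hypothetical names made up for this example. The sketch
contrasts when a malformed expression is detected under each design:

```python
import re

# Toy grammar: dotted identifiers only, such as foo.bar.baz
_PATH = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*")


def compile_path(text):
    """Toy parser: turn a dotted path into a callable, or raise SyntaxError."""
    if not _PATH.fullmatch(text):
        raise SyntaxError(f"invalid expression: {text!r}")
    parts = text.split(".")

    def run(node):
        for part in parts:
            node = node.get(part) if isinstance(node, dict) else None
        return node

    return run


def sort_by_string(elements, expr_text):
    # String-based design: the expression is only parsed when the function runs.
    return sorted(elements, key=compile_path(expr_text))


def compile_sort_by_reference(expr_text):
    # Reference-based design: the expression is parsed together with the query,
    # so a malformed expression is rejected before any data is seen.
    key = compile_path(expr_text)
    return lambda elements: sorted(elements, key=key)


people = [{"age": 30}, {"age": 10}, {"age": 20}]
query = compile_sort_by_reference("age")
ages = [person["age"] for person in query(people)]
```

With a malformed expression such as ``age..name``, ``compile_sort_by_reference``
fails as soon as the query is compiled, whereas ``sort_by_string`` fails only
once it is invoked with data.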