More shell tricks: first class lists, jq, and the es shell

Preamble

It’s not a secret that most common shells don’t have first class lists.

Sure, you can pass a list to a program by passing each element in argv (e.g. with "$@" or "${list_variable[@]}"), but what if you want to return a list?
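To make the problem concrete, here is a minimal sketch in plain sh. The helper names count_args and return_list are made up for illustration. Passing a list in works because each element is its own argv entry. Getting one back through command substitution flattens it into a single string, which the shell then re-splits on whitespace.

```shell
#!/bin/sh
# Hypothetical helpers, only for illustration.

# Passing a list IN works: each element arrives as a separate argument.
count_args() {
  echo "$#"
}

# "Returning" a list through stdout loses the element boundaries.
return_list() {
  printf '%s\n' "$@"
}

passed=$(count_args foo "baz quux")

# Unquoted on purpose: the shell re-splits the output on whitespace,
# so "baz quux" comes back as two separate words.
# shellcheck disable=SC2046
set -- $(return_list foo "baz quux")
returned=$#

echo "passed in: $passed, got back: $returned"
```

Running it prints "passed in: 2, got back: 3": the two-element list came back as three words.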

There are a couple of options.

The challenge

As a more practical example of this, let’s implement split-by-double-dash, a function (or a program) that would return two lists: args that come before -- and ones that come after. That’s a common format used by command line utilities.

So, for example, split-by-double-dash a b c -- d e f should return the lists [a, b, c] and [d, e, f], somehow.

One possible use case for this would be a wrapper around a utility that accepts arguments in this form.

The quoting way with jq

Some utilities can output text quoted in shell syntax; getopt and jq, for example, can do that. getopt can indeed help us solve the problem of parsing CLI options, but we’ll focus on jq here.

Side note: jq is an amazing utility (in fact, a programming language) that can manipulate structured data in all kinds of ways.

Here’s an example:

jq -nr '["foo", "baz", "baz quux"] | @sh'
'foo' 'baz' 'baz quux'

We ask jq to turn a list into a string suitable for consumption with the shell’s eval. -n tells jq that there’s no input to read, and -r tells it to output raw text instead of a JSON-encoded string. So let’s see how to use the output in the shell:

eval set -- "$(jq -nr '["foo", "bar", "baz quux"] | @sh')"
# Now, let's verify that the arguments have been passed correctly, by printing each one on a separate line with printf.
printf 'arg: %s\n' "$@"
# Output:
# arg: foo
# arg: bar
# arg: baz quux

You might think: “wait, you can’t use eval”. But it’s safe here, since jq does the escaping for us. I don’t think there’s a way around eval anyway, since the list is dynamic and has to be interpreted by the shell.
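To see why the eval is safe, it helps to spell out the quoting rule. The sketch below mimics it in plain sh with a made-up helper, sh_quote (an assumption for illustration, not part of jq): wrap the string in single quotes and rewrite every embedded ' as '\''. Inside single quotes the shell interprets nothing, so even a value that looks like a command substitution survives eval as plain data.

```shell
#!/bin/sh
# Hypothetical stand-in for what @sh does to a single string:
# wrap it in single quotes and rewrite each embedded ' as '\''.
sh_quote() {
  printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

# A hostile-looking value: a quote, a command substitution, a semicolon.
evil="it's \$(echo pwned); echo oops"

quoted=$(sh_quote "$evil")
printf 'quoted: %s\n' "$quoted"

# eval sees exactly one single-quoted word, so nothing inside it runs.
eval "set -- $quoted"
printf 'count: %s\n' "$#"
printf 'arg: %s\n' "$1"
```

The value comes back as a single argument, byte for byte, and neither echo is executed.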

In fact, lists can be nested, which is what we want, since we want to return two lists. The output gets a bit unwieldy at this point though:

jq -nr '[["foo", "bar"], ["baz", "baz quux"]] | map(@sh) | @sh'
# Output:
# ''\''foo'\'' '\''bar'\''' ''\''baz'\'' '\''baz quux'\'''

What happens with | map(@sh) | @sh is that we first apply quoting to each of the lists to get a list of strings, and then we quote that.
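Unpacking reverses the two quoting steps, one eval per level. Here is a sketch in POSIX sh, with the output shown above pasted in through a here-document so that it runs without jq. The variable names nested, first and second are arbitrary.

```shell
#!/bin/sh
# The nested output from above, verbatim. The quoted 'EOF' delimiter
# keeps the shell from interpreting anything inside the here-document.
IFS= read -r nested <<'EOF'
''\''foo'\'' '\''bar'\''' ''\''baz'\'' '\''baz quux'\'''
EOF

# First eval: undo the outer @sh. Each positional parameter is now
# one inner list, still in its quoted form.
eval "set -- $nested"
first=$1
second=$2

# Second eval: undo the inner @sh of the second list.
eval "set -- $second"
printf 'arg: %s\n' "$@"
```

After the first eval, $first holds 'foo' 'bar' and $second holds 'baz' 'baz quux' (quotes included); the second eval turns the latter into the two arguments baz and baz quux.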

But, if we want to implement split-by-double-dash, we need a way to process the arguments passed to jq. Can we do this? Of course we can.

jq -c '$ARGS.positional' -n --args -- a b c
# Output: ["a","b","c"]

jq can do much more than this (check out the manual), so I’ll skip right ahead to the solution. We’ll use a #!/usr/bin/env -S shebang (-S makes env split the rest of the line into separate arguments) and pass -f to jq to make this a proper script:

#!/usr/bin/env -S jq -nrf --args --
# Usage: eval set -- "$(split-by-double-dash ...)"
# Or, in Bash: eval array=("$(split-by-double-dash ...)")
$ARGS.positional |
reduce .[] as $arg (
  {
    before: [],
    after: [],
    found: false,
  };
  if .found then
    .after += [$arg]
  elif $arg == "--" then
    .found = true
  else
    .before += [$arg]
  end
) |
{ before, after } |
map(@sh) |
@sh

Here’s how to use this in Bash (in plain sh, use the eval set -- form from the usage comment instead):

eval array=("$(./split-by-double-dash a b c -- d e f)")
eval before=("${array[0]}")
eval after=("${array[1]}")
# Example:
printf '%s\n' "${before[@]}"
# Output:
# a
# b
# c

The closure way with the es shell

The es shell is a descendant of the rc Plan 9 shell, influenced by Tcl and Scheme. What’s interesting for us about it is that it has first class functions and structured returns. We can combine these to return multiple lists from functions, without all the quoting.

Emulating a list

The irony is that es still doesn’t have first class lists per se, since you can’t store lists in lists, only strings. But it turns out that it’s easy to emulate them with closures.

fn list { return @ _ { return $* } }

What’s happening here?

We define a function (called list), which takes an arbitrary number of arguments. These are stored in the list $* (similar to $@ in sh). Then, we return a different function, which ignores its argument (it takes an argument named _ in order not to clobber $*) and, when called, returns the list that was originally given. It’s essentially a closure (a function that captures its environment).

Let’s verify that this works.

list a b c
# (Nothing is printed.)

Oops! It turns out that by default we only see the “stdout” of what’s called, so the value passed to return is silently dropped. To get the value returned by return, we need to use <=.

<= { list a b c }

Still nothing. Let’s try giving this to printf.

printf '%s\n' <= { list a b c }
# Output: %closure(*=a b c)@ _{return $*}

That’s better! So, that’s the representation of the closure in es. The variable * (which represents arguments passed) is bound to the list a b c, which is returned on call. In order to get the elements of the list, we need to call it again.

printf '%s\n' <= <= { list a b c }
# Output:
# a
# b
# c

The rest of the owl

So, given that we have first class lists, let’s draw “the rest of the owl”. This solution will be a little different from the jq one. Instead of accumulating args before the double dash, we will look for the double dash and create the two lists at that moment.

fn list { return @ _ { return $* } }

fn split-by-double-dash {
  let (
    # Before is set to `$*` by default, unless we find `--`.
    before = <= { list $* }
    after = <= { list }
    # This is a hack to avoid shelling-out to `expr` to increment `i` on each iteration.
    indices = `{ seq $#* }
  ) {
    # `$*($i)` is taking the i-th element from `$*`.
    for (i = $indices) if { ~ $*($i) -- } {
      # If we have found --, it's time to split.
      # Note: the loop keeps going, so if `--` occurs more than once, the last one wins.
      before = <= {
        # This little dance is needed in case `--` is the first argument.
        if { ~ $i 1 } { list } {
          # The "..." syntax is used for slicing a list.
          # We use it to skip everything past `--`.
          list $*(... `{ expr $i - 1 })
        }
      }

      after = <= {
        list $*(`{expr $i + 1} ...)
      }
    }
    return $before $after
  }
}

Given that you have es installed, here’s how to use this:

(before after) = <= { split-by-double-dash a b c -- d e f }

printf '%s\n' <=$before
# Output:
# a
# b
# c

printf '%s\n' <=$after
# Output:
# d
# e
# f

What’s interesting is that the (before after) assignment works because es functions can return flat lists; here, a list of two closures. It’s just that you can’t nest lists directly.
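For comparison, the same find-the-index-then-slice idea can be sketched in Bash. Bash functions can’t return lists either, so this version cheats and writes to two global arrays, before and after (names chosen to mirror the es code). It stops at the first --. Treat it as an illustration of the slicing, not as a replacement for either solution above.

```shell
#!/usr/bin/env bash
# Sketch only: results go into the global arrays `before` and `after`.
split_by_double_dash() {
  before=("$@")   # Default: everything is "before", unless we find `--`.
  after=()
  local i
  for ((i = 1; i <= $#; i++)); do
    # ${!i} is the i-th positional parameter (indirect expansion).
    if [[ ${!i} == -- ]]; then
      # "${@:start:count}" slices the positional parameters.
      before=("${@:1:i-1}")
      after=("${@:i+1}")
      return 0
    fi
  done
}

split_by_double_dash a b c -- d "e f"
printf 'before: %s\n' "${before[@]}"
printf 'after: %s\n' "${after[@]}"
```

With the arguments above, before ends up as (a b c) and after as (d "e f"), with the space inside the last element preserved.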

Conclusion

Shell scripting can be incredibly cursed. And there are many pitfalls, of course. Still, there’s something beautiful in shells and there are often ways to do what initially seems impossible, especially with tools like jq.

I’ll quote Rich’s sh (POSIX shell) tricks to end this:

I am a strong believer that Bourne-derived languages are extremely bad, on the same order of badness as Perl, for programming, and consider programming sh for any purpose other than as a super-portable, lowest-common-denominator platform for build or bootstrap scripts and the like, as an extremely misguided endeavor