It’s not a secret that most common shells don’t have first-class lists.
Sure, you can pass a list to a program by passing each element in
argv (e.g. with "$@" or "${list_variable[@]}"), but what if you want to
return a list?
There are a couple of options.
As a more practical example of this, let’s implement
split-by-double-dash, a function (or a program) that returns two
lists: args that come before -- and ones that come after. That’s a
common format used by command line utilities.
So, for example, split-by-double-dash a b c -- d e f should return the
lists [a, b, c] and [d, e, f], somehow.
One possible use case for this would be a wrapper around a utility that accepts arguments in this form.
Some utilities can return text quoted in shell syntax. For example,
getopt and jq can do that. getopt can indeed help us solve the problem
of parsing CLI options, but we’ll focus on jq here.
Side note: jq is an amazing utility (in fact, a programming language) that can manipulate structured data in all kinds of ways.
Here’s an example:
jq -nr '["foo", "baz", "baz quux"] | @sh'
'foo' 'baz' 'baz quux'
We ask jq to turn a list into a string suitable for consumption with
the shell’s eval. -n tells jq that there’s no input to read, and -r
tells it to output raw text, not JSON. So let’s see how to use the
output in the shell:
eval set -- "$(jq -nr '["foo", "bar", "baz quux"] | @sh')"
# Now, let's verify that the arguments have been passed correctly, by printing each one on a separate line with printf.
printf 'arg: %s\n' "$@"
# Output:
# arg: foo
# arg: bar
# arg: baz quux
You might think: “wait, you can’t use eval”. But that’s safe here,
since jq does the escaping for us. In fact, I don’t think there’s a
way to do this without eval, since the list is dynamic and has to be
interpreted by the shell.
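To make the safety claim concrete, here is a minimal sketch of the same quoting rule in plain POSIX sh: wrap each string in single quotes and replace every embedded ' with '\''. The helper name quote_list is my own and is not part of jq; it only imitates what @sh does for strings.

```sh
# Hypothetical helper (not part of jq): quote every argument the way
# @sh quotes strings, and join the results with spaces.
quote_list() {
    sep=
    for arg in "$@"; do
        out=
        # Walk through the string, replacing each ' with '\''.
        while :; do
            case $arg in
                *\'*)
                    out=$out${arg%%\'*}\'\\\'\'
                    arg=${arg#*\'}
                    ;;
                *) break ;;
            esac
        done
        printf "%s'%s'" "$sep" "$out$arg"
        sep=' '
    done
}

# Strings that would be dangerous if eval saw them unquoted.
quoted=$(quote_list 'baz quux' "it's" '$(echo pwned)' '; echo oops')
eval "set -- $quoted"
printf 'arg: %s\n' "$@"
# Output:
# arg: baz quux
# arg: it's
# arg: $(echo pwned)
# arg: ; echo oops
```

Nothing inside single quotes is special to the shell, so eval only ever sees literal strings, and the command substitution is never run.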
In fact, lists can be nested, which is what we want, since we want to return two lists. The output gets a bit unwieldy at this point though:
jq -nr '[["foo", "bar"], ["baz", "baz quux"]] | map(@sh) | @sh'
# Output:
# ''\''foo'\'' '\''bar'\''' ''\''baz'\'' '\''baz quux'\'''
What happens with | map(@sh) | @sh is that we first apply quoting to
each of the lists to get a list of strings, and then we quote that.
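Reading that output by eye is hard, so here is a small POSIX sh sketch that peels the two layers off one at a time. The string is the jq output from above pasted verbatim; the variable names nested, first and second are only for illustration.

```sh
# The doubly quoted output of the jq command above, pasted verbatim.
# A quoted heredoc keeps the shell from interpreting it.
nested=$(cat <<'EOF'
''\''foo'\'' '\''bar'\''' ''\''baz'\'' '\''baz quux'\'''
EOF
)

# First eval: peel the outer layer. Each argument is now one inner
# list, still quoted.
eval "set -- $nested"
first=$1
second=$2
printf 'inner: %s\n' "$first" "$second"

# Second eval: peel the inner layer of each list in turn.
eval "set -- $first"
printf 'first: %s\n' "$@"
eval "set -- $second"
printf 'second: %s\n' "$@"
# Output:
# inner: 'foo' 'bar'
# inner: 'baz' 'baz quux'
# first: foo
# first: bar
# second: baz
# second: baz quux
```

Since sh has only one list (the positional parameters), both inner strings have to be saved before the second eval overwrites "$@".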
But if we want to implement split-by-double-dash, we need a way to
process the arguments passed to jq. Can we do this? Of course we can.
jq '$ARGS.positional' -n --args -- a b c
# Output: ["a", "b", "c"]
In fact, jq can do much more (check out the manual), so I’ll skip
right ahead to the solution. We’ll use the #!/usr/bin/env shebang and
pass -f to jq to make this a proper script:
#!/usr/bin/env -S jq -nrf --args --
# Usage: eval set -- "$(split-by-double-dash ...)"
# Or, in Bash: eval array=("$(split-by-double-dash ...)")
$ARGS.positional |
reduce .[] as $arg (
  {
    before: [],
    after: [],
    found: false,
  };
  if .found then
    .after += [$arg]
  elif $arg == "--" then
    .found = true
  else
    .before += [$arg]
  end
) |
{ before, after } |
map(@sh) |
@sh
Here’s how to use this (in Bash; in plain sh, use eval set -- instead
of arrays, as in the usage comment at the top of the script):
eval array=("$(./split-by-double-dash a b c -- d e f)")
eval before=("${array[0]}")
eval after=("${array[1]}")
# Example:
printf '%s\n' "${before[@]}"
# Output:
# a
# b
# c
The es shell is a descendant of the rc Plan 9 shell, influenced by
Tcl and Scheme. What’s interesting for us about it is that it has
first-class functions and structured returns. We can combine these to
return multiple lists from functions, without all the quoting.
The irony is that es still doesn’t have first-class lists per se,
since you can’t store lists in lists, only strings. But it turns out
that it’s easy to emulate them with closures.
fn list { return @ _ { return $* } }
What’s happening here?
We define a function (called list), which takes an arbitrary number of
arguments, which are stored in the list $* (similar to $@ in sh).
Then, we return a different function, which ignores its argument (it
takes an argument named _ in order not to clobber $*) and, when
called, returns the list that was originally given. It’s essentially a
closure (a function that captures its environment).
Let’s verify that this works.
list a b c
# (Nothing is printed.)
Oops! Turns out that by default we only see the “stdout” of what’s
called, so the value passed to return is silently dropped. To get the
value returned by return, we need to use <=.
<= { list a b c }
Still nothing. Let’s try giving this to printf.
printf '%s\n' <= { list a b c }
# Output: %closure(*=a b c)@ _{return $*}
That’s better! So, that’s the representation of the closure in es.
The variable * (which represents arguments passed) is bound to the
list a b c, which is returned on call. In order to get the elements of
the list, we need to call it again.
printf '%s\n' <= <= { list a b c }
# Output:
# a
# b
# c
So, given that we have first-class lists, let’s draw “the rest of the
owl”. This solution will be a little different from the jq one.
Instead of accumulating args before the double dash, we will look for
the double dash and create the two lists at that moment.
fn list { return @ _ { return $* } }
fn split-by-double-dash {
  let (
    # Before is set to `$*` by default, unless we find `--`.
    before = <= { list $* }
    after = <= { list }
    # This is a hack to avoid shelling-out to `expr` to increment `i` on each iteration.
    indices = `{ seq $#* }
  ) {
    # `$*($i)` is taking the i-th element from `$*`.
    for (i = $indices) if { ~ $*($i) -- } {
      # If we have found --, it's time to split.
      before = <= {
        # This little dance is needed in case `--` is the first argument.
        if { ~ $i 1 } { list } {
          # The "..." syntax is used for slicing a list.
          # We use it to skip everything past `--`.
          list $*(... `{ expr $i - 1 })
        }
      }
      after = <= {
        list $*(`{expr $i + 1} ...)
      }
    }
    return $before $after
  }
}
Given that you have es installed, here’s how to use this:
(before after) = <= { split-by-double-dash a b c -- d e f }
printf '%s\n' <=$before
# Output:
# a
# b
# c
printf '%s\n' <=$after
# Output:
# d
# e
# f
What’s interesting is that (before after) works since you can return
lists. It’s just that you can’t nest them, by default.
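For comparison, here is how the same find-the-index-then-slice idea might look in Bash. Treat it as a sketch rather than a solution: the function name split_by_double_dash and the global arrays before and after are my own naming, and the function doesn’t return anything, it just leaves its results in globals. This version stops at the first -- it finds.

```bash
#!/usr/bin/env bash
# Sketch: split "$@" at the first `--` by index, leaving the results in
# the global arrays `before` and `after` (names chosen for illustration).
split_by_double_dash() {
    before=("$@")   # default when there is no `--`
    after=()
    local i
    for ((i = 1; i <= $#; i++)); do
        # ${!i} is the i-th positional parameter.
        if [[ ${!i} == -- ]]; then
            before=("${@:1:i-1}")   # everything before index i
            after=("${@:i+1}")      # everything after index i
            break
        fi
    done
}

split_by_double_dash a b c -- d e f
printf 'before: %s\n' "${before[@]}"
printf 'after: %s\n' "${after[@]}"
# Output:
# before: a
# before: b
# before: c
# after: d
# after: e
# after: f
```

Bash’s "${@:offset:length}" plays the role of the "..." slices in es, and a length of zero covers the case where -- is the first argument.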
Shell scripting can be incredibly cursed. And there are many pitfalls,
of course. Still, there’s something beautiful in shells and there are
often ways to do what initially seems impossible, especially with
tools like jq.
I’ll quote Rich’s sh (POSIX shell) tricks to end this:
I am a strong believer that Bourne-derived languages are extremely bad, on the same order of badness as Perl, for programming, and consider programming sh for any purpose other than as a super-portable, lowest-common-denominator platform for build or bootstrap scripts and the like, as an extremely misguided endeavor