Examples
Fixing Bugs By Replacing Shell
Have you ever encountered shell like this? It arranges to conditionally execute an if statement as root on a file server behind a bastion host:
ssh bastion "
if [ \"$PROD\" ];
then
ssh fileserver sudo su -c \"
if grep -qs /dev/sdb1 /proc/mounts;
then
echo \\\"sdb1 already mounted!\\\";
umount /dev/sdb1
fi;
rm -rf \\\"/media/Main Backup Volume\\\"/*;
mount /dev/sdb1 \\\"/media/Main Backup Volume\\\"
\";
fi;
sudo touch /var/run/start_backup;
"
Chances are this is familiar territory: we’ve all seen it, and those working in infrastructure have almost certainly written it. At first glance, ignoring that annoying quoting, it looks perfectly fine: well structured, neatly indented, and the purpose of the snippet seems clear.
At first glance, is "/media/Main Backup Volume" quoted correctly?
How will the if statement behave if there is a problem with the machine and, say, the /bin/grep binary is absent?
Ignoring quoting, are there any other syntax problems?
If this snippet is pasted from its original script into an interactive shell, will it behave the same as before?
Can you think offhand of differences in how the arguments to sudo ... and ssh fileserver ... are parsed?
In which context will the * glob be expanded, if it is expanded at all?
What will the exit status of ssh bastion be if ssh fileserver fails?
Innocent But Deadly
The quoting used is nonsense! At best, mount will receive 3 arguments. At worst, the snippet will not parse at all.
The if statement will treat a missing grep binary (exit status 127) the same as if /dev/sdb1 was not mounted at all (exit status 1). Unless the program executing this script is parsing stderr output, the failure won’t be noticed. Consequently, since the volume was still mounted when rm was executed, it got wiped.
There is at least one more syntax error present: a semicolon is missing after the umount command.
If you paste the snippet into an interactive shell, the apparently quoted “!” character in the echo command will be interpreted as a history expansion.
sudo preserves the remainder of the argument vector as-is, while ssh concatenates each part into a single string that is passed to the login shell. Quotes appearing within arguments are preserved by sudo, but without additional effort, pairs of quotes are effectively stripped by ssh.
As for where the glob is expanded, the answer is I have absolutely no idea without running the code, which might wipe out the backups!
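The ssh behaviour described above, where the words after the hostname are joined into one string and re-parsed by the remote login shell, can be approximated with Python's shlex module. This is a minimal sketch under two assumptions: ssh_command_string() is a hypothetical stand-in for ssh's joining step, and shlex.split() approximates POSIX shell word splitting (quote removal only, no expansions).

```python
import shlex


def ssh_command_string(words):
    # Hypothetical approximation of ssh: the words after the hostname are
    # joined with single spaces, and the remote login shell re-parses them.
    return ' '.join(words)


# The local shell has already removed the quotes around the mount point.
argv = ['mount', '/dev/sdb1', '/media/Main Backup Volume']

# Joined naively, the mount point becomes 3 separate words remotely.
naive = ssh_command_string(argv)
print(shlex.split(naive))   # → ['mount', '/dev/sdb1', '/media/Main', 'Backup', 'Volume']

# Quoting each word for the remote shell first preserves the argument vector.
quoted = ssh_command_string([shlex.quote(word) for word in argv])
print(quoted)               # → mount /dev/sdb1 '/media/Main Backup Volume'
print(shlex.split(quoted))  # → ['mount', '/dev/sdb1', '/media/Main Backup Volume']
```

sudo needs no such step, because it passes its argument vector on unchanged.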
If the ssh fileserver command fails, the exit status of ssh bastion will continue to indicate success.
Depending on which environment the PROD variable is set in, either it will always evaluate to false, because it was set on the bastion host, or it will do the right thing, because it was set on the script host.
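The point about a missing grep binary is easy to reproduce locally. The sketch below assumes a POSIX sh on PATH and uses a deliberately nonexistent, hypothetical command name to stand in for a deleted /bin/grep. It shows that "command not found" (127) and "no match" (1) are different statuses, yet an if statement takes the same branch for both.

```python
import subprocess

# Hypothetical name standing in for a grep binary that has gone missing.
MISSING = 'definitely-not-a-real-grep'


def sh(script):
    # Run a snippet under the system's POSIX shell and return its exit status.
    return subprocess.call(['sh', '-c', script])


# Command not found: the shell reports exit status 127.
missing_status = sh(MISSING + ' -qs /dev/sdb1 /proc/mounts 2>/dev/null')

# grep ran correctly but matched nothing: exit status 1.
no_match_status = sh('grep -qs /dev/sdb1 /dev/null')

# `if` only distinguishes zero from non-zero, so a missing binary looks
# exactly like "not mounted".
branch = subprocess.check_output(
    ['sh', '-c', 'if %s -qs /dev/sdb1 /proc/mounts 2>/dev/null; '
                 'then echo mounted; else echo not-mounted; fi' % MISSING]
).decode().strip()

print(missing_status, no_match_status, branch)  # → 127 1 not-mounted
```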
Golly, we’ve managed to hit at least 8 potentially mission-critical gotchas in only 14 lines of code, and those are just the ones I can count! Welcome to the reality of “programming” in shell.
In the end, superficial legibility counted for nothing: it’s 4AM, you’ve been paged, the network is down and your boss is angry.
Shell Quoting Madness
Let’s assume, as a first approach, that we really want to handle those quoting issues. I wrote a little Python script based around the shlex.quote() function to construct, to the best of my knowledge, the quoting required for each stage:
ssh bastion '
if [ "$PROD" ];
then
ssh fileserver sudo su -c '"'"'
if grep -qs /dev/sdb1 /proc/mounts;
then
echo "sdb1 already mounted!";
umount /dev/sdb1
fi;
rm -rf "/media/Main Backup Volume"/*;
mount /dev/sdb1 "/media/Main Backup Volume"
'"'"';
fi;
sudo touch /var/run/start_backup
'
Even with Python handling the heavy lifting of quoting each shell layer, and even if the aforementioned minor disk-wiping issue were fixed, it is still not 100% clear that the argument handling rules for all of su, sudo, ssh, and bash are correctly respected.
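A script of the kind mentioned above might apply shlex.quote() once per layer, roughly as in this minimal sketch. It is not the author's script: the helpers wrap() and unwrap() are hypothetical, and the sketch models every layer as a plain `bash -c` invocation. Real hops through ssh, sudo and su each apply their own argument rules, which is exactly the uncertainty described above.

```python
import shlex


def wrap(command, layers):
    # Nest `command` inside `layers` levels of `bash -c`, quoting the text
    # for each enclosing shell with shlex.quote().
    for _ in range(layers):
        command = 'bash -c ' + shlex.quote(command)
    return command


def unwrap(command, layers):
    # Reverse wrap() one layer at a time. shlex.split() performs POSIX-style
    # word splitting and quote removal, but no variable or glob expansion.
    for _ in range(layers):
        words = shlex.split(command)
        if len(words) != 3 or words[:2] != ['bash', '-c']:
            raise ValueError('not a bash -c layer: %r' % command)
        command = words[2]
    return command


inner = 'echo "sdb1 already mounted!"'
for n in range(5):
    wrapped = wrap(inner, n)
    assert unwrap(wrapped, n) == inner  # every layer round-trips
    print(n, len(wrapped))              # the string grows with each layer

print(wrap(inner, 2))
# → bash -c 'bash -c '"'"'echo "sdb1 already mounted!"'"'"''
```

Every single quote in one layer becomes the five-character sequence '"'"' in the next, which is where the familiar runs of quotes come from.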
Finally, if any login shell involved is not bash, we must introduce additional quoting in order to explicitly invoke bash at each stage, causing an explosion in quoting:
ssh bastion 'bash -c '"'"'if [ "$PROD" ]; then ssh fileserver bash -c '"'"'
"'"'"'"'"'"'sudo su -c '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"
'bash -c '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"
'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'
"'"'"'"'"'"'"'"'"'"'if grep -qs /dev/sdb1 /proc/mounts; then echo "sdb1 alr
eady mounted!"; umount /dev/sdb1 fi; rm -rf "/media/Main Backup Volume"/*;
mount /dev/sdb1 "/media/Main Backup Volume"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"
'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'
"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'"'"'
"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"'; fi; sudo touch /var/run/
start_backup'"'"''
There Is Hope
We could instead express the above using Mitogen:
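import os
import shutil
import subprocess

import mitogen

# Assumption: PROD is read from the environment of the machine running this
# script; the `if PROD:` test below is only ever evaluated there.
PROD = bool(os.environ.get('PROD'))

def run(*args):
    # Raises CalledProcessError if the command exits with a non-zero status.
    return subprocess.check_call(args)

def file_contains(s, path):
    # Text mode, so that a str can be searched for on Python 2 and 3.
    with open(path) as fp:
        return s in fp.read()

@mitogen.main()
def main(router):
    device = '/dev/sdb1'
    mount_point = '/media/Main Backup Volume'
    bastion = router.ssh(hostname='bastion')
    bastion_sudo = router.sudo(via=bastion)
    if PROD:
        # Like the shell version, act as root on the file server.
        fileserver = router.sudo(via=router.ssh(hostname='fileserver', via=bastion))
        if fileserver.call(file_contains, device, '/proc/mounts'):
            print('{} already mounted!'.format(device))
            fileserver.call(run, 'umount', device)
        fileserver.call(shutil.rmtree, mount_point)
        fileserver.call(os.mkdir, mount_point, 0o777)
        fileserver.call(run, 'mount', device, mount_point)
    bastion_sudo.call(run, 'touch', '/var/run/start_backup')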
In which context must the PROD variable be defined?
On which machine is each step executed?
Are there any escaping issues?
What will happen if the grep binary is missing?
What will happen if any step fails?
What will happen if any login shell is not bash?
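The run() helper in the Mitogen example uses subprocess.check_call(), so a failing step raises an exception rather than quietly returning a status that the next line ignores. The sketch below exercises that helper locally; outcome() is a hypothetical wrapper, the command name passed last is deliberately nonexistent, and true/false are assumed to be on PATH. When the same function is invoked through a Mitogen context, the remote exception should surface in the caller as mitogen.core.CallError.

```python
import subprocess


def run(*args):
    # Same helper as in the Mitogen example above.
    return subprocess.check_call(args)


def outcome(*args):
    # Hypothetical wrapper that reports how a step ended instead of raising.
    try:
        run(*args)
        return 'ok'
    except subprocess.CalledProcessError as exc:
        return 'failed with exit status %d' % exc.returncode
    except OSError as exc:  # FileNotFoundError on Python 3
        return 'could not execute: %s' % type(exc).__name__


print(outcome('true'))                          # → ok
print(outcome('false'))                         # → failed with exit status 1
print(outcome('definitely-not-a-real-umount'))  # → could not execute: FileNotFoundError
```

A missing binary and a failing command are reported differently, and neither can be mistaken for success.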
Recursively Nested Bootstrap
This demonstrates the library’s ability to use slave contexts to recursively proxy connections to additional slave contexts, with a uniform API to any slave, and all features (function calls, import forwarding, stdio forwarding, log forwarding) functioning transparently.
This example uses a chain of local contexts for clarity; however, SSH and sudo contexts work identically.
nested.py:
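import subprocess

import mitogen
import mitogen.utils

@mitogen.main()
def main(router):
    mitogen.utils.log_to_file()
    context = None
    # Each new local context is started via the previous one, forming a chain.
    for x in range(1, 11):
        print('Connect local%d via %s' % (x, context))
        context = router.local(via=context, name='local%d' % x)
    # Runs in local10, ten hops away; its stdout is forwarded back as log lines.
    context.call(subprocess.check_call, ['pstree', '-s', 'python', '-s', 'mitogen'])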
Output:
$ python nested.py
Connect local1 via None
Connect local2 via Context(1, 'local1')
Connect local3 via Context(2, 'local2')
Connect local4 via Context(3, 'local3')
Connect local5 via Context(4, 'local4')
Connect local6 via Context(5, 'local5')
Connect local7 via Context(6, 'local6')
Connect local8 via Context(7, 'local7')
Connect local9 via Context(8, 'local8')
Connect local10 via Context(9, 'local9')
18:14:07 I ctx.local10: stdout: -+= 00001 root /sbin/launchd
18:14:07 I ctx.local10: stdout: \-+= 08126 dmw /Applications/iTerm.app/Contents/MacOS/iTerm2
18:14:07 I ctx.local10: stdout: \-+= 10638 dmw /Applications/iTerm.app/Contents/MacOS/iTerm2 --server bash --login
18:14:07 I ctx.local10: stdout: \-+= 10639 dmw bash --login
18:14:07 I ctx.local10: stdout: \-+= 13632 dmw python nested.py
18:14:07 I ctx.local10: stdout: \-+- 13633 dmw mitogen:dmw@Eldil.local:13632
18:14:07 I ctx.local10: stdout: \-+- 13635 dmw mitogen:dmw@Eldil.local:13633
18:14:07 I ctx.local10: stdout: \-+- 13637 dmw mitogen:dmw@Eldil.local:13635
18:14:07 I ctx.local10: stdout: \-+- 13639 dmw mitogen:dmw@Eldil.local:13637
18:14:07 I ctx.local10: stdout: \-+- 13641 dmw mitogen:dmw@Eldil.local:13639
18:14:07 I ctx.local10: stdout: \-+- 13643 dmw mitogen:dmw@Eldil.local:13641
18:14:07 I ctx.local10: stdout: \-+- 13645 dmw mitogen:dmw@Eldil.local:13643
18:14:07 I ctx.local10: stdout: \-+- 13647 dmw mitogen:dmw@Eldil.local:13645
18:14:07 I ctx.local10: stdout: \-+- 13649 dmw mitogen:dmw@Eldil.local:13647
18:14:07 I ctx.local10: stdout: \-+- 13651 dmw mitogen:dmw@Eldil.local:13649
18:14:07 I ctx.local10: stdout: \-+- 13653 dmw pstree -s python -s mitogen
18:14:07 I ctx.local10: stdout: \--- 13654 root ps -axwwo user,pid,ppid,pgid,command
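In the output above, each mitogen process title appears to end with the PID of the process that started it. Assuming that reading is correct, a short check over those lines confirms that the ten contexts form a single chain, each started by the one before it. The lines are copied from the output with log prefixes and tree characters removed, and chain_depth() is a hypothetical helper.

```python
import re

# Process lines from the pstree output above, without prefixes or tree art.
LINES = [
    '13632 dmw python nested.py',
    '13633 dmw mitogen:dmw@Eldil.local:13632',
    '13635 dmw mitogen:dmw@Eldil.local:13633',
    '13637 dmw mitogen:dmw@Eldil.local:13635',
    '13639 dmw mitogen:dmw@Eldil.local:13637',
    '13641 dmw mitogen:dmw@Eldil.local:13639',
    '13643 dmw mitogen:dmw@Eldil.local:13641',
    '13645 dmw mitogen:dmw@Eldil.local:13643',
    '13647 dmw mitogen:dmw@Eldil.local:13645',
    '13649 dmw mitogen:dmw@Eldil.local:13647',
    '13651 dmw mitogen:dmw@Eldil.local:13649',
]

# "<pid> <user> mitogen:<user>@<host>:<parent pid>"
TITLE = re.compile(r'^(\d+) \S+ mitogen:\S+:(\d+)$')


def chain_depth(lines):
    # Starting from the master process on the first line, require each
    # mitogen title to name the PID listed before it; return the hop count.
    previous = int(lines[0].split()[0])
    depth = 0
    for line in lines[1:]:
        match = TITLE.match(line)
        if not match or int(match.group(2)) != previous:
            raise ValueError('chain broken at: %r' % line)
        previous = int(match.group(1))
        depth += 1
    return depth


print(chain_depth(LINES))  # → 10
```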