How to Lose Control of your Shell

A few weeks ago I was hacking on language server support in Zed, trying to get Zed to detect when a given language server binary, such as gopls, is already present in $PATH. If so, it should use that instead of downloading a new binary.
The problem: $PATH is often dynamically modified by tools such as direnv, asdf, mise and others, which allow you to set a specific $PATH in a given folder. (Why do these tools do that? Because it gives you the ability to, say, prepend ./my_custom_binaries to $PATH when you’re in my-cool-project.) So we can’t just use the $PATH associated with the Zed process, we need the $PATH as it is when you cd into your project directory.
Easy, I thought. Just launch a $SHELL, cd into the project to trigger direnv and whathaveyou, run env, store the environment, pick out $PATH, find binaries in there.
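The last step, finding a binary in a given $PATH value, can be sketched roughly like this. This is a minimal illustration and not Zed’s actual code: the function name find_in_path is made up for this example, and it assumes a Unix system where "executable" means any of the execute permission bits is set.

```rust
use std::env;
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::PathBuf;

/// Returns the first executable regular file called `name` found in the
/// directories listed in `path_var` (a `$PATH`-style string).
/// `find_in_path` is a hypothetical helper, not part of Zed.
fn find_in_path(path_var: &str, name: &str) -> Option<PathBuf> {
    env::split_paths(path_var)
        .map(|dir| dir.join(name))
        .find(|candidate| {
            fs::metadata(candidate)
                // Regular file with at least one execute bit set.
                .map(|meta| meta.is_file() && (meta.permissions().mode() & 0o111) != 0)
                .unwrap_or(false)
        })
}

fn main() {
    // Here we use the process's own $PATH; the post needs the one from the shell.
    let path_var = env::var("PATH").unwrap_or_default();
    match find_in_path(&path_var, "gopls") {
        Some(path) => println!("found gopls at {}", path.display()),
        None => println!("gopls is not in $PATH"),
    }
}
```

The point of taking the $PATH string as a parameter is that it can come from the environment captured in the project directory instead of from the current process.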
And easy it was. Here’s some of the code, the part that launches $SHELL, cds and gets the env:
fn load_shell_environment(dir: &Path) -> Result<HashMap<String, String>> {
    // Get the $SHELL
    let shell = std::env::var("SHELL")?;
    // Construct the command we want the $SHELL to execute
    let command = format!("cd {:?}; /usr/bin/env -0;", dir);
    // Launch the $SHELL as an interactive shell (so the user's rc files are used)
    // and execute `command`:
    let output = std::process::Command::new(&shell)
        .args(["-i", "-c", &command])
        .output()?;
    // [... check exit code, get stdout, turn stdout into HashMap, etc. ...]
}
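The elided "turn stdout into HashMap" step could look something like the following. This is only a sketch under the assumption that stdout contains nothing but the output of env -0 (KEY=VALUE entries separated by NUL bytes); parse_env_output is a name invented for this example, and real rc files may print extra text that would need to be handled.

```rust
use std::collections::HashMap;

/// Parses the output of `env -0`: `KEY=VALUE` entries separated by NUL bytes.
/// `parse_env_output` is a hypothetical helper for illustration.
fn parse_env_output(stdout: &[u8]) -> HashMap<String, String> {
    stdout
        .split(|byte| *byte == 0)
        .filter_map(|entry| {
            let entry = String::from_utf8_lossy(entry);
            // Split on the first `=` only; values may contain `=` themselves.
            // Entries without `=` (such as the empty trailing one) are skipped.
            let (key, value) = entry.split_once('=')?;
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}

fn main() {
    let sample = b"PATH=/usr/local/bin:/usr/bin\0EDITOR=zed --wait\0FOO=a=b\0";
    let env = parse_env_output(sample);
    println!("{:?}", env.get("PATH"));
}
```

Splitting on NUL instead of newlines is the reason for the -0 flag: values that contain newlines stay in one piece.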
Except for one thing: after starting a Zed instance in my terminal that executed this function, I could no longer kill Zed by hitting Ctrl-C.
What?
I could spam the terminal with ^C and nothing happened. Lines and lines of desperate ^Cs that never hear their own echo.
How? Why? … What?
After saying “What?” 20 times and hitting Ctrl-c even more, I asked Piotr for help, because I wasn’t 100% confident in how Rust spawns processes and he’s a Rust wizard. What I did know was that there have to be fork and exec syscalls somewhere inside std::process::Command, but I wasn’t sure whether Rust does something clever with the signal handlers or has default signal handlers set up that mess with Ctrl-c. Because Ctrl-c should result in an interrupt signal being sent to the process, which should cause it to terminate, but clearly that stopped working.
We started to poke at all kinds of things to test all kinds of hypotheses, as outlandish as they might be.
Are we sure that the shell isn’t running anymore? Yes, we are, because .output() up there only returns once the command has finished running.
Is this about cd? Do direnv or asdf or other tools fire some hooks that take control of the terminal? No, turns out when we just run /usr/bin/env -0; without cd it also takes control over the shell.
So is it the -0 that we pass to env? It shouldn’t be, obviously, because that’s just formatting. But: desperate times breed desperate debugging attempts. So we tried it and it wasn’t -0 either.
Wait, is it env? Does it do something weird with my terminal? Huh.
So we changed the command from
let command = format!("/usr/bin/env;");
to
let command = format!("echo lol");
… and guess what? Ctrl-c worked again.
What?
Okay, another attempt. What if we do both?
let command = format!("/usr/bin/env; echo lol");
That also worked. WHAT!
Okay, wait a second… my gut is telling me something. /usr/bin/env isn’t a shell built-in, is it? But echo is. Is that a clue?
Let’s try this one:
let command = format!("ls");
Good old ls. Probably the command I’ve run the most in my life. It’s always there when I need it and on every machine I gain access to I immediately run ls just to see that it works. I’d trust ls with my life.
And yet: after running ls in that subshell, Ctrl-c stopped working. Et tu, ls?
Next hypothesis: is it something in Zed? Do we set up some signal handlers? Let’s find out. We copied the function to a new, bare-bones Rust project, ran it and… it reproduced. Ctrl-c stopped working in that project too.
Okay, is it Rust then? I rewrote the function in Go and in Go too Ctrl-c lost control.
At this point we had spent nearly 2 hours on this and couldn’t figure it out. But we did have a workaround:
let command = format!("/usr/bin/env; exit 0;");
exit is a built-in in all the different shells, so it’s safe to run and it fixes the problem. Okay, fair enough. We slapped one hell of a comment above that line to let the next person to come along know that the exit 0 is now load-bearing and moved on.
But this puzzle got to me. I asked fellow shell-nerds whether they knew what was going on but nobody had an answer ready. So in my mornings I started to investigate.
I set up a repository in which a small Rust program reproduced the problem: it spawns a shell process, waits for it to exit, then idles for 5 seconds so I can test whether Ctrl-c still works. The hunt was on.
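A reproduction along those lines might look like the sketch below. It is a reconstruction from the description, not the code from that repository: run_shell is a made-up helper, the fallback to /bin/sh is an assumption, and ls stands in for any non-built-in command.

```rust
use std::env;
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::Duration;

/// Runs `shell` with `args`, waits for it to exit and returns its status.
/// `run_shell` is a hypothetical helper for this sketch.
fn run_shell(shell: &str, args: &[&str]) -> io::Result<ExitStatus> {
    // `.output()` only returns once the shell has finished running.
    Command::new(shell).args(args).output().map(|output| output.status)
}

fn main() -> io::Result<()> {
    // Assumption: fall back to /bin/sh when $SHELL is not set.
    let shell = env::var("SHELL").unwrap_or_else(|_| "/bin/sh".to_string());
    // Interactive shell whose last command is a non-built-in.
    let status = run_shell(&shell, &["-i", "-c", "ls"])?;
    println!("shell exited with status: {status}");
    println!("idling for 5 seconds, try Ctrl-c now");
    thread::sleep(Duration::from_secs(5));
    Ok(())
}
```

Run from a terminal, the interesting part is the 5-second window at the end: if Ctrl-c does nothing there, the problem has reproduced.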
The first big light bulb moment came when I realized that I don’t have to send a signal via Ctrl-c: I can use the kill command. And, alas, it’s not the signal handling that’s borked! When I used kill -INT the signal arrived and the process stopped. It’s not that my process doesn’t react to signals anymore, but rather that Ctrl-c doesn’t send the right signals after launching the shell process.
Next attempt: is the terminal state borked after launching the shell? Okay, so something about the terminal state. Somebody in the tweet replies did point me to stty, which lets you set options on your terminal device, such as the baud rate (yes) and other things. I changed my program to run stty -a before and after the shell process. No luck: no changes in the output.
Desperate, I also used Ghostty’s terminal inspector to see whether some state changes in the terminal and results in Ctrl-c going up in smoke. But no luck there either.
After days of going back and forth on this with ChatGPT (which I wrote about the last time) it finally gave me a clue:
The spawned shell inherits the terminal (TTY) control, and since it’s an interactive shell (-i flag), it sets itself as the foreground process group leader for the terminal. This changes how signals, especially SIGINT generated by Ctrl-C, are handled.
Huh. Foreground process group leader. Interesting. Hmmm. Here’s what Advanced Programming in the Unix Environment (APUE), which I pulled out today while writing this, says on process groups:
A process group is a collection of one or more processes, usually associated with the same job (job control is discussed in Section 9.8), that can receive signals from the same terminal. Each process group has a unique process group ID. Process group IDs are similar to process IDs: they are positive integers and can be stored in a pid_t data type. The function getpgrp returns the process group ID of the calling process.
The important part: “that can receive signals from the same terminal.”

APUE has more clues:
It is possible for a process group leader to create a process group, create processes in the group, and then terminate.
So is that what happens? The shell spawns, claims it’s the process group leader when it doesn’t run a built-in command, exits, and then doesn’t restore the previous process group leader?
It felt like I was getting closer. So I kept asking ChatGPT to confirm this and it led me to tcgetpgrp:
The function tcgetpgrp() returns the process group ID of the foreground process group on the terminal associated to fd, which must be the controlling terminal of the calling process.
Okay, now we’re talking, this sounds like it could lead us somewhere. I asked ChatGPT to generate me some Rust code for that tcgetpgrp call:
use std::io;

// Returns the foreground process group ID of the terminal behind `fd`.
fn get_process_group_id(fd: i32) -> io::Result<libc::pid_t> {
    let pgid = unsafe { libc::tcgetpgrp(fd) };
    if pgid == -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(pgid)
    }
}
I plugged that into my program so it would print the process group ID associated with STDIN (file descriptor 0) before and after the $SHELL process has run. This is what it printed:
process group before: 54530
shell exited with status: exit status: 0
process group after: 54571
Well, hello there! This really looks like the murder weapon. How can I confirm that it’s what kills my Ctrl-c though? Is there a way I could stop the shell from taking over as process group leader? ChatGPT said that I could use the pre_exec hook on std::process::Command to put the shell process in a new, separate process session, which will put it in a new process group, which in turn means it won’t be able to become the process group leader of the group associated with STDIN. Like this:
use std::os::unix::process::CommandExt; // provides `pre_exec`

let mut cmd = std::process::Command::new("/bin/zsh");
cmd.args(["-i", "-c", "/usr/bin/env"]);
// Set a hook that will be executed right after `fork`, but before `exec`:
unsafe {
    cmd.pre_exec(|| {
        if libc::setsid() == -1 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(())
    });
}
// Run the command
let output = cmd.output().unwrap();
Right there, in the middle: setsid. That’s called right after we create a new process with fork but before that process is turned into $SHELL.
APUE on what happens when a process calls setsid:
The process becomes the session leader of this new session. […]
The process becomes the process group leader of a new process group. […]
The process has no controlling terminal. […] If the process had a controlling terminal before calling setsid, that association is broken.
That makes sense. By calling setsid it would break any association the newly-spawned shell process has with the terminal and that could help me confirm whether the shell mucking with the process group leader is the problem.
And (boom! fireworks! loud noises! a small child saying: “ta-da!”) with the pre_exec hook this is what the program printed:
process group before: 54530
shell exited with status: exit status: 0
process group after: 54530
And Ctrl-C still worked!
The foreground process group ID is the murder weapon. At this point it was clear what happens: the shell that’s spawned takes control of the terminal, by setting the foreground process group ID, which means the signal resulting from Ctrl-C is sent to the shell’s process group. But if the shell runs a non-built-in command as its last command, it doesn’t clean up after itself and its process group ID stays associated with the terminal, leading to all of our Ctrl-Cs ending up in the void.
With the What? answered, the next question is: why?
Why does ZSH (the shell with which this happened for me) not reset the foreground process group leader when it runs a non-built-in command?
On my Linux machine I ran strace -f to see which syscalls my process and, more importantly, its child processes (including the spawned shell) were making. What I could figure out was this:
When zsh is run with -c and the last command in that passed command is a non-built-in, such as ls or env, then ZSH execves into that last process. Meaning: it doesn’t create a child process to run ls. No, instead it turns itself into that command. That means at the point in time when ls is run in zsh -c 'echo lol; ls' the zsh process is gone and turned into ls and there’s nobody left to reset the foreground process group leader.
But when you run zsh -c '/usr/bin/env; echo lol', i.e. first non-built-in, then built-in, then ZSH doesn’t disappear. It forks and execs /usr/bin/env and then executes the echo lol and, somewhere in there, cleans up the foreground process group leader.
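The difference between "turning into" a command and forking a child for it can be made visible by printing PIDs. The sketch below is an illustration with /bin/sh and an explicit exec, not a reproduction of what ZSH does internally; printed_pids is a made-up helper, and the trailing true is there so the outer shell has to stay around.

```rust
use std::io;
use std::process::Command;

/// Runs `script` with `/bin/sh -c` and returns the PIDs it printed, one per line.
/// `printed_pids` is a hypothetical helper for this sketch.
fn printed_pids(script: &str) -> io::Result<Vec<u32>> {
    let output = Command::new("/bin/sh").args(["-c", script]).output()?;
    Ok(String::from_utf8_lossy(&output.stdout)
        .lines()
        .filter_map(|line| line.trim().parse().ok())
        .collect())
}

fn main() -> io::Result<()> {
    // Explicit `exec`: the outer shell is replaced by the inner one,
    // so both lines show the same PID and no outer shell is left afterwards.
    let replaced = printed_pids("echo $$; exec /bin/sh -c 'echo $$'")?;
    // A built-in follows: the outer shell must survive, so it forks a child
    // for the inner shell and the two PIDs differ.
    let forked = printed_pids("echo $$; /bin/sh -c 'echo $$'; true")?;
    println!("exec:        {replaced:?}");
    println!("fork + exec: {forked:?}");
    Ok(())
}
```

In the first case there is no parent shell left that could run any cleanup after the command finishes, which is the situation described above for a trailing non-built-in.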
Now, listen. I wish I could continue here and end with “… and this is why ZSH does it that way!” and someone would finally PayPal me $100 with the message “thanks for your newsletter”, but I have to disappoint you.
I don’t know how and why exactly ZSH does what it does. I cloned the repo, I compiled it, I tried to run it from source, but somehow failed and man cmake is a lot and also the folders have names like Src and Doc and who the hell capitalizes the first letter in a folder name and there’s also a ./configure you have to run and then you need to make sure that it doesn’t use your system library and… You see, this shell investigation stuff isn’t easy and I gave up, sorry.
What I did find though is that ZSH does actively set the process group ID for job control. And it also remembers the original one and resets it. But I gave up when I saw this section here that does job control stuff in ZSH and realized that I’m not getting paid for this.
I await your letters with the explanation.