Why does 'connect' take so long (sometimes) and can't be interrupted?


gazelle
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS:  -DPROGRAM='bash' -DCONF_HOSTTYPE='x86_64' -DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='x86_64-unknown-linux-gnu' -DCONF_VENDOR='unknown' -DLOCALEDIR='/.../local/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H   -I.  -I. -I./include -I./lib   -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux shell 4.4.0-79-generic #100-Ubuntu SMP Wed May 17 19:58:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.4
Patch Level: 0
Release Status: release

Description:
        This is a little complicated and I can't give you full details on how to
        replicate it, since I don't fully understand it myself.  But under certain
        circumstances, the following line takes a very long time to execute:

            exec 5<>/dev/tcp/localhost/12345

        My objections are twofold:
            a) That it takes so long - it should either succeed or fail (almost)
                immediately.
            b) When it is running, it is uninterruptible.  None of ^C, ^\, or ^Z, nor
                any signal sent to the bash process (other than SIGKILL) will cause
                it to exit.  Effectively, the only escape is to SIGKILL the bash
                process, which causes the entire shell to be killed.

        More details below.

Repeat-By:
        I am using a bash script to communicate with a program that I wrote (in C)
        over TCP/IP.  The C program listens on port 12345 (for example) and the bash
        script connects to it, using the command line shown above.  The actual lines
        in my script are now as follows:

            printf "Elapsed time for this 'exec' ...\t";tme=$(date +%s)
            exec 5<>/dev/tcp/localhost/12345
            echo "$(($(date +%s) - $tme)) seconds."

        Normally, for almost all possible inputs (to the C program), this executes
        immediately (says "0 seconds" elapsed).  But, for one particular input, it
        takes a very long time - in my most recent test, it was 116 seconds (!).
        This problem is 100% repeatable (with the given specific input to the C
        program).

        Note, however, that it does eventually connect.  As far as I can tell, it
        does always eventually connect.

        Needless to say, when I first hit this problem, I assumed it had hung, and
        when I tried to kill it, I ran into the problems described above.

        Also note: In testing this, I found that if I do hit ^C while it is hung,
        then wait long enough, eventually it does exit as shown below:

Elapsed time for this 'exec' ... ^C^C^C^Cbash: connect: Connection refused
bash: /dev/tcp/localhost/12345: Connection refused

Fix:
        Well, I'd like to know why it (sometimes) takes so long.
        And it would be nice if you could interrupt it when it does hang.
        Or, alternatively, set a time limit for the connect().


Re: Why does 'connect' take so long (sometimes) and can't be interrupted?

Eduardo A. Bustamante López
On Thu, Jun 15, 2017 at 10:36 AM,  <[hidden email]> wrote:
[...]
> Description:
>         This is a little complicated and I can't give you full details on how to
>         replicate it, since I don't fully understand it myself.  But under certain
>         circumstances, the following line takes a very long time to execute:
>
>             exec 5<>/dev/tcp/localhost/12345

Attach to the hung process with gdb, print a backtrace, and reply,
so that we can see where it's stuck.
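To follow this suggestion you first need the PID of the stuck shell. Run `echo $$` in that shell before reproducing the hang. Below is a minimal sketch of a helper, `bt_of`, which is a hypothetical name and not part of bash or gdb. It does three things:

- It prints the kernel wait channel from Linux's `/proc/PID/wchan`.
- It has gdb attach in batch mode and print a backtrace.
- It detaches, leaving the process running.

Two assumptions apply. First, gdb must be in `PATH`. Second, on Ubuntu the default Yama `ptrace_scope` setting usually means you must run this as root to attach to a process that is not your child.

```bash
# bt_of PID: show where a running process is blocked, then detach.
# Hypothetical helper name; assumes Linux /proc and gdb in PATH.
bt_of() {
  local pid=$1
  # Bail out early if the process does not exist (or cannot be signalled).
  if ! kill -0 "$pid" 2>/dev/null; then
    echo "bt_of: no such process: $pid" >&2
    return 1
  fi
  # Kernel function the process is sleeping in, if the kernel exposes it.
  printf 'wchan: %s\n' "$(cat "/proc/$pid/wchan" 2>/dev/null)"
  # Attach, print the user-space backtrace, and detach when gdb exits.
  gdb -p "$pid" -batch -ex bt
}
```

For example, `bt_of 4242`, where 4242 stands for whatever `echo $$` printed in the hung shell.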

Re: Why does 'connect' take so long (sometimes) and can't be interrupted?

tetsujin
In reply to this post by gazelle

I can't think of what could cause this problem. The TCP connection
code in Bash seems pretty straightforward, and in my experiments I was
able to interrupt it even if it was waiting for the server to accept a
connection, or waiting for an available slot in the listen queue. It's
possible the problem is in your C code. Would you be able to post a
version of it that exhibits the problem?

----- Original Message -----
From: [hidden email]
To: <[hidden email]>
Cc:
Sent: Thu, 15 Jun 2017 09:36:12 -0600
Subject: Why does 'connect' take so long (sometimes) and can't be interrupted?

Description:
 This is a little complicated and I can't give you full details on how to
 replicate it, since I don't fully understand it myself. But under certain
 circumstances, the following line takes a very long time to execute:

 exec 5<>/dev/tcp/localhost/12345

 My objections are twofold:
 a) That it takes so long - it should either succeed or fail (almost)
 immediately.
 b) When it is running, it is uninterruptible. None of ^C, ^\, or ^Z, nor
 any signal sent to the bash process (other than SIGKILL) will cause
 it to exit. Effectively, the only escape is to SIGKILL the bash
 process, which causes the entire shell to be killed.


RFE: File Descriptor passing and socket pair creation

tetsujin
In reply to this post by gazelle

Since the subject has come up (difficulties with /dev/tcp, no timeout
feature, somehow a failure to interrupt it, etc.), I want to suggest
an alternative. I don't propose removing /dev/tcp from Bash (it's
been there quite a long time, people use it, and I'm sure many
people like it), but I propose an alternative approach for providing
similar functionality in the future:

In Unix, there are what are known as "local" (or "Unix domain") sockets.
A nice feature of these is the ability to pass open files between
processes. Integrating that functionality into the shell opens up a
lot of possibilities for adding other things to the shell without
having to write them as shell "builtin" libraries. Specifically, if
you provide "socketpair" and "recvmsg" as shell builtins, then
processes for opening files and file-like devices can be implemented
as external executables.

I wrote an interface to socketpair(), sendmsg(), and recvmsg() as
shell builtins (for Bash and ksh) as part of a library I'm working on
called "shell-pepper". The library is like a platform for delivering
functionality I want to see in the shell, but don't expect to be
accepted by the upstream maintainers. :) With it you could write an
external program to handle the task of opening that TCP connection,
and send the file table entry for the connection over a local socket -
then the connection process COULD run in the background, kind of like
this:

$ enable -f ./socketpair socketpair
$ enable -f ./recvmsg recvmsg
$ socketpair s   # Create a socketpair, store the FDs in $s
$ tcp_connect localhost 12345 >&${s[1]} &   # Run a program to create a TCP connection
$ exec {s[1]}<&-   # Close the socket we sent to tcp_connect, so we can tell when tcp_connect is done with it
$ recvmsg msg tcp_fd <&${s[0]}   # Get a response message, including file descriptors, from tcp_connect
$ exec {s[0]}<&-   # Alas, attempting to use this syntax on Bash 4.2 will kill your shell!

Of course all the above is way more code than simply

$ exec {fd}<>/dev/tcp/localhost/12345

And most notably, it doesn't allow you to localize the open file
descriptor the way a redirect does:

$ { cmd1; cmd2; etc; } <>/dev/tcp/localhost/49152
$ # TCP connection was closed automatically when the commands above were finished.

The upshot is that it gives you flexibility: If you want different
types of connections (named socket, etc.), or different connection
options (datagram, timeout, etc.), or authentication as part of
creating the connection, it can be implemented as part of the
executable. Then the whole ugly process of creating the socketpair,
running the command, receiving the message, etc. could be wrapped in a
shell function.

Better yet, look how this plays out in ksh:

$ tcp_connect localhost 12345 | recvmsg msg tcp_fd

...Because "pipes" in Korn Shell are actually socket pairs, (and
because it has the equivalent of shopt "lastpipe" on by default) you
can receive file descriptors from external commands without having to
explicitly create and destroy a socketpair. (Though the use of sockets
in ksh has negative side-effects as well - for instance opening files
in /dev/fd on Linux works for pipes, but not for sockets. Maybe some
alternative could be considered that lends comparable convenience?)

I realize "socketpair()" and FD passing are not supported on every
platform Bash runs on... The feature goes back to POSIX.1-2001, so it
should be fairly broadly supported, at least.

Re: Why does 'connect' take so long (sometimes) and can't be interrupted?

Chet Ramey
In reply to this post by gazelle
On 6/15/17 11:36 AM, [hidden email] wrote:

> Note, however, that it does eventually connect.  As far as I can tell, it
> does always eventually connect.
>
> Needless to say, when I first hit this problem, I assumed it had hung, and
> when I tried to kill it, I ran into the problems described above.

It's all system calls. If it hangs in `connect', bash has to wait until
the system call returns one way or another.  There is no provision for a
timeout with connect, and any signal (e.g., SIGALRM) that bash tries to
set for a timeout will be deferred until connect completes or fails
anyway.
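The wait happens inside whichever process issued connect(). One workaround follows from that, and it is a sketch, not a bash feature: make the attempt in a child process and let GNU coreutils `timeout` kill that child. The helper name `tcp_probe` is hypothetical.

- The child is a non-interactive `bash -c`, so the SIGTERM that `timeout` sends should end it.
- `--kill-after=1` falls back to SIGKILL if it does not.

The probe only tells you whether a connection could be made within the limit. It opens and immediately closes one connection, which the server will see. A later `exec 5<>/dev/tcp/...` in the current shell is a second, separate connect() that is still not time-limited.

```bash
# tcp_probe HOST PORT [SECS]: can a TCP connection be made within SECS seconds?
# Hypothetical helper; assumes GNU coreutils timeout(1) and a bash built
# with /dev/tcp support. The connect() runs in a child bash, not in this shell.
tcp_probe() {
  local host=$1 port=$2 secs=${3:-5}
  # SIGTERM after $secs seconds, SIGKILL one second later if still alive.
  # Exit status: 0 = connected, 124 (or 137 after SIGKILL) = timed out,
  # anything else = the connection attempt failed.
  timeout --kill-after=1 "$secs" \
    bash -c 'exec 3<>"/dev/tcp/$1/$2"' tcp_probe "$host" "$port" 2>/dev/null
}

# Example use (port from the report above):
#   if tcp_probe localhost 12345 5; then exec 5<>/dev/tcp/localhost/12345; fi
```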

>
> Also note: In testing this, I found that if I do hit ^C while it is hung,
> then wait long enough, eventually it does exit as shown below:
>
> Elapsed time for this 'exec' ... ^C^C^C^Cbash: connect: Connection refused
> bash: /dev/tcp/localhost/12345: Connection refused

If that's a legit error, the problem might be with the server.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    [hidden email]    http://cnswww.cns.cwru.edu/~chet/
