mapfile doesn't accept input from a pipe

mapfile doesn't accept input from a pipe

Keith Thompson-3
Configuration Information [Automatically generated, do not change]:
Machine: x86_64
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -g -O2 -Wno-parentheses -Wno-format-security
uname output: Linux bomb20 4.8.0-46-generic #49-Ubuntu SMP Fri Mar 31
13:57:14 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Machine Type: x86_64-unknown-linux-gnu

Bash Version: 4.4
Patch Level: 12
Release Status: maint

Description:
        The "mapfile" command works correctly if stdin is redirected
from a file, but not if it's from a pipe.
        Demonstrated with several versions including the latest
bash-snap-20170626

Repeat-By:
        This script demonstrates the problem:
        __CUT_HERE__
        #!/bin/bash

        printf 'one\ntwo\nthree\n' > /tmp/input.txt

        mapfile REDIRECT < /tmp/input.txt
        cat /tmp/input.txt | mapfile PIPE

        echo "\$REDIRECT has ${#REDIRECT[@]} elements"
        echo "\$PIPE     has ${#PIPE[@]} elements"

        if [ ${#REDIRECT[@]} -eq 3 ] && [ ${#PIPE[@]} -eq 3 ] ; then
            echo PASS
            exit 0
        else
            echo FAIL
            exit 1
        fi
        __AND_HERE__

Re: mapfile doesn't accept input from a pipe

Greg Wooledge
On Wed, Jun 28, 2017 at 07:08:27PM -0700, Keith Thompson wrote:
> Description:
>         The "mapfile" command works correctly if stdin is redirected
> from a file, but not if it's from a pipe.

This is because each command in a pipeline is executed in its own subshell.
Not a bug.

If you need to read input from a command with mapfile (or read), use a
process substitution.

>         cat /tmp/input.txt | mapfile PIPE

mapfile PIPE < /tmp/input.txt
mapfile PIPE < <(a real program, not cat)
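
To make the second form concrete, here is a minimal sketch. It assumes
the producing command is a printf piped through sort (chosen only for
illustration) and adds the -t option, which the original script does not
use, to strip the trailing newline from each element.

```bash
#!/bin/bash
# The commands inside <( ... ) run in child processes, but mapfile itself
# runs in the current shell, so PIPE is still set after this line.
mapfile -t PIPE < <(printf 'three\none\ntwo\n' | sort)

echo "\$PIPE has ${#PIPE[@]} elements"
echo "first element: ${PIPE[0]}"
```

The pipeline inside the process substitution is harmless here because
nothing in it needs to modify the calling shell's variables.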

Re: mapfile doesn't accept input from a pipe

Eduardo A. Bustamante López
In reply to this post by Keith Thompson-3
On Wed, Jun 28, 2017 at 07:08:27PM -0700, Keith Thompson wrote:
[...]
>         mapfile REDIRECT < /tmp/input.txt
>         cat /tmp/input.txt | mapfile PIPE

The `mapfile PIPE' is a piece of a pipeline, and as such, it runs in a
subshell (different process).

See: http://mywiki.wooledge.org/BashFAQ/024
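
A minimal sketch of the effect that FAQ describes, assuming the default
setting (lastpipe off). The variable names count and pipe_count are
illustrative only.

```bash
#!/bin/bash
shopt -u lastpipe    # the default; stated explicitly for clarity

count=0
printf 'one\ntwo\nthree\n' | while read -r line; do
    count=$((count + 1))    # changes the subshell's copy only
done
pipe_count=$count
echo "after pipeline:    count=$pipe_count"

count=0
while read -r line; do
    count=$((count + 1))    # runs in the current shell
done < <(printf 'one\ntwo\nthree\n')
echo "after redirection: count=$count"
```

The loop body is identical in both cases; only the way input reaches the
loop decides whether the assignment survives.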

--
Eduardo Bustamante
https://dualbus.me/

Re: mapfile doesn't accept input from a pipe

Keith Thompson-3
On Thu, Jun 29, 2017 at 6:56 AM, Eduardo A. Bustamante López
<[hidden email]> wrote:
> On Wed, Jun 28, 2017 at 07:08:27PM -0700, Keith Thompson wrote:
> [...]
>>         mapfile REDIRECT < /tmp/input.txt
>>         cat /tmp/input.txt | mapfile PIPE
>
> The `mapfile PIPE' is a piece of a pipeline, and as such, it runs in a
> subshell (different process).
>
> See: http://mywiki.wooledge.org/BashFAQ/024

OK, that makes sense, and of course the same thing applies to "read".

I suggest that it would be worthwhile to mention this issue in the
documentation.
The fact that it's a FAQ suggests that people are likely to run into it.

Re: mapfile doesn't accept input from a pipe

Chet Ramey
On 6/29/17 12:38 PM, Keith Thompson wrote:

> I suggest that it would be worthwhile to mention this issue in the
> documentation.

"Each command in a pipeline is executed as a separate process (i.e., in
a subshell). See COMMAND EXECUTION ENVIRONMENT for a description of a
subshell environment. If the lastpipe option is enabled using the
shopt builtin (see the description of shopt below), the last element of
a pipeline may be run by the shell process."

or maybe

"Builtin commands that are invoked as part of a pipeline are also
executed in a subshell environment."
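
As a rough sketch of the lastpipe option mentioned in the first quoted
passage: the example below assumes a non-interactive script, since
lastpipe only takes effect when job control is not active. The explicit
"set +m" is an added precaution, not something the manual text requires
in a script.

```bash
#!/bin/bash
set +m               # job control off (already the case in a script)
shopt -s lastpipe    # let the last pipeline element run in this shell

printf 'one\ntwo\nthree\n' | mapfile -t PIPE

echo "\$PIPE has ${#PIPE[@]} elements"
```

Only the last element of the pipeline benefits; builtins earlier in the
pipeline still run in subshells.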

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    [hidden email]    http://cnswww.cns.cwru.edu/~chet/

Re: mapfile doesn't accept input from a pipe

tetsujin

I think that when programmers first learn shell programming, this is a
hard piece of information to effectively convey. The Bash
documentation provides the important facts:

- Subshells are quietly and automatically constructed for a variety of
shell programming constructs, including pipelines
- Code run in a subshell can't affect the parent shell's execution
environment

But what's important for a new shell programmer, and so easy to miss,
is the implication that follows from these facts. Piping a command's
output into "read" seems like a perfectly reasonable thing to do if
you haven't wrapped your head around those two facts. And if you do
try "cmd | read x", it isn't treated as any kind of error: the
side-effects of "read" are carried out in the subshell and then
quietly discarded.
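
A small sketch of that silent failure, assuming the default setting
(lastpipe off). The names x, y and status are illustrative only.

```bash
#!/bin/bash
shopt -u lastpipe    # the default; stated explicitly for clarity

unset x
echo hello | read -r x      # read succeeds, but inside a subshell
status=$?
echo "pipeline status: $status, x is ${x-unset}"

# Same read, fed by process substitution, runs in the current shell.
read -r y < <(echo hello)
echo "y is $y"
```

The pipeline reports success because "read" did its job; the value is
simply gone by the time the next command runs.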

So I look at this not just as a RTFM issue, it's a pitfall built-in to
the design of the language, and programmers need to understand a bit
about the implementation of the language to understand what's going
on. As such I think it may be worth spelling it out a bit more
directly in terms of the implications here. For instance, stick it in
the help for 'read' and 'mapfile':

"Care must be taken when using 'read' in a pipeline or another form of
subshell environment, as this may cause the data that's read to be
lost. See BASH SCRIPTING BASICS (section whatever) for more
information."

It's the same basic information, but it's better targeted: It lives in
the help for the command that's apparently "failing", it says your
data may be "lost", and it points at a relevant section of a newbie
FAQ that explains why.

I'd also wonder if it's maybe worth trying to detect cases of this
kind of thing and flagging them as errors...  Like if "read" or
"mapfile" wind up as the only command in a subshell, the command's
side-effects are going to be lost, so it's an error. Of course that
does nothing for the wide variety of other cases impacted by this
issue, but maybe it's still worth it...  If someone does "cmd | read
x", and "lastpipe" isn't on AND in effect, it's almost certainly a
mistake...  (Of course, there's ALWAYS the possibility that someone
relies on the current behavior - for instance to see if "read" would
succeed, or trigger a SIGPIPE in "cmd" after a certain amount of data
is read...)

The whole issue of sub-shells is kind of a mess IMO - there's all
these cases where one gets created automatically, and the (cmd) syntax
which exists specifically to run a command in a subshell, looks to the
uninitiated like a simple command-grouping syntax, because that's how
parentheses work in C and many other languages...  And if something
winds up in a subshell that shouldn't be, its side-effects on the
shell environment are simply lost without warning. Ideally, forking
the shell shouldn't be so baked-in to the language that people wind up
tripping over it.  Instead these cases that require parallelism
should use threading and synchronized access to a shared environment.
(So a pipeline could contain *multiple* built-ins or shell functions
with side-effects for the shell's environment) But I think that's a
difficult direction to pursue, unfortunately, and I'm guessing it's
not one that will happen in Bash...  (On the bright side it sounds as
though POSIX allows it...)
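
To illustrate the point about (cmd) looking like plain grouping, here is
a short sketch contrasting it with { cmd; }, which groups commands
without a subshell. The variable names are illustrative only.

```bash
#!/bin/bash
v=outer

( v=inner )          # parentheses: runs in a subshell
after_parens=$v
echo "after ( ): v=$after_parens"

{ v=inner; }         # braces: runs in the current shell
after_braces=$v
echo "after { }: v=$after_braces"
```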
Re: mapfile doesn't accept input from a pipe

Greg Wooledge
On Thu, Jun 29, 2017 at 03:22:24PM -0400, [hidden email] wrote:
> So I look at this not just as a RTFM issue, it's a pitfall built-in to
> the design of the language, and programmers need to understand a bit
> about the implementation of the language to understand what's going
> on. As such I think it may be worth spelling it out a bit more
> directly in terms of the implications here. For instance, stick it in
> the help for 'read' and 'mapfile':

If you include helpful text about every pitfall in every builtin, the
bash documentation will become three times its current size.

Hell, it's probably less work to write out what things you *can* safely
do, rather than what you can't.  There aren't very many!

But I don't think either of those things belongs in the reference manual.
You can't learn proper shell programming from a reference.  It's just
too big, too convoluted, too full of historical constructs.  You need
a more focused document.

Re: mapfile doesn't accept input from a pipe

tetsujin

It's a fair point but I think there may be a reasonable middle-ground,
in which common pitfalls are briefly addressed in TFM, but the manual
doesn't become bogged down with exhaustive detail of every possible
pitfall.

After all, information overload would just become another thing
preventing readers from absorbing the information.
